Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CommonSubexpressionElimination takes extremely long to run #768

Open
justhecuke opened this issue Feb 27, 2023 · 4 comments
Open

CommonSubexpressionElimination takes extremely long to run #768

justhecuke opened this issue Feb 27, 2023 · 4 comments

Comments

@justhecuke
Copy link

Hey,

Our team tried to enable CommonSubexpressionEliminationPass but gave up on it when it took ~50m to run by itself.

After looking at the code, the likely cause of the runtime appears to be the tight CSE -> if !chage break -> else Constant Propagation -> Local DCE loop. We haven't actually profiled the thing but, considering that CSE should take only a few seconds to run, it seems likely.

Can I ask why the tight loop was added? Does it provide some important benefit on top of CSE?

For reference, it only appears to reduce our package size by a small amount, but its odd construction has raised questions we don't know the answers to. I suspect the small size reduction may be due to Proguard already performing this optimization, but I don't have hard proof of this quite yet.

@wsanville
Copy link
Contributor

The commit message on ffb4d1e might be helpful.

@justhecuke
Copy link
Author

Somewhat. The reasoning behind the loop is fair enough, I'm just unsure of the benefit and if the runtime is actually what they expect. Considering the number of times they appear to run the pass, I can't imagine that each run takes ~50m for them. Or, if it does, then they must be getting a lot more benefit out of it than I am.

@NTillmann
Copy link
Contributor

We have not found to be a problem in practice. But I could imagine some pathological code patterns where it does become a performance problem. Could you to try change the code of that loop, impose some fixed limit (maybe 10) for the iteration count? It should be semantically fine to break out of the loop early. Let us know if that solves your problem, and what you find as a reasonable limit, and then we can that as an option.

@justhecuke
Copy link
Author

justhecuke commented Mar 9, 2023

I've modified the code to only perform a single iteration with no Constant Propagation or LocalDCE and to replace the WeakTopologicalOrdering with a simple sort of DexMethods in Purity.cpp. Our local build time went down from ~1600s (~27 minutes) to ~200s of which ~1000s was due to the WTO. Adding a second iteration makes the time go up to 250s. Adding Constant Propagation and LocalDCE ups the time to ~700s.

EDIT: Our CI time went down from ~7550s (!) to ~80s in the minimal case (vs. ~200s locally).
EDIT2: Seems like in the CI build, WTO takes ~7450s (init_method_barriers takes ~7500s) and the rest of the pass takes ~100s. Seems like our CI just really doesn't like WTO. Probably memory thrashing of some sort going on.

Strangely, it seems like replacing the WTO with a simple sort has caused a correctness issue and I'm not sure how that happened. Adding in runtime asserts also caused an issue since some of the Pure Methods, such as Long.valueOf, are not actually pure and will fail == checks between old and new locations. I'm going to see if I can't modify the assertion to Object.equals() if the type is a ref.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants