Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Identical Code Folding (ICF) optimization #748

Open
ufarooq opened this issue Dec 14, 2022 · 4 comments
Open

Identical Code Folding (ICF) optimization #748

ufarooq opened this issue Dec 14, 2022 · 4 comments

Comments

@ufarooq
Copy link

ufarooq commented Dec 14, 2022

Is there any existing pass related to Identical Code Folding (ICF) optimization? Or is anyone working on related ideas internally at Meta or outside? I would love to hear about it and contribute if there is any effort. A couple of relevant resources are listed below.

  1. Identical Code Folding https://tetzank.github.io/posts/identical-code-folding/
  2. Safe ICF: Pointer Safe and Unwinding Aware Identical Code Folding in Gold, Sriraman Tallam, Cary Coutant, Xinliang David Li, Chris Demetriou

Bests,
Umar

@NTillmann
Copy link
Contributor

There is no direct equivalent today in Redex. Couple of thoughts on that topic:

  • It would be slightly problematic for performance reasons to substitute a method of one class with a method of another class, as that would (potentially) trigger additional class loads.
  • We cannot easily substitute arbitrary true virtual methods, as their existence in the class hierarchy matters.
  • When in the same class hierarchy, then *ClassMerging, VerticalMerging, and VirtualMerging-Passes together with DedupBlocks can and do successfully effectively deduplicate identical methods.
  • The InstructionSequenceOutliner eliminates code redundancy across methods.

In general, there don't seem to be a ton of truly identical methods when I looked at it. Most are really small, i.e. returning a constant, and then they either play a role in class hierarchy, or should get inlined. Where at the native code level various source-code patterns end up with identical machine code, e.g. fetching a field at some offset, and the v-tables entries could share pointers, this is not the case for Java code, where we have tons of unique field references and not just relative offsets.

While the DEX file format would allow to share "code item" pointers across methods, I've heard that some Android OS version insist that they never shared.

What I'd be interested in exploring some more is some kind of "similar method merging", where we combine similar functions, and pass an additional parameter to disambiguate which particular is requested, and then have some local branching to read one of several possible fields.

I am curious if you've seen particular code patterns that are you interested in eliminating.

@ssj933
Copy link
Contributor

ssj933 commented Dec 14, 2022

Redex currently don't have optimizations that do exact same thing as ICF, but does have optimizations that outline common codes across different methods https://github.com/facebook/redex/tree/main/opt/outliner or eliminate duplicated code within methods https://github.com/facebook/redex/tree/main/opt/cse

@ufarooq
Copy link
Author

ufarooq commented Dec 15, 2022

Thanks for the reply @NTillmann !

  • It would be slightly problematic for performance reasons to substitute a method of one class with a method of another class, as that would (potentially) trigger additional class loads.
  • We cannot easily substitute arbitrary true virtual methods, as their existence in the class hierarchy matters.

I agree, however, if we consider safe-ICF/non-virtual methods, I think ICF might deliver benefits. For example, the MethodDevirtualization pass has provided great benefits for us, all those methods could be one set of targets.

  • The InstructionSequenceOutliner eliminates code redundancy across methods.

Yes, it does, but in our case, we could not outline much of the methods (i.e., < 1k), seems like we have more similar methods than the same instruction sequences (please note that, unfortunately, ClassMerging, VerticalMerging, and VirtualMerging-Passes are not enabled in our config). Which made me think of ICF, and I think we can map it to "similar method merging" idea of yours. Have you tried to explore this direction in past? Otherwise, I can look into it more.

@NTillmann
Copy link
Contributor

I have played with the idea a few times; I have some hacky one-year old commits where I identify methods that are structurally identical, but may differ in their constant/method/field payloads. I can try to undust them if you are interested. But it's probably fine if you start fresh :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants