Integrating ReFT #1657

Open · raven38 wants to merge 96 commits into main

Conversation

@raven38 commented Apr 16, 2024

Paper link: https://arxiv.org/abs/2404.03592

This PR integrates ReFT by creating a new tuner model type. Please see #1654.

Changes

I defined a new class ReFTModel as a subclass of LycorisTuner.
src/peft/tuners/reft/layer.py has the actual operations for ReFT methods.
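
For orientation, a hypothetical usage sketch (the ReftConfig name, its import path, and the chosen fields are assumptions based on the files added here, not the final API):

from transformers import AutoModelForCausalLM
from peft import get_peft_model
from peft.tuners.reft import ReftConfig  # assumed import path, per src/peft/tuners/reft/config.py

base_model = AutoModelForCausalLM.from_pretrained("gpt2")
# r and alpha appear in the new config; target_modules follows the usual PEFT convention.
config = ReftConfig(r=8, alpha=8, target_modules=["c_attn"])
model = get_peft_model(base_model, config)
model.print_trainable_parameters()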

Status

  • Discuss specification
  • Refactor implementation
  • Test implementation
  • Add documentation for methods.

@BenjaminBossan (Member) left a comment

Thanks a lot for providing this PR to add REFT to PEFT. This is already looking really good, well done!

I added a few comments, please take a look.

General comments

  • Please run make style on your code.
  • We'll have to add some tests. Would you mind looking at the tests that other methods have and adding the ReFT tests in a similar manner? E.g., check out the tests added by Add LayerNorm tuning model #1301.
  • Ideally, we'd have some docs and examples before merging this.
  • In case you have copied code from pyreft/pyvene, please add comments with references.
  • IIUC, the implemented method is specifically LoReFT. This should be mentioned. We may also want to rename things to LoReFT, unless you think this could be extended in the future to include more ReFT methods.
  • Speaking of names, in the descriptions you write "REFT" but the authors spell it "ReFT"; let's adjust this. When it comes to variable names, there is no need to change those (so ReftLayer etc. can stay as is).

Hyper params

The authors describe 4 important hyper-parameters in section 4.1:

  1. number of prefix positions
  2. number of suffix positions
  3. set of layers to target
  4. whether to tie intervention parameters

IIUC, 3 exists via target_modules. What about 1, 2, and 4?
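
For concreteness, a hypothetical sketch of how these four knobs could surface in a config (all field names are invented for illustration, not the PR's API):

from dataclasses import dataclass, field
from typing import List

@dataclass
class LoReftHyperParamsSketch:
    first_n: int = 0                  # 1. number of prefix positions to intervene on
    last_n: int = 0                   # 2. number of suffix positions to intervene on
    target_modules: List[str] = field(default_factory=lambda: ["o_proj"])  # 3. modules/layers to target
    tied_weights: bool = True         # 4. whether to tie intervention parameters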

Comment on lines 54 to 55
alpha: int = field(default=8, metadata={"help": "Reft alpha"})
loc: Optional[Union[List[int], int]] = field(default=None, metadata={"help": "token locations are applied ReFT"})

Member:
These two don't have a docstring entry. Also, the description "token locations are applied ReFT" needs to be explained better.


def create_adapter_parameters(self, adapter_name: str, r: int, shape: Tuple[int, ...]):
rotate_layer = LowRankRotateLayer(self.out_features, r)
self.reft_R[adapter_name] = torch.nn.utils.parametrizations.orthogonal(rotate_layer, orthogonal_map='cayley')

Member:
Is this redundant with torch.nn.init.orthogonal_ above?

Author:

torch.nn.init.orthogonal_ makes the tensor orthogonal only once, at initialization, while torch.nn.utils.parametrizations.orthogonal keeps the tensor orthogonal during training.
I checked the implementation of torch.nn.utils.parametrize.register_parametrization. It calls the parametrization (the orthogonal map) during the registration/initialization process, so the separate torch.nn.init.orthogonal_ call is redundant.
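
A quick standalone check of that point (a sketch, not the PR code): the parametrization recomputes an orthogonal weight on every access, so the property holds throughout training and a one-off init call adds nothing.

import torch
import torch.nn as nn

layer = nn.Linear(8, 64, bias=False)  # weight shape (64, 8): tall, like a low-rank rotation
layer = nn.utils.parametrizations.orthogonal(layer, orthogonal_map="cayley")
w = layer.weight
print(torch.allclose(w.T @ w, torch.eye(8), atol=1e-5))  # True: columns are orthonormal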

Member:
So IIUC, we can remove one of the invocations?

Author:
Yes, we can remove one of them.

self._active_adapter = adapter_name
self.update_layer(adapter_name, r, alpha, rank_dropout, module_dropout, init_weights, **kwargs)

def _get_delta_activations(

Member:
Is this used anywhere?

Author:
This is not used. Since LycorisTuner defines _get_delta_activations as an abstract method, I defined it here only to avoid raising errors.

Member:
I see. In that case, it should probably not be an abstract method on LycorisLayer to begin with, but let's not bother with this here. I'd just raise a NotImplementedError then, instead of having dead code.
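
A sketch of that suggestion (not the PR's actual code):

def _get_delta_activations(self, adapter_name, input, *args, **kwargs):
    # LoReFT edits hidden states directly rather than producing a weight delta,
    # so this LyCORIS hook is intentionally unsupported here.
    raise NotImplementedError("_get_delta_activations is not used by LoReFT layers.")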

adapter_name, r, alpha, rank_dropout, module_dropout, init_weights, use_effective_conv2d, **kwargs
)

def _get_delta_activations(

Member:
Is this used anywhere?

Author:
This is not used. Since LycorisTuner defines _get_delta_activations as an abstract method, I defined it here only to avoid raising errors.

raven38 and others added 5 commits April 18, 2024 15:16

@BenjaminBossan (Member):

@raven38 We just merged VeRA, which resulted in a bunch of merge conflicts. They should be very straightforward to resolve, let me know if you have any questions. Also ping me once this is ready for the next review.

@raven38 (Author) commented May 21, 2024

I think the code is ready for review.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@BenjaminBossan (Member):

@raven38 Thanks for the updates, I'm currently reviewing the PR, but it'll take some time. Meanwhile, could you please run make style on your code so that the CI passes?

@raven38 (Author) commented May 27, 2024

I modified the code so that make style passes, but make style still raises the following error. Should it be fixed in this PR?

examples/boft_controlnet/test_controlnet.py:40:1: E402 Module level import not at top of file
Found 1 error.
make: *** [style] Error 1

@BenjaminBossan (Member):

Hmm, code quality checks still fail:

ruff src tests examples docs scripts docker
ruff format --check src tests examples docs scripts docker
Would reformat: src/peft/tuners/reft/config.py
Would reformat: src/peft/tuners/reft/layer.py
Would reformat: src/peft/tuners/reft/model.py
Would reformat: tests/test_custom_models.py
Would reformat: tests/test_stablediffusion.py
Would reformat: tests/testing_common.py
6 files would be reformatted, 159 files already formatted

This does not appear to be related to anything in test_controlnet, which is independent of your PR. Is it possible that you use a different ruff version? The CI uses v0.2.2.

@BenjaminBossan (Member) left a comment

Thanks for all the updates. This looks very promising, and I don't think we need any major changes.

Before merging, we should add examples and docs, but we can tackle this after we have streamlined the implementation.

Regarding the intervention locations, your implementation departs from the pyreft implementation. Here, it is basically a hyper-parameter that is passed to the config. That means that for a given model, these values are fixed, right? In pyreft, it is part of the dataset, so it can vary from sample to sample.

I wonder if we should not stick with the pyreft implementation. This means that we can remove this parameter completely from PEFT. Instead, we can check the arguments passed to forward. IIUC, a pyreft dataset would contain an additional key, intervention_locations. If this is passed, we could use it, otherwise intervene on all positions. This would also mean that the PEFT implementation could use the datasets from pyreft, right? Maybe I'm missing something and there is a disadvantage in my suggestion, please LMK.
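
A minimal sketch of that idea (hypothetical helper and assumed tensor shapes, not the PR's or pyreft's actual code): the layer's forward would look for pyreft-style intervention_locations in its kwargs and fall back to intervening on all positions.

import torch

def select_intervention_positions(hidden_states: torch.Tensor, **kwargs) -> torch.Tensor:
    # hidden_states: (batch, seq_len, hidden_dim)
    locations = kwargs.get("intervention_locations")
    if locations is None:
        return hidden_states  # no per-sample locations passed: intervene everywhere
    # locations: (batch, num_positions) token indices (shape assumed for illustration)
    index = locations.unsqueeze(-1).expand(-1, -1, hidden_states.shape[-1])
    return torch.gather(hidden_states, 1, index)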

Moreover, in the paper, the authors discuss a variant with tied weights. AFAICT, there is currently no option to configure tied vs untied. Would it be possible to implement this?

@@ -0,0 +1,20 @@
# Copyright 2023-present the HuggingFace Inc. team.

Member:
Suggested change
# Copyright 2023-present the HuggingFace Inc. team.
# Copyright 2024-present the HuggingFace Inc. team.

Same for the other newly created modules.



def parse_positions(positions: str):
# Code borrow from https://github.com/stanfordnlp/pyreft/pyreft/dataset.py

Member:
No shame in saying this is "copied", unless you plan to give the code back ;-)

Also, let's add a perma-link: https://github.com/stanfordnlp/pyreft/blob/432dc64d01ff80002c6e83bfdaeb55710bf6a80a/pyreft/dataset.py#L53
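
For context, a sketch of what such a parser does, assuming pyreft-style position strings like "f5+l5" (first 5 and last 5 prompt tokens); see the perma-link above for the original:

def parse_positions(positions: str) -> tuple[int, int]:
    # Illustration only; the pyreft original is linked above.
    first_n, last_n = 0, 0
    for part in positions.lower().split("+"):
        if part.startswith("f"):
            first_n = int(part[1:])
        elif part.startswith("l"):
            last_n = int(part[1:])
    return first_n, last_n

print(parse_positions("f5+l5"))  # (5, 5)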

dropout: float,
**kwargs,
) -> None:
"""Internal function to create loha adapter

Member:
Suggested change
"""Internal function to create loha adapter
"""Internal function to create loreft adapter

Some arguments are not described below, most notably loc. Could you please update the docstring?

self.reft_A[adapter_name] = torch.nn.Linear(self.out_features, r)

def merge(self, safe_merge: bool = False, adapter_names: Optional[list[str]] = None) -> None:
pass

Member:
So merging is not supported, right? If we just pass when the user calls model.merge(), they might get the impression that it worked. Instead, we should raise a ValueError that merging is not possible with LoReFT. Same for unmerge.
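
A sketch of that suggestion (not the PR's actual code):

from typing import Optional

def merge(self, safe_merge: bool = False, adapter_names: Optional[list[str]] = None) -> None:
    # LoReFT intervenes on hidden states, so there is nothing to fold into the
    # base weights; fail loudly instead of silently doing nothing.
    raise ValueError("Merging is not supported for LoReFT layers.")

def unmerge(self) -> None:
    raise ValueError("Unmerging is not supported for LoReFT layers.")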

self.last_n[adapter_name] = last_n
self.alpha[adapter_name] = alpha
self.scaling[adapter_name] = alpha / r
self.dropout[adapter_name] = torch.nn.Dropout(dropout)

Member:
Let's use nn.Identity if dropout is 0.
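
A sketch of the suggested change (illustrative helper, not the PR's code):

import torch.nn as nn

def build_dropout(p: float) -> nn.Module:
    # Use a no-op Identity when the rate is zero instead of nn.Dropout(0.0).
    return nn.Dropout(p) if p > 0.0 else nn.Identity()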

loc = torch.cat([torch.arange(first_n), torch.arange(result.shape[1]-last_n, result.shape[1])])
selected_results = torch.gather(result, 1, loc)
rotated_base = rotate_layer(selected_results)
offset = torch.matmul((learned_source(selected_results) - rotated_base), rotate_layer.weight)

Member:
Same argument as above regarding forward.

r (`int`): LoReFT rank.
alpha (`int`): LoReFT alpha.
loc (`Optional[str]`):
Token locations are applied LoReFT.

Member:
This needs much better explanation, ideally with a few examples. Also, it should be mentioned that this only really makes sense for language models, right?


def __repr__(self) -> str:
rep = super().__repr__()
return "reft." + rep

Member:
Suggested change
return "reft." + rep
return "loreft." + rep

In case we add more ReFT layers in the future. This is not consistent with the module name, but I think that's not crucial.


class LoReftModel(LycorisTuner):
"""
LoReFT model from a pretrained model.

Member:
Let's add links to the paper and the reference repo.

default=False,
metadata={"help": "Whether to share the ReFT parameters between blocks or not."},
)


Member:
So right now, LoReFT has no init_weights argument, unlike almost all other adapters. I think it would be useful to have this. I think you mentioned somewhere that by default, LoReFT is not initialized to be the identity transform. In that case, we could set the default value to False (which in PEFT means not identity), but it would still be nice for users to have the option to initialize as identity by passing init_weights=True.
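
A sketch of how that option might look in the config (the default and help text here are illustrative, not the PR's code):

from dataclasses import dataclass, field

@dataclass
class LoReftConfigSketch:
    init_weights: bool = field(
        default=False,  # False: keep the paper's default (non-identity) initialization
        metadata={"help": "Whether to initialize the LoReFT intervention as the identity transform."},
    )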

@frankaging commented May 30, 2024

Hey! Thanks again for the PR (I am one of the authors of the ReFT paper). I skimmed through the code changes, and they look very promising.

Two quick questions:

  1. With the current PEFT config, could we target any residual stream with LoReFT? If so, what would the config look like?
  2. If I add a LoReFT adapter and we only intervene on the prompt tokens, the adapter will not have an effect during the decoding process, right?

Thanks!

@BenjaminBossan (Member):

@frankaging Thanks a lot for taking a look. Regarding your questions, I'll let @raven38 provide the final answer. Personally, I think 2 is right and I don't quite understand 1, maybe you can give an example?

@frankaging:

> @frankaging Thanks a lot for taking a look. Regarding your questions, I'll let @raven38 provide the final answer. Personally, I think 2 is right and I don't quite understand 1, maybe you can give an example?

Thanks for the update! For (1), I was wondering: for example, when targeting the attention output matrix with LoRA on a Llama model, we might have a config like this:

from peft import LoraConfig, get_peft_model

peft_config = LoraConfig(
    r=4, lora_alpha=32, target_modules=["o_proj"], layers_to_transform=[15],
    use_rslora=True, lora_dropout=0.05, bias="none", task_type="CAUSAL_LM"
)
model = get_peft_model(model, peft_config)

where target_modules=["o_proj"]. Since for LoReFT, we want to target the residual stream (transformer block/layer output), what should we put for the target_modules in that case? Thanks!

@BenjaminBossan (Member):

> Since for LoReFT, we want to target the residual stream (transformer block/layer output), what should we put for the target_modules in that case? Thanks!

I see, I think I get what you mean. Good question. I think as is, it would not work. We would have to target the attention module itself (the type depending on what kind of attention is being used), targeting a Linear layer would not work. I think this is what you're getting at, right?

If you have some pointers on how this is implemented in pyreft/pyvene, please share.

@frankaging:

> Since for LoReFT, we want to target the residual stream (transformer block/layer output), what should we put for the target_modules in that case? Thanks!
>
> I see, I think I get what you mean. Good question. I think as is, it would not work. We would have to target the attention module itself (the type depending on what kind of attention is being used), targeting a Linear layer would not work. I think this is what you're getting at, right?
>
> If you have some pointers on how this is implemented in pyreft/pyvene, please share.

Thanks for your reply. Yes, we need to target the whole block module itself (e.g., model.layers[15]). Will this be doable? If not, I think it is still valuable to have the current form where we target submodules (e.g., attention output or MLP output).

With pyreft or pyvene, we use register_module_forward_hook to place a callback to the existing computation graph, instead of overwriting current modules with the additional adapter modules. This is quite different, so it might not be that helpful here.
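
For illustration, a generic sketch of that hook-based pattern (toy module standing in for a decoder block, not pyvene's actual code):

import torch
import torch.nn as nn

block = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)

def intervention_hook(module, inputs, output):
    # A LoReFT-style intervention would edit selected token positions of
    # `output` here; returning a tensor overrides the block's output.
    return output  # placeholder: pass hidden states through unchanged

handle = block.register_forward_hook(intervention_hook)
hidden = block(torch.randn(2, 10, 64))  # the hook fires on this forward pass
handle.remove()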
