
Model wrapper with local fine-tuning #169

Closed · wants to merge 23 commits

Conversation

zyzhang1130
Contributor

@zyzhang1130 zyzhang1130 commented Apr 19, 2024



Description

Added features to download models from the Hugging Face model hub, load local Hugging Face models, and fine-tune loaded models with Hugging Face datasets. Model loading and fine-tuning can happen both at the initialization stage and after the agent has been initialized (see the README in `agentscope/examples/conversation_with_agent_with_finetuned_model` for details). Major changes to the repo include the example script `conversation_with_agent_with_finetuned_model.py`, a new model wrapper `HuggingFaceWrapper` in `agentscope/examples/conversation_with_agent_with_finetuned_model/huggingface_model.py`, and a new agent type `Finetune_DialogAgent` in `agentscope/examples/conversation_with_agent_with_finetuned_model/finetune_dialogagent.py`. All changes are contained in the new example directory `agentscope/examples/conversation_with_agent_with_finetuned_model`.

To test, run `agentscope/examples/conversation_with_agent_with_finetuned_model/conversation_with_agent_with_finetuned_model.py`, following the instructions in the README in the same directory.
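The flow described above can be sketched with stand-ins for the real classes. The names `HuggingFaceWrapper`, `Finetune_DialogAgent`, and `fine_tune` follow the PR description; the bodies below are placeholders, not the PR's implementation:

```python
# Stand-ins for the PR's classes; bodies are illustrative placeholders.

class HuggingFaceWrapper:
    """Loads a model and can fine-tune it, at init time or later."""

    def __init__(self, model_id, fine_tune_config=None):
        self.model_id = model_id
        self.fine_tuned = False
        if fine_tune_config is not None:
            # fine-tune at the initialization stage
            self.fine_tune(fine_tune_config)

    def fine_tune(self, fine_tune_config):
        # A real implementation would run a Hugging Face Trainer here.
        self.fine_tuned = True


class Finetune_DialogAgent:
    """Agent that also exposes fine-tuning after it has been initialized."""

    def __init__(self, model):
        self.model = model

    def fine_tune(self, fine_tune_config):
        self.model.fine_tune(fine_tune_config)


agent = Finetune_DialogAgent(
    HuggingFaceWrapper("openlm-research/open_llama_3b_v2"),
)
agent.fine_tune({"num_train_epochs": 1})  # fine-tune after initialization
```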

Checklist

Please check the following items before the code is ready for review.

  • Code has passed all tests
  • Docstrings have been added/updated in Google Style
  • Documentation has been updated
  • Code is ready for review

…d local hugging face model and finetune loaded model with hugging face dataset

Added features to download models from the Hugging Face model hub, load local Hugging Face models, and fine-tune loaded models with Hugging Face datasets. Model loading and fine-tuning can happen both at the initialization stage and after the agent has been initialized (see the README in `agentscope/examples/load_finetune_huggingface_model` for details). Major changes to the repo include the example script `load_finetune_huggingface_model`, a new model wrapper `HuggingFaceWrapper`, and a new agent type `Finetune_DialogAgent`. All changes are contained in the new example directory `agentscope/examples/load_finetune_huggingface_model`.
Made customized hyperparameter specification available from `model_configs` for fine-tuning at initialization, or through `fine_tune_config` in `Finetune_DialogAgent`'s `fine_tune` method after initialization.
Collaborator

@ZiTao-Li ZiTao-Li left a comment


See the inline comments. Also, you can run `pre-commit run --all-files` to check whether your coding style satisfies our repo's requirements.

# Decode the generated tokens to a string
generated_text = self.tokenizer.decode(
    outputs[0][input_ids.shape[1]:],
    skip_special_tokens=True,
)

return ModelResponse(text=generated_text, raw={"model_id": self.model_id})
Collaborator


To be consistent with the other model wrappers, the 'raw' should contain the generated_text as well. Also, the
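A minimal sketch of the suggested change (the `ModelResponse` here is a stand-in dataclass, not the real agentscope class, and `build_response` is a hypothetical helper):

```python
from dataclasses import dataclass, field


@dataclass
class ModelResponse:  # stand-in for agentscope's ModelResponse
    text: str
    raw: dict = field(default_factory=dict)


def build_response(generated_text, model_id):
    # Put generated_text into `raw` too, matching the other wrappers.
    return ModelResponse(
        text=generated_text,
        raw={"model_id": model_id, "generated_text": generated_text},
    )
```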

Contributor Author


Updated. What was the second point?

@zyzhang1130
Contributor Author

Need to disable E203 for the pre-commit run. It automatically reformats `examples/load_finetune_huggingface_model/huggingface_model.py` line 147 by adding whitespace before `:` and then flags it as an error.
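E203 (whitespace before `:`) is a known conflict between auto-formatters' slice handling and pycodestyle. Assuming the lint hook is flake8, one way to disable it is via the flake8 config (the exact file, `.flake8` or a `[flake8]` section in `setup.cfg`, depends on the repo's setup):

```ini
[flake8]
extend-ignore = E203
```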

Collaborator

@ZiTao-Li ZiTao-Li left a comment


examples/load_finetune_huggingface_model/README.md: 3 inline comments (outdated, resolved)
Collaborator

@ZiTao-Li ZiTao-Li left a comment


As a better practice, please create new branches (in your forked repo) for the updates on reformatting example tasks (one branch per example). Please do not let the changes affect each other.

@ZiTao-Li ZiTao-Li changed the title Updated pull request (all changes made are now in agentscope/examples/load_finetune_huggingface_model) Model wrapper with local fine-tuning May 6, 2024
Collaborator

@ZiTao-Li ZiTao-Li left a comment


The description of this PR seems outdated; please update it accordingly.

@zyzhang1130
Contributor Author

The Description of this PR seems outdated, please update it accordingly.

Done.

Collaborator

@DavdGao DavdGao left a comment


Please see inline comments, and:

  1. It's not clear who is responsible for fine-tuning the model: the agent or the model wrapper? Since the new model configuration has all the required arguments for fine-tuning, why don't we implement the fine-tuning functionality entirely in the new model wrapper class (e.g. in the constructor)? That way, all other agents in AgentScope can reuse the model wrapper and fine-tune their local models without modification.

self.model = AutoModelForCausalLM.from_pretrained(
    model_id,
    token=self.huggingface_token,
    device_map="auto",
)
Collaborator


Since the model is loaded with device_map set to "auto", will there be any conflict with the attribute self.device (and the device argument in the constructor)?

Contributor Author


I presumed fine-tuning of LLMs happens on GPUs, since it is (as far as I'm aware) infeasible on CPU. self.device is meant for inference only.

Contributor Author


This part is updated: if the user didn't specify device in model_configs, device_map="auto" by default; otherwise device_map is set to the user-specified device (see 8685213).
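The described fallback is simple enough to sketch (`resolve_device_map` is a hypothetical helper; the real change is in commit 8685213):

```python
def resolve_device_map(model_config):
    """Return the user-specified device, or "auto" when none is given."""
    device = model_config.get("device")
    return "auto" if device is None else device

# The result would then be passed on, e.g.:
# AutoModelForCausalLM.from_pretrained(model_id, device_map=resolve_device_map(cfg))
```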

"config_name": "my_custom_model",
# Or another generative model of your choice.
# Needed for loading from Hugging Face.
"model_id": "openlm-research/open_llama_3b_v2",
Collaborator


How about using a single argument model_name_or_path, as the transformers library does in from_pretrained, rather than providing two arguments?
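A sketch of the suggested config shape (the key name is assumed to mirror `from_pretrained`):

```python
model_config = {
    "config_name": "my_custom_model",
    # A single key that accepts either a Hub ID or a local path,
    # as transformers' from_pretrained does.
    "model_name_or_path": "openlm-research/open_llama_3b_v2",
}
```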

Contributor Author


Changed according to suggestion.

],
)

# # alternatively can load `model_configs` from json file
Collaborator


Suggested removing lines 56-59.

Contributor Author


I wanted to give the user the flexibility to specify the needed arguments in a JSON file, but also to provide some explanation, which cannot be done in a JSON file. So there are two identical model_configs: one in `agentscope/examples/conversation_with_agent_with_finetuned_model/conversation_with_agent_with_finetuned_model.py` and the other in `agentscope/examples/conversation_with_agent_with_finetuned_model/configs/model_configs.json`.
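Loading the JSON variant might look like this (a sketch; `load_model_configs` is a hypothetical helper, and `agentscope.init` is the entry point used in the example script):

```python
import json


def load_model_configs(path):
    """Read model_configs from a JSON file instead of the inline dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

# model_configs = load_model_configs("configs/model_configs.json")
# agentscope.init(model_configs=model_configs)
```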

)

dialog_agent.load_model(
model_id="openlm-research/open_llama_3b_v2",
Collaborator


The design is a little strange, and the interface should be more concise.

  1. We have already specified the model configuration in agentscope.init and provided model_config_name in line 66, so why do we have to set model_id again in lines 70 and 74?
  2. Similar issue with the dataset name in lines 40 and 86.

Contributor Author


This is an optional step: the user can choose to load another model after the agent has been instantiated, depending on their use case. Added a comment to clarify its purpose.

time_string = now.strftime("%Y-%m-%d_%H-%M-%S")

# Specify the filename
log_name_temp = model.config.name_or_path.split("/")[-1]
Collaborator

@DavdGao DavdGao May 6, 2024


Maybe we should support a user-customized saving directory.

Contributor Author


Added a new argument `output_dir` for users to customize the saving directory. By default it will save to the same example directory if left unspecified.
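Combining the timestamped filename from the snippet above with an `output_dir` argument might look like this (a sketch; `build_save_path` and its default-directory logic are assumptions, not the PR's code):

```python
import os
from datetime import datetime


def build_save_path(name_or_path, output_dir=None):
    """Build a timestamped save path; default to the current directory."""
    time_string = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    model_name = name_or_path.split("/")[-1]
    base = output_dir if output_dir is not None else "."
    return os.path.join(base, f"{model_name}_{time_string}")
```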

@zyzhang1130
Contributor Author

Please see inline comments, and:

1. It's not clear who is responsible to fine-tune the model, the agent or model wrapper? Since the new model configuration has all required arguments for fine-tuning, why don't we just implement the fine-tuning functionality all in the new model wrapper class (e.g. the constructor function)? So that all other agent in AgentScope can re-use the model wrapper and fine-tune their local models without modifications.

Reloading and fine-tuning models after an agent has been instantiated requires introducing a new method to the agent class. If this is not required, this model wrapper in principle supports loading and fine-tuning for other types of agents.

@DavdGao
Collaborator

DavdGao commented May 6, 2024

Please see inline comments, and:

1. It's not clear who is responsible to fine-tune the model, the agent or model wrapper? Since the new model configuration has all required arguments for fine-tuning, why don't we just implement the fine-tuning functionality all in the new model wrapper class (e.g. the constructor function)? So that all other agent in AgentScope can re-use the model wrapper and fine-tune their local models without modifications.

Reloading and fine-tuning models after an agent has been instantiated need to introduce a new method to the agent class. If this is not required, this model wrapper in principle supports loading and fine-tuning for other types of agents.

Yes, and my question is why we need to reload and fine-tune models within the agent. In my view, once we create a Hugging Face model wrapper, we have already decided to fine-tune the model, so the training can be finished automatically within the constructor of the model wrapper, as follows:

class HuggingFaceWrapper(ModelWrapperBase):
    def __init__(self, *args, **kwargs):
        # load model
        self.model = self.load_model(xxx)

        # fine tuning
        self.fine_tune_model()

    # ...

In this way, the agent doesn't need to do anything, and developers only need to set their model configuration.
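A runnable stand-in for this constructor-driven design (the base class and method bodies are placeholders, not agentscope code):

```python
class ModelWrapperBase:  # stand-in for agentscope's base class
    pass


class HuggingFaceWrapper(ModelWrapperBase):
    def __init__(self, model_name_or_path, fine_tune_config=None):
        # load model
        self.model = self.load_model(model_name_or_path)
        # fine-tune inside the constructor, driven purely by the config
        if fine_tune_config is not None:
            self.fine_tune_model(fine_tune_config)

    def load_model(self, model_name_or_path):
        return {"name_or_path": model_name_or_path}  # placeholder for from_pretrained

    def fine_tune_model(self, fine_tune_config):
        self.model["fine_tuned"] = True  # placeholder for a Trainer run
```

With this shape, agents never call fine-tuning code themselves; everything is decided by the model configuration passed to the wrapper.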

@zyzhang1130
Contributor Author

zyzhang1130 commented May 6, 2024

Please see inline comments, and:

1. It's not clear who is responsible to fine-tune the model, the agent or model wrapper? Since the new model configuration has all required arguments for fine-tuning, why don't we just implement the fine-tuning functionality all in the new model wrapper class (e.g. the constructor function)? So that all other agent in AgentScope can re-use the model wrapper and fine-tune their local models without modifications.

Reloading and fine-tuning models after an agent has been instantiated need to introduce a new method to the agent class. If this is not required, this model wrapper in principle supports loading and fine-tuning for other types of agents.

Yes, and my question is that why we need to reloading and fine-tuning models within the agent? In my view, once we create a huggingface model wrapper, we have already decided to fine-tune the model. So that the training can be finished automatically within the constructor function of the model wrapper as follows?

class HuggingFaceWrapper(ModelWrapperBase):
     def __init__(self, *args, **kwargs): 
         # load model
         self.model = self.load_model(xxx)

         # fine tuning
         self.fine_tune_model()

    # ...

In this way, the agent doesn't need to do anything, and developers only need to set their model configuration.

I have a long-term use case in mind, where new data might become available after the agent has been deployed (i.e., a continual learning setting). In this case, the agent might need to be fine-tuned (potentially multiple times) after deployment. One such example we touched on during the discussion is fine-tuning agents on their interaction traces in a multi-agent setup. Perhaps I can rename the agent class to better reflect this use case?

@DavdGao
Collaborator

DavdGao commented May 6, 2024

Please see inline comments, and:

1. It's not clear who is responsible to fine-tune the model, the agent or model wrapper? Since the new model configuration has all required arguments for fine-tuning, why don't we just implement the fine-tuning functionality all in the new model wrapper class (e.g. the constructor function)? So that all other agent in AgentScope can re-use the model wrapper and fine-tune their local models without modifications.

Reloading and fine-tuning models after an agent has been instantiated need to introduce a new method to the agent class. If this is not required, this model wrapper in principle supports loading and fine-tuning for other types of agents.

Yes, and my question is that why we need to reloading and fine-tuning models within the agent? In my view, once we create a huggingface model wrapper, we have already decided to fine-tune the model. So that the training can be finished automatically within the constructor function of the model wrapper as follows?

class HuggingFaceWrapper(ModelWrapperBase):
     def __init__(self, *args, **kwargs): 
         # load model
         self.model = self.load_model(xxx)

         # fine tuning
         self.fine_tune_model()

    # ...

In this way, the agent doesn't need to do anything, and developers only need to set their model configuration.

I have a long-term use case scenario in mind, where there might be new data becoming available after the agent has been deployed (e.g., continual learning setting). In this case, the agent might need to be fine-tuned (potentially multiple times) after deployment. Perhaps I can rename the agent class to better reflect this use case?

Okay, I understand your consideration. For me, agent.model.fine_tune() in a continual learning setting is also acceptable. In any case, please ensure the other agents can directly use this Hugging Face model configuration without modifying their code. Others, please see the inline comments.

@zyzhang1130 zyzhang1130 deleted the branch modelscope:main May 21, 2024 09:33
@zyzhang1130 zyzhang1130 deleted the main branch May 21, 2024 09:33
@zyzhang1130 zyzhang1130 restored the main branch May 21, 2024 09:34
@zyzhang1130
Contributor Author

Further updates to this pull request can be found in #240, as it was moved to a new branch.
