Commit

Merge pull request #27 from stanford-oval/refactored-codebase-sync
Refactored codebase sync
Yucheng-Jiang committed Apr 23, 2024
2 parents 72d85c9 + 85b6479 commit 5d21c39
Showing 28 changed files with 3,571 additions and 9,505 deletions.
146 changes: 120 additions & 26 deletions README.md
@@ -1,18 +1,28 @@
# STORM: Synthesis of Topic Outlines through Retrieval and Multi-perspective Question Asking

This repository contains the code for our NAACL 2024 paper [Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models](https://arxiv.org/abs/2402.14207) by [Yijia Shao](https://cs.stanford.edu/~shaoyj), [Yucheng Jiang](https://yucheng-jiang.github.io/), Theodore A. Kanell, Peter Xu, [Omar Khattab](https://omarkhattab.com/), and [Monica S. Lam](https://suif.stanford.edu/~lam/).
<p align="center">
| <a href="http://storm.genie.stanford.edu"><b>Research preview</b></a> | <a href="https://arxiv.org/abs/2402.14207"><b>Paper</b></a> | <b>Documentation (WIP)</b> |
</p>


**Latest News** 🔥

- [2024/04] We release a refactored version of the STORM codebase! We define an [interface](src/interface.py) for the STORM pipeline and reimplement STORM-wiki (check out [`src/storm_wiki`](src/storm_wiki)) to demonstrate how to instantiate the pipeline. We provide APIs to support customization with different language models and retrieval/search integrations.

## Overview [(Try STORM now!)](https://storm.genie.stanford.edu/)

<p align="center">
<img src="assets/overview.png" style="width: 90%; height: auto;">
</p>
STORM is an LLM system that writes Wikipedia-like articles from scratch based on Internet search.

While the system cannot produce publication-ready articles, which often require a significant number of edits, experienced Wikipedia editors have found it helpful in their pre-writing stage.

**Try out our [live research preview](https://storm.genie.stanford.edu/) to see how STORM can help your knowledge exploration journey, and please provide feedback to help us improve the system 🙏!**



## How STORM works

STORM breaks down generating long articles with citations into two steps:
1. **Pre-writing stage**: The system conducts Internet-based research to collect references and generates an outline.
2. **Writing stage**: The system uses the outline and references to generate the full-length article with citations.
@@ -24,41 +34,128 @@ STORM identifies the core of automating the research process as automatically co
1. **Perspective-Guided Question Asking**: Given the input topic, STORM discovers different perspectives by surveying existing articles from similar topics and uses them to control the question-asking process.
2. **Simulated Conversation**: STORM simulates a conversation between a Wikipedia writer and a topic expert grounded in Internet sources to enable the language model to update its understanding of the topic and ask follow-up questions.

Based on the separation of the two stages, STORM is implemented in a highly modular way using [dspy](https://github.com/stanfordnlp/dspy).
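
As an illustration of the dspy style (this is a toy sketch, not STORM's actual signatures, which live under [`src/storm_wiki/modules`](src/storm_wiki)), a pipeline step can be declared as a typed signature and wrapped in a predictor:

```python
import dspy

# Illustrative only: a dspy-style building block. STORM's real signatures
# and modules differ from this toy example.
class WriteOutline(dspy.Signature):
    """Write a hierarchical outline for a Wikipedia-like article on the topic."""
    topic = dspy.InputField(desc="the topic to research")
    draft_outline = dspy.OutputField(desc="outline with '#'-prefixed section titles")

write_outline = dspy.Predict(WriteOutline)
# After dspy.settings.configure(lm=...), call: write_outline(topic="...")
```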



## Getting started

### 1. Setup

Below, we provide a quick start guide to run STORM locally.

1. Install the required packages.
```shell
conda create -n storm python=3.11
conda activate storm
pip install -r requirements.txt
```
2. Set up your OpenAI API key (if you want to use OpenAI models to power STORM) and your [You.com search API](https://api.you.com/) key. Create a file `secrets.toml` under the root directory and add the following content:
```shell
# Set up OpenAI API key.
OPENAI_API_KEY="your_openai_api_key"
# If you are using the API service provided by OpenAI, include the following line:
OPENAI_API_TYPE="openai"
# If you are using the API service provided by Microsoft Azure, include the following lines:
OPENAI_API_TYPE="azure"
AZURE_API_BASE="your_azure_api_base_url"
AZURE_API_VERSION="your_azure_api_version"
# Set up You.com search API key.
YDC_API_KEY="your_youcom_api_key"
```
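
The example scripts load this file through a `load_api_key` helper (imported from `utils` under `./src`). Its exact implementation lives in the repo; a minimal sketch, assuming the helper simply exports every key as an environment variable:

```python
import os

import toml

def load_api_key(toml_file_path: str = 'secrets.toml') -> None:
    # Assumed behavior: export each key in secrets.toml as an environment
    # variable so the pipeline can read it via os.getenv(...).
    for key, value in toml.load(toml_file_path).items():
        os.environ[key] = str(value)
```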


### 2. Running STORM-wiki locally

Currently, we provide example scripts under [`examples`](examples) to demonstrate how you can run STORM using different models.

**To run STORM with `gpt` family models**: Make sure you have set up the OpenAI API key and run the following command.

```shell
python examples/run_storm_wiki_gpt.py \
--output-dir $OUTPUT_DIR \
--do-research \
--do-generate-outline \
--do-generate-article \
--do-polish-article
```
- `--do-research`: If True, simulate conversation to research the topic; otherwise, load the results.
- `--do-generate-outline`: If True, generate an outline for the topic; otherwise, load the results.
- `--do-generate-article`: If True, generate an article for the topic; otherwise, load the results.
- `--do-polish-article`: If True, polish the article by adding a summarization section and (optionally) removing duplicate content.
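
These flags correspond one-to-one to the booleans passed to `STORMWikiRunner.run(...)` (see `examples/run_storm_wiki_gpt.py` later in this diff). For example, to reuse saved research and regenerate only the outline and article:

```python
# Assumes `runner` and `topic` are set up as in examples/run_storm_wiki_gpt.py.
runner.run(
    topic=topic,
    do_research=False,         # reuse saved conversation/search results
    do_generate_outline=True,  # regenerate the outline
    do_generate_article=True,  # regenerate the article
    do_polish_article=False,   # skip the polishing pass
)
```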

**To run STORM with `mistral` family models on a local vLLM server**: Make sure you have a vLLM server running with the `Mistral-7B-Instruct-v0.2` model, then run the following command.

```shell
python examples/run_storm_wiki_mistral.py \
--url $URL \
--port $PORT \
--output-dir $OUTPUT_DIR \
--do-research \
--do-generate-outline \
--do-generate-article \
--do-polish-article
```
- `--url`: URL of the vLLM server.
- `--port`: Port of the vLLM server.
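
The `VLLMClient` wrapper is defined in [`src/lm.py`](src/lm.py); its exact constructor lives there, so the instantiation below is only a sketch and every keyword shown is an assumption:

```python
import sys
sys.path.append('./src')
from lm import VLLMClient

# Hypothetical call shape; consult src/lm.py for the real constructor.
mistral_lm = VLLMClient(model='mistralai/Mistral-7B-Instruct-v0.2',
                        url='http://localhost', port=8000, max_tokens=500)
```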



## Customize STORM

### Customization of the Pipeline

STORM is a knowledge curation engine consisting of four modules:

1. Knowledge Curation Module: Collects a broad range of information about the given topic.
2. Outline Generation Module: Organizes the collected information by generating a hierarchical outline for the curated knowledge.
3. Article Generation Module: Populates the generated outline with the collected information.
4. Article Polishing Module: Refines and enhances the written article for better presentation.

The interface for each module is defined in `src/interface.py`, while their implementations are instantiated in `src/storm_wiki/modules/*`. These modules can be customized according to your specific requirements (e.g., generating sections in bullet point format instead of full paragraphs).

:star2: **You can share your customization of `Engine` by making PRs to this repo!**

### Customization of Retriever Module

As a knowledge curation engine, STORM acquires information through the Retriever module. The interface for the Retriever module is defined in [`src/interface.py`](src/interface.py). Please consult the interface documentation if you plan to create a new instance or replace the default search engine API. By default, STORM uses the You.com search engine API (see `YouRM` in [`src/rm.py`](src/rm.py)).
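
Since `YouRM` follows the dspy retrieval-module convention, a custom retriever can be sketched roughly as below; the method signature and the return fields here are assumptions, so check them against [`src/interface.py`](src/interface.py) and `YouRM` before use:

```python
import dspy

class MySearchRM(dspy.Retrieve):
    """Hypothetical retriever backed by your own search API."""

    def __init__(self, api_key: str, k: int = 3):
        super().__init__(k=k)
        self.api_key = api_key

    def forward(self, query_or_queries, exclude_urls=None):
        queries = [query_or_queries] if isinstance(query_or_queries, str) else query_or_queries
        results = []
        for query in queries:
            # Call your search backend here; the exact fields downstream
            # modules expect (url/title/snippets) are an assumption.
            results.append({'url': f'https://example.com/{query}',
                            'title': query,
                            'snippets': ['...']})
        return results
```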

:star2: **PRs for integrating more search engines/retrievers are highly appreciated!**

### Customization of Language Models

STORM provides the following language model implementations in [`src/lm.py`](src/lm.py):

- `OpenAIModel`
- `ClaudeModel`
- `VLLMClient`
- `TGIClient`
- `TogetherClient`

:star2: **PRs for integrating more language model clients are highly appreciated!**

:bulb: **As good practice,**

- choose a cheaper/faster model for `conv_simulator_lm`, which is used to split queries and synthesize answers in the conversation.
- if you need to conduct the actual writing step, choose a more powerful model for `article_gen_lm`; based on our experiments, weak models are bad at generating text with citations.
- for open models, adding a one-shot example can help them better follow instructions.

Please refer to the scripts in the [`examples`](examples) directory for concrete guidance on customizing the language model used in the pipeline.

## Replicate NAACL 2024 results

Please switch to the branch `NAACL-2024-code-backup`.

<details>
<summary>Show me instructions</summary>

### Paper Experiments

The FreshWiki dataset used in our experiments can be found in [./FreshWiki](FreshWiki).

Run the following commands under [./src](src).

#### Pre-writing Stage
For batch experiment on FreshWiki dataset:
```shell
python -m scripts.run_prewriting --input-source file --input-path ../FreshWiki/topic_list.csv --engine gpt-4 --do-research --max-conv-turn 5 --max-perspective 5
@@ -79,7 +176,7 @@ python -m scripts.run_prewriting --input-source console --engine gpt-4 --max-con
The generated outline will be saved in `{output_dir}/{topic}/storm_gen_outline.txt` and the collected references will be saved in `{output_dir}/{topic}/raw_search_results.json`.


#### Writing Stage
For batch experiment on FreshWiki dataset:
```shell
python -m scripts.run_writing --input-source file --input-path ../FreshWiki/topic_list.csv --engine gpt-4 --do-polish-article --remove-duplicate
@@ -94,37 +191,35 @@ python -m scripts.run_writing --input-source console --engine gpt-4 --do-polish-

The generated article will be saved in `{output_dir}/{topic}/storm_gen_article.txt` and the references corresponding to citation index will be saved in `{output_dir}/{topic}/url_to_info.json`. If `--do-polish-article` is set, the polished article will be saved in `{output_dir}/{topic}/storm_gen_article_polished.txt`.

### Customize the STORM Configurations
We set up the default LLM configuration in `LLMConfigs` in [src/modules/utils.py](src/modules/utils.py). You can use `set_conv_simulator_lm()`, `set_question_asker_lm()`, `set_outline_gen_lm()`, `set_article_gen_lm()`, `set_article_polish_lm()` to override the default configuration. These functions take an instance of `dspy.dsp.LM` or `dspy.dsp.HFModel`.
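
For example, to swap in a different article generator (a sketch against the legacy `NAACL-2024-code-backup` code; `dspy.OpenAI` stands in for any `dspy.dsp.LM` instance):

```python
import dspy
from modules.utils import LLMConfigs  # resolvable when running under ./src

llm_configs = LLMConfigs()
# Any dspy.dsp.LM or dspy.dsp.HFModel instance works here.
gpt4 = dspy.OpenAI(model='gpt-4', max_tokens=700)
llm_configs.set_article_gen_lm(gpt4)
```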



### Automatic Evaluation

In our paper, we break down the evaluation into two parts: outline quality and full-length article quality.

#### Outline Quality
We introduce *heading soft recall* and *heading entity recall* to evaluate the outline quality. This makes it easier to prototype methods for pre-writing.
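
As a toy illustration of heading entity recall (the official implementations, including soft recall, are in [eval/metrics.py](eval/metrics.py) and extract entities with a proper NER model rather than the stand-in below):

```python
def heading_entity_recall(pred_headings, gt_headings, extract_entities):
    """Toy metric: fraction of ground-truth heading entities covered by predictions."""
    pred_entities = set()
    for heading in pred_headings:
        pred_entities |= set(extract_entities(heading))
    gt_entities = set()
    for heading in gt_headings:
        gt_entities |= set(extract_entities(heading))
    if not gt_entities:
        return 1.0
    return len(pred_entities & gt_entities) / len(gt_entities)

# Example with a trivial "extractor" that treats capitalized words as entities:
recall = heading_entity_recall(
    pred_headings=['Background', 'Economic Impact'],
    gt_headings=['Background', 'Impact on the Economy'],
    extract_entities=lambda h: [w for w in h.split() if w[0].isupper()],
)
```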

Run the following command under [./eval](eval) to compute the metrics on FreshWiki dataset:
```shell
python eval_outline_quality.py --input-path ../FreshWiki/topic_list.csv --gt-dir ../FreshWiki --pred-dir ../results --pred-file-name storm_gen_outline.txt --result-output-path ../results/storm_outline_quality.csv
```

#### Full-length Article Quality
[eval/eval_article_quality.py](eval/eval_article_quality.py) provides the entry point for evaluating full-length article quality using ROUGE, entity recall, and rubric grading. Run the following command under `eval` to compute the metrics:
```shell
python eval_article_quality.py --input-path ../FreshWiki/topic_list.csv --gt-dir ../FreshWiki --pred-dir ../results --output-dir ../results/storm_article_eval_results --pred-file-name storm_gen_article_polished.txt
```

#### Use the Metric Yourself
The similarity-based metrics (i.e., ROUGE, entity recall, and heading entity recall) are implemented in [eval/metrics.py](eval/metrics.py).
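
For instance, ROUGE can be computed with the `rouge-score` package (illustrative only; the repository's own implementation in [eval/metrics.py](eval/metrics.py) may differ in detail, and the file paths below are hypothetical):

```python
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(['rouge1', 'rougeL'], use_stemmer=True)
gt_article = open('groundtruth.txt').read()            # hypothetical path
pred_article = open('storm_gen_article.txt').read()    # hypothetical path
scores = scorer.score(target=gt_article, prediction=pred_article)
print(scores['rouge1'].fmeasure, scores['rougeL'].fmeasure)
```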

For rubric grading, we use the [prometheus-13b-v1.0](https://huggingface.co/kaist-ai/prometheus-13b-v1.0) model introduced in [this paper](https://arxiv.org/abs/2310.08491). [eval/evaluation_prometheus.py](eval/evaluation_prometheus.py) provides the entry point for using this metric.

</details>

## Contributions
If you have any questions or suggestions, please feel free to open an issue or pull request. We welcome contributions to improve the system and the codebase!

@@ -140,4 +235,3 @@ Please cite our paper if you use this code or part of it in your work:
booktitle={Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)}
}
```

113 changes: 113 additions & 0 deletions examples/run_storm_wiki_gpt.py
@@ -0,0 +1,113 @@
"""
STORM Wiki pipeline powered by GPT-3.5/4 and You.com search engine.
You need to set up the following environment variables to run this script:
- OPENAI_API_KEY: OpenAI API key
- OPENAI_API_TYPE: OpenAI API type (e.g., 'openai' or 'azure')
- AZURE_API_BASE: Azure API base URL if using Azure API
- AZURE_API_VERSION: Azure API version if using Azure API
- YDC_API_KEY: You.com API key
Output will be structured as below:
args.output_dir/
    topic_name/  # underscore-connected topic name, without spaces or slashes
conversation_log.json # Log of information-seeking conversation
raw_search_results.json # Raw search results from search engine
direct_gen_outline.txt # Outline directly generated with LLM's parametric knowledge
storm_gen_outline.txt # Outline refined with collected information
url_to_info.json # Sources that are used in the final article
storm_gen_article.txt # Final article generated
storm_gen_article_polished.txt # Polished final article (if args.do_polish_article is True)
"""

import os
import sys
from argparse import ArgumentParser

sys.path.append('./src')
from lm import OpenAIModel
from storm_wiki.engine import STORMWikiRunnerArguments, STORMWikiRunner, STORMWikiLMConfigs
from utils import load_api_key


def main(args):
load_api_key(toml_file_path='secrets.toml')
lm_configs = STORMWikiLMConfigs()
openai_kwargs = {
'api_key': os.getenv("OPENAI_API_KEY"),
'api_provider': os.getenv('OPENAI_API_TYPE'),
'temperature': 1.0,
'top_p': 0.9,
'api_base': os.getenv('AZURE_API_BASE'),
'api_version': os.getenv('AZURE_API_VERSION'),
}

    # STORM is an LM system, so different components can be powered by different models.
# For a good balance between cost and quality, you can choose a cheaper/faster model for conv_simulator_lm
# which is used to split queries, synthesize answers in the conversation. We recommend using stronger models
# for outline_gen_lm which is responsible for organizing the collected information, and article_gen_lm
# which is responsible for generating sections with citations.
conv_simulator_lm = OpenAIModel(model='gpt-3.5-turbo', max_tokens=500, **openai_kwargs)
question_asker_lm = OpenAIModel(model='gpt-3.5-turbo', max_tokens=500, **openai_kwargs)
outline_gen_lm = OpenAIModel(model='gpt-4-0125-preview', max_tokens=400, **openai_kwargs)
article_gen_lm = OpenAIModel(model='gpt-4-0125-preview', max_tokens=700, **openai_kwargs)
article_polish_lm = OpenAIModel(model='gpt-4-0125-preview', max_tokens=4000, **openai_kwargs)

lm_configs.set_conv_simulator_lm(conv_simulator_lm)
lm_configs.set_question_asker_lm(question_asker_lm)
lm_configs.set_outline_gen_lm(outline_gen_lm)
lm_configs.set_article_gen_lm(article_gen_lm)
lm_configs.set_article_polish_lm(article_polish_lm)

engine_args = STORMWikiRunnerArguments(
output_dir=args.output_dir,
max_conv_turn=args.max_conv_turn,
max_perspective=args.max_perspective,
search_top_k=args.search_top_k,
)
runner = STORMWikiRunner(engine_args, lm_configs)

topic = input('Topic: ')
runner.run(
topic=topic,
do_research=args.do_research,
do_generate_outline=args.do_generate_outline,
do_generate_article=args.do_generate_article,
do_polish_article=args.do_polish_article,
)
runner.post_run()
runner.summary()


if __name__ == '__main__':
parser = ArgumentParser()
# global arguments
parser.add_argument('--output-dir', type=str, default='./results/gpt',
help='Directory to store the outputs.')
    parser.add_argument('--max_thread_num', type=int, default=3,
                        help='Maximum number of threads to use. The information seeking part and the article generation '
                             'part can be sped up by using multiple threads. Consider reducing it if you keep getting '
                             '"Exceed rate limit" errors when calling the LM API.')
# stage of the pipeline
parser.add_argument('--do-research', action='store_true',
help='If True, simulate conversation to research the topic; otherwise, load the results.')
parser.add_argument('--do-generate-outline', action='store_true',
help='If True, generate an outline for the topic; otherwise, load the results.')
parser.add_argument('--do-generate-article', action='store_true',
help='If True, generate an article for the topic; otherwise, load the results.')
parser.add_argument('--do-polish-article', action='store_true',
help='If True, polish the article by adding a summarization section and (optionally) removing '
'duplicate content.')
# hyperparameters for the pre-writing stage
parser.add_argument('--max-conv-turn', type=int, default=3,
help='Maximum number of questions in conversational question asking.')
parser.add_argument('--max-perspective', type=int, default=3,
help='Maximum number of perspectives to consider in perspective-guided question asking.')
parser.add_argument('--search-top-k', type=int, default=3,
help='Top k search results to consider for each search query.')
# hyperparameters for the writing stage
parser.add_argument('--retrieve-top-k', type=int, default=3,
help='Top k collected references for each section title.')
parser.add_argument('--remove-duplicate', action='store_true',
help='If True, remove duplicate content from the article.')

main(parser.parse_args())
