
RagDatasetGenerator seems to generate more Q/A pairs than requested #11704

Answered by dosubot bot
vecorro asked this question in Q&A
Hey @vecorro! Nice to see you back around here. Let's take a look at your issue with the RagDatasetGenerator.

The behavior you're seeing comes from how the RagDatasetGenerator generates and processes questions. When you set num_questions_per_chunk to 2, this parameter only influences the question_gen_query, the prompt that instructs the LLM to generate a certain number of questions per chunk of text. The actual number of questions generated can still exceed that value, for several reasons:

  1. Question Generation Process: The LLM generates questions based on the question_gen_query, which might not strictly adhere to the requested number of questions (num_questions_per_chunk). The language mo…
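If you need a hard limit regardless of what the LLM returns, one option is to cap the questions yourself after generation. The sketch below is only an illustration under stated assumptions: `cap_questions` is a hypothetical helper (not part of LlamaIndex), the raw response is a made-up example, and it assumes the LLM returns one question per line, optionally numbered.

```python
import re
from typing import List


def cap_questions(llm_response: str, num_questions_per_chunk: int) -> List[str]:
    """Hypothetical helper: split a raw LLM response into questions
    and keep at most the requested number."""
    lines = [line.strip() for line in llm_response.strip().split("\n")]
    # Strip leading list markers such as "1." or "2)".
    cleaned = [re.sub(r"^\s*\d+[.)]\s*", "", line) for line in lines]
    # Drop empty lines, then truncate to the requested count.
    questions = [q for q in cleaned if q]
    return questions[:num_questions_per_chunk]


# Made-up response in which the LLM returned 3 questions although 2 were requested.
raw = """1. What is a vector index?
2. How are embeddings stored?
3. Why does chunk size matter?"""

capped = cap_questions(raw, 2)
print(capped)
```

Truncating after the fact keeps the first questions the model produced for each chunk; it does not change what the LLM was asked to generate.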

Answer selected by vecorro