The objective of this competition is to build tools to assist Kaggle developers.
In this competition, we are asked to create notebooks that demonstrate how to use the Gemma LLM to accomplish one or more of the following developer-oriented tasks:
- Answer common questions about the Kaggle platform.
- Explain or teach basic data science concepts.
- Summarize Kaggle Solution write-ups.
- Explain or teach concepts from Kaggle Solution write-ups.
- Answer common questions about the Python programming language.
This notebook guides you through the first task, "Answer common questions about the Kaggle platform". Because this task requires specific, accurate knowledge of Kaggle, I have created a dataset, "Kaggle Docs", by collecting data from kaggle.com/docs. To make things easier for the model, the data is curated into question-answer pairs; if you are interested, the raw data is also available. We will use this dataset to fine-tune the Gemma LLM to answer questions about the Kaggle platform.
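Below is a minimal sketch of how question-answer pairs could be turned into training text for a causal language model such as Gemma. The `Question`/`Answer` field names, the prompt template, and the helpers `format_qa_pairs` and `build_prompt` are hypothetical and for illustration only. They are not the actual schema of the Kaggle Docs dataset or this notebook's exact code.

```python
from typing import Dict, List

# Hypothetical prompt template; the notebook's real layout may differ.
TEMPLATE = "Question:\n{question}\n\nAnswer:\n{answer}"


def format_qa_pairs(rows: List[Dict[str, str]]) -> List[str]:
    """Convert question-answer records into single training strings."""
    examples = []
    for row in rows:
        # Assumed field names; adjust to the dataset's actual columns.
        question = row.get("Question", "").strip()
        answer = row.get("Answer", "").strip()
        if not question or not answer:
            continue  # skip incomplete pairs rather than train on them
        examples.append(TEMPLATE.format(question=question, answer=answer))
    return examples


def build_prompt(question: str) -> str:
    """Format a user question for inference, leaving the answer for the model."""
    return TEMPLATE.format(question=question.strip(), answer="")


rows = [
    {"Question": "What are Kaggle Notebooks?", "Answer": "<answer text from kaggle.com/docs>"},
    {"Question": "An unanswered question", "Answer": ""},
]
train_data = format_qa_pairs(rows)
print(len(train_data))  # 1: the pair with an empty answer is skipped
print(build_prompt("How do I join a competition?"))
```

In KerasNLP's Gemma fine-tuning examples, a list of plain strings like `train_data` is passed to `GemmaCausalLM.fit`. Reusing the same template at inference time, as `build_prompt` does, helps the model recognize where its answer should begin.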
Fun fact: this notebook is backend-agnostic, supporting TensorFlow, PyTorch, and JAX, although the best performance is achieved with JAX. Using KerasNLP and Keras lets us choose our preferred backend. See the Keras documentation for more details.
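A minimal sketch of backend selection, assuming Keras 3: Keras reads the `KERAS_BACKEND` environment variable once, when it is first imported, so the variable must be set before importing Keras or KerasNLP. The helper `select_keras_backend` is a hypothetical convenience wrapper, not part of the Keras API.

```python
import os

# The three backends this notebook is described as supporting.
SUPPORTED_BACKENDS = ("jax", "tensorflow", "torch")


def select_keras_backend(name: str = "jax") -> str:
    """Record the Keras backend choice; call this before importing Keras."""
    if name not in SUPPORTED_BACKENDS:
        raise ValueError(f"Unsupported backend {name!r}; choose from {SUPPORTED_BACKENDS}")
    # Keras reads this variable at import time, so later changes have no effect.
    os.environ["KERAS_BACKEND"] = name
    return name


select_keras_backend("jax")
# Import Keras/KerasNLP only after the variable is set, e.g.:
# import keras_nlp
```

Switching to PyTorch or TensorFlow only requires changing the argument (or restarting the kernel if Keras was already imported).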
Note: The code is available on Kaggle.