Fine-Tuning CodeLlama on Gretel's Synthetic Text-to-SQL Dataset using Amazon SageMaker JumpStart
Introduction
In our previous blog post, we introduced Gretel's release of a high-quality, purely synthetic Text-to-SQL dataset, available on Hugging Face here. The dataset was generated using the powerful Gretel Navigator model and mimics the complexity and variety of real-world SQL queries. As we continue to explore the potential of synthetic data, this follow-up post delves into using this dataset to fine-tune Large Language Models (LLMs), specifically the CodeLlama models on Amazon SageMaker JumpStart. Our goal is to demonstrate the efficacy of synthetic data in enhancing LLM and AI capabilities for complex tasks such as Text-to-SQL parsing.
Fine-Tuning CodeLlama Using Amazon SageMaker JumpStart
In this blog post, we use the SageMaker Python SDK to perform instruction-based fine-tuning of the CodeLlama-7B and CodeLlama-13B models. We start by downloading the dataset from Hugging Face and exporting the training portion to JSONL format.
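A minimal sketch of this step is shown below; it assumes the dataset is published on the Hugging Face Hub under the gretelai/synthetic_text_to_sql ID and uses the datasets library:

```python
from datasets import load_dataset

# Load Gretel's synthetic Text-to-SQL dataset from the Hugging Face Hub
# (dataset ID assumed to be gretelai/synthetic_text_to_sql)
dataset = load_dataset("gretelai/synthetic_text_to_sql")

# Export the training split to JSON Lines, one record per line
dataset["train"].to_json("train.jsonl", orient="records", lines=True)
```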
Each record in the dataset is a pairing of a natural language query with its corresponding SQL query, including metadata detailing the database schema, domain descriptions, and specific tasks.
To fine-tune CodeLlama, we created the following instruction prompt template. This template guides the model on what to generate and provides it with contextual knowledge about the database schema and the nature of the SQL task at hand.
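The exact template ships with the accompanying notebook; the sketch below is a representative stand-in, with placeholder wording and column names (sql_context, sql_task_type, sql_prompt, sql) that should be checked against the dataset schema:

```python
import json

# Illustrative instruction template -- not the verbatim template from this post.
# Placeholders in braces refer to columns in the exported JSONL records.
template = {
    "prompt": (
        "You are a text-to-SQL assistant.\n"
        "Database schema:\n{sql_context}\n\n"
        "Task type: {sql_task_type}\n"
        "Question: {sql_prompt}\n\n"
        "Write the SQL query that answers the question:\n"
    ),
    "completion": "{sql}",
}

# SageMaker JumpStart expects the template as a template.json file
# stored alongside the training data
with open("template.json", "w") as f:
    json.dump(template, f)
```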
With the dataset and template prepared, we establish a SageMaker session and upload them to Amazon S3.
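A sketch of that step, using the default SageMaker bucket and an illustrative key prefix:

```python
import sagemaker

# Create a SageMaker session and stage the training data and template in S3
session = sagemaker.Session()
bucket = session.default_bucket()
prefix = "codellama-text-to-sql"  # illustrative key prefix

train_data_location = f"s3://{bucket}/{prefix}"
session.upload_data("train.jsonl", bucket=bucket, key_prefix=prefix)
session.upload_data("template.json", bucket=bucket, key_prefix=prefix)
print(f"Training data uploaded to: {train_data_location}")
```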
The fine-tuning job was run through SageMaker's JumpStart feature with the following hyperparameters, training the model for one epoch:
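The exact values are listed in the notebook; as a rough illustration, a configuration using hyperparameter names commonly exposed by JumpStart's Llama-family fine-tuning recipes (available keys and defaults vary by model version) might look like:

```python
# Illustrative hyperparameters -- keys and defaults vary by JumpStart model version
hyperparameters = {
    "epoch": "1",                 # fine-tune for a single epoch
    "instruction_tuned": "True",  # use instruction-based fine-tuning
    "learning_rate": "0.0002",
    "per_device_train_batch_size": "4",
    "max_input_length": "1024",
}
```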
Now, we are ready to fine-tune the CodeLlama model:
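A sketch of launching the training job with the JumpStart estimator; the model ID and instance type are assumptions and should be verified against the JumpStart model catalog:

```python
from sagemaker.jumpstart.estimator import JumpStartEstimator

# Model ID is an assumption -- look up the exact CodeLlama ID in the JumpStart catalog
model_id = "meta-textgeneration-llama-codellama-7b"

estimator = JumpStartEstimator(
    model_id=model_id,
    environment={"accept_eula": "true"},  # CodeLlama requires accepting the EULA
    instance_type="ml.g5.12xlarge",       # assumption; choose per model size
    hyperparameters=hyperparameters,
)

# Point the training job at the S3 prefix holding train.jsonl and template.json
estimator.fit({"training": train_data_location})
```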
The fine-tuning process starts and takes about 3.5 hours per epoch. Once the model is trained, we can deploy it to an endpoint directly from the estimator.
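Deployment and a quick smoke test might look like the following (the instance type and generation parameters are assumptions):

```python
# Deploy the fine-tuned model to a real-time endpoint directly from the estimator
predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",  # assumption; size to your latency and cost needs
)

# Smoke-test the endpoint with a prompt shaped like the training template
prompt = (
    "You are a text-to-SQL assistant.\n"
    "Database schema:\n"
    "CREATE TABLE orders (id INT, amount DECIMAL, created_at DATE);\n\n"
    "Question: What is the total order amount per day?\n\n"
    "Write the SQL query that answers the question:\n"
)
payload = {"inputs": prompt, "parameters": {"max_new_tokens": 256, "temperature": 0.1}}
print(predictor.predict(payload))
```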
Evaluation on the BIRD Benchmark
We assessed the impact of fine-tuning by evaluating the CodeLlama models on the BIRD benchmark, an evaluation platform for large-scale, cross-domain Text-to-SQL. The benchmark spans queries of varying difficulty (simple, moderate, and challenging), giving a comprehensive picture of the model's performance improvements. We ran our experiments on the BIRD development set, which contains 1,534 Text-to-SQL pairs, using two evaluation metrics:
- Execution Accuracy (EX): measures whether the result of executing the generated SQL query matches the result of the gold query.
- Valid Efficiency Score (VES): an integrated metric assessing both the accuracy of the execution results (EX) and the execution efficiency of the generated SQL queries (see the formulas after this list).
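For reference, the BIRD paper defines these metrics roughly as follows (paraphrased from [1], not the benchmark's official scoring code), where $V_n$ and $\hat{V}_n$ are the result sets of the gold and generated queries $Y_n$ and $\hat{Y}_n$, and $E(\cdot)$ measures execution time:

$$\mathrm{EX} = \frac{1}{N}\sum_{n=1}^{N}\mathbb{1}\left(V_n = \hat{V}_n\right), \qquad \mathrm{VES} = \frac{1}{N}\sum_{n=1}^{N}\mathbb{1}\left(V_n = \hat{V}_n\right)\cdot\sqrt{\frac{E(Y_n)}{E(\hat{Y}_n)}}$$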
Here’s how the base and fine-tuned models compare on the DEV set when knowledge evidence is included (see [1] for more details):
These results show that fine-tuning significantly improved both CodeLlama models on the EX and VES metrics. This underscores the potential of synthetic datasets for fine-tuning LLMs on specialized tasks such as Text-to-SQL.
[1] Li, J., Hui, B., Qu, G., Yang, J., Li, B., Li, B., ... & Li, Y. (2024). Can LLM Already Serve as a Database Interface? A Big Bench for Large-Scale Database Grounded Text-to-SQLs. Advances in Neural Information Processing Systems, 36.
Ready to try yourself?
The SageMaker notebook for this blog post is available here.
You can find Gretel on the AWS Marketplace. If you have any questions about Gretel’s platform or would like to learn more about how synthetic data can help your business, send us a note.
About the authors
Gretel:
- Maarten Van Segbroeck, Yev Meyer, Marjan Emadi, Dhruv Nathawani, Lipika Ramaswamy, Kendrick Boyd, Matthew Grossman, Piotr Mlocek, Drew Newberry
AWS:
- Qiong (Jo) Zhang, PhD, is a Senior Partner Solutions Architect at AWS, specializing in AI/ML. Her current areas of interest include federated learning, distributed training, and generative AI. She holds 30+ patents and has co-authored 100+ journal and conference papers. She received the Best Paper Award at IEEE NetSoft 2016, IEEE ICC 2011, ONDM 2010, and IEEE GLOBECOM 2005.
- Shashi Raina is a Senior Partner Solutions Architect at Amazon Web Services (AWS), where he specializes in supporting generative AI (GenAI) startups. With close to 6 years of experience at AWS, Shashi has developed deep expertise across a range of domains, including DevOps, analytics, and generative AI.