I used the Llama-2-13B-chat-GGML and ChatGPT (GPT-3.5 turbo) models for this task. Llama 2 is a family of pre-trained and fine-tuned text models optimized for dialogue, ranging from 7 billion to 70 billion parameters. It outperforms open-source chat models on most benchmarks and matches closed-source models in human evaluations of helpfulness and safety. Llama.cpp is a C/C++ implementation designed to run the LLaMA models with 4-bit integer quantization on consumer hardware such as a MacBook, and it supports a variety of architectures and libraries. GGML, a C library for machine learning, makes it practical to distribute large language models (LLMs) by enabling efficient execution on consumer hardware through quantization. The Hugging Face community offers quantized models built on the GGML library, such as Llama-2-13B-chat-GGML.
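As a minimal sketch of how such a quantized model can be loaded, the snippet below uses the llama-cpp-python bindings. The model file name, context size, and generation parameters are illustrative assumptions; note that an older, GGML-compatible release of llama-cpp-python is assumed, since newer versions expect the GGUF format instead.

```python
# Sketch: loading a 4-bit quantized Llama-2-13B-chat GGML model with
# llama-cpp-python (an older, GGML-compatible release is assumed).
from llama_cpp import Llama

llm = Llama(
    model_path="./llama-2-13b-chat.ggmlv3.q4_0.bin",  # hypothetical local path
    n_ctx=2048,        # context window size (assumed value)
    n_gpu_layers=32,   # offload some layers to the GPU if one is available
)

prompt = (
    "Summarize this SQL query in plain English:\n"
    "SELECT name FROM employees WHERE salary > 50000"
)
output = llm(prompt, max_tokens=128, temperature=0.2)
print(output["choices"][0]["text"])
```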
In this project, I also used the GPT-3.5 turbo model via the OpenAI API to generate summaries of SQL queries, which gives efficient access to the model's capabilities without extensive infrastructure setup. Additionally, I integrated LangChain with GPT-3.5 to refine the quality and coherence of the generated summaries, ensuring they meet the specific requirements of the SQL query summarization task. This combination made it possible to exploit the GPT-3.5 turbo model fully while improving the overall quality and fluency of the summaries. Note that accessing these models through the OpenAI platform incurs charges based on API usage.
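The sketch below shows one way to wire this up with LangChain's classic LLMChain and a prompt template. The import paths correspond to older (pre-0.1) LangChain releases, and the prompt wording, example query, and placeholder API key are illustrative assumptions rather than the exact ones used in the notebook.

```python
# Sketch: summarizing a SQL query with GPT-3.5 turbo through LangChain.
# Import paths follow older (pre-0.1) LangChain releases.
import os
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

os.environ["OPENAI_API_KEY"] = "sk-..."  # your OpenAI API key

prompt = PromptTemplate(
    input_variables=["query"],
    template="Explain in one or two sentences what this SQL query does:\n{query}",
)

chain = LLMChain(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0),
    prompt=prompt,
)

summary = chain.run(query="SELECT COUNT(*) FROM orders WHERE status = 'shipped'")
print(summary)
```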
The SQL queries used in my study were sourced from the WikiSQL dataset, a collection of 87,726 meticulously annotated pairs of SQL queries and corresponding natural language questions. The queries are split into training (61,297 examples), development (9,145 examples), and test (17,284 examples) sets. This resource is particularly valuable for natural language inference tasks over relational databases. I used only the test subset as input to the Llama-2 model to obtain the model-generated outputs.
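The test split can be pulled directly from the Hugging Face Hub with the datasets library, as sketched below. The field names (`question` and `sql["human_readable"]`) follow the wikisql dataset card, and the three-example slice is arbitrary.

```python
# Sketch: loading the WikiSQL test split from the Hugging Face Hub.
from datasets import load_dataset

test_set = load_dataset("wikisql", split="test")

for example in test_set.select(range(3)):
    print("Question:", example["question"])
    print("SQL:     ", example["sql"]["human_readable"])
```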
I've included the Google Colab notebook (i.e., the .ipynb file) in the repository; simply run it step by step. The notebook is thoroughly commented, with guidance for each step. Note that I upgraded to a Google Colab Pro subscription to meet the higher GPU requirements, so if you're running the notebook, please consider using Colab Pro for optimal performance.
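Before running the heavier cells, it can help to confirm that the Colab session actually has a GPU attached. One quick check, assuming PyTorch is available in the runtime (as it is by default on Colab):

```python
# Quick sanity check that the Colab runtime has a GPU attached
# (PyTorch ships with the default Colab runtime).
import torch

if torch.cuda.is_available():
    print("GPU detected:", torch.cuda.get_device_name(0))
else:
    print("No GPU detected; switch the runtime type to GPU before continuing.")
```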