Olympians: Greek Gods Q&A Model

Overview

The Olympians project aims to enhance a large language model (LLM) to provide accurate and concise answers about Greek gods, such as Zeus and Athena, based on structured data. Using the Cohere API and LangChain, the project implements a few-shot learning approach to ensure the model delivers precise, short responses (up to 4-5 words in Persian) about Greek mythology. The model leverages data from a gods.json file and a few-shot prompt with example questions and answers to improve response quality.

Objective

The goal is to create a model that:

Provides accurate answers about Greek gods based on the provided dataset.
Responds in Persian, with answers limited to 4-5 words.
Returns "نمی‌دانم" (I don't know) for unanswerable questions.
Uses a few-shot learning approach to guide the model with relevant examples.
Ensures deterministic responses by minimizing randomness in the model's output.
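
The length and fallback rules above could be enforced with a small post-processing step. The sketch below is an illustration only: `normalize_answer` and the truncation policy are assumptions, not code taken from the notebook.

```python
FALLBACK = "نمی‌دانم"  # "I don't know"


def normalize_answer(raw: str, max_words: int = 5) -> str:
    """Collapse whitespace, cap the answer length, and fall back when empty.

    Hypothetical helper: truncating to `max_words` is one possible policy;
    the notebook may rely on the prompt alone to keep answers short.
    """
    words = raw.split()
    if not words:
        return FALLBACK
    return " ".join(words[:max_words])
```

Applied to each model output before it is written to submission.csv, this keeps blank responses from slipping through as empty rows.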

Features

Data Integration: Loads Greek gods' data from gods.json, including names, descriptions, appearances, and features.
Few-Shot Learning: Uses a set of at least 10 example question-answer pairs to guide the model.
Semantic Example Selection: Dynamically selects the most relevant examples for each query using semantic similarity.
Concise Responses: Ensures answers are short (4-5 words) and in Persian, adhering to project constraints.
Deterministic Output: Configures the model with zero temperature to prioritize high-probability responses.
API Management: Handles API rate limits by pausing execution after every 8 requests.
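
The pause-after-8-requests behaviour can be sketched as a thin loop around whatever function produces an answer. Here `answer_all` and `answer_fn` are hypothetical names; in the notebook the call would be `chain.invoke`, and the 65-second default is the pause length quoted under Code Structure.

```python
import time
from typing import Callable, Iterable, List


def answer_all(
    questions: Iterable[str],
    answer_fn: Callable[[str], str],
    batch_size: int = 8,
    pause_seconds: float = 65.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Answer every question, pausing after each full batch of requests."""
    questions = list(questions)
    answers = []
    for i, question in enumerate(questions, start=1):
        answers.append(answer_fn(question))
        # Pause after every `batch_size` requests, but not after the last one.
        if i % batch_size == 0 and i < len(questions):
            sleep(pause_seconds)
    return answers
```

Passing `sleep` as a parameter makes the loop easy to exercise without actually waiting.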

Prerequisites

To run this project, you need:

Python 3.9 or higher
Libraries:
- langchain-core
- langchain-cohere
- langchain-chroma
- pandas
A valid Cohere API key (set via environment variable COHERE_API_KEY).
Files:
- gods.json: Contains Greek gods' data.
- questions.csv: Contains test questions for evaluation.

Installation

Clone or download the repository:
```
git clone <repository-url>
```
Navigate to the project directory:
```
cd <project-directory>
```

Install required Python libraries:

```
pip install langchain-core langchain-cohere langchain-chroma pandas
```

Ensure gods.json and questions.csv are in the data directory.

Set up your Cohere API key:

```
import os
os.environ["COHERE_API_KEY"] = "your-api-key"
```
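
To avoid leaving the key in a notebook cell, it can be requested interactively only when the environment variable is missing. This is a minimal sketch; `ensure_api_key` is a hypothetical helper, not part of the notebook.

```python
import os
from getpass import getpass
from typing import Callable


def ensure_api_key(
    var: str = "COHERE_API_KEY",
    prompt_fn: Callable[[str], str] = getpass,
) -> str:
    """Return the key from the environment, prompting for it if it is unset."""
    key = os.environ.get(var)
    if not key:
        key = prompt_fn(f"Enter {var}: ")
        os.environ[var] = key
    return key
```

Calling `ensure_api_key()` at the top of the notebook prompts once per session and leaves the key out of the saved file.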

Usage

Open the Olympians.ipynb notebook in Jupyter or a compatible environment (e.g., Google Colab).
Run the cells sequentially to:
- Load the gods.json data.
- Create a system prompt with gods' information and instructions.
- Define a few-shot prompt with example questions and answers.
- Configure the Cohere model with zero temperature.
- Process questions from questions.csv and generate answers.
- Save results in submission.csv, examples.csv, and model_configs.json.
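To generate the final submission, execute the notebook from the command line (this requires nbconvert to be installed):
```
jupyter nbconvert --to notebook --execute Olympians.ipynb
```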
A result.zip file will be created, containing:
- Olympians.ipynb
- submission.csv (model answers)
- examples.csv (few-shot examples)
- model_configs.json (model parameters)
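
The packaging step can be sketched with the standard library. `build_result_zip` is a hypothetical helper, and the notebook may assemble the archive differently.

```python
import zipfile
from pathlib import Path
from typing import Iterable, Union

PathLike = Union[str, Path]


def build_result_zip(files: Iterable[PathLike], zip_path: PathLike = "result.zip") -> Path:
    """Bundle the given files into a flat zip archive and return its path."""
    paths = [Path(f) for f in files]
    # Check everything up front so a missing file never leaves a partial archive.
    missing = [str(p) for p in paths if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"Missing files: {', '.join(missing)}")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for p in paths:
            archive.write(p, arcname=p.name)  # store by file name only
    return Path(zip_path)
```

For this project the call would be `build_result_zip(["Olympians.ipynb", "submission.csv", "examples.csv", "model_configs.json"])`.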

Code Structure

The Olympians.ipynb notebook is organized as follows:

API Key Setup: Configures the Cohere API key using getpass.
Data Loading: Reads gods.json into a Python dictionary using json.load.
System Prompt: Constructs a system message with gods' data and instructions for concise Persian responses.
Few-Shot Examples: Defines a list of 10 question-answer pairs for few-shot learning, covering various aspects of Greek gods (e.g., names, roles, symbols).
Prompt Template: Uses langchain_core.prompts to create a few-shot prompt with semantic example selection via SemanticSimilarityExampleSelector.
Model Configuration: Initializes the Cohere model with temperature=0 for deterministic outputs.
Chain Creation: Combines the prompt, model, and a string output parser using LangChain's | operator.
Question Processing: Reads questions from questions.csv, generates answers using chain.invoke, and saves them in a submission DataFrame.
Rate Limiting: Pauses execution for 65 seconds after every 8 requests to respect API limits.
Output Generation: Creates result.zip with the notebook, answers, examples, and model configurations.

Example Output

Sample content of submission.csv:

```
answer
زئوس
آتن
نمی‌دانم
قو
...
```

Sample content of examples.csv:

```
question,answer
خدای جنگ کیست؟,آرس
حیوان مقدس آپولون چیست؟,قو
آرس پسر چه کسانی است؟,زئوس و هرا
...
```

Constraints

Response Length: Answers must be 4-5 words max, or "نمی‌دانم" for unanswerable questions.
Language: All responses must be in Persian.
Determinism: Model temperature set to 0 to avoid creative or random outputs.
Evaluation: Model must correctly answer at least 85% of questions in questions.csv.
Few-Shot Examples: Must include at least 10 examples, excluding test questions and "نمی‌دانم" answers.
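
The few-shot constraint can be checked before a submission is packaged. The following is a sketch under stated assumptions: `validate_examples` is a hypothetical helper, and examples are assumed to be dictionaries with `question` and `answer` keys, matching the columns of examples.csv.

```python
from typing import Dict, Iterable, List

FALLBACK = "نمی‌دانم"  # "I don't know"


def validate_examples(
    examples: List[Dict[str, str]],
    test_questions: Iterable[str],
    min_examples: int = 10,
    fallback: str = FALLBACK,
) -> List[str]:
    """Return a list of constraint violations (empty when all checks pass)."""
    problems = []
    if len(examples) < min_examples:
        problems.append(f"only {len(examples)} examples, need at least {min_examples}")
    test_set = {q.strip() for q in test_questions}
    for ex in examples:
        if ex["question"].strip() in test_set:
            problems.append(f"example duplicates a test question: {ex['question']}")
        if ex["answer"].strip() == fallback:
            problems.append(f"example uses the fallback answer: {ex['question']}")
    return problems
```

An empty return value means the example set satisfies the stated count and exclusion rules.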
