This project implements a crop price prediction system for the DataCrunch Final Round competition ("Legacy of the Market King: The Freezer Gambit"). The goal is to predict weekly fresh crop prices four weeks ahead across various economic centers.
This solution utilizes a time series forecasting approach based on a LightGBM model trained on historical price and weather data. Feature engineering includes time-based features, price lag features, and price rolling window statistics. Hyperparameters were tuned using Optuna for optimal performance. The prediction system is served via a FastAPI application packaged within a Docker container.
Source Code: https://github.com/mehara-rothila/Data_Crunch-Xforce.git
.
├── Datasets/ # Contains raw CSV data (copied into image for context)
├── deployment/ # Core application logic, model, features
│ ├── __init__.py
│ ├── data_loader.py # Loads and merges price/weather data
│ ├── preprocessing.py # Basic preprocessing (renaming, type conversion)
│ ├── feature_engineering.py # Creates time, lag, and rolling features
│ ├── model_trainer.py # Trains model (incl. Optuna tuning), saves artifacts
│ ├── predictor.py # Loads model/features, generates predictions
│ ├── lgbm_price_model.joblib # Saved final trained LightGBM model
│ └── model_features.joblib # List of features used by the final model
├── main.py # FastAPI application entry point & endpoint definitions
├── Dockerfile # Instructions to build the Docker image
├── requirements.txt # Python package dependencies
├── image_name.txt # Docker image name/URI (from Docker Hub)
├── README.md # This file
├── .gitignore # Specifies intentionally untracked files for Git
├── Documentation.pdf # Detailed solution documentation (To be created)
└── Presentation # Presentation slides (To be created)
(Note: __pycache__ directories are generated by Python and ignored by git via .gitignore)
The data processing follows these steps within the deployment modules:
- Loading (`data_loader.py`): Loads `train_data.csv` for price/weather, parses dates, merges the datasets, and drops duplicate weather entries.
- Preprocessing (`preprocessing.py`): Renames the price column and converts identifiers (`Region`, `Commodity`, `Type`) to the `category` dtype.
- Feature Engineering (`feature_engineering.py`): Creates:
  - Time features: year, month, week, day of week, day of year, etc.
  - Lag features (price): price from 28, 35, and 42 days prior.
  - Rolling window features (price): mean and standard deviation over 7, 14, and 28 days (using `.shift(1)` to prevent leakage).
  - NaN handling: rows with NaNs from feature generation are dropped before training.
- Model: LightGBM regressor (`lightgbm.LGBMRegressor`).
- Features Used: A combination of the original preprocessed columns and the engineered features (time, price lags, price rolling windows). See `deployment/model_features.joblib`.
- Top Features: `Commodity`, `Region`, the price lags (42, 35, 28 days), the price rolling features (mean/std over 7, 14, 28 days), and the original weather metrics all rank highly in importance.
- Hyperparameter Tuning: Optuna was used (30 trials) to minimize validation RMSE.
- Validation Strategy: Time-based split (Train up to 2043-04-30, Validate on May-June 2043).
- Final Validation RMSE: ~20.06 (achieved on the time-based validation set with the tuned model).
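The lag and rolling-window construction described above can be sketched in pandas. This is an illustrative toy example, not the project's exact code: the real logic lives in `deployment/feature_engineering.py`, and the column names here (`date`, `price`, `price_lag_*`, `price_roll_*`) are assumptions.

```python
import pandas as pd

# Toy price series; in the real pipeline this would be per Region/Commodity.
df = pd.DataFrame({
    "date": pd.date_range("2043-01-01", periods=60, freq="D"),
    "price": range(60),
})

# Lag features: price from 28, 35, and 42 days prior.
for lag in (28, 35, 42):
    df[f"price_lag_{lag}"] = df["price"].shift(lag)

# Rolling statistics computed on the previous day's value (.shift(1)) so the
# current row's own price never leaks into its features.
for window in (7, 14, 28):
    shifted = df["price"].shift(1)
    df[f"price_roll_mean_{window}"] = shifted.rolling(window).mean()
    df[f"price_roll_std_{window}"] = shifted.rolling(window).std()

# Rows with NaNs introduced by lagging/rolling are dropped before training.
df = df.dropna().reset_index(drop=True)
```

The `.shift(1)` before `.rolling(...)` is the leakage guard mentioned above: without it, a 7-day rolling mean at day *t* would include the price at day *t* itself.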
This application is designed to run inside a Docker container. Docker allows packaging an application with all its dependencies (libraries, code, system tools) into a standardized unit, ensuring it runs consistently across different environments.
- Docker Installation: You need Docker installed and running on your computer. Docker acts as the engine to build and run containers.
- Windows/Mac: Download and install Docker Desktop from the official Docker website: https://www.docker.com/products/docker-desktop/
- Linux: Follow the instructions for your specific distribution: https://docs.docker.com/engine/install/
- After installation, ensure the Docker service/daemon is running (Docker Desktop usually starts it automatically). You can verify by opening a terminal or command prompt and running `docker --version`; you should see a version number printed.
- Project Files: You need all the project files (downloaded or cloned from GitHub) in a single directory on your computer.
- Terminal/Command Prompt: You will need to run commands in your system's terminal (like Command Prompt or PowerShell on Windows, Terminal on Mac/Linux).
If you don't have the files locally, clone the repository using Git:
```
git clone https://github.com/mehara-rothila/Data_Crunch-Xforce.git
cd Data_Crunch-Xforce
```

Make sure your terminal's current directory is the project root (the folder containing the Dockerfile).
The Dockerfile in the project contains the recipe for creating the application image. Building the image packages the Python environment, libraries, code, model, and necessary data.
Command: Open your terminal in the project root directory and run:
```
docker build -t mehararothila/data-crunch-predictor:v1.1 .
```

Explanation:
- `docker build`: Starts the image build process.
- `-t mehararothila/data-crunch-predictor:v1.1`: Assigns a memorable name (a "tag") to the image you are building, making it easier to refer to later.
- `.`: This crucial dot tells Docker to look for the Dockerfile in the current directory and to use the directory's contents as the "build context" (files that may be copied into the image).
Process: Docker will execute the steps in the Dockerfile sequentially. This involves downloading the base Python image, installing system dependencies (like libgomp1), installing all Python packages listed in requirements.txt, and copying your application code and data files into the image. This may take several minutes, especially the first time. Watch for any error messages. A successful build ends with messages like => exporting to image and => naming to ....
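For orientation, the recipe described above might look roughly like the following. This is a hedged sketch reconstructed from the steps listed in this section (slim Python base image, `libgomp1`, `requirements.txt`, copying the code, Uvicorn on port 8000); the repository's actual Dockerfile is authoritative and may differ in base image tag and details.

```dockerfile
# Illustrative sketch only -- see the real Dockerfile in the repo root.
FROM python:3.10-slim

# LightGBM needs the OpenMP runtime (libgomp1) at import time.
RUN apt-get update && apt-get install -y --no-install-recommends libgomp1 \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Install Python dependencies first so this layer is cached across code changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application code, model artifacts, and datasets into the image.
COPY . .

EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```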
Once the image is built successfully, you can run a container based on that image. This starts the FastAPI application inside the isolated container environment.
Command: In your terminal, run:
```
docker run -p 8000:8000 --rm --name predictor-app mehararothila/data-crunch-predictor:v1.1
```

Explanation:
- `docker run`: Creates and starts a container from an image.
- `-p 8000:8000`: The port mapping. It connects port 8000 on your host machine (the first 8000) to port 8000 inside the container (the second 8000, where the Uvicorn server is listening). This lets you access the API from your browser at localhost:8000. Make sure port 8000 is not already in use by another application on your host.
- `--rm`: A cleanup flag. Docker automatically removes the container (but not the image) when it stops (e.g., when you press CTRL+C in the terminal).
- `--name predictor-app`: Assigns the convenient name predictor-app to the running container instance, making it easier to manage if needed.
- `mehararothila/data-crunch-predictor:v1.1`: The image to run the container from (the one you built in the previous step).
Output: After running the command, you should see log output in your terminal, including lines from Uvicorn like:
```
INFO: Started server process [1]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
```
This indicates the API server is running successfully inside the container and is accessible. The terminal will remain attached to the container's logs.
Stopping the Container: To stop the server and the container, go back to the terminal where it's running and press CTRL+C. Because we used the --rm flag, the container will be automatically removed.
(Alternative Run Command using Docker Hub image): If you didn't build the image locally but want to run the one pushed to Docker Hub, use:

```
docker run -p 8000:8000 --rm --name predictor-app mehararothila/data-crunch-predictor:v1.1
```

IMPORTANT DEPLOYMENT INFORMATION
The deployed API documentation (Swagger UI) can be accessed at the following URL:
http://api.mehara.io:8000/docs
Alternatively, you can use the direct IP address:
http://64.227.137.70:8000/docs
Note on Browser Warnings: Since the deployment uses plain HTTP on port 8000 (not HTTPS with an SSL certificate), your browser may mark the site as "Not secure" or show a warning page when you first access the link.
This is expected behavior. If a warning page appears, click the button or link labeled "Continue to site", "Advanced" -> "Proceed", or similar wording to reach the API documentation page.
With the container running, the prediction API is accessible on your local machine.
The easiest way to interact with the API is through the automatically generated Swagger UI documentation.
- Open Browser: Open your preferred web browser (Chrome, Firefox, Edge, etc.).
- Navigate: Go to the address http://localhost:8000/docs.
- Explore: You will see the API documentation page listing the available endpoints (`/api/predict`, `/api/data/weather`, `/api/data/prices`).
- Test Prediction:
  - Click on the POST `/api/predict` endpoint bar to expand it.
  - Click the "Try it out" button on the right side.
  - An editable "Request body" field will appear, pre-filled with an example. Modify the crop and region values if desired (use values known to be in the training data for meaningful results, e.g., "Cantaloupe", "Valhalla").
  - Click the blue "Execute" button.
  - Scroll down to see the "Server response". It shows the equivalent curl command, the request URL, and the response body (containing the predictions) or any error messages, along with the HTTP status code (e.g., 200 for success).
If you prefer using the command line and have curl installed:
```
curl -X POST "http://localhost:8000/api/predict" \
  -H "accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{"crop": "Cantaloupe", "region": "Valhalla"}'
```

- `-X POST`: Specifies the HTTP POST method.
- `"http://localhost:8000/api/predict"`: The URL of the endpoint.
- `-H "accept: application/json"`: Header indicating the client accepts JSON responses.
- `-H "Content-Type: application/json"`: Header indicating the request body is JSON.
- `-d '{"crop": "Cantaloupe", "region": "Valhalla"}'`: The JSON data sent in the request body.
The response JSON will be printed directly to your terminal.
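If you prefer Python, an equivalent request can be made with only the standard library. This is a sketch mirroring the curl call above; the payload fields (`crop`, `region`) come from this README's example, and the `try/except` simply prints a message instead of crashing when the container is not running.

```python
import json
import urllib.error
import urllib.request

# Same payload and headers as the curl example above.
payload = json.dumps({"crop": "Cantaloupe", "region": "Valhalla"}).encode()
req = urllib.request.Request(
    "http://localhost:8000/api/predict",
    data=payload,
    headers={"accept": "application/json", "Content-Type": "application/json"},
    method="POST",
)

try:
    with urllib.request.urlopen(req, timeout=10) as resp:
        # Prints the prediction JSON returned by the API.
        print(json.loads(resp.read()))
except urllib.error.URLError as exc:
    # Reached when nothing is listening on localhost:8000.
    print(f"Request failed (is the container running?): {exc}")
```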
- Accuracy (RMSE): Optimized to ~20.06 via feature engineering and Optuna tuning.
- Resources: Uses the efficient LightGBM model, `category` dtypes, and a slim base image. Image size is ~585 MB (well under the 8 GB limit); RAM usage is expected to stay under the 2 GB limit.
- Packaging: Delivered as a runnable Docker image with code and dependencies.
- API: Implements the specified API endpoints.
- Reproducibility: Uses fixed random seed for model training.