This project provides a RESTful API to manage and query weather data records. It allows users to retrieve weather records and statistics for different weather stations, ensuring robust data validation, error handling, and pagination.
- src/models.py: Contains SQLAlchemy models representing the weather records and statistics.
- src/load_data.py: Script to load weather data from `.txt` files into the SQLite database.
- src/calculate_statistics.py: Script to calculate yearly weather statistics for each weather station and store them in the database.
- src/resources.py: Contains controllers for the API endpoints.
- src/app.py: Contains the Flask application that defines the API endpoints and the Swagger documentation.
- src/test_app.py: Contains unit tests to ensure the API endpoints function correctly and meet specified requirements.
To store the tab-separated text files containing weather data, I will use Amazon S3 (Simple Storage Service). S3 is ideal for storing large amounts of data due to its durability, scalability, and security.
For the data ingestion process, I will use AWS Lambda. I will write a Lambda function that reads data from the S3 bucket, processes it, and loads it into the database. This function will be triggered on a schedule using Amazon CloudWatch Events.
Instead of using SQLite, I will use Amazon RDS (Relational Database Service) with a database engine like PostgreSQL or MySQL. RDS is fully managed, making it easier to set up, operate, and scale a relational database in the cloud.
I will containerize my Flask API using Docker and deploy it using Amazon ECS (Elastic Container Service) or AWS Fargate. Fargate allows me to run containers without managing the underlying infrastructure. To expose the API endpoints, I will set up Amazon API Gateway.
I will use Amazon CloudWatch Events to schedule the Lambda function to run at specific intervals, automating the data ingestion process.
- **Store Text Files in S3**
  - Create an S3 bucket and upload all the tab-separated text files.
- **Data Ingestion with Lambda**
  - Write a Lambda function to read data from S3, process it, and insert it into an Amazon RDS database.
  - Use Amazon CloudWatch Events to trigger the Lambda function on a schedule.
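A minimal sketch of such a Lambda handler is shown below. The bucket name is a placeholder, and the record layout (tab-separated date, maximum temperature, minimum temperature, precipitation) is an assumption based on the fields the API returns; the database insert is left as a comment.

```python
def parse_line(line):
    """Split one tab-separated row into a record dict (field order assumed)."""
    date, max_temp, min_temp, precipitation = line.split("\t")
    return {
        "date": date,
        "max_temp": int(max_temp),
        "min_temp": int(min_temp),
        "precipitation": int(precipitation),
    }

def handler(event, context):
    """Scheduled entry point: read each .txt object and collect its records."""
    import boto3  # AWS SDK, available in the Lambda runtime

    s3 = boto3.client("s3")
    bucket = "weather-data-bucket"  # hypothetical bucket name
    for obj in s3.list_objects_v2(Bucket=bucket).get("Contents", []):
        body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read()
        records = [parse_line(l) for l in body.decode().splitlines() if l.strip()]
        # ...insert `records` into the RDS database here...
```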
- **Set Up the Database in RDS**
  - Create an RDS instance with PostgreSQL or MySQL.
  - Set up the necessary tables in the RDS instance to store the weather data.
- **Deploy Flask API with ECS or Fargate**
  - Create a Dockerfile for the Flask application.
  - Build the Docker image and push it to Amazon ECR (Elastic Container Registry).
  - Create an ECS cluster and set up a task definition using the Docker image.
  - Use Fargate to run the containers.
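A minimal Dockerfile for the Flask application might look like the following sketch; the base image tag, exposed port, and entry point are assumptions based on the run instructions elsewhere in this README.

```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY src/ ./src/
EXPOSE 5000
CMD ["python", "src/app.py"]
```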
- **API Gateway**
  - Set up Amazon API Gateway to expose the endpoints of the Flask application.
  - Configure API Gateway to integrate with the ECS/Fargate service.
Below is the architecture diagram illustrating the AWS services used:
- **Clone the Repository**

  ```shell
  git clone https://github.com/ashishjoshi2605/colaberry-coding-assesment.git
  cd colaberry-coding-assesment
  ```

- **Create a Virtual Environment**

  ```shell
  python -m venv venv
  ```
- **Activate the Virtual Environment**
  - Windows:
    - Open PowerShell as Administrator.
    - Run the following command to allow script execution:

      ```shell
      Set-ExecutionPolicy RemoteSigned
      ```

    - Go back to the terminal where you cloned the repository to activate the virtual environment:

      ```shell
      venv\Scripts\activate
      ```

  - macOS/Linux:

    ```shell
    source venv/bin/activate
    ```
- **Install Requirements**

  ```shell
  pip install -r requirements.txt
  ```
- **Run Data Ingestion** (This will take about 5 minutes to execute.)

  ```shell
  python src/load_data.py
  ```
- **Perform Data Analytics Task**

  ```shell
  python src/calculate_statistics.py
  ```
- **Launch the API**

  ```shell
  python src/app.py
  ```
- **Access the Swagger UI**

  Open your web browser and navigate to `http://127.0.0.1:5000/apidocs` to access the Swagger UI for the API documentation.
To run the unit tests and check code coverage:
```shell
coverage run -m unittest discover -s src -p "test_app.py"
coverage report -m
```

- Description: Retrieve weather records.
- Method: `GET`
- Parameters:
  - `page` (optional): Page number for pagination (integer).
  - `per_page` (optional): Number of records per page (integer).
  - `date` (optional): Filter by date in `YYYY-MM-DD` format (string).
  - `station_id` (optional): Filter by station ID (string).
- Responses:
  - `200 OK`: Successfully retrieved weather records.

    ```json
    {
      "total": 2,
      "page": 1,
      "per_page": 10,
      "items": [
        {
          "id": 1,
          "date": "20230101",
          "max_temp": 250,
          "min_temp": 150,
          "precipitation": 0,
          "weather_station_id": "STATION1",
          "ingestion_timestamp": "2023-01-01T00:00:00"
        }
      ]
    }
    ```

  - `400 Bad Request`: Invalid date format or illogical date.

    ```json
    { "error": "Invalid date format or illogical date. Use YYYY-MM-DD format." }
    ```

  - `404 Not Found`: No records matching the criteria.

    ```json
    { "error": "No records matching this criteria found." }
    ```
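The `400 Bad Request` response above comes from validating the `date` parameter before querying. A minimal sketch of such a check, using `datetime.strptime` to reject both malformed and illogical dates (the helper name is hypothetical, not the project's actual function):

```python
from datetime import datetime

def is_valid_date(value):
    """Return True if `value` is a real calendar date in YYYY-MM-DD format."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        # Raised for both bad formats ("20230101") and
        # impossible dates ("2023-02-30").
        return False
```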
- Description: Retrieve weather statistics.
- Method: `GET`
- Parameters:
  - `page` (optional): Page number for pagination (integer).
  - `per_page` (optional): Number of records per page (integer).
  - `year` (optional): Filter by year in `YYYY` format (integer).
  - `station_id` (optional): Filter by station ID (string).
- Responses:
  - `200 OK`: Successfully retrieved weather statistics.

    ```json
    {
      "total": 1,
      "page": 1,
      "per_page": 10,
      "items": [
        {
          "id": 1,
          "year": 2023,
          "weather_station_id": "STATION1",
          "avg_max_temp": 25.0,
          "avg_min_temp": 15.0,
          "total_precipitation": 0.5
        }
      ]
    }
    ```

  - `400 Bad Request`: Invalid year format.

    ```json
    { "error": "Invalid year format. Use YYYY format." }
    ```

  - `404 Not Found`: No records matching the criteria.

    ```json
    { "error": "No records matching this criteria found." }
    ```
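Both endpoints wrap their results in the same pagination envelope (`total`, `page`, `per_page`, `items`). A pure-Python sketch of that envelope; the helper name is hypothetical, and the real code likely paginates at the SQLAlchemy query level rather than slicing a list:

```python
def paginate(items, page=1, per_page=10):
    """Slice `items` for the requested page and wrap it in the response envelope."""
    start = (page - 1) * per_page
    return {
        "total": len(items),
        "page": page,
        "per_page": per_page,
        "items": items[start:start + per_page],
    }
```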
Defines the SQLAlchemy models:
- `WeatherRecord`: Represents weather records with fields for date, temperatures, precipitation, and station ID.
- `WeatherStats`: Represents weather statistics with fields for year, average temperatures, and total precipitation.
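The two models might be declared roughly as follows. Column names are taken from the sample responses above; the table names, column types, and the tenths-of-a-unit convention (e.g. `max_temp` 250 aggregating to `avg_max_temp` 25.0) are assumptions, not the project's exact code:

```python
from sqlalchemy import Column, DateTime, Float, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class WeatherRecord(Base):
    """One daily observation for a station."""
    __tablename__ = "weather_record"
    id = Column(Integer, primary_key=True)
    date = Column(String, nullable=False)       # YYYYMMDD, as in the sample response
    max_temp = Column(Integer)                  # apparently tenths of a degree
    min_temp = Column(Integer)
    precipitation = Column(Integer)
    weather_station_id = Column(String, nullable=False)
    ingestion_timestamp = Column(DateTime)

class WeatherStats(Base):
    """Yearly aggregates per station."""
    __tablename__ = "weather_stats"
    id = Column(Integer, primary_key=True)
    year = Column(Integer, nullable=False)
    weather_station_id = Column(String, nullable=False)
    avg_max_temp = Column(Float)
    avg_min_temp = Column(Float)
    total_precipitation = Column(Float)
```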
Loads weather data from `.txt` files into the SQLite database:
- Reads weather data files.
- Inserts data into the `WeatherRecord` table.
- Ensures no duplicate entries are inserted.
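The duplicate check can be sketched in pure Python. In practice the script would more likely query the database for an existing (station, date) pair before inserting; the helper below is hypothetical and only illustrates the keying:

```python
def deduplicate(records):
    """Keep only the first record seen for each (station, date) pair."""
    seen = set()
    unique = []
    for r in records:
        key = (r["weather_station_id"], r["date"])
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique
```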
Calculates and stores yearly weather statistics:
- Computes average maximum and minimum temperatures, and total precipitation for each weather station per year.
- Stores the results in the `WeatherStats` table.
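The aggregation can be sketched as follows, in pure Python for illustration. The real script would likely use SQL aggregate queries, and the convention of representing missing values as `None` (mapped from whatever sentinel the raw files use) is an assumption:

```python
from collections import defaultdict

def yearly_stats(records):
    """Per (station, year): average max/min temperature and total precipitation.

    `records` are dicts shaped like WeatherRecord rows; None values are skipped
    so missing observations do not distort the averages.
    """
    groups = defaultdict(lambda: {"max": [], "min": [], "precip": []})
    for r in records:
        key = (r["weather_station_id"], r["date"][:4])  # date is YYYYMMDD
        if r["max_temp"] is not None:
            groups[key]["max"].append(r["max_temp"])
        if r["min_temp"] is not None:
            groups[key]["min"].append(r["min_temp"])
        if r["precipitation"] is not None:
            groups[key]["precip"].append(r["precipitation"])
    return [
        {
            "weather_station_id": station,
            "year": int(year),
            "avg_max_temp": sum(g["max"]) / len(g["max"]) if g["max"] else None,
            "avg_min_temp": sum(g["min"]) / len(g["min"]) if g["min"] else None,
            "total_precipitation": sum(g["precip"]) if g["precip"] else None,
        }
        for (station, year), g in groups.items()
    ]
```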
Contains unit tests for the API endpoints:
- Tests validation of date and year formats, ensuring `400 Bad Request` responses for invalid inputs.
- Tests responses for valid inputs and empty results, ensuring `404 Not Found` and `200 OK` statuses as appropriate.
- Verifies pagination functionality.
- Achieves more than 80% code coverage.
This project ensures robust handling and querying of weather data, with comprehensive validation and error handling, and thorough unit tests to maintain code quality.
