This repository contains my answers to the questions in the ResQ Co task for the Data Analytics Engineering position.
This document provides instructions on how to set up and run this project.
Before you begin, ensure you have met the following requirements:
- You have installed the latest version of Python.
To install the necessary dependencies for this project, execute the following command:
    pip install -r requirements.txt
Metabase is used for creating dashboards in this project. You can set it up by visiting either of the following links and following the setup instructions provided in their documentation:
You can view the full project report at the link below.
This section provides a detailed description of the files in this project.
- problemOne.py/problemOne.ipynb:
  - Purpose: This file is used to create a table using the pandas library and save it as a database file.
  - Language: Python
  - Libraries Used: pandas
  - To get the results, run: python problemOne.py
- problemOneSql.py/problemOneSql.ipynb:
  - Purpose: This file is used to create a view in the database. It establishes a connection between Python and SQLite and executes the query.
  - Language: Python
  - Libraries Used: sqlite3
  - To get the results, run: python problemOneSql.py
- viewQuery.txt:
  - Purpose: This text file contains the query that is used in Metabase to create the view.
- problemTwo.py/problemTwo.ipynb:
  - Purpose: This file is used to create plots and reports.
  - Language: Python
  - Libraries Used: pandas
  - To get the report, run: python problemTwo.py
- avgDailyVsAvgHolidayQuery.txt:
  - Purpose: This text file contains the query that is used in Metabase to create a dashboard for comparing average daily sales and holidays.
- avgDailyVsholidaySeriesQuery.txt:
  - Purpose: This text file contains the query that is used in Metabase to create a dashboard for comparing average daily sales and series sales in holidays.
- npurchasingQuery.txt:
  - Purpose: This text file contains the query that is used in Metabase to create a dashboard for comparing the number of purchasing users on normal days and holidays.
- sellingProvidersQuery.txt:
  - Purpose: This text file contains the query that is used in Metabase to create a dashboard for comparing the number of providers on normal days and holidays.
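The two database steps described above (a table written with pandas, then a view created through sqlite3) can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: the table name `orders`, the view name `daily_sales`, the column names, and the sample rows are all hypothetical.

```python
import sqlite3

import pandas as pd


def build_database(db_path=":memory:"):
    """Write a small table with pandas, then define a view on top of it."""
    # Hypothetical sample data; the real project loads its own dataset.
    orders = pd.DataFrame(
        {
            "order_date": ["2023-01-01", "2023-01-02", "2023-01-02"],
            "is_holiday": [1, 0, 0],
            "amount": [10.0, 20.0, 30.0],
        }
    )

    conn = sqlite3.connect(db_path)

    # pandas writes the DataFrame as a SQLite table.
    orders.to_sql("orders", conn, if_exists="replace", index=False)

    # sqlite3 executes the view definition (hypothetical query).
    conn.execute("DROP VIEW IF EXISTS daily_sales")
    conn.execute(
        """
        CREATE VIEW daily_sales AS
        SELECT order_date, is_holiday, SUM(amount) AS total_amount
        FROM orders
        GROUP BY order_date, is_holiday
        """
    )
    conn.commit()
    return conn


conn = build_database()
rows = conn.execute(
    "SELECT order_date, total_amount FROM daily_sales ORDER BY order_date"
).fetchall()
print(rows)
```

Passing a file path instead of `:memory:` would produce a database file on disk that Metabase could then be pointed at.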
This section provides instructions on how to automate the execution of your Python script and refresh your Metabase dashboard.
In production, we can integrate Airflow and Metabase to automate the creation and preparation of data, as well as the plotting of results.
Airflow, with its Directed Acyclic Graph (DAG) model, simplifies the orchestration and scheduling of transformations, ensuring accurate and timely execution. Each task in the pipeline is represented as a node in the DAG, and the dependencies between tasks are represented as edges. This allows for complex workflows to be visualized and managed effectively.
Meanwhile, Metabase provides an interface for visualizing the transformed data. Used together, Airflow keeps the underlying tables up to date on a schedule, and the Metabase dashboards built on those tables show the refreshed results.
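To make the DAG idea concrete, here is a small sketch that uses only the Python standard library (`graphlib`), not Airflow itself. It shows how declaring dependencies as edges determines a valid execution order. The task names and the dependencies between them are assumptions chosen to mirror the scripts in this repository.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: each key lists the tasks it depends on.
dependencies = {
    "create_table": set(),
    "create_view": {"create_table"},
    "build_report": {"create_view"},
    "refresh_dashboard": {"create_view"},
}


def execution_order(deps):
    """Return one valid order in which every task runs after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

A scheduler such as Airflow applies the same principle: a task starts only once everything it depends on has finished, and tasks with no dependency between them (here, the report and the dashboard refresh) may run in either order.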
To run your Python script periodically, you can use a cron job. Follow these steps:
- Open your terminal.
- Type crontab -e to edit the cron table.
- Add the following line to schedule your script to run at midnight (00:00) every day:
    0 0 * * * /usr/bin/python3 /path/to/your/script.py
  Replace /path/to/your/script.py with the actual path to your Python script.
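Since this project has several scripts, one option is to point the cron entry at a single wrapper that runs them in order. The sketch below is one way to do that, under assumptions: the function `run_scripts` is hypothetical and is not part of this repository.

```python
import subprocess
import sys
from pathlib import Path


def run_scripts(script_paths, python=sys.executable):
    """Run each script in order and stop at the first one that fails.

    Returns a list of (script name, exit code) for the scripts that ran.
    """
    results = []
    for path in script_paths:
        completed = subprocess.run(
            [python, str(path)], capture_output=True, text=True
        )
        results.append((Path(path).name, completed.returncode))
        if completed.returncode != 0:
            # Later scripts depend on earlier ones, so do not continue.
            break
    return results
```

A wrapper file could call, for example, `run_scripts(["problemOne.py", "problemOneSql.py", "problemTwo.py"])`, and the cron line would then reference that wrapper instead of an individual script.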
To refresh your dashboard in Metabase, you can use the Auto-refresh feature. Follow the instructions provided in the Metabase documentation to set it up.
