A flexible Python-based analytics tool for performing statistical analysis and data visualisation using YAML configuration files.
- Statistical Analysis: Compute various statistics (mean, sum, median, standard deviation, etc.) across any variables
- Data Visualization: Generate multiple chart types including:
  - Bar charts
  - Line charts
  - Histograms
  - Box plots
  - Scatter plots (2-way charts)
  - Correlation heatmaps
- YAML Configuration: Define your analysis workflow in a simple, readable YAML file
- Flexible: Works with rectangular (tabular) data in CSV format
- Python 3.7 or higher
pip install -r requirements.txt
This will install:
- pandas (data manipulation)
- matplotlib (plotting)
- seaborn (statistical visualisation)
- pyyaml (YAML parsing)
- numpy (numerical operations)
Run the analytics tool with a configuration file and data file:
python analytics.py config.yaml sample_data.csv
Create a YAML configuration file to specify what analysis to perform. Here's an example:
# Output directory for charts
output_dir: output
# Statistical Analysis
statistics:
  variables: [a, b, c, d, e]
  operations:
    - mean
    - sum
    - median
    - std
# Charts to Generate
charts:
  - type: bar
    variables: [a, b, c]
    title: "Variable Comparison"
  - type: scatter
    variables: [a, b]
    title: "A vs B Scatter Plot"
Your data should be in CSV format with column headers. Example:
a,b,c,d,e
49.67,74.84,15.79,29.76,28
48.62,61.31,41.73,34.00,22
...
statistics:
  variables: [a, b, c]  # Variables to analyse (omit for all)
  operations:           # Statistical operations to perform
    - mean    # Average
    - sum     # Total sum
    - median  # Median value
    - std     # Standard deviation
    - var     # Variance
    - min     # Minimum value
    - max     # Maximum value
    - count   # Count of values
Compare values across variables:
- type: bar
  variables: [a, b, c, d]
  title: "Comparison Chart"
Show trends over data points:
- type: line
  variables: [a, b]
  title: "Trend Analysis"
Display distribution of values:
- type: histogram
  variables: [a]
  title: "Distribution of Variable A"
Show statistical distribution:
- type: box
  variables: [a, b, c, d, e]
  title: "Box Plot Comparison"
Visualise relationship between two variables:
- type: scatter
  variables: [a, b]  # First variable is X-axis, second is Y-axis
  title: "A vs B Relationship"
Show correlation between variables:
- type: heatmap
  variables: [a, b, c, d, e]
  title: "Correlation Matrix"
- Prepare your data: Create a CSV file with your variables
- Create configuration: Define what statistics and charts you want
- Run analysis:
  python analytics.py config.yaml data.csv
- View results:
  - Statistics are printed to the console
  - Charts are saved in the output directory (default: output/)
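The statistics step of the workflow above can be sketched in a few lines of Python. This is a minimal illustration only, not the code in analytics.py: it uses the standard library's csv and statistics modules rather than pandas so that it runs without any dependencies, it takes the `statistics` section of an already-parsed config as a plain dict, and the names `OPERATIONS` and `compute_statistics` are hypothetical.

```python
import csv
import io
import statistics

# Hypothetical mapping from operation names in the config to functions.
# std and var here are the sample versions (n - 1 denominator).
OPERATIONS = {
    "mean": statistics.mean,
    "sum": sum,
    "median": statistics.median,
    "std": statistics.stdev,
    "var": statistics.variance,
    "min": min,
    "max": max,
    "count": len,
}


def compute_statistics(csv_text, stats_config):
    """Return {variable: {operation: value}} for the requested variables."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Omitting "variables" means: use every column in the header row.
    variables = stats_config.get("variables") or list(rows[0].keys())
    results = {}
    for name in variables:
        values = [float(row[name]) for row in rows]
        results[name] = {
            op: OPERATIONS[op](values) for op in stats_config["operations"]
        }
    return results


# Tiny inline dataset so the sketch runs without any files on disk.
sample = "a,b\n1,10\n2,20\n3,30\n"
config = {"variables": ["a", "b"], "operations": ["mean", "sum", "count"]}
results = compute_statistics(sample, config)
for name, stats in results.items():
    print(f"{name}:")
    for op, value in stats.items():
        print(f"  {op:<8}: {value:.4f}")
```

Keeping the operations in a lookup table means that supporting another statistic only requires one new entry, and an unknown operation name in the config fails immediately with a `KeyError`.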
The repository includes sample data with variables a, b, c, d, e:
# Generate new sample data (optional)
python generate_sample_data.py
# Run analytics on sample data
python analytics.py config.yaml sample_data.csv
Statistics are printed in a readable format:
==================================================
STATISTICAL ANALYSIS
==================================================
a:
------------------------------
mean : 50.1234
sum : 5012.3400
median : 49.5600
...
Charts are saved as PNG files in the output directory:
- chart_1_bar.png
- chart_2_scatter.png
- etc.
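The two example names suggest a `chart_<position>_<type>.png` pattern, where the position is the chart's 1-based index in the `charts` list. That pattern is inferred from the examples here and is not confirmed against analytics.py. Under that assumption, the expected output paths for a config could be predicted with a small helper (`chart_paths` is a hypothetical name):

```python
from pathlib import Path


def chart_paths(config):
    """Predict output paths, assuming a chart_<index>_<type>.png naming scheme."""
    # "output" is the default directory named in this README.
    output_dir = Path(config.get("output_dir", "output"))
    return [
        output_dir / f"chart_{index}_{chart['type']}.png"
        for index, chart in enumerate(config.get("charts", []), start=1)
    ]


# The config is shown as an already-parsed dict to keep the sketch dependency-free.
config = {
    "output_dir": "output",
    "charts": [
        {"type": "bar", "variables": ["a", "b", "c"]},
        {"type": "scatter", "variables": ["a", "b"]},
    ],
}
paths = chart_paths(config)
for path in paths:
    print(path.as_posix())
```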
Specify a custom output directory in your config:
output_dir: my_results
Create different configuration files for different analyses:
python analytics.py basic_stats.yaml data.csv
python analytics.py detailed_charts.yaml data.csv
python analytics.py correlation_analysis.yaml data.csv
- Export your data as CSV with headers
- Note the variable names (column headers)
- Create a config file specifying your variables
- Run the analysis
pip install -r requirements.txt
Ensure config and data files exist:
ls config.yaml sample_data.csv
Check your YAML syntax:
- Use spaces (not tabs) for indentation
- Ensure proper list formatting with brackets or dashes
- Validate variable names match your CSV columns
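The last check in the list above can be automated. The sketch below is one possible approach, not part of analytics.py: it assumes the config has already been parsed into a dict (for example with pyyaml's `yaml.safe_load`) and reports every variable name that does not appear in the CSV header row. The function name `missing_variables` is hypothetical.

```python
import csv
import io


def missing_variables(config, csv_text):
    """Return config variable names that are not CSV column headers."""
    header = next(csv.reader(io.StringIO(csv_text)))
    columns = set(header)
    # Collect names from the statistics section and from every chart.
    requested = list(config.get("statistics", {}).get("variables", []))
    for chart in config.get("charts", []):
        requested.extend(chart.get("variables", []))
    # Preserve first-seen order and drop duplicates.
    return [name for name in dict.fromkeys(requested) if name not in columns]


config = {
    "statistics": {"variables": ["a", "b"], "operations": ["mean"]},
    "charts": [{"type": "scatter", "variables": ["a", "z"]}],
}
missing = missing_variables(config, "a,b,c\n1,2,3\n")
print(missing)
```

An empty list means every variable named in the config has a matching column.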
.
├── analytics.py # Main analytics tool
├── config.yaml # Sample configuration
├── sample_data.csv # Sample dataset
├── generate_sample_data.py # Data generator script
├── requirements.txt # Python dependencies
├── README.md # This file
└── output/ # Generated charts (created automatically)
This project is provided as-is for educational and analytical purposes.