A semantic layer for product metrics designed to streamline and automate analyst tasks through intelligent anomaly detection and automated alerting.
MetricWatch is a comprehensive metrics monitoring and anomaly detection system that helps data teams:
- Monitor product metrics across multiple dimensions (time, geography, platform)
- Detect anomalies using statistical analysis with configurable sensitivity
- Automate alerts through Slack and other integrations
- Generate visualizations automatically in Superset
- Orchestrate workflows via Airflow integration
- Track metric dependencies through a hierarchical configuration system
- Statistical anomaly detection using configurable sigma thresholds
- Multi-granularity analysis (hourly, daily, weekly, monthly)
- Sliding window analysis with customizable window sizes
- Dimensional slicing by country, platform, and custom attributes
- Automatic chart generation in Apache Superset
- Pre-configured dashboards for metric monitoring
- Integration with ClickHouse for high-performance analytics
- Slack bot integration for real-time notifications
- Dependency-aware alerting (alerts on root cause metrics)
- Configurable alert thresholds and sensitivity
- Airflow DAG templates for automated metric processing
- Scheduled anomaly detection runs
- Integration with existing data pipelines
- YAML-based metric configuration
- Dependency tracking between metrics
- Formula-based calculated metrics
- Support for lagged metrics and retention analysis
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   ClickHouse    │    │   MetricWatch    │    │    Superset     │
│  (Data Store)   │◄──►│  (Core Engine)   │◄──►│ (Visualization) │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                │
                                ▼
                       ┌──────────────────┐
                       │   Integrations   │
                       │ ┌─────────────┐  │
                       │ │   Airflow   │  │
                       │ └─────────────┘  │
                       │ ┌─────────────┐  │
                       │ │    Slack    │  │
                       │ └─────────────┘  │
                       └──────────────────┘
- Python 3.11+
- ClickHouse database (with ready to use marts)
- Slack Bot (optional, for chat interface)
- Apache Superset (optional, for visualizations)
- Apache Airflow (optional, for orchestration)
- Clone the repository:
    git clone <repository-url>
    cd metric_watch
- Install dependencies:
    pip install -r requirements.txt
- Configure environment variables:
    # ClickHouse Configuration
    export CH_HOST=your-clickhouse-host
    export CH_USER=your-username
    export CH_PASSWORD=your-password
    # Superset Configuration (optional)
    export SS_URL=http://your-superset-instance
    export SS_USER_NAME=your-superset-username
    export SS_PASSWORD=your-superset-password
    export SS_DATABASE_ID=your-database-id
    # Airflow Configuration (optional)
    export AIRFLOW_URL=http://your-airflow-instance
- Set up the database schema by following the instructions in DATABASE_SCHEMA.md to create the required database structure.
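The environment variables listed in the configuration step can be checked before a run. The following is a minimal sketch, not MetricWatch's own code: the `Settings` class and `load_settings` function are hypothetical names introduced here for illustration, and treating the `CH_*` variables as required while the Superset and Airflow ones are optional is an assumption based on the "(optional)" labels above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the connection settings."""
    ch_host: str
    ch_user: str
    ch_password: str
    ss_url: Optional[str] = None       # Superset is optional
    airflow_url: Optional[str] = None  # Airflow is optional


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the variables and fail fast if a required one is missing or empty."""
    required = ("CH_HOST", "CH_USER", "CH_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required variables: " + ", ".join(missing))
    return Settings(
        ch_host=env["CH_HOST"],
        ch_user=env["CH_USER"],
        ch_password=env["CH_PASSWORD"],
        ss_url=env.get("SS_URL"),
        airflow_url=env.get("AIRFLOW_URL"),
    )
```

Calling `load_settings()` with no arguments reads the real process environment; passing a plain dictionary makes the check easy to exercise in tests.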
# Build the Docker image
docker build -t metricwatch .
# Run the container
docker run -e CH_HOST=your-host -e CH_USER=your-user -e CH_PASSWORD=your-password metricwatch

Define your metrics in metrics_hierarchy.yml:
metrics:
  - name: new_registered_users
    description: "New registered users"
    depends_on:
      - new_visitors
      - 1h_registration_rate
    granularities:
      - hour:
          window_size: 168  # 7 days
          sigma_n: 3        # 3 standard deviations
          slices:
            - platform:
                - iOS
                - Android
                - web
      - day:
          window_size: 28   # 28 days
          sigma_n: 3
          slices:
            - country:
                - USA
                - Germany
                - Poland

- name: Unique identifier for the metric
- description: Human-readable description
- depends_on: List of parent metrics for dependency tracking
- formula: Optional calculation formula for derived metrics
- granularities: Time-based analysis configurations
- window_size: Number of time periods for historical comparison
- sigma_n: Standard deviation threshold for anomaly detection
- slices: Dimensional breakdowns for analysis
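To make the roles of `window_size` and `sigma_n` concrete, here is a minimal sketch of sliding-window sigma detection. It is an illustration under assumptions, not MetricWatch's implementation: the function name `find_anomalies` is hypothetical, and the choices to compare each point against the preceding window only and to use the sample standard deviation are assumptions that the actual logic in `helpers.py` may not share.

```python
from statistics import mean, stdev
from typing import List, Sequence


def find_anomalies(values: Sequence[float], window_size: int, sigma_n: float) -> List[int]:
    """Return indices whose value lies more than sigma_n standard deviations
    away from the mean of the preceding window_size points."""
    anomalies = []
    for i in range(window_size, len(values)):
        window = values[i - window_size:i]  # history only, current point excluded
        mu = mean(window)
        sd = stdev(window)  # sample standard deviation; requires window_size >= 2
        if abs(values[i] - mu) > sigma_n * sd:
            anomalies.append(i)
    return anomalies
```

For example, `find_anomalies([10, 11, 9, 10, 11, 9, 10, 50], window_size=7, sigma_n=3)` returns `[7]`: the first seven points have a mean of 10 and a standard deviation below 1, so the final value of 50 falls far outside the 3-sigma band. A larger `window_size` smooths the baseline, while a smaller `sigma_n` makes detection more sensitive.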
# Run anomaly detection for all metrics
python main.py
# Check specific metric
python main.py --metric=new_registered_users
# Check specific time range
python main.py --start=2024-01-01 --end=2024-04-01
# Check with specific granularity and parameters
python main.py --metric=new_registered_users --granularity=hour --window=168 --sigma=3

# Monitor new user registrations with 3-sigma threshold
python main.py --metric=new_registered_users --granularity=day --sigma=3
# Check all metrics for the last week
python main.py --start=2024-01-01 --end=2024-01-07

# Monitor iOS platform specifically
python main.py --metric=new_registered_users --slice=iOS
# Custom window size for seasonal analysis
python main.py --metric=new_registered_users --window=84 --sigma=2

MetricWatch expects specific database tables and structure. See DATABASE_SCHEMA.md for detailed requirements including:
- User tracking tables (app.users, app.events)
- Subscription and sales data
- Platform and country dictionaries
- Time-series optimized schema for ClickHouse
metric_watch/
├── metric_watch/            # Core library
│   ├── charts.py            # Superset integration
│   ├── helpers.py           # Anomaly detection logic
│   ├── constants.py         # Configuration constants
│   └── ...
├── airflow_examples/        # Airflow DAG templates
├── slack_bot/               # Slack integration
├── templates/               # SQL query templates
├── metrics_hierarchy.yml    # Metrics configuration
└── main.py                  # CLI entry point
- Define the metric in metrics_hierarchy.yml
- Create SQL template in templates/ if needed
- Test the configuration:
    python main.py --metric=your_new_metric
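Before running the CLI against a newly added metric, it can help to sanity-check the configuration itself. The sketch below is hypothetical and not part of MetricWatch: `validate_metrics` is an illustrative name, and it assumes the configuration has already been parsed into a dictionary (for instance with PyYAML's `yaml.safe_load`), that `granularities` is a list of single-key mappings such as `{"hour": {...}}`, and that every name in `depends_on` should itself be a defined metric.

```python
from typing import Any, Dict, List


def validate_metrics(config: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means no issues were found."""
    problems = []
    metrics = config.get("metrics", [])
    known = {m.get("name") for m in metrics}
    for metric in metrics:
        name = metric.get("name")
        if not name:
            problems.append("metric without a name")
            continue
        # Assumption: each parent metric must be defined in the same file.
        for parent in metric.get("depends_on", []):
            if parent not in known:
                problems.append(f"{name}: unknown dependency {parent!r}")
        # Assumption: each granularity entry needs both detection parameters.
        for entry in metric.get("granularities", []):
            for granularity, params in entry.items():
                for key in ("window_size", "sigma_n"):
                    if key not in params:
                        problems.append(f"{name}/{granularity}: missing {key}")
    return problems
```

A check like this catches typos in dependency names or a forgotten `sigma_n` before any query is sent to ClickHouse.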
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Submit a pull request
- Anomaly Alerts: When metrics exceed sigma thresholds
- Dependency Alerts: When root cause metrics show issues
- Slack notifications with detailed metric context
- Log-based alerts for system monitoring
- ClickHouse Optimization: Uses columnar storage for fast analytics
- Sliding Windows: Efficient time-series analysis
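The dependency alerts mentioned above rely on the `depends_on` relationships between metrics. One plausible reading, sketched here under assumptions, is to alert only on anomalous metrics that have no anomalous parent, so that a downstream drop is reported once at its likely source. The function name `root_cause_metrics` is hypothetical, and checking only direct parents is a simplification; MetricWatch's actual selection rule may differ.

```python
from typing import Dict, List, Set


def root_cause_metrics(depends_on: Dict[str, List[str]], anomalous: Set[str]) -> List[str]:
    """Keep only the anomalous metrics that have no anomalous direct parent."""
    roots = []
    for metric in sorted(anomalous):  # sorted for a stable alert order
        parents = depends_on.get(metric, [])
        if not any(parent in anomalous for parent in parents):
            roots.append(metric)
    return roots
```

With `depends_on = {"new_registered_users": ["new_visitors", "1h_registration_rate"]}`, an anomaly in both `new_registered_users` and `new_visitors` yields `["new_visitors"]`, while an anomaly in `new_registered_users` alone yields `["new_registered_users"]`.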
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Copyright 2025 Hyperskill
Author: Vladimir Klimov
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
For issues and questions:
- Create an issue in the repository
- Check the DATABASE_SCHEMA.md for setup help
- Review the example configurations in airflow_examples/