CTE — Character Traits Evaluator


CTE Overview

A reusable ML framework for behavioral self-tracking, personality trait extraction, and job-fit evaluation.

CTE provides a complete pipeline from raw behavioral data to actionable career insights. Whether you're tracking your own productivity patterns or building a workforce analytics tool, CTE offers the building blocks you need.


What CTE Does

Raw Behavioral Data → Clean Features → ML Models → Trait Profile → Job-Fit Score
  1. Data Cleaning — Robust, deterministic parsing for messy self-tracking data
  2. Feature Engineering — Temporal features, cyclical encodings, rolling statistics
  3. Trait Extraction — Convert behavioral patterns into personality scores
  4. Job-Fit Scoring — Match traits against job requirements with explainable verdicts

Quick Start (30 seconds)

Option A: Try the Demo Instantly

git clone https://github.com/deepakdeo/cte-project.git
cd cte-project
poetry install

# Launch the dashboard
PYTHONPATH=src poetry run streamlit run scripts/cte_app.py

Then click "🧪 Demo Mode → Load Demo Assets" in the sidebar to see the full experience.

Option B: Generate Your Own Demo Data

# Generate 90 days of synthetic behavioral data
poetry run python src/cte/synthetic.py --days 90 --out data/sample/my_data.csv

# Clean it
poetry run python src/cte/data.py --in data/sample/my_data.csv --out data/sample/my_data_clean.parquet

# Generate a persona
PYTHONPATH=src poetry run python scripts/generate_demo_persona.py

Option C: Use Docker

First, install Docker Desktop for your OS.

Then run:

docker compose up --build

Open http://localhost:8501 in your browser.

To stop: press Ctrl+C in the terminal.


Why Use CTE?

| Use Case | How CTE Helps |
| --- | --- |
| Self-improvement | Track your patterns, understand what drives productivity |
| Career planning | Match your traits to job requirements before applying |
| Workforce analytics | Framework for trait-based team composition |
| Research | Reproducible pipeline for behavioral studies |
| Learning | Well-structured ML project demonstrating end-to-end skills |

Core Features

Synthetic Data Generator

Generate realistic behavioral data for testing or demos:

from cte.synthetic import generate_synthetic_dataset

# Generate 90 days of data with temporal correlations
df = generate_synthetic_dataset(n_days=90, seed=42)

The generator creates realistic patterns including:

  • Weekday/weekend differences
  • Sleep → productivity correlations
  • Mood coherence with reflections
  • Social interaction patterns
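
A minimal sketch of how such correlated data could be produced (illustrative only; the real logic lives in `src/cte/synthetic.py`, and the column names here are assumptions):

```python
import numpy as np
import pandas as pd

def toy_synthetic(n_days: int = 90, seed: int = 42) -> pd.DataFrame:
    """Toy generator: weekend sleep bump plus a sleep -> productivity correlation."""
    rng = np.random.default_rng(seed)
    dates = pd.date_range("2025-01-01", periods=n_days, freq="D")
    weekend = (dates.dayofweek >= 5).astype(float)
    sleep = 7.0 + 0.8 * weekend + rng.normal(0, 0.5, n_days)   # longer weekend sleep
    productivity = 0.6 * sleep + rng.normal(0, 0.4, n_days)    # tracks sleep, plus noise
    return pd.DataFrame({"date": dates, "sleep_h": sleep, "productivity": productivity})
```

Seeding the generator keeps the output reproducible, which is what makes the synthetic data usable in tests and demos.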

Robust Data Cleaning

from cte.data import clean_csv

# Handles messy real-world data
clean_csv("raw_data.csv", "clean.parquet")
  • Header normalization (handles newlines, typos)
  • Deterministic date/time parsing
  • Duration parsing (7h38m → 7.63 hours)
  • Flexible boolean coercion (yes/y/true/1 → 1)
  • Social interaction encoding (positive/neutral/negative → +1/0/-1)
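
The coercions above can be sketched with standalone helpers (illustrative re-implementations of the described behavior, not the actual `cte.data` internals):

```python
import re

def parse_duration(s: str) -> float:
    """'7h38m' or '7:38' -> hours as a float (e.g. 7.63)."""
    s = s.strip().lower()
    m = re.fullmatch(r"(\d+)h(\d+)m?", s) or re.fullmatch(r"(\d+):(\d+)", s)
    if not m:
        raise ValueError(f"unrecognized duration: {s!r}")
    hours, minutes = int(m.group(1)), int(m.group(2))
    return round(hours + minutes / 60, 2)

def coerce_bool(s: str) -> int:
    """'yes'/'y'/'true'/'1' -> 1, anything else -> 0."""
    return 1 if s.strip().lower() in {"yes", "y", "true", "1"} else 0

def encode_interaction(s: str) -> int:
    """'positive'/'neutral'/'negative' -> +1 / 0 / -1."""
    return {"positive": 1, "neutral": 0, "negative": -1}[s.strip().lower()]
```

For example, `parse_duration("7h38m")` and `parse_duration("7:38")` both yield `7.63`.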

Feature Engineering

from cte.features import engineer_features

# Add lags, rolling stats, cyclical encodings
df_features = engineer_features(df_clean)
  • Lag features (t-1, t-2, t-3)
  • 7-day rolling mean/std
  • Cyclical time encodings (sin/cos)
  • Day-of-week one-hots
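
These transforms can be sketched with pandas (illustrative; the column names `productivity`, `wakeup_hour`, and `dow` are assumptions, not the package's actual schema):

```python
import numpy as np
import pandas as pd

def add_basic_features(df: pd.DataFrame) -> pd.DataFrame:
    """Sketch of the listed transforms on a daily-frequency DataFrame."""
    out = df.copy()
    for k in (1, 2, 3):                          # lag features t-1, t-2, t-3
        out[f"productivity_lag{k}"] = out["productivity"].shift(k)
    roll = out["productivity"].rolling(7)        # 7-day rolling statistics
    out["productivity_roll7_mean"] = roll.mean()
    out["productivity_roll7_std"] = roll.std()
    angle = 2 * np.pi * out["wakeup_hour"] / 24  # hours on a 24h clock -> radians
    out["wakeup_sin"], out["wakeup_cos"] = np.sin(angle), np.cos(angle)
    if "dow" in out:                             # day-of-week one-hots, if present
        out = pd.get_dummies(out, columns=["dow"], prefix="dow")
    return out
```

The sin/cos pair is the standard trick for cyclical time: it makes 23:00 and 01:00 close in feature space, which a raw hour column cannot do.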

Job-Fit Scoring

from cte.scoring import score_requirements

# Compare persona against job requirements
overall, match_ratio, risk, details, criticals = score_requirements(
    per_trait=persona["per_trait"],
    requirements=[
        {"trait": "communication", "required_level": "high"},
        {"trait": "teamwork", "required_level": "medium"},
    ]
)
# Returns: "Strong fit", 0.85, "low-risk", [...], []

Project Structure

cte-project/
├── src/cte/                    # Core Python package
│   ├── data.py                 # Data cleaning pipeline
│   ├── features.py             # Feature engineering
│   ├── synthetic.py            # Synthetic data generator
│   ├── nlp.py                  # Sentiment analysis
│   ├── requirements.py         # JD parsing (LLM + heuristic)
│   ├── scoring.py              # Job-fit scoring
│   └── report.py               # Report generation
├── scripts/
│   ├── cte_app.py              # Streamlit dashboard
│   ├── cte_cli.py              # CLI tool
│   └── generate_demo_persona.py # Demo persona generator
├── tests/                      # Unit tests (56 tests)
│   ├── test_synthetic.py
│   ├── test_data.py
│   └── test_scoring.py
├── notebooks/                  # Analysis notebooks
│   ├── 01_EDA.ipynb
│   ├── 02_Features.ipynb
│   ├── 03_Baselines.ipynb
│   └── ...
├── data/
│   └── sample/                 # Demo data (committed)
│       ├── synthetic_90d.csv
│       ├── demo_persona.json
│       └── sample_jd.txt
└── docs/                       # Additional documentation
    └── COLLECT_YOUR_DATA.md    # Data collection guide

Running Tests

poetry run pytest tests/ -v

# With coverage
poetry run pytest tests/ --cov=src/cte --cov-report=term-missing

Bring Your Own Data

CTE works with any behavioral tracking data that matches the schema. See docs/COLLECT_YOUR_DATA.md for:

  • Required columns and formats
  • Optional columns
  • Data collection tips
  • Integration with tracking apps

The Pipeline in Detail

1. Data Cleaning (data.py)

Transforms messy self-tracking exports into clean, typed data:

| Raw Input | Cleaned Output |
| --- | --- |
| "Jan 27, 2025" | 2025-01-27 (datetime) |
| "7h38m" or "7:38" | 7.63 (float hours) |
| "yes", "Y", "1" | 1 (Int64) |
| "positive" | 1.0 (interaction score) |

2. Feature Engineering (features.py)

Adds temporal and behavioral features:

  • Cyclical encoding: wakeup_time → wakeup_sin, wakeup_cos
  • Lags: productivity_lag1, sleep_lag2
  • Rolling stats: productivity_roll7_mean, productivity_roll7_std

3. Trait Extraction

Maps behavioral patterns to personality traits:

| Behavior Pattern | Trait |
| --- | --- |
| High productivity + deep work | Focus |
| Consistent routines | Reliability |
| Recovery from low days | Resilience |
| Positive social interactions | Communication |
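
The README does not pin down an exact formula for this step; one purely hypothetical way such a mapping could look, with made-up weights and column names:

```python
import numpy as np
import pandas as pd

def extract_traits(df: pd.DataFrame) -> dict:
    """Hypothetical trait scores in [0, 1]; assumes 'productivity' (0-10),
    'deep_work_h', and 'interaction' (-1/0/+1) daily columns."""
    focus = (df["productivity"].mean() / 10 + df["deep_work_h"].mean() / 8) / 2
    reliability = 1 - df["productivity"].std() / 5      # low variability -> high score
    communication = (df["interaction"].mean() + 1) / 2  # map [-1, 1] -> [0, 1]
    return {name: float(np.clip(v, 0, 1))
            for name, v in [("focus", focus), ("reliability", reliability),
                            ("communication", communication)]}
```

Clipping to [0, 1] keeps the scores directly comparable to the thresholds used by the scoring step below.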

4. Job-Fit Scoring (scoring.py)

Matches persona against requirements with configurable thresholds:

thresholds = {"low": 0.50, "medium": 0.60, "high": 0.70}
weights = {"low": 1.0, "medium": 1.2, "high": 1.5}

Returns explainable verdicts: Strong fit, Possible fit, Leaning no, Not a fit
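
Putting the thresholds, weights, and verdicts together, the scoring logic could be sketched like this (the verdict cutoffs below are assumptions; the actual implementation is `cte.scoring.score_requirements`):

```python
THRESHOLDS = {"low": 0.50, "medium": 0.60, "high": 0.70}
WEIGHTS = {"low": 1.0, "medium": 1.2, "high": 1.5}

def score_fit(per_trait: dict, requirements: list) -> tuple:
    """Sketch: weighted share of requirements whose trait score clears its threshold."""
    met = total = 0.0
    for req in requirements:
        level = req["required_level"]
        total += WEIGHTS[level]
        if per_trait.get(req["trait"], 0.0) >= THRESHOLDS[level]:
            met += WEIGHTS[level]
    ratio = met / total if total else 0.0
    if ratio >= 0.8:      # verdict cutoffs here are assumptions, not the package's values
        verdict = "Strong fit"
    elif ratio >= 0.6:
        verdict = "Possible fit"
    elif ratio >= 0.4:
        verdict = "Leaning no"
    else:
        verdict = "Not a fit"
    return verdict, ratio
```

Weighting "high" requirements at 1.5 means missing a critical skill costs more of the match ratio than missing a nice-to-have.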


Streamlit Dashboard

The interactive dashboard provides:

  • Dashboard: Radar charts, trait breakdown, performance trends
  • Job Evaluation: Paste a JD, get instant fit analysis
  • Daily Updates: Log daily performance, build evidence
  • Starter Mode: Create a persona from a 2-minute questionnaire

CLI Usage

# Evaluate job fit from command line
PYTHONPATH=src poetry run python scripts/cte_cli.py \
  --persona data/sample/demo_persona.json \
  --jd data/sample/sample_jd.txt

Deploy Your Own

Streamlit Cloud (Free)

  1. Fork this repo
  2. Go to share.streamlit.io
  3. Connect your GitHub and select the repo
  4. Set main file: scripts/cte_app.py
  5. Deploy!

Your dashboard will be live at https://your-app.streamlit.app

Note: The app works without an OpenAI API key (uses keyword-based extraction). Users can optionally add their own API key in the sidebar for LLM-powered analysis.
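
The keyword-based fallback could work roughly like this (the keyword map and trait names here are hypothetical; the real heuristics live in `src/cte/requirements.py`):

```python
# Hypothetical keyword -> trait map for JD parsing without an LLM
TRAIT_KEYWORDS = {
    "communication": ("communicate", "communication", "stakeholder", "present"),
    "teamwork": ("team", "collaborate", "cross-functional"),
    "focus": ("deep work", "detail-oriented", "focus"),
}

def extract_requirements(jd_text: str) -> list[dict]:
    """Return a requirement dict for each trait mentioned in the job description."""
    text = jd_text.lower()
    return [
        {"trait": trait, "required_level": "medium"}  # default level when unstated
        for trait, keywords in TRAIT_KEYWORDS.items()
        if any(kw in text for kw in keywords)
    ]
```

The output shape mirrors the `requirements` list that `score_requirements` consumes, so the heuristic path and the LLM path stay interchangeable.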

Docker

First, install Docker Desktop for your OS.

docker compose up --build
# Open http://localhost:8501

Limitations & Responsible Use

  • Not a hiring tool: CTE is for self-reflection and career exploration, not employment decisions
  • Small sample sizes: Trait scores are estimates; treat with appropriate uncertainty
  • Privacy first: All processing runs locally; no data leaves your machine
  • Bias awareness: Self-reported data reflects perception, not objective reality

Tech Stack

| Category | Technologies |
| --- | --- |
| Core | Python 3.11+, pandas, numpy, scikit-learn |
| NLP | transformers, vaderSentiment, OpenAI API |
| ML | XGBoost, statsmodels, SHAP |
| App | Streamlit, Plotly |
| Infra | Poetry, Docker, pytest |

Roadmap

  • Data cleaning pipeline
  • Feature engineering
  • Baseline modeling (Ridge, RF, GBM)
  • NLP sentiment analysis
  • Job-fit scoring system
  • Streamlit dashboard
  • Synthetic data generator
  • Test suite (56 tests)
  • CI/CD pipeline
  • Public dataset validation
  • FastAPI endpoints

Contributing

Contributions welcome! Areas of interest:

  • Additional trait extraction methods
  • Integration with more data sources (Oura, Whoop, etc.)
  • Public dataset validation
  • UI/UX improvements

License

MIT — see LICENSE
