A command-line tool to find the most active cookie(s) for a specific day from a cookie log file.
This tool analyzes cookie log files to identify which cookie(s) had the most activity on a given date. It reads the file line by line instead of loading it into memory, and it stops scanning early when the log is sorted by timestamp, so large files stay cheap to process.
Key Features:
- Fast: Optimized for performance with early exit when the log is sorted by timestamp
- Handles Ties: Returns all cookies with the maximum visit count
- Robust: Comprehensive error handling and input validation
- Well-Tested: 90% code coverage with 21 tests
- Production-Ready: Professional logging, type hints, and documentation
See QUICKSTART.md for a 5-minute setup guide.
# Install
pip install -r requirements.txt
# Run
python src/most_active_cookie.py -f test_inputs/cookie_log.csv -d 2018-12-09
# Output
AtY0laUfhglK3lC7
Contents:
- Installation
- Usage
- Algorithm & Design
- Testing
- Project Structure
- Code Quality
- Performance
- API Documentation
- Contributing
- License
- Python 3.9 or higher
- pip (Python package installer)
# Clone the repository
git clone https://github.com/shantanu747/Cookie-Analyzer.git
cd Cookie-Analyzer
# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
Dependencies:
- pytest (8.3.3): Testing framework
- pytest-cov (5.0.0): Test coverage reporting
- ruff (0.8.4): Fast Python linter and formatter
- mypy (1.13.0): Static type checker
python src/most_active_cookie.py -f <log_file> -d <date>
Arguments:
- -f, --file FILENAME: Path to the cookie log CSV file (required)
- -d, --date YYYY-MM-DD: Target date in YYYY-MM-DD format (required)
- -v, --verbose: Enable verbose logging (optional)
- --version: Show program version
- --help: Show help message
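For readers who want to see how flags like these are typically wired up, here is a minimal argparse sketch. It is an illustration, not the project's code: the function signature, the help strings, and the `0.0.0` version placeholder are all assumptions.

```python
import argparse
from pathlib import Path


def parse_arguments(argv=None) -> argparse.Namespace:
    """Parse the flags listed above (illustrative sketch, not the real parser)."""
    parser = argparse.ArgumentParser(
        prog="most_active_cookie",
        description="Find the most active cookie(s) for a given day.",
    )
    parser.add_argument("-f", "--file", type=Path, required=True,
                        metavar="FILENAME", help="Path to the cookie log CSV file")
    parser.add_argument("-d", "--date", required=True,
                        metavar="YYYY-MM-DD", help="Target date in YYYY-MM-DD format")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Enable verbose logging")
    # Placeholder version string: the real version is not stated in this README.
    parser.add_argument("--version", action="version", version="%(prog)s 0.0.0")
    return parser.parse_args(argv)


args = parse_arguments(["-f", "test_inputs/cookie_log.csv", "-d", "2018-12-09"])
print(args.file, args.date, args.verbose)
```

argparse adds `--help` automatically, so it needs no explicit declaration.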
python src/most_active_cookie.py -f test_inputs/cookie_log.csv -d 2018-12-09
Output:
AtY0laUfhglK3lC7
python src/most_active_cookie.py -f test_inputs/cookie_log.csv -d 2018-12-08
Output (order may vary):
SAZuXPGUrfbcn5UA
4sMM2LxV07bPJzwf
fbcn5UAVanZf6UtG
python src/most_active_cookie.py -f test_inputs/cookie_log.csv -d 2018-12-09 -v
With -v, detailed debug information (parsing details and decision points) is written to the log file.
python src/most_active_cookie.py -f test_inputs/cookie_log.csv -d 2018-12-01
Output: (empty - no cookies found for that date)
The input CSV file must have the following format:
cookie,timestamp
AtY0laUfhglK3lC7,2018-12-09T14:19:00+00:00
SAZuXPGUrfbcn5UA,2018-12-09T10:13:00+00:00
5UAVanZf6UtGyKVS,2018-12-09T07:25:00+00:00
Requirements:
- Header row: cookie,timestamp
- Timestamps in ISO 8601 format with timezone: YYYY-MM-DDTHH:MM:SS+00:00
- Sorted by timestamp (newest first) for optimal performance
- UTF-8 encoding
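As a quick illustration of the timestamp format, the sketch below splits one row from the sample above and extracts its date with the standard library. The helper name `extract_date` is hypothetical and is not the project's `_extract_date()`; the date is taken as written, with no timezone conversion.

```python
from datetime import date, datetime


def extract_date(timestamp: str) -> date:
    """Return the calendar date of an ISO 8601 timestamp such as
    2018-12-09T14:19:00+00:00 (hypothetical helper for illustration)."""
    return datetime.fromisoformat(timestamp).date()


line = "AtY0laUfhglK3lC7,2018-12-09T14:19:00+00:00"
cookie, timestamp = line.strip().split(",")
print(cookie, extract_date(timestamp))  # AtY0laUfhglK3lC7 2018-12-09
```

`datetime.fromisoformat` raises `ValueError` on a malformed timestamp, which gives a natural hook for skipping or reporting bad lines.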
The tool uses a single-pass counting algorithm with early-exit optimization:
- Validation: Verify date format and file existence
- Streaming Parse: Read file line-by-line (memory efficient)
- Date Filtering: Extract date from timestamp and filter matches
- Counting: Use Counter to track cookie occurrences
- Early Exit: Stop scanning once we pass the target date (assumes sorted data)
- Tie Handling: Return all cookies with maximum count
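The steps above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the code in src/most_active_cookie.py: the function name `most_active` is hypothetical, rows are assumed to be sorted newest first, and the date is read from the first ten characters of the timestamp rather than fully parsed. The last two sample rows are made up for the example.

```python
from collections import Counter
from typing import Iterable, List


def most_active(lines: Iterable[str], target_date: str) -> List[str]:
    """Single pass over 'cookie,timestamp' rows sorted newest first."""
    counts: Counter = Counter()
    for line in lines:
        row = line.strip()
        if not row or row == "cookie,timestamp":
            continue  # skip blank lines and the header row
        cookie, _, timestamp = row.partition(",")
        if not timestamp:
            continue  # malformed line without a comma
        day = timestamp[:10]  # YYYY-MM-DD prefix of the ISO 8601 timestamp
        if day == target_date:
            counts[cookie] += 1
        elif day < target_date:
            break  # early exit: every remaining row is older than the target
    if not counts:
        return []
    top = max(counts.values())
    return [cookie for cookie, n in counts.items() if n == top]


rows = [
    "cookie,timestamp",
    "AtY0laUfhglK3lC7,2018-12-09T14:19:00+00:00",
    "SAZuXPGUrfbcn5UA,2018-12-09T10:13:00+00:00",
    "5UAVanZf6UtGyKVS,2018-12-09T07:25:00+00:00",
    "AtY0laUfhglK3lC7,2018-12-09T06:19:00+00:00",  # illustrative row
    "SAZuXPGUrfbcn5UA,2018-12-08T22:03:00+00:00",  # illustrative row
]
print(most_active(rows, "2018-12-09"))  # ['AtY0laUfhglK3lC7']
```

Comparing the date strings directly works because YYYY-MM-DD sorts lexicographically in chronological order.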
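Complexity:
- Best Case: O(k), where k = number of entries on the target date. This holds when the target date is the newest in the file, so the scan stops at the first older entry (early exit)
- Worst Case: O(n), where n = total entries (target date is at the end of the file or older than every entry)
- Space: O(u), where u = unique cookies on the target date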
Why Counter:
- O(1) insertions and lookups
- Clean, Pythonic API
- Built-in max/most_common operations
Why streaming:
- Memory efficient for large files
- No need to load entire file into memory
- Streaming approach scales to any file size
Why early exit:
- Exploits sorted nature of logs (newest first)
- Dramatically improves performance for recent dates
- Falls back gracefully if not sorted
Why pathlib paths:
- Cross-platform path handling
- Object-oriented interface
- Better type safety
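To make the Counter points above concrete, here is a small standalone example with made-up cookie IDs. It shows why tie handling needs a comparison against the maximum count rather than `most_common(1)` alone; how the project itself selects the winners is not shown in this README.

```python
from collections import Counter

visits = Counter(["a", "b", "a", "c", "b"])  # illustrative cookie IDs

# most_common(1) returns a single (cookie, count) pair, so a tie is hidden.
print(visits.most_common(1))  # [('a', 2)]

# Comparing every count against the maximum keeps all tied cookies.
top = max(visits.values())
tied = [cookie for cookie, n in visits.items() if n == top]
print(tied)  # ['a', 'b']
```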
┌─────────────────────────────────────────┐
│         Command Line Interface          │
│    (parse_arguments, main function)     │
└────────────────┬────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────┐
│         CookieLogAnalyzer Class         │
│  ┌───────────────────────────────────┐  │
│  │  find_most_active_cookies()       │  │
│  │  - Validates date format          │  │
│  │  - Opens file in streaming mode   │  │
│  │  - Parses and counts cookies      │  │
│  │  - Returns results                │  │
│  └───────────────────────────────────┘  │
│  ┌───────────────────────────────────┐  │
│  │  _parse_line()                    │  │
│  │  - Splits CSV line                │  │
│  │  - Validates format               │  │
│  └───────────────────────────────────┘  │
│  ┌───────────────────────────────────┐  │
│  │  _extract_date()                  │  │
│  │  - Parses ISO 8601 timestamp      │  │
│  │  - Extracts date component        │  │
│  └───────────────────────────────────┘  │
└─────────────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────┐
│             Logging System              │
│  - File logging (all levels)            │
│  - stderr handler (errors only)         │
│  - Structured format with timestamps    │
└─────────────────────────────────────────┘
The project includes a comprehensive test suite with 21 tests covering:
- Happy path scenarios
- Edge cases (ties, empty files, invalid data)
- Error handling (missing files, invalid dates)
- Performance considerations
- Command-line interface
# Run all tests
pytest
# Verbose output
pytest -v
# With coverage report
pytest --cov=src --cov-report=html
# Run specific test class
pytest tests/test_most_active_cookie.py::TestCookieLogAnalyzer -v
# Run specific test
pytest tests/test_most_active_cookie.py::TestCookieLogAnalyzer::test_single_most_active_cookie -v
Current coverage: 90%
Name                        Stmts   Miss  Cover
-----------------------------------------------
src/__init__.py                 0      0   100%
src/most_active_cookie.py     117     12    90%
-----------------------------------------------
TOTAL                         117     12    90%
View detailed coverage report:
pytest --cov=src --cov-report=html
open htmlcov/index.html
The project includes multiple test files for different scenarios:
| File | Rows | Purpose | Expected Result |
|---|---|---|---|
| cookie_log.csv | 9 | Original sample | Tests basic functionality |
| cookie_log_1.csv | 150 | Edge case testing | 3 cookies tied at 15 visits each |
| cookie_log_2.csv | 1,000 | Performance test | Single day, full file scan |
| cookie_log_3.csv | 10,000 | Scale test | 30 days, early-exit optimization |
Cookie-Analyzer/
├── src/
│   ├── __init__.py                 # Package initialization
│   ├── most_active_cookie.py       # Main application
│   └── most_active_cookie Logs/    # Log files (auto-created)
│       └── YYYYMMDD_HHMMSS.txt     # Timestamped log files
├── tests/
│   ├── __init__.py                 # Test package initialization
│   └── test_most_active_cookie.py  # Comprehensive test suite
├── test_inputs/
│   ├── cookie_log.csv              # Original sample (9 rows)
│   ├── cookie_log_1.csv            # Edge case test (150 rows)
│   ├── cookie_log_2.csv            # Performance test (1,000 rows)
│   └── cookie_log_3.csv            # Scale test (10,000 rows)
├── requirements.txt                # Python dependencies
├── pyproject.toml                  # Project configuration
├── QUICKSTART.md                   # Quick start guide
├── README.md                       # This file
└── .gitignore                      # Git ignore patterns
The codebase uses Python type hints throughout for better IDE support and static analysis:
from typing import List

def find_most_active_cookies(self, target_date: str) -> List[str]:
    """Find the most active cookie(s) for a specific date."""
    ...

# Check code quality
ruff check .
# Auto-format code
ruff format .
# Type checking
mypy src/most_active_cookie.py
- Docstrings: Google-style docstrings for all public functions
- Type Hints: Full type coverage for function signatures
- Error Handling: Specific exception types with descriptive messages
- Logging: Structured logging at appropriate levels (DEBUG, INFO, ERROR)
- Comments: Inline comments explaining complex logic
Tests conducted on M3 Mac:
| File Size | Records | Date Position | Time | Memory |
|---|---|---|---|---|
| 1 KB | 9 | Middle | <1 ms | <5 MB |
| 20 KB | 150 | Middle | <5 ms | <5 MB |
| 150 KB | 1,000 | All same day | ~15 ms | <10 MB |
| 1.5 MB | 10,000 | Early (day 15/30) | ~50 ms | <15 MB |
| 1.5 MB | 10,000 | Late (day 1/30) | ~80 ms | <15 MB |
Strengths:
- Memory efficient (streaming, doesn't load entire file)
- Early-exit optimization for sorted logs
- O(1) lookups via Counter
- Single-pass algorithm
Scalability:
- Handles 10,000+ records easily
- Linear time complexity O(n) worst case
- Memory grows linearly with the number of unique cookies on the target date, O(u)
- Suitable for log files up to several GB
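Measurements like the ones in the table could be approximated with a small harness such as the sketch below. Everything in it is an assumption made for illustration: the synthetic log generator, the cookie names, and the helper names `write_log` and `count_day` are not part of the project. `tracemalloc` reports Python-level allocations only, so its peak will not match the Memory column, and timings depend on the machine.

```python
import tempfile
import time
import tracemalloc
from collections import Counter
from datetime import datetime, timedelta, timezone
from pathlib import Path


def write_log(path: Path, rows: int, days: int = 30) -> None:
    """Write a synthetic log, newest first, spread evenly over `days` days."""
    newest = datetime(2018, 12, 30, 23, 59, tzinfo=timezone.utc)
    step = timedelta(days=days) / rows
    with path.open("w", encoding="utf-8") as fh:
        fh.write("cookie,timestamp\n")
        for i in range(rows):
            stamp = (newest - i * step).isoformat(timespec="seconds")
            fh.write(f"cookie{i % 50:04d},{stamp}\n")


def count_day(path: Path, target: str) -> Counter:
    """Stream the file and count cookies on `target`, stopping at older rows."""
    counts: Counter = Counter()
    with path.open(encoding="utf-8") as fh:
        next(fh)  # skip the header row
        for line in fh:
            cookie, timestamp = line.rstrip("\n").split(",")
            day = timestamp[:10]
            if day == target:
                counts[cookie] += 1
            elif day < target:
                break  # early exit on sorted input
    return counts


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "synthetic.csv"
    write_log(log, rows=10_000)
    tracemalloc.start()
    start = time.perf_counter()
    counts = count_day(log, "2018-12-15")
    elapsed_ms = (time.perf_counter() - start) * 1000
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()

print(f"{sum(counts.values())} rows matched in {elapsed_ms:.1f} ms, "
      f"peak {peak / 1024:.0f} KiB traced")
```

Changing the target date to one nearer the top or bottom of the file shows the effect of the early exit on scan time.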
The main analyzer class for processing cookie log files.
CookieLogAnalyzer(log_file: Path)
Parameters:
log_file (Path): Path to the cookie log CSV file
Raises:
FileNotFoundError: If log file doesn't exist
Example:
from pathlib import Path
from src.most_active_cookie import CookieLogAnalyzer
analyzer = CookieLogAnalyzer(Path("test_inputs/cookie_log.csv"))
find_most_active_cookies(target_date: str) -> List[str]
Find the most active cookie(s) for a specific date.
Parameters:
target_date (str): Date string in YYYY-MM-DD format (UTC)
Returns:
List[str]: List of cookie IDs with maximum visits. Empty list if no cookies found.
Raises:
ValueError: If date format is invalid
IOError: If file cannot be read
Example:
cookies = analyzer.find_most_active_cookies("2018-12-09")
print(cookies)  # ['AtY0laUfhglK3lC7']
main() -> int
Main entry point for the command-line tool.
Returns:
int: Exit code (0 for success, 1 for error)
Example:
import sys
from src.most_active_cookie import main
sys.argv = ['prog', '-f', 'cookie_log.csv', '-d', '2018-12-09']
exit_code = main()
The application provides comprehensive logging for debugging and audit trails.
- File Logs: src/most_active_cookie Logs/YYYYMMDD_HHMMSS.txt
- Error Logs: stderr (for ERROR and CRITICAL levels)
| Level | Where | When |
|---|---|---|
| DEBUG | File only | Verbose mode (-v flag) - detailed parsing info |
| INFO | File only | Normal operations, results |
| WARNING | File only | Recoverable issues (malformed lines) |
| ERROR | File + stderr | Errors (file not found, invalid date) |
| CRITICAL | File + stderr | Unexpected exceptions |
2025-10-18 13:13:19,007 - __main__ - ERROR - Log file not found: missing.csv
Format: timestamp - logger_name - level - message
2025-10-18 12:47:49,884 - __main__ - INFO - Initialized analyzer with file: cookie_log.csv
2025-10-18 12:47:49,884 - __main__ - DEBUG - cookie_id: AtY0laUfhglK3lC7, parsed_date: 2018-12-09
2025-10-18 12:47:49,884 - __main__ - INFO - Found 1 cookie(s) with 2 occurrence(s) on 2018-12-09
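A logging setup matching the table and format above might look like the following sketch. It is an assumption about how such a configuration could be built with the standard logging module, not the project's actual setup: the function name `build_logger`, the logger name, and the temporary log path are hypothetical.

```python
import logging
import sys
import tempfile
from pathlib import Path


def build_logger(log_path: Path, verbose: bool = False) -> logging.Logger:
    """File handler for all levels, stderr handler for ERROR and above."""
    logger = logging.getLogger("cookie_analyzer_sketch")  # hypothetical name
    logger.setLevel(logging.DEBUG if verbose else logging.INFO)
    logger.handlers.clear()
    fmt = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")

    file_handler = logging.FileHandler(log_path, encoding="utf-8")
    file_handler.setLevel(logging.DEBUG)  # the file receives everything
    file_handler.setFormatter(fmt)

    stderr_handler = logging.StreamHandler(sys.stderr)
    stderr_handler.setLevel(logging.ERROR)  # console sees ERROR and CRITICAL
    stderr_handler.setFormatter(fmt)

    logger.addHandler(file_handler)
    logger.addHandler(stderr_handler)
    return logger


with tempfile.TemporaryDirectory() as tmp:
    log_file = Path(tmp) / "run.txt"
    logger = build_logger(log_file, verbose=True)
    logger.info("Initialized analyzer with file: cookie_log.csv")  # file only
    logger.error("Log file not found: missing.csv")  # file and stderr
    for handler in logger.handlers:
        handler.close()  # release the file before the directory is removed
    contents = log_file.read_text(encoding="utf-8")

print(contents)
```

Setting the level on each handler, rather than only on the logger, is what lets one logger feed the file and stderr with different thresholds.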
# Clone and setup
git clone https://github.com/shantanu747/Cookie-Analyzer.git
cd Cookie-Analyzer
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Run tests
pytest -v
# Check code quality
ruff check .
mypy src/most_active_cookie.py
- Follow PEP 8 style guide
- Add type hints to all functions
- Write docstrings for all public APIs
- Maintain test coverage above 85%
- Add tests for new features
- Update documentation
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes
- Run tests and quality checks
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
MIT License - See LICENSE file for details.
Shantanu Patil
- GitHub: @shantanu747
- Python community for excellent tooling (pytest, ruff, mypy)
- Contributors and users of this project
Need help? See QUICKSTART.md or open an issue.