A Python tool for analyzing the structure of JSON Lines (JSONL) files. It is particularly useful for understanding log files, streaming data, and structured datasets.
- Line-by-Line Parsing: Efficiently processes large JSONL files without loading everything into memory
- Field Frequency Analysis: Tracks occurrence of fields across all JSON objects
- Temporal Analysis: Extracts and analyzes timestamp data, calculates durations
- Type Inference: Automatically detects and tracks data types for each field
- Nested Structure Support: Analyzes complex nested JSON structures
- Error Detection: Identifies and reports malformed JSON lines with line numbers
- Value Distribution: Tracks common values for key fields
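The first few features can be illustrated with a minimal sketch. This is not the analyzer's actual code: `scan_jsonl` is a hypothetical helper that shows one way to parse line by line, count top-level fields, and record malformed lines with their line numbers, using only the standard library.

```python
import json
from collections import Counter


def scan_jsonl(lines):
    """Count top-level fields and collect malformed lines.

    `lines` is any iterable of strings (hypothetical helper, for illustration).
    """
    field_counts = Counter()
    errors = []  # list of (line_number, error message)
    valid = 0
    for line_number, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text:
            continue  # skip blank lines
        try:
            record = json.loads(text)
        except json.JSONDecodeError as exc:
            errors.append((line_number, exc.msg))
            continue
        valid += 1
        if isinstance(record, dict):
            field_counts.update(record.keys())
    return valid, field_counts, errors


sample = [
    '{"type": "user", "timestamp": "2025-07-10T12:00:00Z"}',
    '{"type": "assistant"',  # malformed: missing closing brace
    '{"type": "system", "event": "session_end"}',
]
valid, field_counts, errors = scan_jsonl(sample)
print(valid, dict(field_counts), errors)
```

Passing an open file object instead of a list iterates over it one line at a time, so the whole file never has to sit in memory.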
# Clone the repository
git clone https://github.com/ai-psa/jsonl-analyzer.git
cd jsonl-analyzer
# No dependencies required - uses Python standard library only!
# Works with Python 3.6+
python analyze_jsonl.py <jsonl_file>
# Example
python analyze_jsonl.py session_data.jsonl
============================================================
JSONL FILE STRUCTURE ANALYSIS
============================================================
📊 SUMMARY
----------------------------------------
Total Lines: 71
Valid Json Lines: 71
Error Lines: 0
Unique Fields: 93
📨 MESSAGE TYPES
----------------------------------------
user: 35
assistant: 35
system: 1
🔍 MOST COMMON FIELDS (Top 10)
----------------------------------------
parentUuid: 71 occurrences (str, NoneType)
sessionId: 71 occurrences (str)
message.role: 71 occurrences (str)
message.content: 71 occurrences (str, list)
createdAt: 71 occurrences (str)
updatedAt: 71 occurrences (str)
⏱️ TEMPORAL ANALYSIS
----------------------------------------
First event: 2025-07-10T14:47:33.326000+00:00
Last event: 2025-07-10T15:00:44.015000+00:00
Duration: 13m 10s
Total events: 71
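The temporal figures in a report like this one can be derived from ISO 8601 timestamps. The sketch below is an assumption about how that might be done, not the tool's own implementation: `parse_timestamp` and `format_duration` are hypothetical names, and `datetime.fromisoformat` requires Python 3.7 or later.

```python
from datetime import datetime


def parse_timestamp(value):
    """Parse an ISO 8601 string, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def format_duration(seconds):
    """Render a span of seconds as 'Xm Ys', or 'Xh Ym Zs' for longer spans."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    if hours:
        return f"{hours}h {minutes}m {secs}s"
    return f"{minutes}m {secs}s"


timestamps = ["2025-07-10T14:47:33.326Z", "2025-07-10T15:00:44.015Z"]
parsed = sorted(parse_timestamp(t) for t in timestamps)
duration = (parsed[-1] - parsed[0]).total_seconds()

print("First event:", parsed[0].isoformat())
print("Last event:", parsed[-1].isoformat())
print("Duration:", format_duration(duration))
```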
- Log Analysis: Analyze application logs, server logs, or any structured log files
- Data Pipeline Debugging: Understand the structure of data flowing through your pipelines
- ML Dataset Inspection: Quickly inspect machine learning datasets in JSONL format
- API Response Analysis: Analyze collected API responses
- Session Analysis: Analyze conversation logs or session data
The analyzer tracks all data types encountered for each field:
message.content: (str, list) # Field contains both strings and lists
metadata.tokens: (int) # Field only contains integers
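As a rough illustration of this kind of type tracking, the sketch below records the Python type name of every value seen for each top-level field. `collect_types` is a hypothetical helper written for this example; the analyzer's internals may differ.

```python
from collections import defaultdict


def collect_types(records):
    """Map each top-level field to the sorted type names seen for it."""
    types_by_field = defaultdict(set)
    for record in records:
        for field, value in record.items():
            types_by_field[field].add(type(value).__name__)
    return {field: sorted(names) for field, names in types_by_field.items()}


records = [
    {"content": "Hello", "tokens": 3},
    {"content": ["Hi", "there"], "tokens": 5},
    {"content": "Bye", "parentUuid": None},
]
result = collect_types(records)
for field, names in result.items():
    print(f"{field}: ({', '.join(names)})")
```

A JSON `null` decodes to Python's `None`, which is why a field can report `NoneType` alongside `str`.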
Automatically discovers and reports nested object structures:
response.data: ['id', 'name', 'tags', 'metadata']
response.data.metadata: ['created', 'updated', 'version']
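One way to produce dotted paths like these is a small recursive walk over each decoded object. This is a sketch under that assumption; `nested_keys` is a hypothetical name, and it only descends into dictionaries, not into lists.

```python
def nested_keys(obj, prefix=""):
    """Map the dotted path of every nested dict to the keys found under it."""
    found = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            found[path] = list(value.keys())
            found.update(nested_keys(value, path))
    return found


record = {
    "response": {
        "data": {
            "id": 1,
            "name": "example",
            "tags": [],
            "metadata": {"created": "2025-07-10", "updated": "2025-07-11", "version": 2},
        }
    }
}
structure = nested_keys(record)
for path, keys in structure.items():
    print(f"{path}: {keys}")
```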
For key fields such as type, role, and status, the analyzer tracks value distributions:
type:
- user: 35
- assistant: 35
- system: 1
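A distribution like this can be built with `collections.Counter`. The sketch below is illustrative only: `KEY_FIELDS` and `value_distribution` are assumed names, and the set of fields the analyzer actually treats as key fields may differ.

```python
from collections import Counter

# Assumed set of key fields, taken from the examples in this README.
KEY_FIELDS = ("type", "role", "status")


def value_distribution(records, key_fields=KEY_FIELDS):
    """Count how often each string value appears for the chosen fields."""
    distribution = {field: Counter() for field in key_fields}
    for record in records:
        for field in key_fields:
            value = record.get(field)
            if isinstance(value, str):
                distribution[field][value] += 1
    return distribution


records = [
    {"type": "user"},
    {"type": "assistant"},
    {"type": "user"},
    {"type": "system", "status": "ok"},
]
distribution = value_distribution(records)
for value, count in distribution["type"].most_common():
    print(f"- {value}: {count}")
```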
The analyzer creates two outputs:
- Console Report: Human-readable analysis printed to console
- JSON Report: Detailed analysis saved as <filename>_analysis.json
{"type": "user", "message": {"role": "user", "content": "Hello"}, "timestamp": "2025-07-10T12:00:00Z"}
{"type": "assistant", "message": {"role": "assistant", "content": "Hi there!"}, "timestamp": "2025-07-10T12:00:01Z"}
{"type": "system", "event": "session_end", "timestamp": "2025-07-10T12:00:05Z"}
Contributions are welcome! Please feel free to submit a Pull Request.
MIT License - see LICENSE file for details
Created by AI-PSA - Part of the AI-PSA open source tools collection
For more tools and experiments, visit ai-psa.com