JSONL Analyzer

A Python tool for analyzing the structure of JSON Lines (JSONL) files. It is particularly useful for understanding log files, streaming data, and structured datasets.

Features

  • Line-by-Line Parsing: Efficiently processes large JSONL files without loading everything into memory
  • Field Frequency Analysis: Tracks occurrence of fields across all JSON objects
  • Temporal Analysis: Extracts and analyzes timestamp data and calculates durations
  • Type Inference: Automatically detects and tracks data types for each field
  • Nested Structure Support: Analyzes complex nested JSON structures
  • Error Detection: Identifies and reports malformed JSON lines with line numbers
  • Value Distribution: Tracks common values for key fields
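As a rough illustration of the line-by-line parsing and error detection listed above, here is a minimal sketch using only the standard library. The helper name `iter_jsonl` and the sample data are hypothetical and are not taken from `analyze_jsonl.py`; the actual implementation may differ.

```python
import io
import json


def iter_jsonl(lines):
    """Yield (line_number, obj, error) for each non-blank line.

    Hypothetical helper: reads one line at a time, so the whole
    file never has to be held in memory.
    """
    for line_number, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text:
            continue  # skip blank lines
        try:
            yield line_number, json.loads(text), None
        except json.JSONDecodeError as exc:
            # Report the malformed line instead of aborting the run
            yield line_number, None, str(exc)


# Illustrative input: the second line is deliberately malformed
sample = io.StringIO('{"type": "user"}\n{"type": \n{"type": "system"}\n')

valid = []
error_lines = []
for line_number, obj, error in iter_jsonl(sample):
    if error is None:
        valid.append(obj)
    else:
        error_lines.append(line_number)

print(f"Valid JSON lines: {len(valid)}")
print(f"Error lines: {error_lines}")
```

Passing an open file object instead of the `io.StringIO` sample should work the same way, since both are iterated one line at a time.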

Installation

# Clone the repository
git clone https://github.com/ai-psa/jsonl-analyzer.git
cd jsonl-analyzer

# No dependencies required - uses Python standard library only!
# Works with Python 3.6+

Usage

Command Line

python analyze_jsonl.py <jsonl_file>

# Example
python analyze_jsonl.py session_data.jsonl

Sample Output

============================================================
JSONL FILE STRUCTURE ANALYSIS
============================================================

📊 SUMMARY
----------------------------------------
Total Lines: 71
Valid Json Lines: 71
Error Lines: 0
Unique Fields: 93

📨 MESSAGE TYPES
----------------------------------------
user: 35
assistant: 35
system: 1

🔍 MOST COMMON FIELDS (Top 10)
----------------------------------------
parentUuid: 71 occurrences (str, NoneType)
sessionId: 71 occurrences (str)
message.role: 71 occurrences (str)
message.content: 71 occurrences (str, list)
createdAt: 71 occurrences (str)
updatedAt: 71 occurrences (str)

⏱️  TEMPORAL ANALYSIS
----------------------------------------
First event: 2025-07-10T14:47:33.326000+00:00
Last event: 2025-07-10T15:00:44.015000+00:00
Duration: 13m 10s
Total events: 71

Use Cases

  • Log Analysis: Analyze application logs, server logs, or any structured log files
  • Data Pipeline Debugging: Understand the structure of data flowing through your pipelines
  • ML Dataset Inspection: Quickly inspect machine learning datasets in JSONL format
  • API Response Analysis: Analyze collected API responses
  • Session Analysis: Analyze conversation logs or session data

Advanced Features

Field Type Tracking

The analyzer tracks all data types encountered for each field:

message.content: (str, list)  # Field contains both strings and lists
metadata.tokens: (int)        # Field only contains integers
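One way this kind of tracking could work is to keep a set of type names per field path. The sketch below assumes two-level records invented for illustration; it is not the analyzer's own code.

```python
from collections import defaultdict

# Illustrative records (assumed data, not from a real session file)
records = [
    {"message": {"content": "Hello"}, "metadata": {"tokens": 3}},
    {"message": {"content": [{"type": "text"}]}, "metadata": {"tokens": 7}},
]

# Map each dotted field path to the set of type names seen for it
field_types = defaultdict(set)
for record in records:
    for section, fields in record.items():
        for name, value in fields.items():
            field_types[f"{section}.{name}"].add(type(value).__name__)

for path in sorted(field_types):
    type_names = ", ".join(sorted(field_types[path]))
    print(f"{path}: ({type_names})")
```

Using a set means each type is recorded once per field no matter how often it occurs, which matches the compact `(str, list)` style of the report.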

Nested Structure Analysis

Automatically discovers and reports nested object structures:

response.data: ['id', 'name', 'tags', 'metadata']
response.data.metadata: ['created', 'updated', 'version']
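A recursive walk is one plausible way to produce dotted paths like the ones above. The function name `nested_keys` and the sample record are assumptions made for this sketch, and the analyzer itself may handle lists or other cases differently.

```python
def nested_keys(obj, prefix=""):
    """Map the dotted path of every nested dict to the list of its keys."""
    structure = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            structure[path] = list(value)
            # Recurse so deeper levels get their own dotted path
            structure.update(nested_keys(value, path))
    return structure


# Illustrative record shaped like the example above
record = {
    "response": {
        "data": {
            "id": 1,
            "name": "example",
            "tags": [],
            "metadata": {"created": "", "updated": "", "version": 1},
        }
    }
}

result = nested_keys(record)
for path, keys in result.items():
    print(f"{path}: {keys}")
```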

Common Value Distribution

For key fields such as type, role, and status, the analyzer tracks value distributions:

type:
  - user: 35
  - assistant: 35
  - system: 1
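A distribution like this can be approximated with `collections.Counter`. In the sketch below, the `KEY_FIELDS` tuple and the records are assumptions for illustration. Which fields the analyzer actually treats as key fields is not specified beyond the examples above.

```python
from collections import Counter, defaultdict

# Assumed set of key fields, based on the examples named above
KEY_FIELDS = ("type", "role", "status")

# Illustrative records (assumed data)
records = [
    {"type": "user", "status": "ok"},
    {"type": "assistant", "status": "ok"},
    {"type": "user"},
    {"type": "system", "status": "done"},
]

# One Counter per key field, counting how often each value appears
distributions = defaultdict(Counter)
for record in records:
    for field in KEY_FIELDS:
        value = record.get(field)
        if isinstance(value, str):  # only count simple string values
            distributions[field][value] += 1

for field, counts in distributions.items():
    print(f"{field}:")
    for value, count in counts.most_common():
        print(f"  - {value}: {count}")
```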

Output Files

The analyzer creates two outputs:

  1. Console Report: Human-readable analysis printed to console
  2. JSON Report: Detailed analysis saved as <filename>_analysis.json

Example JSONL Input

{"type": "user", "message": {"role": "user", "content": "Hello"}, "timestamp": "2025-07-10T12:00:00Z"}
{"type": "assistant", "message": {"role": "assistant", "content": "Hi there!"}, "timestamp": "2025-07-10T12:00:01Z"}
{"type": "system", "event": "session_end", "timestamp": "2025-07-10T12:00:05Z"}
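To illustrate the kind of temporal analysis described earlier, the sketch below computes the first event, last event, and duration for the three example lines above. It assumes the timestamp lives in a top-level `timestamp` field in the `YYYY-MM-DDTHH:MM:SSZ` form shown. The `parse_timestamp` helper is hypothetical, and the analyzer may recognize other field names and formats.

```python
import json
from datetime import datetime, timezone

# The three example lines from above
lines = [
    '{"type": "user", "message": {"role": "user", "content": "Hello"}, "timestamp": "2025-07-10T12:00:00Z"}',
    '{"type": "assistant", "message": {"role": "assistant", "content": "Hi there!"}, "timestamp": "2025-07-10T12:00:01Z"}',
    '{"type": "system", "event": "session_end", "timestamp": "2025-07-10T12:00:05Z"}',
]


def parse_timestamp(text):
    """Parse a UTC timestamp such as 2025-07-10T12:00:00Z (hypothetical helper)."""
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


timestamps = sorted(parse_timestamp(json.loads(line)["timestamp"]) for line in lines)
first_event = timestamps[0]
last_event = timestamps[-1]
duration = last_event - first_event

print(f"First event: {first_event.isoformat()}")
print(f"Last event: {last_event.isoformat()}")
print(f"Duration: {int(duration.total_seconds())}s")
print(f"Total events: {len(timestamps)}")
```

`strptime` is used here rather than `datetime.fromisoformat` because the latter is not available on Python 3.6 and only accepts a trailing `Z` from Python 3.11 onward.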

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT License. See the LICENSE file for details.

Author

Created by AI-PSA - Part of the AI-PSA open source tools collection


For more tools and experiments, visit ai-psa.com
