| title | SPL Toolkit |
|---|---|
| layout | page |
| description | A robust, language-aware library for programmatic analysis and manipulation of Splunk SPL queries |

A robust, language-aware library for programmatic analysis and manipulation of Splunk SPL queries, written in Go with Python bindings.
- Field Mapping: Dynamic mapping of query fields from one schema to another using JSON configuration
- SPL Parsing: Robust SPL query parsing using ANTLR4 grammar with AST-based processing
- Discovery: Extract datamodels, datasets, lookups, sources, sourcetypes, and input fields from queries
- Token Stream Rewriting: Context-aware field mapping that preserves SPL syntax and semantics
- Python Bindings: Full Python API with C shared library integration
- Conditional Mapping: Basic rule-based field mappings with conditions
- Advanced Conditional Rules: Enhanced rule-based field mappings with complex conditions
- DataModel Mapping: Map between different datamodel structures (basic support available)
- Query Translation: Convert between raw searches and datamodel/tstats queries (planned)
- Index ↔ DataModel: Translate queries between index-based and datamodel-based approaches (planned)
- Auto-mapping: Generate mapping tables from two log representations of the same data (planned)
- Template-based: Auto-generate mappings from Splunk event templates (planned)

Using the Go library:

```go
package main

import (
	"fmt"

	"github.com/delgado-jacob/spl-toolkit/pkg/mapper"
)

func main() {
	// Create a new mapper
	m := mapper.New()

	// Load basic field mappings
	mappingsJSON := `[
		{"source": "src_ip", "target": "source_ip"},
		{"source": "dst_ip", "target": "destination_ip"}
	]`
	m.LoadMappings([]byte(mappingsJSON))

	// Map a query
	query := "search src_ip=192.168.1.1 dst_port=80"
	mappedQuery, err := m.MapQuery(query)
	if err != nil {
		panic(err)
	}
	fmt.Printf("Original: %s\n", query)
	fmt.Printf("Mapped: %s\n", mappedQuery)

	// Discover query information
	info, err := m.DiscoverQuery(query)
	if err != nil {
		panic(err)
	}
	fmt.Printf("Input fields: %v\n", info.InputFields)
}
```

Using the Python bindings:

```python
from spl_toolkit import SPLMapper

# Create mapper with configuration
config = {
    "version": "1.0",
    "mappings": [
        {"source": "src_ip", "target": "source_ip"},
        {"source": "dst_ip", "target": "destination_ip"}
    ],
    "rules": [
        {
            "id": "apache_logs",
            "conditions": [
                {"type": "sourcetype", "operator": "equals", "value": "access_combined"}
            ],
            "mappings": [
                {"source": "clientip", "target": "source_address"}
            ],
            "enabled": True
        }
    ]
}

mapper = SPLMapper(config=config)

# Map a query with context
query = "search sourcetype=access_combined clientip=192.168.1.1"
context = {"sourcetype": "access_combined"}
mapped = mapper.map_query_with_context(query, context)
print(f"Mapped: {mapped}")

# Discover query information
info = mapper.discover_query(query)
print(f"Source types: {info.source_types}")
print(f"Input fields: {info.input_fields}")
```

Install from source:

```bash
# Clone repository
git clone https://github.com/delgado-jacob/spl-toolkit.git
cd spl-toolkit

# Set up development environment
make dev-setup

# Build Go library
make build

# Build Python bindings
make python-build

# Install Python package locally
make python-install

# Run tests to verify installation
make dev-test
```

Or fetch the Go module:

```bash
go get github.com/delgado-jacob/spl-toolkit
```

Build and run with Docker:

```bash
# Build Docker image
make docker-build

# Run with Docker, mounting the current directory as /workspace
docker run --rm -v "$(pwd)":/workspace -w /workspace spl-toolkit:0.1.1 --help
```

Requirements:

- Go 1.22+
- Python 3.8+ (for Python bindings)
- Make
- Git

A basic mapping configuration:

```json
{
  "version": "0.1.1",
  "name": "Basic Field Mappings",
  "mappings": [
    {"source": "src_ip", "target": "source_ip"},
    {"source": "dst_ip", "target": "destination_ip"},
    {"source": "src_port", "target": "source_port"}
  ]
}
```

A configuration with conditional rules:

```json
{
  "version": "0.1.1",
  "name": "Web Server Logs",
  "mappings": [
    {"source": "ip", "target": "client_ip"}
  ],
  "rules": [
    {
      "id": "apache_combined",
      "name": "Apache Combined Log Format",
      "conditions": [
        {
          "type": "sourcetype",
          "operator": "equals",
          "value": "access_combined"
        }
      ],
      "mappings": [
        {"source": "clientip", "target": "source_address"},
        {"source": "status", "target": "http_status_code"},
        {"source": "bytes", "target": "response_size"}
      ],
      "priority": 1,
      "enabled": true
    },
    {
      "id": "nginx_access",
      "name": "Nginx Access Logs",
      "conditions": [
        {
          "type": "combination",
          "operator": "and",
          "children": [
            {"type": "sourcetype", "operator": "equals", "value": "nginx_access"},
            {"type": "field_exists", "field": "remote_addr", "operator": "exists"}
          ]
        }
      ],
      "mappings": [
        {"source": "remote_addr", "target": "source_address"},
        {"source": "request_status", "target": "http_status_code"}
      ],
      "priority": 2,
      "enabled": true
    }
  ]
}
```

The library can automatically discover and extract:
- DataModels: `| datamodel Network_Traffic` → `["Network_Traffic"]`
- Lookups: `| inputlookup ip_geo.csv` → `["ip_geo"]`
- Macros: `` `get_indexes(security)` `` → `["get_indexes"]`
- Sources: `source="/var/log/apache2/access.log"` → `["/var/log/apache2/access.log"]`
- Sourcetypes: `sourcetype=access_combined` → `["access_combined"]`
- Input Fields: all field references the query requires in order to run
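
The following minimal Python sketch illustrates how discovered information could drive the conditional rules in the "Web Server Logs" configuration. It is not the toolkit's implementation or API: `discover`, `condition_matches`, and `active_mappings` are hypothetical helpers, and regular expressions stand in for the ANTLR4 parser, so they will miss many SPL constructs. Several behaviors are assumptions: every entry in a rule's `conditions` list must hold, a matching rule's mappings override the base mappings, `priority` is ignored, and an `or` combination is supported even though only `and` appears in the configuration.

```python
import re
from typing import Dict, List

# Hypothetical regex approximations of three discovery categories listed above.
# The toolkit itself parses SPL with an ANTLR4 grammar; these patterns are a sketch only.
PATTERNS = {
    "sourcetypes": re.compile(r'\bsourcetype\s*=\s*"?([^"\s|]+)"?'),
    "lookups": re.compile(r'\|\s*inputlookup\s+([\w-]+?)(?:\.csv)?(?=[\s|]|$)'),
    "macros": re.compile(r'`([A-Za-z_]\w*)(?:\([^)`]*\))?`'),
}
# Crude "input field" heuristic: an identifier directly before =, !=, <= or >=.
FIELD_RE = re.compile(r'\b([A-Za-z_][\w.]*)\s*[!<>]?=')
RESERVED = {"sourcetype", "source", "index"}


def discover(query: str) -> Dict[str, List[str]]:
    """Return de-duplicated matches per category, in first-seen order."""
    info = {name: list(dict.fromkeys(p.findall(query))) for name, p in PATTERNS.items()}
    fields = [f for f in FIELD_RE.findall(query) if f not in RESERVED]
    info["fields"] = list(dict.fromkeys(fields))
    return info


def condition_matches(cond: dict, info: Dict[str, List[str]]) -> bool:
    """Evaluate the condition types that appear in the example configuration."""
    kind, op = cond["type"], cond["operator"]
    if kind == "sourcetype" and op == "equals":
        return cond["value"] in info["sourcetypes"]
    if kind == "field_exists" and op == "exists":
        return cond["field"] in info["fields"]
    if kind == "combination" and op in ("and", "or"):
        results = [condition_matches(child, info) for child in cond["children"]]
        return all(results) if op == "and" else any(results)
    raise ValueError(f"condition not covered by this sketch: {cond}")


def active_mappings(config: dict, query: str) -> Dict[str, str]:
    """Base mappings plus those of every enabled rule whose conditions all hold."""
    info = discover(query)
    result = {m["source"]: m["target"] for m in config.get("mappings", [])}
    for rule in config.get("rules", []):
        if rule.get("enabled", True) and all(condition_matches(c, info) for c in rule["conditions"]):
            result.update({m["source"]: m["target"] for m in rule["mappings"]})
    return result


# Abridged copy of the "Web Server Logs" configuration.
WEB_CONFIG = {
    "mappings": [{"source": "ip", "target": "client_ip"}],
    "rules": [
        {"id": "apache_combined", "enabled": True,
         "conditions": [{"type": "sourcetype", "operator": "equals", "value": "access_combined"}],
         "mappings": [{"source": "clientip", "target": "source_address"}]},
        {"id": "nginx_access", "enabled": True,
         "conditions": [{"type": "combination", "operator": "and", "children": [
             {"type": "sourcetype", "operator": "equals", "value": "nginx_access"},
             {"type": "field_exists", "field": "remote_addr", "operator": "exists"}]}],
         "mappings": [{"source": "remote_addr", "target": "source_address"},
                      {"source": "request_status", "target": "http_status_code"}]},
    ],
}

if __name__ == "__main__":
    print(discover("`get_indexes(security)` sourcetype=access_combined clientip=1.2.3.4 | inputlookup ip_geo.csv"))
    print(active_mappings(WEB_CONFIG, "search sourcetype=nginx_access remote_addr=10.0.0.5"))
```

In this sketch the nginx rule fires only when both its sourcetype and its `remote_addr` field are present, which mirrors the `and` combination in the configuration.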

Development prerequisites:

- Go 1.22+ (the project currently uses Go 1.22)
- Python 3.8+
- Make
- Git

Development setup:

```bash
# Clone and set up
git clone https://github.com/delgado-jacob/spl-toolkit.git
cd spl-toolkit

# Install dependencies
make dev-setup

# Run tests
make dev-test

# Build everything
make build-all
```

Running tests:

```bash
# Go tests
make test

# Python tests
make python-test

# All tests
make dev-test

# With coverage
make test-coverage
```

Code quality:

```bash
# Format code
make fmt

# Lint code
make lint

# Security scan
make security
```

The SPL Toolkit uses a Grammar-First Architecture built on ANTLR4 for robust SPL parsing and analysis:

```text
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Python API    │    │   Go Library     │    │  ANTLR4 Parser  │
│                 │    │                  │    │                 │
│  ┌───────────┐  │    │  ┌─────────────┐ │    │  ┌───────────┐  │
│  │  Mapper   │  │◄──►│  │   Mapper    │ │◄──►│  │SPL Grammar│  │
│  └───────────┘  │    │  └─────────────┘ │    │  └───────────┘  │
│                 │    │                  │    │                 │
│  ┌───────────┐  │    │  ┌─────────────┐ │    │  ┌───────────┐  │
│  │ QueryInfo │  │    │  │  Discovery  │ │    │  │ AST Tree  │  │
│  └───────────┘  │    │  └─────────────┘ │    │  └───────────┘  │
└─────────────────┘    └──────────────────┘    └─────────────────┘
         │                      │                       │
         ▼                      ▼                       ▼
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   C Bindings    │    │  Token Stream    │    │  AST Listeners  │
│     (cgo)       │    │ Rewriting Engine │    │   & Visitors    │
└─────────────────┘    └──────────────────┘    └─────────────────┘
```

- ANTLR4 Grammar: Complete SPL language definition for accurate parsing
- AST-Based Processing: Uses listener patterns for robust language-aware analysis
- Token Stream Rewriting: Preserves query structure while applying field mappings
- Context-Aware Discovery: Distinguishes input fields from derived fields with hierarchical context tracking
- Python/Go Interop: C shared library bindings for cross-language functionality
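
To illustrate why rewriting at the token level preserves query structure better than plain string replacement, here is a minimal Python sketch. It is not the toolkit's implementation: a small regex tokenizer stands in for the ANTLR4 token stream, and `rewrite_fields` is a hypothetical helper that renames a bare identifier only when it is a mapping key and does not directly follow `=`. Quoted strings are single tokens, so field names inside them are never touched. The "not after `=`" rule is a crude stand-in for real context: it would, for example, miss a field referenced on the right-hand side of an `eval` assignment, which is the kind of case a parser-driven approach is meant to handle.

```python
import re
from typing import Dict

# Hypothetical tokenizer standing in for the ANTLR4 token stream:
# double-quoted strings, identifiers, whitespace runs, or any single character.
TOKEN_RE = re.compile(r'"(?:[^"\\]|\\.)*"|[A-Za-z_][A-Za-z0-9_.]*|\s+|.')


def rewrite_fields(query: str, mapping: Dict[str, str]) -> str:
    """Rename field tokens found in `mapping`; every other token is copied unchanged."""
    out = []
    prev = ""  # last non-whitespace token, used to skip value positions such as `user=src_ip`
    for tok in TOKEN_RE.findall(query):
        if tok in mapping and prev != "=":
            out.append(mapping[tok])
        else:
            out.append(tok)  # quoted strings, values, and punctuation pass through
        if not tok.isspace():
            prev = tok
    return "".join(out)


if __name__ == "__main__":
    mapping = {"src_ip": "source_ip", "dst_ip": "destination_ip"}
    query = 'search src_ip=10.0.0.1 msg="src_ip seen" | stats count by dst_ip'
    print(rewrite_fields(query, mapping))
    # search source_ip=10.0.0.1 msg="src_ip seen" | stats count by destination_ip
```

A naive `query.replace("src_ip", "source_ip")` would also alter the quoted message; operating on tokens avoids that while reproducing everything else, including whitespace, exactly.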

To contribute:

- Fork the repository
- Create a feature branch: `git checkout -b feature/amazing-feature`
- Make your changes and add tests
- Run tests: `make dev-test`
- Commit your changes, including any new files: `git add . && git commit -m 'Add amazing feature'`
- Push to the branch: `git push origin feature/amazing-feature`
- Open a Pull Request

Commit messages follow this format:

```text
type(scope): subject

body

footer
```

Types: `feat`, `fix`, `docs`, `style`, `refactor`, `test`, `chore`
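
A contributor could check a message against this format before committing. The Python sketch below is a hypothetical helper, not part of the repository. It accepts only the types listed above, and two details are assumptions borrowed from common Conventional Commits practice: the `(scope)` part is optional, and a body, if present, is separated from the header by a blank line.

```python
import re

# Commit types listed in this README.
COMMIT_TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore")

# Header shape `type(scope): subject`; treating the scope as optional is an assumption.
HEADER_RE = re.compile(r"^(?:%s)(?:\([^()\s]+\))?: \S.*$" % "|".join(COMMIT_TYPES))


def is_valid_commit_message(message: str) -> bool:
    """Check the header line and, when a body follows, the blank separator line."""
    lines = message.splitlines()
    if not lines or not HEADER_RE.match(lines[0]):
        return False
    # Assumption: header and body are separated by one blank line.
    return len(lines) == 1 or lines[1].strip() == ""


if __name__ == "__main__":
    for msg in ("feat(parser): add macro discovery", "feature: add macros"):
        print(f"{msg!r}: {is_valid_commit_message(msg)}")
```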

Roadmap:

- Phase 1: Basic field mapping and discovery ✅
- Phase 2: Conditional rules and datamodel mapping 🚧 (Partially Complete)
- Phase 3: Query translation (raw ↔ datamodel/tstats)
- Phase 4: Auto-mapping from dual log representations
- Phase 5: Template-based auto-mapping

Current implementation benchmarks (Go 1.22 on modern hardware):
- Parse Query: ~100μs for typical queries using ANTLR4
- Apply Mappings: ~50μs for 100 field mappings with token stream rewriting
- Discovery: ~200μs for complex queries with full AST traversal
- Memory Usage: ~2MB base + ~10KB per mapping rule
- Test Coverage: 64.1%
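
The memory figures above imply a simple linear estimate: roughly 2 MB plus 10 KB per mapping rule. The helper below is a hypothetical back-of-envelope calculation, not a measurement; it assumes decimal units (1 MB = 1000 KB) and that the per-rule cost stays constant as the rule count grows.

```python
def estimate_memory_mb(rule_count: int, base_mb: float = 2.0, per_rule_kb: float = 10.0) -> float:
    """Linear estimate from the approximate figures above (decimal units assumed)."""
    if rule_count < 0:
        raise ValueError("rule_count must be non-negative")
    return base_mb + rule_count * per_rule_kb / 1000.0


if __name__ == "__main__":
    for n in (0, 100, 1000):
        print(f"{n:>5} rules ~ {estimate_memory_mb(n):.1f} MB")
```

Under those assumptions, 100 rules come to about 3 MB and 1,000 rules to about 12 MB.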

This project is licensed under the MIT License; see the LICENSE file for details.

Acknowledgments:

- ANTLR4 for the parsing framework
- Clemens Sageder for the original SPL grammar
- The Splunk community for inspiration and requirements

Note: This is a defensive security tool designed for legitimate SPL query analysis and manipulation. It should not be used for malicious purposes.