A Python tool for downloading and processing content from Telegram channels. This script automates the retrieval of files from channels, handling of archives, extraction of text content, and processing of stealer logs.
- Parallel downloading from multiple Telegram channels
- Archive extraction with password support
- Deduplication of files across channels and within extracted content
- Text file processing and combination
- Stealer log processing
- Memory-efficient processing of large files
- Python 3.6+
- Dependencies listed in requirements.txt
The script requires the following external tools to be installed and available in your PATH:
- tdl: Telegram Downloader CLI tool
- 7z: 7-Zip archive extraction tool
- rdfind: Tool for finding duplicate files
- sort: GNU sort utility (included in most Linux distributions)
- Clone this repository
- Install the required Python dependencies:
  pip install -r requirements.txt
- Install the required external tools according to your operating system
- Configure your settings.json file (see Configuration section)
python telegram_processor.py --input <input_file.csv> --start <DD-MM-YYYY> --end <DD-MM-YYYY> [options]
- --input: Path to CSV file with channel configurations
- --start: Start date in DD-MM-YYYY format
- --end: End date in DD-MM-YYYY format
- --output-dir: Output directory for processed files (default: ./output)
- --download-dir: Directory for downloaded files (default: ./downloads)
- --settings: Path to settings JSON file (default: ./settings.json)
- --verbose: Show detailed output including tdl commands
- --process-only: Skip download phase and only process existing files
- --auto-clean: Automatically clean up after processing without prompting
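The options above map naturally onto `argparse`. The following is a minimal sketch of how such a command line could be parsed, not the script's actual implementation; the helper names `parse_date` and `build_parser` and the sample file name `channels.csv` are hypothetical.

```python
import argparse
from datetime import datetime


def parse_date(value):
    """Parse a DD-MM-YYYY string into a date, or report a usage error."""
    try:
        return datetime.strptime(value, "%d-%m-%Y").date()
    except ValueError:
        raise argparse.ArgumentTypeError(
            f"expected DD-MM-YYYY, got {value!r}"
        ) from None


def build_parser():
    """Build a parser mirroring the documented options (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="telegram_processor.py")
    # Required arguments
    parser.add_argument("--input", required=True)
    parser.add_argument("--start", required=True, type=parse_date)
    parser.add_argument("--end", required=True, type=parse_date)
    # Optional arguments with the documented defaults
    parser.add_argument("--output-dir", default="./output")
    parser.add_argument("--download-dir", default="./downloads")
    parser.add_argument("--settings", default="./settings.json")
    parser.add_argument("--verbose", action="store_true")
    parser.add_argument("--process-only", action="store_true")
    parser.add_argument("--auto-clean", action="store_true")
    return parser


# Example invocation with an explicit argument list instead of sys.argv.
args = build_parser().parse_args(
    ["--input", "channels.csv", "--start", "01-03-2024",
     "--end", "31-03-2024", "--process-only"]
)
```

Parsing the dates at the argument level means a malformed `--start` or `--end` value is rejected before any download begins.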
The input CSV file must contain the following columns:
- name: Display name for the channel
- channel: Channel identifier (ID or username)
- password: (Optional) Password for archive extraction
Example:
name,channel,password
Channel1,@channel1,password123
Channel2,@channel2,
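As an illustration, a file in this format can be read with the standard `csv` module. This is a sketch only: the function name `read_channels` is hypothetical, and treating an empty password cell as "no password" is an assumption about how the script interprets it.

```python
import csv
import io

# Columns that must always carry a value; the password cell may be empty.
REQUIRED_COLUMNS = {"name", "channel"}


def read_channels(handle):
    """Return one dict per channel row (hypothetical helper)."""
    reader = csv.DictReader(handle)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing CSV columns: {sorted(missing)}")
    channels = []
    for row in reader:
        password = (row.get("password") or "").strip()
        channels.append({
            "name": row["name"].strip(),
            "channel": row["channel"].strip(),
            # Assumption: an empty cell means the archives are not protected.
            "password": password or None,
        })
    return channels


# The example file from above, read from memory instead of disk.
sample = io.StringIO(
    "name,channel,password\n"
    "Channel1,@channel1,password123\n"
    "Channel2,@channel2,\n"
)
channels = read_channels(sample)
```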
The settings.json file controls various aspects of the script's behavior:
{
"stealer_log_processor": {
"path": "/path/to/stealer-log-processor/main.py"
},
"tdl": {
"max_parallel_downloads": 1,
"reconnect_timeout": 0,
"export_channel_threads": 4,
"bandwidth_limit": 0,
"chunk_size": 128,
"excluded_extensions": [
"jpg", "gif", "png", "webp", "webm", "mp4"
],
"included_extensions": [
"zip", "rar", "7z", "txt", "csv"
]
},
"sort": {
"memory_percent": 30,
"max_parallel": 16,
"temp_dir": "/tmp"
},
"archive": {
"extract_patterns": [
"*.txt",
"*.csv",
"*pass*",
"*auto*"
],
"extract_timeout": 3600
}
}
- stealer_log_processor:
  - path: Path to the stealer log processor script
- tdl:
  - max_parallel_downloads: Maximum number of parallel downloads
  - reconnect_timeout: Timeout for reconnecting to Telegram
  - export_channel_threads: Thread count for export operations
  - bandwidth_limit: Limit bandwidth usage in KiB/s (0 means unlimited)
  - excluded_extensions: File extensions to skip when downloading (e.g. jpg, gif, png, webp, webm, mp4)
- sort:
  - memory_percent: Percentage of memory to use for sorting
  - max_parallel: Maximum parallel sort threads
  - temp_dir: Temporary directory for sort operations
- archive:
  - extract_patterns: Patterns for files to extract from archives
  - extract_timeout: Timeout (seconds) for extraction operations
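One way to consume such a file is to overlay it on a set of defaults, so that a partial settings.json still yields a complete configuration. The sketch below assumes that behavior; the README does not state whether the script falls back to defaults, and the names `DEFAULTS`, `merge_settings`, and `load_settings` are hypothetical.

```python
import copy
import json

# Assumed defaults, copied from the example settings.json above.
DEFAULTS = {
    "sort": {"memory_percent": 30, "max_parallel": 16, "temp_dir": "/tmp"},
    "archive": {
        "extract_patterns": ["*.txt", "*.csv", "*pass*", "*auto*"],
        "extract_timeout": 3600,
    },
}


def merge_settings(defaults, overrides):
    """Recursively overlay overrides on a copy of defaults."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_settings(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_settings(text):
    """Parse JSON text and fill in any missing keys from DEFAULTS."""
    return merge_settings(DEFAULTS, json.loads(text))


# A partial file that only changes the sort memory budget.
settings = load_settings('{"sort": {"memory_percent": 50}}')
```

Merging section by section keeps an override of one key from discarding its siblings in the same section.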
- Download Phase: The script downloads files from the specified Telegram channels sequentially
- Deduplication: Files are deduplicated across all channels
- Extraction: Archive files are extracted, respecting provided passwords
- Post-Extraction Deduplication: Files are deduplicated after extraction
- Processing:
  - Text files are combined, sorted, and deduplicated
  - Stealer logs are processed if archives are present
- Output: Processed files are moved to the output directory
- Cleanup: Temporary directories are removed (with user confirmation or auto-clean)
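The extraction and sorting steps above rely on 7z and GNU sort. The sketch below shows how the corresponding command lines could be assembled from the settings; it is an assumption about how the tools might be invoked, not the script's actual commands. The function names and file paths are hypothetical, while the flags themselves (`x`, `-o`, `-y`, `-p`, `-ir!` for 7z; `-u`, `-S`, `--parallel`, `-T`, `-o` for GNU sort) are standard options of those tools.

```python
def build_7z_command(archive, dest_dir, patterns, password=None):
    """Assemble a 7z extraction command limited to matching files."""
    cmd = ["7z", "x", archive, f"-o{dest_dir}", "-y"]
    if password:
        cmd.append(f"-p{password}")
    # -ir!PATTERN includes files matching PATTERN at any depth.
    cmd.extend(f"-ir!{pattern}" for pattern in patterns)
    return cmd


def build_sort_command(inputs, output, memory_percent, max_parallel, temp_dir):
    """Assemble a GNU sort command that merges and deduplicates text files."""
    return [
        "sort", "-u",
        "-S", f"{memory_percent}%",      # memory budget as a share of RAM
        f"--parallel={max_parallel}",    # number of sort threads
        "-T", temp_dir,                  # where spill files are written
        "-o", output,
        *inputs,
    ]


extract_cmd = build_7z_command(
    "logs.zip", "extracted", ["*.txt", "*pass*"], password="password123"
)
sort_cmd = build_sort_command(["a.txt", "b.txt"], "combo.csv", 30, 16, "/tmp")
```

Either list can be passed to `subprocess.run(cmd, check=True, timeout=...)`, with the timeout taken from extract_timeout for extraction. Building the command as a list rather than a shell string avoids quoting problems with passwords and file names.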
The script generates the following types of output files:
- {channel_name}-{month-year}-combo.csv: Combined and deduplicated text content (typically UPL format)
- {channel_name}-{month-year}-credentials.csv: Processed credentials (from stealer logs)
- {channel_name}-{month-year}-autofills.csv: Processed autofill data (from stealer logs)
All output files are stored in the specified output directory.
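The naming scheme can be sketched as a small helper. The exact rendering of {month-year} is not specified here, so the MM-YYYY form used below is an assumption, and `output_filename` is a hypothetical name.

```python
from datetime import date

# The three output kinds listed above.
OUTPUT_KINDS = ("combo", "credentials", "autofills")


def output_filename(channel_name, period, kind):
    """Build an output file name for one channel and month."""
    if kind not in OUTPUT_KINDS:
        raise ValueError(f"unknown output kind: {kind!r}")
    # Assumption: {month-year} is rendered as MM-YYYY.
    return f"{channel_name}-{period.strftime('%m-%Y')}-{kind}.csv"


example = output_filename("Channel1", date(2024, 3, 15), "combo")
```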