MedDRA File Loader

A robust tool for loading MedDRA files into a database with batch processing support.

Features

✅ Efficient batch processing
✅ Robust error handling
✅ File and configuration validation
✅ Multi-language and multi-version MedDRA support
✅ Modular and extensible architecture
✅ Detailed logging

Project Structure

```
meddra_loader/
├── __init__.py
├── meddra-cli.py            # CLI entry point
├── config.py                # Configuration and environment variables
├── exceptions.py            # Custom exceptions
├── models.py                # Database models (existing)
├── processors/
│   ├── __init__.py
│   ├── base.py              # Abstract base processor
│   ├── file_processor.py    # File processing logic
│   └── batch_processor.py   # Batch processing logic
├── database/
│   ├── __init__.py
│   ├── connection.py        # Database connection management
│   └── operations.py        # Database operations
├── utils/
│   ├── __init__.py
│   ├── file_utils.py        # File handling utilities
│   └── progress.py          # Progress tracking utilities
└── README.md
```

Installation

Clone the repository:

```
git clone <repository-url>
cd meddra-loader
```

Create a virtual environment:

```
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

Install dependencies:
```
pip install -r requirements.txt
```

Set up environment variables:

```
cp .env.example .env
# Edit .env with your configuration
```

Configuration

Environment Variables

Create a .env file with the following variable:

```
DATABASE_URL=postgresql://username:password@localhost:5432/meddra_db
```
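This README does not show how config.py reads this value. The minimal sketch below assumes the .env file has already been loaded into the process environment, for example with python-dotenv's `load_dotenv()`. The two hypothetical helpers, `get_database_url` and `mask_database_url`, read DATABASE_URL and hide its password the way the verbose output in Example 2 does. Both names are illustrative only.

```python
import os
from urllib.parse import urlsplit, urlunsplit


def get_database_url() -> str:
    """Read DATABASE_URL from the environment (assumes .env was already loaded)."""
    url = os.environ.get("DATABASE_URL")
    if not url:
        raise RuntimeError("DATABASE_URL environment variable not set")
    return url


def mask_database_url(url: str) -> str:
    """Replace the password in a database URL with '***' so it can be logged."""
    parts = urlsplit(url)
    if parts.password is None:
        return url  # no credentials to hide
    # Rebuild user:***@host[:port]; bracketed IPv6 hosts are not handled here.
    netloc = f"{parts.username}:***@{parts.hostname}"
    if parts.port is not None:
        netloc += f":{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

With the .env value above, `mask_database_url(get_database_url())` returns `postgresql://username:***@localhost:5432/meddra_db`.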

Additional Configuration

The config.py file allows configuration of:

MedDRA Version: Default 28.0
Language: Default 'en'
Batch Size: Default 5000
Encoding: Default 'UTF-8'
Separator: Default '$'
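The sketch below shows the defaults above as a frozen dataclass, together with a method that splits one raw line on the configured separator. `LoaderConfig` and `split_record` are hypothetical names, and the real config.py may be organised differently. It also assumes that MedDRA ASCII records end with a trailing `$`. That trailing separator produces one empty last field, so the method drops it while keeping interior empty fields.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LoaderConfig:
    """Hypothetical mirror of the documented defaults; config.py may differ."""
    version: float = 28.0
    language: str = "en"
    batch_size: int = 5000
    encoding: str = "UTF-8"
    separator: str = "$"

    def split_record(self, line: str) -> List[str]:
        """Split one raw line into fields using the configured separator."""
        fields = line.rstrip("\r\n").split(self.separator)
        # Assumed: records end with a trailing separator, which leaves
        # exactly one empty field at the end; remove only that one.
        if fields and fields[-1] == "":
            fields.pop()
        return fields
```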

Database DDL

This version requires your database tables to match the structure defined in models.py. Custom table structures are not supported yet.

Before creating the tables, make sure DATABASE_URL in your .env file points to the target database. Then run:

```
python3 models.py
```

This creates the tables with the schema the loader expects.

Usage

Basic Commands

Process a single file

```
python meddra-cli.py --file-path /path/to/file.asc
```

Process all files in a directory

```
python meddra-cli.py --path /path/to/meddra/files
```

Custom configuration

```
python meddra-cli.py --path /path/to/files \
    --version 27.1 \
    --language es \
    --batch-size 1000 \
    --verbose
```

Command Line Options

| Option | Type | Default | Description |
|---|---|---|---|
| `--file-path` | string | - | Path to a specific MedDRA file |
| `--path` | string | - | Directory containing .asc files |
| `--version` | float | 28.0 | MedDRA version |
| `--language` | string | en | Language code |
| `--batch-size` | int | 5000 | Batch size for processing |
| `--verbose` | flag | false | Enable detailed output |
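The option table maps directly onto Python's standard `argparse` module. The sketch below is hypothetical: `build_parser` is an illustrative name, and meddra-cli.py may define its arguments differently. The sketch also treats `--file-path` and `--path` as mutually exclusive and requires one of them. That is an assumption, because the table does not say whether both may be given.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical argparse setup mirroring the option table above."""
    parser = argparse.ArgumentParser(
        prog="meddra-cli.py",
        description="Load MedDRA files into a database.",
    )
    # Assumption: exactly one input source per run.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--file-path", help="Path to a specific MedDRA file")
    source.add_argument("--path", help="Directory containing .asc files")
    parser.add_argument("--version", type=float, default=28.0, help="MedDRA version")
    parser.add_argument("--language", default="en", help="Language code")
    parser.add_argument("--batch-size", type=int, default=5000,
                        help="Batch size for processing")
    parser.add_argument("--verbose", action="store_true", help="Enable detailed output")
    return parser
```

`argparse` converts dashes to underscores, so the values are available as `args.file_path` and `args.batch_size`.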

Supported File Types

Supported file types are defined in models.py through the generate_meddra_file_mappings() function. Common MedDRA files include:

pt.asc - Preferred Terms
llt.asc - Lowest Level Terms
hlt.asc - High Level Terms
hlgt.asc - High Level Group Terms
soc.asc - System Organ Class
smq_list.asc - Standardised MedDRA Queries
And more...
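The error shown under "Unsupported File Types" suggests that the file type is the file name without its `.asc` extension. The sketch below makes that assumption and uses the type list printed in that error message. The authoritative mapping comes from `generate_meddra_file_mappings()` in models.py, and `detect_file_type` is a hypothetical helper. Matching is case-insensitive here, which is also an assumption.

```python
from pathlib import Path

# Taken from the "Supported types" error message; models.py is authoritative.
SUPPORTED_TYPES = {"pt", "llt", "hlt", "hlgt", "soc", "smq_list", "mdhier", "intl_ord"}


def detect_file_type(file_path: str) -> str:
    """Map e.g. '/data/meddra/28.0/MedAscii/pt.asc' to 'pt'; reject unknown types."""
    path = Path(file_path)
    if path.suffix.lower() != ".asc":
        raise ValueError(f"Not a MedDRA .asc file: '{path.name}'")
    file_type = path.stem.lower()
    if file_type not in SUPPORTED_TYPES:
        raise ValueError(f"Unsupported file type '{file_type}'")
    return file_type
```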

Usage Examples

Example 1: Basic Processing

Process all MedDRA files in the MedAscii directory of a release:

```
python meddra-cli.py --path /data/meddra/28.0/MedAscii
```

Output:

```
Setup validation passed
Processing files in directory: /data/meddra/28.0/MedAscii
Found 15 files to process

--- Processing /data/meddra/28.0/MedAscii/pt.asc ---
Starting Processing pt file...
  file_path: /data/meddra/28.0/MedAscii/pt.asc
  file_size: 15728640
  total_lines: 75000
Progress: 20% - Batch 1 (5000/75000 records) - Elapsed: 2.3s - ETA: 9.2s
Progress: 40% - Batch 2 (10000/75000 records) - Elapsed: 4.6s - ETA: 6.9s
...
✓ Successfully processed 75000 records

=== Processing Summary ===
Total files processed: 15/15
Total records processed: 850000
All files processed successfully!
```

Example 2: Custom Configuration

Process files with specific version and language:

```
python meddra-cli.py --path /data/meddra/27.1/MedAscii \
    --version 27.1 \
    --language es \
    --batch-size 2000 \
    --verbose
```

Output:

```
Configuration loaded successfully:
  Database URL: postgresql://user:***@localhost:5432/meddra_db
  Version: 27.1
  Language: es
  Batch size: 2000
Setup validation passed
Processing files in directory: /data/meddra/27.1/MedAscii
...
```

Example 3: Process Specific File

Process only the Preferred Terms file:

```
python meddra-cli.py --file-path /data/meddra/28.0/MedAscii/pt.asc
```

Output:

```
Processing single file: /data/meddra/28.0/MedAscii/pt.asc
Starting Processing pt file...
  file_path: /data/meddra/28.0/MedAscii/pt.asc
  file_size: 15728640
  total_lines: 75000
Progress: 100% - Batch 15 (75000/75000 records) - Elapsed: 23.5s
✓ Successfully processed 75000 records
```

Error Handling

The application catches the following kinds of errors and reports each with a descriptive message:

Configuration Errors

```
$ python meddra-cli.py --path /data/meddra
Processing error: DATABASE_URL environment variable not set
```

File Errors

```
$ python meddra-cli.py --file-path /nonexistent/file.asc
Processing error: Error processing file '/nonexistent/file.asc': File not found
```

Database Errors

```
$ python meddra-cli.py --path /data/meddra
Processing error: Database connection test failed
```

Unsupported File Types

```
$ python meddra-cli.py --file-path /data/unknown_file.asc
Error: Unsupported file type 'unknown_file'
Supported types: pt, llt, hlt, hlgt, soc, smq_list, mdhier, intl_ord
```
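The "Processing error: ..." messages above suggest that errors are caught at the top level and printed as one line. The contents of exceptions.py are not documented here, so the sketch below only illustrates that pattern. `MeddraLoaderError`, `ConfigurationError` and `run` are hypothetical names. Printing to stderr and returning exit code 1 are also assumptions.

```python
import sys
from typing import Callable


class MeddraLoaderError(Exception):
    """Hypothetical base class for all loader errors."""


class ConfigurationError(MeddraLoaderError):
    """Hypothetical: raised for missing or invalid settings such as DATABASE_URL."""


def run(main: Callable[[], None]) -> int:
    """Run main() and turn loader errors into a one-line message plus exit code 1."""
    try:
        main()
    except MeddraLoaderError as exc:
        print(f"Processing error: {exc}", file=sys.stderr)
        return 1
    return 0
```

A CLI entry point would then call `sys.exit(run(main))`. Because only the shared base class is caught, unexpected bugs still produce a full traceback.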

Progress Tracking

The application provides detailed progress information:

Real-time progress: Percentage completion and current batch
Performance metrics: Records processed per second
Time estimates: Elapsed time and estimated time remaining
Memory usage: Current memory consumption
Batch information: Current batch number and size
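The percentage, rate and time estimate in the list above come from simple arithmetic on three numbers: records processed so far, total records and elapsed time. The sketch below shows that arithmetic. `ProgressSnapshot` is a hypothetical class, not the API of utils/progress.py. Memory usage is left out, and the ETA assumes the current rate stays constant.

```python
from dataclasses import dataclass


@dataclass
class ProgressSnapshot:
    processed: int  # records processed so far
    total: int      # total records expected (e.g. total_lines)
    elapsed: float  # seconds since processing started

    @property
    def percent(self) -> float:
        """Completion percentage; an empty file counts as complete."""
        return 100.0 * self.processed / self.total if self.total else 100.0

    @property
    def rate(self) -> float:
        """Records processed per second so far."""
        return self.processed / self.elapsed if self.elapsed > 0 else 0.0

    @property
    def eta(self) -> float:
        """Estimated seconds remaining at the current rate."""
        remaining = self.total - self.processed
        return remaining / self.rate if self.rate > 0 else float("inf")
```

For example, `ProgressSnapshot(5000, 75000, 2.5)` gives a rate of 2000 records/s and an ETA of 35 seconds.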

Database Schema

The application expects your database models to have the following standard fields:

created_at: Timestamp when record was created
updated_at: Timestamp when record was last updated
language: Language code (e.g., 'en', 'es', 'fr')
version: MedDRA version (e.g., 28.0, 27.1)
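The four standard fields could be filled before each insert in several ways. The sketch below does it by adding them to a parsed record. `add_standard_fields` and the `pt_code` key in the usage example are hypothetical. UTC timestamps are an assumption, and the real models may instead rely on database-side defaults for `created_at` and `updated_at`.

```python
from datetime import datetime, timezone


def add_standard_fields(record: dict, language: str, version: float) -> dict:
    """Return a copy of record with language, version and timestamps filled in."""
    now = datetime.now(timezone.utc)
    return {
        **record,
        "language": language,
        "version": version,
        "created_at": now,
        "updated_at": now,  # same value on first insert
    }
```

For example, `add_standard_fields({"pt_code": 10000001}, "es", 27.1)` keeps `pt_code` and adds the four columns.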

Development

Adding New Processors

Create a new class inheriting from BaseProcessor:

```python
from processors.base import BaseProcessor, ProcessorResult


class CustomProcessor(BaseProcessor):
    def process(self, *args, **kwargs) -> ProcessorResult:
        count = 0  # number of records handled by this processor
        # Your processing logic here (increment count for each record)
        return ProcessorResult(success=True, records_processed=count)
```

Register the processor in the appropriate factory or configuration.
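This README does not describe the project's factory. One common way to register processors is a name-to-class registry filled by a class decorator, sketched below. `PROCESSORS`, `register_processor` and `create_processor` are hypothetical names, not part of the loader.

```python
from typing import Callable, Dict

# Hypothetical registry mapping a processor name to its class.
PROCESSORS: Dict[str, type] = {}


def register_processor(name: str) -> Callable[[type], type]:
    """Class decorator that records a processor class under `name`."""
    def decorator(cls: type) -> type:
        if name in PROCESSORS:
            raise ValueError(f"Processor '{name}' is already registered")
        PROCESSORS[name] = cls
        return cls
    return decorator


def create_processor(name: str, *args, **kwargs):
    """Instantiate the processor registered under `name`."""
    try:
        cls = PROCESSORS[name]
    except KeyError:
        raise ValueError(f"Unknown processor '{name}'") from None
    return cls(*args, **kwargs)
```

With this pattern, `@register_processor("custom")` placed above `class CustomProcessor(BaseProcessor)` makes the class available through `create_processor("custom")` without editing a central list.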

Adding New Utilities

Create functions in the appropriate utils module:

```python
def new_utility_function(param):
    """Description of what this function does."""
    result = param  # Replace with the actual implementation
    return result
```

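Add the function name to the module's `__all__` list so it is exported by wildcard imports (`from module import *`) and marked as part of the module's public API.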

Troubleshooting

Common Issues

Database Connection Issues
- Verify DATABASE_URL is correct
- Check database server is running
- Ensure user has proper permissions
File Encoding Problems
- MedDRA files typically use 'UTF-8' encoding
- If decoding fails, change the Encoding setting in config.py (default 'UTF-8') and rerun
Memory Issues with Large Files
- Reduce batch size using --batch-size
- Monitor system memory usage
Permission Errors
- Ensure read permissions on input files
- Check write permissions for log files
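Before changing the Encoding setting, you can check which encoding actually decodes a given file. The sketch below is a hypothetical diagnostic, `read_with_fallback`, and is not part of the loader. It reads the whole file into memory, so it is meant for one-off checks rather than the load itself. Because `latin-1` maps every byte to a character, it always succeeds when placed last, so a `latin-1` result only means that UTF-8 failed.

```python
from pathlib import Path
from typing import Iterable, Tuple


def read_with_fallback(path: str,
                       encodings: Iterable[str] = ("utf-8", "latin-1")) -> Tuple[str, str]:
    """Return (text, encoding) for the first encoding that decodes the whole file."""
    data = Path(path).read_bytes()
    tried = []
    last_error = None
    for encoding in encodings:
        tried.append(encoding)
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError as exc:
            last_error = exc  # remember why this encoding failed
    raise ValueError(f"Could not decode {path} with any of: {tried}") from last_error
```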

Debug Mode

Enable verbose logging for troubleshooting:

```
python meddra-cli.py --path /data/meddra --verbose
```

Performance Optimization

Batch Size Tuning

Small files (< 10MB): Use batch size 1000-2000
Medium files (10-100MB): Use batch size 5000 (default)
Large files (> 100MB): Use batch size 10000-20000
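The sketch below turns the guidance above into a starting value and shows how a batch size is typically used to split records into chunks. `suggest_batch_size` and `iter_batches` are hypothetical helpers. Two choices are assumptions: MB means 1024 × 1024 bytes, and the value picked within each recommended range. Measure on your own database before relying on any of these numbers.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")
MB = 1024 * 1024  # assumption: binary megabytes


def suggest_batch_size(file_size_bytes: int) -> int:
    """Starting batch size based on the rough file-size guidance above."""
    if file_size_bytes < 10 * MB:
        return 2000    # small files: 1000-2000
    if file_size_bytes <= 100 * MB:
        return 5000    # medium files: the default
    return 10000       # large files: 10000-20000


def iter_batches(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch
```

For example, the 15728640-byte pt.asc from Example 1 falls in the medium range, so `suggest_batch_size` returns the default 5000. `os.path.getsize` supplies the file size.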

Changelog

Version 1.0.0

Initial release
Basic file processing functionality
Batch processing support
Progress tracking
Error handling
CLI interface

🛡️ License

This project is licensed under the MIT License.

📚 About MedDRA®

This software requires structured data from the MedDRA® dictionary, which is the property of the MSSO (Maintenance and Support Services Organization) and the ICH.

⚠️ This repository does not distribute or include any MedDRA® files.

To use meddra-cli, you must have a valid MedDRA license and download the official files from: https://www.meddra.org

This software is not affiliated with or endorsed by the MSSO or ICH.
