
victoropp/enterprise-ner-intelligence


🧠 Enterprise NER Intelligence Platform


A production-ready Named Entity Recognition (NER) system powered by BERT Transformers and Bi-LSTM-CRF architectures.

This project demonstrates an end-to-end MLOps pipeline for extracting structured insights from unstructured text, with business use cases in Finance, Legal, Healthcare, and HR.


🚀 Key Features

  • Dual Architecture Support:
    • Transformer (BERT): The more accurate of the two models. Subword tokenization handles out-of-vocabulary (OOV) words, and self-attention captures context across the whole sentence.
    • Bi-LSTM-CRF: Efficient, custom-implemented sequence labeling for resource-constrained environments.
  • Interactive Dashboard: A professional Streamlit app for real-time inference and visualization.
  • Industry Solutions: Pre-configured modules for:
    • 💰 Finance: Extracting tickers, companies, and executives from news.
    • ⚖️ Legal: Identifying parties and jurisdictions in contracts.
    • 🏥 Healthcare: De-identifying patient records (HIPAA compliance).
    • 👥 HR: Extracting skills and qualifications from resumes.
  • Confidence Scoring: Probabilistic outputs for risk-adjusted decision making.
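
As a rough illustration of confidence scoring, the sketch below turns per-token logits into a softmax probability for the winning tag. The tag set, the logits, and the 0.6 review threshold are all made up for the example; the scoring code inside this project may differ.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def tag_with_confidence(token_logits, tags):
    """Return (best_tag, probability) for each token's logit vector."""
    results = []
    for logits in token_logits:
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        results.append((tags[best], probs[best]))
    return results

# Hypothetical tag set and logits for a two-token input.
TAGS = ["O", "B-ORG", "B-PER"]
scored = tag_with_confidence([[0.1, 3.0, 0.2], [2.0, 1.9, 0.1]], TAGS)
for tag, prob in scored:
    flag = "review" if prob < 0.6 else "accept"  # 0.6 is an arbitrary threshold
    print(f"{tag:6s} {prob:.2f} {flag}")
```

A downstream consumer could route low-confidence predictions to manual review instead of acting on them automatically.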

🛠️ Technical Architecture

graph TD
    A[Unstructured Text] --> B{Model Selector}

    subgraph "Deep Learning Path"
    B -->|Bi-LSTM-CRF| C[Word Embeddings]
    C --> D[Bi-Directional LSTM]
    D --> E[CRF Layer]
    E --> F[Sequence Tags]
    end

    subgraph "Transformer Path"
    B -->|BERT| G[Tokenizer]
    G --> H[BERT-Base]
    H --> I[Token Classification Head]
    I --> F
    end

    F --> J[Streamlit Dashboard]
    J --> K[Visualizations & Analytics]
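
To make the CRF step in the diagram more concrete, here is a framework-free sketch of Viterbi decoding, the usual way a CRF layer picks the best tag sequence from per-token emission scores and tag-to-tag transition scores. It is not the code in `src/crf.py`; the three-tag set (O, B, I) and all scores are invented for illustration.

```python
def viterbi_decode(emissions, transitions):
    """Find the highest-scoring tag sequence.

    emissions[t][j]   : score of tag j at position t (e.g. from the Bi-LSTM)
    transitions[i][j] : score of moving from tag i to tag j (learned by the CRF)
    """
    n_tags = len(transitions)
    scores = list(emissions[0])
    backpointers = []
    for step in emissions[1:]:
        new_scores, pointers = [], []
        for j in range(n_tags):
            best_prev = max(range(n_tags), key=lambda i: scores[i] + transitions[i][j])
            new_scores.append(scores[best_prev] + transitions[best_prev][j] + step[j])
            pointers.append(best_prev)
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final tag.
    best = max(range(n_tags), key=scores.__getitem__)
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    return path[::-1]

# Hypothetical tags: 0 = O, 1 = B, 2 = I. Moving O -> I is heavily penalized.
emissions = [[2.0, 0.5, 0.0], [0.0, 1.0, 1.2], [0.0, 0.2, 1.5]]
transitions = [[0.0, 0.0, -10.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.5]]
print(viterbi_decode(emissions, transitions))  # [0, 1, 2], i.e. O B I
```

Picking the best tag per token independently would give O I I here, which is not a valid BIO sequence; the transition scores are what steer the decoder to O B I.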

💻 Installation

  1. Clone the repository

    git clone https://github.com/victoropp/enterprise-ner-intelligence.git
    cd enterprise-ner-intelligence
  2. Download Model Files

    The trained model files are tracked with Git LFS. After cloning, make sure Git LFS is installed, then pull the files:

    git lfs install
    git lfs pull
  3. Install Dependencies

    pip install -r requirements.txt
  4. Run the Application

    streamlit run deployment/app.py

📊 Model Performance

Model         Precision   Recall   F1-Score
BERT-Base     0.91        0.93     0.92
Bi-LSTM-CRF   0.67        0.64     0.65

Note: BERT-Base outperforms the Bi-LSTM-CRF by 0.27 F1 (0.92 vs. 0.65), especially on unseen entities, which illustrates the benefit of transfer learning from a pre-trained language model.
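
F1 is the harmonic mean of precision and recall, so the reported numbers can be sanity-checked with a few lines. The helper name `f1_score` below is only for this example.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported in the table above.
reported = {"BERT-Base": (0.91, 0.93), "Bi-LSTM-CRF": (0.67, 0.64)}
for name, (p, r) in reported.items():
    print(f"{name}: F1 = {f1_score(p, r):.2f}")
# BERT-Base: F1 = 0.92
# Bi-LSTM-CRF: F1 = 0.65
```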

📂 Project Structure

├── data/               # CoNLL-2003 Dataset
├── deployment/         # Streamlit Application
│   └── app.py          # Main Dashboard
├── models/             # Saved Models & Checkpoints
│   ├── ner_model.h5    # Trained Bi-LSTM-CRF model
│   ├── word2idx.pkl    # Vocabulary mappings
│   └── tag2idx.pkl     # Tag mappings
├── src/                # Source Code
│   ├── train_bert.py   # BERT Fine-tuning Script
│   ├── train.py        # Bi-LSTM Training Script
│   ├── model.py        # Bi-LSTM Architecture
│   ├── crf.py          # Custom CRF Layer (TensorFlow)
│   ├── data_loader.py  # Data preprocessing utilities
│   └── evaluate.py     # Evaluation Metrics
├── notebooks/          # Jupyter notebooks for exploration
├── tests/              # Unit tests
└── requirements.txt    # Dependencies
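
The vocabulary and tag mappings are what connect raw tokens to the Bi-LSTM-CRF's integer inputs and outputs. The sketch below shows how such mappings are typically used, with small in-memory dictionaries standing in for `word2idx.pkl` and `tag2idx.pkl`. The `PAD` and `UNK` entries and the function names are assumptions, not confirmed contents of the pickled files.

```python
def encode_sentence(tokens, word2idx, max_len, pad="PAD", unk="UNK"):
    """Map tokens to ids, sending unknown words to UNK and padding to max_len."""
    ids = [word2idx.get(tok, word2idx[unk]) for tok in tokens[:max_len]]
    ids += [word2idx[pad]] * (max_len - len(ids))
    return ids

def decode_tags(tag_ids, tag2idx, length):
    """Map predicted tag ids back to tag strings, dropping padded positions."""
    idx2tag = {idx: tag for tag, idx in tag2idx.items()}
    return [idx2tag[i] for i in tag_ids[:length]]

# Hypothetical stand-ins for the pickled mappings.
word2idx = {"PAD": 0, "UNK": 1, "Alpha": 2, "hired": 3}
tag2idx = {"O": 0, "B-ORG": 1, "B-PER": 2}

tokens = ["Alpha", "hired", "Zaphod"]          # "Zaphod" is out of vocabulary
print(encode_sentence(tokens, word2idx, 5))    # [2, 3, 1, 0, 0]
print(decode_tags([1, 0, 2, 0, 0], tag2idx, len(tokens)))  # ['B-ORG', 'O', 'B-PER']
```

Every unseen word collapses to the same `UNK` id in a word-level vocabulary like this, which is one reason subword models tend to cope better with OOV words.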

💼 Business Use Cases

1. Financial Intelligence

Automatically scan thousands of earnings call transcripts to extract:

  • Organizations: Competitors, partners, subsidiaries.
  • Persons: Key executives, analysts.
  • Locations: Emerging markets, factory locations.
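
To turn per-token predictions into lists like the ones above, BIO tags first have to be merged into entity spans and grouped by type. A minimal sketch, assuming CoNLL-2003 style tags (PER, ORG, LOC) and an invented sentence; the function names are illustrative only:

```python
from collections import defaultdict

def extract_entities(tokens, tags):
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs."""
    entities, current, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_type
        )
        if starts_new:
            if current:
                entities.append((" ".join(current), current_type))
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-"):
            current.append(token)  # continue the open entity
        else:  # "O" closes any open entity
            if current:
                entities.append((" ".join(current), current_type))
            current, current_type = [], None
    if current:
        entities.append((" ".join(current), current_type))
    return entities

def group_by_type(entities):
    """Collect entity texts under their type, e.g. {'ORG': [...], 'PER': [...]}."""
    grouped = defaultdict(list)
    for text, etype in entities:
        grouped[etype].append(text)
    return dict(grouped)

tokens = "Jane Doe said Alpha Corp will expand in Brazil".split()
tags = ["B-PER", "I-PER", "O", "B-ORG", "I-ORG", "O", "O", "O", "B-LOC"]
print(group_by_type(extract_entities(tokens, tags)))
# {'PER': ['Jane Doe'], 'ORG': ['Alpha Corp'], 'LOC': ['Brazil']}
```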

2. Legal Compliance

Automate contract review by extracting:

  • Parties: "Alpha Corp" vs "Beta Ltd".
  • Jurisdictions: "State of Delaware", "London".

3. Healthcare Data Processing

De-identify medical records by detecting and masking:

  • Patient Names: Direct identifiers covered by HIPAA
  • Locations: Hospital names, addresses
  • Organizations: Healthcare providers
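
Once entities are detected, masking is mostly string surgery. The sketch below replaces character spans with a `[TYPE]` placeholder; the span format, the helper names, and the sample sentence are assumptions for illustration, and a real pipeline would take its spans from the NER model rather than from `str.index`.

```python
def mask_entities(text, spans, sensitive=("PER", "LOC", "ORG")):
    """Replace each sensitive character span with a [TYPE] placeholder.

    spans: list of (start, end, entity_type) character offsets, end exclusive.
    """
    masked = text
    # Work right to left so earlier offsets stay valid after each replacement.
    for start, end, etype in sorted(spans, reverse=True):
        if etype in sensitive:
            masked = masked[:start] + f"[{etype}]" + masked[end:]
    return masked

def span_of(text, phrase, etype):
    """Demo helper: locate a phrase and return its (start, end, type) span."""
    start = text.index(phrase)
    return (start, start + len(phrase), etype)

record = "John Smith was admitted to Mercy Hospital in Boston."
spans = [
    span_of(record, "John Smith", "PER"),
    span_of(record, "Mercy Hospital", "ORG"),
    span_of(record, "Boston", "LOC"),
]
print(mask_entities(record, spans))
# [PER] was admitted to [ORG] in [LOC].
```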

4. HR & Recruitment

Extract structured information from resumes:

  • Skills: Programming languages, certifications
  • Organizations: Previous employers
  • Locations: Work locations, willingness to relocate

🚀 Deployment

Streamlit Cloud

This application is ready to deploy on Streamlit Cloud:

  1. Fork this repository to your GitHub account
  2. Go to share.streamlit.io
  3. Click "New app"
  4. Select your repository, branch, and deployment/app.py
  5. Click "Deploy"

See deployment/README.md for detailed deployment instructions.

🧪 Training Your Own Models

Train Bi-LSTM-CRF Model

python src/train.py --epochs 50 --batch-size 32

Fine-tune BERT Model

python src/train_bert.py --model bert-base-cased --epochs 3
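
Fine-tuning BERT for token classification usually needs one extra preprocessing step: CoNLL-2003 labels are per word, but the tokenizer may split a word into several sub-tokens. The sketch below shows one common alignment scheme. It assumes `word_ids` has the shape returned by a HuggingFace fast tokenizer's `word_ids()` (None for special tokens) and that -100 marks positions the loss should skip. Whether `src/train_bert.py` does exactly this is an assumption, and the label ids are made up.

```python
IGNORE_INDEX = -100  # label value conventionally skipped by the loss

def align_labels(word_ids, word_labels):
    """Expand word-level labels to sub-token level.

    Only the first sub-token of each word keeps the word's label;
    special tokens and continuation sub-tokens get IGNORE_INDEX.
    """
    aligned, previous = [], None
    for word_id in word_ids:
        if word_id is None:            # special tokens such as [CLS] / [SEP]
            aligned.append(IGNORE_INDEX)
        elif word_id != previous:      # first sub-token of a word
            aligned.append(word_labels[word_id])
        else:                          # later sub-tokens of the same word
            aligned.append(IGNORE_INDEX)
        previous = word_id
    return aligned

# Hypothetical: three words, the second split into three sub-tokens.
word_ids = [None, 0, 1, 1, 1, 2, None]
word_labels = [0, 3, 0]
print(align_labels(word_ids, word_labels))
# [-100, 0, 3, -100, -100, 0, -100]
```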

📈 Evaluation

Evaluate model performance on the CoNLL-2003 test set:

python src/evaluate.py --model models/ner_model.h5
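
NER on CoNLL-2003 is conventionally scored at the entity level: a prediction counts only if both its span and its type match a gold entity exactly. The sketch below illustrates that counting on invented data. It is not the code in `src/evaluate.py`, and the tuple layout `(sentence, start, end, type)` is an assumption.

```python
def entity_prf(gold, predicted):
    """Exact-match precision, recall, and F1 over entity tuples."""
    gold_set, pred_set = set(gold), set(predicted)
    true_pos = len(gold_set & pred_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical entities as (sentence_index, start_token, end_token, type).
gold = [(0, 0, 2, "PER"), (0, 3, 4, "ORG"), (1, 5, 6, "LOC")]
predicted = [
    (0, 0, 2, "PER"),   # correct
    (0, 3, 4, "LOC"),   # right span, wrong type: counts as an error
    (1, 5, 6, "LOC"),   # correct
    (1, 0, 1, "ORG"),   # spurious entity
]
p, r, f1 = entity_prf(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
# precision=0.50 recall=0.67 f1=0.57
```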

🀝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

👨‍💻 Author

Victor Collins Oppon
NLP Engineer | Data Scientist | FCCA, MBA, BSc

LinkedIn | GitHub

Portfolio website coming soon!


🙏 Acknowledgments

  • CoNLL-2003 Dataset: Erik F. Tjong Kim Sang and Fien De Meulder
  • HuggingFace Transformers: For the excellent BERT implementation
  • Streamlit: For the intuitive web framework

Built with ❤️ for the NLP Community
