🧠 Enterprise NER Intelligence Platform

A production-ready Named Entity Recognition (NER) system powered by BERT Transformers and Bi-LSTM-CRF architectures.

This project demonstrates an end-to-end MLOps pipeline for extracting structured insights from unstructured text, tailored for high-impact business use cases in Finance, Legal, Healthcare, and HR.

🚀 Key Features

- Dual Architecture Support:
  - Transformer (BERT): The higher-accuracy option; subword tokenization handles out-of-vocabulary (OOV) words, and self-attention captures deep context.
  - Bi-LSTM-CRF: Efficient, custom-implemented sequence labeling for resource-constrained environments.
- Interactive Dashboard: A professional Streamlit app for real-time inference and visualization.
- Industry Solutions: Pre-configured modules for:
  - 💰 Finance: Extracting tickers, companies, and executives from news.
  - ⚖️ Legal: Identifying parties and jurisdictions in contracts.
  - 🏥 Healthcare: De-identifying patient records to support HIPAA compliance.
  - 👥 HR: Extracting skills and qualifications from resumes.
- Confidence Scoring: Probabilistic outputs for risk-adjusted decision making.
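
As a rough illustration of the confidence scoring idea, the sketch below turns per-token logits into probabilities with a softmax and flags low-confidence predictions for review. The function names, the label set, the logits, and the 0.80 threshold are all hypothetical and are not taken from this repository.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def score_tokens(tokens, logits_per_token, labels, threshold=0.80):
    """Return (token, label, confidence, needs_review) for each token."""
    results = []
    for token, logits in zip(tokens, logits_per_token):
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        confidence = probs[best]
        results.append((token, labels[best], confidence, confidence < threshold))
    return results


# Hypothetical logits for a three-label tag set (illustration only).
LABELS = ["O", "B-ORG", "B-PER"]
scored = score_tokens(
    ["Acme", "hired", "Jordan"],
    [[0.1, 4.0, 0.2], [5.0, 0.1, 0.1], [1.0, 0.9, 1.2]],
    LABELS,
)
for token, label, confidence, review in scored:
    print(f"{token:8s} {label:6s} {confidence:.2f} review={review}")
```

In this toy input the first two tokens are predicted with high confidence, while the last one falls below the threshold and would be routed to a human reviewer.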

🛠️ Technical Architecture

```mermaid
graph TD
    A[Unstructured Text] --> B{Model Selector}

    subgraph "Deep Learning Path"
    B -->|Bi-LSTM-CRF| C[Word Embeddings]
    C --> D[Bi-Directional LSTM]
    D --> E[CRF Layer]
    E --> F[Sequence Tags]
    end

    subgraph "Transformer Path"
    B -->|BERT| G[Tokenizer]
    G --> H[BERT-Base]
    H --> I[Token Classification Head]
    I --> F
    end

    F --> J[Streamlit Dashboard]
    J --> K[Visualizations & Analytics]
```
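
To make the CRF step in the Bi-LSTM path more concrete, here is a minimal pure-Python sketch of Viterbi decoding, the standard way a linear-chain CRF picks its best tag sequence from per-token emission scores and tag-to-tag transition scores. It is not the implementation in `src/crf.py`; the scores and the three-tag set are made up, and start/end transition scores are omitted for brevity.

```python
def viterbi_decode(emissions, transitions):
    """Return the highest-scoring sequence of tag indices.

    emissions[t][j]   : score of tag j at position t
    transitions[i][j] : score of moving from tag i to tag j
    """
    num_tags = len(transitions)
    score = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(num_tags):
            # Best previous tag for ending at tag j on this step.
            best_prev = max(range(num_tags), key=lambda i: score[i] + transitions[i][j])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + transitions[best_prev][j] + emit[j])
        score = new_score
        backpointers.append(pointers)

    # Follow the back-pointers from the best final tag.
    path = [max(range(num_tags), key=score.__getitem__)]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical tag set and scores (illustration only).
TAGS = ["O", "B-PER", "I-PER"]
emissions = [[1.0, 0.8, 0.0], [0.2, 0.1, 1.5]]
transitions = [
    [0.0, 0.0, -10.0],  # O -> I-PER is heavily penalised
    [0.0, 0.0, 0.5],
    [0.0, 0.0, 0.5],
]
print([TAGS[i] for i in viterbi_decode(emissions, transitions)])
```

A per-token argmax on these emissions would yield the invalid sequence `O, I-PER`; the transition penalty steers the decoder to `B-PER, I-PER` instead, which is the main reason to put a CRF on top of the LSTM outputs.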

💻 Installation

Clone the repository

```
git clone https://github.com/victoropp/enterprise-ner-intelligence.git
cd enterprise-ner-intelligence
```

Download Model Files

The trained model files are tracked with Git LFS. After cloning, make sure the Git LFS client is installed on your machine, then initialize it and pull the model files:

```
git lfs install
git lfs pull
```

Install Dependencies

```
pip install -r requirements.txt
```

Run the Application

```
streamlit run deployment/app.py
```
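
If the app fails to load a model after cloning, one possible cause is that `git lfs pull` never ran and the model file is still a small Git LFS pointer stub. The sketch below checks for that by looking for the pointer header that Git LFS writes at the top of such files. The helper names are hypothetical, and it assumes `models/ner_model.h5` is among the LFS-tracked files.

```python
from pathlib import Path

# First line of every Git LFS pointer file.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file is an LFS pointer stub rather than real content."""
    with open(path, "rb") as handle:
        return handle.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def check_model_files(paths):
    """Map each path to 'missing', 'lfs-pointer', or 'ok'."""
    status = {}
    for raw in paths:
        if not Path(raw).is_file():
            status[raw] = "missing"
        elif is_lfs_pointer(raw):
            status[raw] = "lfs-pointer"
        else:
            status[raw] = "ok"
    return status


if __name__ == "__main__":
    # Assumed path, taken from the project structure in this README.
    for name, state in check_model_files(["models/ner_model.h5"]).items():
        print(f"{name}: {state}")
```

A result of `lfs-pointer` suggests re-running `git lfs pull` from the repository root.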

📊 Model Performance

| Model | Precision | Recall | F1-Score |
|-------|-----------|--------|----------|
| BERT-Base | 0.91 | 0.93 | 0.92 |
| Bi-LSTM-CRF | 0.67 | 0.64 | 0.65 |

Note: BERT-Base outperforms the Bi-LSTM-CRF by 0.27 F1 (0.92 vs. 0.65), especially on unseen entities, which demonstrates the benefit of transfer learning from a pre-trained language model.
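
As a quick sanity check on the table, F1 is the harmonic mean of precision and recall, so the reported F1 values can be recomputed from the other two columns. The sketch below does that; it assumes the figures are micro-averaged, where this identity holds exactly.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above.
reported = {
    "BERT-Base": (0.91, 0.93, 0.92),
    "Bi-LSTM-CRF": (0.67, 0.64, 0.65),
}

for name, (precision, recall, f1) in reported.items():
    computed = f1_score(precision, recall)
    print(f"{name}: computed F1 = {computed:.2f}, reported F1 = {f1:.2f}")
```

Both recomputed values round to the reported ones (0.92 and 0.65), so the three columns are internally consistent.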

📂 Project Structure

├── data/               # CoNLL-2003 Dataset
├── deployment/         # Streamlit Application
│   └── app.py          # Main Dashboard
├── models/             # Saved Models & Checkpoints
│   ├── ner_model.h5    # Trained Bi-LSTM-CRF model
│   ├── word2idx.pkl    # Vocabulary mappings
│   └── tag2idx.pkl     # Tag mappings
├── src/                # Source Code
│   ├── train_bert.py   # BERT Fine-tuning Script
│   ├── train.py        # Bi-LSTM Training Script
│   ├── model.py        # Bi-LSTM Architecture
│   ├── crf.py          # Custom CRF Layer (TensorFlow)
│   ├── data_loader.py  # Data preprocessing utilities
│   └── evaluate.py     # Evaluation Metrics
├── notebooks/          # Jupyter notebooks for exploration
├── tests/              # Unit tests
└── requirements.txt    # Dependencies
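
The two mapping files are what connect raw text to the Bi-LSTM-CRF's integer inputs and outputs. The sketch below shows how such mappings are typically used. It assumes each `.pkl` file holds a plain Python dict and that the vocabulary contains an `<UNK>` entry; both are assumptions, as are the helper names and the toy dictionaries.

```python
import pickle


def load_mapping(path):
    """Load a pickled dict, e.g. models/word2idx.pkl (assumed to be a plain dict)."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def encode_tokens(tokens, word2idx, unk_token="<UNK>"):
    """Map tokens to ids; unknown words fall back to the <UNK> id."""
    unk_id = word2idx[unk_token]
    return [word2idx.get(token, unk_id) for token in tokens]


def decode_tags(tag_ids, tag2idx):
    """Map predicted tag ids back to tag strings."""
    idx2tag = {idx: tag for tag, idx in tag2idx.items()}
    return [idx2tag[i] for i in tag_ids]


# Toy mappings standing in for the real pickled files.
word2idx = {"<PAD>": 0, "<UNK>": 1, "Apple": 2, "hired": 3}
tag2idx = {"O": 0, "B-ORG": 1}

print(encode_tokens(["Apple", "hired", "Zyxel"], word2idx))
print(decode_tags([1, 0, 0], tag2idx))
```

The unseen word `Zyxel` collapses to the `<UNK>` id, which is one reason a fixed-vocabulary model struggles with entities it never saw in training.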

💼 Business Use Cases

1. Financial Intelligence

Automatically scan thousands of earnings call transcripts to extract:

- Organizations: Competitors, partners, subsidiaries.
- Persons: Key executives, analysts.
- Locations: Emerging markets, factory locations.

2. Legal Compliance

Automate contract review by extracting:

- Parties: "Alpha Corp" vs "Beta Ltd".
- Jurisdictions: "State of Delaware", "London".

3. Healthcare Data Processing

De-identify medical records by detecting and masking:

- Patient Names: Masked to support HIPAA compliance.
- Locations: Hospital names, addresses.
- Organizations: Healthcare providers.
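
The masking step can be sketched as follows, assuming the NER model has already produced character-level `(start, end, label)` spans. The function name, the sample note, and the spans are hypothetical and only illustrate the replacement logic.

```python
def mask_entities(text, spans, mask_types=("PER", "LOC", "ORG")):
    """Replace each (start, end, label) character span with a [LABEL] placeholder."""
    pieces, cursor = [], 0
    for start, end, label in sorted(spans):
        if label not in mask_types or start < cursor:
            continue  # leave unmasked types in place and skip overlapping spans
        pieces.append(text[cursor:start])
        pieces.append(f"[{label}]")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


# Hypothetical model output for a made-up clinical note.
note = "John Smith was admitted to Mercy Hospital in Boston."
spans = [(0, 10, "PER"), (27, 41, "ORG"), (45, 51, "LOC")]
print(mask_entities(note, spans))
# [PER] was admitted to [ORG] in [LOC].
```

Passing a narrower `mask_types`, such as `("PER",)`, masks only patient names and leaves the other entities readable.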

4. HR & Recruitment

Extract structured information from resumes:

- Skills: Programming languages, certifications.
- Organizations: Previous employers.
- Locations: Work locations, willingness to relocate.

🚀 Deployment

Streamlit Cloud

This application is ready to deploy on Streamlit Cloud:

1. Fork this repository to your GitHub account.
2. Go to share.streamlit.io.
3. Click "New app".
4. Select your repository, branch, and deployment/app.py.
5. Click "Deploy".

See deployment/README.md for detailed deployment instructions.

🧪 Training Your Own Models

Train Bi-LSTM-CRF Model

```
python src/train.py --epochs 50 --batch-size 32
```

Fine-tune BERT Model

```
python src/train_bert.py --model bert-base-cased --epochs 3
```
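
A detail that matters when fine-tuning BERT for NER is that the tokenizer splits words into subword pieces, while the labels are defined per word. The sketch below shows one common alignment scheme: the first piece of each word keeps the label and every other position is set to `-100`, a value that PyTorch's cross-entropy loss ignores by default. This is not necessarily what `src/train_bert.py` does. The `word_ids` list mimics what Hugging Face fast tokenizers return from `word_ids()`, and the example pieces are made up.

```python
IGNORE_INDEX = -100  # skipped by PyTorch's cross-entropy loss by default


def align_labels(word_ids, word_labels):
    """Expand word-level label ids to subword-token positions.

    word_ids: for each subword token, the index of its source word,
              or None for special tokens such as [CLS] and [SEP].
    """
    aligned, previous = [], None
    for word_id in word_ids:
        if word_id is None or word_id == previous:
            aligned.append(IGNORE_INDEX)  # special token or continuation piece
        else:
            aligned.append(word_labels[word_id])  # first piece carries the label
        previous = word_id
    return aligned


# Hypothetical split of "Oppon works": [CLS] Op ##pon works [SEP]
word_ids = [None, 0, 0, 1, None]
word_labels = [1, 0]  # e.g. 1 = B-PER, 0 = O
print(align_labels(word_ids, word_labels))
# [-100, 1, -100, 0, -100]
```

An alternative is to copy the word label onto every piece; labeling only the first piece keeps evaluation at the word level.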

📈 Evaluation

Evaluate model performance on the CoNLL-2003 test set:

```
python src/evaluate.py --model models/ner_model.h5
```
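
CoNLL-2003 results are conventionally scored at the entity level: a prediction counts only if both the span boundaries and the entity type match the gold annotation exactly. The sketch below illustrates that scoring on BIO tags. It is not the code in `src/evaluate.py`; the function names are hypothetical, and a stray `I-` tag without a preceding `B-` is simply ignored here.

```python
def extract_spans(tags):
    """Convert a BIO tag sequence into a set of (start, end, type) spans."""
    spans, start, current = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel flushes the last span
        inside = tag.startswith("I-") and tag[2:] == current
        if not inside:
            if current is not None:
                spans.add((start, i, current))
                start, current = None, None
            if tag.startswith("B-"):
                start, current = i, tag[2:]
    return spans


def entity_prf(gold_sentences, pred_sentences):
    """Micro-averaged entity-level precision, recall, and F1."""
    true_pos = n_gold = n_pred = 0
    for gold, pred in zip(gold_sentences, pred_sentences):
        gold_spans, pred_spans = extract_spans(gold), extract_spans(pred)
        true_pos += len(gold_spans & pred_spans)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example: the person is correct, the organization is mistyped as a location.
gold = [["B-PER", "I-PER", "O", "B-ORG"]]
pred = [["B-PER", "I-PER", "O", "B-LOC"]]
print(entity_prf(gold, pred))
# (0.5, 0.5, 0.5)
```

Token-level accuracy on the same example would be 75%, which shows why entity-level scores are the stricter and more informative measure for NER.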

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

👨‍💻 Author

Victor Collins Oppon
NLP Engineer | Data Scientist | FCCA, MBA, BSc

Portfolio website coming soon!

🙏 Acknowledgments

- CoNLL-2003 Dataset: Erik F. Tjong Kim Sang and Fien De Meulder
- HuggingFace Transformers: For the excellent BERT implementation
- Streamlit: For the intuitive web framework

Built with ❤️ for the NLP Community
