Skip to content

visaoenhance/databricks-openai-lead-processing

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

2 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation


Databricks + OpenAI Lead Processing Pipeline

License Last Commit Built By

This project demonstrates an automated lead cleaning and enrichment pipeline using OpenAI GPT-4, Databricks, and the Salesforce Bulk API.

It extracts raw leads from a CSV file, uses GPT-4 to clean and validate fields, and inserts them into Salesforce in batch.


πŸ” Use Case

This is ideal for:

  • CRM & Marketing teams managing bulk lead imports
  • Consultants automating messy data cleanup
  • AI-enhanced enrichment pipelines for customer data

πŸ“ Folder Structure

databricks-openai-lead-processing/
β”œβ”€β”€ notebooks/                  # Python notebook with full pipeline
β”œβ”€β”€ sample_data/               # Sample leads CSV file
β”œβ”€β”€ architecture_decisions.md  # Design justifications
β”œβ”€β”€ README.md                  # This file

🧱 Tech Stack

  • Databricks (Python)
  • OpenAI GPT-4 (chat.completions API)
  • Salesforce API
  • Pandas, Requests, Simple-Salesforce

πŸ”§ Setup

  1. Upload your leads_sample.csv to the sample_data/ directory.
  2. Create a Databricks Secret Scope named salesforce:
    • openai_api_key
    • username, password, security_token for Salesforce
  3. Install required Python packages:
    %pip install simple-salesforce pandas requests openai

▢️ How It Works

  1. Loads CSV data into a Pandas DataFrame
  2. Sends each row to OpenAI for data cleanup
  3. Transforms GPT output into Salesforce-ready format
  4. Batches cleaned leads and inserts them via API
  5. Logs status and verifies results

🧠 Diagram

+--------------------+
|   leads_sample.csv  |
+----------+---------+
           ↓
  +--------+--------+
  | Databricks Notebook |
  +--------+--------+
           ↓
   +-------+--------+
   | OpenAI GPT-4 API |
   +-------+--------+
           ↓
   +-------+--------+
   | Cleaned DataFrame |
   +-------+--------+
           ↓
   +-------+--------+
   | Salesforce Bulk API |
   +--------------------+

πŸ“œ License

MIT


πŸ”— Related Projects

  • CRM Data Model Blueprint β†’ View Repo
  • Medium Post β†’ Coming Soon
  • YouTube Demo β†’ Coming Soon

✍️ Author

Built by Emilio Taylor β€’ VisaoEnhance