This project demonstrates an automated lead cleaning and enrichment pipeline using OpenAI GPT-4, Databricks, and the Salesforce Bulk API.
It extracts raw leads from a CSV file, uses GPT-4 to clean and validate fields, and inserts them into Salesforce in batch.
This is ideal for:
- CRM & Marketing teams managing bulk lead imports
- Consultants automating messy data cleanup
- AI-enhanced enrichment pipelines for customer data
databricks-openai-lead-processing/
βββ notebooks/ # Python notebook with full pipeline
βββ sample_data/ # Sample leads CSV file
βββ architecture_decisions.md # Design justifications
βββ README.md # This file
- Databricks (Python)
- OpenAI GPT-4 (chat.completions API)
- Salesforce API
- Pandas, Requests, Simple-Salesforce
- Upload your
leads_sample.csvto thesample_data/directory. - Create a Databricks Secret Scope named
salesforce:openai_api_keyusername,password,security_tokenfor Salesforce
- Install required Python packages:
%pip install simple-salesforce pandas requests openai
- Loads CSV data into a Pandas DataFrame
- Sends each row to OpenAI for data cleanup
- Transforms GPT output into Salesforce-ready format
- Batches cleaned leads and inserts them via API
- Logs status and verifies results
+--------------------+
| leads_sample.csv |
+----------+---------+
β
+--------+--------+
| Databricks Notebook |
+--------+--------+
β
+-------+--------+
| OpenAI GPT-4 API |
+-------+--------+
β
+-------+--------+
| Cleaned DataFrame |
+-------+--------+
β
+-------+--------+
| Salesforce Bulk API |
+--------------------+
- CRM Data Model Blueprint β View Repo
- Medium Post β Coming Soon
- YouTube Demo β Coming Soon
Built by Emilio Taylor β’ VisaoEnhance