Highly accurate, Fast and Reliable business card reader powered by Google Gemini AI

PINAK-WORK/AI-Card-Scanner

AI Card Scanner

A small Python script that sends a business card image to Google Gemini AI and returns the contact details as clean, usable JSON. It works with a single image or a merged image (front and back), and the prompt tells the model to tidy the data: multi-line addresses are merged into one line and noise such as QR code helper text is ignored.

Get a free Google Gemini API key: https://aistudio.google.com/app/apikey

Quick Start

1. Install Dependencies

# Create a new directory
mkdir ai_card_scanner

# Move into the directory
cd ai_card_scanner

# Install pip and venv support for Python 3 (Debian/Ubuntu)
sudo apt install python3-pip python3-venv -y

# Create virtual environment
python3 -m venv ai_env

# Install required package
./ai_env/bin/pip install requests

2. Setup Script

Create scanner.py and add your API key:

import base64
import requests
import json
import sys
import os

API_KEY = "YOUR_API_KEY_HERE"  # Replace with your actual key

def encode_image(image_path):
    """Encodes an image file to base64 string"""
    if not os.path.exists(image_path):
        print(f"Error: File not found -> {image_path}")
        return None
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

def scan_card(image_path):
    url = f"https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key={API_KEY}"

    b64_image = encode_image(image_path)
    if not b64_image:
        return

    print("... Scanning with Smart Logic ...")

    # --- THE ADVANCED PROMPT ---
    # This prompt tells the AI how to interpret visual cues, not just read text.
    smart_prompt = """
    You are an advanced Business Card Intelligence Agent. 
    Analyze the image (which may contain front and back views) and extract contact details into a strict JSON format.

    ### REASONING LOGIC:
    1. **Company Name vs Person Name:** - The Company Name is often the largest text or next to a logo. 
       - The Person's Name is usually near a job title (e.g., CEO, Manager, Engineer).
    2. **Phone Numbers:** - Look for labels like "M:", "Ph:", "T:", "C:", or international codes (+91, +1).
       - Ignore fax numbers unless requested.
    3. **Addresses:** - Merge multi-line addresses into a single line. 
       - Look for city/state/zip codes to confirm it is an address.
    4. **Noise Filtering:** - Ignore text like "Scan me", QR code helper text, or printers' marks.

    ### REQUIRED JSON STRUCTURE:
    {
        "companyName": "string (or null)",
        "personNames": ["string"],
        "designation": "string (Job Title, e.g. Manager)",
        "contactNumbers": ["string (Standardized format)"],
        "emails": ["string"],
        "websites": ["string"],
        "address": "string (Full address in one line)"
    }

    ### FINAL RULES:
    - Return ONLY valid JSON.
    - Do not use Markdown (no ```json).
    - If a field is missing, use null (not empty string).
    """

    payload = {
        "contents": [{
            "parts": [
                {"text": smart_prompt},
                {
                    "inline_data": {
                        "mime_type": "image/jpeg",  # change to "image/png" when scanning a PNG
                        "data": b64_image
                    }
                }
            ]
        }]
    }

    try:
        response = requests.post(
            url,
            headers={'Content-Type': 'application/json'},
            json=payload,
            timeout=60,  # do not hang forever on a stalled connection
        )
    except requests.RequestException as e:
        print(f"Failed to connect: {e}")
        return

    if response.status_code != 200:
        print(f"Error {response.status_code}: {response.text}")
        return

    try:
        result = response.json()
        raw_text = result['candidates'][0]['content']['parts'][0]['text']
        # Strip Markdown code fences in case the model adds them anyway
        clean_text = raw_text.replace("```json", "").replace("```", "").strip()

        # Verify JSON is valid before printing
        parsed = json.loads(clean_text)
    except (KeyError, IndexError, TypeError, ValueError) as e:
        print(f"Unexpected response format: {e}")
        return

    print("\n--- RESULT ---")
    print(json.dumps(parsed, indent=4))

if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python3 scanner.py card.jpg")
    else:
        scan_card(sys.argv[1])
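
Hard-coding the key in scanner.py makes it easy to leak when the file is shared or committed. Below is a minimal sketch of an alternative, assuming you export the key in an environment variable named GEMINI_API_KEY (a name chosen here for illustration; the script above does not read it). The sketch also sends the key in the x-goog-api-key header, which the Gemini API accepts in place of the ?key= query parameter. The helper names load_api_key and build_request are hypothetical.

```python
import os

MODEL = "gemini-2.5-flash"  # same model the script above calls


def load_api_key(env_var="GEMINI_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    GEMINI_API_KEY is an assumed variable name for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key


def build_request(api_key):
    """Return (url, headers) with the key in a header instead of the URL."""
    url = (
        "https://generativelanguage.googleapis.com/v1beta/models/"
        f"{MODEL}:generateContent"
    )
    headers = {
        "Content-Type": "application/json",
        "x-goog-api-key": api_key,
    }
    return url, headers
```

To try it, set the variable in your terminal, then call url, headers = build_request(load_api_key()) and pass both to requests.post in scan_card.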

3. Run

  1. Place your business card image (e.g., card.jpg) in the same folder as the script.
  2. Run this command in your terminal:
./ai_env/bin/python3 scanner.py card.jpg
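
The script labels every upload as image/jpeg, so a PNG scan would be sent with the wrong type. A minimal sketch of guessing the type from the file extension with Python's standard mimetypes module follows. The helper name guess_image_mime and the SUPPORTED set are assumptions for illustration; check the Gemini documentation for the authoritative list of accepted formats.

```python
import mimetypes

# Image formats assumed acceptable for this sketch (not an official list).
SUPPORTED = {"image/jpeg", "image/png", "image/webp"}


def guess_image_mime(image_path):
    """Guess the MIME type from the file extension (contents are not read)."""
    mime, _ = mimetypes.guess_type(image_path)
    if mime not in SUPPORTED:
        raise ValueError(f"Unsupported or unknown image type: {image_path}")
    return mime
```

In scan_card, the payload line would then read "mime_type": guess_image_mime(image_path).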

Example Output

{
    "companyName": "XYZ Technologies",
    "personNames": ["PINAK TILAVAT"],
    "designation": "App Developer",
    "contactNumbers": [""],
    "emails": ["real.pinak@gmail.com"],
    "websites": ["https://pinak.surge.sh/"],
    "address": "In your Hearts (*illegally)"
}
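
The model does not always follow the requested structure exactly; the example above, for instance, has an empty string inside contactNumbers. Below is a minimal sketch of a post-processing step, assuming the field names from the prompt. normalize_card and EXPECTED_FIELDS are hypothetical names and are not part of scanner.py.

```python
# Field names from the prompt; the default shows whether a list is expected.
EXPECTED_FIELDS = {
    "companyName": None,
    "personNames": [],
    "designation": None,
    "contactNumbers": [],
    "emails": [],
    "websites": [],
    "address": None,
}


def normalize_card(parsed):
    """Return a dict with every expected key, dropping blank values."""
    card = {}
    for field, default in EXPECTED_FIELDS.items():
        value = parsed.get(field)
        if isinstance(default, list):
            # Accept a single string where a list is expected
            if isinstance(value, str):
                value = [value]
            items = value if isinstance(value, list) else []
            card[field] = [s.strip() for s in items
                           if isinstance(s, str) and s.strip()]
        else:
            # Blank or missing text fields become None, as the prompt asks
            text = value.strip() if isinstance(value, str) else ""
            card[field] = text or None
    return card
```

Calling normalize_card(parsed) right after json.loads gives downstream code a predictable shape: list fields are always lists and missing text fields are always None.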

❓ Troubleshooting

Error: "ModuleNotFoundError: No module named 'requests'"
Fix: You probably ran python3 scanner.py, which uses the system Python. The requests package is installed only inside the ai_env virtual environment, so run ./ai_env/bin/python3 scanner.py instead.

Error: "File not found -> card.jpg"
Fix: Check that the image name and path are correct. File names are case-sensitive on Linux, so card.jpg is not the same as Card.jpg.
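
If mismatched capitals come up often, the lookup can be made forgiving. This is a minimal sketch, assuming it is acceptable to pick the first file in the folder whose name matches when case is ignored. find_image is a hypothetical helper and is not part of scanner.py.

```python
import os


def find_image(image_path):
    """Return the path to an existing file, ignoring case in the file name.

    Returns None when nothing in the directory matches.
    """
    if os.path.exists(image_path):
        return image_path
    folder, name = os.path.split(image_path)
    folder = folder or "."
    if not os.path.isdir(folder):
        return None
    for candidate in os.listdir(folder):
        if candidate.lower() == name.lower():
            return os.path.join(folder, candidate)
    return None
```

In the script, scan_card(find_image(sys.argv[1]) or sys.argv[1]) would try the forgiving lookup first and fall back to the original path, so the existing "File not found" message still appears when nothing matches.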
