# validate_user.py (jadhav045/DeepStack-AILM-Assignment, main branch)

import os
import json
import sys
from openai import OpenAI
from dotenv import load_dotenv

# 1. Load Environment Variables
load_dotenv()

# 2. Initialize the Client
client = OpenAI(
    api_key=os.getenv("LLM_API_KEY"),
    base_url=os.getenv("LLM_BASE_URL")
)

MODEL_NAME = os.getenv("LLM_MODEL", "gpt-3.5-turbo")

def validate_user_profile(user_data):
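    """
    Validate a user profile dict with an LLM and return the parsed verdict.

    The model is asked (via the system prompt and JSON-object response mode)
    to reply with {"is_valid": bool, "errors": [...], "warnings": [...]}.
    Any failure (API error, unparsable reply) is reported in that same shape
    with "is_valid" set to False.
    """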

    # System prompt: the LLM is the only validator.
    # The phone rule explicitly covers number length.
    system_prompt = """
You are a strict Data Validation Assistant.

Your task is to validate a user profile JSON against real-world data standards using reasoning only.
You must NOT use regex, validation libraries, or hardcoded lookup tables.
The LLM itself is the only validator.

---

### CRITICAL INSTRUCTIONS

- Output ONLY valid JSON.
- Do NOT explain why any field is valid.
- Report ONLY rule violations.
- Do NOT infer, guess, or fabricate missing data.
- Apply validation rules ONLY to fields that are present in the input.
- Do NOT invent new fields or rules.
- If multiple rules are violated, report ALL of them.
- All messages must be grounded strictly in the provided input values.

---

### VALIDATION RULES

#### 1. ERRORS (Invalid Data — Must Be Fixed)

Add a message to the "errors" list ONLY if a present field clearly violates a real-world standard.

- **name**
  Must be a non-empty string.

- **email**
  Must follow a valid, real-world email address format.

- **age**
  Must be a valid, non-negative number.

- **country**
  Must follow the ISO-3166-1 alpha-2 country code standard.

- **phone**
  Must follow the E.164 international phone number standard, including:
  - Proper use of the '+' prefix
  - A valid country calling code
  - A plausible total length for that country code

If a phone number is too short, too long, malformed, or not plausible under E.164, it is an ERROR.

---

#### 2. WARNINGS (Valid but Risky Data)

Add a message to the "warnings" list ONLY if the data is valid but potentially risky.

- **name**
  Very short names may be risky.

- **age**
  Values indicating a minor may be risky.

- **email**
  Disposable or temporary email domains may be risky.

- **phone**
  A phone number that is valid E.164 but whose country calling code does not align with the provided country field may be risky.

---

### RESPONSE FORMAT (STRICT)

Return a single JSON object in the following format:

{
  "is_valid": boolean,
  "errors": string[],
  "warnings": string[]
}

Rules:
- "is_valid" must be true ONLY if "errors" is empty.
- Warnings must NOT make "is_valid" false.
- Do NOT include any fields outside this schema.

    """

    try:
        response = client.chat.completions.create(
            model=MODEL_NAME,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": json.dumps(user_data)}
            ],
            response_format={"type": "json_object"},
            temperature=0
        )

        content = response.choices[0].message.content
        return json.loads(content)

    except Exception as e:
        return {
            "is_valid": False,
            "errors": [f"Internal System Error: {str(e)}"],
            "warnings": []
        }
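
# Illustrative sketch, not part of the original flow. The prompt asks the
# model for a fixed schema, but JSON-object mode does not by itself guarantee
# the keys or value types of the reply. A hypothetical helper such as
# normalize_result (its name and behavior are assumptions) could coerce the
# parsed reply into the documented shape before it reaches callers.
# It checks only the shape of the model's reply, never the user data itself.
def normalize_result(raw):
    """Coerce a parsed LLM reply into {is_valid, errors, warnings}."""
    if not isinstance(raw, dict):
        return {
            "is_valid": False,
            "errors": ["Internal System Error: model reply was not a JSON object"],
            "warnings": []
        }

    def as_string_list(value):
        # Keep list items as strings; anything that is not a list becomes empty.
        if not isinstance(value, list):
            return []
        return [str(item) for item in value]

    errors = as_string_list(raw.get("errors"))
    warnings = as_string_list(raw.get("warnings"))
    # Conservative reading of the prompt's rule: report valid only when the
    # model said so explicitly and the errors list is empty.
    is_valid = (not errors) and raw.get("is_valid") is True
    return {"is_valid": is_valid, "errors": errors, "warnings": warnings}

# One possible use inside validate_user_profile (an assumption, not the
# author's code): return normalize_result(json.loads(content))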

# --- MAIN EXECUTION BLOCK (CLI) ---
if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python validate_user.py <input_file.json>")
        sys.exit(1)

    file_path = sys.argv[1]

    try:
        # Read as UTF-8 explicitly so decoding does not depend on the platform default.
        with open(file_path, 'r', encoding='utf-8') as f:
            user_data = json.load(f)

        result = validate_user_profile(user_data)
        print(json.dumps(result, indent=2))

    except FileNotFoundError:
        print(json.dumps({
            "is_valid": False,
            "errors": [f"File not found: {file_path}"],
            "warnings": []
        }, indent=2))
    except json.JSONDecodeError:
        print(json.dumps({
            "is_valid": False,
            "errors": ["Invalid JSON format in input file"],
            "warnings": []
        }, indent=2))