
merge: optimize AI #18

Merged
galagyy merged 10 commits into main from ai-v5 on Oct 24, 2025


@galagyy commented Oct 24, 2025

  • Updated AI speed.
  • Updated AI documentation.
  • Reduced potential output for quality.

galagyy and others added 3 commits October 23, 2025 21:19
- Updated model to use a better dataset.
- Removed unneeded __init__.py files.

Co-Authored-By: Om-Kasar <188000396+Om-Kasar@users.noreply.github.com>
- Removed unneeded endpoints.
- Kept the health and generation endpoints.
- Updated to a newer, more concise model.
- Reduced dataset size for more accurate results.

Co-Authored-By: Om-Kasar <188000396+Om-Kasar@users.noreply.github.com>
- Updated Python documentation.
- Removed unneeded Python files (test_recommend.py)
- Updated code version.

Co-Authored-By: Om-Kasar <188000396+Om-Kasar@users.noreply.github.com>
@galagyy requested a review from Copilot October 24, 2025 05:20
Copilot AI left a comment

Pull Request Overview

This PR optimizes the AI career recommendation system by updating the model version to 4.0.0, implementing memory-efficient calibration, and streamlining the API to essential endpoints only.

Key Changes:

  • Updated model version from 2.0.0 to 4.0.0 with optimized hyperparameters (200 trees, max depth 20)
  • Implemented memory-efficient probability calibration using sigmoid method instead of isotonic
  • Reduced API surface by removing batch processing and auxiliary endpoints
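
As background for the calibration bullet: sigmoid (Platt) calibration fits only two numbers, a slope and an intercept, while isotonic calibration stores a piecewise-constant mapping whose size grows with the data. The pure-Python sketch below illustrates the sigmoid idea on made-up scores. It is not the code from this PR; `fit_sigmoid_calibration`, the learning rate, the step count, and the sample data are all assumptions chosen for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_sigmoid_calibration(scores, labels, lr=0.5, steps=2000):
    """Fit p = sigmoid(a * score + b) by gradient descent on the mean log-loss.

    Hypothetical helper for illustration; returns the two parameters (a, b).
    """
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y  # derivative of the log-loss w.r.t. the logit
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


# Made-up raw classifier scores and true binary labels (not from the PR).
raw_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
true_labels = [0, 0, 0, 1, 0, 1, 1, 1]

a, b = fit_sigmoid_calibration(raw_scores, true_labels)
calibrated = [sigmoid(a * s + b) for s in raw_scores]
```

Whatever the training set size, the fitted calibrator here is just the pair `(a, b)`, which is one plausible reading of why the sigmoid method is described as the memory-efficient choice.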

Reviewed Changes

Copilot reviewed 8 out of 13 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| ai/careers/run.py | Version bump to 4.0.0 |
| ai/careers/test_recommend.py | Removed legacy test script |
| ai/careers/examples.py | Removed example demonstrations file |
| ai/careers/app/services/create_dataset.py | Complete rewrite: replaced dataset generation with production model implementation including calibration and memory error handling |
| ai/careers/app/services/career_recommender.py | Added probability calibration, confidence boosting, and memory-efficient training with fallback mechanisms |
| ai/careers/app/routes/career_routes.py | Simplified API from POST to GET endpoint, removed batch processing and auxiliary endpoints |
| ai/careers/app/main.py | Added model type detection logic for calibrated models |
| ai/careers/app/config.py | Version update and removed example .env template |


Comment on lines +936 to +938
model_path = 'C:\\repos\\Aspira\\ai\\app\\services\\career_recommender_model.pkl'

if not Path(model_path).exists():
Copilot AI Oct 24, 2025

Hardcoded absolute Windows path detected. This will fail on non-Windows systems and different development environments. Use relative paths or environment variables instead, e.g., Path(__file__).parent / 'models' / 'career_recommender_model.pkl' or retrieve from configuration.

Suggested change
- model_path = 'C:\\repos\\Aspira\\ai\\app\\services\\career_recommender_model.pkl'
- if not Path(model_path).exists():
+ model_path = Path(__file__).parent / 'career_recommender_model.pkl'
+ if not model_path.exists():
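
A minimal sketch of the "relative path or environment variable" idea from this comment, assuming a hypothetical `CAREER_MODEL_PATH` variable and a hypothetical helper named `resolve_model_path`; only the model filename comes from the PR.

```python
import os
from pathlib import Path

MODEL_FILENAME = "career_recommender_model.pkl"  # filename quoted in the review comment
MODEL_PATH_ENV = "CAREER_MODEL_PATH"  # hypothetical environment variable name


def resolve_model_path(base_dir: Path) -> Path:
    """Return the env override if set, otherwise base_dir / MODEL_FILENAME."""
    override = os.environ.get(MODEL_PATH_ENV)
    if override:
        return Path(override)
    return base_dir / MODEL_FILENAME
```

Inside a module this would typically be called as `resolve_model_path(Path(__file__).resolve().parent)`, so the default follows the file wherever the repository is checked out.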

Copilot uses AI. Check for mistakes.
Comment on lines +930 to +978
+ logger.info("="*70)
+ logger.info("CAREER RECOMMENDER - PREDICTION TEST")
+ logger.info("="*70)

- print(f"\n📊 Sample Preview (first 5 rows):")
- print(df.head(5).to_string())
+ try:
+     logger.info("Loading pre-trained model...")
+     model_path = 'C:\\repos\\Aspira\\ai\\app\\services\\career_recommender_model.pkl'

+     if not Path(model_path).exists():
+         logger.error(f"Model file not found: {model_path}")
+         logger.error("Please train the model first by running the training script.")
+         sys.exit(1)

+     recommender = CareerRecommender.load_from_file(model_path)
+     logger.info("Model loaded successfully!")
+     logger.info("="*70)

+     # Define test case
+     test_input = {
+         'interests': ['Literature', 'History & Culture'],
+         'traits': ['Patient', 'Empathetic', 'Organized'],
+         'skills': ['Public Speaking', 'Leadership', 'Time Management'],
+         'goal': 'Personal growth and skill development'
+     }

+     # Run prediction
+     logger.info("Running prediction...")
+     logger.info("-"*70)

+     result = recommender.predict(test_input, top_k=3)

+     logger.info(f"PREDICTION: {result['top_career']}")
+     logger.info(f"CONFIDENCE: {result['confidence_percent']}")
+     logger.info(f"")

- # Save to CSV
- print(f"\n💾 Saving to CSV...")
- csv_filename = 'career_prediction_dataset.csv'
- df.to_csv(csv_filename, index=False)
+     logger.info(f"Top 3 Careers:")
+     for pred in result['top_predictions']:
+         logger.info(f" {pred['rank']}. {pred['career']}: {pred['confidence']}")

- print(f"\n{'='*70}")
- print("✅ SUCCESS!")
- print(f"{'='*70}")
- print(f"\n📁 File saved: {csv_filename}")
- print(f"📦 File size: ~{df.memory_usage(deep=True).sum() / (1024*1024):.2f} MB")
- print(f"\n🎓 Ready for machine learning!")
- print(f" • Perfect for Random Forest")
- print(f" • Suitable for Neural Networks")
- print(f" • Compatible with XGBoost, SVM, etc.")
- print(f"\n{'='*70}\n")
\ No newline at end of file
+     logger.info("="*70)
+     logger.info("PREDICTION COMPLETED SUCCESSFULLY!")
+ except FileNotFoundError as e:
+     logger.error(f"Error: {e}")
+ except Exception as e:
+     logger.error(f"An error occurred: {e}")
+     import traceback
+     logger.error(traceback.format_exc())
\ No newline at end of file
Copilot AI Oct 24, 2025

Test/demo code found at module level in production file. This code at lines 928-978 will execute on module import, which is inappropriate for a production module. Move this test code to a separate test file or wrap it in a proper if __name__ == '__main__': block (which is already present but appears after the docstring).

Suggested change
- logger.info("="*70)
- logger.info("CAREER RECOMMENDER - PREDICTION TEST")
- logger.info("="*70)
- print(f"\n📊 Sample Preview (first 5 rows):")
- print(df.head(5).to_string())
- try:
-     logger.info("Loading pre-trained model...")
-     model_path = 'C:\\repos\\Aspira\\ai\\app\\services\\career_recommender_model.pkl'
-     if not Path(model_path).exists():
-         logger.error(f"Model file not found: {model_path}")
-         logger.error("Please train the model first by running the training script.")
-         sys.exit(1)
-     recommender = CareerRecommender.load_from_file(model_path)
-     logger.info("Model loaded successfully!")
-     logger.info("="*70)
-     # Define test case
-     test_input = {
-         'interests': ['Literature', 'History & Culture'],
-         'traits': ['Patient', 'Empathetic', 'Organized'],
-         'skills': ['Public Speaking', 'Leadership', 'Time Management'],
-         'goal': 'Personal growth and skill development'
-     }
-     # Run prediction
-     logger.info("Running prediction...")
-     logger.info("-"*70)
-     result = recommender.predict(test_input, top_k=3)
-     logger.info(f"PREDICTION: {result['top_career']}")
-     logger.info(f"CONFIDENCE: {result['confidence_percent']}")
-     logger.info(f"")
- # Save to CSV
- print(f"\n💾 Saving to CSV...")
- csv_filename = 'career_prediction_dataset.csv'
- df.to_csv(csv_filename, index=False)
-     logger.info(f"Top 3 Careers:")
-     for pred in result['top_predictions']:
-         logger.info(f" {pred['rank']}. {pred['career']}: {pred['confidence']}")
- print(f"\n{'='*70}")
- print("✅ SUCCESS!")
- print(f"{'='*70}")
- print(f"\n📁 File saved: {csv_filename}")
- print(f"📦 File size: ~{df.memory_usage(deep=True).sum() / (1024*1024):.2f} MB")
- print(f"\n🎓 Ready for machine learning!")
- print(f" • Perfect for Random Forest")
- print(f" • Suitable for Neural Networks")
- print(f" • Compatible with XGBoost, SVM, etc.")
- print(f"\n{'='*70}\n")
\ No newline at end of file
-     logger.info("="*70)
-     logger.info("PREDICTION COMPLETED SUCCESSFULLY!")
- except FileNotFoundError as e:
-     logger.error(f"Error: {e}")
- except Exception as e:
-     logger.error(f"An error occurred: {e}")
-     import traceback
-     logger.error(traceback.format_exc())
+ if __name__ == '__main__':
+     logger.info("="*70)
+     logger.info("CAREER RECOMMENDER - PREDICTION TEST")
+     logger.info("="*70)
+     try:
+         logger.info("Loading pre-trained model...")
+         model_path = 'C:\\repos\\Aspira\\ai\\app\\services\\career_recommender_model.pkl'
+         if not Path(model_path).exists():
+             logger.error(f"Model file not found: {model_path}")
+             logger.error("Please train the model first by running the training script.")
+             sys.exit(1)
+         recommender = CareerRecommender.load_from_file(model_path)
+         logger.info("Model loaded successfully!")
+         logger.info("="*70)
+         # Define test case
+         test_input = {
+             'interests': ['Literature', 'History & Culture'],
+             'traits': ['Patient', 'Empathetic', 'Organized'],
+             'skills': ['Public Speaking', 'Leadership', 'Time Management'],
+             'goal': 'Personal growth and skill development'
+         }
+         # Run prediction
+         logger.info("Running prediction...")
+         logger.info("-"*70)
+         result = recommender.predict(test_input, top_k=3)
+         logger.info(f"PREDICTION: {result['top_career']}")
+         logger.info(f"CONFIDENCE: {result['confidence_percent']}")
+         logger.info(f"")
+         logger.info(f"Top 3 Careers:")
+         for pred in result['top_predictions']:
+             logger.info(f" {pred['rank']}. {pred['career']}: {pred['confidence']}")
+         logger.info("="*70)
+         logger.info("PREDICTION COMPLETED SUCCESSFULLY!")
+     except FileNotFoundError as e:
+         logger.error(f"Error: {e}")
+     except Exception as e:
+         logger.error(f"An error occurred: {e}")
+         import traceback
+         logger.error(traceback.format_exc())
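
The same guard can be paired with a small entry-point function so the demo logic stays importable and testable without running on import. This is a sketch only: `run_prediction_demo` is a hypothetical name, and the model loading and prediction steps are left out.

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def run_prediction_demo(model_path: Path) -> int:
    """Hypothetical demo entry point; returns a process exit code."""
    if not model_path.exists():
        logger.error("Model file not found: %s", model_path)
        return 1
    logger.info("Model found at %s; a real demo would load it and predict here.", model_path)
    return 0


if __name__ == "__main__":
    # Runs only when the file is executed as a script, never on import.
    logging.basicConfig(level=logging.INFO)
    exit_code = run_prediction_demo(Path("career_recommender_model.pkl"))
    # A real script would typically pass exit_code to sys.exit().
```

Returning an exit code instead of calling `sys.exit(1)` inside the function keeps the early exit out of any code that imports the module.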

Comment on lines +111 to +114
n_estimators: Number of trees in the forest (default: 500)
max_depth: Maximum depth of trees (default: 30)
min_samples_split: Minimum samples required to split (default: 2)
min_samples_leaf: Minimum samples required at leaf (default: 1)
Copilot AI Oct 24, 2025

Documentation is outdated. The docstring states defaults of 500 trees and max_depth of 30, but the actual defaults from OPTIMAL_PARAMS are 200 and 20 respectively (lines 65-66). Update the docstring to reflect the correct default values.

Suggested change
- n_estimators: Number of trees in the forest (default: 500)
- max_depth: Maximum depth of trees (default: 30)
- min_samples_split: Minimum samples required to split (default: 2)
- min_samples_leaf: Minimum samples required at leaf (default: 1)
+ n_estimators: Number of trees in the forest (default: 200)
+ max_depth: Maximum depth of trees (default: 20)
+ min_samples_split: Minimum samples required to split (default: 5)
+ min_samples_leaf: Minimum samples required at leaf (default: 2)
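
One way to keep documented defaults from drifting again is to generate them from the same dictionary the code reads. The sketch below assumes `OPTIMAL_PARAMS` holds exactly the four values named in this suggestion; the real dictionary may contain more keys, and both helper functions are hypothetical.

```python
# Values taken from the review suggestion; the real OPTIMAL_PARAMS may differ.
OPTIMAL_PARAMS = {
    "n_estimators": 200,
    "max_depth": 20,
    "min_samples_split": 5,
    "min_samples_leaf": 2,
}


def resolve_hyperparameters(**overrides):
    """Merge caller overrides onto OPTIMAL_PARAMS, rejecting unknown keys."""
    unknown = set(overrides) - set(OPTIMAL_PARAMS)
    if unknown:
        raise ValueError(f"Unknown hyperparameters: {sorted(unknown)}")
    return {**OPTIMAL_PARAMS, **overrides}


def describe_defaults() -> str:
    """Render the defaults for a docstring or log line straight from the dict."""
    return "\n".join(
        f"{name}: default {value}" for name, value in OPTIMAL_PARAMS.items()
    )
```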

'goal': profile.goal
'interests': hobbies,
'traits': traits,
'skills': hobbies,
Copilot AI Oct 24, 2025

The 'skills' field is incorrectly mapped to 'hobbies'. This duplicates hobby data as skills, which will produce incorrect predictions since the model expects distinct skill information. The API accepts hobbies but doesn't accept skills as a parameter, yet the model requires a skills field. Either add a skills parameter to the endpoint or document why skills are being mapped to hobbies.
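
A sketch of the first option, where skills arrive as their own field. `ProfileRequest` and `build_model_input` are hypothetical names; only the four model keys (`interests`, `traits`, `skills`, `goal`) come from the PR.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class ProfileRequest:
    """Hypothetical request shape with skills as a separate field."""
    hobbies: List[str]
    traits: List[str]
    goal: str
    skills: List[str] = field(default_factory=list)


def build_model_input(profile: ProfileRequest) -> Dict[str, Union[str, List[str]]]:
    """Map the request to the model's keys without reusing hobbies as skills."""
    if not profile.skills:
        raise ValueError("skills must be provided; hobbies are not a substitute")
    return {
        "interests": profile.hobbies,
        "traits": profile.traits,
        "skills": profile.skills,
        "goal": profile.goal,
    }
```

Failing loudly on missing skills is a design choice for this sketch; a real endpoint might instead accept an empty list and document the effect on prediction quality.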

Comment on lines +78 to +79
if hasattr(model.model, 'estimator'):
n_estimators = model.model.estimator.n_estimators
Copilot AI Oct 24, 2025

Incorrect attribute access for CalibratedClassifierCV. The calibrated model uses base_estimator (or estimator_ in older sklearn versions), not estimator. This will fail to retrieve n_estimators from calibrated models. Use model.model.base_estimator.n_estimators or check for base_estimator attribute.

Suggested change
- if hasattr(model.model, 'estimator'):
-     n_estimators = model.model.estimator.n_estimators
+ if hasattr(model.model, 'base_estimator'):
+     n_estimators = getattr(model.model.base_estimator, 'n_estimators', "N/A")
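
Which attribute exists depends on the installed scikit-learn: `CalibratedClassifierCV` took `base_estimator` through version 1.1, gained `estimator` in 1.2 (with `base_estimator` deprecated), and dropped `base_estimator` in 1.4. A version-tolerant lookup can simply try both. The sketch below uses stand-in objects so it runs without scikit-learn, and `get_n_estimators` is a hypothetical helper rather than anything in the PR.

```python
from types import SimpleNamespace


def get_n_estimators(calibrated_model, default="N/A"):
    """Read n_estimators from a calibrated wrapper or a bare forest-like model."""
    for attr in ("estimator", "base_estimator"):
        inner = getattr(calibrated_model, attr, None)
        if inner is not None and hasattr(inner, "n_estimators"):
            return inner.n_estimators
    # Fall back to the object itself, e.g. an uncalibrated random forest.
    return getattr(calibrated_model, "n_estimators", default)


# Stand-in objects so the sketch runs without scikit-learn installed.
forest = SimpleNamespace(n_estimators=200)
newer_wrapper = SimpleNamespace(estimator=forest)
older_wrapper = SimpleNamespace(base_estimator=forest)
```

Trying `estimator` first also covers the 1.2 and 1.3 releases, where `base_estimator` still exists but holds a placeholder string instead of the wrapped model.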

galagyy and others added 3 commits October 23, 2025 22:24
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-Authored-By: Om-Kasar <188000396+Om-Kasar@users.noreply.github.com>
- Fixed inputted keys.
- Fixed documentation.
- Fixed n_estimator.

Co-Authored-By: Om-Kasar <188000396+Om-Kasar@users.noreply.github.com>
@galagyy requested a review from Copilot October 24, 2025 05:41
Copilot AI left a comment

Pull Request Overview

Copilot reviewed 8 out of 13 changed files in this pull request and generated 3 comments.




  Author: Aspira AI Team
- Version: 2.0.0
+ Version: 3.0.0
Copilot AI Oct 24, 2025

Version mismatch: The file header shows version 3.0.0, but MODEL_VERSION constant on line 61 and other files (run.py, main.py, config.py) show version 4.0.0. Update the header to version 4.0.0 for consistency.

Suggested change
- Version: 3.0.0
+ Version: 4.0.0
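
A mismatch like this can be caught mechanically by comparing the header line against the constant. The sketch below assumes the header always uses the `Version: X.Y.Z` form shown here and treats `MODEL_VERSION` as the single source of truth; `header_version` and `version_matches` are hypothetical helpers.

```python
import re

MODEL_VERSION = "4.0.0"  # value quoted in the review; assumed source of truth

VERSION_LINE = re.compile(r"^Version:\s*(\d+\.\d+\.\d+)\s*$", re.MULTILINE)


def header_version(source_text: str):
    """Return the version from a 'Version: X.Y.Z' header line, or None if absent."""
    match = VERSION_LINE.search(source_text)
    return match.group(1) if match else None


def version_matches(source_text: str, expected: str = MODEL_VERSION) -> bool:
    """True when the header version equals the expected model version."""
    return header_version(source_text) == expected
```

A check of this kind could run in CI over each file that carries a version header.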

based on user profiles including interests, traits, skills, and career goals.

Author: Aspira AI Team
Version: 4.0.0
Copilot AI Oct 24, 2025

The file header describes this as 'Career Path Recommender - Production Ready' and lists methods like 'load_and_validate_data', 'train', 'predict', etc., but the filename is 'create_dataset.py' which suggests data generation functionality. This file appears to be a duplicate or misplaced copy of career_recommender.py. Either rename this file to match its actual purpose or ensure it contains dataset creation code as the name suggests.

Comment on lines 101 to 102
@router.get(
"/recommend",
Copilot AI Oct 24, 2025

Changing the recommendation endpoint from POST to GET violates REST API best practices. POST is more appropriate for this endpoint because: (1) it accepts complex input data (traits, interests, skills, goals) that should be in the request body, (2) GET requests with multiple query parameters become unwieldy and hit URL length limits, (3) the operation generates recommendations which is a non-idempotent action. Revert this endpoint to use POST with a request body.
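
To make point (2) concrete, the sketch below encodes the sample profile from this PR's test code both ways using only the standard library. No web framework is assumed, and the variable names are illustrative.

```python
import json
from urllib.parse import urlencode

# Sample profile reused from the prediction test shown earlier in this PR.
profile = {
    "interests": ["Literature", "History & Culture"],
    "traits": ["Patient", "Empathetic", "Organized"],
    "skills": ["Public Speaking", "Leadership", "Time Management"],
    "goal": "Personal growth and skill development",
}

# GET: every list item becomes a repeated, percent-encoded query parameter.
query_string = urlencode(profile, doseq=True)
get_url = f"/recommend?{query_string}"

# POST: the same data travels as one JSON document in the request body.
post_body = json.dumps(profile)
```

The query string grows with every list entry and needs escaping for characters such as `&`, while the JSON body keeps the nested structure intact.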

@galagyy marked this pull request as draft October 24, 2025 05:49
galagyy and others added 4 commits October 23, 2025 22:54
- Updated endpoint version.
- Change recommendation endpoint to POST.

Co-Authored-By: Om-Kasar <188000396+Om-Kasar@users.noreply.github.com>
- Updated career_routes to follow the PEP8 standard.
- Updated all code to follow all PEP8 standards; fixed W293, E501, W291, and E128.
@galagyy merged commit f6c11af into main Oct 24, 2025
1 check passed
@galagyy deleted the ai-v5 branch October 25, 2025 01:16

2 participants