Conversation
galagyy commented on Oct 24, 2025
- Updated AI speed.
- Updated AI documentation.
- Reduced potential output for quality.
- Updated model to use a better dataset.
- Removed unneeded __init__.py files.
- Removed unneeded endpoints; kept the health and generation endpoints.
- Updated to a newer, more concise model.
- Reduced dataset size for more accurate results.
- Updated Python documentation.
- Removed unneeded Python files (test_recommend.py).
- Updated code version.

Co-Authored-By: Om-Kasar <188000396+Om-Kasar@users.noreply.github.com>
Pull Request Overview
This PR optimizes the AI career recommendation system by updating the model version to 4.0.0, implementing memory-efficient calibration, and streamlining the API to essential endpoints only.
Key Changes:
- Updated model version from 2.0.0 to 4.0.0 with optimized hyperparameters (200 trees, max depth 20)
- Implemented memory-efficient probability calibration using sigmoid method instead of isotonic
- Reduced API surface by removing batch processing and auxiliary endpoints
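The calibration change described above can be sketched with scikit-learn. This is a minimal illustration, not the project's actual code: the synthetic dataset is a stand-in, and the base classifier and hyperparameters simply mirror the values named in this PR (200 trees, max depth 20).

```python
# Sketch of memory-efficient sigmoid (Platt) probability calibration,
# assuming a scikit-learn RandomForestClassifier as described in this PR.
# The dataset is synthetic stand-in data, not the project's dataset.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

base = RandomForestClassifier(n_estimators=200, max_depth=20, random_state=0)

# "sigmoid" fits one logistic curve per class, so it needs far less data
# (and memory) than the non-parametric "isotonic" method.
calibrated = CalibratedClassifierCV(base, method="sigmoid", cv=3)
calibrated.fit(X, y)

probs = calibrated.predict_proba(X[:5])  # each row sums to 1.0
```

Sigmoid calibration is generally recommended over isotonic when calibration data is limited, which lines up with the PR's reduced dataset size.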
Reviewed Changes
Copilot reviewed 8 out of 13 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| ai/careers/run.py | Version bump to 4.0.0 |
| ai/careers/test_recommend.py | Removed legacy test script |
| ai/careers/examples.py | Removed example demonstrations file |
| ai/careers/app/services/create_dataset.py | Complete rewrite: replaced dataset generation with production model implementation including calibration and memory error handling |
| ai/careers/app/services/career_recommender.py | Added probability calibration, confidence boosting, and memory-efficient training with fallback mechanisms |
| ai/careers/app/routes/career_routes.py | Simplified API from POST to GET endpoint, removed batch processing and auxiliary endpoints |
| ai/careers/app/main.py | Added model type detection logic for calibrated models |
| ai/careers/app/config.py | Version update and removed example .env template |
```python
model_path = 'C:\\repos\\Aspira\\ai\\app\\services\\career_recommender_model.pkl'

if not Path(model_path).exists():
```
Hardcoded absolute Windows path detected. This will fail on non-Windows systems and in other development environments. Use a relative path or an environment variable instead, e.g. `Path(__file__).parent / 'models' / 'career_recommender_model.pkl'`, or retrieve the path from configuration.
Suggested change:

```suggestion
model_path = Path(__file__).parent / 'career_recommender_model.pkl'
if not model_path.exists():
```
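As a sketch of the configuration-based alternative the comment mentions, the path could be resolved from an environment variable with a fallback beside the module. The `ASPIRA_MODEL_PATH` variable name is hypothetical, not something the project necessarily defines:

```python
# Sketch: resolve the model path from an environment variable, falling
# back to a file next to this module. ASPIRA_MODEL_PATH is a hypothetical
# variable name used for illustration only.
import os
from pathlib import Path

def resolve_model_path(env_var="ASPIRA_MODEL_PATH",
                       default_name="career_recommender_model.pkl"):
    """Return the model path, portable across OSes and checkouts."""
    default = Path(__file__).parent / default_name
    return Path(os.environ.get(env_var, default))
```

This keeps deployments configurable while remaining runnable from any checkout location.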
```python
logger.info("="*70)
logger.info("CAREER RECOMMENDER - PREDICTION TEST")
logger.info("="*70)

print(f"\n📊 Sample Preview (first 5 rows):")
print(df.head(5).to_string())
try:
    logger.info("Loading pre-trained model...")
    model_path = 'C:\\repos\\Aspira\\ai\\app\\services\\career_recommender_model.pkl'

    if not Path(model_path).exists():
        logger.error(f"Model file not found: {model_path}")
        logger.error("Please train the model first by running the training script.")
        sys.exit(1)

    recommender = CareerRecommender.load_from_file(model_path)
    logger.info("Model loaded successfully!")
    logger.info("="*70)

    # Define test case
    test_input = {
        'interests': ['Literature', 'History & Culture'],
        'traits': ['Patient', 'Empathetic', 'Organized'],
        'skills': ['Public Speaking', 'Leadership', 'Time Management'],
        'goal': 'Personal growth and skill development'
    }

    # Run prediction
    logger.info("Running prediction...")
    logger.info("-"*70)

    result = recommender.predict(test_input, top_k=3)

    logger.info(f"PREDICTION: {result['top_career']}")
    logger.info(f"CONFIDENCE: {result['confidence_percent']}")
    logger.info(f"")

    # Save to CSV
    print(f"\n💾 Saving to CSV...")
    csv_filename = 'career_prediction_dataset.csv'
    df.to_csv(csv_filename, index=False)
    logger.info(f"Top 3 Careers:")
    for pred in result['top_predictions']:
        logger.info(f"  {pred['rank']}. {pred['career']}: {pred['confidence']}")

    print(f"\n{'='*70}")
    print("✅ SUCCESS!")
    print(f"{'='*70}")
    print(f"\n📁 File saved: {csv_filename}")
    print(f"📦 File size: ~{df.memory_usage(deep=True).sum() / (1024*1024):.2f} MB")
    print(f"\n🎓 Ready for machine learning!")
    print(f"  • Perfect for Random Forest")
    print(f"  • Suitable for Neural Networks")
    print(f"  • Compatible with XGBoost, SVM, etc.")
    print(f"\n{'='*70}\n")
    logger.info("="*70)
    logger.info("PREDICTION COMPLETED SUCCESSFULLY!")
except FileNotFoundError as e:
    logger.error(f"Error: {e}")
except Exception as e:
    logger.error(f"An error occurred: {e}")
    import traceback
    logger.error(traceback.format_exc())
```
Test/demo code found at module level in a production file. This code (lines 928-978) will execute on module import, which is inappropriate for a production module. Move it to a separate test file or wrap it in a proper `if __name__ == '__main__':` block (one is already present, but it appears after the docstring).
Suggested change:

```suggestion
if __name__ == '__main__':
    logger.info("="*70)
    logger.info("CAREER RECOMMENDER - PREDICTION TEST")
    logger.info("="*70)
    try:
        logger.info("Loading pre-trained model...")
        model_path = 'C:\\repos\\Aspira\\ai\\app\\services\\career_recommender_model.pkl'
        if not Path(model_path).exists():
            logger.error(f"Model file not found: {model_path}")
            logger.error("Please train the model first by running the training script.")
            sys.exit(1)
        recommender = CareerRecommender.load_from_file(model_path)
        logger.info("Model loaded successfully!")
        logger.info("="*70)
        # Define test case
        test_input = {
            'interests': ['Literature', 'History & Culture'],
            'traits': ['Patient', 'Empathetic', 'Organized'],
            'skills': ['Public Speaking', 'Leadership', 'Time Management'],
            'goal': 'Personal growth and skill development'
        }
        # Run prediction
        logger.info("Running prediction...")
        logger.info("-"*70)
        result = recommender.predict(test_input, top_k=3)
        logger.info(f"PREDICTION: {result['top_career']}")
        logger.info(f"CONFIDENCE: {result['confidence_percent']}")
        logger.info(f"")
        logger.info(f"Top 3 Careers:")
        for pred in result['top_predictions']:
            logger.info(f"  {pred['rank']}. {pred['career']}: {pred['confidence']}")
        logger.info("="*70)
        logger.info("PREDICTION COMPLETED SUCCESSFULLY!")
    except FileNotFoundError as e:
        logger.error(f"Error: {e}")
    except Exception as e:
        logger.error(f"An error occurred: {e}")
        import traceback
        logger.error(traceback.format_exc())
```
```python
n_estimators: Number of trees in the forest (default: 500)
max_depth: Maximum depth of trees (default: 30)
min_samples_split: Minimum samples required to split (default: 2)
min_samples_leaf: Minimum samples required at leaf (default: 1)
```
Documentation is outdated. The docstring states defaults of 500 trees and `max_depth` of 30, but the actual defaults from `OPTIMAL_PARAMS` are 200 and 20 respectively (lines 65-66). Update the docstring to reflect the correct default values.
Suggested change:

```suggestion
n_estimators: Number of trees in the forest (default: 200)
max_depth: Maximum depth of trees (default: 20)
min_samples_split: Minimum samples required to split (default: 5)
min_samples_leaf: Minimum samples required at leaf (default: 2)
```
```python
'goal': profile.goal
'interests': hobbies,
'traits': traits,
'skills': hobbies,
```
The `skills` field is incorrectly mapped to `hobbies`. This duplicates hobby data as skills, which will produce incorrect predictions, since the model expects distinct skill information. The API accepts hobbies but has no skills parameter, yet the model requires a skills field. Either add a skills parameter to the endpoint or document why skills are mapped to hobbies.
ai/careers/app/main.py (Outdated)
```python
if hasattr(model.model, 'estimator'):
    n_estimators = model.model.estimator.n_estimators
```
Fragile attribute access for `CalibratedClassifierCV`: the wrapped estimator is exposed as `base_estimator` in scikit-learn versions before 1.2 and was renamed to `estimator` in 1.2, so hardcoding a single name will fail to retrieve `n_estimators` on the other version. Guard the access, e.g. with `hasattr`/`getattr`, rather than assuming one attribute name.
Suggested change:

```suggestion
if hasattr(model.model, 'base_estimator'):
    n_estimators = getattr(model.model.base_estimator, 'n_estimators', "N/A")
```
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-Authored-By: Om-Kasar <188000396+Om-Kasar@users.noreply.github.com>
- Fixed input keys.
- Fixed documentation.
- Fixed n_estimators.

Co-Authored-By: Om-Kasar <188000396+Om-Kasar@users.noreply.github.com>
Pull Request Overview
Copilot reviewed 8 out of 13 changed files in this pull request and generated 3 comments.
```diff
 Author: Aspira AI Team
-Version: 2.0.0
+Version: 3.0.0
```
Version mismatch: the file header shows version 3.0.0, but the `MODEL_VERSION` constant on line 61 and other files (`run.py`, `main.py`, `config.py`) show version 4.0.0. Update the header to version 4.0.0 for consistency.
Suggested change:

```suggestion
Version: 4.0.0
```
```python
based on user profiles including interests, traits, skills, and career goals.

Author: Aspira AI Team
Version: 4.0.0
```
The file header describes this as 'Career Path Recommender - Production Ready' and lists methods like 'load_and_validate_data', 'train', 'predict', etc., but the filename is 'create_dataset.py' which suggests data generation functionality. This file appears to be a duplicate or misplaced copy of career_recommender.py. Either rename this file to match its actual purpose or ensure it contains dataset creation code as the name suggests.
```python
@router.get(
    "/recommend",
```
Changing the recommendation endpoint from POST to GET violates REST API best practices. POST is more appropriate for this endpoint because: (1) it accepts complex input data (traits, interests, skills, goals) that should be in the request body, (2) GET requests with multiple query parameters become unwieldy and hit URL length limits, (3) the operation generates recommendations which is a non-idempotent action. Revert this endpoint to use POST with a request body.
- Updated endpoint version.
- Changed recommendation endpoint to POST.

Co-Authored-By: Om-Kasar <188000396+Om-Kasar@users.noreply.github.com>
- Updated career_routes to follow the PEP8 standard.
- Updated all code to follow all PEP8 standards; fixed W293, E501, W291, and E128.