Conversation
galagyy commented on Oct 24, 2025
- Updated AI speed.
- Updated AI documentation.
- Reduced potential output for quality.
- Updated model to use a better dataset.
- Removed unneeded __init__.py files.
- Removed unneeded endpoints; kept the health and generation endpoints.
- Updated to a newer, more concise model.
- Reduced dataset size for more accurate results.
- Updated Python documentation.
- Removed unneeded Python files (test_recommend.py).
- Updated code version.

Co-Authored-By: Om-Kasar <188000396+Om-Kasar@users.noreply.github.com>
Pull Request Overview
This PR optimizes the AI career recommendation system by updating the model version to 4.0.0, implementing memory-efficient calibration, and streamlining the API to essential endpoints only.
Key Changes:
- Updated model version from 2.0.0 to 4.0.0 with optimized hyperparameters (200 trees, max depth 20)
- Implemented memory-efficient probability calibration using sigmoid method instead of isotonic
- Reduced API surface by removing batch processing and auxiliary endpoints
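The calibration change described above can be sketched with scikit-learn. This is a minimal illustration, not the project's actual code: the synthetic dataset is a stand-in, and the base classifier and hyperparameters simply mirror the values named in this PR (200 trees, max depth 20).

```python
# Sketch of memory-efficient sigmoid (Platt) probability calibration,
# assuming a scikit-learn RandomForestClassifier as described in this PR.
# The dataset is synthetic stand-in data, not the project's dataset.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

base = RandomForestClassifier(n_estimators=200, max_depth=20, random_state=0)

# "sigmoid" fits one logistic curve per class, so it needs far less data
# (and memory) than the non-parametric "isotonic" method.
calibrated = CalibratedClassifierCV(base, method="sigmoid", cv=3)
calibrated.fit(X, y)

probs = calibrated.predict_proba(X[:5])  # each row sums to 1.0
```

Sigmoid calibration is generally recommended over isotonic when calibration data is limited, which lines up with the PR's reduced dataset size.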
Reviewed Changes
Copilot reviewed 8 out of 13 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| ai/careers/run.py | Version bump to 4.0.0 |
| ai/careers/test_recommend.py | Removed legacy test script |
| ai/careers/examples.py | Removed example demonstrations file |
| ai/careers/app/services/create_dataset.py | Complete rewrite: replaced dataset generation with production model implementation including calibration and memory error handling |
| ai/careers/app/services/career_recommender.py | Added probability calibration, confidence boosting, and memory-efficient training with fallback mechanisms |
| ai/careers/app/routes/career_routes.py | Simplified API from POST to GET endpoint, removed batch processing and auxiliary endpoints |
| ai/careers/app/main.py | Added model type detection logic for calibrated models |
| ai/careers/app/config.py | Version update and removed example .env template |
```python
model_path = 'C:\\repos\\Aspira\\ai\\app\\services\\career_recommender_model.pkl'

if not Path(model_path).exists():
```
Hardcoded absolute Windows path detected. This will fail on non-Windows systems and in other development environments. Use a relative path or an environment variable instead, e.g. `Path(__file__).parent / 'models' / 'career_recommender_model.pkl'`, or retrieve the path from configuration.
Suggested change:

```suggestion
model_path = Path(__file__).parent / 'career_recommender_model.pkl'
if not model_path.exists():
```
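As a sketch of the configuration-based alternative the comment mentions, the path could be resolved from an environment variable with a fallback beside the module. The `ASPIRA_MODEL_PATH` variable name is hypothetical, not something the project necessarily defines:

```python
# Sketch: resolve the model path from an environment variable, falling
# back to a file next to this module. ASPIRA_MODEL_PATH is a hypothetical
# variable name used for illustration only.
import os
from pathlib import Path

def resolve_model_path(env_var="ASPIRA_MODEL_PATH",
                       default_name="career_recommender_model.pkl"):
    """Return the model path, portable across OSes and checkouts."""
    default = Path(__file__).parent / default_name
    return Path(os.environ.get(env_var, default))
```

This keeps deployments configurable while remaining runnable from any checkout location.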
```python
logger.info("="*70)
logger.info("CAREER RECOMMENDER - PREDICTION TEST")
logger.info("="*70)

print(f"\n📊 Sample Preview (first 5 rows):")
print(df.head(5).to_string())
try:
    logger.info("Loading pre-trained model...")
    model_path = 'C:\\repos\\Aspira\\ai\\app\\services\\career_recommender_model.pkl'

    if not Path(model_path).exists():
        logger.error(f"Model file not found: {model_path}")
        logger.error("Please train the model first by running the training script.")
        sys.exit(1)

    recommender = CareerRecommender.load_from_file(model_path)
    logger.info("Model loaded successfully!")
    logger.info("="*70)

    # Define test case
    test_input = {
        'interests': ['Literature', 'History & Culture'],
        'traits': ['Patient', 'Empathetic', 'Organized'],
        'skills': ['Public Speaking', 'Leadership', 'Time Management'],
        'goal': 'Personal growth and skill development'
    }

    # Run prediction
    logger.info("Running prediction...")
    logger.info("-"*70)

    result = recommender.predict(test_input, top_k=3)

    logger.info(f"PREDICTION: {result['top_career']}")
    logger.info(f"CONFIDENCE: {result['confidence_percent']}")
    logger.info(f"")

    # Save to CSV
    print(f"\n💾 Saving to CSV...")
    csv_filename = 'career_prediction_dataset.csv'
    df.to_csv(csv_filename, index=False)
    logger.info(f"Top 3 Careers:")
    for pred in result['top_predictions']:
        logger.info(f"  {pred['rank']}. {pred['career']}: {pred['confidence']}")

    print(f"\n{'='*70}")
    print("✅ SUCCESS!")
    print(f"{'='*70}")
    print(f"\n📁 File saved: {csv_filename}")
    print(f"📦 File size: ~{df.memory_usage(deep=True).sum() / (1024*1024):.2f} MB")
    print(f"\n🎓 Ready for machine learning!")
    print(f"  • Perfect for Random Forest")
    print(f"  • Suitable for Neural Networks")
    print(f"  • Compatible with XGBoost, SVM, etc.")
    print(f"\n{'='*70}\n")
    logger.info("="*70)
    logger.info("PREDICTION COMPLETED SUCCESSFULLY!")
except FileNotFoundError as e:
    logger.error(f"Error: {e}")
except Exception as e:
    logger.error(f"An error occurred: {e}")
    import traceback
    logger.error(traceback.format_exc())
```
Test/demo code found at module level in a production file. This code (lines 928-978) will execute on module import, which is inappropriate for a production module. Move it to a separate test file or wrap it in a proper `if __name__ == '__main__':` block (one is already present, but it appears after the docstring).
Suggested change:

```suggestion
if __name__ == '__main__':
    logger.info("="*70)
    logger.info("CAREER RECOMMENDER - PREDICTION TEST")
    logger.info("="*70)
    try:
        logger.info("Loading pre-trained model...")
        model_path = 'C:\\repos\\Aspira\\ai\\app\\services\\career_recommender_model.pkl'
        if not Path(model_path).exists():
            logger.error(f"Model file not found: {model_path}")
            logger.error("Please train the model first by running the training script.")
            sys.exit(1)
        recommender = CareerRecommender.load_from_file(model_path)
        logger.info("Model loaded successfully!")
        logger.info("="*70)
        # Define test case
        test_input = {
            'interests': ['Literature', 'History & Culture'],
            'traits': ['Patient', 'Empathetic', 'Organized'],
            'skills': ['Public Speaking', 'Leadership', 'Time Management'],
            'goal': 'Personal growth and skill development'
        }
        # Run prediction
        logger.info("Running prediction...")
        logger.info("-"*70)
        result = recommender.predict(test_input, top_k=3)
        logger.info(f"PREDICTION: {result['top_career']}")
        logger.info(f"CONFIDENCE: {result['confidence_percent']}")
        logger.info(f"")
        logger.info(f"Top 3 Careers:")
        for pred in result['top_predictions']:
            logger.info(f"  {pred['rank']}. {pred['career']}: {pred['confidence']}")
        logger.info("="*70)
        logger.info("PREDICTION COMPLETED SUCCESSFULLY!")
    except FileNotFoundError as e:
        logger.error(f"Error: {e}")
    except Exception as e:
        logger.error(f"An error occurred: {e}")
        import traceback
        logger.error(traceback.format_exc())
```
```python
n_estimators: Number of trees in the forest (default: 500)
max_depth: Maximum depth of trees (default: 30)
min_samples_split: Minimum samples required to split (default: 2)
min_samples_leaf: Minimum samples required at leaf (default: 1)
```
Documentation is outdated. The docstring states defaults of 500 trees and `max_depth` of 30, but the actual defaults from `OPTIMAL_PARAMS` are 200 and 20 respectively (lines 65-66). Update the docstring to reflect the correct default values.
Suggested change:

```suggestion
n_estimators: Number of trees in the forest (default: 200)
max_depth: Maximum depth of trees (default: 20)
min_samples_split: Minimum samples required to split (default: 5)
min_samples_leaf: Minimum samples required at leaf (default: 2)
```
```python
'goal': profile.goal
'interests': hobbies,
'traits': traits,
'skills': hobbies,
```
The `skills` field is incorrectly mapped to `hobbies`. This duplicates hobby data as skills, which will produce incorrect predictions, since the model expects distinct skill information. The API accepts hobbies but has no skills parameter, yet the model requires a skills field. Either add a skills parameter to the endpoint or document why skills are mapped to hobbies.
ai/careers/app/main.py (Outdated)
```python
if hasattr(model.model, 'estimator'):
    n_estimators = model.model.estimator.n_estimators
```
Fragile attribute access for `CalibratedClassifierCV`: the wrapped estimator is exposed as `base_estimator` in scikit-learn versions before 1.2 and was renamed to `estimator` in 1.2, so hardcoding a single name will fail to retrieve `n_estimators` on the other version. Guard the access, e.g. with `hasattr`/`getattr`, rather than assuming one attribute name.
Suggested change:

```suggestion
if hasattr(model.model, 'base_estimator'):
    n_estimators = getattr(model.model.base_estimator, 'n_estimators', "N/A")
```
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-Authored-By: Om-Kasar <188000396+Om-Kasar@users.noreply.github.com>
- Fixed input keys.
- Fixed documentation.
- Fixed n_estimators.

Co-Authored-By: Om-Kasar <188000396+Om-Kasar@users.noreply.github.com>
Pull Request Overview
Copilot reviewed 8 out of 13 changed files in this pull request and generated 3 comments.
```diff
 Author: Aspira AI Team
-Version: 2.0.0
+Version: 3.0.0
```
Version mismatch: the file header shows version 3.0.0, but the `MODEL_VERSION` constant on line 61 and other files (`run.py`, `main.py`, `config.py`) show version 4.0.0. Update the header to version 4.0.0 for consistency.
Suggested change:

```suggestion
Version: 4.0.0
```
```python
based on user profiles including interests, traits, skills, and career goals.

Author: Aspira AI Team
Version: 4.0.0
```
The file header describes this as 'Career Path Recommender - Production Ready' and lists methods like 'load_and_validate_data', 'train', 'predict', etc., but the filename is 'create_dataset.py' which suggests data generation functionality. This file appears to be a duplicate or misplaced copy of career_recommender.py. Either rename this file to match its actual purpose or ensure it contains dataset creation code as the name suggests.
```python
@router.get(
    "/recommend",
```
Changing the recommendation endpoint from POST to GET violates REST API best practices. POST is more appropriate for this endpoint because: (1) it accepts complex input data (traits, interests, skills, goals) that should be in the request body, (2) GET requests with multiple query parameters become unwieldy and hit URL length limits, (3) the operation generates recommendations which is a non-idempotent action. Revert this endpoint to use POST with a request body.
- Updated endpoint version.
- Changed recommendation endpoint to POST.

Co-Authored-By: Om-Kasar <188000396+Om-Kasar@users.noreply.github.com>
- Updated career_routes to follow the PEP8 standard.
- Updated all code to follow all PEP8 standards; fixed W293, E501, W291, and E128.