fix: large system message in Xpert Assistant #213
Conversation
Pull Request Overview
This PR fixes a bug where extremely large system messages (typically from extensive video transcripts) were causing Xpert Assistant failures by consuming the entire token limit and leaving no room for learner or LA messages.
- Implements proportional trimming for unit content to ensure balanced inclusion of all content types
- Calculates dynamic limits based on static content size to maintain room for chat messages
- Adds comprehensive test coverage for various content scenarios including large video transcripts
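The proportional-trimming idea summarized above can be sketched as follows. This is an illustrative sketch only, not the PR's actual code; `trim_unit_content` and `adjusted_unit_limit` are named after identifiers visible in the review snippets, and the dynamic budget calculation is assumed to happen elsewhere:

```python
def trim_unit_content(unit_content, adjusted_unit_limit):
    """Trim each content block in proportion to its share of the total text.

    Illustrative sketch only; the PR's real implementation lives in
    learning_assistant/api.py, and `adjusted_unit_limit` stands in for the
    dynamically calculated character budget.
    """
    total_chars = sum(len(block["content_text"]) for block in unit_content)
    trimmed_unit_content = []
    for block in unit_content:
        text = block["content_text"]
        if total_chars == 0 or not text:
            # Nothing to apportion: keep the block but empty its text.
            trimmed_unit_content.append(
                {"content_type": block["content_type"], "content_text": ""}
            )
            continue
        # Each block keeps a slice of the budget proportional to its size,
        # but never less than one character.
        allowed_chars = max(1, int((len(text) / total_chars) * adjusted_unit_limit))
        trimmed_unit_content.append(
            {"content_type": block["content_type"], "content_text": text[:allowed_chars]}
        )
    return trimmed_unit_content
```

For example, with a 300-character unit and a 90-character budget, a 200-character video transcript keeps 60 characters and a 100-character text block keeps 30, so every content type survives trimming instead of the transcript crowding everything else out.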
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| learning_assistant/api.py | Implements proportional trimming logic for unit content handling |
| tests/test_api.py | Adds extensive test cases for content trimming scenarios |
Force-pushed 6f6523c to 6ccb48d
mraman-2U left a comment:
Update the release version here
Add the new version to the change log
Pull Request Overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
Context (learning_assistant/api.py):

```python
trimmed_unit_content.append({"content_type": ctype, "content_text": ""})
continue

allowed_chars = max(1, int((len(text) / total_chars) * adjusted_unit_limit))
```
Copilot AI commented on Oct 8, 2025:
Division by zero is possible if total_chars is 0. Although there's a check at line 159, this code could still execute if total_chars becomes 0 through other paths.
Suggested change:

```python
if total_chars == 0:
    allowed_chars = 1  # fallback to minimum allowed
else:
    allowed_chars = max(1, int((len(text) / total_chars) * adjusted_unit_limit))
```
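Copilot's suggested guard can be exercised in isolation. A minimal sketch, assuming the variables from the snippet above; `allowed_chars_for` is a hypothetical helper introduced here purely for illustration:

```python
def allowed_chars_for(text_len, total_chars, adjusted_unit_limit):
    """Hypothetical helper wrapping the guarded proportional calculation."""
    # Guard against division by zero when there are no characters to apportion.
    if total_chars == 0:
        return 1  # fallback to minimum allowed
    # Proportional share of the budget, floored at one character.
    return max(1, int((text_len / total_chars) * adjusted_unit_limit))
```

Note that the `max(1, ...)` floor also protects tiny blocks: a 1-character block in a 1000-character unit still keeps at least one character rather than being rounded down to zero.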
Context (tests/test_api.py):

```python
'A' * 200,  # Long string case to test trimming
[  # VIDEO content case
    {'content_type': 'VIDEO', 'content_text': f"Video transcript {i} " + ("A" * 200)} for i in range(10)
],
[  # TEXT content case
    {'content_type': 'TEXT', 'content_text': f"Paragraph {i} " + ("B" * 100)} for i in range(20)
],
[  # Mixed VIDEO + TEXT case
    {'content_type': 'VIDEO', 'content_text': "Video intro " + ("C" * 100)},
    {'content_type': 'TEXT', 'content_text': "Some explanation " + ("D" * 100)},
```
Copilot AI commented on Oct 8, 2025:
[nitpick] Creating large test strings with repeated characters could be memory inefficient. Consider using smaller test data or generating content on-demand within the test method.
Suggested change:

```python
'A' * 20,  # Long string case to test trimming (reduced size)
[  # VIDEO content case
    {'content_type': 'VIDEO', 'content_text': f"Video transcript {i} " + ("A" * 20)} for i in range(10)
],
[  # TEXT content case
    {'content_type': 'TEXT', 'content_text': f"Paragraph {i} " + ("B" * 10)} for i in range(20)
],
[  # Mixed VIDEO + TEXT case
    {'content_type': 'VIDEO', 'content_text': "Video intro " + ("C" * 10)},
    {'content_type': 'TEXT', 'content_text': "Some explanation " + ("D" * 10)},
```
Force-pushed aa3daa4 to 3caaa4e
Description:
This PR addresses a bug where excessively large system messages, typically caused by unit content with extensive video transcripts, result in Xpert message failures. In these scenarios, the system prompt consumes the entire token limit, leaving no room for learner or LA messages, thus breaking the interaction flow.
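The "dynamic limits" part of the fix can be illustrated with a small budget calculation. This is a hedged sketch under assumed names (`unit_content_budget` and its parameters are hypothetical, introduced only to show the idea of reserving room for chat messages):

```python
def unit_content_budget(system_limit_chars, static_prompt_chars, chat_reserve_chars):
    """Hypothetical sketch: characters left for unit content after reserving
    space for the static system prompt and for learner/LA chat messages."""
    # Whatever remains after the fixed reservations; never negative.
    return max(0, system_limit_chars - static_prompt_chars - chat_reserve_chars)
```

The key property is that the chat reservation is subtracted first, so even a unit dominated by a huge video transcript cannot consume the space needed for the conversation itself.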
Resolution:
Unit content is now trimmed proportionally by content type, and the unit-content character limit is calculated dynamically from the size of the static content, so learner and LA messages always retain room within the token limit.
Test Coverage:
All tests and lint checks pass.
Results (1.32s):
55 passed
2U Private Jira Link
Jira ticket: https://2u-internal.atlassian.net/browse/COSMO2-14