
Add LLM-assisted instruction extraction from natural language chat#586

Merged
Chris0Jeky merged 12 commits into main from enhance/573-llm-instruction-extraction on Mar 29, 2026

Conversation

@Chris0Jeky
Owner

Summary

  • OpenAI and Gemini providers now extract structured instructions from natural language via LLM structured output (JSON mode)
  • A shared LlmInstructionExtractionPrompt defines the system prompt and response parser
  • ChatService uses LLM-extracted instructions when available and falls back to the static classifier otherwise
  • Multi-instruction support: a single message can produce multiple proposals
  • Mock provider unchanged for deterministic test behavior
  • Review-first gate preserved: extracted instructions become proposals, not direct mutations

Closes #573

Test plan

  • Existing mock-based tests pass unchanged (960 application tests pass)
  • New LlmInstructionExtractionPromptTests (12 tests) verify JSON parsing, code fences, edge cases
  • New ChatServiceTests (4 tests) verify instruction extraction flow, fallback, multi-proposal
  • Updated provider tests account for system prompt injection
  • dotnet test passes clean across all projects (1,544 total tests)
  • Manual: "can you create onboarding tasks?" with real provider should produce a proposal

Extend LlmCompletionResult with optional Instructions list for LLM-extracted
structured instructions. Extend ChatCompletionRequest with optional SystemPrompt
field for provider-specific system prompts.

Shared system prompt and JSON response parser used by OpenAI and Gemini providers
to extract actionable board instructions from natural language chat messages.
Handles markdown code fences, missing fields, and malformed JSON gracefully.

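As a rough sketch of that parsing behavior (Python for brevity; the real implementation is C#, and the function name and return shape here are illustrative, not the actual API):

```python
import json

FENCE = "`" * 3  # markdown code fence marker

def try_parse_structured_response(body):
    """Strip a leading/trailing markdown code fence, then parse JSON.

    Returns None for malformed JSON; missing fields get defaults
    instead of raising. Illustrative sketch only.
    """
    trimmed = body.strip()
    if trimmed.startswith(FENCE):
        first_newline = trimmed.find("\n")
        if first_newline >= 0:
            trimmed = trimmed[first_newline + 1:]
        if trimmed.endswith(FENCE):
            trimmed = trimmed[:-3].rstrip()
    try:
        parsed = json.loads(trimmed)
    except json.JSONDecodeError:
        return None  # caller falls back to the static classifier
    return {
        "reply": parsed.get("reply", ""),
        "instructions": parsed.get("instructions", []),
    }
```
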
Add system prompt for instruction extraction, request JSON mode via
response_format, and parse structured output into LlmCompletionResult.Instructions.
Falls back to the static LlmIntentClassifier when the structured parse fails.

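The JSON-mode request on the OpenAI side could be sketched roughly as follows (Python dict for illustration; `response_format` with type `json_object` is the Chat Completions JSON mode, and the model name is a placeholder):

```python
def build_openai_payload(messages, system_prompt, model="placeholder-model"):
    """Prepend the extraction system prompt and request JSON mode.

    Illustrative sketch of the request body shape, not the provider code.
    """
    return {
        "model": model,
        "messages": [{"role": "system", "content": system_prompt}, *messages],
        # JSON mode: the model must return a valid JSON object
        "response_format": {"type": "json_object"},
    }
```
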
Add system prompt prepended as user message, request JSON via responseMimeType,
and parse structured output into LlmCompletionResult.Instructions.
Falls back to the static LlmIntentClassifier when the structured parse fails.

When LlmCompletionResult.Instructions has entries, iterate each and call
ParseInstructionAsync individually. Falls back to raw user message when no
instructions are extracted. Supports multiple proposals from a single message.

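The control flow described above might be sketched as follows (Python; `parse_instruction` is a hypothetical stand-in for ParseInstructionAsync):

```python
def build_proposals(user_message, extracted_instructions, parse_instruction):
    """Iterate LLM-extracted instructions, one proposal per instruction.

    Falls back to the raw user message when no instructions were
    extracted; surfaces only the first failure. Illustrative sketch.
    """
    candidates = extracted_instructions or [user_message]
    proposals = []
    first_failure = None
    for text in candidates:
        result = parse_instruction(text)  # stand-in for ParseInstructionAsync
        if result is not None:
            proposals.append(result)
        elif first_failure is None:
            first_failure = text
    return proposals, first_failure
```
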
OpenAI and Gemini providers now prepend a system prompt message, so role
mapping tests expect two messages instead of one.

LlmInstructionExtractionPromptTests: 12 tests covering valid/invalid JSON,
code fences, missing fields, multiple instructions, and system prompt content.
ChatServiceTests: 4 new tests for LLM-extracted instruction flow, fallback
behavior, multi-instruction proposals, and empty instructions list handling.


@gemini-code-assist bot left a comment


Code Review

This pull request implements LLM-assisted instruction extraction for the OpenAI and Gemini providers, enabling the conversion of natural language chat messages into structured board actions. Key changes include a new shared system prompt, JSON response parsing logic, and updates to the ChatService to support multiple extracted instructions. Feedback identifies a high-severity issue in the Gemini provider where system prompts are incorrectly prepended as user messages, which may violate API role requirements. Furthermore, the markdown stripping logic for parsing LLM responses is noted as fragile and should be improved using a more robust brace-matching strategy.

Comment on lines 41 to 61
    var systemPrompt = request.SystemPrompt ?? LlmInstructionExtractionPrompt.SystemPrompt;
    var allMessages = new List<object>
    {
        new
        {
            role = "user",
            parts = new[] { new { text = systemPrompt } }
        }
    };
    allMessages.AddRange(request.Messages.Select(MapMessage));

    message.Content = JsonContent.Create(new
    {
    -   contents = request.Messages.Select(MapMessage).ToArray(),
    +   contents = allMessages.ToArray(),
        generationConfig = new
        {
            temperature = request.Temperature,
    -       maxOutputTokens = request.MaxTokens
    +       maxOutputTokens = request.MaxTokens,
    +       responseMimeType = "application/json"
        }
    });


Severity: high

The current implementation prepends the system prompt as a user message. This will cause API errors for any multi-turn conversation, as the Gemini API requires roles to alternate between user and model. Prepending a user message will result in two consecutive user messages at the start of the conversation.

The recommended way to provide system instructions to Gemini models is by using the system_instruction field at the top level of the request payload. This avoids altering the message history and ensures the role sequence remains valid.

            var systemPrompt = request.SystemPrompt ?? LlmInstructionExtractionPrompt.SystemPrompt;
            message.Content = JsonContent.Create(new
            {
                contents = request.Messages.Select(MapMessage).ToArray(),
                system_instruction = new
                {
                    parts = new[] { new { text = systemPrompt } }
                },
                generationConfig = new
                {
                    temperature = request.Temperature,
                    maxOutputTokens = request.MaxTokens,
                    responseMimeType = "application/json"
                }
            });

Comment on lines +65 to +73
    var trimmed = responseBody.Trim();
    if (trimmed.StartsWith("```", StringComparison.Ordinal))
    {
        var firstNewline = trimmed.IndexOf('\n');
        if (firstNewline >= 0)
            trimmed = trimmed[(firstNewline + 1)..];
        if (trimmed.EndsWith("```", StringComparison.Ordinal))
            trimmed = trimmed[..^3].TrimEnd();
    }


Severity: high

The current logic for stripping markdown code fences is not fully robust. It can fail if the LLM returns a fenced JSON block without a newline after the language specifier (e.g., ```json{...}```). This would cause JSON parsing to fail and the system to incorrectly fall back to the static classifier.

A more resilient approach is to find the first opening brace { and the last closing brace } to extract the JSON object, as this is less sensitive to variations in markdown formatting.

            var firstBrace = responseBody.IndexOf('{');
            var lastBrace = responseBody.LastIndexOf('}');

            if (firstBrace == -1 || lastBrace < firstBrace)
                return false; // Not a valid JSON object structure

            var trimmed = responseBody[firstBrace..(lastBrace + 1)];

Probe requests now pass SystemPrompt = "" to avoid forcing JSON mode,
which would cause the LLM to return JSON instead of the expected "OK"
response. Instruction extraction system prompt and JSON mode are only
applied when SystemPrompt is null (the default for chat requests).
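The opt-out convention can be sketched as follows (Python; the helper name is illustrative, not the actual C# code):

```python
def resolve_system_prompt(request_system_prompt, default_extraction_prompt):
    """None -> use the extraction default; "" -> opt out entirely.

    Probe requests pass "" so the extraction prompt and JSON mode are
    never forced onto them. Illustrative sketch of the convention.
    """
    if request_system_prompt is None:
        # normal chat request: extraction prompt (and JSON mode) apply
        return default_extraction_prompt
    # "" becomes None: no system prompt, no JSON mode
    return request_system_prompt or None
```
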
@Chris0Jeky
Owner Author

Adversarial Self-Review

Issues Found and Fixed

  1. Probe requests broken by JSON mode (fixed in 43d5acf): Probe requests (Reply with exactly: OK) were getting the instruction extraction system prompt and JSON mode forced on, which would cause the LLM to either return JSON instead of "OK" or error on non-JSON output. Fixed by having probes pass SystemPrompt = "" to opt out.

Remaining Considerations

Prompt injection risk (low):

  • The system prompt instructs the LLM to respond with JSON only. A malicious user message could attempt to override this (e.g., "ignore the JSON format and ..."). However:
    • The existing ContainsBlockedPromptPattern check in ChatService already blocks common prompt injection phrases
    • Even if the LLM returns non-JSON, TryParseStructuredResponse fails gracefully and falls back to the static classifier
    • Instructions still go through ParseInstructionAsync which validates against structured patterns before creating proposals
    • The review-first gate (proposals, not direct mutations) is preserved

Edge cases handled:

  • Empty/null instructions list: falls back to raw user message
  • Malformed JSON from LLM: falls back to static classifier
  • Markdown code fences in LLM response: stripped before parsing
  • Partial success in multi-instruction: reports successful proposals, surfaces first failure

Potential follow-up items:

  • The system prompt is static and does not include board context (column names, existing cards). Board-context prompting (Board-context-aware LLM prompting for chat proposals #575) would improve instruction quality.
  • Multi-instruction support creates one proposal per instruction rather than batching. Issue Multi-instruction parsing for batch chat requests #574 could optimize this.
  • responseMimeType for Gemini and response_format for OpenAI are always sent for non-probe requests, even when structured output is not strictly needed. This is harmless but slightly wasteful for conversational messages.
  • The firstSuccess variable in ChatService is computed but not directly used (only proposalIds is used for the response). This is intentional for future use but could be cleaned up.

…pt as user message

The Gemini API supports a top-level system_instruction field for system
prompts. Sending the system prompt as a user message breaks multi-turn
conversations by creating consecutive user messages. This moves the
system prompt to system_instruction and omits it when empty (e.g. for
probe requests). Updates tests to verify the new payload structure.
…raction

The string-based code fence stripping fails when the LLM returns JSON
without a newline after the language specifier (e.g. ```json{...}```).
Use brace-matching (first '{' to last '}') to reliably extract the JSON
object regardless of surrounding text or formatting. Adds tests for the
no-newline edge case, bare code fences, and JSON with surrounding prose.
@Chris0Jeky
Owner Author

Gemini code review findings fixed

1. System prompt sent via system_instruction (HIGH)

Commit: aefdbc3

Moved the system prompt from being prepended as a user message in the contents array to the Gemini API's native system_instruction top-level field. This prevents consecutive user messages that break multi-turn conversations. When the system prompt is empty (e.g. probe requests), the system_instruction field is omitted entirely.

Files changed:

  • backend/src/Taskdeck.Application/Services/GeminiLlmProvider.cs
  • backend/tests/Taskdeck.Application.Tests/Services/GeminiLlmProviderTests.cs -- updated null-role test, added two new tests verifying system_instruction presence/absence
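The resulting payload shape, with system_instruction omitted when the prompt is empty, might look roughly like this (Python dict for illustration; field names follow the Gemini REST API as used above, and JSON mode is shown unconditionally for brevity even though probes also skip it):

```python
def build_gemini_payload(contents, system_prompt, temperature, max_tokens):
    """Build the Gemini request body, omitting system_instruction when
    the prompt is empty (e.g. probe requests). Illustrative sketch.
    """
    payload = {
        "contents": contents,
        "generationConfig": {
            "temperature": temperature,
            "maxOutputTokens": max_tokens,
            "responseMimeType": "application/json",
        },
    }
    if system_prompt:  # empty string -> field omitted entirely
        payload["system_instruction"] = {"parts": [{"text": system_prompt}]}
    return payload
```
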

2. Brace-matching JSON extraction (HIGH)

Commit: 351a349

Replaced the string-based markdown code fence stripping with brace-matching (IndexOf('{') to LastIndexOf('}')). The old approach failed when the LLM returned JSON without a newline after the language specifier (e.g. ```json{"reply":...}```). The new approach handles code fences with or without newlines, bare JSON, and JSON with surrounding prose.

Files changed:

  • backend/src/Taskdeck.Application/Services/LlmInstructionExtractionPrompt.cs
  • backend/tests/Taskdeck.Application.Tests/Services/LlmInstructionExtractionPromptTests.cs -- added 3 new edge case tests

Verification

All 1,690 backend tests pass (0 failures): dotnet test backend/Taskdeck.sln -c Release -m:1

@Chris0Jeky Chris0Jeky merged commit f0741cf into main Mar 29, 2026
18 checks passed
@github-project-automation github-project-automation bot moved this from Pending to Done in Taskdeck Execution Mar 29, 2026
@Chris0Jeky Chris0Jeky deleted the enhance/573-llm-instruction-extraction branch March 29, 2026 23:43