Skip to content

Implement v3 fully iterative agent with user approval workflow and enhanced schema detection#1

Draft
Copilot wants to merge 15 commits intov3from
copilot/iterative-system-implementation
Draft

Implement v3 fully iterative agent with user approval workflow and enhanced schema detection#1
Copilot wants to merge 15 commits intov3from
copilot/iterative-system-implementation

Conversation

Copy link
Copy Markdown

Copilot AI commented Dec 12, 2025

V3 Iterative System Implementation

Status: Complete with concise reasoning, proper parameter passing, and improved schema extraction

Changes Made

  1. ✅ Core iterative system with single-action execution
  2. ✅ User approval required before each action
  3. ✅ Balanced decision-making
  4. ✅ Automatic getSchema after page navigation
  5. ✅ Full schema data in decision context
  6. Concise reasoning output (1-2 sentences max)
  7. URLs and parameters in action descriptions
  8. Enhanced schema filtering - includes all important input fields
  9. ✅ Stop/Skip functionality at any point

Core Features

1. Constrained Reasoning Length:

  • Changed schema: "reasoning": "1-2 sentence explanation only"
  • Added Rule 11: "CONCISE REASONING: Keep reasoning to 1-2 sentences maximum. No lengthy explanations or circular thinking."
  • Examples now show concise format
  • Eliminates verbose, repetitive thinking

2. Required Parameters in Actions:

  • Updated Rule 7: "For opening tabs, ALWAYS include the full URL in the action."
  • Examples show correct format: "Open a new tab with URL https://www.youtube.com"
  • Ensures executor can extract all required parameters (especially URLs)
  • Fixes "URL is required for open action" error

3. Enhanced Schema Extraction:

  • Search inputs, text inputs, and other important form fields always included
  • Enhanced accessible name detection (title, name attributes as fallbacks)
  • Special handling for text-entry elements (search, text, email, password, tel, url)
  • Elements with searchbox/textbox/combobox roles always included
  • Prevents missing critical input fields like search boxes

Example Output Format

Before (problematic):

{
  "action": "manageTabs: open new tab",
  "reasoning": "Okay, let's break this down... [300+ words of circular thinking]..."
}

Error: "URL is required for open action"

Schema Issue: Only 13 elements, missing search input field

After (correct):

{
  "action": "Open a new tab with URL https://www.youtube.com",
  "reasoning": "User wants to access YouTube in a new tab.",
  "message": "Opening YouTube..."
}

Result: Executor successfully extracts URL and executes

Schema Fixed: Includes search input field, agent can type queries before clicking search button

Technical Changes

Prompt Updates:

  • JSON schema specifies "1-2 sentence explanation only"
  • Rule 7 emphasizes including URLs in actions
  • Rule 11 explicitly constrains reasoning length
  • Examples demonstrate concise format with full parameters

Schema Extraction Improvements:

  • Enhanced computeAccessibleName() to check title and name attributes
  • Added isImportantInputType() helper function
  • Special handling for text-entry input types
  • Always includes search boxes, text inputs, textareas
  • Better support for modern web applications

Benefits:

  • ✓ Faster, more readable reasoning
  • ✓ No circular or repetitive thinking
  • ✓ All parameters passed correctly to executor
  • ✓ Search and text inputs always visible in schema
  • ✓ Agent can type into inputs before clicking buttons
  • ✓ Fewer execution errors
  • ✓ Better user experience

Testing

Test scenario: "Open YouTube in another tab and search for 'never gonna give you up'"

Expected behavior:

  1. Agent outputs: "Open a new tab with URL https://www.youtube.com"
  2. Reasoning: 1-2 sentences max
  3. Executor extracts URL from action description
  4. Tab opens successfully
  5. Schema includes search input field
  6. Agent types query into search box
  7. Agent clicks search button to submit
Original prompt

In v3, I want a completely iterative system instead of creating a plan.
You must not generate or follow a multi-step plan.
Instead, operate as a fully iterative agent.
On every turn you must re-evaluate the user’s goal, the latest context, and all tool results seen so far.
Decide only the single best next action at that moment. Do not commit to future steps.
Your reasoning should be fresh each iteration, allowing you to change direction whenever new information appears.
If the next action is unclear, ambiguous, risky, irreversible, or missing required details, ask the user a clarifying question.
If an external action or environment state is required, call the appropriate tool.
After each tool call, observe the result and reassess the situation from scratch.
Continue this interpret → decide → act → observe loop until the user’s goal is achieved or the user stops the task.


✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

Copilot AI and others added 6 commits December 12, 2025 05:12
Co-authored-by: Varun-Patkar <50433487+Varun-Patkar@users.noreply.github.com>
Co-authored-by: Varun-Patkar <50433487+Varun-Patkar@users.noreply.github.com>
Co-authored-by: Varun-Patkar <50433487+Varun-Patkar@users.noreply.github.com>
Co-authored-by: Varun-Patkar <50433487+Varun-Patkar@users.noreply.github.com>
Co-authored-by: Varun-Patkar <50433487+Varun-Patkar@users.noreply.github.com>
Co-authored-by: Varun-Patkar <50433487+Varun-Patkar@users.noreply.github.com>
Copilot AI changed the title [WIP] Implement iterative system approach in version 3 Implement v3 fully iterative agent - single-action execution with continuous re-evaluation Dec 12, 2025
Copilot AI requested a review from Varun-Patkar December 12, 2025 05:25
@Varun-Patkar
Copy link
Copy Markdown
Owner

@copilot I tested it. The reasoning model thinks too long, especially when prevbious iteration fails. it double guesses itself a lot. Also before executing always ask user for permission, with them being able to stop it at any point.

Co-authored-by: Varun-Patkar <50433487+Varun-Patkar@users.noreply.github.com>
Copy link
Copy Markdown
Author

Copilot AI commented Dec 12, 2025

@copilot I tested it. The reasoning model thinks too long, especially when prevbious iteration fails. it double guesses itself a lot. Also before executing always ask user for permission, with them be...

Fixed both issues in commit 7f61eaf:

1. Faster decision-making: Added prompt rules to reduce overthinking

  • "BE CONCISE: Keep reasoning brief (1-2 sentences max)"
  • "TRUST YOURSELF: Don't second-guess. If previous action failed, simply try next logical alternative"

2. User approval required: Each action now shows Execute/Skip/Stop buttons before execution

┌─────────────────────────────────────────────────────┐
│ ⚡ Next Action: Open new tab with youtube.com      │
│                                                     │
│  [✓ Execute]  [⏭️ Skip]  [⏹️ Stop]                  │
│   (Green)     (Orange)   (Red)                     │
└─────────────────────────────────────────────────────┘

Button behavior:

  • Execute → Runs action, shows result, continues to next decision
  • Skip → Skips action, asks agent for alternative approach
  • Stop → Halts iteration completely, re-enables user input

Users now have full control at every step.

…constant for delay

Co-authored-by: Varun-Patkar <50433487+Varun-Patkar@users.noreply.github.com>
Copilot AI changed the title Implement v3 fully iterative agent - single-action execution with continuous re-evaluation Implement v3 fully iterative agent with user approval workflow Dec 12, 2025
@Varun-Patkar
Copy link
Copy Markdown
Owner

@copilot Now it thinks too less and messes up the tool to use. The approve etc is good no need to change that. also we need to use the schema tool as input whenever we change page like manage tabs, etc. Wehnever page has a cyhange. The page doewsnt change on click etc. but after navigating to another page you need to call that in default to decide next input tool. As that way iut is informed what is has toi work with on the page.

Copilot AI and others added 2 commits December 13, 2025 06:23
Co-authored-by: Varun-Patkar <50433487+Varun-Patkar@users.noreply.github.com>
Co-authored-by: Varun-Patkar <50433487+Varun-Patkar@users.noreply.github.com>
Copy link
Copy Markdown
Author

Copilot AI commented Dec 13, 2025

@copilot Now it thinks too less and messes up the tool to use. The approve etc is good no need to change that. also we need to use the schema tool as input whenever we change page like manage tabs, et...

Fixed in commits 23ee9a2 and 795fce5:

1. Balanced reasoning (23ee9a2):

  • Removed overly restrictive "BE CONCISE" rule
  • Added "THINK CAREFULLY: Take time to choose the right tool and parameters"
  • Agent now makes more thoughtful decisions

2. Auto-schema after navigation (23ee9a2):

  • Automatically calls getSchema after manageTabs or navigate
  • Shows "🔍 Auto Schema: Found X interactive elements" in UI
  • Ensures schema is always available on new pages

3. Full schema in context (795fce5):

  • Most recent schema now included at top of decision context
  • Agent sees full element details: {"id": 1, "type": "input", "label": "Search", ...}
  • Both decision agent and executor have complete schema data
  • Better tool selection with accurate element IDs

The agent now has proper context to make informed decisions about which elements to interact with.

Co-authored-by: Varun-Patkar <50433487+Varun-Patkar@users.noreply.github.com>
Copilot AI changed the title Implement v3 fully iterative agent with user approval workflow Implement v3 fully iterative agent with user approval workflow and automatic schema detection Dec 13, 2025
@Varun-Patkar
Copy link
Copy Markdown
Owner

@copilot Here is the first input. See howthe reaSONING IS so big. also still it messed up the fact that url is not in executor call of youtube. earlier this was working
Open yoiutube in another tab and search for "never gonna give you up"

View Reasoning
Okay, let's break this down. The user wants me to open YouTube in another tab and search for "never gonna give you up".

First, I'm on the Google homepage. The task is to open YouTube in a new tab. The known URLs include youtube.com, so I should navigate to that. But wait, the user said "in another tab", so I need to open a new tab first.

Wait, the available tools include manageTabs. So the first step is to open a new tab. Then navigate to youtube.com. But the user also wants to search for the specific query. However, the instructions say to do it in another tab, so the next action is to create a new tab, then go to YouTube.

Wait, the user's instruction says "Open yoiutube in another tab and search for...". So first, open a new tab with YouTube. Then, once on YouTube, perform the search.

But according to the rules, after navigating to a new tab (using manageTabs), I need to call getSchema before interacting. Wait, the schema is for the page structure. So first, manageTabs to open a new tab, then navigate to youtube.com, then wait for the page to load, then getSchema to check if the search box is present.

Wait, the user's previous conversation: the user said "Open yoiutube in another tab and search for 'never gonna give you up'".

So current page is Google. The first step is to open a new tab. The available tools are manageTabs, navigate, etc.

Wait, the correct sequence here is:

Open a new tab (manageTabs with action to open a new tab)
Navigate to youtube.com in that new tab
Wait for the page to load
Get the schema to find the search input
Type the query and submit
But according to the rules, after navigating to a new tab (using manageTabs), the next action should be waitFor. Wait, the rule says: "After opening tabs or navigating, next action should be waitFor page load."

Wait, the user's instruction is to open YouTube in another tab. So the immediate next action is to open a new tab. Then navigate to YouTube.

Wait, but the current state is that the user is on Google. The user wants to open YouTube in another tab. So first, manageTabs to open a new tab. Then, navigate to youtube.com.

Wait, but the user says "Open yoiutube in another tab". So first, create a new tab, then go to YouTube.

But in the current iteration, the user is on Google. So the first action is to open a new tab.

Wait, the available tools include manageTabs. So the action would be to open a new tab.

Wait, the manageTabs tool can open a new tab. So the next action is manageTabs with action 'open new tab'.

But the user's instruction is to open YouTube in another tab, so after opening the new tab, navigate to YouTube.

Wait, but the current step is the first action. So the first action is to open a new tab.

Wait, but the user's instruction is to open YouTube in another tab and search. So first, open the tab, then go to YouTube.

But according to the rules, after navigating (which includes manageTabs), the next action is waitFor.

Wait, no. The manageTabs is a separate tool. Let me check the tools again.

Available Tools: click, type, select, pressKey, scroll, navigate, manageTabs, waitFor, getSchema, getHTML

So manageTabs is a tool. The description says: "manageTabs: open, close, switch, or manage browser tabs."

So to open a new tab, the action would be manageTabs with action 'open new tab'.

But in the context of the user's request, the first step is to open a new tab. Then, navigate to YouTube.

But the user's instruction is to "Open yoiutube in another tab and search for...".

So the immediate next action is to open a new tab.

Wait, but the user's current page is Google. So the next action is to open a new tab.

Wait, but the user is asking to open YouTube in another tab. So the first step is to create a new tab. Then, navigate to YouTube.

But the current step is the first action. So the next action is to open a new tab.

Wait, but the user's previous conversation shows that the user is on Google. So the agent should first open a new tab.

Wait, the rules state that after any page navigation (manageTabs, navigate), you must call getSchema before attempting interactions.

Wait, no. The rule says: "After opening tabs or navigating, next action should be waitFor page load."

Wait, the rule says: "After opening tabs or navigating, next action should be waitFor page load."

Wait, the exact rule 9: "After opening tabs or navigating, next action should be waitFor page load."

Wait, the user's instruction is to open a new tab (manageTabs), then navigate to YouTube. So after manageTabs, the next action is to navigate to YouTube.

Wait, no. Wait, the manageTabs action is opening a new tab, so the next step is to navigate to the URL in that new tab.

Wait, the manageTabs tool's parameters would specify the action (open new tab), but how to specify which tab? Maybe the manageTabs action is to open a new tab, and then the next step is to navigate to the URL in that new tab.

Wait, perhaps the correct sequence is:

manageTabs to open a new tab.

navigate to youtube.com in that new tab.

But according to the tools, navigate is a separate tool. So after opening the new tab, the next action is navigate to youtube.com.

But the rule says that after navigating (using navigate), the next action should be waitFor. Wait, no. Rule 9 says: "After opening tabs or navigating, next action should be waitFor page load."

Wait, the rule is a bit confusing. Let me read it again.

Rule 9: "After opening tabs or navigating, next action should be waitFor page load."

Wait, so if you open a new tab (manageTabs), then the next action should be waitFor to load the page. But no, because manageTabs opens a new tab, but the page isn't loaded yet. Wait, no. If you open a new tab, you have to navigate to a URL first. So the correct sequence is:

manageTabs to open a new tab.

Then, navigate to the URL (youtube.com).

Then, waitFor the page to load.

Wait, but the rule says after navigating (which is the navigate tool), you must waitFor. So:

After navigate (which is the action of going to a URL), the next action is waitFor.

So the steps would be:

manageTabs (open new tab)

navigate to youtube.com

waitFor page load

getSchema

interact with the search box.

But according to the current situation, the user is on Google. So the first action is to open a new tab.

Wait, but the user's instruction is to open YouTube in another tab and search for the query. So the immediate next action is to open a new tab.

Wait, but the user's previous conversation shows that the user is on Google. So the agent's first action should be to open a new tab.

Wait, but the problem here is that the agent needs to execute the next action. So the first step is to open a new tab.

Wait, but the user's instruction is to open YouTube in another tab. So the agent should open a new tab, then navigate to YouTube.

But according to the rules, after opening a tab (manageTabs), the next action should be waitFor. Wait, no. The manageTabs action is to open a new tab. Then, the next action is to navigate to the URL in that tab.

Wait, maybe the manageTabs tool is used to open a new tab, and then navigate to the URL. So the next action after manageTabs is navigate to youtube.com.

Wait, but the rule says: "After opening tabs or navigating, next action should be waitFor page load."

Wait, opening tabs (manageTabs) is a separate action from navigating. So when you open a new tab (manageTabs), the page isn't loaded yet. You need to navigate to the URL first. So the sequence is:

manageTabs (open new tab)

navigate to youtube.com

waitFor page load

getSchema

type search query

pressKey to submit

But according to the problem statement, the agent is supposed to output the single next action. So in the current state, the user is on Google. The next action should be to open a new tab.

Wait, but the user's instruction is to open YouTube in another tab. So the first step is to open a new tab. Therefore, the next action is to manageTabs to open a new tab.

Wait, but the agent has to output the action. So the next action is to open a new tab.

Wait, but the user might be confused because the current page is Google. So the agent needs to open a new tab, then go to YouTube.

But the immediate next action is to open a new tab.

Wait, but the agent must not plan multiple steps. So the immediate next action is to open a new tab.

Wait, but the user's instruction is to open YouTube in another tab. So the agent needs to first open a new tab, then navigate to YouTube.

So the first action is to open a new tab.

Wait, but the available tools include manageTabs. So the action is manageTabs with action 'open new tab'.

Then, after that, navigate to youtube.com.

But in the current iteration, the agent's next action is to open a new tab.

Wait, but the user's instruction is to open YouTube in another tab and search. So the first action is to open a new tab.

Therefore, the next action is to open a new tab.

Wait, but the user is on Google. So the agent should open a new tab.

Wait, but the correct action here is to open a new tab. So the action would be manageTabs with action 'open new tab'.

Wait, but the manageTabs tool's parameters might require more details. However, the problem statement says to be specific. The tool is manageTabs, so the action description should be clear.

Wait, the example shows that manageTabs can be used to open a new tab. So the action is "open a new tab".

Wait, the output JSON should have the action as "manageTabs: open new tab".

But according to the schema, the action is a string. So the action would be "manageTabs: open new tab".

But the rules say: "BE SPECIFIC: Include exact URLs, text, and details in action descriptions."

So the action should be precise. So the action is to open a new tab.

Wait, but the user's instruction is to open YouTube in another tab. So the next action is to open a new tab, then navigate to YouTube.

But the agent must output only the next immediate action.

So in this case, the next action is to open a new tab.

Wait, but the user's previous conversation shows that the user is on Google. So the agent should first open a new tab.

Yes. So the next action is to open a new tab.

Wait, but the problem is that the agent needs to output the action. So the action is manageTabs to open a new tab.

But according to the rules, after opening tabs (manageTabs), the next action should be waitFor. Wait, no. The rule says: "After opening tabs or navigating, next action should be waitFor page load."

Wait, but manageTabs is opening a new tab, not navigating. So after opening the tab, you need to navigate to the URL. So the next action after manageTabs is navigate to youtube.com.

But the agent's current step is to decide the next action. So if the agent is on the Google page, the immediate next action is to open a new tab.

Wait, no. Because the agent is supposed to open YouTube in another tab. So the first action is to open a new tab, then navigate to YouTube.

So the first action is to open a new tab.

Therefore, the next action is to open a new tab.

Wait, but the user is asking to open YouTube in another tab. So the agent's first action is to open a new tab.

Therefore, the action is to manageTabs with open new tab.

But the problem is that the user's instruction is to open YouTube in another tab, so the agent needs to open a new tab, then go to YouTube.

But according to the rules, the agent can only do one action at a time.

So the immediate next action is to open a new tab.

Wait, but the agent must not create multi-step plans. So the agent should only do the next immediate action.

Therefore, the next action is to open a new tab.

Wait, but the user's instruction is to open YouTube in another tab. So the agent needs to open a new tab, then navigate to YouTube.

So the first step is to open a new tab.

Therefore, the correct action is to manageTabs to open a new tab.

But the output should be a JSON with action as "manageTabs: open new tab".

Wait, but the example shows that when navigating, the action is "navigate to URL".

Wait, the available tools include navigate. So to open YouTube in another tab, the agent would first open a new tab (manageTabs), then navigate to YouTube.

But the immediate next action is to open the new tab.

So the answer would be:

needs_action: true

action: "manageTabs: open new tab"

reasoning: "To open YouTube in another tab, first create a new tab."

message: "Opening a new tab to access YouTube..."

But wait, the user's instruction is to open YouTube in another tab and search. So the agent is supposed to first open a new tab, then navigate to YouTube.

But according to the rules, after opening a new tab (manageTabs), the next action should be waitFor. Wait, no. The rule says: "After opening tabs or navigating, next action should be waitFor page load."

Wait, the rule says after opening tabs (manageTabs) or navigating (navigate), the next action should be waitFor.

Wait, but manageTabs is opening a new tab, which doesn't load a page yet. You need to navigate to a URL first. So after manageTabs (open new tab), the next action is navigate to youtube.com.

But according to the rule 9: "After opening tabs or navigating, next action should be waitFor page load."

Wait, the rule is a bit confusing. If you open a new tab (manageTabs), then navigate to a URL (navigate), then you need to waitFor.

Wait, the rule says: "After opening tabs or navigating, next action should be waitFor page load."

So if you open a new tab (manageTabs), then navigate (which is a separate action), then after navigating, the next action is waitFor.

But in this case, the agent is on the Google page. To open YouTube in another tab, the steps are:

manageTabs: open new tab

navigate to youtube.com

waitFor page load

getSchema

type search query

pressKey

But the agent can only do one action at a time.

So the immediate next action after being on Google is to open a new tab.

Wait, but the user is asking to open YouTube in another tab. So the first step is to open a new tab. Then navigate to YouTube.

Therefore, the next action is to open a new tab.

But the rule 9 says that after opening tabs (manageTabs), the next action should be waitFor. Wait, but manageTabs opens a new tab, which is empty. To load a page, you need to navigate. So the waitFor would be after navigating.

Wait, perhaps the rule is that after any navigation (navigate), you need to waitFor. But manageTabs is not navigation. So after manageTabs, you need to navigate, then waitFor.

But in the current situation, the agent is on Google. To open YouTube in another tab, the agent must first open a new tab (manageTabs), then navigate to YouTube (navigate), then waitFor.

So the immediate next action is manageTabs: open new tab.

Then, after that, navigate to YouTube.

But according to the current iteration, the agent's next action is to open a new tab.

Wait, the user's previous conversation shows that the agent is on Google. So the next action is to open a new tab.

Therefore, the correct action is to open a new tab.

Wait, but the user's instruction is to open YouTube in another tab. So the agent needs to open a new tab, then go to YouTube.

Thus, the first step is to open a new tab.

Therefore, the action is manageTabs: open new tab.

But the problem is that the agent must output this as the action.

Wait, but the user's instruction is to open YouTube in another tab and search for the query. So the agent needs to open a new tab, then navigate to YouTube.

The immediate next action is to open a new tab.

So the answer should be:

{ "needs_action": true, "action": "manageTabs: open new tab", "reasoning": "To fulfill the user's request to open YouTube in another tab, first create a new tab.", "message": "Opening a new tab to access YouTube...", "ask_user": null }

But wait, the user said "Open yoiutube in another tab and search for...". The correct URL is youtube.com, but the user misspelled it as "yoiutube". However, the agent should proceed with the correct URL.

Wait, the known URLs include youtube.com, so the agent would navigate to that.

But the next action is to open a new tab, then navigate to YouTube.

So the immediate next action is to open a new tab.

Yes.

Another check: the previous conversation shows that the user is on Google. So the agent needs to open a new tab.

Yes.

Therefore, the correct action is to open a new tab.

Opening a new tab to access YouTube...

💭 To fulfill the user's request to open YouTube in another tab, the immediate next step is to create a new browser tab. This allows navigating to YouTube without disrupting the current Google tab.
✗ Failed: manageTabs: open new tab
Error: URL is required for open action

…iptions

Co-authored-by: Varun-Patkar <50433487+Varun-Patkar@users.noreply.github.com>
Copy link
Copy Markdown
Author

Copilot AI commented Dec 13, 2025

@copilot Here is the first input. See howthe reaSONING IS so big. also still it messed up the fact that url is not in executor call of youtube. earlier this was working
Open yoiutube in another tab an...

Fixed both issues in commit c1e675d:

1. Constrained verbose reasoning:

  • Schema now specifies: "reasoning": "1-2 sentence explanation only"
  • Added Rule 11: "CONCISE REASONING: Keep reasoning to 1-2 sentences maximum. No lengthy explanations or circular thinking."
  • Examples demonstrate concise format

2. Fixed missing URL in executor:

  • Updated Rule 7: "For opening tabs, ALWAYS include the full URL in the action."
  • Examples show correct format: "Open a new tab with URL https://www.youtube.com"
  • Executor can now extract URL from action description

Result:

  • Reasoning now limited to 1-2 sentences instead of 300+ words
  • Actions include all required parameters (URLs, text, etc.)
  • Fixes "URL is required for open action" error

@Varun-Patkar
Copy link
Copy Markdown
Owner

@copilot The get schema is too constrivctive
[
{
"id": 1,
"type": "button",
"role": "button",
"label": "Guide",
"placeholder": null,
"location": "top"
},
{
"id": 2,
"type": "button",
"role": "button",
"label": "Search",
"placeholder": null,
"location": "top"
},
{
"id": 3,
"type": "button",
"role": "button",
"label": "Next",
"placeholder": null,
"location": "top"
},
{
"id": 4,
"type": "a",
"role": "link",
"label": "2:46",
"placeholder": null
},
{
"id": 5,
"type": "div",
"role": "button",
"label": "Go to channel",
"placeholder": null
},
{
"id": 6,
"type": "a",
"role": "link",
"label": "youtube lied again 2 minutes, 46 seconds",
"placeholder": null,
"text": "youtube lied again"
},
{
"id": 7,
"type": "a",
"role": "link",
"label": "ProZD",
"placeholder": null
},
{
"id": 8,
"type": "button",
"role": "button",
"label": "More actions",
"placeholder": null
},
{
"id": 9,
"type": "a",
"role": "link",
"label": "10:14",
"placeholder": null
},
{
"id": 10,
"type": "div",
"role": "button",
"label": "Go to channel",
"placeholder": null
},
{
"id": 11,
"type": "a",
"role": "link",
"label": "Kaafi Gujarati Audience | Stand up Comedy | Abijit Ganguly | Crowd Work 10 minutes, 14 seconds",
"placeholder": null,
"text": "Kaafi Gujarati Audience | Stand up Comedy | Abijit"
},
{
"id": 12,
"type": "a",
"role": "link",
"label": "Abijit Ganguly",
"placeholder": null
},
{
"id": 13,
"type": "button",
"role": "button",
"label": "More actions",
"placeholder": null
}
]

This is the object and we have 13 but the sarch input eolement is not shown so it tries to click search button directly. Which is wrong

Co-authored-by: Varun-Patkar <50433487+Varun-Patkar@users.noreply.github.com>
Copy link
Copy Markdown
Author

Copilot AI commented Dec 13, 2025

@copilot The get schema is too constrivctive
[
{...

Fixed in commit 7382c9b:

Problem: Schema filter was too restrictive and excluded search inputs without explicit labels, showing only 13 elements instead of the search input field.

Solution:

  1. Enhanced accessible name detection:

    • Added title and name attributes as fallbacks
    • Better label detection for inputs
  2. Always include important input types:

    • Search, text, email, password, tel, url inputs
    • Textareas
    • Elements with searchbox/textbox/combobox roles
    • Included even without explicit labels

Result:

  • Search inputs now visible in schema
  • Agent can type query into search box
  • Then click search button to submit
  • Proper interaction flow restored

Co-authored-by: Varun-Patkar <50433487+Varun-Patkar@users.noreply.github.com>
Copilot AI changed the title Implement v3 fully iterative agent with user approval workflow and automatic schema detection Implement v3 fully iterative agent with user approval workflow and enhanced schema detection Dec 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants