Cohort builder code API #6

Open
azimov wants to merge 23 commits into develop from phentoype-skill

@azimov azimov commented Feb 3, 2026

This is an experimental branch for trying out different approaches to building cohorts.
Things implemented:

  1. Recreation of the Capr-style procedural API.
  2. Implementation of a new context manager approach. This is designed to make code descriptions of phenotypes easier to read and parse.
  3. An example "SKILL.md" to be used by an LLM. The idea is that you feed it a clinical description and some concept sets (these could be supplied afterwards) and it should work out the cohort logic. This is not straightforward, as the models appear to 'overfit' to the examples and output a lot of junk. I suspect this will happen with any approach that relies on coding agents or chatbots, so I'm firmly of the opinion 1) that human review is required on all outputted logic; and 2) that a solution a human can easily modify is preferred over having to go back and forth with a bot to get the JSON right.
    The `with` context approach also seems to help the LLM keep track of context in deeply nested structures more than the class-based definitions do.

Example code for cohort creation with my preferred API:

from circe.cohort_builder import CohortBuilder
from circe.vocabulary import concept_set, descendants
from circe.api import build_cohort_query

# 1. Define concept sets
t2dm = concept_set(descendants(201826), id=1, name="T2DM")
metformin = concept_set(descendants(1503297), id=2, name="Metformin")

# 2. Build cohort using context manager
with CohortBuilder("New Metformin Users with T2DM") as cohort:
    cohort.with_concept_sets(t2dm, metformin)
    cohort.with_drug(concept_set_id=2)  # Entry: metformin exposure
    cohort.first_occurrence()  # First exposure only
    cohort.with_observation_window(prior_days=365)  # 365 days prior
    cohort.min_age(18)  # Adults only
    cohort.require_condition(concept_set_id=1, within_days_before=365)

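    # Note: concept set 3 (presumably insulin) is not defined in this snippet;
    # it would need to be created and registered alongside t2dm and metformin.
    with cohort.include_rule("No Prior Insulin") as rule:
        rule.exclude_drug(3, anytime_before=True)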

As soon as a context manager `with` block ends, the cohort expression is built, so this is where exceptions might occur.
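
To illustrate the build-on-exit behaviour, here is a minimal sketch of the general pattern, not the actual `CohortBuilder` implementation. The class `SketchCohortBuilder`, its methods, and the "entry event required" check are all hypothetical; the only point is that method calls merely record steps, while validation and building happen in `__exit__`.

```python
class SketchCohortBuilder:
    """Hypothetical stand-in; NOT the circe CohortBuilder implementation."""

    def __init__(self, name):
        self.name = name
        self.steps = []          # recorded calls, in order
        self.expression = None   # populated only when the block exits cleanly

    def __enter__(self):
        return self

    def with_drug(self, concept_set_id):
        self.steps.append(("drug", concept_set_id))

    def min_age(self, years):
        self.steps.append(("min_age", years))

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            return False  # an error raised inside the block propagates unchanged
        # Assumed validation rule for this sketch: an entry event is required.
        if not any(kind == "drug" for kind, _ in self.steps):
            raise ValueError(f"cohort {self.name!r} has no entry event")
        self.expression = {"name": self.name, "steps": list(self.steps)}
        return False


# A valid definition: the expression exists once the block has ended.
with SketchCohortBuilder("demo") as cohort:
    cohort.with_drug(concept_set_id=2)
    cohort.min_age(18)
built = cohort.expression

# An invalid definition: the error surfaces at the end of the block,
# not at the line that caused the problem.
error_message = None
try:
    with SketchCohortBuilder("empty") as empty:
        empty.min_age(18)
except ValueError as err:
    error_message = str(err)
```

In this pattern a traceback points at the `with` statement rather than at the offending call, so wrapping the whole block in `try`/`except` is the natural place to catch build errors.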

The API here is just a layer over the underlying models, so experimenting with different APIs is certainly possible.
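
As a rough sketch of what targeting the same models with different APIs could look like, the example below puts a procedural front end and a context manager front end over one shared model. Everything here (`SketchCohortModel`, `sketch_cohort`, `sketch_builder`) is hypothetical and far simpler than the real cohort expression classes.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class SketchCohortModel:
    """Hypothetical stand-in for the underlying cohort expression model."""
    name: str
    entry_concept_set_id: int = 0
    min_age: int = 0
    rules: list = field(default_factory=list)


# Style 1: procedural function that returns the model in a single call.
def sketch_cohort(name, entry_concept_set_id, min_age=0, rules=()):
    return SketchCohortModel(name, entry_concept_set_id, min_age, list(rules))


# Style 2: context manager that exposes the same model to be filled in stepwise.
@contextmanager
def sketch_builder(name):
    model = SketchCohortModel(name)
    yield model


procedural = sketch_cohort("demo", 2, min_age=18, rules=["No Prior Insulin"])

with sketch_builder("demo") as m:
    m.entry_concept_set_id = 2
    m.min_age = 18
    m.rules.append("No Prior Insulin")

# Dataclass equality compares field by field, so both styles yield the same model.
same_model = procedural == m
```

Because both front ends produce an identical model object, anything downstream of the model (such as query generation) would not need to know which style was used.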

azimov and others added 23 commits January 21, 2026 09:25
…api-tooling

# Conflicts:
#	gpt4o_cohort_builder_system_prompt.md
…ch provide llms with context in complex nested situations. Added functionality for agents to get this skill and use it
@azimov azimov changed the base branch from main to develop February 4, 2026 16:21