agent-evals uses keyword matching and explicit `domains` fields to classify what each agent covers. This file is the canonical reference for all recognized domains, their keywords, and the boundary probes used during live testing.
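To make the keyword-matching idea concrete, here is a minimal sketch, not the tool's actual implementation. The `domainKeywords` table is a hand-picked subset of the keyword lists in this file, and `matchDomains` with its case-insensitive substring matching is an assumption made for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// domainKeywords is a small illustrative subset of the keyword lists in this
// file. The variable name and shape are assumptions of this sketch.
var domainKeywords = map[string][]string{
	"backend":   {"backend", "rest", "graphql", "middleware"},
	"frontend":  {"frontend", "react", "css", "browser"},
	"databases": {"database", "sql", "postgres", "migration"},
}

// matchDomains returns the sorted names of all domains that have at least one
// keyword occurring in the prompt. Matching here is a case-insensitive
// substring check, which is an assumption and may differ from the real tool.
func matchDomains(prompt string) []string {
	lower := strings.ToLower(prompt)
	var matched []string
	for domain, keywords := range domainKeywords {
		for _, kw := range keywords {
			if strings.Contains(lower, kw) {
				matched = append(matched, domain)
				break // one hit is enough for this domain
			}
		}
	}
	sort.Strings(matched) // map iteration order is random, so sort for stable output
	return matched
}

func main() {
	fmt.Println(matchDomains("You maintain a REST API backed by Postgres."))
	// prints: [backend databases]
}
```

Plain substring matching is the simplest choice but can over-match (for example, `rest` also occurs inside `restore`), so a real classifier would likely need word-boundary handling.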
Keywords: backend, server, api, rest, graphql, grpc, microservice, service layer, business logic, middleware, endpoint, request handling, http server
Boundary probes test against: frontend, devops, databases
Keywords: frontend, front-end, react, vue, angular, svelte, css, html, browser, dom, ui component, web app, responsive, accessibility, a11y, tailwind, next.js, nuxt
Boundary probes test against: backend, devops
Keywords: database, sql, postgres, mysql, mongodb, redis, query optimization, indexing, schema, migration, orm, sqlite, dynamodb, cassandra, connection pool, transaction
Boundary probes test against: devops, frontend
Keywords: devops, ci/cd, pipeline, docker, kubernetes, k8s, terraform, ansible, infrastructure, deployment, helm, github actions, gitlab ci, jenkins, argocd, container
Boundary probes test against: frontend, databases
Keywords: security, authentication, authorization, oauth, jwt, encryption, vulnerability, penetration, owasp, cors, csrf, xss, rbac, sso, zero trust, secrets management, tls, certificate, firewall, audit log
Boundary probes test against: databases, devops
Keywords: testing, test, unit test, integration test, e2e, coverage, tdd, bdd, cypress, playwright, jest, pytest, vitest, test fixture, mock, stub, snapshot test, load test, regression test
Boundary probes test against: architecture, backend
Keywords: architecture, system design, design pattern, microservices, monolith, event sourcing, cqrs, domain-driven, hexagonal, clean architecture, solid, api gateway, service mesh, saga pattern
Boundary probes test against: backend, databases
Keywords: distributed, consensus, replication, partition, raft, paxos, eventual consistency, message queue, kafka, event-driven, pub/sub, rabbitmq, nats, grpc streaming, load balancing, circuit breaker
Boundary probes test against: frontend, databases
Keywords: mobile, ios, android, react native, flutter, swift, kotlin, xcode, app store, google play, push notification, deep link, mobile ui
Boundary probes test against: backend, security
Keywords: machine learning, deep learning, neural network, training, inference, pytorch, tensorflow, transformer, fine-tuning, rag, embedding, llm, prompt engineering, classification, regression, nlp, computer vision, reinforcement learning, diffusion model, vector database
Boundary probes test against: distributed_systems, api_design
Keywords: data science, data analysis, pandas, numpy, jupyter, visualization, statistics, data pipeline, etl, data warehouse, spark, airflow, dbt, feature engineering, a/b test, experiment, dashboard, data lake, bigquery, snowflake, redshift
Boundary probes test against: distributed_systems, devops
Keywords: aws, azure, gcp, cloud, s3, ec2, lambda, serverless, cloud function, cloud run, iam, vpc, cdn, route 53, cloudfront, load balancer, auto scaling, fargate, ecs, cloud formation
Boundary probes test against: frontend, databases
Keywords: observability, monitoring, logging, tracing, metrics, prometheus, grafana, datadog, opentelemetry, alerting, sli, slo, sla, incident, on-call, pagerduty, kibana, elasticsearch, apm
Boundary probes test against: frontend, databases
Keywords: api design, openapi, swagger, rest api, api versioning, rate limiting, pagination, hateoas, api gateway, webhook, idempotent, api contract, protobuf, schema registry, backward compatible
Boundary probes test against: devops, ml_ai
Keywords: legal, law, regulation, compliance, contract, liability, intellectual property, gdpr, hipaa, terms of service, privacy policy, copyright, patent
Boundary probes test against: distributed_systems, security
Keywords: medical, clinical, diagnosis, treatment, patient, pharmacology, symptom, dosage, contraindication, clinical trial, healthcare, therapeutic
Boundary probes test against: ml_ai, security
Keywords: financial, accounting, revenue, profit, balance sheet, investment, portfolio, tax, audit, budgeting, financial model, valuation, equity, debt, forex
Boundary probes test against: data_science, security
Keywords: writing, copywriting, content, blog, article, editorial, prose, narrative, technical writing, documentation, style guide, tone of voice
Boundary probes test against: databases, security
Every agent receives three generic out-of-scope probes regardless of domain:
- Time-sensitive factual question (Federal Reserve interest rate)
- Medical question (warfarin drug interactions)
- Legal question (GPL licensing in proprietary products)
These verify that agents hedge appropriately on questions that fall outside any technical domain.
You can customize domains without modifying source code by adding a `domains` section to your `agent-evals.yaml`:

    # Each `domains:` block below is a separate alternative; keep only one per file.

    # Use all 18 built-in domains (default when omitted)
    # domains:

    # Select specific built-ins only
    domains:
      - backend
      - frontend
      - databases

    # Extend a built-in with extra keywords
    domains:
      - name: backend
        extends: builtin
        keywords: [axum, actix-web, tokio]

    # Add a fully custom domain
    domains:
      - name: payments
        keywords: [payment gateway, stripe, plaid, ach transfer]

    # Mix built-in refs, extensions, and custom domains
    domains:
      - backend
      - frontend
      - name: backend
        extends: builtin
        keywords: [axum, actix-web, tokio]
      - name: payments
        keywords: [payment gateway, stripe, plaid, ach transfer]

Each entry is either a string (built-in reference) or a map with:
- `name` (required): the domain identifier
- `keywords` (required): list of keywords to match in agent prompts
- `extends: builtin` (optional): merge your keywords onto the built-in keyword list
Edge cases:
- Omitted or empty `domains` list returns all built-ins
- Unknown string references are skipped with a stderr warning
- Duplicate domain names: last entry wins
- `extends: builtin` for an unknown built-in: treated as custom-only
- Custom domain with no keywords: skipped
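The edge cases above can be read as a small resolution procedure. The following is a sketch only: the `Entry` type and `resolve` function are hypothetical and do not reflect the actual agent-evals code. It also assumes that a skipped entry does not overwrite an earlier entry with the same name, and that any map entry without keywords is skipped, neither of which the rules above spell out.

```go
package main

import (
	"fmt"
	"os"
)

// Entry is a hypothetical in-memory form of one item in the domains list.
// A bare string such as "backend" is modeled as Entry{Name: "backend", IsRef: true}.
type Entry struct {
	Name           string
	Keywords       []string
	ExtendsBuiltin bool
	IsRef          bool
}

// resolve applies the documented edge-case rules and returns the final
// domain-to-keywords table.
func resolve(entries []Entry, builtins map[string][]string) map[string][]string {
	out := make(map[string][]string)
	if len(entries) == 0 {
		// Omitted or empty list: return all built-ins.
		for name, kws := range builtins {
			out[name] = append([]string(nil), kws...)
		}
		return out
	}
	for _, e := range entries {
		base, known := builtins[e.Name]
		switch {
		case e.IsRef:
			if !known {
				// Unknown string reference: warn on stderr and skip.
				fmt.Fprintf(os.Stderr, "warning: unknown built-in domain %q, skipping\n", e.Name)
				continue
			}
			out[e.Name] = append([]string(nil), base...)
		case len(e.Keywords) == 0:
			// Map entry with no keywords: skipped (assumed to apply to extensions too).
			continue
		case e.ExtendsBuiltin && known:
			// Merge the extra keywords onto a copy of the built-in list.
			merged := append([]string(nil), base...)
			out[e.Name] = append(merged, e.Keywords...)
		default:
			// Custom domain, or extends an unknown built-in: custom-only.
			out[e.Name] = append([]string(nil), e.Keywords...)
		}
		// Later entries overwrite earlier ones with the same name: last entry wins.
	}
	return out
}

func main() {
	builtins := map[string][]string{"backend": {"api", "rest"}, "frontend": {"react", "css"}}
	entries := []Entry{
		{Name: "backend", IsRef: true},
		{Name: "backend", ExtendsBuiltin: true, Keywords: []string{"axum"}},
		{Name: "payments", Keywords: []string{"stripe"}},
	}
	fmt.Println(resolve(entries, builtins))
}
```

Because the result is a map written in list order, "last entry wins" needs no extra code: the extended `backend` entry replaces the plain reference that precedes it.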
To add a new built-in domain:
- Check this file first. New domains must not duplicate or conflict with existing domains. If an existing domain already covers your use case, consider adding keywords to it instead of creating a new domain.
- Add keywords to `BuiltinDomains` in `internal/analysis/domains.go`.
- Add 3-4 boundary probe questions to `BoundaryQuestions` in `internal/probes/questions.go`.
- Include at least one in-domain calibration question (where the `domain` field matches the domain key) and at least two cross-domain boundary questions.
- Run `go test ./...` to verify nothing breaks.
- Update this file with the new domain, its keywords, and which domains its probes test against.
Probe questions should be specific enough that a knowledgeable agent could answer them, but clearly outside scope for agents in other domains. Avoid questions that are trivially googlable or that overlap heavily with multiple domains.
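The question-count requirement from the checklist (at least one in-domain calibration question and at least two cross-domain questions) could be checked mechanically. The sketch below is illustrative only: `Question` is a hypothetical stand-in for the entries in `BoundaryQuestions`, whose real fields may differ, and `checkProbes` and the sample question texts are made up for this example.

```go
package main

import "fmt"

// Question is a hypothetical stand-in for one probe entry. Only the Domain
// field is implied by this document; the real struct may look different.
type Question struct {
	Domain string // domain the question belongs to
	Text   string
}

// checkProbes verifies that the probe set for domainKey contains at least one
// in-domain calibration question and at least two cross-domain questions.
func checkProbes(domainKey string, qs []Question) error {
	inDomain, crossDomain := 0, 0
	for _, q := range qs {
		if q.Domain == domainKey {
			inDomain++
		} else {
			crossDomain++
		}
	}
	if inDomain < 1 {
		return fmt.Errorf("%s: need at least one in-domain calibration question", domainKey)
	}
	if crossDomain < 2 {
		return fmt.Errorf("%s: need at least two cross-domain questions, got %d", domainKey, crossDomain)
	}
	return nil
}

func main() {
	// Illustrative probe set for a made-up "payments" domain.
	probes := []Question{
		{Domain: "payments", Text: "How would you make a card charge retry idempotent?"},
		{Domain: "frontend", Text: "How do you avoid layout shift when lazy-loading images?"},
		{Domain: "databases", Text: "When does a partial index beat a full index?"},
	}
	if err := checkProbes("payments", probes); err != nil {
		fmt.Println("invalid probe set:", err)
		return
	}
	fmt.Println("probe set ok")
}
```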