Last updated: 2026-03-05 Current status: Poland (
PL) is fully active (1,198 products, 20 categories). Germany (DE) is at full parity (1,066 products, 19 categories). DE graduated from micro-pilot to full market on 2026-03-05. See Section 12 for lessons learned.
This project is designed for eventual EU-wide coverage, but expansion must be methodical, not opportunistic. Each country introduces:
- Different product formulations (same brand, different recipe)
- Different labeling regulations (mandatory vs. voluntary fields)
- Different retail landscapes (stores, private labels)
- Different Nutri-Score adoption status
- Different food safety authorities and additive rules
Rule: A country is never "partially" added. Either all prerequisites are complete and the country is fully operational, or it does not exist in the database.
Before any data for country XX enters the database, all of the following must be true:
- Document which EU regulations apply to food labeling in country
XX - Confirm whether Nutri-Score is mandatory, voluntary, or absent
- Identify the national food safety authority (equivalent of Poland's GIS/SANEPID)
- Document any country-specific additive restrictions beyond EU baseline
- Identify label language(s) and translation requirements
- Identify the top 5–10 retailers in country
XXby market share - Document which retailers have online product catalogs with nutrition data
- Identify major private-label brands and their ownership
- Confirm per-100g labeling convention (vs. per-serving in non-EU contexts)
- Confirm Open Food Facts coverage for country
XX(number of verified products) - Identify at least one reliable primary source for label data
- Test data collection workflow on 5–10 products before committing
- Create pipeline folder:
db/pipelines/<country_lower>/ordb/pipelines/<category>-<country>/ - Verify that the
products(country, brand, product_name)unique constraint handles the new country - Confirm that all schemas support the new country's data without migration changes
- Create
RUN_LOCAL.ps1/RUN_REMOTE.ps1entries for the new pipelines - Set
country_ref.nutri_score_officialfor the new country (trueif officially adopted,falseotherwise) - Update
copilot-instructions.mdto list the new country as active
All of these must be true before activating a second country:
- API supports country filtering on search + listings (
p_countryparam, RPC-compatible) - Auto-country resolution when
p_countryis NULL (resolve_effective_country— user pref → system default) - Alternatives/similarity are country-isolated (inferred from source product)
- Scoring + completeness are country-parameterized (
score_categoryacceptsp_country) - Dashboard stats are country-aware (
v_api_category_overview_by_countryview) - Activation gating exists (
country_ref.is_activeenforced by QA) -
sql_generator.pyacceptscountryparameter (no more hardcoded'PL') - QA checks for cross-country leakage (api_surfaces #16, #17, #18)
- Pilot country with 5–10 products: all QA passes, no cross-country leakage
DE completion date: 2026-03-05 — all prerequisites met, 1,066 products across 19 categories, full QA validation passed (see §12).
Every row in the products table carries a country column. This is the primary isolation mechanism.
-- All pipeline queries MUST filter by country
WHERE p.country = 'XX'- Products from different countries are never mixed in a single pipeline file.
- A product sold in both Poland and Germany is entered as two separate rows with
country = 'PL'andcountry = 'DE', because formulations differ. - Scores are computed per-country, never averaged across countries.
Each country's pipelines are independent:
db/pipelines/
├── chips-pl/ # PL chips
├── dairy/ # PL dairy (legacy naming — no DE counterpart yet uses this)
├── zabka/ # PL Żabka store (PL-only)
├── chips-de/ # DE chips
├── dairy-de/ # DE dairy
├── bread-de/ # DE bread
└── ... # 39 total folders (20 PL + 19 DE)
Naming convention — decided: category + hyphenated country suffix:
| Strategy | Example | When to use | Status |
|---|---|---|---|
| Category-only | dairy/ |
Single-country (legacy PL) | Legacy |
| Category-pl | chips-pl/ |
PL category with DE counterpart | Active |
| Category-de | chips-de/ |
DE category | Active |
| Store-based | zabka/ |
Country-specific store chains | Active (PL) |
Transition rule: When a second country is added for an existing category:
- Rename the PL folder:
chips/→chips_pl/(updateRUN_*.ps1accordingly) - Create the new country folder:
chips_de/ - Both folders now follow the
<category>_<country>pattern - Do NOT rename until the second country is actually added — premature renaming breaks the current working pipeline
- Unhealthiness Scores are not comparable across countries because:
- Formulations differ (same brand/product, different recipe)
- Serving conventions may differ
- Regulatory thresholds (salt, sugar flags) may need country adjustment
- The
v_masterview operates globally but should be filtered bycountryin all queries.
| Aspect | Poland (PL) | Germany (DE) — Example |
|---|---|---|
| Discount leaders | Biedronka, Lidl | Aldi, Lidl |
| Convenience | Żabka | REWE To Go, Aral |
| Hypermarkets | Auchan, Carrefour | Kaufland, Real |
| Private label density | High (Biedronka, Lidl) | Very high (Aldi, Lidl, REWE) |
| Nutri-Score adoption | Voluntary | Voluntary (widely used) |
| Label language | Polish | German |
Implication: Store-based pipelines (like zabka/) are country-specific by definition and cannot be reused across countries. Category-based pipelines (like chips/) can serve as templates but must be duplicated and customized.
When expanding to a new country, follow this process:
# Example: Adding German chips
Copy-Item -Recurse db/pipelines/chips db/pipelines/chips_dePIPELINE__chips_de__01_insert_products.sql
PIPELINE__chips_de__03_add_nutrition.sql
PIPELINE__chips_de__04_scoring.sql
PIPELINE__chips_de__05_source_provenance.sql
In every file, change:
country = 'PL'→country = 'DE'category = 'Chips'remains'Chips'(categories are language-neutral)- Product names → German market SKUs
- Brand names → German market variants
- Store availability → German retailers
- Nutrition values → German label values
Update RUN_LOCAL.ps1 and RUN_REMOTE.ps1 to include the new pipeline folder.
Run the new pipeline against local Supabase. Verify:
- All products insert with
country = 'DE' - No conflict with existing PL products
- Scores compute correctly
-
v_masterview returns both PL and DE rows
The current schema is country-agnostic by design:
products.country— discriminator column, part of the unique constraintproducts— stores scores, flags, confidence, and source provenance inline (source_type,source_url,source_ean)nutrition_facts— linked byproduct_id, inheriting country from the productproduct_allergen_info— linked byproduct_idfor contains/traces declarations
These changes are not needed now but may be required during expansion:
| Change | Trigger |
|---|---|
Add country to source reference table (if introduced) |
Only if multi-source lineage is normalized to a separate table |
Add currency column for price data |
If price tracking is ever added (out of scope) |
Add regulation_ref to products |
To cite country-specific labeling regulations |
| Separate schema per country | Only if dataset exceeds millions of rows |
Rule: Any schema change requires a new Supabase migration. Never modify existing migrations.
| Regulation area | Varies by country? | Impact on scoring |
|---|---|---|
| Mandatory nutrition fields | Slightly | Core 7 fields are EU-wide |
| Nutri-Score adoption | Yes | Affects data availability |
| Traffic light labeling (UK) | Post-Brexit only | Not applicable in EU |
| Additive restrictions | Rarely | Some national bans beyond EU list |
| Organic certification marks | Yes | Different national logos |
| Allergen labeling format | Slightly | Bolding vs. separate list |
| Trans fat declaration | Voluntary in EU | May become mandatory |
This is a suggested order based on data availability, market size, and Nutri-Score adoption:
| Phase | Country | Code | Rationale | Status |
|---|---|---|---|---|
| 1 | Poland | PL | Founder's market; full access to labels | Active — 1,198 products, 20 categories |
| 2 | Germany | DE | Largest EU market; strong Open Food Facts coverage | Active — 1,066 products, 19 categories (full parity since 2026-03-05) |
| 3 | France | FR | Nutri-Score origin country; best data quality | Planned |
| 4 | Spain | ES | Large market; growing Nutri-Score adoption | Future |
| 5 | Italy | IT | Complex food landscape; controversial with NS | Future |
| 6 | Czechia | CZ | Regional neighbor; similar retail landscape to PL | Future |
This roadmap is non-binding. Expansion happens only when prerequisites in Section 2 are fully met.
- ❌ Do NOT add a country "just to test" — even test data pollutes the schema
- ❌ Do NOT share pipeline files between countries — always duplicate and customize
- ❌ Do NOT assume same brand = same product across countries
- ❌ Do NOT compare Unhealthiness Scores across countries without methodology review
- ❌ Do NOT modify Polish pipelines to accommodate another country's needs
- ❌ Do NOT use non-EU data sources (USDA, etc.) for EU country products
- ❌ Do NOT add a country if Open Food Facts has < 100 verified products for it
When adding a new country, configure search infrastructure:
INSERT INTO language_ref (code, name_en, name_native, sort_order)
VALUES ('xx', 'Language', 'Native Name', 10)
ON CONFLICT (code) DO NOTHING;Add a WHEN clause to the country CASE in build_search_vector():
WHEN 'XX' THEN 'config_name'::regconfigAvailable built-in configs: simple, english, german, french, italian,
spanish, dutch, danish, swedish, norwegian, finnish, russian.
Insert bidirectional synonym pairs for common food terms:
INSERT INTO search_synonyms (term_original, term_target, language_from, language_to)
VALUES
('local_term', 'english_term', 'xx', 'en'),
('english_term', 'local_term', 'en', 'xx');Aim for 40–50 bidirectional pairs covering common food categories.
UPDATE products
SET search_vector = build_search_vector(
product_name, product_name_en, brand, category, country
)
WHERE country = 'XX';Run db/qa/QA__search_architecture.sql to confirm all tests pass.
When adding a new country, configure data provenance and governance:
INSERT INTO country_data_policies (
country, is_active, allergen_strictness,
min_confidence_for_publish, regulatory_framework, notes
) VALUES (
'XX', false, 'standard',
0.60, 'EU FIC 1169/2011', 'Initial setup'
);Set allergen_strictness to 'strict' for countries with enhanced allergen
regulations (e.g., DE, UK).
Insert 6 rows per country (one per field group):
INSERT INTO freshness_policies
(country, field_group, warning_age_days, critical_age_days,
max_age_days, refresh_strategy)
VALUES
('XX', 'nutrition', 90, 120, 150, 'auto_api'),
('XX', 'allergens', 45, 60, 75, 'manual_review'),
('XX', 'ingredients', 90, 120, 150, 'auto_api'),
('XX', 'identity', 150, 200, 365, 'auto_api'),
('XX', 'images', 150, 200, 365, 'auto_api'),
('XX', 'scoring', 15, 20, 30, 'auto_api');Adjust thresholds based on local regulatory refresh requirements.
Insert rules for field groups that should auto-resolve vs. require manual review:
INSERT INTO conflict_resolution_rules
(country, field_group, max_auto_resolve_severity, resolution_strategy)
VALUES
('XX', 'nutrition', 'high', 'highest_confidence'),
('XX', 'allergens', 'low', 'manual_always'),
('XX', 'ingredients', 'medium', 'highest_confidence'),
('XX', 'identity', 'high', 'most_recent'),
('XX', 'images', 'high', 'most_recent'),
('XX', 'scoring', 'high', 'highest_confidence');Add retailer or local data sources:
INSERT INTO data_sources
(source_key, display_name, source_type, base_confidence,
country_coverage, is_active)
VALUES
('retailer_local', 'Local Retailer', 'retailer', 0.80,
ARRAY['XX'], true);Once fully configured and validated:
UPDATE country_data_policies SET is_active = true WHERE country = 'XX';Run db/qa/QA__data_provenance.sql — tests T15–T16 validate country policies.
Also run validate_product_for_country(product_id, 'XX') on sample products.
Expanding from PL-only to PL+DE took approximately 3 weeks of focused work. The following lessons apply to any future country expansion.
- Country isolation worked perfectly. The
products.countrydiscriminator + QA suiteQA__country_isolation.sql(11 checks) caught all cross-contamination risks before they shipped. - Pipeline generator (
pipeline/run.py --country DE) scaled cleanly. The--countryflag required minimal code changes —sql_generator.pyandcategories.pywere already parameterized. - Automated enrichment (
enrich_ingredients.py) applied unchanged to DE. The same OFF API enrichment flow that processed PL ingredients worked for DE with zero code changes. - Reference tables (
country_ref,category_ref) made activation trivial. Adding DE was a singleINSERT+UPDATE is_active = true.
- Volume of pipeline SQL files. 19 categories × 4–5 SQL files each = ~85 new files. Each required manual review of product names, EANs, and nutrition values. The pipeline generator helped but still produced files needing spot-checks.
- Multi-country QA checks needed v3.3 scoring function updates. Two QA checks in
QA__multi_country_consistency.sqlcalled the stalecompute_unhealthiness_v32()and had to be upgraded to v3.3 with correct column names. - Merge conflicts during parallel work.
copilot-instructions.mdandCURRENT_STATE.mdhad frequent conflicts when multiple PRs updated documentation counts. Resolution rule from §13.1 ("take the higher/more complete version") was essential. - OFF API data quality for DE. Some German products had missing or inconsistent nutrition data. 9 products had calorie back-calculation outliers from OFF source data — documented as known QA exceptions.
| Step | Action | Effort | Reference |
|---|---|---|---|
| 1 | Add INSERT INTO country_ref + UPDATE is_active |
1 migration | §2.4 |
| 2 | Run pipeline/run.py --country XX --category "..." --max-products 51 for each category |
~1 day per 5 categories | §5 |
| 3 | Execute generated SQL against local DB | ~2h | RUN_LOCAL.ps1 |
| 4 | Run enrich_ingredients.py for the new country |
~2h | PR #651, #654 |
| 5 | Run full QA suite + scoring anchor validation | ~1h | RUN_QA.ps1 |
| 6 | Update documentation (this guide, copilot-instructions.md, CHANGELOG.md) |
~2h | This section |
| 7 | Deploy to production (after user confirmation) | ~30 min | §5, RUN_REMOTE.ps1 |
| Metric | Value |
|---|---|
| Products added | 1,066 active + 105 deprecated |
| Categories | 19 (all PL categories except Żabka) |
| Pipeline folders created | 19 (*-de/ naming convention) |
| Unique ingredients (DE) | Integrated into shared ingredient_ref (2,898 total) |
| Product-ingredient links (DE) | Part of 14,392 total links |
| Allergen declarations (DE) | Part of 2,691 allergens + 2,702 traces total |
| QA checks added | 5 DE scoring anchors in QA__scoring_formula_tests.sql |
| New QA suites | QA__multi_country_consistency.sql (13 checks) |
| Migrations | 1 enrichment migration (18,938 lines) |
| EAN coverage | 99.9% (2,261/2,264) |