feat: add entity description property to extraction schema

## Problem

The pipeline's 12 LLM-extracted entity types (Concept, Challenge, Artifact, etc.) do not include a `description` property in the extraction schema (`extraction/schema.py`). As a result:

1. **No entities have descriptions** — The `LLMEntityRelExtractor` only extracts properties defined in the schema. Without a `description` property, the LLM extracts `name`, `display_name`, and type-specific fields but never a description.
2. **EntitySummarizer is a no-op** — The summarizer (`postprocessing/entity_summarizer.py`) consolidates fragmented descriptions into coherent summaries, but finds zero entities to process because the `description` property doesn't exist in the database.
3. **Community summaries are shallower** — `CommunitySummarizer` attempts to read `n.description` for richer community summaries but falls back to names and labels only, producing less informative results.
4. **RAG retrieval misses semantic context** — Entity matching currently relies on `name` only. A description like *"the practice of linking requirements to downstream artifacts to ensure completeness and enable impact analysis"* would give retrievers far richer semantic context.

### Evidence from staging pipeline run (2026-02-24)

```
WARNING:neo4j.notifications: The property `description` does not exist in database `neo4j`.
2026-02-24 16:29:46 [info] No entities with fragmented descriptions found
    Summarized 0 entities
```

The Neo4j warning confirms that no entity in the graph has a `description` property. The EntitySummarizer correctly returns immediately with zero work.

Verification query:
```cypher
MATCH (n:__Entity__) WHERE n.description IS NOT NULL RETURN count(n)
// Returns 0
```

## Root Cause

In `src/graphrag_kg_pipeline/extraction/schema.py`, the `NODE_TYPES` dict defines properties for each entity type. None of the 12 types include a `description` property:

- **Concept**: `name`, `display_name`, `definition`, `aliases`
- **Challenge**: `name`, `display_name`, `severity`
- **Artifact**: `name`, `display_name`, `artifact_type`
- **Bestpractice**: `name`, `display_name`, `rationale`
- **Processstage**: `name`, `display_name`, `sequence`
- **Role**: `name`, `display_name`, `responsibilities`
- **Standard**: `name`, `display_name`, `organization`, `domain`
- **Tool**: `name`, `display_name`, `vendor`, `tool_type`
- **Methodology**: `name`, `display_name`, `approach`
- **Industry**: `name`, `display_name`, `regulated`
- **Organization**: `name`, `display_name`, `organization_type`, `domain`
- **Outcome**: `name`, `display_name`, `outcome_type`

The `SimpleKGPipeline` LLM extraction prompt only asks the LLM to extract properties listed in the schema, so descriptions are never produced.

## Proposed Solution

Add a `description` property to all 12 entity types in `NODE_TYPES`:

```python
"description": {
    "type": "STRING",
    "required": False,
    "description": "One-sentence description of this entity in the context of requirements management",
},
```
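
Rather than pasting the block twelve times, the property could be applied in one place. The following is a minimal sketch, not the project's code: it assumes each `NODE_TYPES` entry holds a `properties` dict keyed by property name, which this issue does not confirm, and the helper name `add_description_property` is hypothetical.

```python
from copy import deepcopy

# Property definition copied from the proposal above.
DESCRIPTION_PROPERTY = {
    "type": "STRING",
    "required": False,
    "description": "One-sentence description of this entity in the context of requirements management",
}


def add_description_property(node_types):
    """Return a copy of node_types where every entity type has a `description` property.

    Assumption: each entry looks like {"properties": {<property name>: {...}}}.
    Adjust the lookup if the real NODE_TYPES layout differs.
    """
    updated = deepcopy(node_types)
    for type_def in updated.values():
        properties = type_def.setdefault("properties", {})
        # setdefault leaves an existing description definition untouched.
        properties.setdefault("description", dict(DESCRIPTION_PROPERTY))
    return updated


# Two-type stand-in for the real NODE_TYPES dict (hypothetical layout).
example_node_types = {
    "Concept": {"properties": {"name": {"type": "STRING", "required": True}}},
    "Tool": {"properties": {"name": {"type": "STRING", "required": True}}},
}
updated_types = add_description_property(example_node_types)
```

Returning a copy keeps the original dict unchanged, which makes it easy to compare a run with and without descriptions.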

### Downstream effects (already implemented, will activate automatically)

1. **EntitySummarizer** — Will find entities with multi-fragment descriptions (>200 chars from multi-chunk extraction) and consolidate them via LLM into clean 1-3 sentence summaries. Currently implemented and tested but has zero work to do.
2. **CommunitySummarizer** — Already reads `n.description` in its community member query. Will produce richer community summaries without code changes.
3. **API repo retrieval** — `text2cypher.py` and entity search will benefit from richer entity context. No changes needed in the API repo.
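
To illustrate the selection criterion described for `EntitySummarizer`, here is a rough sketch of the ">200 chars" filter. It is not the summarizer's actual code: the function name, the constant name, and the sample entities are all made up for illustration, and only the 200-character threshold comes from this issue.

```python
# Threshold quoted in this issue; the real constant name is unknown.
FRAGMENT_THRESHOLD_CHARS = 200


def select_fragmented(entities, threshold=FRAGMENT_THRESHOLD_CHARS):
    """Return entities whose description is longer than `threshold` characters.

    Entities with a missing or None description are skipped.
    """
    return [e for e in entities if len(e.get("description") or "") > threshold]


# Made-up sample data: one long multi-fragment description, one short, one missing.
sample = [
    {"name": "traceability", "description": "Linking requirements to downstream artifacts. " * 6},
    {"name": "jira", "description": "An issue tracker."},
    {"name": "iso 26262", "description": None},
]
fragmented = select_fragmented(sample)
```

With today's schema every entity falls into the "missing" case, which is why the staging run reported zero entities to summarize.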

### Cost and runtime impact

- **Extraction**: ~10-20% more tokens per article (LLM must extract an additional property per entity). Estimated additional cost: ~$1-2 on a full pipeline run.
- **EntitySummarizer**: Will now make LLM calls for entities with fragmented descriptions. Estimated: ~$0.50-1.00 (gpt-4o, ~100 entities with fragments).
- **Total additional cost per full run**: ~$1.50-3.00 (on top of existing ~$9-17).
- **No additional runtime** for community embeddings or vector indexes — descriptions don't affect those.
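
The totals above follow from adding the low and high ends of each estimate. A small sketch of that arithmetic, using only the figures quoted in this issue (the helper name `add_ranges` is illustrative):

```python
def add_ranges(*ranges):
    """Add (low, high) cost ranges component-wise."""
    return (sum(low for low, _ in ranges), sum(high for _, high in ranges))


extraction = (1.00, 2.00)   # extra extraction tokens, USD per full run
summarizer = (0.50, 1.00)   # EntitySummarizer LLM calls, USD per full run
baseline = (9.00, 17.00)    # existing cost of a full run, USD

additional = add_ranges(extraction, summarizer)  # (1.5, 3.0)
new_total = add_ranges(baseline, additional)     # (10.5, 20.0)
```

Under these estimates a full run would cost roughly $10.50-20.00 after the change.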

### Implementation steps

1. Add `description` property to all 12 entity types in `extraction/schema.py`
2. Verify `EntitySummarizer` activates on a staging run (should find entities with >200 char descriptions)
3. Compare community summary quality with/without entity descriptions
4. Update tests if schema property counts change in assertions
5. Run full staging pipeline to validate end-to-end
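
For step 4, a schema-level check could guard against a type being missed. The sketch below is only an assumption about how such a test might look: it uses a small stand-in dict with a hypothetical `properties` layout instead of importing the real `NODE_TYPES`, and the helper name is invented.

```python
def types_missing_property(node_types, prop="description"):
    """Return the sorted labels of entity types that do not define `prop`."""
    return sorted(
        label
        for label, type_def in node_types.items()
        if prop not in type_def.get("properties", {})
    )


# Stand-in schema (hypothetical layout); a real test would import NODE_TYPES instead.
stand_in_schema = {
    "Concept": {"properties": {"name": {}, "description": {}}},
    "Role": {"properties": {"name": {}}},
}
missing = types_missing_property(stand_in_schema)
```

In a real test the assertion would be that the returned list is empty for all 12 types.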

### What NOT to change

- **EntitySummarizer code** — Already correctly implemented, just needs data to work with
- **CommunitySummarizer code** — Already reads descriptions with graceful fallback
- **Extraction prompts** — `SimpleKGPipeline` auto-generates prompts from the schema; adding the property is sufficient
- **Validation queries** — No description-related checks currently exist

## Context

- Concept already has a `definition` property, but it serves a different purpose: it captures formal definitions from the glossary, whereas `description` is a general-purpose field for all entity types that holds contextual descriptions from extraction
- This was identified during the first staging pipeline run against `graphrag-api-db-stage` (local Neo4j Desktop instance)

## Labels

Enhancement, Pipeline
