
Commit 6e6d1e6

Replaced OpenAI LLM API Key with Gemini API Key for the ability to ask questions in plain English. Updated documentation and project version to cover missing items.
1 parent a4bf068 commit 6e6d1e6

14 files changed

Lines changed: 522 additions & 49 deletions


.gitignore

Lines changed: 0 additions & 3 deletions
@@ -69,7 +69,6 @@ instance/
 .scrapy
 
 # Sphinx documentation
-docs/_build/
 
 # PyBuilder
 .pybuilder/
@@ -104,7 +103,6 @@ ipython_config.py
 # Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
 # This is especially recommended for binary packages to ensure reproducibility, and is more
 # commonly ignored for libraries.
-# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
 #poetry.lock
 #poetry.toml
 
@@ -211,5 +209,4 @@ knowcode_knowledge.json
 CHANGELOG.md
 docs_test/
 KnowCode.md
-docs/

README.md

Lines changed: 31 additions & 0 deletions
@@ -23,6 +23,9 @@ source .venv/bin/activate # On Windows: .venv\Scripts\activate
 
 # Install KnowCode (with dev dependencies)
 uv sync --dev
+
+# Set Gemini API key (required for semantic search and the 'ask' command)
+export GOOGLE_API_KEY="sk-..."
 ```
 
 ## Quick Start
@@ -158,6 +161,34 @@ Once running, you can access endpoints like:
 - `POST /api/v1/context/query` `(semantic search)`
 - `POST /api/v1/reload` (to refresh data after a new `analyze` run)
 
+### `history`
+Show git history for the codebase or specific entities. Requires analysis with `--temporal`.
+
+```bash
+knowcode history [target] [--limit <n>]
+```
+
+**Example:**
+```bash
+# Show recent project history
+knowcode history --limit 5
+
+# Show history for a specific class
+knowcode history "KnowledgeStore"
+```
+
+### `ask`
+Ask questions about the codebase using an LLM agent. Requires `GOOGLE_API_KEY` environment variable.
+
+```bash
+knowcode ask <question> [--model <model>]
+```
+
+**Example:**
+```bash
+knowcode ask "How does the graph builder work?"
+```
+
 ## Supported Languages (MVP)
 
 - **Python** (.py) - Full AST parsing (Supports Python 3.9 - 3.12)
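
Reviewer sketch (not part of the commit): the `ask` command added above needs `GOOGLE_API_KEY`, so a script that calls it may want to fail early when the key is missing. The helper names `build_ask_command`, `require_api_key`, and `ask` are hypothetical. Only the `knowcode ask <question> [--model <model>]` syntax and the variable name come from the README change, and the sketch assumes `knowcode` is on `PATH`.

```python
import os
import subprocess
from typing import Mapping, Optional


def build_ask_command(question: str, model: Optional[str] = None) -> list[str]:
    """Build the argv list for `knowcode ask <question> [--model <model>]`."""
    if not question.strip():
        raise ValueError("question must not be empty")
    argv = ["knowcode", "ask", question]
    if model is not None:
        argv += ["--model", model]
    return argv


def require_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return GOOGLE_API_KEY, or raise if it is unset or empty."""
    key = env.get("GOOGLE_API_KEY", "")
    if not key:
        raise RuntimeError("GOOGLE_API_KEY is not set")
    return key


def ask(question: str, model: Optional[str] = None) -> int:
    """Run the CLI and return its exit code (assumes `knowcode` is installed)."""
    require_api_key()
    return subprocess.run(build_ask_command(question, model)).returncode
```

Passing the question as a single argv element avoids shell quoting problems with questions that contain spaces or quotes.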

docs/api/knowledge_store.md

Lines changed: 72 additions & 0 deletions
@@ -0,0 +1,72 @@
+# Knowledge Store API
+
+The `KnowledgeStore` is the central repository for the semantic graph derived from the codebase. It persists the graph to a JSON file and provides query mechanisms.
+
+## Class: `KnowledgeStore`
+
+**Module:** `knowcode.storage.knowledge_store`
+
+### Initialization
+
+```python
+store = KnowledgeStore()
+```
+
+### Persistence
+
+#### `save(path: str | Path)`
+Saves the current graph, including entities, relationships, and metadata, to a JSON file (default `knowcode_knowledge.json`).
+
+#### `load(path: str | Path) -> KnowledgeStore`
+Class method to load a store from a JSON file (or directory containing the file).
+
+### Core Properties
+
+- **`entities`**: A dictionary mapping entity IDs to `Entity` objects.
+- **`relationships`**: A list of `Relationship` objects.
+- **`metadata`**: A dictionary containing scan statistics (scan time, file count) and errors.
+
+### Query Methods
+
+#### `get_entity(entity_id: str) -> Optional[Entity]`
+Retrieve an entity object by its unique ID.
+
+#### `search(pattern: str) -> list[Entity]`
+Search for entities where the name or qualified name matches the substring pattern (case-insensitive).
+
+#### `get_callers(entity_id: str) -> list[Entity]`
+Find all entities that call the target entity (incoming `CALLS` edges).
+
+#### `get_callees(entity_id: str) -> list[Entity]`
+Find all entities that are called by the source entity (outgoing `CALLS` edges).
+
+#### `get_children(entity_id: str) -> list[Entity]`
+Find all entities contained within the source entity (e.g., methods within a class).
+
+#### `get_parent(entity_id: str) -> Optional[Entity]`
+Find the container of an entity (e.g., the class containing a method).
+
+#### `get_dependencies(entity_id: str) -> list[Entity]`
+Get all entities that the target entity depends on via calls or imports.
+
+#### `get_dependents(entity_id: str) -> list[Entity]`
+Get all entities that depend on the target entity via calls or imports.
+
+#### `get_entities_by_kind(kind: EntityKind | str) -> list[Entity]`
+List all entities of a specific kind (e.g., `EntityKind.CLASS`, "function").
+
+### Data Models
+
+#### `Entity`
+- `id`: Unique identifier (path + :: + qualified name)
+- `kind`: `EntityKind` (module, class, function, method, etc.)
+- `name`: Short name
+- `qualified_name`: Full dotted path
+- `location`: File path and line range
+- `source_code`: Raw source code (optional)
+- `docstring`: Extracted docstring (optional)
+
+#### `Relationship`
+- `source_id`: Origin entity ID
+- `target_id`: Destination entity ID
+- `kind`: `RelationshipKind` (calls, imports, contains, inherits, etc.)
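
Reviewer sketch (not part of the commit): the query semantics documented above can be illustrated with a small in-memory stand-in. `ToyStore` and `make_id` are hypothetical and this is not the real `KnowledgeStore` implementation. Kinds are plain strings here rather than `EntityKind`/`RelationshipKind`, and the `path::qualified_name` ID format is one reading of the "path + :: + qualified name" description.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entity:
    id: str
    kind: str
    name: str
    qualified_name: str


@dataclass
class Relationship:
    source_id: str
    target_id: str
    kind: str


def make_id(path: str, qualified_name: str) -> str:
    """Assumed ID format: file path, '::', then the qualified name."""
    return f"{path}::{qualified_name}"


@dataclass
class ToyStore:
    """Illustrative stand-in for the documented query methods."""
    entities: dict[str, Entity] = field(default_factory=dict)
    relationships: list[Relationship] = field(default_factory=list)

    def get_entity(self, entity_id: str) -> Optional[Entity]:
        return self.entities.get(entity_id)

    def search(self, pattern: str) -> list[Entity]:
        # Case-insensitive substring match on name or qualified name.
        needle = pattern.lower()
        return [e for e in self.entities.values()
                if needle in e.name.lower() or needle in e.qualified_name.lower()]

    def get_callers(self, entity_id: str) -> list[Entity]:
        # Incoming "calls" edges: the sources that target this entity.
        return [self.entities[r.source_id] for r in self.relationships
                if r.kind == "calls" and r.target_id == entity_id
                and r.source_id in self.entities]

    def get_callees(self, entity_id: str) -> list[Entity]:
        # Outgoing "calls" edges: the targets this entity points at.
        return [self.entities[r.target_id] for r in self.relationships
                if r.kind == "calls" and r.source_id == entity_id
                and r.target_id in self.entities]


store = ToyStore()
main = Entity(make_id("app.py", "main"), "function", "main", "main")
helper = Entity(make_id("app.py", "helper"), "function", "helper", "helper")
store.entities[main.id] = main
store.entities[helper.id] = helper
store.relationships.append(Relationship(main.id, helper.id, "calls"))
```

With this data, `store.get_callers(helper.id)` returns the `main` entity and `store.get_callees(main.id)` returns `helper`, mirroring the incoming and outgoing `CALLS` edges described in the API document.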

docs/evolution.md

Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
+# **Detailed Architecture & Roadmap**
+
+*Note: This document outlines the conceptual architecture of KnowCode, some of which looks ahead to future phases.*
+
+---
+
+## **1. Layered Architecture**
+
+KnowCode follows a multi-layer design to ensure extensibility, maintainability, and scalability.
+
+1. **Ingestion Layer**: Source Code scanning, Parsing (AST/Tree-sitter).
+2. **Analysis Layer**: Structural building, Semantic graph construction.
+3. **Storage Layer**: Graph persistence, Vector storage.
+4. **Retrieval Layer**: Hybrid search (Lexical + Semantic).
+5. **Intelligence Layer**: Context synthesis, RAG orchestration.
+6. **Interface Layer**: CLI, REST API (FastAPI).
+
+---
+
+## **2. Component Interaction**
+
+```mermaid
+flowchart TB
+subgraph Ingestion
+L1[Layer 1: Source Ingestion]
+end
+
+subgraph Analysis
+L2[Layer 2: Structural Parsing]
+L3[Layer 3: Semantic Graph]
+L4[Layer 4: Behavioral Analysis]
+L5[Layer 5: Runtime Signals]
+L6[Layer 6: Intent Extraction]
+end
+
+subgraph Intelligence
+L7[Layer 7: Doc Synthesis]
+L8[Layer 8: Knowledge Store]
+L9[Layer 9: Context Synthesis]
+end
+
+subgraph Interface
+L10[Layer 10: LLM Interface]
+DEV[Developer]
+end
+
+subgraph Evolution
+L11[Layer 11: Feedback Loop]
+end
+
+subgraph Cross-Cutting
+SEC[Security]
+OBS[Observability]
+CFG[Configuration]
+end
+
+L1 --> L2
+L2 --> L3
+L3 --> L4
+L3 --> L6
+L4 --> L8
+L5 -.-> L8
+L6 --> L8
+L8 --> L7
+L8 --> L9
+L9 --> L10
+L10 --> DEV
+DEV --> L11
+L11 --> L8
+L11 --> L3
+
+SEC -.-> L1 & L8 & L10
+OBS -.-> L1 & L3 & L8 & L10
+CFG -.-> L2 & L4 & L9
+```
+
+---
+
+## **Implementation Status & Roadmap**
+
+### **Phase 1: Foundation (COMPLETED)**
+1. **[x] Source Scanning + Parsing (Layers 1-2)**: Scanner with gitignore support; parsers for Python (AST), JS/TS + Java (Tree-sitter), Markdown, YAML.
+2. **[x] Unified Semantic Graph (Layer 3)**: Entity/relationship model with reference resolution (calls/imports/contains/inherits).
+3. **[x] Local Knowledge Store (Layer 8)**: In-memory graph with JSON persistence and query helpers.
+4. **[x] Token-Budgeted Context Synthesis (Layer 9)**: Priority-ordered sections with truncation handling.
+5. **[x] Service Layer**: Shared business logic for CLI and API.
+
+### **Phase 2: Intelligence Server & RAG (COMPLETED)**
+6. **[x] FastAPI Server (Layer 10)**: Health, stats, search, context, semantic query, reload, entity details, callers/callees.
+7. **[x] Semantic Search & Indexing (Layer 4a)**: Chunker (module header/imports/entities), OpenAI embeddings, FAISS vector store, hybrid BM25+vector retrieval (RRF), reranking, dependency expansion.
+8. **[x] Indexer Persistence + CLI**: `index`/`semantic-search` commands with save/load.
+9. **[x] Watch Mode**: Background indexer + filesystem monitor for incremental re-indexing.
+10. **[x] CLI Workflows**: `analyze`, `query`, `context`, `export`, `stats`, `server`, `history`, `ask`.
+
+### **Phase 3: Temporal & Runtime Signals (COMPLETED)**
+11. **[x] Git History Ingestion (Temporal)**: Commit/author entities, authored/modified/changed_by relationships; surfaced via `--temporal` and `history`.
+12. **[x] Coverage Signals (Layer 5)**: Cobertura ingestion with coverage report entities and covers/executed_by relationships.
+
+### **Phase 4: Documentation Synthesis (PARTIAL)**
+13. **[x] Markdown Export (MVP)**: CLI `export` produces an index-style Markdown doc.
+14. **[ ] Multi-Level Doc Synthesis (Layer 7)**: Architecture/module/function narratives, change summaries, and freshness tracking.
+
+### **Phase 5: Deep Analysis (NEXT)**
+15. **[ ] Static Behavioral Analysis (Layer 4)**: Data flow, state transitions, side-effect classification.
+16. **[ ] Intent Extraction (Layer 6)**: ADR/PR/commit intent linking beyond commit metadata.
+17. **[ ] Confidence Scoring (Layer 3)**: Weighted edges/entities by evidence source.
+
+### **Phase 6: Enterprise (FUTURE)**
+18. **[ ] Security & RBAC**: Permissioned access and audit trails.
+19. **[ ] Scalability**: Large monorepo support and distributed processing.
+20. **[ ] Team Sharing**: Remote knowledge store sync and collaboration.
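
Reviewer sketch (not part of the commit): roadmap item 7 mentions hybrid BM25+vector retrieval fused with RRF (reciprocal rank fusion). A minimal version of that fusion step might look like the following. The function name, the sample IDs, and the constant `k = 60` (the value conventionally used for RRF) are assumptions; the diff does not show how KnowCode implements it.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked ID lists: each list adds 1 / (k + rank) per item."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a lexical and a vector retriever.
bm25_hits = ["a", "b", "c"]
vector_hits = ["b", "c", "d"]
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

Items that appear in both lists (`b` and `c`) outrank items found by only one retriever. This is why RRF is a common choice when the two scoring scales are not directly comparable.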
