169 changes: 169 additions & 0 deletions AGENTS.md
@@ -0,0 +1,169 @@
# Agent Development Guide

This document provides guidance for AI agents (LLM-assisted development tools) working with the Grove Platform Tooling repository.

## Repository Overview

This is a monorepo containing multiple tools used by the MongoDB Developer Docs team for documentation-related tasks.

## Project Structure

### 1. `audit/` - Code Example Analysis Tools (Go)

Two Go projects that share common types and constants via the `audit/common` module:

#### `audit/gdcd` - Great Docs Code Devourer
- **Purpose**: Ingestion tool that extracts and categorizes code examples from MongoDB documentation
- **Language**: Go 1.24.4
- **Key Dependencies**:
  - MongoDB Go Driver v2
  - Ollama (for LLM-based code categorization using the qwen2.5-coder model)
  - langchaingo
- **Module**: `module gdcd` with local replace: `replace common => ../common`
- **Build**: `go build` from `audit/gdcd/`
- **Run**: `go run .` (requires `.env` file with `MONGODB_URI` and Ollama running locally)
- **Tests**: Standard Go tests (`*_test.go` files), run with `go test ./...`
- **Long-running**: Yes (~1-2 hours depending on project count)
- **Outputs**: Logs to `logs/` directory

#### `audit/dodec` - Database of Devoured Example Code
- **Purpose**: Query tool for code example database with aggregation pipelines
- **Language**: Go 1.24.0
- **Module**: `module dodec` with local replace: `replace common => ../../common`
- **Working Directory**: `audit/dodec/src/`
- **Build**: `go build` from `audit/dodec/src/`
- **Run**: `go run .` (requires `.env` file with `MONGODB_URI`)
- **Tests**: Standard Go tests

#### `audit/common` - Shared Types
- **Purpose**: Common Go type definitions and constants
- **Module**: `module common`
- **Used by**: Both gdcd and dodec via local replace directives

### 2. `dependency-manager/` - Multi-Language Dependency Manager (Go)

- **Purpose**: CLI tool to scan and update dependencies across multiple package managers
- **Language**: Go 1.25
- **Framework**: Cobra CLI
- **Module**: `module dependency-manager`
- **Build**: `go build -o depman` from `dependency-manager/`
- **Supported Package Managers**: npm, Maven, pip, Go modules, NuGet
- **Commands**:
  - `depman check` - Dry run to check for updates
  - `depman update` - Update dependency files only
  - `depman install` - Update and install dependencies
- **Tests**: Located in `testdata/` directory
- **Documentation**: See `dependency-manager/README.md` and `dependency-manager/USAGE.md`

### 3. `github-metrics/` - GitHub Metrics Collection (Node.js)

- **Purpose**: Collects GitHub engagement metrics and writes to MongoDB Atlas
- **Language**: Node.js (ES modules)
- **Package Manager**: npm
- **Main Files**:
  - `get-github-metrics.js` - Fetches metrics from GitHub using Octokit
  - `write-to-db.js` - Writes data to MongoDB
- **Dependencies**: octokit, mongodb, esm
- **Run**: `node get-github-metrics.js` or `node write-to-db.js`
- **Status**: PoC (Proof of Concept)

### 4. `query-docs-feedback/` - Docs Feedback Query Tool (Go)

- **Purpose**: Queries MongoDB Docs Feedback for code example-related feedback
- **Language**: Go 1.23.1
- **Module**: `module query-docs-feedback`
- **Build**: `go build` from `query-docs-feedback/`
- **Run**: `go run .` (requires `.env` with `MONGODB_URI`, `DB_NAME`, `COLLECTION_NAME`)
- **Output**: CSV report

## Development Guidelines for Agents

### Go Projects

1. **Module System**: All Go projects use local module names, not GitHub paths
   - Import using local module names: `import "common"`, `import "gdcd/add-code-examples"`, etc.
   - Do NOT use full GitHub paths in imports
   - The `replace` directives in `go.mod` handle local module resolution

2. **Testing**:
   - Tests follow Go conventions: `*_test.go` files
   - Run tests with `go test ./...` from project root
   - Test data often in `test-data/` or `data/` subdirectories
   - Many projects have helper functions for testing (e.g., `GetCodeExampleForTesting()`)

3. **Environment Variables**:
   - Most projects require `.env` files (not committed to repo)
   - Common variables: `MONGODB_URI`, `DB_NAME`, `COLLECTION_NAME`
   - Use `github.com/joho/godotenv` for loading

4. **Build Commands**:
   - Always run from the project directory containing `go.mod`
   - Use `go build` or `go run .`
   - For dodec, work from `audit/dodec/src/` not `audit/dodec/`
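
Since most Go tools here need `MONGODB_URI`, `DB_NAME`, and sometimes `COLLECTION_NAME`, it can help to fail early when one is missing. The following is a minimal sketch using only the standard library; `missingEnv` is a hypothetical helper, not a function from this repository. It assumes the `.env` file has already been loaded into the process environment (for example by `godotenv`).

```go
package main

import (
	"fmt"
	"os"
)

// missingEnv is a hypothetical helper (not part of this repository).
// It returns the names in required that are unset or empty, reading
// each variable through lookup (os.LookupEnv in real use).
func missingEnv(required []string, lookup func(string) (string, bool)) []string {
	var missing []string
	for _, name := range required {
		if value, ok := lookup(name); !ok || value == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	// Assumption: the .env file was loaded into the environment beforehand.
	required := []string{"MONGODB_URI", "DB_NAME", "COLLECTION_NAME"}
	if missing := missingEnv(required, os.LookupEnv); len(missing) > 0 {
		fmt.Println("missing environment variables:", missing)
		return
	}
	fmt.Println("all required environment variables are set")
}
```

Passing the lookup function as a parameter keeps the check testable without touching the real environment.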

### Node.js Projects

1. **Package Management**: Use npm commands rather than manual edits
   - Install: `npm install`
   - Add dependency: `npm install <package>`
   - Update: Run `ncu -u` (from the `npm-check-updates` package), then `npm install`

2. **Module System**: Uses ES modules (`"type": "module"` in package.json)

### Testing Philosophy

- Write tests for new functionality
- Run full test suite after implementation changes to catch regressions
- Remove debug output and debug files after diagnosing issues
- Optimize for maintainability over cleverness

### Code Style

- Use language-idiomatic documentation
- Capture "why" in comments, not just "what"
- Keep user-facing APIs simple (users are technical writers, not developers)
- Handle complexity internally when possible

## Common Tasks

### Running Tests
```bash
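# Go projects (run from the repository root; each subshell leaves your working directory unchanged)
(cd audit/gdcd && go test ./...)
(cd audit/dodec/src && go test ./...)
(cd dependency-manager && go test ./...)

# Check for compilation errors (run inside a project directory containing go.mod)
go build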
```

### Building Tools
```bash
# Run from the repository root; each subshell leaves your working directory unchanged

# GDCD
(cd audit/gdcd && go build)

# DoDEC
(cd audit/dodec/src && go build)

# Dependency Manager
(cd dependency-manager && go build -o depman)
```

### Updating Dependencies
```bash
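# Go projects (run from the project directory containing go.mod)
go get -u ./...
go mod tidy

# Node.js projects (run from the project directory containing package.json)
npm install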
```

## Important Notes

- **Do NOT** manually edit `go.mod` files - use `go get` commands
- **Do NOT** manually edit `package.json` - use npm commands
- **Do NOT** create debug files without cleaning them up
- **Do NOT** add emojis or excessive success messages to output
- **Always** run full test suite after changes
- **Always** remove debug output from source code when done
8 changes: 4 additions & 4 deletions audit/dodec/README.md
@@ -2,7 +2,7 @@

This project contains scaffolding and several aggregation pipelines to work with the Database of Devoured Example Code.
The Database of Devoured Example Code contains code examples and related metadata that have been ingested by the [Great
Docs Code Devourer](https://github.com/mongodb/code-example-tooling/tree/main/audit/gdcd).
Docs Code Devourer](https://github.com/grove-platform/tooling/tree/main/audit/gdcd).

This DoDEC tooling can currently perform the following tasks:

@@ -59,7 +59,7 @@ tables.

- [Print one table](src/utils/PrintSimpleCountDataToConsole.go) with rows representing each collection, product, category,
or programming language. Use where the aggregation returns a `simpleMap` as defined in [PerformAggregation](src/PerformAggregation.go)
- [Print multiple tables](src/utils/PrintNestedOneLevelCountDataToConsole.go) with each row representing a category or
programming language, and each table representing a higher-level division such as product or docs property. Use where
the aggregation returns a `nestedOneLevelMap` as defined in [PerformAggregation](src/PerformAggregation.go)
- [Print multiple tables from two-level nested maps](src/utils/PrintNestedTwoLevelCountDataToConsole.go) with each row
@@ -171,12 +171,12 @@ Every collection contains documents that map to one of two schemas:
#### Summary document

The summary document has a schema that conforms to the
[CollectionReport](https://github.com/mongodb/code-example-tooling/blob/main/audit/common/CollectionReport.go) type.
[CollectionReport](https://github.com/grove-platform/tooling/blob/main/audit/common/CollectionReport.go) type.

#### Docs page document

The remaining documents in the collection each map to an individual docs page. The docs page documents have a schema that
conforms to the [DocsPage](https://github.com/mongodb/code-example-tooling/blob/main/audit/common/DocsPage.go) type.
conforms to the [DocsPage](https://github.com/grove-platform/tooling/blob/main/audit/common/DocsPage.go) type.

Each docs page has a `nodes` array, which may be `null`, or may contain `CodeNode` elements. The `CodeNode` elements
contain metadata about the code examples, as well as the examples themselves. To work with only the `CodeNode` elements
50 changes: 25 additions & 25 deletions audit/gdcd/README.md
@@ -1,7 +1,7 @@
# Great Docs Code Devourer (Code Ingest Tool)

The Great Docs Code Devourer (GDCD) processes MongoDB documentation pages to extract code examples. It compares these
examples with previously stored code to identify new, updated, or removed examples. GDCD stores all code examples and
metadata in a MongoDB Atlas database maintained by the Developer Docs team.
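
The comparison step can be pictured roughly as follows. This is a minimal sketch and not GDCD's actual implementation: it assumes each code example can be keyed by some stable identifier (hypothetical here) and that an example counts as updated when its text differs. `Changes` and `diffExamples` are illustrative names only.

```go
package main

import (
	"fmt"
	"sort"
)

// Changes groups example keys by how they differ between two snapshots.
// Hypothetical type for illustration only.
type Changes struct {
	New, Updated, Removed []string
}

// diffExamples compares stored and incoming examples. Both maps go from
// an assumed stable identifier to the example text.
func diffExamples(stored, incoming map[string]string) Changes {
	var c Changes
	for key, text := range incoming {
		old, ok := stored[key]
		switch {
		case !ok:
			c.New = append(c.New, key) // not seen before
		case old != text:
			c.Updated = append(c.Updated, key) // same key, different text
		}
	}
	for key := range stored {
		if _, ok := incoming[key]; !ok {
			c.Removed = append(c.Removed, key) // no longer on the page
		}
	}
	// Map iteration order is unspecified, so sort for stable results.
	sort.Strings(c.New)
	sort.Strings(c.Updated)
	sort.Strings(c.Removed)
	return c
}

func main() {
	stored := map[string]string{"a": "print(1)", "b": "print(2)", "c": "print(3)"}
	incoming := map[string]string{"a": "print(1)", "b": "print(20)", "d": "print(4)"}
	c := diffExamples(stored, incoming)
	fmt.Println("new:", c.New, "updated:", c.Updated, "removed:", c.Removed)
	// Prints: new: [d] updated: [b] removed: [c]
}
```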


@@ -18,7 +18,7 @@ The database of devoured code examples enables powerful analysis of the document
- Language coverage across documentation

For querying this data, use the companion project,
[Database of Devoured Example Code (DODEC)](https://github.com/mongodb/code-example-tooling/tree/main/audit/dodec).
[Database of Devoured Example Code (DODEC)](https://github.com/grove-platform/tooling/tree/main/audit/dodec).

## How it works

@@ -31,17 +31,17 @@ GDCD follows this pipeline:

### LLM-Based Code Categorization

We use the Ollama [qwen2.5-coder](https://ollama.com/library/qwen2.5-coder) model to categorize new incoming
code examples. At the time of this writing, it is the latest series of code-specific Qwen models focused on improved code
reasoning, code generation, and code fixing. This model has consistently produced the most accurate results when
categorizing code examples. Refer to the [Ollama](https://ollama.com/) website for more details.

### Metadata Tracked

We track various metadata about the code examples and their associated documentation pages:

For each code example:
- Code example text
- File extension and programming language
- Category
- Categorization method (LLM or manual)
@@ -77,51 +77,51 @@ connection details and access.
```shell
go get gdcd
```
3. Create the relevant env configuration files in the project root. This project is set up for three environments. You will most likely be running against prod.
1. Create a `.env.ENVIRONMENT` file for the `ENVIRONMENT` where you want to run the tool:
- `production`
- `development`
- `testing`

(for example, create `.env.production` to run against the prod database)
2. Add the following:
```dotenv
MONGODB_URI="YOUR_MONGODB_URI_HERE"
DB_NAME="RELEVANT_DB_NAME_HERE"
```
- `MONGODB_URI`: Connection string for the Code Snippets project in the Developer Docs Atlas organization.
Contact the Developer Docs team for access.
- `DB_NAME`: The database to run the tool on. We maintain several databases for production, testing, and backup purposes.
Contact the Developer Docs team for the appropriate DB name.

## Running the Tool

Set the `APP_ENV` variable to the environment where you want to run the tool, then run from `main.go`.
Env values:
- `production`
- `development`
- `testing`

You can do this from the command line or your IDE:

- **Command Line**

To run from the terminal, set the variable, then run from the project root.
For example, to run against the `production` environment:
```shell
export APP_ENV=production
go build
go run .
```
- **IDE**:

To run from an IDE configuration:
1. Set the `APP_ENV` environment variable (e.g. `APP_ENV=production`)
2. Run `main.go`
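
The relationship between `APP_ENV` and the `.env.ENVIRONMENT` file can be sketched like this. It is an illustration under stated assumptions, not the code in `main.go`: `envFileFor` is a hypothetical helper that only maps the three documented environment names to a file name and rejects anything else.

```go
package main

import (
	"fmt"
	"os"
)

// envFileFor is a hypothetical helper (not taken from GDCD). It maps an
// APP_ENV value to its .env.ENVIRONMENT file name and rejects values
// outside the three documented environments.
func envFileFor(appEnv string) (string, error) {
	switch appEnv {
	case "production", "development", "testing":
		return ".env." + appEnv, nil
	default:
		return "", fmt.Errorf("unsupported APP_ENV %q: want production, development, or testing", appEnv)
	}
}

func main() {
	file, err := envFileFor(os.Getenv("APP_ENV"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("would load", file)
}
```

Rejecting unknown values up front gives a clearer message than a later connection failure caused by a missing env file.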

The progress bar should immediately output to the console and continue to display progress until all
projects are parsed. Depending on your machine and the number of projects specified, this can be a
long-running program (~1-2 hours).

## Reviewing logs

@@ -173,15 +173,15 @@ Error: "failed to connect to MongoDB"
```
1. Verify you've set the correct `APP_ENV` variable and corresponding `.env.ENVIRONMENT` exists in project root
2. Check your connection string in the corresponding `.env.ENVIRONMENT` file
3. Check connectivity to Atlas and that your IP is whitelisted

### Other Issues

Contact the Developer Docs team for assistance with environment setup or access.

## Disclaimer

Enlist the aid of the Great Docs Code Devourer at your peril!

This beast is an amalgam of tools with some test coverage, but key bits of business logic still remain uncovered by tests.
If demand/priority permits, we would love to expand and improve this tooling.
2 changes: 1 addition & 1 deletion audit/gdcd/scripts/README.md
@@ -63,7 +63,7 @@ moved, we must manually adjust the count of new applied usage examples to omit t

```bash
# Navigate to the scripts directory first
cd /Your/Local/Filepath/code-example-tooling/audit/gdcd/scripts
cd /Your/Local/Filepath/tooling/audit/gdcd/scripts

# Then run the Go script
go run parse-log.go ../logs/2025-09-24-18-01-30-app.log