diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 0000000..8e89894 --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,169 @@ +# Agent Development Guide + +This document provides guidance for AI agents (LLM-assisted development tools) working with the Grove Platform Tooling repository. + +## Repository Overview + +This is a monorepo containing multiple tools used by the MongoDB Developer Docs team for documentation-related tasks. + +## Project Structure + +### 1. `audit/` - Code Example Analysis Tools (Go) + +Two Go projects that share common types and constants via the `audit/common` module: + +#### `audit/gdcd` - Great Docs Code Devourer +- **Purpose**: Ingestion tool that extracts and categorizes code examples from MongoDB documentation +- **Language**: Go 1.24.4 +- **Key Dependencies**: + - MongoDB Go Driver v2 + - Ollama (for LLM-based code categorization using qwen2.5-coder model) + - langchaingo +- **Module**: `module gdcd` with local replace: `replace common => ../common` +- **Build**: `go build` from `audit/gdcd/` +- **Run**: `go run .` (requires `.env` file with `MONGODB_URI` and Ollama running locally) +- **Tests**: Standard Go tests (`*_test.go` files), run with `go test ./...` +- **Long-running**: Yes (~1-2 hours depending on project count) +- **Outputs**: Logs to `logs/` directory + +#### `audit/dodec` - Database of Devoured Example Code +- **Purpose**: Query tool for code example database with aggregation pipelines +- **Language**: Go 1.24.0 +- **Module**: `module dodec` with local replace: `replace common => ../../common` +- **Working Directory**: `audit/dodec/src/` +- **Build**: `go build` from `audit/dodec/src/` +- **Run**: `go run .` (requires `.env` file with `MONGODB_URI`) +- **Tests**: Standard Go tests + +#### `audit/common` - Shared Types +- **Purpose**: Common Go type definitions and constants +- **Module**: `module common` +- **Used by**: Both gdcd and dodec via local replace directives + +### 2. 
`dependency-manager/` - Multi-Language Dependency Manager (Go) + +- **Purpose**: CLI tool to scan and update dependencies across multiple package managers +- **Language**: Go 1.25 +- **Framework**: Cobra CLI +- **Module**: `module dependency-manager` +- **Build**: `go build -o depman` from `dependency-manager/` +- **Supported Package Managers**: npm, Maven, pip, Go modules, NuGet +- **Commands**: + - `depman check` - Dry run to check for updates + - `depman update` - Update dependency files only + - `depman install` - Update and install dependencies +- **Tests**: Located in `testdata/` directory +- **Documentation**: See `dependency-manager/README.md` and `dependency-manager/USAGE.md` + +### 3. `github-metrics/` - GitHub Metrics Collection (Node.js) + +- **Purpose**: Collects GitHub engagement metrics and writes to MongoDB Atlas +- **Language**: Node.js (ES modules) +- **Package Manager**: npm +- **Main Files**: + - `get-github-metrics.js` - Fetches metrics from GitHub using Octokit + - `write-to-db.js` - Writes data to MongoDB +- **Dependencies**: octokit, mongodb, esm +- **Run**: `node get-github-metrics.js` or `node write-to-db.js` +- **Status**: PoC (Proof of Concept) + +### 4. `query-docs-feedback/` - Docs Feedback Query Tool (Go) + +- **Purpose**: Queries MongoDB Docs Feedback for code example-related feedback +- **Language**: Go 1.23.1 +- **Module**: `module query-docs-feedback` +- **Build**: `go build` from `query-docs-feedback/` +- **Run**: `go run .` (requires `.env` with `MONGODB_URI`, `DB_NAME`, `COLLECTION_NAME`) +- **Output**: CSV report + +## Development Guidelines for Agents + +### Go Projects + +1. **Module System**: All Go projects use local module names, not GitHub paths + - Import using local module names: `import "common"`, `import "gdcd/add-code-examples"`, etc. + - Do NOT use full GitHub paths in imports + - The `replace` directives in `go.mod` handle local module resolution + +2. 
**Testing**: + - Tests follow Go conventions: `*_test.go` files + - Run tests with `go test ./...` from project root + - Test data often in `test-data/` or `data/` subdirectories + - Many projects have helper functions for testing (e.g., `GetCodeExampleForTesting()`) + +3. **Environment Variables**: + - Most projects require `.env` files (not committed to repo) + - Common variables: `MONGODB_URI`, `DB_NAME`, `COLLECTION_NAME` + - Use `github.com/joho/godotenv` for loading + +4. **Build Commands**: + - Always run from the project directory containing `go.mod` + - Use `go build` or `go run .` + - For dodec, work from `audit/dodec/src/` not `audit/dodec/` + +### Node.js Projects + +1. **Package Management**: Use npm (package manager commands, not manual edits) + - Install: `npm install` + - Add dependency: `npm install <package>` + - Update: Use `ncu -u` then `npm install` + +2. **Module System**: Uses ES modules (`"type": "module"` in package.json) + +### Testing Philosophy + +- Write tests for new functionality +- Run full test suite after implementation changes to catch regressions +- Remove debug output and debug files after diagnosing issues +- Optimize for maintainability over cleverness + +### Code Style + +- Use language-idiomatic documentation +- Capture "why" in comments, not just "what" +- Keep user-facing APIs simple (users are technical writers, not developers) +- Handle complexity internally when possible + +## Common Tasks + +### Running Tests +```bash +# Go projects +cd audit/gdcd && go test ./... +cd audit/dodec/src && go test ./... +cd dependency-manager && go test ./... + +# Check for compilation errors +go build +``` + +### Building Tools +```bash +# GDCD +cd audit/gdcd && go build + +# DoDEC +cd audit/dodec/src && go build + +# Dependency Manager +cd dependency-manager && go build -o depman +``` + +### Updating Dependencies +```bash +# Go projects +go get -u ./... 
+go mod tidy + +# Node.js projects +npm install +``` + +## Important Notes + +- **Do NOT** manually edit `go.mod` files - use `go get` commands +- **Do NOT** manually edit `package.json` - use npm commands +- **Do NOT** create debug files without cleaning them up +- **Do NOT** add emojis or excessive success messages to output +- **Always** run full test suite after changes +- **Always** remove debug output from source code when done diff --git a/audit/dodec/README.md b/audit/dodec/README.md index 44e755e..fb60454 100644 --- a/audit/dodec/README.md +++ b/audit/dodec/README.md @@ -2,7 +2,7 @@ This project contains scaffold and several aggregation pipelines to work with the Database of Devoured Example Code. The Database of Devoured Example Code contains code examples and related metadata that has been ingested by the [Great -Docs Code Devourer](https://github.com/mongodb/code-example-tooling/tree/main/audit/gdcd). +Docs Code Devourer](https://github.com/grove-platform/tooling/tree/main/audit/gdcd). This DoDEC tooling can currently perform the following tasks: @@ -59,7 +59,7 @@ tables. - [Print one table](src/utils/PrintSimpleCountDataToConsole.go) with rows representing each collection, product, category, or programming language. Use where the aggregation returns a `simpleMap` as defined in [PerformAggregation](src/PerformAggregation.go) -- [Print multiple tables](src/utils/PrintNestedOneLevelCountDataToConsole.go) with each row representing a category or +- [Print multiple tables](src/utils/PrintNestedOneLevelCountDataToConsole.go) with each row representing a category or programming language, and each table representing a higher-level division such as product or docs property. 
Use where the aggregation returns a `nestedOneLevelMap` as defined in [PerformAggregation](src/PerformAggregation.go) - [Print multiple tables from two-level nested maps](src/utils/PrintNestedTwoLevelCountDataToConsole.go) with each row @@ -171,12 +171,12 @@ Every collection contains documents that map to one of two schemas: #### Summary document The summary document has a schema that conforms to the -[CollectionReport](https://github.com/mongodb/code-example-tooling/blob/main/audit/common/CollectionReport.go) type. +[CollectionReport](https://github.com/grove-platform/tooling/blob/main/audit/common/CollectionReport.go) type. #### Docs page document The remaining documents in the collection each map to an individual docs page. The docs page documents have a schema that -conforms to the [DocsPage](https://github.com/mongodb/code-example-tooling/blob/main/audit/common/DocsPage.go) type. +conforms to the [DocsPage](https://github.com/grove-platform/tooling/blob/main/audit/common/DocsPage.go) type. Each docs page has a `nodes` array, which may be `null`, or may contain `CodeNode` elements. The `CodeNode` elements contain metadata about the code examples, as well as the examples themselves. To work with only the `CodeNode` elements diff --git a/audit/gdcd/README.md b/audit/gdcd/README.md index f7e62c9..3c668af 100644 --- a/audit/gdcd/README.md +++ b/audit/gdcd/README.md @@ -1,7 +1,7 @@ -# Great Docs Code Devourer (Code Ingest Tool) +# Great Docs Code Devourer (Code Ingest Tool) -The Great Docs Code Devourer (GDCD) processes MongoDB documentation pages to extract code examples. It compares these -examples with previously stored code to identify new, updated, or removed examples. GDCD stores all code examples and +The Great Docs Code Devourer (GDCD) processes MongoDB documentation pages to extract code examples. It compares these +examples with previously stored code to identify new, updated, or removed examples. 
GDCD stores all code examples and metadata in a MongoDB Atlas database maintained by the Developer Docs team. @@ -18,7 +18,7 @@ The database of devoured code examples enables powerful analysis of the document - Language coverage across documentation For querying this data, use the companion project, -[Database of Devoured Example Code (DODEC)](https://github.com/mongodb/code-example-tooling/tree/main/audit/dodec). +[Database of Devoured Example Code (DODEC)](https://github.com/grove-platform/tooling/tree/main/audit/dodec). ## How it works @@ -31,9 +31,9 @@ GDCD follows this pipeline: ### LLM-Based Code Categorization -We use the Ollama [qwen2.5-coder](https://ollama.com/library/qwen2.5-coder) model to categorize new incoming -code examples. At the time of this writing, it is the latest series of code-specific Qwen models focused on improved code -reasoning, code generation, and code fixing. This model has consistently produced the most accurate results when +We use the Ollama [qwen2.5-coder](https://ollama.com/library/qwen2.5-coder) model to categorize new incoming +code examples. At the time of this writing, it is the latest series of code-specific Qwen models focused on improved code +reasoning, code generation, and code fixing. This model has consistently produced the most accurate results when categorizing code examples. Refer to the [Ollama](https://ollama.com/) website for more details. ### Metadata Tracked @@ -41,7 +41,7 @@ categorizing code examples. Refer to the [Ollama](https://ollama.com/) website f We track various metadata about the code examples and their associated documentation pages: For each code example: -- Code example text +- Code example text - File extension and programming language - Category - Categorization method (LLM or manual) @@ -77,36 +77,36 @@ connection details and access. ```shell go get gdcd ``` -3. Create the relevant env configuration files in the project root. This project is set up for three environments. 
You will most likely be running against prod. +3. Create the relevant env configuration files in the project root. This project is set up for three environments. You will most likely be running against prod. 1. Create a `.env.ENVIRONMENT` file for the `ENVIRONMENT` where you want to run the tool: - - `production` + - `production` - `development` - `testing` - + (for example, create `.env.production` to run against the prod database) 2. Add the following: ```dotenv MONGODB_URI="YOUR_MONGODB_URI_HERE" DB_NAME="RELEVANT_DB_NAME_HERE" ``` - - `MONGODB_URI`: Connection string for the Code Snippets project in the Developer Docs Atlas organization. + - `MONGODB_URI`: Connection string for the Code Snippets project in the Developer Docs Atlas organization. Contact the Developer Docs team for access. - - `DB_NAME`: The database to run the tool on. We maintain several databases for production, testing, and backup purposes. + - `DB_NAME`: The database to run the tool on. We maintain several databases for production, testing, and backup purposes. Contact the Developer Docs team for the appropriate DB name. ## Running the Tool -Set the `APP_ENV` variable to the environment where you want to run the tool, then run from `main.go`. +Set the `APP_ENV` variable to the environment where you want to run the tool, then run from `main.go`. Env values: - `production` - `development` - `testing` -You can do this from the command line or your IDE: +You can do this from the command line or your IDE: - **Command Line** - To run from the terminal, set the variable, then run from the project root. + To run from the terminal, set the variable, then run from the project root. For example, to run against the `production` environment: ```shell export APP_ENV=production @@ -114,14 +114,14 @@ You can do this from the command line or your IDE: go run . ``` - **IDE**: - - To run from an IDE configuration: - 1. Set the `APP_ENV` environment variable (e.g. 
`APP_ENV=production`) + + To run from an IDE configuration: + 1. Set the `APP_ENV` environment variable (e.g. `APP_ENV=production`) 2. Run `main.go` -The progress bar should immediately output to console and continue to display progress until all -projects are parsed. Depending on your machine and the amount of projects specified, this can be a -long-running program (~1-2hrs ). +The progress bar should immediately output to console and continue to display progress until all +projects are parsed. Depending on your machine and the amount of projects specified, this can be a +long-running program (~1-2hrs ). ## Reviewing logs @@ -173,7 +173,7 @@ Error: "failed to connect to MongoDB" ``` 1. Verify you've set the correct `APP_ENV` variable and corresponding `.env.ENVIRONMENT` exists in project root 2. Check your connection string in the corresponding `.env.ENVIRONMENT` file -3. Check connectivity to Atlas and that your IP is whitelisted +3. Check connectivity to Atlas and that your IP is whitelisted ### Other Issues @@ -181,7 +181,7 @@ Contact the Developer Docs team for assistance with environment setup or access. ## Disclaimer -Enlist the aid of the Great Docs Code Devourer at your peril! +Enlist the aid of the Great Docs Code Devourer at your peril! -This beast is an amalgam of tools with some test coverage, but key bits of business logic still remain uncovered by tests. +This beast is an amalgam of tools with some test coverage, but key bits of business logic still remain uncovered by tests. If demand/priority permits, we would love to expand and improve this tooling. 
diff --git a/audit/gdcd/scripts/README.md b/audit/gdcd/scripts/README.md index b3413d3..716d2cf 100644 --- a/audit/gdcd/scripts/README.md +++ b/audit/gdcd/scripts/README.md @@ -63,7 +63,7 @@ moved, we must manually adjust the count of new applied usage examples to omit t ```bash # Navigate to the scripts directory first -cd /Your/Local/Filepath/code-example-tooling/audit/gdcd/scripts +cd /Your/Local/Filepath/tooling/audit/gdcd/scripts # Then run the Go script go run parse-log.go ../logs/2025-09-24-18-01-30-app.log