@@ -121,8 +121,7 @@ significantly reduce the IO overhead and avoid potential race condition.
121121
122122
123123 If you’re setting up a standalone ChromaDB server, I recommend sticking to
124- v0.6.3. ChromaDB recently released v1.0.0, which may not work with VectorCode.
125- I’m testing with v1.0.0 and will publish a new release when it’s ready.
124+ v0.6.3, because VectorCode is not ready for the upgrade to ChromaDB 1.0 yet.
126125
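If you are unsure which ChromaDB version your setup uses, a small check like the one below can help. It is a minimal sketch and not part of VectorCode. The names `parse_version` and `classify` are hypothetical, and the parser only handles plain `X.Y.Z` strings. You can pass it the version of the installed `chromadb` package (read here with `importlib.metadata`) or the version your standalone server reports.

```python
"""Hypothetical helper: is a ChromaDB version the one recommended for VectorCode?"""
from importlib import metadata

RECOMMENDED = (0, 6, 3)  # the version this document recommends


def parse_version(version: str) -> tuple[int, ...]:
    """Turn 'X.Y.Z' into (X, Y, Z); anything after a component's digits is ignored."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def classify(version: str) -> str:
    """Return 'recommended', 'too-new' (1.0 or later) or 'other'."""
    parsed = parse_version(version)
    if parsed[:3] == RECOMMENDED:
        return "recommended"
    if parsed[0] >= 1:
        return "too-new"
    return "other"


if __name__ == "__main__":
    try:
        installed = metadata.version("chromadb")
    except metadata.PackageNotFoundError:
        print("chromadb is not installed in this environment")
    else:
        print(installed, classify(installed))
```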
127126FOR WINDOWS USERS ~
128127
@@ -146,6 +145,8 @@ NIX ~
146145
147146A community-maintained Nix package is available here
148147<https://search.nixos.org/packages?channel=unstable&from=0&size=50&sort=relevance&type=packages&query=vectorcode>.
148+ If you’re using Nix to install a standalone ChromaDB server, make sure to
149+ stick to version 0.6.3 <https://github.com/NixOS/nixpkgs/pull/412528>.
149150
150151
151152GETTING STARTED *VectorCode-cli-vectorcode-command-line-tool-getting-started*
@@ -212,7 +213,7 @@ REFRESHING EMBEDDINGS ~
212213
213214To maintain the accuracy of the vector search, it’s important to keep your
214215embeddings up-to-date. You can simply run the `vectorise` subcommand on a file
215- to refresh the embedding for a particular file, and the CLI provides a
216+ to refresh the embedding for that file. Apart from that, the CLI provides a
216217`vectorcode update` subcommand, which updates the embeddings for all files that
217218are currently indexed by VectorCode for the current project.
218219
@@ -241,8 +242,8 @@ For each project, VectorCode creates a collection (similar to tables in
241242traditional databases) and puts the code embeddings in the corresponding
242243collection. In the root directory of a project, you may run `vectorcode init`.
243244This will initialise the repository with a subdirectory
244- `project_root/.vectorcode/`. This will mark this directory a _project root_, a
245- concept that will later be used to construct the collection. You may put a
245+ `project_root/.vectorcode/`. This will mark this directory as a _project root_,
246+ a concept that will later be used to construct the collection. You may put a
246247`config.json` file in `project_root/.vectorcode`. This file may be used to
247248store project-specific settings such as embedding functions and database entry
248249point (more on this later). If you already have a global configuration file at
@@ -272,31 +273,22 @@ hooks. The `init` subcommand provides a `--hooks` flag which helps you manage
272273hooks when working with a git repository. You can put some custom hooks in
273274`~/.config/vectorcode/hooks/` and the `vectorcode init --hooks` command will
274275pick them up and append them to your existing hooks, or create new hook scripts
275- if they don’t exist yet. The hook files should be named the same as they
276- would be under the `.git/hooks` directory. For example, a pre-commit hook would
277- be named `~/.config/vectorcode/hooks/pre-commit`.
276+ if they don’t exist yet. The custom hook files should be named the same as
277+ they would be under the `.git/hooks` directory. For example, a pre-commit hook
278+ would be named `~/.config/vectorcode/hooks/pre-commit`.
278279
279280By default, there are 2 pre-defined hooks:
280281
281- >bash
282- # pre-commit hook that vectorise changed files before you commit.
283- diff_files=$(git diff --cached --name-only)
284- [ -z "$diff_files" ] || vectorcode vectorise $diff_files
285- <
282+ 1. A pre-commit hook that vectorises the modified files.
283+ 2. A post-checkout hook that:
284+    - vectorises the full repository if it’s an initial commit/clone and a
285+      `vectorcode.include` spec is available (locally in the project or globally);
286+    - vectorises the files changed by the checkout.
287+
286288
287- >bash
288- # post-checkout hook that vectorise changed files when you checkout to a
289- # different branch/tag/commit
290- files=$(git diff --name-only "$1" "$2")
291- [ -z "$files" ] || vectorcode vectorise $files
292- <
293289
294- When you run `vectorcode init --hooks` in a git repo, these 2 hooks will be
295- added to your `.git/hooks/`. Hooks that are managed by VectorCode will be
296- wrapped by `# VECTORCODE_HOOK_START` and `# VECTORCODE_HOOK_END` comment lines.
297- They help VectorCode determine whether hooks have been added, so don’t delete
298- the markers unless you know what you’re doing. To remove the hooks, simply
299- delete the lines wrapped by these 2 comment strings.
290+ Both hooks only run in repositories that contain a `.vectorcode`
291+ directory.
300292
301293
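The hook behaviour described above can be summarised as a small decision function. This is a hypothetical sketch, not the hook script VectorCode installs: the function names are made up, and returning `"skip"` on an initial clone without a `vectorcode.include` spec is an assumption. It relies on one documented git detail: when `post-checkout` runs as part of `git clone`, the previous-HEAD argument is the null ref (all zeros). The `git diff` commands mirror the ones in the earlier shell versions of these hooks.

```python
"""Hypothetical sketch of the decisions made by VectorCode-style git hooks."""
from pathlib import Path


def is_null_ref(ref: str) -> bool:
    # git passes an all-zero object name as the previous HEAD during `git clone`
    return bool(ref) and set(ref) == {"0"}


def has_vectorcode_dir(repo_root: Path) -> bool:
    # both hooks only act on repositories that contain a .vectorcode directory
    return (repo_root / ".vectorcode").is_dir()


def post_checkout_plan(has_vc_dir: bool, prev_head: str, has_include_spec: bool) -> str:
    """Return "skip", "full" (whole repository) or "diff" (changed files only)."""
    if not has_vc_dir:
        return "skip"
    if is_null_ref(prev_head):
        # initial clone: index everything, but only when an include spec exists
        # (skipping otherwise is an assumption of this sketch)
        return "full" if has_include_spec else "skip"
    return "diff"


def changed_files_cmd(prev_head: str, new_head: str) -> list[str]:
    # files that differ between the old and the new HEAD (post-checkout)
    return ["git", "diff", "--name-only", prev_head, new_head]


def staged_files_cmd() -> list[str]:
    # files staged for the next commit (pre-commit)
    return ["git", "diff", "--cached", "--name-only"]
```

In a real hook you would run the returned command with `subprocess.run(..., capture_output=True, text=True)` and pass the resulting file names to `vectorcode vectorise`.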
302294CONFIGURING VECTORCODE ~
@@ -328,31 +320,32 @@ model_name="nomic-embed-text")`. Default: `{}`; - `db_url` string, the url that
328320points to the Chromadb server. VectorCode will start an HTTP server for
329321Chromadb at a randomly picked free port on `localhost` if your configured
330322`http://host:port` is not accessible. Default: `http://127.0.0.1:8000`; -
331- `db_path` string, Path to local persistent database. This is where the files for
332- your database will be stored. Default: `~/.local/share/vectorcode/chromadb/`; -
333- `db_log_path` string, path to the _directory_ where the built-in chromadb server
334- will write the log to. Default: `~/.local/share/vectorcode/`; -
335- `chunk_size` integer, the maximum number of characters per chunk. A larger value
336- reduces the number of items in the database, and hence accelerates the search,
337- but at the cost of potentially truncated data and lost information. Default:
338- `2500`. To disable chunking, set it to a negative number; -
339- `overlap_ratio` float between 0 and 1, the ratio of overlapping/shared content
340- between 2 adjacent chunks. A larger ratio improves the coherences of chunks,
341- but at the cost of increasing number of entries in the database and hence
342- slowing down the search. Default: `0.2`. _Starting from 0.4.11, VectorCode will
343- use treesitter to parse languages that it can automatically detect. It uses
344- pygments to guess the language from filename, and tree-sitter-language-pack to
345- fetch the correct parser. overlap_ratio has no effects when treesitter works.
346- If VectorCode fails to find an appropriate parser, it’ll fallback to the
347- legacy naive parser, in which case overlap_ratio works exactly in the same way
348- as before;_ - `query_multiplier` integer, when you use the `query` command to
349- retrieve `n` documents, VectorCode will check `n * query_multiplier` chunks and
350- return at most `n` documents. A larger value of `query_multiplier` guarantees
351- the return of `n` documents, but with the risk of including too many
352- less-relevant chunks that may affect the document selection. Default: `-1` (any
353- negative value means selecting documents based on all indexed chunks); -
354- `reranker` string, the reranking method to use. Currently supports
355- `CrossEncoderReranker` (default, using sentence-transformers cross-encoder
323+ `db_path` string, path to the local persistent database. If you didn’t set up a
324+ standalone Chromadb server, this is where the files for your database will be
325+ stored. Default: `~/.local/share/vectorcode/chromadb/`; - `db_log_path` string,
326+ path to the _directory_ where the built-in chromadb server will write the log
327+ to. Default: `~/.local/share/vectorcode/`; - `chunk_size` integer, the maximum
328+ number of characters per chunk. A larger value reduces the number of items in
329+ the database, and hence accelerates the search, but at the cost of potentially
330+ truncated data and lost information. Default: `2500`. To disable chunking, set
331+ it to a negative number; - `overlap_ratio` float between 0 and 1, the ratio of
332+ overlapping/shared content between 2 adjacent chunks. A larger ratio improves
333+ the coherence of chunks, but at the cost of more entries in the
334+ database and hence slower searches. Default: `0.2`. _Starting from
335+ 0.4.11, VectorCode will use treesitter to parse languages that it can
336+ automatically detect. It uses pygments to guess the language from the filename, and
337+ tree-sitter-language-pack to fetch the correct parser. overlap_ratio has no
338+ effect when treesitter works. If VectorCode fails to find an appropriate
339+ parser, it’ll fall back to the legacy naive parser, in which case
340+ overlap_ratio works exactly in the same way as before;_ -
341+ `query_multiplier` integer, when you use the `query` command to retrieve `n`
342+ documents, VectorCode will check `n * query_multiplier` chunks and return at
343+ most `n` documents. A larger value of `query_multiplier` makes it more likely
344+ that `n` documents are returned, but increases the risk of including too many
345+ less-relevant chunks that may affect the document selection. Default: `-1` (any negative value means
346+ selecting documents based on all indexed chunks); - `reranker` string, the
347+ reranking method to use. Currently supports `CrossEncoderReranker` (default,
348+ using sentence-transformers cross-encoder
356349<https://sbert.net/docs/package_reference/cross_encoder/cross_encoder.html>)
357350and `NaiveReranker` (sort chunks by the "distance" between the embedding
358351vectors); - `reranker_params` dictionary, similar to `embedding_params`. The
@@ -361,17 +354,16 @@ these are the options passed to the `CrossEncoder`
361354<https://sbert.net/docs/package_reference/cross_encoder/cross_encoder.html#id1>
362355class. For example, if you want to use a non-default model, you can use the
363356following: `json { "reranker_params": { "model_name_or_path": "your_model_here"
364- } }`; - `db_settings` dictionary, works in a similar way to `embedding_params`,
357+ } }` - `db_settings` dictionary, works in a similar way to `embedding_params`,
365358but for Chromadb client settings so that you can configure authentication for
366359remote Chromadb <https://docs.trychroma.com/production/administration/auth>; -
367360`hnsw` a dictionary of hnsw settings
368361<https://cookbook.chromadb.dev/core/configuration/#hnsw-configuration> that may
369362improve the query performance or avoid runtime errors during queries. **It’s
370363recommended to re-vectorise the collection after modifying these options,
371364because some of the options can only be set during collection creation.**
372- Example: `json5 // the following is the default value. "hnsw": { "hnsw:M": 64,
373- }` - `filetype_map` `dict[str, list[str]]`, a dictionary where keys are language
374- name
365+ Example (and default): `json5 "hnsw": { "hnsw:M": 64, }` -
366+ `filetype_map` `dict[str, list[str]]`, a dictionary where keys are language names
375367<https://github.com/Goldziher/tree-sitter-language-pack?tab=readme-ov-file#available-languages>
376368and values are lists of Python regex patterns
377369<https://docs.python.org/3/library/re.html> that will match file extensions.
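To make `filetype_map`, `chunk_size` and `overlap_ratio` more concrete, here is a hypothetical Python sketch. It is not VectorCode's code: VectorCode detects languages with pygments and tree-sitter-language-pack as described above, and its legacy naive parser may differ in details. Two assumptions are flagged in the comments: the regex patterns are matched against the file extension without its leading dot, and a zero `chunk_size` is treated like a negative one. The chunker advances by roughly `chunk_size * (1 - overlap_ratio)` characters, so adjacent chunks share about `chunk_size * overlap_ratio` characters.

```python
"""Hypothetical sketch: filetype_map lookup and naive overlapping chunking."""
from __future__ import annotations

import re
from pathlib import PurePath


def language_for(filename: str, filetype_map: dict[str, list[str]]) -> str | None:
    """Return the first language whose regex matches the file extension."""
    # assumption: patterns are matched against the extension without the dot
    ext = PurePath(filename).suffix.lstrip(".")
    for language, patterns in filetype_map.items():
        if any(re.search(pattern, ext) for pattern in patterns):
            return language
    return None


def naive_chunks(text: str, chunk_size: int, overlap_ratio: float) -> list[str]:
    """Split text into chunk_size-character pieces that overlap by overlap_ratio."""
    if chunk_size <= 0:
        # a negative chunk_size disables chunking (zero handled the same: assumption)
        return [text]
    if not 0 <= overlap_ratio < 1:
        raise ValueError("overlap_ratio must be in [0, 1)")
    step = max(1, round(chunk_size * (1 - overlap_ratio)))
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks
```

For example, with `chunk_size=4` and `overlap_ratio=0.5`, the text `abcdefghij` becomes `abcd`, `cdef`, `efgh`, `ghij`: each chunk repeats the last two characters of the previous one.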
@@ -566,7 +558,7 @@ the `VECTORCODE_LOG_LEVEL` variable to one of `ERROR`, `WARN` (`WARNING`),
566558`INFO` or `DEBUG`. For the CLI that you interact with in your shell, this will
567559output logs to `STDERR` and write a log file to
568560`~/.local/share/vectorcode/logs/`. For LSP and MCP servers, because `STDIO` is
569- used for the RPC, only the log file will be written.
561+ used for the RPC, the logs will only be written to the log file, not to `STDERR`.
570562
571563For example:
572564
@@ -575,6 +567,9 @@ For example:
575567<
576568
577569
570+ Depending on the MCP/LSP client implementation, you may need to set the
571+ environment variables in the client’s server configuration so that VectorCode receives them.
572+
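One way for a client to make sure the variable reaches the server is to pass an explicit environment when it spawns the process. The sketch below is hypothetical and not shipped with VectorCode: `server_env` and `launch` are made-up names, and the command you pass to `launch` is whatever your MCP/LSP client is configured to run. The accepted values are the ones listed above.

```python
"""Hypothetical client-side sketch: start a server with VECTORCODE_LOG_LEVEL set."""
from __future__ import annotations

import os
import subprocess

LOG_LEVELS = {"ERROR", "WARN", "WARNING", "INFO", "DEBUG"}


def server_env(level: str, base: dict[str, str] | None = None) -> dict[str, str]:
    """Copy the base environment (default: os.environ) and add the log level."""
    if level not in LOG_LEVELS:
        raise ValueError(f"unsupported VECTORCODE_LOG_LEVEL: {level!r}")
    env = dict(os.environ if base is None else base)
    env["VECTORCODE_LOG_LEVEL"] = level
    return env


def launch(cmd: list[str], level: str = "DEBUG") -> subprocess.Popen:
    # STDIO carries the RPC traffic, so stdin/stdout are piped rather than inherited;
    # the logs themselves go to the log file under ~/.local/share/vectorcode/logs/
    return subprocess.Popen(
        cmd,
        env=server_env(level),
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
```

Copying the base environment instead of replacing it keeps `PATH` and similar variables intact, which the server process usually still needs.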
578573SHELL COMPLETION *VectorCode-cli-vectorcode-command-line-tool-shell-completion*
579574
580575VectorCode supports shell completion for bash/zsh/tcsh. You can use `vectorcode
@@ -602,9 +597,9 @@ following options in the JSON config file:
602597For Intel users, sentence transformer <https://www.sbert.net/index.html >
603598supports OpenVINO
604599<https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/overview.html>
605- backend for supported GPU. Run `pipx install vectorcode[intel]` which will
606- bundle the relevant libraries when you install VectorCode. After that, you will
607- need to configure `SentenceTransformer` to use `openvino` backend. In your
600+ backend for supported GPUs. Run `uv tool install vectorcode[intel]`, which will bundle
601+ the relevant libraries when you install VectorCode. After that, you will need
602+ to configure `SentenceTransformer` to use `openvino` backend. In your
608603`config.json`, set the `backend` key in `embedding_params` to `"openvino"`:
609604
610605>json