Commit fc4391b

docs: Documentation updates (#207)
* feat(docs): Improve documentation and update asciicast
* Auto generate docs
* feat(readme): Update asciicast script in README
* docs: Improve documentation and update asciicast
* Auto generate docs

---------

Co-authored-by: Davidyz <Davidyz@users.noreply.github.com>
1 parent a03999a commit fc4391b

6 files changed

+221
-223
lines changed

README.md

Lines changed: 9 additions & 13 deletions
@@ -7,18 +7,13 @@
77
VectorCode is a code repository indexing tool. It helps you build better prompt
88
for your coding LLMs by indexing and providing information about the code
99
repository you're working on. This repository also contains the corresponding
10-
neovim plugin because that's what I used to write this tool.
10+
neovim plugin that provides a set of APIs for you to build or enhance AI plugins,
11+
and integrations for some of the popular plugins.
1112

1213
> [!NOTE]
1314
> This project is in beta quality and is undergoing rapid iterations.
1415
> I know there are plenty of rooms for improvements, and any help is welcomed.
1516
16-
> [!NOTE]
17-
> [Chromadb](https://www.trychroma.com/), the vector database backend behind
18-
> this project, supports multiple embedding engines. I developed this tool using
19-
> SentenceTransformer, but if you encounter any issues with a different embedding
20-
> function, please open an issue (or even better, a pull request :D).
21-
2217
<!-- mtoc-start -->
2318

2419
* [Why VectorCode?](#why-vectorcode)
@@ -37,22 +32,23 @@ releases. Their capabilities on these projects are quite limited. With
3732
VectorCode, you can easily (and programmatically) inject task-relevant context
3833
from the project into the prompt. This significantly improves the quality of the
3934
model output and reduce hallucination.
40-
![](./images/codecompanion_chat.png)
35+
36+
[![asciicast](https://asciinema.org/a/8WP8QJHNAR9lEllZSSx3poLPD.svg)](https://asciinema.org/a/8WP8QJHNAR9lEllZSSx3poLPD?t=3)
4137

4238
## Documentation
4339

4440
> [!NOTE]
45-
> The documentation on the `main` branch reflects the code on the latest commit
46-
> (apologies if I forget to update the docs, but this will be what I aim for). To
47-
> check for the documentation for the version you're using, you can [check out
41+
> The documentation on the `main` branch reflects the code on the latest commit.
42+
> To check for the documentation for the version you're using, you can [check out
4843
> the corresponding tags](https://github.com/Davidyz/VectorCode/tags).
4944
5045
- For the setup and usage of the command-line tool, see [the CLI documentation](./docs/cli.md);
5146
- For neovim users, after you've gone through the CLI documentation, please refer to
5247
[the neovim plugin documentation](./docs/neovim.md) for further instructions.
5348
- Additional resources:
5449
- the [wiki](https://github.com/Davidyz/VectorCode/wiki) for extra tricks and
55-
tips that will help you get the most out of VectorCode;
50+
tips that will help you get the most out of VectorCode, as well as
51+
instructions to setup VectorCode to work with some other neovim plugins;
5652
- the [discussions](https://github.com/Davidyz/VectorCode/discussions) where
5753
you can ask general questions and share your cool usages about VectorCode.
5854

@@ -98,7 +94,7 @@ This project follows an adapted semantic versioning:
9894
- [ ] ability to view and delete files in a collection (atm you can only `drop`
9995
and `vectorise` again);
10096
- [x] joint search (kinda, using codecompanion.nvim/MCP);
101-
- [ ] Nix support (#144);
97+
- [x] Nix support (unofficial packages [here](https://search.nixos.org/packages?channel=unstable&from=0&size=50&sort=relevance&type=packages&query=vectorcode));
10298
- [ ] Query rewriting (#124).
10399

104100

doc/VectorCode-cli.txt

Lines changed: 53 additions & 58 deletions
@@ -121,8 +121,7 @@ significantly reduce the IO overhead and avoid potential race condition.
121121

122122

123123
If you’re setting up a standalone ChromaDB server, I recommend sticking to
124-
v0.6.3. ChromaDB recently released v1.0.0, which may not work with VectorCode.
125-
I’m testing with v1.0.0 and will publish a new release when it’s ready.
124+
v0.6.3, because VectorCode is not ready for the upgrade to ChromaDB 1.0 yet.
126125

127126
FOR WINDOWS USERS ~
128127

@@ -146,6 +145,8 @@ NIX ~
146145

147146
A community-maintained Nix package is available here
148147
<https://search.nixos.org/packages?channel=unstable&from=0&size=50&sort=relevance&type=packages&query=vectorcode>.
148+
If you’re using nix to install a standalone Chromadb server, make sure to
149+
stick to 0.6.3 <https://github.com/NixOS/nixpkgs/pull/412528>.
149150

150151

151152
GETTING STARTED *VectorCode-cli-vectorcode-command-line-tool-getting-started*
@@ -212,7 +213,7 @@ REFRESHING EMBEDDINGS ~
212213

213214
To maintain the accuracy of the vector search, it’s important to keep your
214215
embeddings up-to-date. You can simply run the `vectorise` subcommand on a file
215-
to refresh the embedding for a particular file, and the CLI provides a
216+
to refresh the embedding for that file. Apart from that, the CLI provides a
216217
`vectorcode update` subcommand, which updates the embeddings for all files that
217218
are currently indexed by VectorCode for the current project.
218219

@@ -241,8 +242,8 @@ For each project, VectorCode creates a collection (similar to tables in
241242
traditional databases) and puts the code embeddings in the corresponding
242243
collection. In the root directory of a project, you may run `vectorcode init`.
243244
This will initialise the repository with a subdirectory
244-
`project_root/.vectorcode/`. This will mark this directory a _project root_, a
245-
concept that will later be used to construct the collection. You may put a
245+
`project_root/.vectorcode/`. This will mark this directory as a _project root_,
246+
a concept that will later be used to construct the collection. You may put a
246247
`config.json` file in `project_root/.vectorcode`. This file may be used to
247248
store project-specific settings such as embedding functions and database entry
248249
point (more on this later). If you already have a global configuration file at
@@ -272,31 +273,22 @@ hooks. The `init` subcommand provides a `--hooks` flag which helps you manage
272273
hooks when working with a git repository. You can put some custom hooks in
273274
`~/.config/vectorcode/hooks/` and the `vectorcode init --hooks` command will
274275
pick them up and append them to your existing hooks, or create new hook scripts
275-
if they don’t exist yet. The hook files should be named the same as they
276-
would be under the `.git/hooks` directory. For example, a pre-commit hook would
277-
be named `~/.config/vectorcode/hooks/pre-commit`.
276+
if they don’t exist yet. The custom hook files should be named the same as
277+
they would be under the `.git/hooks` directory. For example, a pre-commit hook
278+
would be named `~/.config/vectorcode/hooks/pre-commit`.
278279

279280
By default, there are 2 pre-defined hooks:
280281

281-
>bash
282-
# pre-commit hook that vectorise changed files before you commit.
283-
diff_files=$(git diff --cached --name-only)
284-
[ -z "$diff_files" ] || vectorcode vectorise $diff_files
285-
<
282+
1. A pre-commit hook that vectorises the modified files.
283+
2. A post-checkout hook that:- vectorises the full repository if it’s an initial commit/clone and a
284+
`vectorcode.include` spec is available (either locally in the project or
285+
globally);
286+
- vectorises the files changed by the checkout.
287+
286288

287-
>bash
288-
# post-checkout hook that vectorise changed files when you checkout to a
289-
# different branch/tag/commit
290-
files=$(git diff --name-only "$1" "$2")
291-
[ -z "$files" ] || vectorcode vectorise $files
292-
<
293289

294-
When you run `vectorcode init --hooks` in a git repo, these 2 hooks will be
295-
added to your `.git/hooks/`. Hooks that are managed by VectorCode will be
296-
wrapped by `# VECTORCODE_HOOK_START` and `# VECTORCODE_HOOK_END` comment lines.
297-
They help VectorCode determine whether hooks have been added, so don’t delete
298-
the markers unless you know what you’re doing. To remove the hooks, simply
299-
delete the lines wrapped by these 2 comment strings.
290+
Both hooks will only be triggered on repositories that have a `.vectorcode`
291+
directory in them.
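
The pre-commit behaviour described in this hunk can be sketched in Python as follows. This is only an illustration, not the hook that VectorCode installs: it assumes the hook amounts to passing the staged files to `vectorcode vectorise` (as in the shell snippet removed by this hunk) and doing nothing when the repository has no `.vectorcode` directory. The function names are hypothetical.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def staged_files(repo: Path) -> list[str]:
    """Return the paths staged for commit, as `git diff --cached --name-only` reports them."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        cwd=repo, capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def vectorise_command(repo: Path, files: list[str]) -> list[str] | None:
    """Build the `vectorcode vectorise` call, or None when the hook should do nothing."""
    # Assumption: a repository counts as initialised when `.vectorcode/` exists.
    if not files or not (repo / ".vectorcode").is_dir():
        return None
    return ["vectorcode", "vectorise", *files]


def run_pre_commit(repo: Path) -> None:
    """Vectorise the staged files of `repo`, if there is anything to do."""
    command = vectorise_command(repo, staged_files(repo))
    if command is not None:
        subprocess.run(command, cwd=repo, check=True)
```

Keeping the decision in `vectorise_command` separate from the `git` and `vectorcode` calls makes the skip conditions easy to check without a real repository.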
300292

301293

302294
CONFIGURING VECTORCODE ~
@@ -328,31 +320,32 @@ model_name="nomic-embed-text")`. Default: `{}`; - `db_url`string, the url that
328320
points to the Chromadb server. VectorCode will start an HTTP server for
329321
Chromadb at a randomly picked free port on `localhost` if your configured
330322
`http://host:port` is not accessible. Default: `http://127.0.0.1:8000`; -
331-
`db_path`string, Path to local persistent database. This is where the files for
332-
your database will be stored. Default: `~/.local/share/vectorcode/chromadb/`; -
333-
`db_log_path`string, path to the _directory_ where the built-in chromadb server
334-
will write the log to. Default: `~/.local/share/vectorcode/`; -
335-
`chunk_size`integer, the maximum number of characters per chunk. A larger value
336-
reduces the number of items in the database, and hence accelerates the search,
337-
but at the cost of potentially truncated data and lost information. Default:
338-
`2500`. To disable chunking, set it to a negative number; -
339-
`overlap_ratio`float between 0 and 1, the ratio of overlapping/shared content
340-
between 2 adjacent chunks. A larger ratio improves the coherences of chunks,
341-
but at the cost of increasing number of entries in the database and hence
342-
slowing down the search. Default: `0.2`. _Starting from 0.4.11, VectorCode will
343-
use treesitter to parse languages that it can automatically detect. It uses
344-
pygments to guess the language from filename, and tree-sitter-language-pack to
345-
fetch the correct parser. overlap_ratio has no effects when treesitter works.
346-
If VectorCode fails to find an appropriate parser, it’ll fallback to the
347-
legacy naive parser, in which case overlap_ratio works exactly in the same way
348-
as before;_ - `query_multiplier`integer, when you use the `query` command to
349-
retrieve `n` documents, VectorCode will check `n * query_multiplier` chunks and
350-
return at most `n` documents. A larger value of `query_multiplier` guarantees
351-
the return of `n` documents, but with the risk of including too many
352-
less-relevant chunks that may affect the document selection. Default: `-1` (any
353-
negative value means selecting documents based on all indexed chunks); -
354-
`reranker`string, the reranking method to use. Currently supports
355-
`CrossEncoderReranker` (default, using sentence-transformers cross-encoder
323+
`db_path`string, Path to local persistent database. If you didn’t set up a
324+
standalone Chromadb server, this is where the files for your database will be
325+
stored. Default: `~/.local/share/vectorcode/chromadb/`; - `db_log_path`string,
326+
path to the _directory_ where the built-in chromadb server will write the log
327+
to. Default: `~/.local/share/vectorcode/`; - `chunk_size`integer, the maximum
328+
number of characters per chunk. A larger value reduces the number of items in
329+
the database, and hence accelerates the search, but at the cost of potentially
330+
truncated data and lost information. Default: `2500`. To disable chunking, set
331+
it to a negative number; - `overlap_ratio`float between 0 and 1, the ratio of
332+
overlapping/shared content between 2 adjacent chunks. A larger ratio improves
333+
the coherence of chunks, but at the cost of increasing number of entries in the
334+
database and hence slowing down the search. Default: `0.2`. _Starting from
335+
0.4.11, VectorCode will use treesitter to parse languages that it can
336+
automatically detect. It uses pygments to guess the language from filename, and
337+
tree-sitter-language-pack to fetch the correct parser. overlap_ratio has no
338+
effects when treesitter works. If VectorCode fails to find an appropriate
339+
parser, it’ll fallback to the legacy naive parser, in which case
340+
overlap_ratio works exactly in the same way as before;_ -
341+
`query_multiplier`integer, when you use the `query` command to retrieve `n`
342+
documents, VectorCode will check `n * query_multiplier` chunks and return at
343+
most `n` documents. A larger value of `query_multiplier` guarantees the return
344+
of `n` documents, but with the risk of including too many less-relevant chunks
345+
that may affect the document selection. Default: `-1` (any negative value means
346+
selecting documents based on all indexed chunks); - `reranker`string, the
347+
reranking method to use. Currently supports `CrossEncoderReranker` (default,
348+
using sentence-transformers cross-encoder
356349
<https://sbert.net/docs/package_reference/cross_encoder/cross_encoder.html> )
357350
and `NaiveReranker` (sort chunks by the "distance" between the embedding
358351
vectors); - `reranker_params`dictionary, similar to `embedding_params`. The
@@ -361,17 +354,16 @@ these are the options passed to the `CrossEncoder`
361354
<https://sbert.net/docs/package_reference/cross_encoder/cross_encoder.html#id1>
362355
class. For example, if you want to use a non-default model, you can use the
363356
following: `json { "reranker_params": { "model_name_or_path": "your_model_here"
364-
} }` ; - `db_settings`dictionary, works in a similar way to `embedding_params`,
357+
} }` - `db_settings`dictionary, works in a similar way to `embedding_params`,
365358
but for Chromadb client settings so that you can configure authentication for
366359
remote Chromadb <https://docs.trychroma.com/production/administration/auth>; -
367360
`hnsw`a dictionary of hnsw settings
368361
<https://cookbook.chromadb.dev/core/configuration/#hnsw-configuration> that may
369362
improve the query performances or avoid runtime errors during queries. **It’s
370363
recommended to re-vectorise the collection after modifying these options,
371364
because some of the options can only be set during collection creation.**
372-
Example: `json5 // the following is the default value. "hnsw": { "hnsw:M": 64,
373-
}` - `filetype_map``dict[str, list[str]]`, a dictionary where keys are language
374-
name
365+
Example (and default): `json5 "hnsw": { "hnsw:M": 64, }` -
366+
`filetype_map``dict[str, list[str]]`, a dictionary where keys are language name
375367
<https://github.com/Goldziher/tree-sitter-language-pack?tab=readme-ov-file#available-languages>
376368
and values are lists of Python regex patterns
377369
<https://docs.python.org/3/library/re.html> that will match file extensions.
@@ -566,7 +558,7 @@ the `VECTORCODE_LOG_LEVEL` variable to one of `ERROR`, `WARN` (`WARNING`),
566558
`INFO` or `DEBUG`. For the CLI that you interact with in your shell, this will
567559
output logs to `STDERR` and write a log file to
568560
`~/.local/share/vectorcode/logs/`. For LSP and MCP servers, because `STDIO` is
569-
used for the RPC, only the log file will be written.
561+
used for the RPC, the logs will only be written to the log file, not `STDERR`.
570562

571563
For example:
572564

@@ -575,6 +567,9 @@ For example:
575567
<
576568

577569

570+
Depending on the MCP/LSP client implementation, you may need to take extra
571+
steps to make sure the environment variables are captured by VectorCode.
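
One way to make sure the variable reaches the server is to set it explicitly in the environment of the process you spawn. The Python sketch below shows the idea with a stand-in child process that only echoes the variable. How a real MCP/LSP client launches VectorCode is client-specific, and the helper names here are hypothetical.

```python
from __future__ import annotations

import os
import subprocess
import sys

# The levels listed in the documentation above.
VALID_LEVELS = {"ERROR", "WARN", "WARNING", "INFO", "DEBUG"}


def env_with_log_level(level: str) -> dict[str, str]:
    """Copy the current environment and set VECTORCODE_LOG_LEVEL in the copy."""
    if level not in VALID_LEVELS:
        raise ValueError(f"unsupported log level: {level}")
    return {**os.environ, "VECTORCODE_LOG_LEVEL": level}


def spawn_with_log_level(command: list[str], level: str) -> subprocess.CompletedProcess:
    """Run `command` with the variable set, regardless of what the parent inherited."""
    return subprocess.run(
        command, env=env_with_log_level(level),
        capture_output=True, text=True, check=True,
    )


if __name__ == "__main__":
    # Stand-in child: it prints the variable it received instead of starting a server.
    child = [sys.executable, "-c", "import os; print(os.environ['VECTORCODE_LOG_LEVEL'])"]
    print(spawn_with_log_level(child, "DEBUG").stdout.strip())
```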
572+
578573
SHELL COMPLETION*VectorCode-cli-vectorcode-command-line-tool-shell-completion*
579574

580575
VectorCode supports shell completion for bash/zsh/tcsh. You can use `vectorcode
@@ -602,9 +597,9 @@ following options in the JSON config file:
602597
For Intel users, sentence transformer <https://www.sbert.net/index.html>
603598
supports OpenVINO
604599
<https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/overview.html>
605-
backend for supported GPU. Run `pipx install vectorcode[intel]` which will
606-
bundle the relevant libraries when you install VectorCode. After that, you will
607-
need to configure `SentenceTransformer` to use `openvino` backend. In your
600+
backend for supported GPU. Run `uv install vectorcode[intel]` which will bundle
601+
the relevant libraries when you install VectorCode. After that, you will need
602+
to configure `SentenceTransformer` to use `openvino` backend. In your
608603
`config.json`, set `backend` key in `embedding_params` to `"openvino"`
609604

610605
>json
