8 changes: 4 additions & 4 deletions architectures/decentralized/client_versioning.md
@@ -8,19 +8,19 @@ There are two ways to specify the client version for a run:
## Docker RepoId hash

Once the client docker image is uploaded to DockerHub, a `RepoId` hash is associated with that image. This string is what should be used for
setting the client version in a run, toguether with the "sha256" part. For example, "sha256:ca978112ca1bbdcafac231b39a23dc4da786eff8147c4e72b9807785afee48bb".
setting the client version in a run, together with the "sha256" part. For example, "sha256:ca978112ca1bbdcafac231b39a23dc4da786eff8147c4e72b9807785afee48bb".
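
As a rough illustration of the expected format, the check below tests that a version string has the `sha256:` prefix followed by 64 lowercase hexadecimal characters, as in the example above. `is_sha256_digest` is a hypothetical helper written for this sketch; it is not part of Psyche or the run manager.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Psyche): succeeds only if the argument
# looks like "sha256:" followed by exactly 64 lowercase hex characters.
is_sha256_digest() {
    local pattern='^sha256:[0-9a-f]{64}$'
    [[ "$1" =~ $pattern ]]
}

version="sha256:ca978112ca1bbdcafac231b39a23dc4da786eff8147c4e72b9807785afee48bb"
if is_sha256_digest "$version"; then
    echo "format looks right: $version"
else
    echo "unexpected format: $version" >&2
fi
```

A check like this only catches a missing prefix or a truncated hash; it does not confirm that the image exists on DockerHub.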

## Docker version tag

For setting a docker version tag, the image should be built with that tag set beforehand. This should be done in the `docker.nix` file, changing the `tag` field in the `docker-psyche-solana-client` docker package.

## Updating client docker version for a run

Once the new docker image uploaded to DockerHub and some version selected, you can update the client version required
Once the new docker image is uploaded to DockerHub and a version is selected, you can update the client version required
for a particular run with the following command:

[!] You should have the run owner solana key to successfully run this command
[!] The run must be paused beforehand to do the client version update
[!] You should have the run owner Solana key to successfully run this command.
[!] The run must be paused beforehand to do the client version update.

```bash
cargo run --release --bin run-manager -- \
16 changes: 8 additions & 8 deletions docker/README.md
@@ -1,12 +1,12 @@
# Docker Psyche

This folder contains some of the docker related files and scripts, mostly entrypoint scripts for the docker containers.
All the used docker images are created via nix docker-tools and can be found in the `packages.nix` file inside the `nix` directory.
All the docker images used are created via nix docker-tools and can be found in the `packages.nix` file inside the `nix` directory.

The purpose of using docker is two-fold:

- compartmentalize psyche client to be deployed and used in testing and production environments easily.
- implementing end-to-end tests that are as close as possible to a production environment.
- compartmentalize the Psyche client to be deployed and used in testing and production environments easily.
- implement end-to-end tests that are as close as possible to a production environment.

There are three concrete use-cases for the docker containers:

@@ -22,9 +22,9 @@ There are three concrete use-cases for the docker containers:
## Psyche Solana client

The `docker-psyche-solana-client` package works as the dockerfile used to build the image for the client that would be used by
end users, in a production-like environment `psyche-solana-cient binary` already built with nix.
end users, in a production-like environment `psyche-solana-client binary` already built with nix.
The `train_entrypoint.sh` script runs as the default entrypoint for the container, which is no more than a
call to the `psyche-solana-client` binary to start training, and some logic for restart the client in case it crashes.
call to the `psyche-solana-client` binary to start training, and some logic to restart the client in case it crashes.

## Psyche Solana test client

@@ -45,7 +45,7 @@ Its main task is to spawn the `solana-test-validator` and deploy the coordinator

### Starting solana-test-validator and deploying Coordinator

If you want to running a validator in your machine, then you will need to start the `solana-test-validator`
If you want to run a validator on your machine, then you will need to start the `solana-test-validator`
binary and then deploy the coordinator program. If you have started the validator and deployed the Coordinator
in another machine, you can skip to the next section.

@@ -96,7 +96,7 @@ This can be done using:
just setup_gpu_clients <num_clients>
```

where `<num_clients>` should be replaced with the number of clients you want spawn.
where `<num_clients>` should be replaced with the number of clients you want to spawn.

As soon as you run the previous `just` command, you will be prompted with a message saying something like

@@ -158,7 +158,7 @@ go directly to the `Join training run with the dockerized Psyche client` step.
To create a run, you will need to specify the model configuration file and the wallet that will be used
to pay for the creation of the run, as well as the devnet/mainnet RPC and websocket endpoint, and the **run ID**
of the training run.
Create an environment file in `config/client/.env`, if you don't already have one. There variables that should be present are:
Create an environment file in `config/client/.env`, if you don't already have one. The variables that should be present are:

- `RPC`: The url to the Solana RPC endpoint
- `WS_RPC`: The url to the Solana websocket endpoint
22 changes: 12 additions & 10 deletions shared/data-provider/README.md
@@ -1,22 +1,24 @@
# data-provider

there's a bunch of functionality here, but the http stuff is what you probably wanna try out.
There's a bunch of functionality here, but the HTTP components are what you probably want to try out first.

## http data provider fetch example
## HTTP data provider fetch example

### Usage

#### working example
#### Working example

First, an example:

`cargo run --example http -- --file-size 40000004052 --batch-ids 103 --token-size 4 --tokenizer tests/resources/llama3_tokenizer.json urls https://storage.googleapis.com/nous-pretraining-public-us/fineweb-1pct-tokenized-llama3/000_fineweb.ds`

This will fetch some fineweb data & output it using the llama3 tokenizer!
This will fetch some FineWeb data and output it using the LLaMA 3 tokenizer.

#### Basic Command Structure
#### Basic command structure

```bash
cargo run --example http --file-size <SIZE> [--sequence-length <LENGTH>] [--token-size <SIZE>] --batch-ids <IDS> [--tokenizer <PATH>] <SUBCOMMAND>

```

The tool supports two main modes of operation: template-based URLs and explicit URL lists.
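
As a rough illustration of template mode, the sketch below expands a `{}` template into the list of URLs that the `--start`, `--end`, and `--left-pad-zeros` options describe, with the end index treated as inclusive. `expand_template` is a hypothetical helper written for this sketch and only mimics the documented behaviour; it is not part of the data-provider crate.

```bash
#!/usr/bin/env bash
# Illustrative only: mimics the URL expansion described for template mode.
# expand_template is a hypothetical helper, not part of the data-provider crate.
# Arguments: <template> <start> <end> [pad-width]
expand_template() {
    local template="$1" start="$2" end="$3" pad="${4:-0}"
    local i n
    for ((i = start; i <= end; i++)); do
        # Zero-pad the index to $pad digits (no padding when pad is 0).
        printf -v n '%0*d' "$pad" "$i"
        # Substitute the first {} placeholder with the formatted index.
        printf '%s\n' "${template/\{\}/$n}"
    done
}

# Prints one URL per line, from .../000.ds up to and including .../010.ds.
expand_template "http://example.com/{}.ds" 0 10 3
```

Omitting the fourth argument leaves the indices unpadded, which corresponds to running without `--left-pad-zeros`.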
@@ -29,7 +31,7 @@ The tool supports two main modes of operation: template-based URLs and explicit

- `--sequence-length`: Length of each sequence (default: 2048)
- `--token-size`: Size of each token in bytes (default: 2)
- `--tokenizer`: Path to tokenizer file for decoding output
- `--tokenizer`: Path to a tokenizer file for decoding output

#### Subcommands

@@ -45,11 +47,11 @@ Example:
cargo run --example http --batch-ids 1,2,3 template "http://example.com/{}.ds" --start 0 --end 10
```

this will fetch urls http://example.com/0.ds thru http://example.com/10.ds
This will fetch URLs http://example.com/0.ds through http://example.com/10.ds.

###### left pad zeros
###### Left pad zeros

`--left-pad-zeros 3` will transform fetch URLs http://example.com/000.ds thru http://example.com/010.ds
Using `--left-pad-zeros 3` will transform the fetched URLs to http://example.com/000.ds through http://example.com/010.ds.

##### URL List Mode

@@ -65,7 +67,7 @@ cargo run --example http --batch-ids 1,2,3 urls "http://example.com/1.ds" "http:

### Examples

1. Fetch data using a template with tokenizer:
1. Fetch data using a template with a tokenizer:

```bash
cargo run --example http --batch-ids 1,2,3 --tokenizer ./tokenizer.json template "http://example.com/{}.ds" --start 0 --end 10
6 changes: 3 additions & 3 deletions tools/rust-tools/run-manager/README.md
@@ -1,4 +1,4 @@
Thsi binary is a manager for Psyche client containers. It should allow users to connect to a Psyche without having to worry about client versions, as this performs the necessary checks beforehand.
This binary is a manager for Psyche client containers. It allows users to connect to the Psyche network without having to worry about client versions, as it performs the necessary checks beforehand.

One can run the run manager like this:

@@ -23,6 +23,6 @@ Where:
WALLET_PRIVATE_KEY_PATH=keys/keypair.json # Optional
```

- If `WALLET_PRIVATE_KEY_PATH` is defined it will use the specified keypair instead of the default `$HOME/.config/solana/id.json`
- If `WALLET_PRIVATE_KEY_PATH` is defined, it will use the specified keypair instead of the default `$HOME/.config/solana/id.json`

The run manager will also try to restart the client a few times in case it encounters an error. If you notice it somehow is stuck you may close the process manually via `ctrl+c` and run it again.
The run manager will also try to restart the client a few times in case it encounters an error. If you notice that it is stuck, you may close the process manually via Ctrl+C and run it again.