SAI is a local chatbot capable of answering queries about your documents, generating proposals and more.
The app expects to find models and user settings in the following directories:
```
$HOME/SecureAI/
  data/
    embeddings/
    llms/
  users/
    anonymous/          # or a username
      databases/
        secure-ai.db    # a SQLite database containing chats and proposals
```

The correct directory structure is part of the project, so you should get it automatically when you `git clone` or `git pull`. Alternatively, you can download it from Google Drive here: https://drive.google.com/drive/u/4/folders/11rt0F2DNn_VjBhNewOOVe4g6EDzJnUp4
The project does not contain the models themselves, only the directories and the system prompt templates. You need to download the models separately (from the Google Drive above or from Hugging Face) and put them in the correct directory structure.
- Embeddings models go in `./data/embeddings/EmbeddingModelName/`
- Binary `.gguf` LLM model files go in `./data/llms/`. GGUF is a file format used by llama.cpp.
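Before launching, it can help to confirm that the layout above is in place. `missing_model_dirs` below is a hypothetical sanity check, not part of the project:

```python
from pathlib import Path

def missing_model_dirs(root: str = ".") -> list[str]:
    """Return the expected model directories (relative to root) that don't exist."""
    expected = ["data/embeddings", "data/llms"]
    return [d for d in expected if not (Path(root) / d).is_dir()]
```

An empty result means both model directories are present under the project root.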
The configuration file is located at `./data/config.yaml`. It is read by the software on startup and sets the paths to the models. The format is:
```yaml
trial_license: 'gAAAAABl9GNFdzETJ8jOjRcFUv950SUaFW-Cc7UhKYQznIvo2RPPCf7wlyXFTgGceKcUgZMfTRpCMTXXNIfn2h9mUjIClMnDzQ=='
app_secret: 'eexhVgr6W2LxxoMhTwJgJRAVsWoeOM0ntXP66ZehyQc='
salt: '8pn-8WZZ7PoMB1EeCJvRZDc5wdjZ8TAUN83OEZBv3ic='
base_url: http://localhost:8501
auth_enabled: False
enable_saved_databases: True
#demo_banner: Please contact us at <a href="mailto:admin@secureai.us">admin@secureai.us</a> to deploy Secure AI on your computer or corporate network, for secure use with your data.
inactivity_timeout: 60
cache_size: 20 # in GB
port: 8501
llama_server_port: 8080
history_size: 3 # number of turns (pairs of user/assistant messages) to keep in history
# The templates are defined in prompts.yaml
embeddings:
  chunk_size: 3000 # characters (match to embeddings model; ~4 chars/token)
  chunk_overlap: 2000 # characters
  embeddings_path: embeddings
  embeddings_type: huggingface
  embeddings_model_name: BAAI/bge-base-en-v1.5 # max 512 tokens
  gpu_device: mps # one of: cpu, cuda, or mps (Mac Metal); used for embeddings
  retriever_type: vector
  similarity_k: 5
  similarity_threshold: 0.25
```
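The `chunk_size` and `chunk_overlap` values are character counts that define a sliding window over each document. A minimal sketch of that arithmetic, assuming a simple character-window chunker (`chunk_text` is a hypothetical helper, not the app's actual implementation):

```python
def chunk_text(text: str, chunk_size: int = 3000, chunk_overlap: int = 2000) -> list[str]:
    """Split text into overlapping character windows (illustrative only)."""
    step = chunk_size - chunk_overlap  # 1000 new characters per chunk
    end = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, end, step)]

chunks = chunk_text("a" * 5000)
# 3 chunks of 3000 characters; consecutive chunks share 2000 characters
```

Note that at the ~4 characters/token rule of thumb, 3000 characters is roughly 750 tokens, above the 512-token limit noted for bge-base-en-v1.5, so the embeddings model may truncate each chunk.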
```yaml
llms:
  - title: llama-3-8b-instruct
    llm_type: ollama
    llama_server_slots: 4
    n_gpu_layers: 100
    min_p: 0.05
    temperature: 0
    n_ctx: 8192
    llm_model: llama3:8b-instruct-q8_0
```

You can keep local settings, which will override those in config.yaml, in a file called config.local.yaml. For example:
```yaml
llm_few_shot_template:
  - role: system
    content: |
      You are a very powerful assistant, but don't know current events.
llms:
  - title: hermes-2-theta-llama-3-8b
    llm_type: ollama
    llama_server_slots: 4
    n_gpu_layers: 100
    min_p: 0.05
    temperature: 0
    n_ctx: 8192
    llm_model: taozhiyuai/hermes-2-theta-llama-3:8b-q8_0
```

The settings above would override the system prompt and the default LLM.
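As a sketch of these override semantics (the app's actual merge logic is not shown here; this assumes nested mappings merge recursively while lists such as `llms` are replaced wholesale, and `deep_merge` is a hypothetical helper):

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Return base with override's keys applied; nested dicts merge,
    everything else (including lists such as `llms`) is replaced wholesale."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# Illustrative fragments of the parsed YAML from the examples above
config = {"history_size": 3, "llms": [{"title": "llama-3-8b-instruct"}]}
local = {"llms": [{"title": "hermes-2-theta-llama-3-8b"}]}
merged = deep_merge(config, local)
# merged keeps history_size from config.yaml but takes llms from config.local.yaml
```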
All prompts reside in prompts.yaml.
Any API keys or otherwise "secret" settings should go in secrets.yaml.
The trial_license configuration setting can be generated with the bin/generate_license.py script. It takes one argument, the number of days until the license expires. For example, a thirty-day license:

```shell
$ python bin/generate_license.py 30
Generate trial timestamp: Thu Mar 14 15:19:18 2024
b'gAAAAABly9zWipV3RCQnyhahy8TNCzBlTfo9PTkqXYMCGkhctDJ5zuZZeAfOQvSD4HbIdA1BeFYkSQywJuLnHtkTUf6GNTjw0w=='
For a quick-start build, just run `make`. This will rebuild Ollama in the vendor directory as well as rebuild the client in prod mode:

```shell
make clean && make -j
```

This project uses Poetry for dependency management.
```shell
# Create Python environment
$ poetry install

# Launch poetry shell
$ poetry shell
```

If you just need to build the client in production mode to test, you can run the following:
```shell
(cd client && npm ci && npm run build)
```

The built client is served when you load localhost:8501 in the browser. (Note: client code changes are not automatically built or served.)
To be able to develop the client and automatically see code changes reloaded in the browser, run the Vite dev server:

```shell
cd client
npm install
npm run dev
cd ..
```

In this mode you use localhost:5173. The backend will be proxied correctly to port 8501 (the backend server).
To run the backend:

```shell
$ python -m secure_ai
```

The following mode will detect code changes and restart the dev server accordingly:

```shell
$ adev runserver --port 8501 --app-factory app_dev secure_ai
```

There are several options for installing Ollama. When you run `make`, a version of Ollama is compiled to vendor/ollama.
You can add this directory to your PATH, and the `ollama` command will then be available.
Alternatively, you can use one of the Ollama installation methods found at https://github.com/ollama/ollama?tab=readme-ov-file#ollama. When using one of the installers, beware that a service may be set up that starts Ollama in the background. You must disable it if that's the case. On Linux systemd-based systems, you can run the following to disable the ollama service:

```shell
sudo systemctl disable --now ollama
```

To use models in Ollama, you must first "pull" the model so Ollama has a local copy.
Initially, Ollama will not have any models:

```shell
$ ollama list
NAME    ID    SIZE    MODIFIED
```

For example, if config.yaml contained the following:
```yaml
llms:
  - title: llama3:8b-instruct-q8_0
    llm_type: ollama
    llama_server_slots: 4
    n_gpu_layers: 100
    min_p: 0.05
    temperature: 0
    n_ctx: 8192
    llm_model: llama3:8b-instruct-q8_0
```

I would need to run SAI (which will start Ollama in the background), then use the following command to pull:
```shell
$ ollama pull llama3:8b-instruct-q8_0
pulling manifest
pulling 11a9680b0168... 100% ▕██████████████████████████████████████████████████████████████████████████████████▏ 8.5 GB
pulling 4fa551d4f938... 100% ▕██████████████████████████████████████████████████████████████████████████████████▏ 12 KB
pulling 8ab4849b038c... 100% ▕██████████████████████████████████████████████████████████████████████████████████▏ 254 B
pulling 577073ffcc6c... 100% ▕██████████████████████████████████████████████████████████████████████████████████▏ 110 B
pulling cdf310f424e6... 100% ▕██████████████████████████████████████████████████████████████████████████████████▏ 485 B
verifying sha256 digest
writing manifest
removing any unused layers
success
```

After the pull, we can see that the model is now available:
```shell
$ ollama list
NAME                     ID            SIZE    MODIFIED
llama3:8b-instruct-q8_0  1b8e49cece7f  8.5 GB  About a minute ago
```
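If config.yaml lists several models, each needs its own pull. A sketch that scans the config for `llm_model` entries and prints the matching pull commands (`models_from_config` is a hypothetical helper using a naive line scan rather than a YAML parser; run the printed commands while SAI, and hence Ollama, is running):

```python
def models_from_config(text: str) -> list[str]:
    """Collect unique llm_model values from config.yaml text (naive line scan)."""
    seen: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("llm_model:"):
            model = line.split(":", 1)[1].strip()
            if model not in seen:
                seen.append(model)
    return seen

# In practice: Path("data/config.yaml").read_text()
config_text = """
llms:
  - title: llama3:8b-instruct-q8_0
    llm_model: llama3:8b-instruct-q8_0
"""

for model in models_from_config(config_text):
    print("ollama pull", model)
```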