Secure AI

SAI is a local chatbot capable of answering queries about your documents, generating proposals, and more.

Setup

Store data locally

The app expects to find models and user settings in the following directories:

  • $HOME/SecureAI
    • data
      • /embeddings
      • /llms
    • users

The project does not contain the models themselves, only the directories and the system prompt templates. You need to download the models separately (from the Google Drive above or from Hugging Face) and place them in the correct directory structure.

  • Embeddings models go in ./data/embeddings/EmbeddingModelName/
  • Binary .gguf LLM model files go in ./data/llms/. GGUF is a file format used by llama.cpp.
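The directory layout above can be created in one step. The sketch below assumes the default `$HOME/SecureAI` root described earlier; adjust it if your installation differs.

```python
from pathlib import Path

# Create the directory layout SAI expects (a sketch based on the
# structure described above; adjust `root` if your setup differs).
root = Path.home() / "SecureAI"
for sub in ("data/embeddings", "data/llms", "users"):
    (root / sub).mkdir(parents=True, exist_ok=True)
```

`mkdir(parents=True, exist_ok=True)` makes the script safe to re-run against an existing installation.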

Configuration file

The configuration file is located at ./data/config.yaml. It is read by the software on startup and sets, among other things, the paths to the models. The format is:

trial_license: 'gAAAAABl9GNFdzETJ8jOjRcFUv950SUaFW-Cc7UhKYQznIvo2RPPCf7wlyXFTgGceKcUgZMfTRpCMTXXNIfn2h9mUjIClMnDzQ=='
app_secret: 'eexhVgr6W2LxxoMhTwJgJRAVsWoeOM0ntXP66ZehyQc='
salt: '8pn-8WZZ7PoMB1EeCJvRZDc5wdjZ8TAUN83OEZBv3ic='
base_url: http://localhost:8501
auth_enabled: False
enable_saved_databases: True
#demo_banner: Please contact us at <a href="mailto:admin@secureai.us">admin@secureai.us</a> to deploy Secure AI on your computer or corporate network, for secure use with your data.
inactivity_timeout: 60
cache_size: 20 # This is GB
port: 8501
llama_server_port: 8080
history_size: 3  # Number of turns (user/assistant message pairs) to keep in history

# The templates are defined in prompts.yaml

embeddings:
    chunk_size: 3000 # characters (match to embeddings model ~4 chars/token)
    chunk_overlap: 2000  # characters
    embeddings_path: embeddings
    embeddings_type: huggingface
    embeddings_model_name: BAAI/bge-base-en-v1.5  # max 512 tokens
    gpu_device: mps  # This can be one of: cpu, cuda or mps (for mac metal). Used for embeddings.
    retriever_type: vector
    similarity_k: 5
    similarity_threshold: 0.25

llms:
  - title: llama-3-8b-instruct
    llm_type: ollama
    llama_server_slots: 4
    n_gpu_layers: 100
    min_p: 0.05
    temperature: 0
    n_ctx: 8192
    llm_model: llama3:8b-instruct-q8_0
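To make the `chunk_size` and `chunk_overlap` settings concrete: with the values above, each 3000-character chunk starts 1000 characters after the previous one, so consecutive chunks share 2000 characters. The function below is only an illustration of those semantics, not the project's actual chunking code.

```python
def chunk(text: str, chunk_size: int = 3000, chunk_overlap: int = 2000) -> list[str]:
    """Split text into fixed-size character chunks with the given overlap.

    Illustration of the config semantics above, not SAI's implementation:
    each chunk begins (chunk_size - chunk_overlap) characters after the last.
    """
    step = chunk_size - chunk_overlap  # 1000 chars with the defaults above
    return [text[i : i + chunk_size]
            for i in range(0, max(len(text) - chunk_overlap, 1), step)]
```

With ~4 characters per token, a 3000-character chunk is roughly 750 tokens, comfortably under the 512-token limit only after the embeddings model truncates; this is why `chunk_size` should be matched to the model as the comment in the config notes.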

You can keep local settings which will override those in config.yaml in a file called config.local.yaml. For example:

llm_few_shot_template:
  - role: system
    content: |
      You are a very powerful assistant, but don't know current events.

llms:
  - title: hermes-2-theta-llama-3-8b
    llm_type: ollama
    llama_server_slots: 4
    n_gpu_layers: 100
    min_p: 0.05
    temperature: 0
    n_ctx: 8192
    llm_model: taozhiyuai/hermes-2-theta-llama-3:8b-q8_0

The settings above would override the system prompt and default llm.
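One plausible way such an override could work is a recursive merge in which nested mappings are merged key by key while scalars and lists (such as `llms`) replace the base value wholesale. This is a sketch of that behavior, not the project's actual merge logic.

```python
def merge(base: dict, override: dict) -> dict:
    """Recursively merge `override` into `base` (a sketch, not SAI's code).

    Nested dicts are merged key by key; any other value, including lists
    such as `llms`, replaces the base value wholesale.
    """
    out = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)
        else:
            out[key] = value
    return out
```

Under these rules the `llms` list in config.local.yaml fully replaces the default list, which matches the statement above that the example overrides the default LLM.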

All prompts reside in prompts.yaml.

Any API keys or otherwise "secret" settings should go in secrets.yaml.

Generate a license string

The trial_license configuration setting can be generated with the bin/generate_license.py script. It takes one argument, the number of days until the license expires. For example, a thirty-day license:

python bin/generate_license.py 30
Generate trial timestamp: Thu Mar 14 15:19:18 2024
b'gAAAAABly9zWipV3RCQnyhahy8TNCzBlTfo9PTkqXYMCGkhctDJ5zuZZeAfOQvSD4HbIdA1BeFYkSQywJuLnHtkTUf6GNTjw0w=='
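The script prints the trial timestamp and an encrypted token. The token's `gAAAAAB` prefix suggests a Fernet ciphertext (from the `cryptography` package) wrapping an expiry timestamp, but that is an assumption about the script's internals. A stdlib-only sketch of the expiry computation, with the encryption step omitted:

```python
from datetime import datetime, timedelta

def trial_expiry(days: int) -> datetime:
    """Compute the expiry timestamp for a trial license `days` out.

    Sketch only: the real generate_license.py presumably also encrypts
    this value (the token format resembles Fernet), which is omitted here.
    """
    return datetime.now() + timedelta(days=days)

print("Generate trial timestamp:", trial_expiry(30).ctime())
```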

Quick build

For a quick start, just run make. This will rebuild Ollama in the vendor directory as well as rebuild the client in prod mode.

make clean && make -j

Poetry virtual environment

This project uses Poetry for dependency management.

# Create Python environment
$ poetry install

# Launch poetry shell
$ poetry shell

Building the client

Prod build

If you just need to build the client in production mode to test you can run the following:

(cd client && npm ci && npm run build)

The built client is served when you load localhost:8501 in the browser. (Note: client code changes are not automatically built or served.)

Dev build

To be able to develop the client and automatically see code changes reloaded in the browser run the Vite dev server:

cd client
npm install
npm run dev
cd ..

In this mode, load localhost:5173 in the browser. Backend requests are proxied to port 8501 (the backend server).

Running server

Running in prod-mode (from poetry shell)

Settings are read from config.yaml.

$ python -m secure_ai

Running in dev-mode (from poetry shell)

This mode will detect code changes and restart the dev server accordingly.

$ adev runserver --port 8501 --app-factory app_dev secure_ai

Tips and tricks

Ollama

There are several options for installing Ollama. When you run make, a version of Ollama is compiled to vendor/ollama. Add this directory to your PATH to make the ollama command available.

Alternatively, you can use one of the Ollama installation methods found at https://github.com/ollama/ollama?tab=readme-ov-file#ollama. When using one of the installers, beware that a service may be set up that starts Ollama in the background. You must disable it if that is the case. On systemd-based Linux systems, you can disable the ollama service with:

sudo systemctl disable --now ollama

Pulling models

To use models in Ollama you must first "pull" the model so Ollama has a local copy.

A fresh Ollama installation has no models:

ollama list
NAME	ID	SIZE	MODIFIED

For example, if config.yaml contained the following:

llms:
  - title: llama3:8b-instruct-q8_0
    llm_type: ollama
    llama_server_slots: 4
    n_gpu_layers: 100
    min_p: 0.05
    temperature: 0
    n_ctx: 8192
    llm_model: llama3:8b-instruct-q8_0

You would need to run SAI (which starts Ollama in the background), then pull the model with:

ollama pull llama3:8b-instruct-q8_0
pulling manifest 
pulling 11a9680b0168... 100% ▕██████████████████████████████████████████████████████████████████████████████████▏ 8.5 GB                         
pulling 4fa551d4f938... 100% ▕██████████████████████████████████████████████████████████████████████████████████▏  12 KB                         
pulling 8ab4849b038c... 100% ▕██████████████████████████████████████████████████████████████████████████████████▏  254 B                         
pulling 577073ffcc6c... 100% ▕██████████████████████████████████████████████████████████████████████████████████▏  110 B                         
pulling cdf310f424e6... 100% ▕██████████████████████████████████████████████████████████████████████████████████▏  485 B                         
verifying sha256 digest 
writing manifest 
removing any unused layers 
success 

After the pull, the model is available:

ollama list
NAME                   	ID          	SIZE  	MODIFIED           
llama3:8b-instruct-q8_0	1b8e49cece7f	8.5 GB	About a minute ago	

About

Secure AI: Enterprise Artificial Intelligence
