Built upon kennethleungty/Llama-2-Open-Source-LLM-CPU-Inference, with a Streamlit front-end. It lets you load any .txt file or .pdf document with selectable
text and ask questions about it using LLMs on CPU.
Build the Docker image:

```
docker build -t document-ama .
```

Run it:

```
docker run -d -p 8501:8501 --name document-ama document-ama:latest
```

Open the Streamlit app in your browser:

```
http://localhost:8501/
```

Init a venv, if desired:

```
python3 -m venv venv
source venv/bin/activate
```

Install requirements:

```
pip install -r requirements.txt
```

Run Streamlit:

```
streamlit run app.py
```

A new browser tab will open with the Streamlit app.
(Also see the Medium article for the original code architecture.)
It is currently set up to run with the 8-bit quantized GGML version of Llama 2 as the Q&A model and sentence-transformers/all-MiniLM-L6-v2 as the embeddings model. If those models are not present (e.g. on the first run), it will download them first.
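As a rough illustration, that first-run download could look like the sketch below; the Hugging Face repo and file names here are assumptions, not necessarily what the app uses (the embeddings model is fetched automatically by the embeddings wrapper when first instantiated):

```python
from huggingface_hub import hf_hub_download

# Assumed source repo/file for the 8-bit GGML Llama 2 chat model.
llm_path = hf_hub_download(
    repo_id="TheBloke/Llama-2-7B-Chat-GGML",
    filename="llama-2-7b-chat.ggmlv3.q8_0.bin",
)
```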
Then, it will ask for a file to be uploaded. The file needs to be either a .txt file or a .pdf with selectable
text; the app will not try to OCR the document.
Once the document is uploaded, LangChain is used to (see the sketch after this list):
- Load the document;
- Extract the text with PyPDF;
- Split it into 500-character chunks (with 50-character overlap);
- Calculate the embedding vector of each chunk;
- and finally load those embeddings, together with their respective chunks, into a FAISS database.
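A minimal sketch of that ingestion pipeline, assuming the classic LangChain 0.0.x module layout (the file path is illustrative):

```python
from langchain.document_loaders import PyPDFLoader
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

# Load the uploaded PDF and extract its (selectable) text with PyPDF.
docs = PyPDFLoader("uploaded.pdf").load()

# Split into 500-character chunks with a 50-character overlap.
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)

# Embed each chunk with all-MiniLM-L6-v2 and index the vectors in FAISS.
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2")
db = FAISS.from_documents(chunks, embeddings)
```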
The FAISS files are saved on disk under a folder named after the file's checksum, so if the exact same file is uploaded again, the previously created database is reused instead of being rebuilt.
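Conceptually, the caching works like the sketch below; the function name, folder layout, and hash algorithm are hypothetical, not the app's actual identifiers:

```python
import hashlib
import os

from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2")

def load_or_build_db(file_bytes, chunks, root="faiss_dbs"):
    """Reuse the FAISS index for a file that was already uploaded."""
    # Folder name derived from the uploaded file's checksum.
    path = os.path.join(root, hashlib.sha256(file_bytes).hexdigest())
    if os.path.isdir(path):
        return FAISS.load_local(path, embeddings)  # same file seen before
    db = FAISS.from_documents(chunks, embeddings)  # chunks from the step above
    db.save_local(path)                            # cache for the next upload
    return db
```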
After that is done, it asks for the question. It will then load the LLM into memory using CTransformers and keep using LangChain to:
- Load the FAISS DB into memory;
- Build a prompt from the question and a hardcoded prompt template string;
- Build a RetrievalQA chain with the LLM, which pulls the context relevant to the question into the prompt template;
- Ask the RetrievalQA chain the question.
Once the RetrievalQA chain returns an answer, the app displays it along with the passages of the text relevant to that answer, as sketched below.
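A minimal sketch of that Q&A step, again assuming the LangChain 0.0.x APIs; the prompt text, retriever `k`, and generation parameters are illustrative, not the app's exact configuration:

```python
from langchain.chains import RetrievalQA
from langchain.llms import CTransformers
from langchain.prompts import PromptTemplate

def answer_question(db, model_path, question):
    """Answer a question against a previously built FAISS index."""
    # Load the quantized Llama 2 GGML model on CPU via CTransformers.
    llm = CTransformers(model=model_path, model_type="llama",
                        config={"max_new_tokens": 256, "temperature": 0.0})

    # Hardcoded prompt template with slots for the retrieved context
    # and the user's question.
    template = """Use the following context to answer the question.

Context: {context}
Question: {question}
Answer:"""
    prompt = PromptTemplate(template=template,
                            input_variables=["context", "question"])

    # RetrievalQA chain: FAISS retrieves the chunks most relevant to the
    # question, which are stuffed into the prompt alongside it.
    qa = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=db.as_retriever(search_kwargs={"k": 2}),
        return_source_documents=True,  # also surface the supporting passages
        chain_type_kwargs={"prompt": prompt},
    )
    result = qa({"query": question})
    return result["result"], result["source_documents"]
```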
