Intelligent Search Engine for Belief Legend Embeddings
- Clone this repository: `git clone https://github.com/broadwell/isebelle.git` and `cd isebelle`
- Install and run Docker
- Install the `just` command runner for your system
- Create an appropriate `.env` file in the `isebelle` main folder:
  ```
  cat > .env <<EOL
  DB_NAME=isebelle
  DB_USER=isebelle
  DB_PASSWORD=$(LC_ALL=C tr -cd 'a-zA-Z0-9' < /dev/urandom | fold -w24 | head -n 1)
  # DB_HOST=localhost ## Ignored if using docker-compose.yaml
  # DB_PORT=5432 ## Ignored if using docker-compose.yaml
  STORIES_SRC_FOLDER=/path/to/source/collections
  JUPYTER_PASSWORD=secret_password
  LOG_LEVEL=INFO ## Set to DEBUG for additional logging
  EOL
  ```
- Run `docker compose up` or `just up`
- If you'll be working with Icelandic, Frisian or Low German texts, be sure to run `just build-[icelandic|frisian|low-german]-dictionary` once for each language (see the example after this list).
- Each collection should be a sub-folder of the main data folder identified by `STORIES_SRC_FOLDER` in `.env`. The actual stories should be in individual .xml files, with each filename providing the story ID, in the following folder structure: `[collection_folder]/records/isebel/` (see the layout sketch after this list). It is preferable to use underscores (_) rather than spaces in the file and folder names. Ideally, each collection should contain stories in a single language. The name of the collection folder should be supplied as the `[COLLECTION_NAME]` argument to the scripts below.
- It can be faster to generate the story sentence embeddings outside of the Docker containers by running `python api/create_collection_embeddings.py --collection-path [PATH_TO_COLLECTION_FOLDER]`, but in this case you will need to install the script's dependencies in your local environment (see the sketch after this list).
- If the story embeddings have been generated as above, you should load the collection's story texts first by running `just add-collection-xml [COLLECTION_NAME] [ORGANIZATION] [COUNTRY] [SEARCH_LANGUAGE] [DISPLAY_LANGUAGE]`, then `just add-embeddings [COLLECTION_NAME]/[EMBEDDINGS_FILENAME]`. For example, `~/isebelle$ just add-collection-xml Evald_Tang_Kristensen "UC Berkeley" Denmark Danish Dansk`, then `~/isebelle$ just add-embeddings Evald_Tang_Kristensen/Evald_Tang_Kristensen.embeddings.gte-Qwen2-7B-instruct.jsonl`.
- If you prefer to generate the story embeddings within the Docker containers while simultaneously importing them along with the story texts, run `just add-collection-and-calculate-embeddings [COLLECTION_NAME] [ORGANIZATION] [COUNTRY] [SEARCH_LANGUAGE] [DISPLAY_LANGUAGE]` instead.
- The search and browse interface for the collections and embeddings should be available at http://localhost:8080/isebelle (a quick reachability check is shown after this list).
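For instance, to build the Icelandic dictionary mentioned above (the Frisian and Low German recipes follow the same naming pattern):

```
~/isebelle$ just build-icelandic-dictionary
```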
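The collection layout described above would then look roughly like the sketch below; the collection name reuses the example from the list, and the .xml filenames are hypothetical placeholders:

```
/path/to/source/collections/        <- STORIES_SRC_FOLDER from .env
└── Evald_Tang_Kristensen/          <- passed as [COLLECTION_NAME]
    └── records/
        └── isebel/
            ├── story_0001.xml      <- one story per file; the filename provides the story ID
            └── story_0002.xml
```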
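A minimal sketch of running the embedding script locally, assuming a Python virtual environment; the script's dependencies are not listed here and need to be installed into that environment first:

```
python3 -m venv .venv
source .venv/bin/activate
# Install the script's dependencies (e.g. by checking the imports in
# api/create_collection_embeddings.py), then embed one collection:
python api/create_collection_embeddings.py \
  --collection-path /path/to/source/collections/Evald_Tang_Kristensen
```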
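Once the containers are up, a quick way to confirm that the interface is reachable (this prints the HTTP status code returned by the app):

```
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8080/isebelle
```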
The installation steps above (including the Docker setup) should be sufficient to run ISEBELLE on a remote server. Some reverse-proxy configuration of the host web server may be needed to make the site accessible via the web, however. The following configuration has been used successfully with Apache on the host server, as part of a TLS setup with automatic HTTP->HTTPS redirection (a sketch of the redirect virtual host follows the proxy directives below). The Jupyter functionality is not fully tested yet.
```
# Route paths beginning with /isebelle or /jupyter to the Docker containers
ProxyPass /isebelle http://127.0.0.1:8080/isebelle
# Note this will run a live (albeit password-protected) Jupyter server!
ProxyPass /jupyter http://127.0.0.1:8080/jupyter

# Attempt also to route websocket requests to Docker (for Jupyter support)
RewriteEngine On
RewriteCond %{REQUEST_URI} ^/socket.io [NC]
RewriteCond %{QUERY_STRING} transport=websocket [NC]
RewriteRule /(.*) ws://localhost:8080/$1 [P,L]
```
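The HTTP->HTTPS redirection mentioned above is not part of these directives. A minimal sketch of one common approach is shown below; it assumes the proxy directives above live in the `*:443` (TLS) virtual host, and `isebelle.example.org` is a placeholder hostname:

```
<VirtualHost *:80>
    ServerName isebelle.example.org
    # Redirect all plain-HTTP requests to the TLS virtual host
    Redirect permanent / https://isebelle.example.org/
</VirtualHost>
```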