This application takes WARC files in a given directory and indexes them in Solr. A ready-to-use Solr Docker configuration can be found in solr/.
The application is built using Gradle. To run the app, use the command gradle runApp, which should pull the dependencies, compile the code and run it.
A jar can be created by running gradle fatJar; the jar file can then be found in build/libs/.
The application can also be built and run using the Dockerfile. The watch directory containing the WARC files should be mounted so that the application can access the files. The application is configured through the environment variables LOCKSS_SOLR_WATCHDIR, LOCKSS_SOLR_URL and LOCKSS_SOLR_BATCH_SIZE. The application provides default values, which can be overridden when necessary. An example Docker command to start the application is given below.
docker run -it --rm -e LOCKSS_SOLR_WATCHDIR=/samples -e LOCKSS_SOLR_URL=http://192.168.56.103:8983/solr/test-core -v /home/rwincewicz/workspace/lockss/lockss-solr/samples:/samples:ro lockss/indexer
Alternatively, you can use the Solr image from Docker Hub. The following command will start a container with Solr and create a core named test-core:
docker run --name solr -d -p 8983:8983 solr solr-create -c test-core
Then run the application in Docker as follows:
docker run -it --rm --link solr:solr -e LOCKSS_SOLR_WATCHDIR=/samples -e LOCKSS_SOLR_URL=http://solr:8983/solr/test-core -v $WORKSPACE/lockss-solr/samples:/samples:ro lockss/indexer
It's also possible to use Docker Compose to build and start both containers. You'll need to install Docker Compose and run the following command:
docker-compose up --build
If you want to use a WARC folder other than the default (i.e. ./samples), it can be defined in .env as LOCKSS_SOLR_WATCHDIR:
LOCKSS_SOLR_WATCHDIR=/var/data/warc
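For the plain docker run route, the same variables can be forwarded from the calling shell. The following is a minimal sketch, not part of the project: indexer_env_args is a hypothetical helper that emits a -e flag only for the variables that are set, on the assumption that leaving a variable out lets the application fall back to its own default.

```shell
#!/bin/sh
# Sketch only: indexer_env_args is a hypothetical helper, not part of the project.

# Print "-e NAME=value" for each documented variable that is set and non-empty.
# Unset variables are skipped so the application's built-in defaults apply.
# Values containing spaces are not handled by this simple version.
indexer_env_args() {
    args=""
    for name in LOCKSS_SOLR_WATCHDIR LOCKSS_SOLR_URL LOCKSS_SOLR_BATCH_SIZE; do
        # Indirect lookup of the variable named by $name (empty if unset)
        eval "value=\${$name-}"
        if [ -n "$value" ]; then
            args="$args -e $name=$value"
        fi
    done
    # Strip the leading space left by the first append
    printf '%s\n' "${args# }"
}
```

It could then be used as docker run -it --rm $(indexer_env_args) followed by the volume and image arguments shown earlier; the substitution is left unquoted on purpose so that the flags are split into separate words.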
The application will not automatically pick up existing WARC files, but a simple touch should trigger the indexing:
touch /var/data/warc/*
You should now be able to query the server at http://localhost:8983/solr/#/test
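To check from the command line that documents were indexed, the two steps can be wrapped in small helpers. This is a rough sketch under a few assumptions: the core is named test-core and is reachable on localhost:8983 as in the commands above, the WARC files end in .warc or .warc.gz, and both function names are made up for illustration. The query goes through Solr's standard /select handler rather than the admin UI.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of the project.

# Build a URL for Solr's /select handler on a core.
# $1 = core base URL, $2 = query string (assumed URL-safe), $3 = rows to return
solr_select_url() {
    # ${1%/} drops a single trailing slash from the base URL, if present
    printf '%s/select?q=%s&rows=%s\n' "${1%/}" "$2" "$3"
}

# Update the modification time of every .warc and .warc.gz file under a directory.
# The extension filter is an assumption; the plain "touch dir/*" touches everything.
retouch_warcs() {
    find "$1" -type f \( -name '*.warc' -o -name '*.warc.gz' \) -exec touch {} +
}
```

A possible use is retouch_warcs /var/data/warc followed by curl -s "$(solr_select_url http://localhost:8983/solr/test-core '*:*' 0)". With rows set to 0, Solr returns no documents, only the response header and the numFound count of matching documents.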
A Vagrantfile has been added to run the app on a VM.
If you are using a different WARCs folder than ./samples, you'll have to make sure it's shared by updating the Vagrantfile.
config.vm.synced_folder "/var/data/warc", "/var/data/warc"
You need to install Vagrant and run the following command:
vagrant up
This should start a VM running CentOS 7 with Docker, Docker Compose and other software. Please read the original Vagrant box page for details: Docker-enabled Vagrant boxes.
The Solr server running on the VM can be accessed from the host at http://localhost:58983/solr/. The VM is also running cAdvisor, which can be accessed at http://localhost:58080/containers/.