The Catalog API is a Python Django project that provides a customizable REST API layer for Innovative Interfaces' Sierra ILS. This differs from the built-in Sierra API in a number of ways, not least of which is that the API design is fully under your control. In addition to a basic API implementation, a complete toolkit is provided that allows you to turn any of the data exposed via Sierra database views (and even data from other sources) into your own API resources.
-
All 300+ Sierra database views are modeled using the Django ORM.
-
Django Rest Framework provides the API implementation. Serializers and class-based views are easy to extend.
-
The API layer has a built-in browseable API view, and content negotiation is supported. Visit API URLs in a web browser and get nicely formatted, browseable HTML; request resources in JSON format and get JSON.
-
HAL, or Hypertext Application Language (hal+json), is the media type that is used to serve the built-in resources. "HAL is a simple format that gives a consistent and easy way to hyperlink between resources in your API." But you are not restricted to using HAL--you are free to implement the media types and formats that best fit your use cases.
-
The API supports a wide range of query filters, and more are planned: equals, greater than, less than, in, range, regular expressions, keyword searches, and more.
-
Your API data is completely decoupled from your Sierra data. An extensible exporter Django app allows you to define custom ETL processes. Solr instances that store and index data for the API are included. Or, you can set up your own data storage and tie your exporters and REST Framework views and serializers into it.
-
Although Sierra data is read-only, the API framework does allow you to implement POST, PUT, PATCH, and DELETE methods along with GET. So, you can create your own editable fields on API resources that don't get stored in Sierra; in fact, you could create resources that merge data from a variety of sources. Data that isn't sourced from Sierra can be merged when your export jobs run.
-
Accessing API resources only accesses the data in Solr and Redis; it doesn't hit the Sierra database at all. Thus, API performance is isolated from performance of your Sierra database, and API usage has no impact on your Sierra database. You don't have to worry about API users running up against the concurrent connection limit in Sierra.
-
Celery provides an asynchronous task queue and scheduler. Set up your exporters to run as often as you need so that your API stays in synch with Sierra as data in Sierra is added, updated, or deleted.
-
API resources can be grouped and completely compartmentalized into reusable Django apps. New apps can expose new resources and/or override the default base resources. (The shelflist app provides an example of this.)
- Python 2 >= 2.7.5, plus pip, virtualenv, and a number of required libraries. (See the next section for more detail.)
- Java >= 1.7.0_45.
- Redis >= 2.4.10.
- For development, if you are using the provided sqlite database as the Django DB, sqlite3 needs to be installed. Otherwise, be sure to install whatever additional prerequisites are needed for your database software, such as the mysql-development library and mysqlclient if you're using MySQL.
This project is currently in production at UNT, but the architecture as it is in the repository is likely not optimal for production deployment at other institutions. Cleaning this up to help simplify deployment is on our to-do list, but, for now, these instructions assume you'll be deploying to a development environment. Considerations for production deployment are included where applicable, but of course these will heavily depend on your environment and needs.
-
Set up Sierra User(s).
The catalog-api requires access to Sierra to export data. You must create a new Sierra user for each instance of the project that will be running (e.g., for each dev version, for staging, for production). Be sure that each user has the Sierra SQL Access application assigned in the Sierra admin interface.
You must also set the Sierra user's search_path in PostgreSQL so that the user can issue queries without specifying the sierra_view prefix. Without doing this, the SQL generated by the Django models will not work.
For each Sierra user you created, log into the database as that user and send the following query, replacing user with the name of that user:
    ALTER ROLE user SET search_path TO sierra_view;
-
Install prerequisites.
-
Python 2 >= 2.7.5.
-
Latest version of pip. Python >=2.7.9 from python.org includes pip. Otherwise, go here for installation instructions.
Once pip is installed, be sure to update to the latest version:
    pip install -U pip
-
virtualenv. If you've already installed pip:
    pip install virtualenv
Note: Virtualenv also includes pip, so you could install virtualenv first, without using pip.
-
Requirements for psycopg2. In order for psycopg2 to build correctly, you'll need to have the appropriate dev packages installed.
On Ubuntu/Debian:
    sudo apt-get install libpq-dev python-dev
On Red Hat:
    sudo yum install python-devel postgresql-devel
On Mac, with homebrew:
    brew install postgresql
-
Java.
On Ubuntu/Debian:
    sudo apt-get install openjdk-8-jre
On Red Hat:
    sudo yum install java-1.8.0-openjdk
-
Redis is required to serve as a message broker for Celery. It's also used to store some application data. For a dev environment, you can just follow the quickstart guide. Be sure to secure your Redis instance by following the steps in the quickstart guide under the heading Securing Redis.
Production Note: The quickstart section Installing Redis more properly contains useful information for deploying Redis in a production environment.
-
Your database of choice to serve as the Django database. In development, sqlite3 works fine. (This is the assumed database backend.) You can get precompiled binaries for your OS from the sqlite downloads page. Just make sure the command-line shell (sqlite3) goes in a directory that is on your PATH.
Production Note: In production, using sqlite is not recommended. Use PostgreSQL or MySQL instead.
-
-
Set up a virtual environment.
virtualenv
virtualenv is commonly used with Python, and especially Django, projects. It allows you to isolate the Python environment for projects on the same machine from each other (and, importantly, from the system Python). Using virtualenv is not strictly required, but it is strongly recommended.
(Optional) virtualenvwrapper
virtualenvwrapper is very useful if you need to manage several different virtual environments for different projects. At minimum, it makes creation, management, and activation of virtualenvs easier. The instructions below assume that you are not using virtualenvwrapper.
Without virtualenvwrapper
First generate the virtual environment you're going to use for the project. Create a directory where it will live (<DIR>), and then:
    virtualenv <DIR>
This creates, in that directory, a clean copy of whatever Python version you installed virtualenv on.
Next, activate the new virtual environment.
    source <DIR>/bin/activate
Once it's activated, any time you run Python, it will use the Python that's in the virtual environment. This means any pip installations or other modules you install (e.g., via a setup.py file) while this virtualenv is active will be stored and used only in this virtual environment. You can create multiple virtualenvs on the same machine for different projects in order to keep their requirements nicely separated.
You can deactivate an active virtual environment with:
    deactivate
You'll probably want to add the source <DIR>/bin/activate statement to your shell startup script, such as ~/.bash_profile and/or ~/.bashrc (if you're using bash), so that the appropriate environment is activated on startup.
-
Fork catalog-api on GitHub and clone to your local machine.
The catalog-api project is intended to be modified for use at your institution, so it's recommended that you fork the repository before creating a working copy.
- Go to GitHub, and log into your account.
- Go to the UNT Libraries' catalog-api repository.
- Click the Fork button to copy the repository to your own account.
Now create your working copy (replace [your-github-account] with your GitHub account name):
    git clone https://github.com/[your-github-account]/catalog-api.git
Or, if you're authenticating via SSH:
    git clone git@github.com:[your-github-account]/catalog-api.git
If you're new to git and/or GitHub, see the GitHub help pages about how to fork, how to synch your fork with the original repository, managing branches, and how to submit pull requests (for when you want to contribute back).
Aside about Project Structure
There are two primary directories in the project root: django and solr.
The django directory contains the Django project and related code, in django/sierra. The manage.py script for issuing Django commands is located here, along with the apps for the project:
- api: Contains the Django REST Framework implementation of the default API resources, which include apiusers, bibs, items, eresources, itemstatuses, itemtypes, locations, and marc.
- base: Contains the Django ORM models for Sierra.
- export: Contains code for exporters, including definitions of the exporters themselves, models related to export jobs, changes to the Django admin interface to allow you to manage and track export jobs, and tasks for running export jobs through Celery.
- sierra: Contains configuration and settings for the project.
- shelflist: Implements a shelflistitems resource in the API; contains overrides for export classes and api classes that implement the resource and add links to shelflistitems from item resources and location resources. This provides an example of how you could create Django apps with self-contained functionality for building new features onto existing API resources.
The solr directory contains the included Solr instance and a fork of SolrMarc from Naomi Dushay of Stanford University Libraries, which is used for loading MARC data into Solr. (See the SolrMarc documentation for more information.)
-
Install all python requirements.
    pip install -r requirements/requirements-base.txt
-
Set environment variables.
- To PATH, add the path to your JRE /bin directory and the path to your Redis /src directory, where the redis-server binary lives. Example: /home/developer/jdk1.7.0_45/bin:/home/developer/redis-2.8.9/src
- JAVA_HOME -- Should contain the path to your JRE.
Optional environment variables can be set for development cases where multiple instances of the project will run on the same server concurrently.
- SOLR_PORT (Optional) -- Defaults to 8983.
  - If you change this to something other than 8983, you'll have to make sure to set a custom SOLRMARC config.properties file with a solr.hosturl value pointing to the correct port. (This is addressed in the next step.)
- DJANGO_PORT (Optional) -- Defaults to 8000.
- REDIS_PORT (Optional) -- Defaults to 6379.
If adding environment variables to your .bash_profile, be sure to refresh it after you save changes:
    . ~/.bash_profile
If using virtualenvwrapper, environment variables can be set each time a virtual environment is activated. See the virtualenvwrapper documentation for more details.
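As a quick sanity check on the variables described above, the following minimal Python sketch reads the three optional port variables and falls back to the documented defaults. The function names resolve_ports and missing_required are hypothetical helpers, not part of the catalog-api code.

```python
import os

# Defaults documented above for the optional port variables.
DEFAULT_PORTS = {"SOLR_PORT": 8983, "DJANGO_PORT": 8000, "REDIS_PORT": 6379}


def resolve_ports(environ=None):
    """Return the port for each variable, using the default when unset."""
    if environ is None:
        environ = os.environ
    ports = {}
    for name, default in DEFAULT_PORTS.items():
        raw = environ.get(name)
        ports[name] = int(raw) if raw else default
    return ports


def missing_required(environ=None):
    """Return the names of required variables that are unset or empty."""
    if environ is None:
        environ = os.environ
    return [name for name in ("JAVA_HOME",) if not environ.get(name)]


if __name__ == "__main__":
    print(resolve_ports())
    print(missing_required())
```

Running the script in the shell you plan to start the servers from shows which ports would be used and whether JAVA_HOME is still unset.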
-
Set local configuration settings.
Django settings for the catalog-api project are in <project_root>/django/sierra/sierra/settings. Global settings are included in the repository in base.py, dev.py, and production.py. These attempt to pull local settings from a settings.json file that is not in the repository--in fact, it's ignored entirely.
There is a settings_template.json file in the repository. This contains all of the possible local settings that you may want to set. Some are required and some are optional. Some are needed only if you're deploying the project in a production environment. Note that many of these are things you want to keep secret.
In order to use the catalog-api, you'll need to create your settings.json for each environment. Simply copy settings_template.json to settings.json (in the settings directory). Fill in the settings that are required and any others that are applicable to your environment, and then delete the rest. (Non-required settings get a default value specified in, e.g., base.py.) When you're finished, be sure to delete all comments from the file.
In settings.json, you'll find the following:
- Required Settings -- Your settings file won't load without these.
  - SECRET_KEY -- Leave this for now. You'll generate a new secure secret key in the next step.
  - SETTINGS_MODULE -- The settings module that you want Django to use in the current environment, in Python path syntax (e.g., sierra.settings.ENVFILE). Unless you create new settings files that import from base.py, this will either be sierra.settings.dev or sierra.settings.production. See the Django documentation for more information.
  - SIERRA_DB_USER -- The username for your Sierra user that has SQL access that you set up in step 1.
  - SIERRA_DB_PASSWORD -- Password for your Sierra user.
  - SIERRA_DB_HOST -- The hostname for your Sierra database server.
  - LOG_FILE_DIR -- The full path to the directory where you want Django log files stored. You must create this directory if it does not already exist; Django won't create it for you, and it will error out if it doesn't exist.
  - MEDIA_ROOT -- Full path to the directory where downloads and user-uploaded files are stored. MARC files that are generated (e.g., to be loaded by SolrMarc) are stored here. Like LOG_FILE_DIR, you must create this directory if it does not already exist.
- Optional Settings, Development or Production -- These are settings you may need to set in a development or production environment, depending on circumstances. Remove the key from the JSON file if you want to use the default value.
  - ADMINS -- An array of arrays, where each nested array follows the pattern ['name', 'email@email.com']. These are the people that will be emailed if there are errors. Default is an empty array.
  - EXPORTER_EMAIL_ON_ERROR -- true or false. If true, the Admins will be emailed when an exporter program generates an error. Default is True.
  - EXPORTER_EMAIL_ON_WARNING -- true or false. If true, the Admins will be emailed when an exporter program generates a warning. Default is True.
  - TIME_ZONE -- String representing the server timezone. Default is America/Chicago (central timezone).
  - CORS_ORIGIN_REGEX_WHITELIST -- An array containing regular expressions that should match URLs for which you want to allow cross-domain JavaScript requests to the API. If you're going to have JavaScript apps on other servers making Ajax calls to your API, then you'll have to whitelist those domains here. Default is an empty array.
  - SOLRMARC_CONFIG_FILE -- The name of the file that contains configuration settings for SolrMarc for a particular environment. This will match up with a config.properties file in <project_root>/solr/solrmarc. (See "SolrMarc Configuration," below, for more information.) Default is dev_config.properties.
- Production Settings -- These are settings you'll probably only need to set in production. If your development environment is very different from the default setup, then you may need to set these there as well.
  - STATIC_ROOT -- Full path to the location where static files are put when you run the collectstatic admin command. Note that you generally won't need this in development: when the DEBUG setting is True, static files are discovered automatically. Otherwise, you need to make sure the static files are available via a web-accessible URL, which this helps you do. Default is None.
  - SITE_URL_ROOT -- The URL prefix for the site home. You'll need this if your server is set to serve this application in anything but the root of the website (like /catalog/). Default is /.
  - MEDIA_URL -- The URL where user-uploaded files can be accessed. Default is /media/.
  - STATIC_URL -- The URL where static files can be accessed. Default is /static/.
  - SOLR_HAYSTACK_URL -- The URL pointing to your Solr instance where the haystack core can be accessed. Default is http://localhost:{SOLR_PORT}/solr/haystack.
  - SOLR_BIBDATA_URL -- The URL pointing to your Solr instance where the bibdata core can be accessed. Default is http://localhost:{SOLR_PORT}/solr/bibdata.
  - SOLR_MARC_URL -- The URL pointing to your Solr instance where the marc core can be accessed. Default is http://localhost:{SOLR_PORT}/solr/marc.
  - REDIS_CELERY_URL -- The URL (using the redis protocol rather than http) pointing to the Redis database you're using as your Celery message broker. Default is redis://localhost:{REDIS_PORT}/0.
  - REDIS_APPDATA_HOST -- The hostname for the Redis instance you're using to store application data. It's strongly recommended that you use a different port or database for app data than you use for your Celery message broker. Default is localhost.
  - REDIS_APPDATA_PORT -- The port for the Redis instance you're using to store app data. Default is your REDIS_PORT value.
  - REDIS_APPDATA_DATABASE -- The number of the Redis database you're using to store app data. Default is 1.
  - ADMIN_ACCESS -- true or false. Default is True, but you can set to False if you want to disable the Django Admin interface for a particular catalog-api instance.
  - ALLOWED_HOSTS -- An array of hostnames that represent the domain names that this Django instance can serve. This is a security measure that is required to be set in production. Defaults to an empty array.
  - EXPORTER_AUTOMATED_USERNAME -- The name of the Django user that should be tied to scheduled (automated) export jobs. Make sure that the Django user actually exists (if it doesn't, create it). It can be helpful to have a unique Django user tied to automated exports so that you can more easily differentiate between scheduled exports and manually-run exports in the admin export interface. Defaults to django_admin.
  - DEFAULT_DATABASE -- Specifies the setup for the default Django database. (Note that this is different from your Sierra database.) The default setup is to create a sqlite database file called django_sierra in the project directory. Set up the ENGINE, NAME, USER, PASSWORD, and HOST keys if you want to use a different database.
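Before moving on, it can help to confirm that a settings.json file contains every required key and that the two directories exist, since the text above notes that Django will not create them. The following is a minimal sketch only; check_settings and check_settings_file are hypothetical helper names, and the project itself does not ship this check.

```python
import json
import os

# The required settings listed above.
REQUIRED_KEYS = (
    "SECRET_KEY",
    "SETTINGS_MODULE",
    "SIERRA_DB_USER",
    "SIERRA_DB_PASSWORD",
    "SIERRA_DB_HOST",
    "LOG_FILE_DIR",
    "MEDIA_ROOT",
)
# Settings whose values must be directories that already exist.
DIR_KEYS = ("LOG_FILE_DIR", "MEDIA_ROOT")


def check_settings(settings):
    """Return a list of human-readable problems found in a settings dict."""
    problems = []
    for key in REQUIRED_KEYS:
        if not settings.get(key):
            problems.append("missing required setting: " + key)
    for key in DIR_KEYS:
        path = settings.get(key)
        if path and not os.path.isdir(path):
            problems.append(key + " directory does not exist: " + path)
    return problems


def check_settings_file(path):
    """Load a settings.json file and check it."""
    with open(path) as handle:
        return check_settings(json.load(handle))
```

An empty list from check_settings_file means the required keys are present and both directories exist; it says nothing about whether the Sierra credentials themselves are valid.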
SolrMarc Configuration
SolrMarc is used to index bib records in Solr. The SolrMarc code is located in <project_root>/solr/solrmarc/.
Files that control SolrMarc configuration include the following.
- *_config.properties -- Contains settings for SolrMarc. There are two settings here that are of immediate concern.
  - solrmarc.hosturl -- Should contain the URL for the Solr index that SolrMarc loads into.
  - solrmarc.indexing.properties -- Points to the *_index.properties file used by your SolrMarc instance, described below.
- *_index.properties -- Defines how MARC fields translate to fields in your Solr index. You'll change this file if/when you want to change how bib API resources are created.
If you've set a SOLR_PORT other than the default (8983), then you must make a change to the SolrMarc config.properties. Create a copy of <project_root>/solr/solrmarc/dev_config.properties. In the copy, change the port of the solr.hosturl value to match the correct port. Or, if you're using your own Solr instance, change the URL to point to that instead.
In your settings.json file, set the SOLRMARC_CONFIG_FILE setting to the filename of the config.properties file you just created.
Production Note: You'll likely want to keep the URL for your production Solr instance out of GitHub. The production_config.properties file is in .gitignore for that reason. There is a production_config.properties.template file that you can copy over to production_config.properties and fill in the solr.hosturl value.
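The port change described above can also be scripted. The sketch below rewrites the port in a host URL property within the text of a properties file. The function name set_hosturl_port is hypothetical, the sample URL in the usage note is illustrative only, and the key defaults to solr.hosturl as named in the paragraph above; pass a different key if your file uses another property name.

```python
import re


def set_hosturl_port(properties_text, port, key="solr.hosturl"):
    """Return properties_text with the port of the given URL property replaced.

    Raises ValueError if no line of the form `key = scheme://host:port...`
    is found, so a silent no-op cannot be mistaken for success.
    """
    pattern = re.compile(
        r"^(\s*" + re.escape(key) + r"\s*=\s*\S+?://[^:/\s]+):\d+",
        re.MULTILINE,
    )
    new_text, count = pattern.subn(
        lambda match: match.group(1) + ":" + str(port), properties_text
    )
    if count == 0:
        raise ValueError("no " + key + " line with an explicit port was found")
    return new_text
```

For example, given a hypothetical line such as solr.hosturl = http://localhost:8983/solr/bibdata, calling set_hosturl_port(text, 8984) changes only the port and leaves the rest of the file untouched. Read the copied file, pass its contents through the function, and write the result back.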
-
Generate a new secret key for Django.
    cd <project_root>/django/sierra
    manage.py generate_secret_key
Copy/paste the new secret key into SECRET_KEY in settings.json.
-
Run migrations and install fixtures.
    cd <project_root>/django/sierra
    manage.py migrate
    manage.py loaddata export/fixtures/starting_metadata.json
This creates the default Django database and populates certain tables with needed data. If you haven't overridden the default database setup in your settings.json file, then it will create a sqlite database named django_sierra in the <project_root>/django/sierra directory. This filename is in .gitignore. If you set a different database name and store it anywhere within the project, be sure to add the filename to .gitignore.
-
Create a superuser account for Django.
    cd <project_root>/django/sierra
    manage.py createsuperuser
Run through the interactive setup. Remember your username and password, as you'll use this to log into the Django admin screen for the first time. (You can create additional users from there.)
-
(Optional) Run Sierra database tests.
    cd <project_root>/django/sierra
    manage.py test base
This runs a series of tests over each of the ORM models for Sierra to ensure that the models match the structures in the database. You should run this when you first install the catalog-api to ensure the models match your Sierra setup--systems may differ from institution to institution based on what products you have.
Note: If any *_maps_to_database tests fail, it indicates that there are fields on the model that aren't present in the database. These are more serious (but often easier to fix) than *_sanity_check tests, which test to ensure that relationship fields work properly. In either case, you can check the models against the SierraDNA documentation and your own database to see where the problems lie and decide if they're worth trying to fix in the models. If tests fail on models that are central, like RecordMetadata or BibRecord, then it's a problem. If tests fail on models that are more peripheral, like LocationChange, then finding and fixing those problems may be less of a priority, especially if you never intend to use those models in your API.
Presumably because the Sierra data that customers have access to is implemented in views instead of tables, proper data integrity is lacking in a few cases. (E.g., the views sometimes don't implement proper primary key / foreign key relationships.) This can cause *_sanity_check tests to fail in practice on live data when they should work in theory.
-
Start servers and processes: Solr, Redis, Django Development Web Server, and Celery.
In the root of the repository a few bash scripts are included that let you start and stop servers quickly and easily in the default development environment. But the first time you install the project, you may want to try starting the servers manually so you can test individually and make sure they work.
-
Solr
    cd <project_root>/solr/instances
    java -jar start.jar -Djetty.port=$SOLR_PORT -Dlog4j.configuration=file:resources/console_log4j.properties
(If you didn't set the $SOLR_PORT environment variable, you can leave out -Djetty.port=$SOLR_PORT, and it will run on the default port, 8983.)
This will start Solr, using a logging configuration file that outputs to the console. (The default logging configuration will output to a file in <project_root>/solr/instances/logs.) You should see a bunch of INFO logs scroll by for a second or two.
Try going to http://localhost:SOLR_PORT/solr/ in a Web browser. (Replace SOLR_PORT with the value of the SOLR_PORT environment variable, and, if testing from an external machine, replace localhost with your hostname.) You should see an Apache Solr admin screen.
You can stop Solr with CTRL-C in the terminal where it's running in the foreground. For now, just leave it up.
-
Redis
Open a new terminal to test Redis.
    cd <project_root>
    redis-server --port $REDIS_PORT
(Leave the --port $REDIS_PORT out if you didn't set the $REDIS_PORT environment variable. It will run on 6379 by default.)
If you followed the Redis quick-install guide, then you've probably already tested this. But, we want Redis running when we test Celery, so leave it up for the moment.
-
Django Development Web Server
Open another terminal to test Django.
    cd <project_root>/django/sierra
    manage.py runserver 0.0.0.0:$DJANGO_PORT
(In this case, if you didn't set the $DJANGO_PORT environment variable, replace $DJANGO_PORT with 8000.)
If all goes well, you should see something like this:
    System check identified no issues (0 silenced).
    February 10, 2016 - 11:40:40
    Django version 1.7, using settings 'sierra.settings.my_dev'
    Starting development server at http://0.0.0.0:8000/
    Quit the server with CONTROL-C.
Try going to http://localhost:DJANGO_PORT/api/v1/ in a browser. (Replace localhost with your hostname if accessing from an external computer, and replace DJANGO_PORT with your DJANGO_PORT value.) You should see a Django REST Framework page displaying the API Root.
Production Note: The Django Development Web Server is intended to be used only in development environments. Never ever use it in production! Configuring Django to work with a real web server like Apache is a necessary step when moving into production. See the Django documentation for more details.
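The same API Root check can be made from a script rather than a browser. The sketch below builds a request for /api/v1/ that asks for JSON instead of HTML. The function names are hypothetical, and the exact Accept value your instance honors is an assumption: the sketch offers both application/hal+json (the media type this README names for the built-in resources) and application/json.

```python
import json

try:
    from urllib.request import Request, urlopen  # Python 3
except ImportError:
    from urllib2 import Request, urlopen  # Python 2

# Assumption: one of these media types is accepted by the API.
ACCEPT = "application/hal+json, application/json"


def build_api_request(host="localhost", port=8000, path="/api/v1/"):
    """Build (but do not send) a request for an API URL, asking for JSON."""
    url = "http://{0}:{1}{2}".format(host, port, path)
    return Request(url, headers={"Accept": ACCEPT})


def fetch_api_root(**kwargs):
    """Send the request and return the decoded JSON body."""
    response = urlopen(build_api_request(**kwargs))
    try:
        return json.loads(response.read().decode("utf-8"))
    finally:
        response.close()
```

With the development server running, fetch_api_root(port=8000) should return the API Root as a Python dict; if the server answers with an error about unacceptable content, adjust ACCEPT for your instance.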
-
Celery
Open up another terminal to test Celery.
    cd <project_root>/django/sierra
    celery -A sierra worker -l info -c 4
You'll get some INFO logs, as well as a UserWarning about not using the DEBUG setting in a production environment. Since this is development, it's nothing to worry about. You should get a final log entry with celery@hostname ready.
-
Celery Beat
Celery Beat is the task scheduler that's built into Celery. It's what lets you schedule your export jobs to run at certain times. You generally won't have Celery Beat running in a development environment, so for now we just want to test to make sure it will start up.
With Celery still running, open another terminal.
    cd <project_root>/django/sierra
    celery -A sierra beat -S djcelery.schedulers.DatabaseScheduler
You should see a brief summary of your Celery configuration, and then a couple of INFO log entries showing that Celery Beat has started.
Production Note: See the Celery documentation for how to set up periodic tasks. In our production environment, we use the DatabaseScheduler and store periodic-task definitions in the Django database. These are then editable in the Django Admin interface.
Once you've confirmed each of the above processes runs, then you can stop them. (Ctrl-C in each of the running terminals.) From now on, you can just start/stop them from the provided shell scripts.
Convenience Scripts
-
start_servers.sh -- Starts Solr, Redis, and Django on the ports you've specified in your environment variables as background processes. Optionally, you can issue an argument, django or solr or redis, to run one of those as a foreground process (and direct output for that process to stdout). Often, in development, start_servers.sh django can be useful so that you get Django web server logs output to the console. Solr output and Redis output are often not as immediately useful. Solr output will still be logged to a file in <project_root>/solr/instances/logs, and Redis output will be logged based on how you've configured your Redis instance.
-
stop_servers.sh -- Stops Solr, Redis, and Django (if they're currently running).
-
start_celery.sh -- Starts Celery as a foreground process. Often, in development, you'll want Celery logged to the console so you can keep an eye on output. (Use CTRL-C to stop Celery.)
Production Notes: Daemonizing Processes for a Production Environment
In a production environment, you'll want to have all of these servers and processes daemonized using init.d scripts.
-
Redis ships with usable init scripts. See the quickstart guide for more info.
-
For both Celery and Celery Beat, there are example init scripts available, although you'll have to edit some variables. See the Celery documentation for details.
-
For Solr, you'll need to create your init.d file yourself, but there are a number of tutorials available on the Web. The Solr instance provided with this project uses a straightforward multi-core setup, so the init.d file should be straightforward.
-
-
Test record exports.
If everything up to this point has worked, then let's try triggering a few record exports and then making sure data shows up in the API.
-
Start up your servers and Celery.
-
Go to http://localhost:DJANGO_PORT/admin/export/ in a web browser (using the appropriate hostname and port).
-
Log in using the superuser username and password you set up in step 10.
-
Under the heading Manage Export Jobs, click Trigger New Export.
-
First thing we want to do is export administrative metadata (like Location codes, ITYPEs, and Item Statuses).
- Run this Export: "Load ALL III administrative metadata-type data into Solr."
- Filter Data By: "None (Full Export)"
- Click Go.
- You'll see some activity in the Celery log, and the export should be done within a second or two. Refresh your browser and you should see a Status of Successful.
-
Next, try exporting one or a few bib records and any attached items.
- Run this Export: "Load bibs and attached records into Solr."
- Filter Data By: "Record Range (by record number)."
- Enter a small range of bib record IDs in the From and To fields. Be sure to omit the dot and check digit. E.g., from b4371440 to b4371450.
- Click Go.
- You'll see activity in the Celery log, and the export should complete within a couple of seconds. Refresh your browser and you should see a status of Successful.
-
Finally, try viewing the data you exported in the API.
- Go to http://localhost:DJANGO_PORT/api/v1/ in your browser.
- Click the URL for the bibs resource, and make sure you see data for the bib records you loaded.
- Navigate the various related resources in _links, such as marc and items. Anything that's linked should take you to the data for that resource.
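If you would rather follow the _links entries from a script, the sketch below collects the href values from a HAL-style resource. In HAL, each entry under _links is either a single link object or a list of link objects, each with an href. The function name link_hrefs and the sample resource are hypothetical; the real bibs resource on your instance will have its own fields and URLs.

```python
def link_hrefs(resource):
    """Map each link relation in a HAL resource to a list of its hrefs."""
    hrefs = {}
    for rel, value in resource.get("_links", {}).items():
        # A relation may hold one link object or a list of them.
        links = value if isinstance(value, list) else [value]
        hrefs[rel] = [link["href"] for link in links if "href" in link]
    return hrefs


# Hypothetical example data, shaped like a HAL resource.
sample_bib = {
    "id": "b4371440",
    "_links": {
        "self": {"href": "http://localhost:8000/api/v1/bibs/b4371440"},
        "items": [
            {"href": "http://localhost:8000/api/v1/items/i1"},
            {"href": "http://localhost:8000/api/v1/items/i2"},
        ],
    },
}

if __name__ == "__main__":
    for rel, urls in sorted(link_hrefs(sample_bib).items()):
        print(rel, urls)
```

Each URL returned this way can be requested in turn to confirm that linked resources such as marc and items resolve.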
-
See LICENSE.txt.