Rails API-only backend for scraping OSCN data and building a Postgres database from it. Leverages the oscn_scraper gem to scrape the data and return it as JSON.
TODO - Create sample .env file, Instructions for booting up
You can configure the following ENV variables:
Comma-separated string of county names.
Example
COUNTIES=Tulsa,Oklahoma
Comma-separated string of case type abbreviations:
Example
CASE_TYPES_ABBREVIATION=CF,CM,TR,TRI,AM,CPC,DTR # default
Number of requests to send to OSCN per minute
OSCN_THROTTLE=120
Number of threads to run concurrently
OSCN_CONCURRENCY=120 # default
MAX_REQUESTS, MEDIUM_PRIORITY, LOW_PRIORITY, DAYS_AGO, DAYS_AHEAD
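As an illustration of how the list and numeric settings above might be read, here is a minimal sketch. `ScraperConfig` and its methods are hypothetical names, not part of this application or of oscn_scraper. The fallback of 120 for `OSCN_THROTTLE` is an assumption taken from the example value, since only the `CASE_TYPES_ABBREVIATION` and `OSCN_CONCURRENCY` values are marked as defaults.

```ruby
# Hypothetical helper: how the application itself reads these variables is not shown here.
module ScraperConfig
  DEFAULT_CASE_TYPES = %w[CF CM TR TRI AM CPC DTR].freeze

  # Split a comma-separated value into a clean array, tolerating stray spaces.
  def self.list(env, key, default = [])
    raw = env[key]
    return default if raw.nil? || raw.strip.empty?

    raw.split(",").map(&:strip).reject(&:empty?)
  end

  # Read an integer setting, falling back to a default when the variable is unset.
  def self.integer(env, key, default)
    Integer(env.fetch(key, default.to_s), 10)
  end

  # Accepts ENV or any Hash with string keys, which keeps it easy to test.
  def self.load(env = ENV)
    {
      counties: list(env, "COUNTIES"),
      case_types: list(env, "CASE_TYPES_ABBREVIATION", DEFAULT_CASE_TYPES),
      throttle: integer(env, "OSCN_THROTTLE", 120), # assumed fallback
      concurrency: integer(env, "OSCN_CONCURRENCY", 120)
    }
  end
end

config = ScraperConfig.load("COUNTIES" => "Tulsa, Oklahoma")
puts config[:counties].inspect
```

Passing a plain Hash instead of `ENV` makes it possible to check the parsing without touching the real environment.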
High Priority Cases - Any case that has appeared on the docket in the past 7 days will be scraped nightly.
Medium Priority Cases - Any open case (closed_on = nil). Scrapes the oldest first.
Low Priority Cases - Closed cases that likely will not be updated as often.
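The three tiers above can be read as a simple decision rule. The sketch below is one way to express it; `CaseRecord`, `last_docket_on`, and `scrape_priority` are hypothetical names used only for illustration (only `closed_on` appears in the description above), and the real scheduler may decide this differently, for example in SQL.

```ruby
require "date"

# Hypothetical stand-in for a court case record.
CaseRecord = Struct.new(:last_docket_on, :closed_on, keyword_init: true)

# Returns :high, :medium, or :low following the tiers described above.
def scrape_priority(record, today: Date.today, window_days: 7)
  docket = record.last_docket_on
  on_recent_docket = !docket.nil? && docket >= today - window_days && docket <= today

  return :high if on_recent_docket       # on the docket in the past 7 days
  return :medium if record.closed_on.nil? # still open

  :low                                    # closed and not recently on the docket
end

open_case = CaseRecord.new(last_docket_on: Date.new(2024, 1, 1), closed_on: nil)
puts scrape_priority(open_case, today: Date.new(2025, 4, 10))
```

Within the medium tier the text says the oldest cases are scraped first, which would be an ordering on top of this classification rather than part of it.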
- Find the date to use for the run by downloading the file to your local machine (see the quarterly_data.rb importer for the location) and looking at the bottom of the sentence extract file for the maximum sentencing date. It will be in the format 20250402 and may run into the case number (e.g., 20250402CF-2021-4596). Use that year and month for the folder name.
- Run rake "doc:scrape['2025-04']" (replace 2025-04 with the folder name from the previous step). If there are any validation failures, update the code to address them.
- If there are no failures, run the import command.
For best results, run this in detached mode on a scaled Heroku dyno, e.g.,
heroku run:detached -a oscn --size=performance-l rake "doc:import['2025-04']" (replacing 2025-04 again). Use the logs command that Heroku prints to tail the logs for monitoring.
- Run rake "doc:link" to link the imported data to other counties.
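The folder-name derivation in the first step can be sketched as follows. `folder_name_for` is a hypothetical helper, not part of the importer. It assumes you have already located the token holding the maximum sentencing date, and it only handles the case described above, where the 8-digit date may run directly into the case number.

```ruby
require "date"

# Take the leading YYYYMMDD date from a token such as "20250402CF-2021-4596"
# and return the "YYYY-MM" folder name. Raises ArgumentError on malformed input.
def folder_name_for(token)
  match = token.match(/\A(\d{4})(\d{2})(\d{2})/)
  raise ArgumentError, "no leading YYYYMMDD date in #{token.inspect}" unless match

  # Date.new rejects impossible dates (e.g. month 13), so this also validates.
  date = Date.new(match[1].to_i, match[2].to_i, match[3].to_i)
  date.strftime("%Y-%m")
end

puts folder_name_for("20250402CF-2021-4596")
```

The returned string is what would be passed as the argument to the doc:scrape and doc:import tasks.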
Entity resolution is accomplished via the Roster tables. These are all generated via Postgres materialized views and then connected to Rails models. Do not use any ids from these, as they can change. The views are stacked for legibility. To understand the order of creation and see the views in use, see scheduler.rb. To see the current state of the views themselves, the safest place is the database itself.
It's possible for there to be multiple parties and dlms for the same person, which we attempt to merge using the approach described at https://dba.stackexchange.com/questions/157715/grouping-on-any-one-of-multiple-columns-in-postgres
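To make the merging idea concrete: grouping on any one of multiple columns amounts to finding connected components, where two rows belong together if they share a value in at least one column. The application does this in Postgres; the sketch below is an in-memory union-find in Ruby instead, shown only to illustrate the logic. The `party_id` and `dlm` column names and the `group_by_any` helper are hypothetical.

```ruby
# Group records that share a non-nil value in ANY of the given columns.
def group_by_any(records, columns)
  parent = (0...records.size).to_a

  # Find the representative index of a record's group (with path halving).
  find = lambda do |i|
    while parent[i] != i
      parent[i] = parent[parent[i]]
      i = parent[i]
    end
    i
  end

  columns.each do |column|
    first_seen = {} # value => index of the first record carrying it
    records.each_with_index do |record, index|
      value = record[column]
      next if value.nil?

      if first_seen.key?(value)
        # Same value seen before: merge the two groups.
        parent[find.call(index)] = find.call(first_seen[value])
      else
        first_seen[value] = index
      end
    end
  end

  records.each_index
         .group_by { |i| find.call(i) }
         .values
         .map { |indexes| indexes.map { |i| records[i] } }
end

rows = [
  { id: 1, party_id: "p1", dlm: "d1" },
  { id: 2, party_id: "p1", dlm: "d2" },
  { id: 3, party_id: "p2", dlm: "d2" },
  { id: 4, party_id: "p3", dlm: nil }
]
group_by_any(rows, %i[party_id dlm]).each { |group| puts group.map { |r| r[:id] }.inspect }
```

Rows 1 and 2 share a party, and rows 2 and 3 share a dlm, so all three end up in one group even though rows 1 and 3 have no column in common. This transitive behaviour is also why ids derived from such groups can change when new data arrives.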
See the section "Running rails console" then run:
emails = ["developer@9bcorp.com"] # update this
emails.each do |email|
pass = SecureRandom.urlsafe_base64
user = User.new({email: email, password: pass, password_confirmation: pass})
user.otp_required_for_login = true
user.otp_secret = User.generate_otp_secret # provide this to them
puts "email: #{email}"
puts "pass: #{pass}"
puts "one time code: #{user.otp_secret}"
user.save!
end
The one-time code is the secret they use to link up a multi-factor auth app.
This application has partial support for Elastic Beanstalk.
To connect, use eb ssh
To run the Rails console, first switch to root: sudo su -
Change to the Rails directory: cd /var/app/current
Then run the usual: bundle exec rails c