OK-Data-Exchange/oscn

Rails API-only backend for scraping OSCN data and building a Postgres database from it. Leverages the oscn_scraper gem to scrape the data and return it as JSON.

Getting started

TODO - Create sample .env file, Instructions for booting up

Configurations

You can configure the following ENV variables:

COUNTIES

Comma-separated string of county names.

Example

COUNTIES=Tulsa,Oklahoma

CASE_TYPES_ABBREVIATION

Comma-separated string of case type abbreviations.

Example

CASE_TYPES_ABBREVIATION=CF,CM,TR,TRI,AM,CPC,DTR # default
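As an illustration of how comma-separated values like these can be read, here is a minimal sketch. The helper name `parse_list` and the constant `DEFAULT_CASE_TYPES` are hypothetical and not taken from this repository; only the variable names and the default list come from the text above.

```ruby
# Hypothetical helper: split a comma-separated ENV value into a clean array.
# Surrounding whitespace and empty entries are dropped.
def parse_list(value, default: [])
  return default if value.nil? || value.strip.empty?

  value.split(",").map(&:strip).reject(&:empty?)
end

# Default list as documented above.
DEFAULT_CASE_TYPES = %w[CF CM TR TRI AM CPC DTR].freeze

counties   = parse_list(ENV["COUNTIES"])
case_types = parse_list(ENV["CASE_TYPES_ABBREVIATION"], default: DEFAULT_CASE_TYPES)

puts "counties: #{counties.inspect}"
puts "case types: #{case_types.inspect}"
```

Falling back to the default only when the variable is unset or blank keeps an explicit value, even a single abbreviation, from being silently overridden.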

OSCN_THROTTLE

Number of requests to send to OSCN per minute

OSCN_THROTTLE=120

OSCN_CONCURRENCY

Number of threads to run concurrently

OSCN_CONCURRENCY=10 # default
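To show how a per-minute request limit and a thread count can interact, here is a rough sketch using only the Ruby standard library. It is not the scheduler this application uses; `request_interval` and `run_throttled` are hypothetical names, and the jobs are plain lambdas standing in for scrape requests.

```ruby
# Seconds between request starts for a given per-minute limit.
def request_interval(requests_per_minute)
  60.0 / requests_per_minute
end

# Hypothetical worker pool: `concurrency` threads drain a shared queue, and a
# mutex hands out start slots so request starts stay `interval` seconds apart.
def run_throttled(jobs, requests_per_minute:, concurrency:)
  interval = request_interval(requests_per_minute)
  queue = Queue.new
  jobs.each { |job| queue << job }
  queue.close # pop returns nil once the queue is empty

  results = Queue.new
  lock = Mutex.new
  next_slot = Process.clock_gettime(Process::CLOCK_MONOTONIC)

  workers = Array.new(concurrency) do
    Thread.new do
      while (job = queue.pop)
        wait = lock.synchronize do
          now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
          slot = [next_slot, now].max
          next_slot = slot + interval
          slot - now
        end
        sleep(wait) if wait.positive?
        results << job.call
      end
    end
  end
  workers.each(&:join)

  Array.new(results.size) { results.pop }
end
```

With `OSCN_THROTTLE=120`, `request_interval(120)` is 0.5 seconds, so adding threads beyond what is needed to keep one request starting every half second does not increase throughput.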

TODO ENVs

MAX_REQUESTS, MEDIUM_PRIORITY, LOW_PRIORITY, DAYS_AGO, DAYS_AHEAD

Scraping Methodology

High Priority Cases - Any case that has appeared on the docket in the past 7 days will be scraped nightly.

Medium Priority Cases - Any open case (closed_on = nil). Scrapes the oldest first.

Low Priority Cases - Closed cases that likely will not be updated as often.
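The three tiers can be read as a simple classification rule. The sketch below is one interpretation of the descriptions above, not the application's actual query logic. `closed_on` is the field named above; `last_docket_on` and `scrape_priority` are hypothetical names.

```ruby
require "date"

HIGH_PRIORITY_WINDOW_DAYS = 7

# Classify a case into :high, :medium, or :low.
# last_docket_on (hypothetical field): most recent date the case was on the docket.
# closed_on: nil while the case is still open.
def scrape_priority(last_docket_on:, closed_on:, today: Date.today)
  recently_docketed = !last_docket_on.nil? &&
                      last_docket_on <= today &&
                      last_docket_on >= today - HIGH_PRIORITY_WINDOW_DAYS

  if recently_docketed
    :high
  elsif closed_on.nil?
    :medium
  else
    :low
  end
end
```

In this reading, a recent docket appearance takes precedence over open or closed status, since the high-priority rule applies to "any case".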

Manual Scraping/Imports

DOC

  1. Find the date to use for the run by downloading the file to your local machine (see the quarterly_data.rb importer for the location) and looking at the bottom of the sentence extract file for the maximum sentencing date. It will be in the format 20250402 and may run into the case number (e.g., 20250402CF-2021-4596). Use that year and month for the folder name.
  2. Run rake "doc:scrape['2025-04']" (replace 2025-04 with the year and month from the previous step).
  3. If there are any validation failures, update the code to address them.
  4. If there are no failures, run the import command. For best results, run it in detached mode on a scaled Heroku dyno, e.g., heroku run:detached -a oscn --size=performance-l rake "doc:import['2025-04']" (replacing 2025-04 again). Use the command provided to tail the logs for monitoring.
  5. Run rake "doc:link" to link the imported data to other counties.
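For step 1, the folder name can be derived from the raw value mechanically. The sketch below assumes only what is stated above: the value begins with a date in YYYYMMDD form and may run straight into a case number. The helper name `folder_name_for` is hypothetical.

```ruby
require "date"

# Turn a raw value such as "20250402CF-2021-4596" into the folder name "2025-04".
# Only the leading eight digits are used; anything after them is ignored.
def folder_name_for(raw)
  match = raw.match(/\A(\d{4})(\d{2})(\d{2})/)
  raise ArgumentError, "no leading YYYYMMDD date in #{raw.inspect}" unless match

  year, month, day = match.captures.map(&:to_i)
  # Date.new raises on impossible dates such as month 13.
  Date.new(year, month, day).strftime("%Y-%m")
end

puts folder_name_for("20250402CF-2021-4596")
```

Validating the digits as a real date catches cases where the value read from the file is not actually a sentencing date.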

Roster Tables

Entity resolution is accomplished via the Roster tables. These are all generated via Postgres materialized views and then connected to Rails models. Do not use any ids from these views, as they can change. The views are stacked for legibility. To understand the order of creation and see the views in use, see scheduler.rb. To see the current state of the views themselves, the safest approach is to inspect the database directly. It's possible for there to be multiple parties and dlms for the same person, which we attempt to merge using the approach described at https://dba.stackexchange.com/questions/157715/grouping-on-any-one-of-multiple-columns-in-postgres
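The merge problem described here, grouping records that share any one of several identifying values, is a connected-components problem. The application does this in SQL inside the materialized views. The sketch below only illustrates the idea in Ruby with a union-find structure; the row shape and the key strings are made-up examples, not the real schema.

```ruby
# Minimal union-find (disjoint set) over arbitrary hashable nodes.
class DisjointSet
  def initialize
    @parent = {}
  end

  def find(node)
    @parent[node] = node unless @parent.key?(node)
    root = node
    root = @parent[root] until @parent[root] == root
    # Path compression: point every node on the walk directly at the root.
    until @parent[node] == root
      nxt = @parent[node]
      @parent[node] = root
      node = nxt
    end
    root
  end

  def union(a, b)
    root_a = find(a)
    root_b = find(b)
    @parent[root_a] = root_b unless root_a == root_b
  end
end

# rows: [{ id: 1, keys: ["..."] }, ...]; rows sharing ANY key land in one group.
def group_by_any_shared_key(rows)
  set = DisjointSet.new
  rows.each do |row|
    set.find([:row, row[:id]]) # register rows that have no keys at all
    row[:keys].each { |key| set.union([:row, row[:id]], [:key, key]) }
  end
  rows.group_by { |row| set.find([:row, row[:id]]) }
      .values
      .map { |group| group.map { |row| row[:id] }.sort }
end
```

Rows 1 and 3 below share no key directly, but both share one with row 2, so all three end up in the same group:

```ruby
rows = [
  { id: 1, keys: ["license:A1"] },
  { id: 2, keys: ["license:A1", "name_dob:SMITH|1980-01-01"] },
  { id: 3, keys: ["name_dob:SMITH|1980-01-01"] },
  { id: 4, keys: ["license:Z9"] }
]
p group_by_any_shared_key(rows)
```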

Users

User Creation

See the section "Running rails console on EB", then run:

emails = ["developer@9bcorp.com"] # update this
emails.each do |email|
  # Generate a random initial password for the new user
  pass = SecureRandom.urlsafe_base64
  user = User.new(email: email, password: pass, password_confirmation: pass)
  user.otp_required_for_login = true
  user.otp_secret = User.generate_otp_secret # provide this to them
  # save! raises on validation failure, so credentials print only for saved users
  user.save!
  puts "email: #{email}"
  puts "pass: #{pass}"
  puts "one time code: #{user.otp_secret}"
end

The one-time code is the secret the user enters to link a multi-factor authentication app.
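Many authenticator apps also accept the secret as an otpauth:// link, often rendered as a QR code, instead of typing it in. The sketch below builds such a link by hand in the widely used Key URI layout. It assumes the app uses time-based codes (TOTP); the helper name `otp_provisioning_uri` and the issuer value are hypothetical and not part of this application.

```ruby
require "erb"

# Build an otpauth:// link in the common Key URI layout:
#   otpauth://totp/<issuer>:<account>?secret=<secret>&issuer=<issuer>
# Assumes TOTP; the issuer is a display name chosen by the caller.
def otp_provisioning_uri(email:, secret:, issuer:)
  label = ERB::Util.url_encode("#{issuer}:#{email}")
  query = "secret=#{ERB::Util.url_encode(secret)}&issuer=#{ERB::Util.url_encode(issuer)}"
  "otpauth://totp/#{label}?#{query}"
end

# Example with a made-up secret and issuer:
puts otp_provisioning_uri(email: "dev@example.com", secret: "JBSWY3DPEHPK3PXP", issuer: "OSCN")
```

Treat the resulting link like the secret itself: anyone who has it can generate valid codes for that account.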

Elastic beanstalk

This application has partial support for Elastic Beanstalk.

Running rails console on EB

  1. To connect, use eb ssh
  2. To run the Rails console, log in as sudo first: sudo su -
  3. Change to the Rails directory: cd /var/app/current
  4. Then run the usual: bundle exec rails c
