diff --git a/.gitignore b/.gitignore
index 76fb9c4..a3bd71e 100644
--- a/.gitignore
+++ b/.gitignore
@@ -48,3 +48,5 @@ bower.json
 .byebug_history
 .DS_Store
 
+
+/config/master.key
diff --git a/.ruby-gemset b/.ruby-gemset
index eedd89b..fcf5595 100644
--- a/.ruby-gemset
+++ b/.ruby-gemset
@@ -1 +1 @@
-api
+api-v2
diff --git a/.ruby-version b/.ruby-version
index 7bde84d..434c481 100644
--- a/.ruby-version
+++ b/.ruby-version
@@ -1 +1 @@
-ruby-3.1.2
+ruby-3.1.7
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 25426e2..682ba26 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -27,84 +27,120 @@ Markdown Spec](https://github.github.com/gfm/).
 ### Contributors
 -->
 
-## [2.0.0] - new nested bucket aggregation/query functionality for Habeas release
-[Unreleased]: https://github.com/CDRH/api/compare/v1.0.4...dev
+
+## [v2.0.0] - Nested bucket aggregation/query functionality
+[v2.0.0]: https://github.com/CDRH/api/compare/v1.0.4...v2.0.0
 ### Added
-- "api_version" added to all response "res" objects
-- support for elasticsearch 8.5
-- user/password basic authentication with ES 8.5, when querying the index or posting from Datura
-- better support for nested fields
-- support for nested bucket aggregations, matching a nested value on another nested value. `person.name[person.role#judge]` will return all names of persons where role="judge".
-- "api_version" added to all response "res" objects
+
+- `api_version` added to all response `res` objects
+- Support for Elasticsearch 8.5+
+- User/password basic authentication support when credentials are present
+- Better support for nested fields
+- Support for nested bucket aggregations, matching a nested value on another
+  nested value. For example, `person.name[person.role#judge]` will return
+  all names of persons where `role="judge"`
+- Updated documentation for new features
+- `track_total_hits` option added to ES queries, to return counts of search
+  results higher than 10,000
+
+### Changed
+
+- Gemset changed to `api-v2`
+- Changes reflect new api schemas in Datura, which make heavy use of nested fields
 - Added support for aggregating buckets by normalized keyword and returning
-  the "top_hits" first document result for a non-normalized display
-- Changes response format of `facets` key
-
+  the `top_hits` first document result for a non-normalized display. Internal logic has changed because of nested fields; this may cause subtle differences in how facet labels are displayed
+- Changes response format of `facets` key. Not only is the response format
+  itself different, but there may be fewer facets returned since matching
+  normalized values are combined
+
   From:
-  ```
+
+  ```json
   "facets": {
     "WILLA CATHER": 10,
     "Willa Cather": 50
   }
   ```
+
   To:
-  ```
+
+  ```json
   "facets": {
     "willa cather": {
       "num" : 60,
       source: "Willa Cather"
     }
   }
   ```
-  Not only is the response format itself different, but there may be fewer
-  facets returned since normalized values which match are combined
-### Changed
-- upgraded to Rails 6.1.7 and Ruby 3
-- changes reflect new api schemas in Datura, which make heavy use of nested fields
 ### Migration
-- in Datura repos config `private.yml` api to `"api_version": "2.0"` to take advantage of new bucket aggregation functionality (or `"api_version": "1.0"` for legacy repos that have not been updated for the new schema). Please note that a running API index can only use one ES index at a time, and each ES index is restricted to one version of the schema.
-- Use Elasticsearch 8.5 or later
-- If you are using ES with security enabled, you must configure credentials with Rails in the API repo. See https://guides.rubyonrails.org/v6.1/security.html. Configure the VSCode editor. Run `EDITOR="code --wait" rails credentials:edit` and add
+
+- Add nested facets as described above, if desired
+- Orchid apps that connect to the API should use `facet_limit` instead of `facet_num` in options
+- In the config files of your Datura repos (`private.yml` or `public.yml`), set
+  the api to `"api_version": "2.0"` to take advantage of new bucket aggregation
+  functionality (or `"api_version": "1.0"` for legacy repos that have not been
+  updated for the new schema). Please note that a running API index can only use
+  one ES index at a time, and each ES index is restricted to one version of the
+  schema. See [new schema (2.0)
+  documentation](https://github.com/CDRH/datura/docs/schema_v2.md).
+- Connect to Elasticsearch 8.5 or later
+- If you are using ES with security enabled, you must configure credentials
+  with Rails in the API repo. See
+  https://guides.rubyonrails.org/v6.1/security.html. To edit credentials with
+  VSCode, run `EDITOR="code --wait" rails credentials:edit`, add the block
+  below to the secrets file, then close the window to save.
+  Do not commit `config/master.key` (it should be in `.gitignore`)
+
 ```
 elasticsearch:
   user: username
   password: *****
 ```
-to the secrets file and then close the window to save. Do not commit `config/master.key` (it should be in `gitignore`)
-- Orchid apps that connect to the API should use `facet_limit` instead of `facet_num` in options.
-- Add nested facets as described above, if desired
-- Orchid apps should update the local version of `config/initializers/config.rb` to match latest version of Orchid (see `lib/generators/templates/config.rb` and in particular the `get_facets` method). This is required if using nested bucket aggregation functionality.
 
-## [v1.0.4](https://github.com/CDRH/api/compare/v1.0....v1.0.4) - Updates & license
+## [v1.0.5] - API v1 on Ruby 3.1.6, Rails 6.1.7
+[v1.0.5]: https://github.com/CDRH/api/compare/v1.0.4...v1.0.5
 
 ### Changed
+- Ruby 3.1.6
+- Rails 6.1.7
+
+## [v1.0.4](https://github.com/CDRH/api/compare/v1.0.3...v1.0.4) - Updates & license
+
+### Changed
+
 - Updated Ruby version, gems (which addresses mimemagic dependency problem), and
-license added
+  license added
 
 ### Added
+
 - Documentation on facets and highlighting
 
 ## [v1.0.3](https://github.com/CDRH/api/compare/v1.0.2...v1.0.3) - gem updates
 
 ### Changed
+
 - updates to rails and other gems
 
 ## [v1.0.2](https://github.com/CDRH/api/compare/v1.0.1...v1.0.2) - escapes and sorting
 
 ### Fixed
+
 - question mark and asterisk behavior in queries
 - order of expected, actual in tests
 - sort behavior for relevancy
 
 ### Added
+
 - support for multivalued and nested field sorting
 - documentation moved back into apium from henbit location in order to version it with software
 
 ### Changed
+
 - ruby, rails, and other gem versions
 
 ## [v1.0.1](https://github.com/CDRH/api/compare/v1.00...v1.0.1) - version 1.0.1
 
 ### Changed
+
 - ruby, rails, and other gem versions
 - version moved to initializer
@@ -113,4 +149,3 @@ license added
 ### Contributors
 
 - Jessica Dussault (jduss4)
-
diff --git a/Gemfile b/Gemfile
index 6e5dcda..1f15323 100644
--- a/Gemfile
+++ b/Gemfile
@@ -1,44 +1,34 @@
 source 'https://rubygems.org'
-
-git_source(:github) do |repo_name|
-  repo_name = "#{repo_name}/#{repo_name}" unless repo_name.include?("/")
-  "https://github.com/#{repo_name}.git"
-end
-
+git_source(:github) { |repo| "https://github.com/#{repo}.git" }
 
 # Bundle edge Rails instead: gem 'rails', github: 'rails/rails'
 gem 'rails', '~> 6.1.7'
 # Use sqlite3 as the database for Active Record
 gem 'sqlite3', '~> 1.4'
 # Use Puma as the app server
-gem 'puma', '~> 5.0'
-# Transpile app-like JavaScript. Read more: https://github.com/rails/webpacker
-# gem 'webpacker', '~> 5.0'
-# Turbolinks makes navigating your web application faster. Read more: https://github.com/turbolinks/turbolinks
-# gem 'turbolinks', '~> 5'
+gem 'puma', '>= 5.0'
 # Build JSON APIs with ease. Read more: https://github.com/rails/jbuilder
-gem 'jbuilder', '~> 2.7'
+# gem 'jbuilder', '~> 2.7'
 # Use Redis adapter to run Action Cable in production
 # gem 'redis', '~> 4.0'
-# Use Active Model has_secure_password
+# Use ActiveModel has_secure_password
 # gem 'bcrypt', '~> 3.1.7'
 
-# Use Active Storage variant
+# Use ActiveStorage variant
 # gem 'image_processing', '~> 1.2'
 
+# Reduces boot times through caching; required in config/boot.rb
 gem 'bootsnap', '>= 1.4.4', require: false
 
+# Use Rack CORS for handling Cross-Origin Resource Sharing (CORS), making cross-origin AJAX possible
+# gem 'rack-cors'
+
 group :development, :test do
   # Call 'byebug' anywhere in the code to stop execution and get a debugger console
   gem 'byebug', platforms: [:mri, :mingw, :x64_mingw]
 end
 
 group :development do
-  # Access an interactive console on exception pages or by calling 'console' anywhere in the code.
-  gem 'web-console', '>= 4.1.0'
-  # Display performance information such as SQL time and flame graphs for each request in your browser.
-  # Can be configured to work on production as well see: https://github.com/MiniProfiler/rack-mini-profiler/blob/master/README.md
-  gem 'rack-mini-profiler', '~> 2.0'
   gem 'listen', '~> 3.3'
   # Spring speeds up development by keeping your application running in the background. Read more: https://github.com/rails/spring
   gem 'spring'
@@ -47,6 +37,11 @@ end
 # Windows does not include zoneinfo files, so bundle the tzinfo-data gem
 gem 'tzinfo-data', platforms: [:mingw, :mswin, :x64_mingw, :jruby]
 
+# Additions to Rails defaults
+
+# Note: Above list different from other Rails apps
+# because this app was created with --api option
+
 # using rest-client because I've had far more luck than the
 # stlib net/http
-gem 'rest-client', '>= 2.1.0.rc1', '< 2.2'
+gem 'rest-client', '~> 2.1'
diff --git a/Gemfile.lock b/Gemfile.lock
index 8d0b7f3..db17c93 100644
--- a/Gemfile.lock
+++ b/Gemfile.lock
@@ -60,70 +60,63 @@ GEM
       minitest (>= 5.1)
       tzinfo (~> 2.0)
       zeitwerk (~> 2.3)
-    bindex (0.8.1)
-    bootsnap (1.18.4)
+    base64 (0.3.0)
+    bootsnap (1.19.0)
       msgpack (~> 1.2)
     builder (3.3.0)
-    byebug (11.1.3)
-    concurrent-ruby (1.3.4)
+    byebug (12.0.0)
+    concurrent-ruby (1.3.6)
     crass (1.0.6)
-    date (3.4.1)
+    date (3.5.1)
     domain_name (0.6.20240107)
     erubi (1.13.1)
-    ffi (1.17.1-arm64-darwin)
-    ffi (1.17.1-x86_64-darwin)
-    globalid (1.2.1)
+    ffi (1.17.2-x86_64-linux-gnu)
+    globalid (1.3.0)
       activesupport (>= 6.1)
     http-accept (1.7.0)
-    http-cookie (1.0.8)
+    http-cookie (1.1.0)
       domain_name (~> 0.5)
-    i18n (1.14.6)
+    i18n (1.14.7)
       concurrent-ruby (~> 1.0)
-    jbuilder (2.13.0)
-      actionview (>= 5.0.0)
-      activesupport (>= 5.0.0)
     listen (3.9.0)
       rb-fsevent (~> 0.10, >= 0.10.3)
       rb-inotify (~> 0.9, >= 0.9.10)
-    logger (1.6.4)
-    loofah (2.24.0)
+    logger (1.7.0)
+    loofah (2.25.0)
       crass (~> 1.0.2)
       nokogiri (>= 1.12.0)
-    mail (2.8.1)
+    mail (2.9.0)
+      logger
       mini_mime (>= 0.1.1)
       net-imap
       net-pop
       net-smtp
-    marcel (1.0.4)
+    marcel (1.1.0)
     method_source (1.1.0)
-    mime-types (3.6.0)
+    mime-types (3.7.0)
       logger
-      mime-types-data (~> 3.2015)
-    mime-types-data (3.2024.1203)
+      mime-types-data (~> 3.2025, >= 3.2025.0507)
+    mime-types-data (3.2025.0924)
     mini_mime (1.1.5)
-    minitest (5.25.4)
-    msgpack (1.7.5)
-    net-imap (0.5.4)
+    minitest (5.27.0)
+    msgpack (1.8.0)
+    net-imap (0.5.13)
       date
       net-protocol
     net-pop (0.1.2)
       net-protocol
     net-protocol (0.2.2)
       timeout
-    net-smtp (0.5.0)
+    net-smtp (0.5.1)
       net-protocol
     netrc (0.11.0)
-    nio4r (2.7.4)
-    nokogiri (1.18.1-arm64-darwin)
+    nio4r (2.7.5)
+    nokogiri (1.18.10-x86_64-linux-gnu)
       racc (~> 1.4)
-    nokogiri (1.18.1-x86_64-darwin)
-      racc (~> 1.4)
-    puma (5.6.9)
+    puma (7.1.0)
       nio4r (~> 2.0)
     racc (1.8.1)
-    rack (2.2.10)
-    rack-mini-profiler (2.3.4)
-      rack (>= 1.2.0)
+    rack (2.2.21)
     rack-test (2.2.0)
       rack (>= 1.3)
     rails (6.1.7.10)
@@ -141,7 +134,7 @@ GEM
       bundler (>= 1.15.0)
       railties (= 6.1.7.10)
       sprockets-rails (>= 2.0.0)
-    rails-dom-testing (2.2.0)
+    rails-dom-testing (2.3.0)
       activesupport (>= 5.0.0)
       minitest
       nokogiri (>= 1.6)
@@ -154,7 +147,7 @@ GEM
       method_source
       rake (>= 12.2)
       thor (~> 1.0)
-    rake (13.2.1)
+    rake (13.3.1)
     rb-fsevent (0.11.2)
     rb-inotify (0.11.1)
       ffi (~> 1.0)
@@ -163,47 +156,39 @@ GEM
       http-cookie (>= 1.0.2, < 2.0)
       mime-types (>= 1.16, < 4.0)
       netrc (~> 0.8)
-    spring (4.2.1)
-    sprockets (4.2.1)
+    spring (4.4.0)
+    sprockets (4.2.2)
       concurrent-ruby (~> 1.0)
+      logger
       rack (>= 2.2.4, < 4)
     sprockets-rails (3.5.2)
       actionpack (>= 6.1)
       activesupport (>= 6.1)
       sprockets (>= 3.0.0)
-    sqlite3 (1.7.3-arm64-darwin)
-    sqlite3 (1.7.3-x86_64-darwin)
-    thor (1.3.2)
-    timeout (0.4.3)
+    sqlite3 (1.7.3-x86_64-linux)
+    thor (1.4.0)
+    timeout (0.6.0)
     tzinfo (2.0.6)
       concurrent-ruby (~> 1.0)
-    web-console (4.2.1)
-      actionview (>= 6.0.0)
-      activemodel (>= 6.0.0)
-      bindex (>= 0.4.0)
-      railties (>= 6.0.0)
-    websocket-driver (0.7.6)
+    websocket-driver (0.8.0)
+      base64
       websocket-extensions (>= 0.1.0)
     websocket-extensions (0.1.5)
     zeitwerk (2.6.18)
 
 PLATFORMS
-  arm64-darwin-23
-  x86_64-darwin-21
+  x86_64-linux
 
 DEPENDENCIES
   bootsnap (>= 1.4.4)
   byebug
-  jbuilder (~> 2.7)
   listen (~> 3.3)
-  puma (~> 5.0)
-  rack-mini-profiler (~> 2.0)
+  puma (>= 5.0)
   rails (~> 6.1.7)
-  rest-client (>= 2.1.0.rc1, < 2.2)
+  rest-client (~> 2.1)
   spring
   sqlite3 (~> 1.4)
   tzinfo-data
-  web-console (>= 4.1.0)
 
 BUNDLED WITH
-   2.3.7
+   2.3.27
diff --git a/app/controllers/application_controller.rb b/app/controllers/application_controller.rb
index 15ff0d8..4ac8823 100644
--- a/app/controllers/application_controller.rb
+++ b/app/controllers/application_controller.rb
@@ -1,35 +1,2 @@
-require 'rest-client'
-
 class ApplicationController < ActionController::API
-
-  def post_search(json, error_method=method(:display_error))
-    res = RestClient.post("#{ES_URI}/_search", json.to_json, { "content-type" => "json" })
-    raise
-    return JSON.parse(res.body)
-  rescue => e
-    error_method.call(e, json)
-    return nil
-  end
-
-  # I am so pleased that this works
-  # as a default error handler
-  def display_error(error, req_body)
-    render(status: 500, json: JSON.pretty_generate({
-      "res" => {
-        "code" => 500,
-        "api_version" => Api::Application::VERSION,
-        "message" => "TODO",
-        "info" => {
-          "documentation" => "TODO",
-          "error" => error.inspect,
-          "suggestion" => "TODO"
-        }
-      },
-      "req" => {
-        "query_string" => request.fullpath,
-        "query_obj" => req_body
-      }
-    })) and return
-  end
-
 end
diff --git a/app/services/search_item_req.rb b/app/services/search_item_req.rb
index e9260c0..b271c3e 100644
--- a/app/services/search_item_req.rb
+++ b/app/services/search_item_req.rb
@@ -17,6 +17,7 @@ def build_request
     start = @params["start"].blank? ? SETTINGS["start"] : @params["start"]
 
     req = {
+      "track_total_hits" => true,
       "aggs" => {},
       "from" => start,
       "highlight" => {},
@@ -51,6 +52,8 @@ def build_request
 
     # add bool to request body
     req["query"]["bool"] = bool
+    # uncomment below line to log ES query for debugging
+    # puts req.to_json()
     return req
   end
 
@@ -72,19 +75,17 @@ def facets
     dir = "desc"
     if @params["facet_sort"].present?
       sort_type, sort_dir = @params["facet_sort"].split(@@filter_separator)
-      type = "_term" if sort_type == "term"
+      type = "term" if sort_type == "term"
       dir = sort_dir if sort_dir == "asc"
     end
 
     # FACET_SETTINGS["start"]
-    size = SETTINGS["num"]
-    size = @params["facet_num"].blank? ? SETTINGS["num"] : @params["facet_num"]
+    size = @params["facet_limit"].blank? ? SETTINGS["num"] : @params["facet_limit"]
 
     aggs = {}
     Array.wrap(@params["facet"]).each do |f|
       # histograms use a different ordering terminology than normal aggs
-      f_type = type == "_term" ? "_key" : "_count"
-
+      f_type = (type == "term") ? "_key" : "_count"
       if f.include?("date") || f[/_d$/]
         # NOTE: if nested fields will ever have dates we will
         # need to refactor this to be available to both
@@ -98,33 +99,104 @@ def facets
         aggs[f] = {
           "date_histogram" => {
             "field" => field,
-            "interval" => interval,
+            "calendar_interval" => interval,
             "format" => formatted,
             "min_doc_count" => 1,
             "order" => { f_type => dir },
           }
         }
-      # if nested, has extra syntax
+      # nested facet, matching on another nested facet
+
+      elsif f.include?("[")
+        # will be an array including the original, and an alternate aggregation name
+
+
+        options = JSON.parse(f)
+        original = options[0]
+        agg_name = options[1]
+
+        facet = original.split("[")[0]
+        condition = original[/(?<=\[).+?(?=\])/]
+        subject = condition.split("#").first
+        predicate = condition.split("#").last
+        if f_type == "_count"
+          # make sure sort is on the actual count of documents
+          f_type = "field_to_item"
+        end
+        aggregation = {
+          # common to nested and non-nested
+          "filter" => {
+            "term" => {
+              subject => predicate
+            }
+          },
+          "aggs" => {
+            agg_name => {
+              "terms" => {
+                "field" => facet,
+                "order" => {f_type => dir},
+                "size" => size
+              },
+              "aggs" => {
+                "field_to_item" => {
+                  "reverse_nested" => {},
+                  "aggs" => {
+                    "top_matches" => {
+                      "top_hits" => {
+                        "_source" => {
+                          "includes" => [ facet ]
+                        },
+                        "size" => 1
+                      }
+                    }
+                  }
+                }
+              }
+            }
+          }
+        }
+        # interpolate above hash into nested query
+        if facet.include?(".")
+          aggs[agg_name] = {
+            "nested" => {
+              "path" => facet.split(".").first
+            },
+            "aggs" => {
+              agg_name => aggregation
+            }
+          }
+        else
+          # otherwise it is the whole query
+          aggs[agg_name] = aggregation
+        end
       elsif f.include?(".")
-        path = f.split(".").first
+        if f_type == "_count"
+          # make sure sort is on the actual count of documents
+          f_type = "field_to_item"
+        end
         aggs[f] = {
           "nested" => {
-            "path" => path
+            "path" => f.split(".").first
           },
           "aggs" => {
             f => {
               "terms" => {
                 "field" => f,
-                "order" => { type => dir },
+                "order" => {f_type => dir},
                 "size" => size
               },
               "aggs" => {
-                "top_matches" => {
-                  "top_hits" => {
-                    "_source" => {
-                      "includes" => [ f ]
-                    },
-                    "size" => 1
+                "field_to_item" => {
+                  "reverse_nested" => {},
+                  "aggs" => {
+                    "top_matches" => {
+                      "top_hits" => {
+                        "_source" => {
+                          "includes" => [ f ]
+                        },
+                        "size" => 1
+                      }
+                    }
                   }
                 }
               }
@@ -135,7 +207,7 @@ def facets
         aggs[f] = {
           "terms" => {
             "field" => f,
-            "order" => { type => dir },
+            "order" => { f_type => dir },
             "size" => size
           },
           "aggs" => {
@@ -161,13 +233,43 @@ def filters
       # (type 2 will only be used for dates)
       filters = fields.map {|f| f.split(@@filter_separator, 3) }
       filters.each do |filter|
-        # NESTED FIELD FILTER
-        if filter[0].include?(".")
-          path = filter[0].split(".").first
+        # filter aggregation with nesting
+        if filter[0].include?("[")
+          original = filter[0]
+          facet = original.split("[")[0]
+          condition = original[/(?<=\[).+?(?=\])/]
+          subject = condition.split("#").first
+          predicate = condition.split("#").last
+          term_match = {
+            # "person.name" => "oliver wendell holmes"
+            # Remove CR's added by hidden input field values with returns
+            facet => filter[1].gsub(/\r/, "")
+          }
+          term_filter = {
+            subject => predicate
+          }
+          if facet.include?(".")
+            query = {
+              "nested" => {
+                "path" => facet.split(".").first,
+                "query" => {
+                  "bool" => {
+                    "must" => [
+                      { "match" => term_filter },
+                      { "match" => term_match }
+                    ]
+                  }
+                }
+              }
+            }
+          end
+          filter_list << query
+        # ordinary nested facet
+        elsif filter[0].include?(".")
           # this is a nested field and must be treated differently
           nested = {
             "nested" => {
-              "path" => path,
+              "path" => filter[0].split(".").first,
               "query" => {
                 "term" => {
                   # Remove CR's added by hidden input field values with returns
diff --git a/app/services/search_item_res.rb b/app/services/search_item_res.rb
index a82a199..104bd9a 100644
--- a/app/services/search_item_res.rb
+++ b/app/services/search_item_res.rb
@@ -18,7 +18,6 @@ def build_response
     # strip out only the fields for the item response
     items = combine_highlights
     facets = reformat_facets
-
     {
       "code" => 200,
       "count" => count,
@@ -42,12 +41,25 @@ def combine_highlights
 
   def find_source_from_top_hits(top_hits, field, key)
     # elasticsearch stores nested source results without the "path"
-    nested_child = field.split(".").last
-    hit = top_hits.first.dig("_source", nested_child)
+
+    parent = field.split(".").first
+    if field.include?(".")
+      nested_child = field.split(".").last
+    end
+    hit = top_hits.first.dig("_source", parent)
     # if this is a multivalued field (for example: works or places),
     # ALL of the values come back as the source, but we only want
     # the single value from which the key was derived
-    if hit.class == Array
+    if hit.class == Hash
+      hit = [hit]
+    end
+    if !hit
+      key
+    elsif hit.class == Array
+      if nested_child
+        #TODO solve bug where this returns a hash value instead of an array
+        hit = hit.map { |i| i[nested_child] }.compact
+      end
       # I don't love this, because we will have to match exactly the logic
       # that got us the key to get this to work
       match_index = hit
@@ -55,31 +67,42 @@ def find_source_from_top_hits(top_hits, field, key)
         .index(remove_nonword_chars(key))
       # if nothing matches the original key, return the entire source hit
       # should return a string, regardless
-      return match_index ? hit[match_index] : hit.join(" ")
+      if match_index
+        #matching item may be an array
+        if hit[match_index].class == Array
+          hit[match_index][0]
+        else
+          #just return the match
+          hit[match_index]
+        end
+      else
+        # if there is an array of values but no match, just return the key
+        key
+      end
     else
       # it must be single-valued and therefore we are good to go
-      return hit
+      hit
     end
   end
 
   def format_bucket_value(facets, field, bucket)
     # dates return in wonktastic ways, so grab key_as_string instead of gibberish number
     # but otherwise just grab the key if key_as_string unavailable
-    key = bucket.key?("key_as_string") ? bucket["key_as_string"] : bucket["key"]
-    val = bucket["doc_count"]
+    key = bucket.key?("key_as_string") ? bucket["key_as_string"].titleize : bucket["key"].titleize
+    val = bucket.key?("field_to_item") ? bucket["field_to_item"]["doc_count"] : bucket["doc_count"]
     source = key
     # top_matches is a top_hits aggregation which returns a list of terms
     # which were used for the facet.
     # Example: "Willa Cather" and "WILLA CATHER"
     # Those terms will both have been normalized as "willa cather" but
     # we will want to display one of the non-normalized terms instead
-    top_hits = bucket.dig("top_matches", "hits", "hits")
+    top_hits = bucket.key?("field_to_item") ? bucket.dig("field_to_item", "top_matches", "hits", "hits") : bucket.dig("top_matches", "hits", "hits")
     if top_hits
       source = find_source_from_top_hits(top_hits, field, key)
     end
     facets[field][key] = {
       "num" => val,
-      "source" => source
+      "source" => source.to_s
     }
   end
 
@@ -89,8 +112,7 @@ def reformat_facets
     facets = {}
     raw_facets.each do |field, info|
       facets[field] = {}
-      # nested fields do not have buckets at this level of response structure
-      buckets = info.key?("buckets") ? info["buckets"] : info.dig(field, "buckets")
+      buckets = get_buckets(info, field)
       if buckets
         buckets.each { |b| format_bucket_value(facets, field, b) }
       else
@@ -104,10 +126,30 @@ def reformat_facets
   end
 
   def remove_nonword_chars(term)
-    # transliterate to ascii (ø -> o)
-    transliterated = I18n.transliterate(term)
-    # remove html tags like em, u, and strong, then strip remaining non-alpha characters
-    transliterated.gsub(/<\/?(?:em|strong|u)>|\W/, "").downcase
+
+    if term.class == Array
+      #ensure that term is a string value, not an array
+      term = term[0]
+    end
+    if term.class == String
+      # it should not be a hash, but this is a failsafe
+      # transliterate to ascii (ø -> o)
+      transliterated = I18n.transliterate(term)
+      # remove html tags like em, u, and strong, then strip remaining non-alpha characters
+      transliterated.gsub(/<\/?(?:em|strong|u)>|\W/, "").downcase
+    end
   end
 
+  def get_buckets(info, field)
+    # ordinary facet
+    if info.key?("buckets")
+      info["buckets"]
+    # nested facet
+    elsif info.dig(field, "buckets")
+      info.dig(field, "buckets")
+    # filtered facet
+    else
+      info.dig(field, field, "buckets")
+    end
+  end
 end
diff --git a/app/services/search_service.rb b/app/services/search_service.rb
index dbd8877..78fc536 100644
--- a/app/services/search_service.rb
+++ b/app/services/search_service.rb
@@ -11,7 +11,29 @@ def initialize(url, params={}, user_req)
   end
 
   def post(url_ending, json)
-    res = RestClient.post("#{@url}/#{url_ending}", json.to_json, { "content-type" => "json" } )
+    # Add Basic Authentication header if credentials present
+    if Rails.application.credentials.elasticsearch.present? &&
+      Rails.application.credentials.elasticsearch[:user].present? &&
+      Rails.application.credentials.elasticsearch[:password].present?
+      auth_hash = {
+        "Authorization" => "Basic " +
+        Base64.strict_encode64(
+          Rails.application.credentials.elasticsearch[:user] +
+          ":" + Rails.application.credentials.elasticsearch[:password]
+        )
+      }
+      res = RestClient.post(
+        @url + "/" + url_ending,
+        json.to_json,
+        auth_hash.merge({ "content-type" => "json" })
+      )
+    else
+      res = RestClient.post(
+        @url + "/" + url_ending,
+        json.to_json,
+        {"content-type" => "json"}
+      )
+    end
     JSON.parse(res.body)
   rescue => e
     e
@@ -57,7 +79,8 @@ def search_item(id)
     raw_res = post("_search", req)
     if raw_res.class == RuntimeError
       on_error(raw_res, req)
-    elsif raw_res.class == RestClient::BadRequest
+    elsif raw_res.class == RestClient::ExceptionWithResponse ||
+      raw_res.class == RestClient::BadRequest
       on_error(JSON.parse(raw_res.response), req)
     else
       res = build_item_response(raw_res)
@@ -70,7 +93,8 @@ def search_items
     raw_res = post("_search", req)
     if raw_res.class == RuntimeError
       on_error(raw_res.inspect, req)
-    elsif raw_res.class == RestClient::BadRequest
+    elsif raw_res.class == RestClient::ExceptionWithResponse ||
+      raw_res.class == RestClient::BadRequest
       on_error(JSON.parse(raw_res.response), req)
     else
       res = build_item_response(raw_res)
diff --git a/config/application.rb b/config/application.rb
index 03f5085..db0b615 100644
--- a/config/application.rb
+++ b/config/application.rb
@@ -22,7 +22,7 @@ module Api
 
   class Application < Rails::Application
     # Initialize configuration defaults for originally generated Rails version.
-    config.load_defaults 5.0
+    config.load_defaults 5.2
 
     # Configuration for the application, engines, and railties goes here.
     #
diff --git a/config/boot.rb b/config/boot.rb
index 3cda23b..676662a 100644
--- a/config/boot.rb
+++ b/config/boot.rb
@@ -2,3 +2,4 @@
 
 require "bundler/setup" # Set up gems listed in the Gemfile.
 require "bootsnap/setup" # Speed up boot time by caching expensive operations.
+require "logger"
diff --git a/config/config.example.yml b/config/config.example.yml
index a06b13c..bd03bdc 100644
--- a/config/config.example.yml
+++ b/config/config.example.yml
@@ -1,11 +1,11 @@
 default: &default
   metadata:
     # api metadata / description
-    api_updated: "TODO 2017"
+    api_updated: "TODO"
     contact: "cdrhdev@unl.edu"
     description: "API to access all public Center for Digital Research in the Humanities resources"
-    documentation: "https://cdrhapi.unl.edu/docs"
-    license: "TODO"
+    documentation: "https://github.com/CDRH/api/tree/main/docs"
+    license: "MIT License"
     terms_of_service: "TODO"
 
   settings:
diff --git a/config/environments/development.rb b/config/environments/development.rb
index fc3ea89..1b2b7ec 100644
--- a/config/environments/development.rb
+++ b/config/environments/development.rb
@@ -61,7 +61,7 @@
   # routes, locales, etc. This feature depends on the listen gem.
   config.file_watcher = ActiveSupport::EventedFileUpdateChecker
 
-  # LOCAL
+  # LOCAL CHANGES
   # Custom dev env logger to empty log more frequently
   config.logger = ActiveSupport::TaggedLogging.new(
     ActiveSupport::Logger.new(File.join(Rails.root.to_s, "log", "development.log"),
@@ -69,8 +69,5 @@
       1, 32 * 1024 * 1024
     )
   )
-
-  # CDRH CONFIGURATION
-
-  config.hosts << "cdrhdev1.unl.edu"
+  config.hosts << ENV.fetch("RAILS_DEV_HOST") { "localhost" }
 end
diff --git a/config/environments/production.rb b/config/environments/production.rb
index 50bdaf0..e98aaed 100644
--- a/config/environments/production.rb
+++ b/config/environments/production.rb
@@ -111,11 +111,21 @@
   # config.active_record.database_resolver = ActiveRecord::Middleware::DatabaseSelector::Resolver
   # config.active_record.database_resolver_context = ActiveRecord::Middleware::DatabaseSelector::Resolver::Session
 
-  # CDRH CONFIGURATION
-
-  # Force all access to the app over SSL, use Strict-Transport-Security, and use secure cookies.
-  config.force_ssl = true
-  # Handle STS here instead of Apache, or Rails duplicates header contents
-  # Also unset cache-control header in HTTPS vhost for same reason
-  config.ssl_options = { hsts: { preload: true } }
+  # LOCAL CHANGES
+  # Secure HTTPS config
+  # Can be toggled off with env var for local production env testing
+  config.force_ssl = ENV['RAILS_PROD_NOSSL'].blank?
+  # HSTS here instead of Apache, otherwise Rails duplicates header contents
+  # Also unset cache-control header in Apache HTTPS vhost for same reason
+  config.ssl_options = {
+    hsts: { expires: 31536000, preload: true, subdomains: true }
+  }
+  # In case we ever need to disable HSTS on a public site, switch to this
+  # config.ssl_options = { hsts: false }
+
+  # Reduce log bloat
+  config.log_level = :warn
+
+  # Allow access via non-public domain
+  config.hosts << ENV.fetch("RAILS_PROD_HOST") { "localhost" }
 end
diff --git a/config/initializers/new_framework_defaults.rb b/config/initializers/new_framework_defaults.rb
deleted file mode 100644
index e943ba9..0000000
--- a/config/initializers/new_framework_defaults.rb
+++ /dev/null
@@ -1,15 +0,0 @@
-# Be sure to restart your server when you modify this file.
-#
-# This file contains migration options to ease your Rails 5.0 upgrade.
-#
-# Read the Guide for Upgrading Ruby on Rails for more info on each option.
-
-# Make Ruby 2.4 preserve the timezone of the receiver when calling `to_time`.
-# Previous versions had false.
-ActiveSupport.to_time_preserves_timezone = true
-
-# Require `belongs_to` associations by default. Previous versions had false.
-Rails.application.config.active_record.belongs_to_required_by_default = true
-
-# Configure SSL options to enable HSTS with subdomains. Previous versions had false.
-#Rails.application.config.ssl_options = { hsts: { subdomains: true } }
diff --git a/config/initializers/new_framework_defaults_5_2.rb b/config/initializers/new_framework_defaults_5_2.rb
deleted file mode 100644
index 421e5a2..0000000
--- a/config/initializers/new_framework_defaults_5_2.rb
+++ /dev/null
@@ -1,35 +0,0 @@
-# Be sure to restart your server when you modify this file.
-#
-# This file contains migration options to ease your Rails 5.2 upgrade.
-#
-# Once upgraded flip defaults one by one to migrate to the new default.
-#
-# Read the Guide for Upgrading Ruby on Rails for more info on each option.
-
-# Make Active Record use stable #cache_key alongside new #cache_version method.
-# This is needed for recyclable cache keys.
-# Rails.application.config.active_record.cache_versioning = true
-
-# Use AES-256-GCM authenticated encryption for encrypted cookies.
-# Also, embed cookie expiry in signed or encrypted cookies for increased security.
-#
-# This option is not backwards compatible with earlier Rails versions.
-# It's best enabled when your entire app is migrated and stable on 5.2.
-#
-# Existing cookies will be converted on read then written with the new scheme.
-# Rails.application.config.action_dispatch.use_authenticated_cookie_encryption = true
-
-# Use AES-256-GCM authenticated encryption as default cipher for encrypting messages
-# instead of AES-256-CBC, when use_authenticated_message_encryption is set to true.
-# Rails.application.config.active_support.use_authenticated_message_encryption = true
-
-# Add default protection from forgery to ActionController::Base instead of in
-# ApplicationController.
-# Rails.application.config.action_controller.default_protect_from_forgery = true
-
-# Store boolean values are in sqlite3 databases as 1 and 0 instead of 't' and
-# Rails.application.config.active_record.sqlite3.represent_boolean_as_integer = true - -# Use SHA-1 instead of MD5 to generate non-sensitive digests, such as the ETag header. -# Rails.application.config.active_support.use_sha1_digests = true diff --git a/config/initializers/new_framework_defaults_6_1.rb b/config/initializers/new_framework_defaults_6_1.rb index 9526b83..89165aa 100644 --- a/config/initializers/new_framework_defaults_6_1.rb +++ b/config/initializers/new_framework_defaults_6_1.rb @@ -7,61 +7,61 @@ # Read the Guide for Upgrading Ruby on Rails for more info on each option. # Support for inversing belongs_to -> has_many Active Record associations. -# Rails.application.config.active_record.has_many_inversing = true +Rails.application.config.active_record.has_many_inversing = true # Track Active Storage variants in the database. -# Rails.application.config.active_storage.track_variants = true +Rails.application.config.active_storage.track_variants = true # Apply random variation to the delay when retrying failed jobs. -# Rails.application.config.active_job.retry_jitter = 0.15 +Rails.application.config.active_job.retry_jitter = 0.15 # Stop executing `after_enqueue`/`after_perform` callbacks if # `before_enqueue`/`before_perform` respectively halts with `throw :abort`. -# Rails.application.config.active_job.skip_after_callbacks_if_terminated = true +Rails.application.config.active_job.skip_after_callbacks_if_terminated = true # Specify cookies SameSite protection level: either :none, :lax, or :strict. # # This change is not backwards compatible with earlier Rails versions. # It's best enabled when your entire app is migrated and stable on 6.1. -# Rails.application.config.action_dispatch.cookies_same_site_protection = :lax +Rails.application.config.action_dispatch.cookies_same_site_protection = :lax # Generate CSRF tokens that are encoded in URL-safe Base64. # # This change is not backwards compatible with earlier Rails versions. 
 # It's best enabled when your entire app is migrated and stable on 6.1.
-# Rails.application.config.action_controller.urlsafe_csrf_tokens = true
+Rails.application.config.action_controller.urlsafe_csrf_tokens = true

 # Specify whether `ActiveSupport::TimeZone.utc_to_local` returns a time with an
 # UTC offset or a UTC time.
-# ActiveSupport.utc_to_local_returns_utc_offset_times = true
+ActiveSupport.utc_to_local_returns_utc_offset_times = true

 # Change the default HTTP status code to `308` when redirecting non-GET/HEAD
 # requests to HTTPS in `ActionDispatch::SSL` middleware.
-# Rails.application.config.action_dispatch.ssl_default_redirect_status = 308
+Rails.application.config.action_dispatch.ssl_default_redirect_status = 308

 # Use new connection handling API. For most applications this won't have any
 # effect. For applications using multiple databases, this new API provides
 # support for granular connection swapping.
-# Rails.application.config.active_record.legacy_connection_handling = false
+Rails.application.config.active_record.legacy_connection_handling = false

 # Make `form_with` generate non-remote forms by default.
-# Rails.application.config.action_view.form_with_generates_remote_forms = false
+Rails.application.config.action_view.form_with_generates_remote_forms = false

 # Set the default queue name for the analysis job to the queue adapter default.
-# Rails.application.config.active_storage.queues.analysis = nil
+Rails.application.config.active_storage.queues.analysis = nil

 # Set the default queue name for the purge job to the queue adapter default.
-# Rails.application.config.active_storage.queues.purge = nil
+Rails.application.config.active_storage.queues.purge = nil

 # Set the default queue name for the incineration job to the queue adapter default.
-# Rails.application.config.action_mailbox.queues.incineration = nil
+Rails.application.config.action_mailbox.queues.incineration = nil

 # Set the default queue name for the routing job to the queue adapter default.
-# Rails.application.config.action_mailbox.queues.routing = nil
+Rails.application.config.action_mailbox.queues.routing = nil

 # Set the default queue name for the mail deliver job to the queue adapter default.
-# Rails.application.config.action_mailer.deliver_later_queue_name = nil
+Rails.application.config.action_mailer.deliver_later_queue_name = nil

 # Generate a `Link` header that gives a hint to modern browsers about
 # preloading assets when using `javascript_include_tag` and `stylesheet_link_tag`.
-# Rails.application.config.action_view.preload_links_header = true
+Rails.application.config.action_view.preload_links_header = true
diff --git a/config/initializers/version.rb b/config/initializers/version.rb
index b17ba8e..3144cc8 100644
--- a/config/initializers/version.rb
+++ b/config/initializers/version.rb
@@ -1,5 +1,5 @@
 module Api
   class Application < Rails::Application
-    VERSION = "1.0.4"
+    VERSION = "2.0.0"
   end
 end
diff --git a/docs/README.md b/docs/README.md
index 422f40e..4ec92a2 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -50,6 +50,12 @@ __Nested fields__
 facet[]=creator.name
 facet[]=creator.name&facet[]=creator.role
 ```
+you can also match on another nested field with the new API schema
+`facet[]=nested_field.keyword_field1[nested_field.keyword_field2#value]`
+```
+facet[]=person.name[person.role#judge]
+```
+the above will select all names of persons, where the role of that person is "judge".

 __Date ranges__ (currently supports days or years)

diff --git a/test/services/search_item_req_test.rb b/test/services/search_item_req_test.rb
index 29bf323..a5b3eab 100644
--- a/test/services/search_item_req_test.rb
+++ b/test/services/search_item_req_test.rb
@@ -39,18 +39,18 @@ def test_facets

     # normal with pagination overrides, multiple facets
     facets = SearchItemReq.new({
-      "facet_num" => 10,
+      "facet_limit" => 10,
       "facet_sort" => "term|asc",
       "facet" => [ "title", "subcategory" ]
     }).facets
     assert_equal(
-      {"title"=>{"terms"=>{"field"=>"title", "order"=>{"_term"=>"asc"}, "size"=>10}, "aggs"=>{"top_matches"=>{"top_hits"=>{"_source"=>{"includes"=>["title"]}, "size"=>1}}}}, "subcategory"=>{"terms"=>{"field"=>"subcategory", "order"=>{"_term"=>"asc"}, "size"=>10}, "aggs"=>{"top_matches"=>{"top_hits"=>{"_source"=>{"includes"=>["subcategory"]}, "size"=>1}}}}},
+      {"title"=>{"terms"=>{"field"=>"title", "order"=>"asc", "size"=>10}, "aggs"=>{"top_matches"=>{"top_hits"=>{"_source"=>{"includes"=>["title"]}, "size"=>1}}}}, "subcategory"=>{"terms"=>{"field"=>"subcategory", "order"=>"asc", "size"=>10}, "aggs"=>{"top_matches"=>{"top_hits"=>{"_source"=>{"includes"=>["subcategory"]}, "size"=>1}}}}},
       facets
     )

     # should be blank if there are no facets provided
     facets = SearchItemReq.new({
-      "facet_num" => 1,
+      "facet_limit" => 1,
       "facet_sort" => "nonterm|asc",
       "facet" => []
     }).facets
@@ -69,7 +69,7 @@ def test_facets
       "facet" => [ "creator.name" ]
     }).facets
     assert_equal(
-      {"creator.name"=>{"nested"=>{"path"=>"creator"}, "aggs"=>{"creator.name"=>{"terms"=>{"field"=>"creator.name", "order"=>{"_term"=>"desc"}, "size"=>20}, "aggs"=>{"top_matches"=>{"top_hits"=>{"_source"=>{"includes"=>["creator.name"]}, "size"=>1}}}}}}},
+      {"creator.name"=>{"nested"=>{"path"=>"creator"}, "aggs"=>{"creator.name"=>{"terms"=>{"field"=>"creator.name", "order"=>"desc", "size"=>20}, "aggs"=>{"top_matches"=>{"top_hits"=>{"_source"=>{"includes"=>["creator.name"]}, "size"=>1}}}}}}},
       facets
     )

@@ -83,14 +83,14 @@ def test_facets

     # sort term order specified
     facets = SearchItemReq.new({ "facet" => ["title", "format"], "facet_sort" => "term|desc" }).facets
     assert_equal(
-      {"title"=>{"terms"=>{"field"=>"title", "order"=>{"_term"=>"desc"}, "size"=>20}, "aggs"=>{"top_matches"=>{"top_hits"=>{"_source"=>{"includes"=>["title"]}, "size"=>1}}}}, "format"=>{"terms"=>{"field"=>"format", "order"=>{"_term"=>"desc"}, "size"=>20}, "aggs"=>{"top_matches"=>{"top_hits"=>{"_source"=>{"includes"=>["format"]}, "size"=>1}}}}},
+      {"title"=>{"terms"=>{"field"=>"title", "order"=>"desc", "size"=>20}, "aggs"=>{"top_matches"=>{"top_hits"=>{"_source"=>{"includes"=>["title"]}, "size"=>1}}}}, "format"=>{"terms"=>{"field"=>"format", "order"=>"desc", "size"=>20}, "aggs"=>{"top_matches"=>{"top_hits"=>{"_source"=>{"includes"=>["format"]}, "size"=>1}}}}},
       facets
     )

     # sort term no order specified
     facets = SearchItemReq.new({ "facet" => ["title", "format"], "facet_sort" => "term" }).facets
     assert_equal(
-      {"title"=>{"terms"=>{"field"=>"title", "order"=>{"_term"=>"desc"}, "size"=>20}, "aggs"=>{"top_matches"=>{"top_hits"=>{"_source"=>{"includes"=>["title"]}, "size"=>1}}}}, "format"=>{"terms"=>{"field"=>"format", "order"=>{"_term"=>"desc"}, "size"=>20}, "aggs"=>{"top_matches"=>{"top_hits"=>{"_source"=>{"includes"=>["format"]}, "size"=>1}}}}},
+      {"title"=>{"terms"=>{"field"=>"title", "order"=>"desc", "size"=>20}, "aggs"=>{"top_matches"=>{"top_hits"=>{"_source"=>{"includes"=>["title"]}, "size"=>1}}}}, "format"=>{"terms"=>{"field"=>"format", "order"=>"desc", "size"=>20}, "aggs"=>{"top_matches"=>{"top_hits"=>{"_source"=>{"includes"=>["format"]}, "size"=>1}}}}},
       facets
     )
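To illustrate what the `facet[]=nested_field.keyword_field1[nested_field.keyword_field2#value]` syntax documented in `docs/README.md` asks Elasticsearch to do, here is a minimal Ruby sketch. It is not the `SearchItemReq` code from this change: the helper `nested_filtered_facet`, the `FACET_PATTERN` regex, and the default `size` of 20 (borrowed from the test defaults above) are hypothetical, and it omits the `top_matches`/`top_hits` sub-aggregation the tests show. It assumes the nested path is the field name minus its last segment, and builds a `nested` aggregation wrapping a `filter` aggregation (a `term` match on the second field) wrapping a `terms` aggregation.

```ruby
# Hypothetical sketch, not the API's SearchItemReq code: build an
# Elasticsearch aggregation for a facet such as "person.name[person.role#judge]".
FACET_PATTERN = /\A(?<field>[^\[\]#]+)\[(?<filter_field>[^\[\]#]+)#(?<filter_value>[^\[\]]+)\]\z/

def nested_filtered_facet(facet, size: 20)
  match = FACET_PATTERN.match(facet)
  raise ArgumentError, "unsupported facet syntax: #{facet}" unless match

  field = match[:field]
  # Assumption: the nested path is the field minus its last segment,
  # e.g. "person" for "person.name".
  path = field.rpartition(".").first
  raise ArgumentError, "facet field is not nested: #{field}" if path.empty?

  {
    facet => {
      # Step into the nested objects so each person is evaluated on its own
      "nested" => { "path" => path },
      "aggs" => {
        facet => {
          # Keep only nested objects whose filter field equals the value
          "filter" => { "term" => { match[:filter_field] => match[:filter_value] } },
          "aggs" => {
            # Bucket the surviving nested objects by the requested field
            field => { "terms" => { "field" => field, "size" => size } }
          }
        }
      }
    }
  }
end

pp nested_filtered_facet("person.name[person.role#judge]")
```

Placing the `filter` inside the `nested` aggregation means the `term` match is evaluated per nested object, so only names attached to a `judge` role are counted, rather than every name on a document that happens to contain some judge.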