Skip to content

Conversation

@williamge
Copy link
Contributor

@williamge williamge commented Mar 19, 2025

this PR is simply this branch: #31 but rebased off of master, kept as a separate PR for now to retain the original PR's code, and I can verify that there are no rebase artifacts or anything weird.

here is the original PR's description:

Support elastic search 8 while still using elastix.

Note that updating documents is supported via PUT /<target>/_doc/<_id> and so updates that use scripts are not supported. Note also that massdriver doesn't use ExlasticSearch.Repo.update/4.

what I did in this PR, from that:

  • rebased off of master
  • code changes? shouldn't be different from ES8 #31
  • test changes? yes changes, sort of:
    • the original PR was done before CI tests were introduced, for some reason the tests passed perfectly locally but not on CI
    • es7 and es8 are tested separately using the test matrix in the gh action, with the test suite run once under es7, then again under es8
      • the tests that involve models with mapping types will fail under es8 (except when run locally, why/how is that?) as mapping types are deprecated (or dropped entierly?) in es8
        • as a result, es7-only tests are split out in to their own module, and can be deleted if we ever stop supporting es7
        • es8 tests remain in repo_test.exs, should be testing the same kind of thing, but since they don't use mapping types, will pass under es7 and es8

@williamge williamge mentioned this pull request Mar 19, 2025
Closed
…es8 instance are included in our github action matrices, and tests are passing in es7 and es8 (except ones relying on mapping types, which aren't supported by es8)

cleanup (+15 squashed commits)
Squashed commits:
[3fb4418] start excluding mapping types in es8 tests again
[4dc2703] alright separating and cleaning
[34590cc] I want to check something cheeky
[2fec4a1] flip reverse it so only es7 runs mapping type tests
[ce87d3d] getting weird with it
[fc17334] ok next level
[3ce0564] first pass on a matrix
[7a88a3a] bringing back es8
[bb819f6] bringing the port back
[224e686] skipping es8 tests, not running its action
[0e46124] whatever I'll just skip those tests again
[c0afa77] do I get a different error at least
[5842e27] I call this throwing stuff at the wall and hoping for the best
[364c0e1] container name overlap
[eb6d57a] es8 in gh actions
@williamge williamge changed the title Es8 support (rebased off master) ES8 (and OpenSearch) support (rebased off master) Mar 25, 2025
@williamge williamge changed the title ES8 (and OpenSearch) support (rebased off master) ES8 (and OpenSearch) support Mar 25, 2025
@williamge williamge marked this pull request as ready for review March 25, 2025 18:04
def bulk_operation({op_type, struct}), do: bulk_operation_default({op_type, struct, :index})

defp bulk_operation_default({op_type, %{__struct__: model} = struct, index}) do
op = %{
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from here until the tests the code should be unchanged from #31

@@ -0,0 +1,325 @@
defmodule ExlasticSearch.Repo.MappingTypesTest do
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here's where the code starts diverging a bit from #31

this module is just tests that used mapping types, which es7 supported but are deprecated. these tests seem to be the original tests for the library, with new ones brought on that use typeless models (and whose tests include "es8+" in the title). those new tests remain in repo_test.exs, and these tests that rely on mapping types and will fail in es8+ have been quarantined here, in a module that I spent very little time thinking of the naming or folder structure of.

this module should align very closely with the original tests, and repo_test.exs should have similar coverage (they probably are the same tests) but using typeless models.

setup_all do
Repo.delete_index(TestModel)
Repo.create_index(TestModel)
{:ok, %{status_code: 200}} = Repo.create_mapping(TestModel, :index, include_type_name: true)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the test setups weren't matching against anything for the return value of Repo.create_mapping(...), but that also covered up that they were silently failing on creating mappings for deprecated mapping types, so I put in matching against {:ok, ...} and that the status code is 200, so if these fail to create we'd at least know about it.

Comment on lines +155 to +157
# can we unskip? fails from mapping types
@tag :skip
test "It can perform top_hits aggregations, even when nested" do
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test (and the other test) were skipped before I got here, they seem to fail in CI so I think a previous commit (before this PR) skipped them. I tried unskipping them after fixing the Repo.create_mapping(...) calls but they still seemed to fail. I suppose we should just delete them if we don't expect them to ever pass, so I'm open on votes to do so.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like Milton added the skip. If he was planning to come back and fix the tests we could leave them. If not, I think deleting them makes sense.

test "It will index an element in es" do
model = %ExlasticSearch.TestModel{id: Ecto.UUID.generate()}
{:ok, _} = Repo.index(model)
@tag :elasticsearch8
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these tags aren't functionally needed anymore, I added them because I was originally excluding es8 tests on es7, but es8 tests will pass on es7 (it's es7 tests, i.e. ones relying on mapping types, that will fail in es8).

I'll go ahead and remove these :elasticsearch8 tests unless someone feels strongly otherwise.

- elixir: "1.18"
otp: "27"
lint: true
elasticsearch: "8.13.4"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose it would be nice to have testing against an opensearch docker container would be nice too, but I'm not sure of an elegant way to add that in here.

any suggestions on how to do that in a simple way, or on how to clean up my heavy-handed matrix approach, would be much appreciated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To start an OpenSearch container for testing, you can follow a pattern like this (adding Kafka container for testing): https://github.com/Frameio/massdriver/pull/18087/files#diff-95557d53bf91069e59d82b9d8fcfaf52ec84a762858983edd5ef3c9b3e4c8191R72

Comment on lines 193 to 199
match?(%{doc: _}, data) ->
model
|> struct(Map.put(data.doc, :id, id))
|> index(index)

true ->
{:error, "ExlasticSearch only supports doc updates for ES 8+"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a hack I put in the original PR based on the way I know massdriver is using Exlasticsearch. In general this is not correct.

An "update" operation will only update fields that are present in data.doc, but an "index" operation will replace the entire indexed document with whatever is in data.doc. If data.doc is an incomplete document, that means the fields that are not included will be deleted. This would be very surprising behavior for someone using this library.

Several options:

  1. Leave the original Exlasticsearch.Repo.update function in place and update massdriver so that we never call it, but instead always call Exlasticsearch.Repo.index.
  2. Fork Exlasticsearch
  3. Submit a PR to Elastix that allows a nil doc_type to be given to Document.update. This would be a fairly small PR. This is the function that would need to be updated to construct the correct path when there is a nil doc type: https://github.com/werbitzky/elastix/blob/5963511a5b5b5ccc57e2ff70e64021fd3a4b795f/lib/elastix/document.ex#L197

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there any downside to option number 1? seems the most straightforward of them, although I see fabian has made some recent commits to elastix so maybe that's not hard to swing 🤔

Copy link
Contributor

@jekelly-adobe jekelly-adobe Mar 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok good news, massdriver already doesn't use Exlasticsearch.Repo.update anywhere. What it does use is bulk, so we just need to update the bulk_operation_update function in lib/exlasticsearch/bulk.ex to handle the case when doc type is nil.

I recommend reverting the changes to Exlasticsearch.Repo.update. We're not making any claims that Exlasticsearch is "fully compatible with ES8/OpenSearch". I think it'd be better to have near compatibility than "compatibility" with incorrect behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking in to bulk_operation_update, weirdly enough, the typelesstestmodel tests running on es8 pass, even with the new ones that use Repo.bulk. they're still passing _type, just with a value of nil. if I add a fake value like _dummy_value_not_used_in_api: "huh", then the bulk api returns an error. so it doesn't accept unknown fields, but does still accepts _type 🤔 .

still think I'm going to only include the _type field if __doc_type__() is not nil, for sanity's sake, but that's kinda weird isn't it.

@jekelly-adobe jekelly-adobe self-requested a review April 1, 2025 19:00
@williamge williamge merged commit 21f24f7 into master Apr 2, 2025
4 checks passed
@williamge williamge mentioned this pull request Apr 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants