Description
When users enter a search, they have an intent in mind about what they want to find. They express this intent in their own words, which may not match the text in the search index. Query understanding is the area of search that helps interpret that intent, and query rewriting is one of its techniques: before the index is searched, the query is examined to derive additional context and is then rewritten to include that context. This RFC suggests ways to integrate a specific library used for query rewriting. It also proposes more generic interfaces for query rewriting in search pipelines, so that builders can bring their own rewriting logic while still taking advantage of the benefits of search pipelines - logical separation,
Creating rules to refine queries is a standard practice in search applications. Users enter free-text search queries with the intent to find something specific. For example, a search query on a site selling home goods could be “gas grill weber.” Through query rewriting, the engine could interpret “weber” as the brand Weber and rewrite the query to boost “gas grill” matches where the Brand field in the index is “Weber.” My assumption, backed by my experience, is that many search application builders either do this work with custom code or don’t know that this type of rewrite is possible at all. Querqy was developed as a plugin for Elasticsearch and Solr to help centralize rewriting and reduce its complexity. It was later ported to OpenSearch. The plugin currently lives in the querqy GitHub repo and is not upgraded with each OpenSearch release, because that is difficult to do unless the plugin is in the opensearch-project org and has access to the same CI infrastructure as other plugins.
Querqy comes with the following rewriters, each of which could potentially be implemented as a SearchRequestProcessor:
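As a rough illustration of the “gas grill weber” example, the sketch below turns the free-text query into an OpenSearch-style bool query that matches on the non-brand terms and boosts documents whose brand field is Weber. The field names `title` and `brand`, the brand dictionary, and the boost value are all assumptions made for illustration, and this is not how Querqy itself implements rewriting.

```python
# Hypothetical brand dictionary; a real application would load this from data.
KNOWN_BRANDS = {"weber": "Weber"}


def rewrite_query(user_query, text_field="title", brand_field="brand", boost=5.0):
    """Rewrite a free-text query into a bool query with an optional brand boost."""
    tokens = user_query.lower().split()
    brands = [KNOWN_BRANDS[t] for t in tokens if t in KNOWN_BRANDS]
    remaining = [t for t in tokens if t not in KNOWN_BRANDS]

    # If the query consists only of brand names, match on the original text.
    match_text = " ".join(remaining) if remaining else user_query

    bool_query = {"must": [{"match": {text_field: match_text}}]}
    if brands:
        # Boost, rather than require, documents of the recognised brand.
        bool_query["should"] = [
            {"term": {brand_field: {"value": b, "boost": boost}}} for b in brands
        ]
    return {"query": {"bool": bool_query}}


rewritten = rewrite_query("gas grill weber")
```

Because the brand clause sits in `should` rather than `must`, gas grills from other brands still match; Weber products simply rank higher.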
(copied & pasted from https://docs.querqy.org/querqy/rewriters/common-rules.html)
- Common Rules Rewriter: Query-dependent rules for synonyms, result boosting (up/down), filters; ‘decorate’ result with additional information.
- Replace Rewriter: Replace query terms. Used as a query normalisation step, usually applied before the query is processed further, for example, before the Common Rules Rewriter is applied.
- Word Break Rewriter: (De)compounds query tokens. Splits compound words or creates compounds from separate tokens.
- Number-Unit Rewriter: Recognises numerical values and units of measurement in the query and matches them with indexed fields. Allows for range matches and boosting of the exactly matching value.
- Shingle Rewriter: Creates shingles (compounds) from adjacent query tokens and adds them as synonyms.
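To make one of these rewriters concrete, here is a simplified sketch of the shingle idea described above: each pair of adjacent tokens is joined into a compound, and that compound is added as an alternative (synonym) for both of the tokens it was built from. The function name and the list-of-lists output shape are assumptions for illustration; Querqy's own implementation and data structures differ.

```python
def add_shingles(tokens):
    """Return, for each token position, the token plus its shingle synonyms."""
    alternatives = [[t] for t in tokens]
    for i in range(len(tokens) - 1):
        # Join two adjacent tokens into a single compound term.
        shingle = tokens[i] + tokens[i + 1]
        # The compound becomes a synonym for both of its source tokens.
        alternatives[i].append(shingle)
        alternatives[i + 1].append(shingle)
    return alternatives


shingled = add_shingles(["gas", "grill", "weber"])
# shingled == [["gas", "gasgrill"],
#              ["grill", "gasgrill", "grillweber"],
#              ["weber", "grillweber"]]
```

A query builder could then treat each inner list as a set of interchangeable terms, so that a document containing "gasgrill" still matches a query for "gas grill".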
I propose that OpenSearch's Search Pipelines feature (https://opensearch.org/docs/latest/search-plugins/search-pipelines/index/), in combination with Querqy's library-based implementation, Querqy Unplugged (https://github.com/querqy/querqy-unplugged), be used to integrate multiple query rewriting components as processors. This could also reveal a clearer way to bring backend functionality into OpenSearch without having to move repositories into the project itself:
- refactor an existing plugin to separate the plugin concerns (OpenSearch hooks) from the functionality of the plugin (Querqy itself). The plugin can also be a SearchProcessor and depend on several libraries.
- leave the functionality in the originating repo to be maintained there. this means we can
- build a plugin/search processor in the opensearch-project org that uses the functionality as a library. In the spirit of Search Pipelines, we could build one processor for each type of rewriting operation.
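To picture what "one processor per rewriting operation" might look like, the sketch below builds a search pipeline definition in the shape accepted by the existing `PUT /_search/pipeline/<pipeline-name>` API, with a `request_processors` array that runs in order. The processor types `querqy_replace`, `querqy_common_rules`, and `querqy_shingle`, and their `rewriter_id` parameter, are hypothetical: they do not exist in OpenSearch and are only placeholders for processors this RFC would introduce.

```python
import json

# HYPOTHETICAL processor types and parameters, shown only to illustrate the idea.
pipeline_body = {
    "description": "Query rewriting with one processor per rewriter",
    "request_processors": [
        # Normalise terms first, as the Replace Rewriter is meant to run early.
        {"querqy_replace": {"tag": "normalise", "rewriter_id": "replace_rules"}},
        # Then apply synonym, boost, and filter rules.
        {"querqy_common_rules": {"tag": "rules", "rewriter_id": "common_rules"}},
        # Finally add shingles as synonyms.
        {"querqy_shingle": {"tag": "shingles"}},
    ],
}

# Each processor entry has exactly one key: its type.
processor_order = [next(iter(p)) for p in pipeline_body["request_processors"]]

# JSON body that would be sent when creating the pipeline.
request_json = json.dumps(pipeline_body, indent=2)
```

Keeping each rewriter in its own processor lets a pipeline author reorder, add, or drop individual rewriting steps without touching the others.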
Benefits
- Querqy Unplugged is a library. We can create SearchRequestProcessors for one or more of the Querqy rewriters on the OpenSearch side against whichever Querqy Unplugged version we like; we can pin to a specific version of Querqy or upgrade as we see fit for the OpenSearch project.
- Keep it Open: We can begin to incorporate abstracted SearchRequestProcessors that will allow non-Querqy rewriters to be incorporated; these can be good candidates long term for inclusion in Querqy or as stand-alone rewriters.
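One way to think about an abstracted rewriting interface is a small contract that both Querqy-backed and custom rewriters satisfy, so they can be chained interchangeably. The sketch below is a language-neutral illustration in Python, not the OpenSearch SearchRequestProcessor API; the `QueryRewriter` protocol, both example rewriters, and `apply_rewriters` are hypothetical names, and a real interface would operate on the search request rather than a plain string.

```python
from typing import Dict, Iterable, Protocol


class QueryRewriter(Protocol):
    """Hypothetical contract: take a query string, return the rewritten one."""

    def rewrite(self, query: str) -> str: ...


class LowercaseRewriter:
    """A trivial custom rewriter that normalises case."""

    def rewrite(self, query: str) -> str:
        return query.lower()


class TermReplaceRewriter:
    """Replaces whole terms using a lookup table (for example, common typos)."""

    def __init__(self, replacements: Dict[str, str]) -> None:
        self.replacements = replacements

    def rewrite(self, query: str) -> str:
        return " ".join(self.replacements.get(t, t) for t in query.split())


def apply_rewriters(query: str, rewriters: Iterable[QueryRewriter]) -> str:
    """Run the rewriters in order, feeding each one the previous output."""
    for rewriter in rewriters:
        query = rewriter.rewrite(query)
    return query


chain = [LowercaseRewriter(), TermReplaceRewriter({"gril": "grill"})]
result = apply_rewriters("Gas GRIL Weber", chain)
```

Order matters in such a chain: here the lowercase step must run first so that the replacement table only needs lowercase keys.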
Drawbacks
- Managing the same dependency over different components could be a challenge.
Other possibilities
- Move the plugin as is and do not take a dependency on Querqy Unplugged.
- Move the plugin as is and do not integrate with search
Questions:
- Are there other libraries besides Querqy with similar functionality that could be open-sourced or adapted to Search Pipelines? This could strengthen the use case for Search Pipelines and could also create a path for builders on OpenSearch who have custom rewriters but don’t want to maintain that functionality. They could then simply move to Querqy processors.
- What options for query rewriter integration do we have?
- Plugin
- SearchRequestProcessor (note that Plugins and SearchRequestProcessors are not mutually exclusive)
- Other?
- Would you be more inclined to use rewriters as part of a search pipeline or as a standalone plugin?