protosearch reference

This document describes the complete protosearch API.

API

protosearch exposes two extensions.

| Extension | Message | Description |
|---|---|---|
| protosearch.field | protosearch.Field | Manage field configuration |
| protosearch.index | protosearch.Index | Manage index configuration |

field

protosearch.Field is a message with the following fields:

| Field | Type | Description |
|---|---|---|
| name | string | Rename a field in the mapping. |
| mapping | protosearch.FieldMapping | Define mapping field parameters. |
| target | repeated protosearch.Target | Configure a literal mapping for a specific target. |

The protoc-gen-protosearch plugin compiles these message options to a JSON file containing the document mapping.

The simplest way to annotate a field is:

string uid = 1 [(protosearch.field) = {}];

This will generate a basic field mapping with no parameters except for type. See type inference below.

If you do not annotate a protobuf field with (protosearch.field) options, it will be excluded from the mapping.

name

The name field lets you rename a protobuf field in the compiled mapping.

string uid = 1 [(protosearch.field).name = "user_uid"];

{
  "properties": {
    "user_uid": {
      "type": "keyword"
    }
  }
}

mapping

In most cases, you will need to use mapping to define field parameters. FieldMapping supports the most common mapping parameters with one important difference:

  • It does not support properties, because the plugin supports defining object and nested fields as protobuf message fields.

Certain fields, namely dynamic, index_options, and term_vector, are enums. All provide a default UNSPECIFIED value. The plugin will not output an enum parameter if it has the default UNSPECIFIED value.

If you need to generate a parameter that is not listed in the table below, see target below.

| Field | Type | Description |
|---|---|---|
| type | string | The field type. If omitted, the plugin infers the type from the protobuf field type. |
| analyzer | string | Analyzer used at index time. Applies to text fields. |
| boost | double | Boost the field's relevance score at query time. |
| coerce | bool | Whether to coerce values to the declared mapping type. Applies to numeric and date fields. |
| copy_to | repeated string | Copy this field's value to the named field. |
| doc_values | bool | Whether to store doc values for sorting and aggregation. |
| dynamic | protosearch.Dynamic | How to handle unknown subfields. Applies to object fields. |
| eager_global_ordinals | bool | Whether to load global ordinals at refresh time. |
| enabled | bool | Whether to parse and index the field. |
| fielddata | bool | Whether to use in-memory fielddata for sorting and aggregations. Applies to text fields. |
| fields | map<string, FieldMapping> | A multi-field mapping. |
| format | string | The date format. Applies to date and date_nanos fields. |
| ignore_above | int32 | Do not index strings longer than this length. Applies to keyword fields. |
| ignore_malformed | bool | Ignore invalid values instead of rejecting the document. |
| index_options | protosearch.IndexOptions | Which information to store in the index. Applies to text fields. |
| index_phrases | bool | Whether to index bigrams separately. Applies to text fields. |
| index_prefixes | protosearch.IndexPrefixes | Index term prefixes to speed up prefix queries. Applies to text fields. |
| index | bool | Whether to index the field. |
| meta | map<string, string> | Metadata about the field. |
| normalizer | string | Normalize keyword fields with this normalizer. |
| norms | bool | Whether to store field length norms for scoring. |
| null_value | google.protobuf.Value | Replace explicit null values with this value at index time. |
| position_increment_gap | int32 | A gap inserted between elements in an array to prevent spurious matches. Applies to text fields. |
| search_analyzer | string | Analyzer used at search time. |
| similarity | string | The scoring algorithm. |
| store | bool | Whether to store this field separately from _source. |
| subobjects | bool | Whether dotted field names are interpreted as nested subobjects. |
| term_vector | protosearch.TermVector | Whether to store term vectors. |

dynamic

protosearch.Dynamic is an enum with the following values:

  • DYNAMIC_TRUE
  • DYNAMIC_FALSE
  • DYNAMIC_STRICT
  • DYNAMIC_RUNTIME

index_options

protosearch.IndexOptions is an enum with the following values:

  • INDEX_OPTIONS_DOCS
  • INDEX_OPTIONS_FREQS
  • INDEX_OPTIONS_POSITIONS
  • INDEX_OPTIONS_OFFSETS

index_prefixes

protosearch.IndexPrefixes is a message with the following fields:

| Field | Type | Description |
|---|---|---|
| min_chars | int32 | Minimum prefix length to index. |
| max_chars | int32 | Maximum prefix length to index. |

term_vector

protosearch.TermVector is an enum with the following values:

  • TERM_VECTOR_NO
  • TERM_VECTOR_YES
  • TERM_VECTOR_WITH_POSITIONS
  • TERM_VECTOR_WITH_OFFSETS
  • TERM_VECTOR_WITH_POSITIONS_OFFSETS
  • TERM_VECTOR_WITH_POSITIONS_PAYLOADS
  • TERM_VECTOR_WITH_POSITIONS_OFFSETS_PAYLOADS

target

The target field gives you complete control over how a protobuf field compiles to a mapping property.

It is a message with the following fields:

| Field | Type | Description |
|---|---|---|
| label | string | A human-readable label used to target that particular mapping with --protosearch_opt=target=<label>. |
| json | string | A literal JSON string containing the mapping. |

Use this to define more complex mapping types, or specify parameters that are not supported in FieldMapping. You can also use this to define mappings for different clusters or vendors. You can specify this field more than once.

For example, you might want to represent a Point object as a point in Elasticsearch and an xy_point in OpenSearch. You can create targets for both mappings:

Point origin = 1 [(protosearch.field) = {
  target: {
    label: "elasticsearch"
    json: '{"type": "point"}'
  }
  target: {
    label: "opensearch"
    json: '{"type": "xy_point"}'
  }
}];

With --protosearch_opt=target=elasticsearch:

{
  "properties": {
    "origin": {
      "type": "point"
    }
  }
}

With --protosearch_opt=target=opensearch:

{
  "properties": {
    "origin": {
      "type": "xy_point"
    }
  }
}

If the requested target does not match any label defined on a field, the plugin falls back on the common mapping parameters for that field.
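The selection and fallback rule can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not the plugin's code: the function name `select_mapping`, the `(label, json)` pair representation, and the `common` fallback value are all assumptions made for the example.

```python
import json


def select_mapping(targets, common_mapping, label=None):
    """Pick the literal mapping whose label matches, else fall back.

    `targets` is a list of (label, json_string) pairs, mirroring the
    repeated `target` field. `common_mapping` stands in for whatever the
    plugin would build from the common mapping parameters (hypothetical).
    """
    if label is not None:
        for target_label, literal in targets:
            if target_label == label:
                # A matching target replaces the common mapping entirely.
                return json.loads(literal)
    # No target requested, or no label matched: use the common mapping.
    return common_mapping


targets = [
    ("elasticsearch", '{"type": "point"}'),
    ("opensearch", '{"type": "xy_point"}'),
]
# Assumed fallback for a message-typed field with no explicit parameters.
common = {"type": "object"}

print(select_mapping(targets, common, "opensearch"))
print(select_mapping(targets, common, "unknown-label"))
```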

index

protosearch.Index is a message with the following fields:

| Field | Type | Description |
|---|---|---|
| mapping | protosearch.IndexMapping | Define index mapping parameters. |

mapping

protosearch.IndexMapping is a message with the following fields:

| Field | Type | Description |
|---|---|---|
| date_detection | bool | Whether to detect date strings as date fields. |
| dynamic | protosearch.Dynamic | How to handle unknown fields. |
| dynamic_date_formats | repeated string | Date formats to use for dynamic date detection. |
| _field_names | protosearch.IndexFieldNames | Controls the _field_names metadata field. |
| _meta | map<string, string> | Metadata about the index mapping. |
| numeric_detection | bool | Whether to detect numeric strings as numeric fields. |
| _routing | protosearch.IndexRouting | Controls the _routing metadata field. |
| _source | protosearch.IndexSource | Controls the _source metadata field. |

dynamic uses the same protosearch.Dynamic enum as field.mapping.dynamic.

_field_names

protosearch.IndexFieldNames is a message with the following fields:

| Field | Type | Description |
|---|---|---|
| enabled | bool | Whether to enable the _field_names metadata field. |

_routing

protosearch.IndexRouting is a message with the following fields:

| Field | Type | Description |
|---|---|---|
| required | bool | Whether to require routing for all document operations. |

_source

protosearch.IndexSource is a message with the following fields:

| Field | Type | Description |
|---|---|---|
| compress | bool | Whether to compress stored source data. OpenSearch only. |
| compress_threshold | string | Minimum source size to trigger compression. OpenSearch only. |
| enabled | bool | Whether to store the _source field. |
| excludes | repeated string | Fields to exclude from the stored _source. |
| includes | repeated string | Fields to include in the stored _source. |
| mode | protosearch.SourceMode | How to store the _source field. |

mode

protosearch.SourceMode is an enum with the following values:

  • SOURCE_MODE_DISABLED
  • SOURCE_MODE_STORED
  • SOURCE_MODE_SYNTHETIC

Type inference

If type is not specified, protoc-gen-protosearch will infer a field type from the protobuf type.

| Protobuf | Elasticsearch |
|---|---|
| string | keyword |
| bool | boolean |
| int32, sint32, sfixed32 | integer |
| uint32, fixed32 | long |
| int64, sint64, sfixed64 | long |
| uint64, fixed64 | unsigned_long |
| float | float |
| double | double |
| bytes | binary |
| message | object |
| enum | keyword |
| google.protobuf.Timestamp | date |
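The table can be read as a simple lookup in which an explicit type always wins. The Python sketch below restates it that way. The names `INFERRED_TYPES` and `infer_type` are hypothetical, and keying the table on strings such as "message" and "enum" is a simplification for the example rather than a description of how the plugin inspects descriptors.

```python
# Restatement of the type inference table above as a lookup.
INFERRED_TYPES = {
    "string": "keyword",
    "bool": "boolean",
    "int32": "integer",
    "sint32": "integer",
    "sfixed32": "integer",
    "uint32": "long",
    "fixed32": "long",
    "int64": "long",
    "sint64": "long",
    "sfixed64": "long",
    "uint64": "unsigned_long",
    "fixed64": "unsigned_long",
    "float": "float",
    "double": "double",
    "bytes": "binary",
    "message": "object",
    "enum": "keyword",
    "google.protobuf.Timestamp": "date",
}


def infer_type(protobuf_type, explicit_type=None):
    """Return the mapping type for a field.

    An explicit `type` in the field mapping takes precedence; otherwise
    the type is looked up from the protobuf type. Unknown keys raise
    KeyError in this sketch.
    """
    if explicit_type:
        return explicit_type
    return INFERRED_TYPES[protobuf_type]


print(infer_type("uint32"))
print(infer_type("string", explicit_type="text"))
```

Note that uint32 and fixed32 map to long rather than integer, because their maximum value does not fit in a signed 32-bit integer.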

Diagnostics

The plugin validates some field options and collects diagnostics during compilation. Errors (EXXX) are fatal: protoc exits with an error code and produces no output. The plugin prints warnings (WXXX) to standard error.

Errors

E001

The specified value is invalid for this parameter. The plugin will report the reason.

E002

target.json is not valid JSON.

E003

target.json is not a JSON object.
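To check a target.json literal before running the plugin, the two conditions behind E002 and E003 can be approximated as follows. This is a sketch using Python's standard json module. The function name `check_target_json` is hypothetical, and the plugin's own JSON parser may differ from Python's in edge cases.

```python
import json


def check_target_json(literal):
    """Return "E002", "E003", or None for a target.json string."""
    try:
        value = json.loads(literal)
    except json.JSONDecodeError:
        # The string does not parse as JSON at all.
        return "E002"
    if not isinstance(value, dict):
        # Valid JSON, but an array, string, number, etc., not an object.
        return "E003"
    return None


print(check_target_json('{"type": "xy_point"}'))
print(check_target_json('{"type": '))
print(check_target_json('["xy_point"]'))
```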

Warnings

W001

name is invalid.

Names must match the pattern [@a-z][a-z0-9_]*(\.[a-z0-9_]+)*. These are all allowed names:

@timestamp
foo
foo_bar
foo.bar.baz
foo_123
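The pattern can be tried out directly. The sketch below assumes the pattern must match the entire name (the text above does not say whether it is anchored), and `is_valid_name` is a hypothetical helper, not part of the plugin.

```python
import re

# Pattern copied from W001; fullmatch() anchors it to the whole name,
# which is an assumption about how the plugin applies it.
NAME_PATTERN = re.compile(r"[@a-z][a-z0-9_]*(\.[a-z0-9_]+)*")


def is_valid_name(name):
    """Return True if `name` matches the W001 pattern in full."""
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["@timestamp", "foo.bar.baz", "Foo", "1foo", "foo..bar"]:
    print(candidate, is_valid_name(candidate))
```

Under that assumption, names with uppercase letters, a leading digit, or an empty path segment (such as foo..bar or a trailing dot) would trigger W001.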

W002

The target label does not correspond to a known target.

protoc-gen-protosearch

With protoc-gen-protosearch installed on your $PATH, you can compile mappings like so:

protoc -I proto/ --plugin=protoc-gen-protosearch --protosearch_out=. proto/example/article.proto

Specify --protosearch_opt=target=<label> to compile the mapping for a specific target.

protoc -I proto/ --plugin=protoc-gen-protosearch --protosearch_out=. --protosearch_opt=target=<label> proto/example/article.proto

The plugin pretty-prints output by default. Specify --protosearch_opt=pretty=false to disable this.
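When invoking protoc from a build script, the command line can be assembled programmatically. The helper below is a hypothetical sketch that only builds the argument list from the flags shown above and does not run protoc. Passing target and pretty as two separate --protosearch_opt flags is an assumption, since each option is only shown on its own here.

```python
def protoc_args(proto_file, include_dir="proto/", out_dir=".",
                target=None, pretty=True):
    """Build a protoc argument list for protoc-gen-protosearch."""
    args = [
        "protoc",
        "-I", include_dir,
        "--plugin=protoc-gen-protosearch",
        f"--protosearch_out={out_dir}",
    ]
    if target is not None:
        # Compile the mapping for one labeled target.
        args.append(f"--protosearch_opt=target={target}")
    if not pretty:
        # Pretty-printing is the default; only pass the flag to disable it.
        args.append("--protosearch_opt=pretty=false")
    args.append(proto_file)
    return args


print(" ".join(protoc_args("proto/example/article.proto", target="opensearch")))
```

The resulting list can be handed to subprocess.run as-is, which avoids shell quoting issues.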