This sample application demonstrates how to use Vespa predicate fields for indexing boolean document constraints. A predicate is a specification of a boolean constraint in the form of a boolean expression. Vespa's predicate fields are used to implement targeted advertising systems at scale.
For example, this predicate uses three target properties, or attributes (not to be confused with Vespa attributes):
gender in ['male'] and age in [30..40] and income in [200..50000]
This sample application demonstrates an imaginary two-sided dating marketplace where users can control visibility in the search or recommendation result page. For example, Bob only wants to be displayed for users that satisfy the following predicate:
gender in ['male'] and age in [20..40] and hobby in ['climbing', 'sports']
Users who do not satisfy those properties would not be able to see Bob's profile in the marketplace. Like Bob, Alice is picky: she only wants to be shown to males in their thirties with a high income (measured in thousands):
gender in ['male'] and age in [30..40] and income in [200..50000]
Both Bob and Alice are indexed in the marketplace's index system (powered by Vespa of course) as Vespa documents.
The predicate expression in the document determines which queries (other users) they would be retrieved for.
The marketplace owner is responsible for managing the available targeting properties (e.g., gender, age and income) and,
at query or recommendation time, for setting all known properties of the query-side user.
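To make the matching rule concrete, here is a minimal Python sketch of how a predicate like Alice's could be evaluated against the properties sent with a query. It is an illustration only, not how Vespa implements predicate search. The Predicate class and the matches function are hypothetical names, and the sketch assumes that a property missing from the query fails the corresponding constraint.

```python
from dataclasses import dataclass, field


@dataclass
class Predicate:
    """Conjunction of set-membership and inclusive range constraints."""
    sets: dict = field(default_factory=dict)    # e.g. {"gender": {"male"}}
    ranges: dict = field(default_factory=dict)  # e.g. {"age": (30, 40)}


def matches(predicate, attributes, range_attributes):
    """Return True if the query-side properties satisfy every constraint."""
    for name, allowed in predicate.sets.items():
        # Assumption: a property absent from the query fails the constraint.
        if not allowed & set(attributes.get(name, [])):
            return False
    for name, (low, high) in predicate.ranges.items():
        value = range_attributes.get(name)
        if value is None or not low <= value <= high:
            return False
    return True


# Alice's predicate from the example above.
alice = Predicate(
    sets={"gender": {"male"}},
    ranges={"age": (30, 40), "income": (200, 50000)},
)

print(matches(alice, {"gender": ["male"]}, {"age": 32, "income": 3000}))  # True
print(matches(alice, {"gender": ["male"]}, {"age": 32, "income": 100}))   # False
```

The second call fails only on the income range, which shows that every constraint in the conjunction has to hold for the document to be retrieved.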
We also demonstrate how the marketplace can implement query-side filters over regular Vespa fields, so that a user, Karen,
can also specify regular query-side constraints (for example, searching for users in a certain age group).
This way, the marketplace system has two-sided filtering.
Imagine if deployed ad systems also allowed users to put constraints on the ads shown,
and not just the other way around.
Finally, we demonstrate how the marketplace can rank users using marketplace business metrics like cost-per-click (CPC) and user interest embeddings.
Requirements:
- Docker Desktop installed and running. 6 GB available memory for Docker is recommended. Refer to Docker memory for details and troubleshooting
- Alternatively, deploy using Vespa Cloud
- Operating system: Linux, macOS or Windows 10 Pro (Docker requirement)
- Architecture: x86_64 or arm64
- Homebrew to install Vespa CLI, or download a Vespa CLI release from GitHub releases.
- Java 17 installed.
- Apache Maven. This sample app uses custom Java components and Maven is used to build the application.
Validate the Docker resource settings; available memory should be at least 4 GB:
$ docker info | grep "Total Memory"
or
$ podman info | grep "memTotal"
Install Vespa CLI:
$ brew install vespa-cli
For local deployment using the docker image:
$ vespa config set target local
Pull and start the vespa docker container image:
$ docker pull vespaengine/vespa
$ docker run --detach --name vespa --hostname vespa-container \
  --publish 127.0.0.1:8080:8080 --publish 127.0.0.1:19071:19071 \
  vespaengine/vespa
Verify that the configuration service (deploy API) is ready:
$ vespa status deploy --wait 300
Download this sample application:
$ vespa clone examples/predicate-fields my-app && cd my-app
Build the sample app:
$ mvn package -U
Deploy the app:
$ vespa deploy --wait 300
It is possible to deploy this app to Vespa Cloud.
The users in the imaginary marketplace:
{"put": "id:s:user::alice", "fields": {"target": "gender in ['male'] and age in [30..40] and income in [200..50000]", "age": 23, "gender": ["female"]}}
{"put": "id:s:user::bob", "fields": {"target": "gender in ['male'] and age in [20..40] and hobby in ['climbing', 'sports']", "age":41, "gender":["male"]}}
{"put": "id:s:user::karen", "fields": {"target": "gender in ['male'] and age in [30..55]", "age":55, "gender": ["female"]}}
{"put": "id:s:user::mia", "fields": {"target": "gender in ['male'] and age in [50..80]", "age":56,"gender": ["female"]}}
target is the predicate field; the rest are regular fields.
The target predicate field specifies which users the indexed user wants to be shown to,
and the regular fields like age and gender can be searched by other users in the marketplace.
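As a rough illustration of how such feed lines could be produced, the following Python sketch builds put operations in the JSON Lines layout shown above. The put_operation helper is a hypothetical name, and the sketch assumes the document id scheme id:s:user::<name> seen in the examples.

```python
import json


def put_operation(name, target, age, gender):
    """Build one put operation for a user document."""
    return {
        "put": f"id:s:user::{name}",
        "fields": {"target": target, "age": age, "gender": gender},
    }


users = [
    ("karen", "gender in ['male'] and age in [30..55]", 55, ["female"]),
    ("mia", "gender in ['male'] and age in [50..80]", 56, ["female"]),
]

# One JSON object per line, matching the layout of the examples above.
lines = [json.dumps(put_operation(*user)) for user in users]
print("\n".join(lines))
```

The predicate is stored as a plain string in the target field, while age and gender are ordinary field values.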
Feed the documents:
$ vespa feed users.jsonl
To update one document:
$ vespa document -v user.json
A user, Ronald, enters the marketplace home page and the marketplace knows the following properties about Ronald:
- gender: male
- age: 32
- income: 3000
The marketplace uses these properties when matching against the index of users using the predicate query operator:
$ vespa query 'yql=select * from sources * where predicate(target, {"gender":["male"]}, {"age":32, "income": 3000})'
The above request retrieves both Karen and Alice, as their target predicates match Ronald's properties.
If Ronald's income estimate drops to 100K, Alice will no longer match, since Alice
has specified a picky income limitation:
$ vespa query 'yql=select * from sources * where predicate(target, {"gender":["male"]}, {"age":32, "income": 100})'
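An application would typically assemble this query from the properties it knows about the user. Below is a minimal Python sketch of that step. The predicate_yql helper is a hypothetical name, and the sketch assumes both the attribute map and the range map are non-empty.

```python
import json


def predicate_yql(field, attributes, range_attributes):
    """Return a YQL query string that uses the predicate operator."""
    attrs = json.dumps(attributes)         # string-valued properties
    ranges = json.dumps(range_attributes)  # integer-valued properties
    return f"select * from sources * where predicate({field}, {attrs}, {ranges})"


ronald = predicate_yql("target", {"gender": ["male"]}, {"age": 32, "income": 3000})
print(ronald)
```

The resulting string can be passed as the yql parameter of vespa query, as in the commands above.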
Another user, Jon, enters the marketplace's search page. The marketplace knows the following properties about Jon:
- gender: male
- age: 32
- income: 100
- hobby: sports
The marketplace search page will fill in the known properties and perform a search against the index of users:
$ vespa query 'yql=select * from sources * where predicate(target, {"gender":["male"], "hobby":["sports"]}, {"age":32, "income": 100})'
The query returns both Bob and Karen. Jon is mostly interested in men, so the marketplace can
add a filter on the regular gender field using standard YQL filter syntax,
combining the predicate with the condition gender contains "male" using the and operator:
$ vespa query 'yql=select * from sources * where predicate(target, {"gender":["male"], "hobby":["sports"]}, {"age":32, "income": 100}) and gender contains "male"'
This is an example of two-sided filtering: both the searching user and the indexed user have constraints.
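The two filtering stages can be simulated in a few lines of Python. This is a simplified model of the four sample users, not Vespa's implementation. The satisfies and search functions are hypothetical names, each property holds a single value, and a property the searcher does not provide is assumed never to match.

```python
# Document side: each user's target predicate plus regular fields.
# Sets are membership constraints, tuples are inclusive ranges.
USERS = {
    "alice": {"target": {"gender": {"male"}, "age": (30, 40), "income": (200, 50000)},
              "gender": ["female"]},
    "bob": {"target": {"gender": {"male"}, "age": (20, 40), "hobby": {"climbing", "sports"}},
            "gender": ["male"]},
    "karen": {"target": {"gender": {"male"}, "age": (30, 55)}, "gender": ["female"]},
    "mia": {"target": {"gender": {"male"}, "age": (50, 80)}, "gender": ["female"]},
}


def satisfies(target, props):
    """Check the searching user's properties against one target predicate."""
    for name, constraint in target.items():
        value = props.get(name)
        if value is None:
            return False  # Assumption: unknown properties never match.
        if isinstance(constraint, tuple):
            low, high = constraint
            if not low <= value <= high:
                return False
        elif value not in constraint:
            return False
    return True


def search(props, gender_filter=None):
    """Apply the document-side predicate, then the optional query-side filter."""
    hits = []
    for name, doc in USERS.items():
        if not satisfies(doc["target"], props):
            continue
        if gender_filter is not None and gender_filter not in doc["gender"]:
            continue
        hits.append(name)
    return sorted(hits)


jon = {"gender": "male", "age": 32, "income": 100, "hobby": "sports"}
print(search(jon))                        # ['bob', 'karen']
print(search(jon, gender_filter="male"))  # ['bob']
```

The first stage expresses what the indexed users want, the second what the searching user wants. A document is returned only if it passes both.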
Predicate fields control matching and, as we have seen from the above examples, can also be combined with regular query filters.
The combination of document-side predicates and query filters determines which documents are returned, and also which documents (users) are exposed to Vespa's ranking framework.
Feed data with user profile embedding vectors and the marketplace's business metric, the user cpc:
$ vespa feed users_with_ranking_data.jsonl
Ronald enters the marketplace home page again:
- gender: male
- age: 32
- income: 3000
- interest embedding: a representation based on past user-to-user interactions or explicit preferences
And the marketplace runs a recommendation query to display users for Ronald:
$ vespa query 'yql=select * from sources * where predicate(target, {"gender":["male"]}, {"age":32, "income": 3000})'
Notice that we match both Alice and Karen, but Karen is ranked higher because she has paid more:
her cpc score is higher. Notice also that the relevance is now non-zero; in all the previous examples, the ordering
of the users was non-deterministic. The ranking formula is expressed in the default rank-profile
of the user schema.
If we now add personalization to the ranking mix, Alice is ranked higher than Karen, as Alice is closer to Ronald in the interest embedding vector space.
The following query combines the predicate with the nearestNeighbor
query operator. The marketplace sends the interest embedding vector representation of Ronald with the query
as a query tensor.
$ vespa query 'yql=select documentid from sources * where (predicate(target, {"gender":["male"]}, {"age":32, "income": 3000})) and ({targetHits:10}nearestNeighbor(profile,profile))' \
'input.query(profile)=[0.10958350208504841, 0.4642735718813399, 0.7250558657395969, 0.1689946673589695]'
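For an application that talks to Vespa over HTTP instead of the CLI, the same request could be assembled as a JSON body. The Python sketch below only builds and prints that body. It assumes the query API accepts a JSON object posted to http://localhost:8080/search/ (the port published by the docker run command earlier) and that the query tensor can be given as a plain list of numbers.

```python
import json

# Ronald's interest embedding, copied from the query above.
profile = [0.10958350208504841, 0.4642735718813399,
           0.7250558657395969, 0.1689946673589695]

yql = (
    'select documentid from sources * where '
    '(predicate(target, {"gender":["male"]}, {"age":32, "income": 3000})) '
    'and ({targetHits:10}nearestNeighbor(profile,profile))'
)

# Assumption: the query API accepts these parameters as a JSON object.
body = {"yql": yql, "input.query(profile)": profile}
print(json.dumps(body, indent=2))
```

Keeping the embedding as a list in the body avoids formatting the tensor by hand, and the predicate and nearestNeighbor clauses stay exactly as in the CLI query.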
When done, remove the container:
$ docker rm -f vespa