Conversation
# Conflicts: # search/docker-compose.yml # search/search/es.py # search/search/harvester.py # search/search/schemas/pieces.py
BREAKING CHANGE: changes docker-compose
.env.example
Outdated
| S3_ACCESS_KEY=minioadmin | ||
| S3_SECRET_KEY=minioadmin | ||
|
|
||
| S3_START_PATH=annotation |
There was a problem hiding this comment.
What exactly these S3_START_PATH and S3_TEXT_PATH are doing? Is it required for all services?
There was a problem hiding this comment.
these are folder names in Minio. I guess if service works directly with S3 it should have known it.
There was a problem hiding this comment.
@khyurri
I have cleaned up all unnecessary properties. here's the scope of changes:
- moved search configs from root env to search env
- extracted embeddings service
- adjust docker-compose & README notes
.env.example
Outdated
| MANIFEST=manifest.json | ||
| TEXT_PIECES_PATH=/pieces | ||
| INDEXATION_PATH=/indexation | ||
| JOBS_URL=http://jobs |
There was a problem hiding this comment.
Use JOBS_SERVICE_* in search microservice to connect co jobs microservices
.env.example
Outdated
| ES_HOST=badgerdoc-elasticsearch | ||
| ES_PORT=9200 | ||
| APP_TITLE=Badgerdoc Search | ||
| ANNOTATION_URL=http://annotation |
There was a problem hiding this comment.
Use ANNOTATION_SERVICE_* in search microservice to connect to annotation microservice
.env.example
Outdated
| # Search Service | ||
| ES_HOST=badgerdoc-elasticsearch | ||
| ES_PORT=9200 | ||
| APP_TITLE=Badgerdoc Search |
There was a problem hiding this comment.
Do we need to use APP_TITLE in search microservice? Or it's badgerdoc configuration?
.env.example
Outdated
| ES_PORT=9200 | ||
| APP_TITLE=Badgerdoc Search | ||
| ANNOTATION_URL=http://annotation | ||
| ANNOTATION_CATEGORIES=/categories |
There was a problem hiding this comment.
You can move URI inside search microservice, we really don't need to configure path for all services globally
There was a problem hiding this comment.
Ok, I'd agree. Am I right to put in .env for microservice is enough?
.env.example
Outdated
| ANNOTATION_URL=http://annotation | ||
| ANNOTATION_CATEGORIES=/categories | ||
| ANNOTATION_CATEGORIES_SEARCH=/categories/search | ||
| MANIFEST=manifest.json |
There was a problem hiding this comment.
Why this MANIFEST exists globally? Is it for search microservice only?
.env.example
Outdated
| JOBS_URL=http://jobs | ||
| JOBS_SEARCH=/jobs/search | ||
| COMPUTED_FIELDS=["job_id", "category"] | ||
| EMBED_URL=http://embeddings:3334/api/use |
There was a problem hiding this comment.
If EMBED is new microservice, ensure configuration aligned with other host configuration, for example:
EMBEDINGS_SERVICE_SCHEME=http
EMBEDINGS_SERVICE_HOST=badgerdoc-embeddings
EMBEDINGS_SERVICE_PORT=8080
docker-compose-dev.yaml
Outdated
|
|
||
| badgerdoc-elasticsearch: | ||
| container_name: badgerdoc-elasticsearch | ||
| image: amazon/opendistro-for-elasticsearch:latest |
There was a problem hiding this comment.
Better to bind exact version here
docker-compose-dev.yaml
Outdated
| # file for local needs) | ||
| # - 8082 for Keycloak. | ||
| # - 8083 for BadgerDoc web | ||
| # 8084 for BadgerDoc search |
There was a problem hiding this comment.
What exactly exposed in 8084?
There was a problem hiding this comment.
this is search API microservice
There was a problem hiding this comment.
web talks to that service via RP
docker-compose-dev.yaml
Outdated
| networks: | ||
| - badgerdoc | ||
| ports: | ||
| - "0.0.0.0:3334:3334" |
There was a problem hiding this comment.
Why do you expose port here? Do we really need this service outside of reproxy ?
docker-compose-dev.yaml
Outdated
| - ROOT_PATH=/search | ||
| working_dir: /opt/search | ||
| ports: | ||
| - 8084:8080 |
There was a problem hiding this comment.
All microservices proxies from reproxy to keep the same port for all microservices. You need to remove ports bindings here and add EXPOSE to dockerfile
search/create_dataset.py
Outdated
| import csv | ||
| from opensearchpy import OpenSearch, helpers | ||
|
|
||
| PATH_PRODUCTS_DATASET = "data/" |
There was a problem hiding this comment.
All hardcode must be moved to configuration or global configuration
search/create_dataset.py
Outdated
| annotation_dataset = load_annotation_dataset() | ||
|
|
||
| #### Use the embedding model to calculate vectors for all annotation texts | ||
| print("Computing embeddings for %d sentences" % len(annotation_dataset)) |
There was a problem hiding this comment.
Use logging instead of print
search/create_dataset.py
Outdated
| es = OpenSearch([{"host": ES_HOST, "port": ES_PORT}]) | ||
|
|
||
| #### load test data set | ||
| annotation_dataset = load_annotation_dataset() |
There was a problem hiding this comment.
Is it expected, that we load annotation_dataset globally here?
search/embeddings/main.py
Outdated
| @@ -0,0 +1,66 @@ | |||
| from flask import request | |||
There was a problem hiding this comment.
We don't use flask in Badgerdoc, use Fast API instead
khyurri
left a comment
There was a problem hiding this comment.
Very good job and great progress, however PR must be changed according current Badgerdoc requirements for development!
Thank you!
No description provided.