Ultra-fast search and analytics engine purpose-built for the Žejn GROUP - Codemania (TL - Hack) hackathon in January 2021.
- Detailed competition requirements and instructions can be found in INSTRUCTIONS.md.
- GitHub Project Repository - pinkstack/cp-finder
- Author: Oto Brglez, @otobrglez
In order to achieve incredible speed, performance, responsiveness and scalability, I'm proposing that the system for this challenge uses the concepts of CQRS - Command Query Responsibility Segregation - a clear separation of the read and write sides of the business domain.
With this project I was aiming to address the following challenges:

- How to have incredibly fast writes?
  The Akka HTTP framework routes requests to an internal `PeopleActor`; that actor then instantly spawns new child actors. Those child actors have enough time to process requests (in the form of commands) before they are written to LevelDB storage.
- How to have incredibly fast updates?
  `PeopleActor` keeps `N` live persistent `PersonActor`s alive, so any update, change or delete will either hit a live actor or spawn a new one to process the designated command. `PersonActor` is a domain actor that represents one core entity of the system.
- How to have fast endpoints for analytics?
  `AggregateActor` is another actor that effectively represents the "read side". Internally it runs a "Persistence Query", a query that pulls events/snapshots from LevelDB storage and in parallel generates "aggregates" for the user to consume. These aggregates are then continuously updated with changes from the "write side", much like a live feed. So whenever a user fetches statistics from these analytical endpoints, they are reading an "in-memory" representation.
- Trade-offs. In order to achieve these results, some trade-offs had to be made. There is a delay between the time a user writes on the "write" side and the time the "analytics" side reflects that change. The system follows the so-called patterns of "eventual consistency" and event sourcing.
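The write/read split described above can be sketched without Akka; this toy example (all names are mine, not from the code base) shows commands being persisted as journal events and a read-side aggregate catching up by replaying them:

```scala
// Toy sketch of the CQRS split (illustrative only, no Akka): the write side
// appends events to a journal (standing in for LevelDB), and the read side
// folds those events into an in-memory aggregate that queries read directly.
sealed trait Event
final case class PersonTested(id: String, result: String) extends Event

final class Journal {
  private var events = Vector.empty[Event] // stand-in for the LevelDB journal
  def persist(e: Event): Unit = events :+= e
  def replay: Vector[Event] = events
}

final class PositiveCountAggregate {
  private var count = 0 // the in-memory "read side" representation
  def update(e: Event): Unit = e match {
    case PersonTested(_, "P") => count += 1
    case _                    => ()
  }
  def positives: Int = count
}

object CqrsDemo {
  def main(args: Array[String]): Unit = {
    val journal   = new Journal
    val aggregate = new PositiveCountAggregate

    // Write side: commands are persisted as events.
    journal.persist(PersonTested("1", "P"))
    journal.persist(PersonTested("2", "N"))
    journal.persist(PersonTested("3", "P"))

    // Read side: the aggregate catches up by replaying the event stream; in
    // the real system this happens continuously via a Persistence Query,
    // hence the eventual-consistency delay between the two sides.
    journal.replay.foreach(aggregate.update)
    println(aggregate.positives) // prints 2
  }
}
```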
Although in the real world these tests would be executed with something like Gatling to stress test the whole application, I resorted to curl to get some measurements after the system had already loaded the whole test CSV file (11,000 records).
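The `-w "@curl-format.txt"` flag below refers to a curl write-out template. The repository's actual file isn't shown here, but a template along these lines produces the timing output that follows (the `%{...}` placeholders are curl's standard write-out variables):

```
   time_namelookup:  %{time_namelookup}s\n
      time_connect:  %{time_connect}s\n
   time_appconnect:  %{time_appconnect}s\n
  time_pretransfer:  %{time_pretransfer}s\n
     time_redirect:  %{time_redirect}s\n
time_starttransfer:  %{time_starttransfer}s\n
        ----------\n
        time_total:  %{time_total}s\n
```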
Creating a person with POST on the /people endpoint:

```bash
curl --header "Content-Type: application/json" \
  --request POST \
  --data '{"id":"11000","gender":"M","birthDate":"23.06.1953","isoCountry":"gb","testDate":"1.03.2021","testResult":"P","intervention":"quarantine"}' \
  -w "@curl-format.txt" \
  -o /dev/null \
  -s http://localhost:8080/people
```

```
time_namelookup: 0.011883s
time_connect: 0.012124s
time_appconnect: 0.000000s
time_pretransfer: 0.012156s
time_redirect: 0.000000s
time_starttransfer: 0.000000s
----------
time_total: 0.017672s
```
Fetching the person with GET:

```bash
curl --header "Content-Type: application/json" \
  -w "@curl-format.txt" \
  -o /dev/null \
  -s http://localhost:8080/people/11006
```

```
time_namelookup: 0.004118s
time_connect: 0.004303s
time_appconnect: 0.000000s
time_pretransfer: 0.004331s
time_redirect: 0.000000s
time_starttransfer: 0.005596s
----------
time_total: 0.005677s
```
PATCH is used to update a person via its id and a JSON payload:

```bash
curl --header "Content-Type: application/json" \
  --request PATCH \
  --data '{"gender":"F","birthDate":"23.06.1953","isoCountry":"gb","testDate":"1.03.2021","testResult":"P","intervention":"quarantine"}' \
  -w "@curl-format.txt" \
  -o /dev/null \
  -s http://localhost:8080/people/11006
```

```
time_namelookup: 0.004437s
time_connect: 0.004721s
time_appconnect: 0.000000s
time_pretransfer: 0.004759s
time_redirect: 0.000000s
time_starttransfer: 0.000000s
----------
time_total: 0.006930s
```
... and now the impressive part, the numbers and stats endpoints:

- `GET /analytics/positiveByDates` - all positive cases, grouped by date.

  ```
  time_namelookup: 0.004067s
  time_connect: 0.004297s
  time_appconnect: 0.000000s
  time_pretransfer: 0.004332s
  time_redirect: 0.000000s
  time_starttransfer: 0.006747s
  ----------
  time_total: 0.006836s
  ```

- `GET /analytics/positiveByGenderAndState` - positive cases, grouped by gender and "state".

  ```
  time_namelookup: 0.004433s
  time_connect: 0.004663s
  time_appconnect: 0.000000s
  time_pretransfer: 0.004700s
  time_redirect: 0.000000s
  time_starttransfer: 0.006081s
  ----------
  time_total: 0.006171s
  ```

- `GET /analytics/quarantine` - number of cases, grouped by country and by days left in quarantine.

  ```
  time_namelookup: 0.004133s
  time_connect: 0.004374s
  time_appconnect: 0.000000s
  time_pretransfer: 0.004409s
  time_redirect: 0.000000s
  time_starttransfer: 0.005778s
  ----------
  time_total: 0.005875s
  ```
Fast,... 🐇
The service can run as a fat JAR on top of any modern JVM or via the pre-packaged Docker image.
🐇 Although out of the scope of the assignment, this project could easily be compiled with GraalVM to also run as a "native-image"; that would further reduce the memory footprint and improve boot-up time and possibly performance.
- Install any modern JDK; it is suggested to use SDKMAN with OpenJDK (14). Development was done on `openjdk version "14.0.2" 2020-07-14`.
- Install Scala with the Scala Build Tool (SBT). Scala version used: 2.13.4; SBT version used: 1.4.6.
To run the server, please use the following SBT command; it will spawn the server on http://127.0.0.1:8080 and put everything online.

```bash
$ sbt run
```

To compile the project into a fat JAR, invoke the assembly task:

```bash
$ sbt assembly
```

The assembly task will compile everything and build a JAR that can be invoked like so from the root of the project:

```bash
$ java -jar cp-finder.jar
```

The server also supports the following environment variables:

- `PORT=8080` - HTTP port the service will listen on.
- `JOURNAL_LEVELDB_DIR=tmp/journal` - file path of the LevelDB embedded storage directory.
- `SNAPSHOT_DIR=tmp/snapshots` - file path of the snapshots storage directory.
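Resolving these variables with fallbacks can be sketched in plain Scala (the variable names and defaults are from the list above; the `Config` object itself is hypothetical, not the project's actual code):

```scala
// Minimal sketch: resolve service configuration from environment variables,
// falling back to the documented defaults when a variable is unset.
object Config {
  def envOrDefault(name: String, default: String): String =
    sys.env.getOrElse(name, default)

  val port: Int           = envOrDefault("PORT", "8080").toInt
  val journalDir: String  = envOrDefault("JOURNAL_LEVELDB_DIR", "tmp/journal")
  val snapshotDir: String = envOrDefault("SNAPSHOT_DIR", "tmp/snapshots")
}
```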
They can also be passed prior to the java command invocation, i.e.:

```bash
$ PORT=3030 java -jar cp-finder.jar
```

To run the full test suite, including the integration tests, please run:

```bash
$ sbt test
```

To populate the service with seed data from CSV, the script bin/feed-csv.rb can be used like so:

```bash
$ ./bin/feed-csv.rb data/covidPeople.csv
```

Creating a person:

```
POST http://127.0.0.1:8080/people
Content-Type: application/json

{
  "id": 42,
  "gender": "M",
  "isoCountry": "svn",
  "testResult": "P",
  "intervention": "quarantine",
  "birthDate": "23.06.1953",
  "testDate": "31.12.2020"
}
```

Fetching a person:

```
GET http://127.0.0.1:8080/people/42
```

Updating a person:

```
PATCH http://127.0.0.1:8080/people/42
Content-Type: application/json

{
  "gender": "M",
  "isoCountry": "svn",
  "testResult": "N",
  "intervention": "quarantine",
  "birthDate": "23.06.1953",
  "testDate": "31.12.2020"
}
```

Deleting a person:

```
DELETE http://127.0.0.1:8080/people/42
```

Number of all positive cases:

```
GET http://127.0.0.1:8080/analytics/positive
```

```json
{
  "count": 5457
}
```

Number of positive cases grouped by gender:

```
GET http://127.0.0.1:8080/analytics/positiveByGender
```

```json
{
  "female": 2779,
  "male": 2678
}
```

Number of positive test cases grouped by gender and their current state (quarantine, medical care, hospitalized):
```
GET http://127.0.0.1:8080/analytics/positiveByGenderAndState
```

```json
{
  "female": {
    "hospitalized": 919,
    "medical care": 865,
    "quarantine": 995
  },
  "male": {
    "hospitalized": 898,
    "medical care": 916,
    "quarantine": 864
  }
}
```

Number of positive cases grouped by date:

```
GET http://127.0.0.1:8080/analytics/positiveByDates
```

```json
{
  "dates": {
    "8.12.2020": 20,
    "9.01.2021": 12,
    "9.03.2020": 14,
    "9.04.2020": 18,
    "9.05.2020": 20,
    "9.06.2020": 14,
    "9.07.2020": 15,
    "9.08.2020": 14,
    "9.09.2020": 16,
    "9.10.2020": 19,
    "9.11.2020": 12,
    "9.12.2020": 14,
    ...
  }
}
```

This endpoint is to be used for the following requirements:

- The number of positive cases by date for all data.
- Filtering a subset of countries and returning the number of total cases for all the countries.
- The endpoint groups results per country.
```
GET http://127.0.0.1:8080/analytics/positiveByCountryAndDates
```

```json
{
  "countries": {
    "aus": {
      "1.03.2020": 3,
      "2.03.2020": 3,
      "3.03.2020": 3,
      "4.03.2020": 9,
      "5.03.2020": 5,
      "6.03.2020": 2,
      "7.03.2020": 9,
      "8.03.2020": 4,
      "9.03.2020": 2,
      "10.03.2020": 6,
      "11.03.2020": 5,
      "12.03.2020": 6,
      "13.03.2020": 1,
      ...
    }
  }
}
```

This endpoint is to be used for the following requirements:

- The number of individuals currently in quarantine.
- Results are grouped by country.
- Filtering is set to only show cases that are quarantined.
- The time of quarantine is set to 14 days.
- The response is "live" and changes as time passes by.

With the help of the country query parameters, the results can be further filtered.
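The 14-day "days left" bucketing can be sketched in plain Scala (the helper names and sample dates are mine, purely illustrative; in the project the real computation happens on the read side):

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.time.temporal.ChronoUnit

// Illustrative sketch: for each positive case, compute the days left in a
// 14-day quarantine from its test date, drop expired cases, and count cases
// per country per days-left bucket - the shape of /analytics/quarantine.
object QuarantineDemo {
  private val fmt        = DateTimeFormatter.ofPattern("d.MM.yyyy")
  private val quarantine = 14L

  final case class Case(isoCountry: String, testDate: String)

  def daysLeft(c: Case, today: LocalDate): Long =
    quarantine - ChronoUnit.DAYS.between(LocalDate.parse(c.testDate, fmt), today)

  def grouped(cases: Seq[Case], today: LocalDate): Map[String, Map[Long, Int]] =
    cases
      .map(c => c.isoCountry -> daysLeft(c, today))
      .filter { case (_, left) => left >= 0 && left <= quarantine }
      .groupBy(_._1)
      .map { case (country, xs) =>
        country -> xs.groupBy(_._2).map { case (left, ys) => left -> ys.size }
      }

  def main(args: Array[String]): Unit = {
    val today = LocalDate.of(2021, 1, 10)
    val cases = Seq(Case("aus", "1.01.2021"), Case("aus", "5.01.2021"), Case("svn", "30.12.2020"))
    println(grouped(cases, today)) // e.g. Map(aus -> Map(5 -> 1, 9 -> 1), svn -> Map(3 -> 1))
  }
}
```

Because "days left" depends on the current date, recomputing it on each read is what makes the response "live" without any new writes.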
```
GET http://127.0.0.1:8080/analytics/quarantine
```

```json
{
  "countries": {
    "aus": {
      "0": 3,
      "1": 2,
      "2": 1,
      "3": 2,
      "4": 3,
      "5": 1,
      "6": 1,
      "7": 3,
      "8": 1,
      ...
    }
  }
}
```