cs262a-paper/peerdb_description.tex at master · aksp/cs262a-paper · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
\section{PeerDB implementation}

PeerDB is implemented as a library for Meteor web framework.
It provides a declarative way to define database schema for documents and relations between document for your Meteor web application.
Together with specifying relations, you can also specify which fields should be embedded as subdocuments for both forward and backwards relations.
Currently, PeerDB requires programmer to decide if and which fields to embed.
That decision can often be made on program's logic and profiling of queries, in a similar way one would be deciding about existence of an index on a field.
Both embedding and indexes make a trade-off between faster read times on an expense of write times.
In later sections we analyze this trade-off and present some general guidelines and automatic algorithm to help deciding if you want to embed fields or not.

Additionally, PeerDB provides an easy way to define client-side generator fields (fields which value is computed based on values of other fields) and generic triggers, which can be implemented in the client-side language and are run in PeerDB process instead of inside a database management system. This allows easy integration and code reuse with the rest of the application.

It uses an abstraction over MongoDB oplog to implement all above mentioned features.
MongoDB oplog originally serves for replication among multiple MongoDB instances, sending a stream of all changes from master to slaves.
We can connect to this same oplog to observe all changes in the database and determine if a change is connected to a field which is embedded in a related document, or otherwise observed as part of generated fields and triggers.
If this is so, PeerDB issues update queries which update all those fields in subdocuments in related documents.

In this way original queries return immediately.
Writes finish as soon as they would finish without using PeerDB.
PeerDB then in an asynchronous way make data consistent, or better eventually consistent.
So while writes themselves are still as fast as they can be under MongoDB, an important metric to observe is time to consistency, when PeerDB stops sending any more queries to update the data after a modification is made.

Such asynchronous and decoupled architecture allows us to use PeerDB with other programs modifying the database.
We can run PeerDB as a dedicated instance while the rest of the program can be written even in another language or system.
Multiple different systems can all work on the same database and PeerDB will still make sure that data is consistent.
This allows us to support legacy applications and modern web applications in the cloud which are often created from multiple separate services working together on one database through various APIs.

Because PeerDB is decoupled from the database management system itself, it does not interfere with their internal operations.
This makes the whole system more robust because interfaces between components have clean borders, but it leaves some optimizations opportunities for future work where PeerDB could be smarter in update queries it makes based on information available to the database management system internally.

Asynchronous operation addresses issues with possible infinite loops which might happen in some other implementation, for example where updates would be made synchronously inside a post-query hook.
Updates to data by PeerDB updates can lead to further updates by PeerDB if they modify data observed in another relations, generated fields, or triggers.
A loop could occur in a post-query hook implementation if PeerDB updates would run synchronously unconditionally, running queries against other observed fields (but not necessary modifying them) which would then immediately trigger more queries, trigger more, and so on.
By observing modifications of the data directly, instead of hooking into queries, PeerDB does not have to deal with parsing and understanding of queries, and updates are run only when data really changes, not when query merely uses an observed field.
Loops can still occur if programmer links triggers in a way which mutually modify the data, for example, fields which increment the value based on the another field which is itself generated by incrementing the value from the first field.
Detection of such kind of loops is left for future work.

As currently implemented, PeerDB issues updates queries in a straightforward way, without any optimizations which might reduce the number of unnecessary updates queries.
Currently queries are issued and are left to MongoDB to determine that a particular query does not have anything to update.

PeerDB runs as a background process inside Meteor application, observing changes in the database and issuing the updates.
To scale, it provides a way to run as multiple separate processes/instances to distribute the load, each instance observing and reacting to just a subset of all documents based on their ID.

It is implemented in CoffeeScript and available as open source library at \url{https://github.com/peerlibrary/meteor-peerdb}.