\section{Future work}
Our work leads to three main directions for future work: improving PeerDB, additional benchmarking, and improving and evaluating the automatic embedding algorithm.
This section will address each of these future work areas in turn.

\subsection{Improving PeerDB}
One way to improve PeerDB is to further optimize queries.
Because PeerDB is decoupled from the database management system, it cannot use the database's internal information to optimize queries.
In the future, we could more closely integrate PeerDB with the underlying database to better optimize queries.
Further, PeerDB does not detect update loops.
In the future, we could apply existing loop detection algorithms to detect when a loop occurs in the system.
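
As an illustration of one such existing technique, the following minimal sketch uses depth-first search with three-color marking to find a cycle. It assumes, hypothetically, that update propagation can be modeled as a directed graph in which an edge from $A$ to $B$ means that an update to $A$ triggers an update to $B$. The function name \texttt{find\_update\_loop} and the collection names in the usage note are illustrative and are not part of PeerDB.

```python
from typing import Dict, List, Optional


def find_update_loop(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle as a list of nodes (start node repeated at the end), or None.

    `graph` is a hypothetical model of update propagation: graph[a] lists the
    nodes whose updates are triggered by an update to `a`.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {node: WHITE for node in graph}
    for targets in graph.values():
        for target in targets:
            color.setdefault(target, WHITE)  # nodes that only appear as targets
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GRAY
        path.append(node)
        for nxt in graph.get(node, []):
            if color[nxt] == GRAY:
                # Back edge: the loop is the part of the path from nxt onward.
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found is not None:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in list(color):
        if color[node] == WHITE:
            found = visit(node)
            if found is not None:
                return found
    return None
```

For example, calling \texttt{find\_update\_loop} on a graph in which \texttt{Post} triggers \texttt{Comment}, \texttt{Comment} triggers \texttt{User}, and \texttt{User} triggers \texttt{Post} returns the loop through those three nodes, while an acyclic graph yields \texttt{None}. The recursive formulation is only suitable for small graphs; a production version would likely need an iterative traversal.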

\subsection{Benchmarking PeerDB}
We benchmarked PeerDB against other databases across several variables and implementations. In the future, we will run more realistic benchmarks by modeling our data on real-world trends (e.g., long-tail comment distributions).
We will also collect data from users of PeerDB to compare PeerDB to MongoDB and PostgreSQL using real-world data.
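
To make the idea of a long-tail comment distribution concrete, the sketch below assigns comments to posts with Zipf-like weights, so that a few posts receive most of the comments. The choice of a Zipf-like law, the function name \texttt{long\_tail\_comment\_counts}, and all parameter values are assumptions made for illustration; they are not taken from our benchmarks.

```python
import random
from collections import Counter
from typing import List


def long_tail_comment_counts(num_posts: int, num_comments: int,
                             exponent: float = 1.0, seed: int = 0) -> List[int]:
    """Distribute comments over posts using Zipf-like weights.

    The post at rank r (1-based) is chosen with probability proportional to
    1 / r**exponent. Returns the number of comments assigned to each post.
    """
    rng = random.Random(seed)  # seeded so that a benchmark run is repeatable
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_posts + 1)]
    chosen = rng.choices(range(num_posts), weights=weights, k=num_comments)
    counts = Counter(chosen)
    return [counts[i] for i in range(num_posts)]
```

A benchmark generator could call, for instance, \texttt{long\_tail\_comment\_counts(100, 10000)} and create that many comment documents per post. A larger \texttt{exponent} concentrates comments on fewer posts.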

\subsection{Improving the automatic algorithm}
One direction for improving the automatic algorithm is extending its applicability to more use cases. For instance, the algorithm does not currently consider all possible sets of queries (only those that strictly augment the smallest set). In future work, we could extend the model to encompass more query sets. As mentioned before, future work may also extend the algorithm to handle cases where we embed entire subdocuments without having the document separately referenced (that is, without help from PeerDB). In addition, we want to extend it to handle reverse queries.
Another direction for improving the automatic algorithm is verifying and enhancing the accuracy of the cost model and input parameters.
Future work could learn parameters, simulate possible configurations for greater accuracy, and compare our current cost model closely to actual system performance.
We could also use our benchmarking system to compare low-cost configurations against high-cost configurations, to see whether their relative performance confirms our intuition.
One final direction for improving the automatic algorithm is ensuring that it works at scale. Currently, our algorithm uses brute force to calculate the lowest-cost embeddings. When the number of possible embedding combinations is too large to enumerate, we could use an existing optimization technique to find an approximate configuration. For instance, we could apply simulated annealing or a greedy approach.
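
The contrast between exhaustive search and an approximate technique can be sketched as follows. This is not our implementation: it assumes, for illustration only, that a configuration is the set of candidate fields chosen for embedding and that a hypothetical function \texttt{cost} scores any such set. The names \texttt{brute\_force} and \texttt{simulated\_annealing}, the linear cooling schedule, and the default parameters are likewise assumptions.

```python
import itertools
import math
import random
from typing import Callable, FrozenSet, Sequence

Config = FrozenSet[str]              # set of candidate fields chosen for embedding
CostFn = Callable[[Config], float]   # hypothetical cost model


def brute_force(candidates: Sequence[str], cost: CostFn) -> Config:
    """Enumerate all 2^n subsets of candidates and return the cheapest one."""
    best: Config = frozenset()
    best_cost = cost(best)
    for size in range(1, len(candidates) + 1):
        for subset in itertools.combinations(candidates, size):
            config = frozenset(subset)
            config_cost = cost(config)
            if config_cost < best_cost:
                best, best_cost = config, config_cost
    return best


def simulated_annealing(candidates: Sequence[str], cost: CostFn,
                        steps: int = 2000, start_temp: float = 1.0,
                        seed: int = 0) -> Config:
    """Approximate search: flip one candidate per step, sometimes accept worse moves."""
    rng = random.Random(seed)
    current: Config = frozenset()
    current_cost = cost(current)
    best, best_cost = current, current_cost
    for step in range(steps):
        # Linear cooling; the small constant avoids division by zero.
        temp = start_temp * (1.0 - step / steps) + 1e-9
        neighbor = current ^ {rng.choice(candidates)}  # toggle one candidate
        neighbor_cost = cost(neighbor)
        delta = neighbor_cost - current_cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = neighbor, neighbor_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
    return best
```

Brute force evaluates the cost function $2^n$ times for $n$ candidates, whereas the annealing sketch evaluates it once per step regardless of $n$, at the price of returning a configuration that is not guaranteed to be optimal.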