Benchmark Transcriptions

Create automated tests to determine model performance.

* Test should measure performance via websocket to a running `node-server` and `whisper-service` instance and via directly instantiation of model in python.
* Should measure latency, accuracy, and other relevant stats of returned transcriptions