Remote Inference server
Memory and compute resources are very limited on mobile devices. Remote inference lets users offload LLM inference from their devices to a remote server, which reduces the load on the device and makes it possible to use larger models that give better responses. It also gives users more control over model choice: managed services such as OpenAI and Gemini for ease of use, or self-hosted models such as Llama, Gemma, and Mistral for enhanced privacy.
The NoteUltra inference server handles two request formats, question and summary, each served at its own path: your.server.com/question and your.server.com/summary. Both communicate in JSON.
/question answers questions based on the day's transcription.
- The `context` field contains the information retrieved from the user's vector database; retrieval is performed entirely on the user's device to ensure privacy. The protocol does not specify how much context is sent, but you can set a limit on the server to prevent abuse when hosting an inference node publicly.
- The `chatHistory` field contains the user's chat history. It is capped at a fixed size: the server receives at most 6 entries in total.
- The `question` field contains the question the user is currently asking.
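The field rules above can be enforced before a request reaches the model. The following is a minimal sketch, not NoteUltra's actual server code: the function name `sanitize_question_request` and the limits `MAX_CONTEXT_ITEMS` and `MAX_CONTEXT_CHARS` are assumptions chosen for illustration. Only the three field names and the 6-entry history cap come from the description above.

```python
from typing import Any

# Hypothetical limits; the protocol itself does not specify a context size.
MAX_CONTEXT_ITEMS = 20
MAX_CONTEXT_CHARS = 2000
MAX_HISTORY_ENTRIES = 6  # the client sends at most 6 chat-history entries


def sanitize_question_request(payload: dict[str, Any]) -> dict[str, Any]:
    """Validate a /question body and clamp it to the server's limits."""
    context = payload.get("context")
    history = payload.get("chatHistory")
    question = payload.get("question")

    if not isinstance(context, list) or not all(isinstance(c, str) for c in context):
        raise ValueError("context must be a list of strings")
    if not isinstance(history, list) or not all(isinstance(h, dict) for h in history):
        raise ValueError("chatHistory must be a list of objects")
    if not isinstance(question, str) or not question.strip():
        raise ValueError("question must be a non-empty string")

    return {
        # Keep only the first few chunks and truncate each one.
        "context": [c[:MAX_CONTEXT_CHARS] for c in context[:MAX_CONTEXT_ITEMS]],
        # Keep the most recent entries if a client ever sends more than expected.
        "chatHistory": history[-MAX_HISTORY_ENTRIES:],
        "question": question.strip(),
    }
```

Rejecting malformed bodies early and clamping oversized ones keeps a publicly hosted node from spending model time on abusive requests; the specific numbers should be tuned to the model's context window.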
Input Format:
{
"context":["meow","meow","meow"],
"chatHistory": [
{ "USER": "meow" },
{ "Assistant": "meow" }
],
"question": "meow"
}
- The output contains one key, `result`, which holds the AI's response.
Output Format:
{
"result": "meow"
}
/summary condenses the AI's response generated by /question so that the user can save it as a note.
- The `context` field contains a single string generated from the AI's response.
Input Format:
{
"context": "NoteUltra is an app that transcripts voices in the background. Keep user's life organized without any effort. User can simply ask question like \"What item should I buy\". NoteUltra will retrieve the information from the transcription that contains thousands of words in few seconds. And also put user privacy first. Everything stored locally, and user have 100% control over their data."
}
- The output contains two keys, `title` and `summary`.
  - `title` is the title of the note. It must be short and concise, ideally under 15 characters.
  - `summary` is the content of the note. There is no limit on its length, but it should be structured and easy to read. NoteUltra does not currently support markdown, so avoid using it.
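The constraints on `title` and `summary` can be checked before the response is returned. The sketch below is illustrative only: `clean_summary_response`, `strip_markdown`, the `TITLE_SOFT_LIMIT` constant, and the small set of markdown markers being stripped are all assumptions. Truncating the title is just one possible policy, and a real server might instead instruct the model to avoid markdown in its prompt.

```python
import re

# "Ideally under 15 characters" is treated here as a soft limit (assumption).
TITLE_SOFT_LIMIT = 15

# Assumed set of common markdown markers: heading hashes or bullet markers at
# the start of a line, plus asterisks and backticks anywhere in the text.
_LINE_PREFIX = re.compile(r"^[ \t]*(?:#{1,6}|[-*+])[ \t]+", re.MULTILINE)
_INLINE_MARKS = re.compile(r"[*`]+")


def strip_markdown(text: str) -> str:
    """Remove a few common markdown markers, leaving plain text."""
    text = _LINE_PREFIX.sub("", text)
    return _INLINE_MARKS.sub("", text).strip()


def clean_summary_response(title: str, summary: str) -> dict[str, str]:
    """Build a /summary response body with markdown stripped from both fields."""
    title = strip_markdown(title)
    if len(title) > TITLE_SOFT_LIMIT:
        # One possible policy: cut the title at the soft limit.
        title = title[:TITLE_SOFT_LIMIT].rstrip()
    return {"title": title, "summary": strip_markdown(summary)}
```

This only catches the most common markers; links, tables, and numbered lists would pass through unchanged.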
Output Format:
{
"title": "About NoteUltra",
"summary": "NoteUltra is a privacy-first app that uses transcription and AI to help users stay organized."
}
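Putting the two endpoints together, a server needs to route by path, parse the JSON body, and return the matching output shape. The following sketch shows only that routing step. `handle_request`, `answer_fn`, and `summarize_fn` are hypothetical names, the model calls are left as caller-supplied functions, and the error body and status codes are assumptions, since the protocol description above does not define them.

```python
import json
from typing import Callable

# Fields each path requires, taken from the input formats described above.
REQUIRED_FIELDS = {
    "/question": ("context", "chatHistory", "question"),
    "/summary": ("context",),
}


def handle_request(
    path: str,
    body: bytes,
    answer_fn: Callable[[list, list, str], str],
    summarize_fn: Callable[[str], tuple],
) -> tuple:
    """Route one request body by path; return (HTTP status, JSON text)."""
    fields = REQUIRED_FIELDS.get(path)
    if fields is None:
        return 404, json.dumps({"error": "unknown path"})

    try:
        data = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return 400, json.dumps({"error": "body must be valid JSON"})
    if not isinstance(data, dict):
        return 400, json.dumps({"error": "body must be a JSON object"})

    missing = [name for name in fields if name not in data]
    if missing:
        return 400, json.dumps({"error": "missing fields: " + ", ".join(missing)})

    if path == "/question":
        result = answer_fn(data["context"], data["chatHistory"], data["question"])
        return 200, json.dumps({"result": result})

    # Only /summary remains: summarize_fn returns a (title, summary) pair.
    title, summary = summarize_fn(data["context"])
    return 200, json.dumps({"title": title, "summary": summary})
```

Keeping the routing as a plain function makes it easy to wire into any HTTP framework and to swap the backing model (a managed API or a self-hosted one) without touching the request handling.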