A modern, stateful communication protocol for conversational AI models. AIP provides an efficient WebSocket-based alternative to traditional stateless HTTP REST APIs for AI interactions.
- Stateful Sessions: Maintains conversation context automatically
- Real-time Streaming: Token-by-token response streaming for better UX
- Developer-Friendly: Simple SDK abstracts WebSocket complexity
- Efficient: WebSocket connections reduce overhead compared to HTTP polling
- Extensible: Easy to add new message types and features
- Multiple Providers: Support for both OpenAI and local Ollama models
Run AI models completely on your machine - no API key needed!
✅ Free - No API costs ✅ Private - Data never leaves your machine ✅ Fast - No network latency ✅ Offline - Works without internet
📖 See OLLAMA_GUIDE.md for complete instructions
Quick Start:
```bash
# 1. Make sure Ollama is running (with your models)
ollama list

# 2. Start the AIP server with Ollama
./run_with_model.sh
# or: python server_ollama.py

# 3. Run the examples
python example_ollama.py
```

Use OpenAI's GPT models (requires an API key and costs money).
The AIP MVP consists of two main components:
- Server: FastAPI-based WebSocket server that manages sessions
  - `server.py` - OpenAI version (requires API key)
  - `server_ollama.py` - Ollama version (free, local)
- Client SDK (`aip_sdk.py`): Python library that provides a simple interface for connecting and communicating
- Python 3.8 or higher
- OpenAI API key
1. Clone or download this repository

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Set your OpenAI API key:

   ```bash
   export OPENAI_API_KEY="your-api-key-here"
   ```

4. Start the AIP server:

   ```bash
   python server.py
   ```

   The server will be available at `ws://localhost:8000/aip`. You can verify it's running by visiting http://localhost:8000 in your browser.
In a new terminal, run the example client:

```bash
python example.py
```

This will demonstrate:
- Connecting to the server
- Asking questions with streamed responses
- Context retention across multiple questions
- Clean disconnection
```python
import asyncio

from aip_sdk import AIPClient


async def main():
    # Connect to the server
    client = await AIPClient.connect("ws://localhost:8000/aip")

    # Ask a question and print the streamed response
    await client.ask(
        "What is Python?",
        on_token=lambda token: print(token, end="", flush=True),
    )

    # Ask a follow-up (context is maintained)
    await client.ask(
        "What are its main uses?",
        on_token=lambda token: print(token, end="", flush=True),
    )

    # Disconnect
    await client.disconnect()


asyncio.run(main())
```

For automatic cleanup, use the async context manager:
```python
async def main():
    async with await AIPClient.connect("ws://localhost:8000/aip") as client:
        await client.ask("Hello!", on_token=print)
    # Automatically disconnected when exiting the block
```

Instead of printing tokens, you can accumulate them:
```python
response_tokens = []
await client.ask(
    "Tell me a story",
    on_token=response_tokens.append,
)
full_response = "".join(response_tokens)
print(full_response)
```

- Client initiates a WebSocket connection to `/aip`
- Server accepts the connection and generates a unique Session ID
- Server sends a `SESSION_ID` message to the client
- Connection is ready for use
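The server side of this handshake can be sketched without tying it to the web framework. In the sketch below, `send` stands in for the WebSocket's async send method, and `perform_handshake` is an illustrative name, not a function from the actual codebase; in `server.py` this logic would live inside the FastAPI WebSocket endpoint.

```python
import json
import uuid


async def perform_handshake(send):
    """Mint a session ID and announce it to the client (steps 2-3 above).

    `send` stands in for the WebSocket's async send method.
    Illustrative sketch, not the actual server implementation.
    """
    session_id = str(uuid.uuid4())      # unique per connection
    await send(json.dumps({             # the SESSION_ID message
        "type": "SESSION_ID",
        "payload": session_id,
    }))
    return session_id
```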
`ASK`: Send a prompt to the AI

```json
{
  "type": "ASK",
  "payload": "Your question here"
}
```

`SESSION_ID`: Initial message containing the session identifier

```json
{
  "type": "SESSION_ID",
  "payload": "550e8400-e29b-41d4-a716-446655440000"
}
```

`TOKEN`: A single token from the streaming response

```json
{
  "type": "TOKEN",
  "payload": "Hello"
}
```

`DONE`: Indicates the response stream is complete

```json
{
  "type": "DONE",
  "payload": ""
}
```

`ERROR`: Error message

```json
{
  "type": "ERROR",
  "payload": "Error description"
}
```

Project structure:

```
aprotoc/
├── server.py          # FastAPI WebSocket server
├── aip_sdk.py         # Python client SDK
├── example.py         # Usage examples
├── requirements.txt   # Python dependencies
└── README.md          # This file
```
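Since every frame is a small JSON envelope with a `type` and a `payload`, the message types above can be handled with a few lines of plain Python. The helpers below are an illustrative sketch, not part of the SDK (`encode_message` and `decode_message` are invented names):

```python
import json

# The message types defined by the protocol docs above.
VALID_TYPES = {"ASK", "SESSION_ID", "TOKEN", "DONE", "ERROR"}


def encode_message(msg_type, payload):
    """Serialize an AIP message for sending over the WebSocket."""
    if msg_type not in VALID_TYPES:
        raise ValueError(f"unknown AIP message type: {msg_type}")
    return json.dumps({"type": msg_type, "payload": payload})


def decode_message(raw):
    """Parse an incoming AIP frame into a (type, payload) pair."""
    obj = json.loads(raw)
    return obj["type"], obj["payload"]
```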
`AIPClient.connect(host)`: Connect to an AIP server.

- Parameters: `host`: WebSocket URL (e.g., `ws://localhost:8000/aip`)
- Returns: Authenticated `AIPClient` instance
- Raises: `Exception` if the connection fails
`client.ask(prompt, on_token=None)`: Send a prompt and receive a streamed response.

- Parameters:
  - `prompt`: The question or prompt to send
  - `on_token`: Callback function called for each token (optional)
- Raises: `Exception` if an error occurs
`client.disconnect()`: Close the WebSocket connection.

Properties:

- `client.session_id`: Get the current session ID
- `client.is_connected`: Check if the client is connected
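Under the hood, `ask()` amounts to sending an `ASK` frame and then draining the stream until `DONE`. Below is a self-contained sketch of that dispatch loop; the `frames` list is a canned stand-in for messages read off the socket, and this is not the actual `aip_sdk` internals:

```python
import json


def consume_stream(frames, on_token=None):
    """Process raw AIP frames until DONE; return the accumulated text.

    `frames` stands in for messages read off the WebSocket.
    Raises RuntimeError on an ERROR frame. (Illustrative sketch.)
    """
    parts = []
    for raw in frames:
        msg = json.loads(raw)
        if msg["type"] == "TOKEN":
            parts.append(msg["payload"])
            if on_token:
                on_token(msg["payload"])   # stream each token to the caller
        elif msg["type"] == "DONE":
            break                          # response is complete
        elif msg["type"] == "ERROR":
            raise RuntimeError(msg["payload"])
    return "".join(parts)


frames = [
    json.dumps({"type": "TOKEN", "payload": "Hel"}),
    json.dumps({"type": "TOKEN", "payload": "lo"}),
    json.dumps({"type": "DONE", "payload": ""}),
]
print(consume_stream(frames))  # → Hello
```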
Edit `server.py` to customize:

- Port: change in `uvicorn.run()` (default: 8000)
- Host: change in `uvicorn.run()` (default: 0.0.0.0)
- Model: change the `model` parameter in the OpenAI call (default: gpt-4o)
- Temperature: change the `temperature` parameter (default: 0.7)
Currently uses in-memory storage. For production, consider:
- Redis for distributed session storage
- Database for persistent conversation history
- TTL-based session expiration
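As an illustration of TTL-based expiration, a minimal in-memory store might look like the sketch below. The class and method names are invented for illustration; the MVP server's actual storage is a plain dict without expiry:

```python
import time
import uuid


class SessionStore:
    """In-memory session store with TTL-based expiration (illustrative sketch)."""

    def __init__(self, ttl_seconds=3600.0):
        self.ttl = ttl_seconds
        self._sessions = {}  # session_id -> (created_at, message_history)

    def create(self):
        """Register a new session and return its ID."""
        session_id = str(uuid.uuid4())
        self._sessions[session_id] = (time.monotonic(), [])
        return session_id

    def history(self, session_id):
        """Return the session's message history, or None if missing or expired."""
        entry = self._sessions.get(session_id)
        if entry is None:
            return None
        created_at, messages = entry
        if time.monotonic() - created_at > self.ttl:
            del self._sessions[session_id]  # lazily evict expired sessions
            return None
        return messages
```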
(Tests to be implemented)

```bash
pytest tests/
```

Follow PEP 8 guidelines. Format with:

```bash
black server.py aip_sdk.py example.py
```

Future enhancements:
- Authentication and authorization
- Multiple AI provider support (Anthropic, Google, etc.)
- Persistent session storage
- Rate limiting
- Client SDKs in other languages (JavaScript, Go, etc.)
- Message history retrieval
- Custom system prompts per session
- Conversation branching
- File/image upload support
- Ensure the server is running: `python server.py`
- Check the server URL in your client code
- Verify no firewall is blocking port 8000

- Verify your API key is set: `echo $OPENAI_API_KEY`
- Check your OpenAI account has credits
- Ensure you have access to the gpt-4o model

- Install dependencies: `pip install -r requirements.txt`
- Verify you're using Python 3.8+: `python --version`
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
MIT License - feel free to use this in your own projects!
Built with:
- FastAPI - Modern web framework
- OpenAI - AI model provider
- websockets - WebSocket client library
Note: This is an MVP (Minimum Viable Product). It's designed for demonstration and development purposes. For production use, consider adding authentication, rate limiting, persistent storage, and comprehensive error handling.