Simple Flask web application demonstrating real-time streaming of LLM responses from OpenAI ChatGPT
This is a simple Flask web application that demonstrates how to stream responses from OpenAI's ChatGPT API in real-time using Server-Sent Events (SSE). The app showcases the integration of Flask, LangChain, and OpenAI to create an interactive chat interface with streaming responses.
Note: This is a learning/demonstration project. For production use, implement proper security practices including environment variables for API keys, rate limiting, and user authentication.
- Real-time Streaming: Stream LLM responses token-by-token to the frontend
- Server-Sent Events: Uses SSE for efficient real-time communication
- OpenAI Integration: Powered by ChatGPT (GPT-3.5-turbo)
- LangChain Support: Built with LangChain framework (with examples for vector DB integration)
- Simple UI: Clean, minimal interface for testing
- Flask Backend: Lightweight Python web server
- Backend: Flask 2.0+, Flask-RESTful
- LLM Framework: LangChain
- AI Model: OpenAI GPT-3.5-turbo
- Frontend: Vanilla JavaScript, HTML, CSS
- Streaming: Server-Sent Events (SSE)
- Optional: Pinecone (commented out in code for vector database)
- Python 3.8 or higher
- OpenAI API key (create one at https://platform.openai.com/api-keys)
- pip (Python package manager)
- Clone the repository:

  ```bash
  git clone https://github.com/figlesias221/llm_simple_app.git
  cd llm_simple_app
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Set up your OpenAI API key by creating a .env file in the root directory:

  ```
  OPENAI_API_KEY=your_openai_api_key_here
  ```

  Or set it as an environment variable:

  ```bash
  export OPENAI_API_KEY='your_openai_api_key_here'
  ```

  Never commit your .env file (add .env to .gitignore).

- Run the application:

  ```bash
  python3 main.py
  ```

- Open your browser and navigate to http://localhost:5000
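If you go the .env route, main.py needs to load it at startup. A minimal sketch using python-dotenv (the variable name matches the step above; everything else here is an assumption, not the project's actual code):

```python
import os

import openai
from dotenv import load_dotenv

load_dotenv()  # reads .env from the project root
openai.api_key = os.getenv("OPENAI_API_KEY")
if not openai.api_key:
    raise RuntimeError("OPENAI_API_KEY is not set")
```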
```
llm_simple_app/
├── main.py            # Flask server with streaming endpoint
├── templates/
│   └── index.html     # Frontend interface
├── requirements.txt   # Python dependencies
├── .env.example       # Example environment variables
└── README.md
```
- GET /: Renders the main chat interface.
- POST /completion: Streams ChatGPT responses in real-time.
  - Request: form data (the current frontend sends a hardcoded prompt)
  - Response: Server-Sent Events stream (Content-Type: text/event-stream)
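To exercise the streaming endpoint outside the browser, something like the following should work; the -N flag disables curl's output buffering so tokens appear as they arrive. The query form field name is an assumption, since the current frontend hardcodes the prompt:

```bash
curl -N -X POST -d "query=What is SSE?" http://localhost:5000/completion
```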
- Frontend: User submits a query through the HTML form
- Request: JavaScript sends a POST request to /completion
- Backend: Flask receives the request and calls the OpenAI API with streaming enabled
- Streaming: OpenAI streams response tokens back to Flask
- SSE: Flask yields each token as a Server-Sent Event (see the route sketch below)
- Display: JavaScript reads the stream and updates the UI in real-time
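A minimal sketch of how the Flask side of this flow can be wired together, assuming the stream() generator shown under Implementation Details; the query field name is an assumption, not a value taken from main.py:

```python
from flask import Flask, Response, render_template, request

app = Flask(__name__)

@app.route("/")
def index():
    # Serve the chat interface
    return render_template("index.html")

@app.route("/completion", methods=["POST"])
def completion():
    # Flask forwards each token yielded by stream() to the client
    # as soon as it arrives, keeping the connection open.
    query = request.form.get("query", "")
    return Response(stream(query), mimetype="text/event-stream")
```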
In main.py, modify the model parameter:

```python
completion = openai.ChatCompletion.create(
    model="gpt-4",  # Change to gpt-4, gpt-3.5-turbo, etc.
    # ...
)
```

Edit the prompt function in main.py:
```python
def gen_prompt(query) -> str:
    return f"""Your custom system prompt here.
    Question: {query}
    Context: Your context here
    Answer:
    """
```

The code includes commented-out sections for Pinecone integration:
- Uncomment lines 35-88 in main.py
- Add your Pinecone API key and configuration
- Install the additional dependency:

```bash
pip install pinecone-client
```
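For orientation, here is a minimal retrieval sketch in the style of the pre-1.0 openai and pinecone-client APIs this project uses. The index name, environment, and embedding model below are illustrative assumptions, not values from main.py:

```python
import os

import openai
import pinecone

# Assumed configuration; replace with your own values
pinecone.init(api_key=os.getenv("PINECONE_API_KEY"), environment="us-east-1-aws")
index = pinecone.Index("my-index")  # hypothetical index name

def retrieve_context(query: str, top_k: int = 3) -> str:
    # Embed the query with OpenAI, then fetch the nearest documents
    emb = openai.Embedding.create(model="text-embedding-ada-002", input=query)
    vector = emb["data"][0]["embedding"]
    results = index.query(vector=vector, top_k=top_k, include_metadata=True)
    return "\n".join(m["metadata"].get("text", "") for m in results["matches"])
```

The retrieved text can then be passed as the Context section of gen_prompt() above.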
Streaming implementation:

```python
def stream(input_text):
    completion = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You're an assistant."},
            {"role": "user", "content": gen_prompt(input_text)},
        ],
        stream=True,
        max_tokens=500,
        temperature=0,
    )
    # Each streamed chunk carries a delta; only some deltas contain content
    for line in completion:
        if 'content' in line['choices'][0]['delta']:
            yield line['choices'][0]['delta']['content']
```

Frontend streaming consumption:
```javascript
const response = await fetch('/completion', {
    method: 'POST',
    body: formData
});

// Read the response body incrementally, decoding each binary
// chunk and appending it to the page as it arrives
const reader = response.body.getReader();
while (true) {
    const {done, value} = await reader.read();
    if (done) break;
    const text = new TextDecoder().decode(value);
    document.getElementById("result").innerHTML += text;
}
```

- Never commit API keys: Use environment variables
- Rate Limiting: Implement rate limiting for production
- Input Validation: Validate and sanitize user inputs
- Authentication: Add user authentication for production
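As a concrete illustration of the rate-limiting and input-validation points, a sketch that revisits the route above using the Flask-Limiter library; the limits, length cap, and field name are assumptions, not part of this project:

```python
from flask import Flask, Response, abort, request
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address

app = Flask(__name__)

# Throttle by client IP; the limits here are illustrative
limiter = Limiter(get_remote_address, app=app, default_limits=["60 per hour"])

@app.route("/completion", methods=["POST"])
@limiter.limit("10 per minute")
def completion():
    query = request.form.get("query", "").strip()
    if not query or len(query) > 2000:  # basic input validation
        abort(400, description="Query must be 1-2000 characters")
    return Response(stream(query), mimetype="text/event-stream")
```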
The current main.py contains commented-out sections with exposed API keys. These should be:
- Removed from the code
- Moved to environment variables
- Never committed to version control
This is a demonstration project. For production use, consider:
- Environment variable management (use python-dotenv)
- Error handling and logging
- Rate limiting and request throttling
- User authentication and authorization
- HTTPS/SSL
- Database for conversation history
- Proper async handling
- Cost monitoring for API usage
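For the error-handling and logging items, one approach is to wrap the stream() generator so API failures are logged and surfaced to the client instead of silently truncating the stream. A sketch under the pre-1.0 openai exception hierarchy this project uses, not the project's own code:

```python
import logging

import openai

logger = logging.getLogger(__name__)

def safe_stream(input_text):
    # Delegate to the stream() generator from Implementation Details,
    # logging failures and emitting a visible error token to the client
    try:
        yield from stream(input_text)
    except openai.error.OpenAIError:
        logger.exception("OpenAI request failed")
        yield "\n[error] The model request failed; please retry."
```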
Key dependencies (see requirements.txt for full list):
- flask: Web framework
- flask-restful: REST API extension for Flask
- openai: OpenAI API client
- langchain: LLM framework
- python-dotenv: Environment variable management (recommended)
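A plausible requirements.txt matching the stack above; the version pins are assumptions, and the project's own file is authoritative:

```
flask>=2.0
flask-restful
openai<1.0  # the examples use the pre-1.0 ChatCompletion API
langchain
python-dotenv
```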
This is a learning project. Feel free to fork and experiment!
This project is private and not licensed for public use.
Educational Project: Built to demonstrate LLM streaming with Flask and OpenAI