Professional Python client for interacting with high-capacity multimodal Large Language Models hosted on the COE AI GPU cluster.
⚠️ **Network Requirement:** This API is only accessible from UPES's internal network (UPESNET). Ensure you are connected to UPES Wi-Fi to use this package.
The coeai package provides seamless LLM inference over the UPES local network, supporting both text-to-text and image-to-text operations with advanced streaming capabilities.
```bash
pip install coeai
```

```python
from coeai import LLMinfer

# Initialize client (requires UPESNET connectivity)
llm = LLMinfer(api_key="your-api-key")

# Generate text
response = llm.generate(
    model="tinyllama:latest",
    prompt="Explain quantum computing in simple terms",
    max_tokens=256
)
print(response)
```

| Feature | Description |
|---|---|
| Text Generation | Support for all available LLM models |
| Vision Models | Image-to-text with multimodal models |
| Streaming | Real-time response streaming |
| Error Handling | Comprehensive error messages with actionable guidance |
| File Management | Automatic cleanup of file handles |
| Logging | Optional debug logging for troubleshooting |
| Model Discovery | Programmatic model listing |
Get the latest list programmatically:
```python
models = llm.list_models()
print(f"Available models: {', '.join(models)}")
```

Current models (as of February 2026):

**Text generation**
- `tinyllama:latest` - Compact model for basic tasks
- `tinyllama:1.1b` - Small, efficient model
- `deepseek-r1:70b` - Advanced reasoning model
- `gpt-oss:120b` - Large general-purpose model
- `llama4:16x17b` - High-quality multimodal model
- `llama4:128x17b` - Largest available model

**Vision**
- `llama3.2-vision:11b` - Vision-capable model
- `llama4:16x17b` - Recommended for image analysis

**Embeddings**
- `bge-m3:567m` - Text embeddings model
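Since the model roster changes over time, it can be worth checking availability before committing to a model. The following is a minimal sketch using only the documented `list_models()` and `generate()` calls; the fallback choice is illustrative:

```python
from coeai import LLMinfer

llm = LLMinfer(api_key="your-api-key")

preferred = "deepseek-r1:70b"
available = llm.list_models()

# Fall back to a compact model if the preferred one is not deployed
model = preferred if preferred in available else "tinyllama:latest"
response = llm.generate(model=model, prompt="Hello!", max_tokens=64)
print(response)
```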
Basic text generation:

```python
from coeai import LLMinfer

llm = LLMinfer(api_key="your-api-key")

response = llm.generate(
    model="tinyllama:latest",
    prompt="Write a haiku about programming",
    max_tokens=100,
    temperature=0.8
)
print(response)
```
model="deepseek-r1:70b",
prompt="Explain the theory of relativity",
max_tokens=512,
stream=True,
print_stream=True # Print as it generates
)response = llm.generate(
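If you want the streamed generation without echoing tokens to the console, the parameter table below suggests `print_stream` only controls printing; this is a sketch under that assumption:

```python
# Stream on the server side but keep the console quiet;
# assumes the final text is still included in the returned dict.
response = llm.generate(
    model="deepseek-r1:70b",
    prompt="Explain the theory of relativity",
    max_tokens=512,
    stream=True,
    print_stream=False
)
print(response)
```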
model="llama3.2-vision:11b",
inference_type="image-to-text",
files=["/path/to/image.jpg"],
prompt="Describe this image in detail",
max_tokens=512
)
print(response)response = llm.generate(
model="llama4:16x17b",
inference_type="image-to-text",
files=["/path/to/image1.jpg", "/path/to/image2.jpg"],
prompt="Compare these images and identify differences",
max_tokens=1024,
temperature=0.7
)messages = [
{"role": "system", "content": [{"type": "text", "text": "You are a helpful coding assistant."}]},
{"role": "user", "content": [{"type": "text", "text": "Explain Python decorators"}]}
]
response = llm.generate(
model="gpt-oss:120b",
messages=messages,
max_tokens=512
)response = llm.generate(
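Multi-turn conversations can presumably reuse the same structure, with earlier assistant turns included in the list. In this sketch the assistant text is written out by hand, since the exact shape of the response dictionary isn't specified here:

```python
messages = [
    {"role": "system", "content": [{"type": "text", "text": "You are a helpful coding assistant."}]},
    {"role": "user", "content": [{"type": "text", "text": "Explain Python decorators"}]},
    # Hand-written assistant turn standing in for a previous response
    {"role": "assistant", "content": [{"type": "text", "text": "A decorator wraps a function to add behavior..."}]},
    {"role": "user", "content": [{"type": "text", "text": "Now show a decorator that times a function"}]},
]

response = llm.generate(
    model="gpt-oss:120b",
    messages=messages,
    max_tokens=512
)
```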
model="deepseek-r1:70b",
prompt="Solve: What is 15! (factorial)?",
max_tokens=400,
context_window=4096, # Increase context for complex tasks
temperature=0.1, # Lower = more deterministic
top_p=0.9 # Nucleus sampling
)LLMinfer(api_key: str, host: str = "http://10.9.6.165:8000")| Parameter | Type | Description | Default |
|---|---|---|---|
| `api_key` | str | Your COE AI API key (required) | - |
| `host` | str | API server URL | `http://10.9.6.165:8000` |
The `generate()` method:

```python
generate(
    model: str,
    inference_type: str = "text-to-text",
    prompt: Optional[str] = None,
    messages: Optional[List[Dict]] = None,
    files: Optional[List[str]] = None,
    max_tokens: int = 512,
    temperature: float = 0.7,
    top_p: float = 1.0,
    context_window: int = 2048,
    stream: bool = False,
    print_stream: bool = True
) -> Dict
```

| Parameter | Type | Description |
|---|---|---|
| `model` | str | Model name (required) |
| `inference_type` | str | `"text-to-text"` or `"image-to-text"` |
| `prompt` | str | Text prompt (optional if `messages` provided) |
| `messages` | list | Custom conversation history |
| `files` | list | Image file paths for vision models |
| `max_tokens` | int | Maximum tokens to generate |
| `temperature` | float | Sampling temperature (0.0–2.0) |
| `top_p` | float | Nucleus sampling (0.0–1.0) |
| `context_window` | int | Context size (`num_ctx`), default 2048 |
| `stream` | bool | Enable streaming response |
| `print_stream` | bool | Print stream to console |
**Returns:** Dictionary with the API response
```python
list_models() -> List[str]
```

Returns a list of all available model names.
- Connect to UPESNET (UPES Wi-Fi)
- Visit https://coeai.ddn.upes.ac.in
- Sign in with your UPES credentials
- Generate an API key from the dashboard
Send an email to hpc-access@ddn.upes.ac.in from your UPES account:
```
Subject: API Key Request for COE AI LLM Access

Dear COE AI Team,

I am requesting access to the LLM API for my project work.

Project Details:
- Project Name: <Your Project>
- Description: <Brief description>
- Expected Usage: <How you'll use the API>

Name: <Your Name>
Email: <Your UPES Email>
Department: <Your Department>

Thank you!
```
Allow 2-3 business days for processing.
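Once you receive a key, avoid hardcoding it in source. Here is a minimal sketch reading it from an environment variable; the variable name `COEAI_API_KEY` is just a convention for this example, not something the package reads automatically:

```python
import os

from coeai import LLMinfer

# Export the key first, e.g.:  export COEAI_API_KEY="your-api-key"
# (COEAI_API_KEY is an illustrative name, not read by coeai itself.)
api_key = os.environ.get("COEAI_API_KEY")
if not api_key:
    raise RuntimeError("Set the COEAI_API_KEY environment variable")

llm = LLMinfer(api_key=api_key)
```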
The client provides detailed error messages:
```python
from coeai import LLMinfer, AuthenticationError, ModelNotFoundError, InferenceError

llm = LLMinfer(api_key="your-key")

try:
    response = llm.generate(
        model="tinyllama:latest",
        prompt="Hello world"
    )
except AuthenticationError as e:
    print(f"Auth failed: {e}")
except ModelNotFoundError as e:
    print(f"Model not found: {e}")
except InferenceError as e:
    print(f"Inference failed: {e}")
except FileNotFoundError as e:
    print(f"Image file missing: {e}")
```

| Error | Cause | Solution |
|---|---|---|
| `AuthenticationError` | Invalid API key | Check the key or request a new one |
| `ModelNotFoundError` | Model doesn't exist | Use `llm.list_models()` to see available models |
| `InferenceError` (429) | Rate limit exceeded | Wait and retry |
| `FileNotFoundError` | Wrong image path | Verify the file exists |
| `ConnectionError` | Can't reach server | Verify you're on UPESNET; check server status |
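For rate limits, a retry loop with exponential backoff is usually enough. This is a sketch under the assumption (per the table above) that `InferenceError` is raised for 429 responses; the backoff constants are arbitrary:

```python
import time

from coeai import LLMinfer, InferenceError

llm = LLMinfer(api_key="your-key")

def generate_with_retry(prompt: str, retries: int = 3) -> dict:
    """Retry generation on InferenceError (e.g. 429) with exponential backoff."""
    delay = 2.0  # seconds; arbitrary starting point
    for attempt in range(retries):
        try:
            return llm.generate(
                model="tinyllama:latest",
                prompt=prompt,
                max_tokens=256
            )
        except InferenceError:
            if attempt == retries - 1:
                raise  # give up after the last attempt
            time.sleep(delay)
            delay *= 2
```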
Enable debug logging to troubleshoot requests:

```python
import logging
logging.basicConfig(level=logging.DEBUG)

from coeai import LLMinfer
llm = LLMinfer(api_key="your-key")
```

Check the installed version:

```python
import coeai
print(f"coeai version: {coeai.__version__}")
```

Test connectivity:

```python
llm = LLMinfer(api_key="your-key")
try:
    models = llm.list_models()
    print(f"✅ Connected! Found {len(models)} models")
except Exception as e:
    print(f"❌ Connection failed: {e}")
```
- **Model Selection** (see the sketch after this list)
  - Use `tinyllama` for quick responses
  - Use `deepseek-r1:70b` for reasoning tasks
  - Use `llama4:16x17b` or `llama3.2-vision` for image analysis
- **Temperature Settings**
  - 0.1-0.3: Factual/technical content
  - 0.7-0.9: Balanced creativity
  - 1.0-2.0: Maximum creativity
- **Token Limits**
  - Set `max_tokens` appropriately to balance quality and speed
  - Typical: 100-256 for summaries, 512-1024 for detailed responses
- **Streaming**
  - Enable `stream=True` for long responses to see progress
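As a minimal sketch of the model-selection guidance above, a small lookup can keep the choice in one place. The task labels and the `pick_model` helper are hypothetical conveniences, not part of the coeai API:

```python
# Hypothetical mapping from task type to a suitable model,
# following the guidance above (not part of the coeai package).
TASK_TO_MODEL = {
    "quick": "tinyllama:latest",
    "reasoning": "deepseek-r1:70b",
    "vision": "llama4:16x17b",
}

def pick_model(task: str) -> str:
    """Return a model name for a task type, defaulting to the compact model."""
    return TASK_TO_MODEL.get(task, "tinyllama:latest")
```

For example, `pick_model("reasoning")` returns `"deepseek-r1:70b"`, which you can pass straight to `generate()`.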
⚠️ **Important:** Requires UPESNET (UPES internal network) connectivity
**Before (v2.x/v3.x):**

```python
llm = LLMinfer(api_key="key", host="http://10.9.6.165:8001")  # Wrong port
```

**After (v4.0.0):**

```python
# Correct - uses port 8000 (default)
llm = LLMinfer(api_key="key")

# Or explicitly specify
llm = LLMinfer(api_key="key", host="http://10.9.6.165:8000")
```

- ✅ Automatic file handle cleanup
- ✅ Comprehensive error handling with custom exceptions
- ✅ `list_models()` method for model discovery
- ✅ Debug logging support
- ✅ Relaxed vision model restrictions
- ✅ Version export (`coeai.__version__`)
- BREAKING: Fixed default port from 8001 to 8000
- REQUIREMENT: Only accessible from UPES internal network (UPESNET)
- API Keys: Get your key from https://coeai.ddn.upes.ac.in
- Added `list_models()` method
- Improved error handling with custom exceptions
- Fixed file handle leaks
- Added logging support
- Relaxed vision model restrictions
- Added `__version__` export
- Updated documentation with current models
- Production release with text-to-text and image-to-text support
- Streaming capabilities
- Multiple image processing
Released under the MIT License. See LICENSE for details.
Konal Puri & Sawai Pratap Khatri
Centre of Excellence: AI (COE AI), HPC Project, UPES
- PyPI: https://pypi.org/project/coeai
- GitHub: https://github.com/pkonal23/COE-AI-HPC-Project
- API Server: http://10.9.6.165:8000 (UPESNET only)
- Get API Key: https://coeai.ddn.upes.ac.in
Contributions are welcome! Please feel free to submit issues or pull requests on GitHub.
For issues or questions:
- Open an issue on GitHub
- Contact: hpc-access@ddn.upes.ac.in
Made with ❤️ at UPES Centre of Excellence: AI