A service that allows Cursor to use Azure GPT-5 deployments by:
- Adapting incoming Cursor completions API requests to the Responses API
- Forwarding the requests to Azure
- Adapting outgoing Azure Responses API streams into completions API streams
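As a rough illustration of the request-side adaptation (not the project's actual code; function and field names below are hypothetical), mapping a Chat Completions request onto a Responses API payload looks roughly like this:

```python
def completions_to_responses(body: dict, deployment: str) -> dict:
    """Sketch: map an OpenAI Chat Completions request to a Responses API payload.

    Real handling (tools, images, streaming options, etc.) is more involved.
    """
    # Chat "messages" become Responses "input" items.
    input_items = [
        {"role": m["role"], "content": m["content"]}
        for m in body.get("messages", [])
    ]
    payload = {
        "model": deployment,
        "input": input_items,
        "stream": body.get("stream", False),
    }
    # Reasoning effort is a Responses-API-side knob.
    effort = body.get("reasoning_effort")
    if effort:
        payload["reasoning"] = {"effort": effort}
    return payload
```

The reverse direction (turning Responses API stream events back into completion chunks) follows the same field-mapping idea.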
This project originates from Cursor's lack of support for Azure models that are only served through the Responses API. It will hopefully become obsolete as Cursor continues to improve its model support.
> [!WARNING]
> You still need an active paid Cursor subscription to be able to use this project.
> [!IMPORTANT]
> Azure now supports the Completions API for the models gpt-5, gpt-5-mini, and gpt-5-nano.
> They can now be used directly in Cursor, but without the ability to change the reasoning effort, verbosity, or summary level. To control those settings, you can still use this project.
> The models gpt-5-pro and gpt-5-codex remain available only through the Responses API, but work great with this project (see the list of model-specific limitations in the next section).
The entire gpt-5 series is supported, although some models have some limitations on the reasoning effort / verbosity / truncation / summary values they accept:
| Variable | Value | 5.2 | 5.2-chat | 5.1 | 5.1-codex | 5.1-codex-mini | 5.1-codex-max | 5 | 5-nano | 5-mini | 5-pro | 5-codex |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Reasoning | minimal | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ | ❌ | ❌ |
| | low | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ |
| | medium | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ |
| | high | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Verbosity | low | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ | ❌ | ❌ |
| | medium | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ |
| | high | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Truncation | auto | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ |
| | disabled | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ |
| Summary | auto | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ |
| | detailed | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ |
| | concise | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ | ✅ | ✅ | ❌ | ❌ |
(This matrix is automatically generated, and updated after every new model release.)
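In code, a service like this one might encode the matrix as a lookup table and reject unsupported combinations before calling Azure. The dictionary below is a partial, hand-copied excerpt of the reasoning column values from the matrix above, and the function name is hypothetical:

```python
# Partial excerpt of the matrix above (reasoning effort only).
SUPPORTED_REASONING = {
    "gpt-5": {"minimal", "low", "medium", "high"},
    "gpt-5-pro": {"high"},
    "gpt-5-codex": {"low", "medium", "high"},
}

def supports_reasoning(model: str, effort: str) -> bool:
    """Return True if the given model accepts the given reasoning effort."""
    return effort in SUPPORTED_REASONING.get(model, set())
```

Unknown models simply fail the check, which is a safe default for a validation layer.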
- Switching between `high`/`medium`/`low`/`minimal` reasoning effort levels by selecting different models in Cursor.
- Configuring different reasoning summary levels (`auto`, `detailed`, `concise`).
- Displaying reasoning summaries in Cursor natively, like any other reasoning model.
- Production-ready, so you can share the service among different users in an organization.
- When running from a terminal, rich logging of the model's context on every request, including Markdown rendering, syntax highlighting, tool calls/outputs, and more.
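One way to implement effort-switching via model selection (a sketch, not the project's actual code) is to parse the effort level out of the model name Cursor sends:

```python
EFFORT_LEVELS = {"minimal", "low", "medium", "high"}

def effort_from_model(model: str, default: str = "medium") -> str:
    """Derive the reasoning effort from a model name like 'gpt-high'.

    Falls back to a default when the name carries no effort suffix.
    """
    suffix = model.rsplit("-", 1)[-1]
    return suffix if suffix in EFFORT_LEVELS else default
```

This keeps the Cursor-side configuration limited to adding a few custom model names.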
Feel free to create or vote on any project issues, and star the project to show your support.
If you prefer to deploy the service (for example, to allow multiple members of your team to use it), check the Production section, as the project comes with production-ready containers using supervisord and gunicorn.
Copy the file `.env.example` to `.env` and update the following flags as needed:
| Flag | Description | Default |
|---|---|---|
| `SERVICE_API_KEY` | Arbitrary API key to protect your service. Set it to a random string. | `change-me` |
| `AZURE_BASE_URL` | Your Azure OpenAI endpoint base URL (no trailing slash), e.g. `https://<resource>.openai.azure.com`. | required |
| `AZURE_API_KEY` | Azure OpenAI API key. | required |
| `AZURE_DEPLOYMENT` | Name of the Azure model deployment to use. | `gpt-5` |
| `AZURE_VERBOSITY_LEVEL` | Hint the model to be more or less expansive in its replies. One of `high`, `medium`, or `low`. | `medium` |
| `AZURE_SUMMARY_LEVEL` | Set to `none` to disable summaries. You might have to disable them if your organization hasn't been approved for this feature. | `detailed` |
| `AZURE_TRUNCATION` | Truncation strategy for long inputs. Either `auto` or `disabled`. | `disabled` |
Alternatively, you can pass them through the environment where you run the application.
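For instance, with placeholder values (replace them with your own), the same flags can be exported in the shell before starting the service:

```shell
# Placeholder values; substitute your real endpoint and keys.
export SERVICE_API_KEY="change-me-to-a-random-string"
export AZURE_BASE_URL="https://my-resource.openai.azure.com"
export AZURE_DEPLOYMENT="gpt-5"
```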
Optional Configuration
| Flag | Description | Default |
|---|---|---|
| `AZURE_API_VERSION` | Azure OpenAI Responses API version to call. | `2025-04-01-preview` |
| `FLASK_ENV` | Flask environment. Use `development` for dev or `production` for prod. | `production` |
| `RECORD_TRAFFIC` | Toggle writing request/response traffic to `recordings/`. | off |
| `LOG_CONTEXT` | Enable rich pretty-printing of request context to the console. | on |
| `LOG_COMPLETION` | Enable logging of completion responses (not yet implemented). | on |
Why is this necessary?
Since Cursor routes requests through its external prompt-building service rather than directly from the IDE to your API, your custom endpoint must be publicly reachable on the Internet.
Consider using Cloudflare because its tunnels are free and require no account.
Install cloudflared and run:
```
cloudflared tunnel --url http://localhost:8080
```

Copy the URL of your tunnel from the output of the command. It looks something like this:

```
+----------------------------------------------------+
| Your quick Tunnel has been created! Visit it at:   |
| https://foo-bar.trycloudflare.com                  |
+----------------------------------------------------+
```
Then paste it into Cursor Settings > Models > API Keys > OpenAI API Key > Override OpenAI Base URL:
In addition to updating the OpenAI Base URL, you need to:
- Set OpenAI API Key to the value of `SERVICE_API_KEY` in your `.env`.
- Ensure the toggles for both options are on, as shown in the previous image.
- Add the custom models called exactly `gpt-high`, `gpt-medium`, and `gpt-low`, as shown in the previous image. You can also create `gpt-minimal` for minimal reasoning effort on models that support it. You don't need to remove other models.
To run the production version of the app:

```
docker compose up flask-prod
```

For instructions on how to run locally without Docker, and the different development commands, see the Development section.
```
python -m venv .venv
source .venv/bin/activate
pip install -r requirements/dev.txt
```

To run the development server:

```
flask run -p 8080
```

To run with production settings:

```
export FLASK_ENV=production
export FLASK_DEBUG=0
export LOG_LEVEL=info
flask run -p 8080
```

This will only run the Flask server with the production settings. For a closer approximation of the production server running with supervisord and gunicorn, check Running with Docker.
```
flask test
```

To run only specific tests, you can use the pytest `-k` argument:

```
flask test -k ...
```

```
flask lint
```

The lint command will attempt to fix any linting/style errors in the code. If you only want to know whether the code will pass CI, without letting the linter make changes, add the `--check` argument:

```
flask lint --check
```
To run the development image:

```
docker compose up flask-dev
```

To run the production image:

```
docker compose up flask-prod
```

The production image runs the server through supervisord and gunicorn. See the Production section for more details.
When running flask-prod, the production flags are set in docker-compose.yml:
```
FLASK_ENV: production
FLASK_DEBUG: 0
LOG_LEVEL: info
GUNICORN_WORKERS: 4
```

The list of `environment:` variables in the `docker-compose.yml` file takes precedence over any variables specified in `.env`.
```
docker compose run --rm manage test
```

To run only specific tests, you can use the pytest `-k` argument:

```
docker compose run --rm manage test -k ...
```

```
docker compose run --rm manage lint
```

The lint command will attempt to fix any linting/style errors in the code. If you only want to know whether the code will pass CI, without letting the linter make changes, add the `--check` argument:

```
docker compose run --rm manage lint --check
```

To make generating test fixtures easier, the `RECORD_TRAFFIC` flag creates files with all the incoming/outgoing traffic between this service and Cursor/Azure in the directory `recordings/`.
To avoid violating Cursor's intellectual property, a redaction layer removes sensitive data such as system prompts, tool names, tool descriptions, and any context containing scaffolding from Cursor's prompt-building service.
Therefore, recorded traffic can be published under tests/recordings/ to be used as test fixtures while remaining MIT-licensed.
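Conceptually, such a redaction layer can be thought of as a recursive scrub over the recorded JSON. The key names below are hypothetical, chosen only to illustrate the idea:

```python
# Hypothetical sensitive key names; the real set depends on the traffic format.
SENSITIVE_KEYS = {"system", "instructions", "tools"}

def redact(obj):
    """Recursively replace sensitive fields in recorded traffic with a placeholder."""
    if isinstance(obj, dict):
        return {
            k: "[REDACTED]" if k in SENSITIVE_KEYS else redact(v)
            for k, v in obj.items()
        }
    if isinstance(obj, list):
        return [redact(item) for item in obj]
    return obj
```

Non-sensitive structure (roles, ordering, event types) survives intact, which is what makes the recordings usable as test fixtures.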
You might want to review and modify the following configuration files:
| File | Description |
|---|---|
| `supervisord/gunicorn.conf` | Supervisor program config for Gunicorn (bind `:5000`, gevent; workers/log level from env; logs to stdout/stderr). |
| `supervisord/supervisord_entrypoint.sh` | Container entrypoint that execs supervisord (prepends it when args start with `-`). |
| `supervisord/supervisord.conf` | Main Supervisord config: socket, logging, nodaemon; includes `conf.d` program configs. |
```
docker compose build flask-prod
docker tag app-production your-tag
docker push your-tag
```