This project implements the AI DIAL API for language models hosted in Azure OpenAI.
This project uses Python>=3.11 and Poetry>=2.1.1 as a dependency manager.
Check out Poetry's documentation on how to install it on your system before proceeding.
To install requirements:
```sh
poetry install
```
This will install all requirements for running the package, linting, formatting and tests.
The recommended IDE is VSCode. Open the project in VSCode and install the recommended extensions.
VSCode is configured to use Black, a PEP-8-compatible formatter.
Alternatively you can use PyCharm.
Set up the Black formatter for PyCharm manually or install PyCharm>=2023.2, which ships with built-in Black support.
Run the development server locally:
```sh
make serve
```
Run the server from Docker container:
```sh
make docker_serve
```
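If you prefer invoking Docker directly rather than via make, a roughly equivalent sketch looks like this (the image tag and port are assumptions; check the Makefile and Dockerfile for the actual values):

```sh
# Build the image; the tag is an arbitrary local name.
docker build -t ai-dial-adapter-openai .
# Run it, forwarding the server port (5000 is an assumed default).
docker run --rm -p 5000:5000 ai-dial-adapter-openai
```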
As of now, Windows distributions do not include the make tool. On Windows 10 and later, it can be installed with winget:
```sh
winget install GnuWin32.Make
```
For convenience, the tool folder can be added to the PATH environment variable: `C:\Program Files (x86)\GnuWin32\bin`.
The command definitions inside Makefile should be cross-platform to keep the development environment setup simple.
Copy `.env.example` to `.env` and customize it for your environment.
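For example, in a POSIX shell:

```sh
cp .env.example .env
```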
The following variables cluster all deployments into groups that share the same API and the same tokenization algorithm.
Variable | Default | Description |
---|---|---|
DALLE3_DEPLOYMENTS | `` | Comma-separated list of deployments that support DALL-E 3 API. Example: dall-e-3,dalle3,dall-e |
DALLE3_AZURE_API_VERSION | 2024-02-01 | The API version for requests to the Azure DALL-E 3 API |
GPT_IMAGE_1_DEPLOYMENTS | `` | Comma-separated list of deployments that support GPT-Image 1 API. Example: gpt-image-1 |
GPT_IMAGE_1_AZURE_API_VERSION | 2024-02-01 | The API version for requests to the Azure GPT Image 1 API |
MISTRAL_DEPLOYMENTS | `` | Comma-separated list of deployments that support Mistral Large Azure API. Example: mistral-large-azure,mistral-large |
DATABRICKS_DEPLOYMENTS | `` | Comma-separated list of Databricks chat completion deployments. Example: databricks-dbrx-instruct,databricks-mixtral-8x7b-instruct,databricks-llama-2-70b-chat |
GPT4O_DEPLOYMENTS | `` | Comma-separated list of GPT-4o chat completion deployments. Example: gpt-4o-2024-05-13 |
GPT4O_MINI_DEPLOYMENTS | `` | Comma-separated list of GPT-4o mini chat completion deployments. Example: gpt-4o-mini-2024-07-18 |
AZURE_AI_VISION_DEPLOYMENTS | `` | Comma-separated list of Azure AI Vision embedding deployments. The endpoint of the deployment is expected to point to the Azure service: https://<service-name>.cognitiveservices.azure.com/ |
Deployments that do not fall into any of these categories are assumed to support the OpenAI text-to-text chat completions API or the OpenAI text embeddings API.
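For illustration, a hypothetical `.env` fragment that assigns a few deployments to these groups (all deployment names below are made-up examples):

```sh
# Image generation deployments
DALLE3_DEPLOYMENTS=dall-e-3
GPT_IMAGE_1_DEPLOYMENTS=gpt-image-1
# Chat completion deployments that don't use OpenAI tokenization
MISTRAL_DEPLOYMENTS=mistral-large-azure
DATABRICKS_DEPLOYMENTS=databricks-dbrx-instruct
# Deployments not listed anywhere default to the text-to-text
# chat completion or text embeddings OpenAI API.
```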
Variable | Default | Description |
---|---|---|
LOG_LEVEL | INFO | Log level. Use DEBUG for dev purposes and INFO in prod |
WEB_CONCURRENCY | 1 | Number of workers for the server |
TIKTOKEN_MODEL_MAPPING | {} | Mapping from the request deployment id to a tiktoken model name. Required for tokenization of chat completion requests/responses on the adapter side when the upstream model doesn't return the token usage. Example: {"my-gpt-deployment":"gpt-3.5-turbo","my-gpt-o3-deployment":"o3"}. You don't need to add a deployment to the mapping if its name already matches one of the tiktoken models. You can check this by running `python -c "from tiktoken.model import encoding_name_for_model as e; print(e('my-deployment-name'))"`. All chat completion models require tokenization via tiktoken except the ones declared in the DATABRICKS_DEPLOYMENTS, MISTRAL_DEPLOYMENTS and DALLE3_DEPLOYMENTS variables. |
DIAL_USE_FILE_STORAGE | False | Save image model artifacts to the DIAL file storage (DALL-E images are uploaded to the DIAL file storage and their base64 encodings are replaced with links to the storage) |
DIAL_URL | `` | URL of the core DIAL server (required when DIAL_USE_FILE_STORAGE=True) |
NON_STREAMING_DEPLOYMENTS | `` | Comma-separated list of deployments that do not support streaming. The adapter emulates streaming by calling the model and converting its response into a single-chunk stream. Example: "o1-mini,o1-preview" |
ACCESS_TOKEN_EXPIRATION_WINDOW | 10 | The Azure access token is renewed this many seconds before its actual expiration time. The buffer ensures that the token does not expire in the middle of an operation due to processing time and potential network delays. |
AZURE_OPEN_AI_SCOPE | https://cognitiveservices.azure.com/.default | The scope used when requesting the Azure access token for Azure OpenAI services |
API_VERSIONS_MAPPING | {} | Mapping of API versions for requests to the Azure OpenAI API. Example: {"2023-03-15-preview": "2023-05-15", "": "2024-02-15-preview"}. An empty key sets the default API version for the case when the user didn't pass one in the request |
ELIMINATE_EMPTY_CHOICES | False | When enabled, the response stream is guaranteed to exclude chunks with an empty list of choices. This is useful when a DIAL client doesn't support such chunks. An empty list of choices can be generated by Azure OpenAI in at least two cases: (1) when the Content filter is not disabled, Azure includes prompt filter results in the first chunk with an empty list of choices; (2) when stream_options.include_usage is enabled, the last chunk contains usage data and an empty list of choices. |
CORE_API_VERSION | `` | Supported value 0.6, to work with the old version of the DIAL File API |
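Putting it together, a hypothetical `.env` for a small production setup might look as follows (all values are illustrative):

```sh
LOG_LEVEL=INFO
WEB_CONCURRENCY=4
# Tokenize a custom-named deployment with the gpt-3.5-turbo encoding
TIKTOKEN_MODEL_MAPPING={"my-gpt-deployment":"gpt-3.5-turbo"}
# Upload DALL-E images to the DIAL file storage; the DIAL_URL value is made-up
DIAL_USE_FILE_STORAGE=True
DIAL_URL=http://dial-core:8080
NON_STREAMING_DEPLOYMENTS=o1-mini
```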
Certain models support configuration via the `$ADAPTER_HOSTNAME/openai/deployments/$DEPLOYMENT_NAME/configuration` endpoint.
A GET request to this endpoint returns the schema of the model configuration in JSON Schema format.
Such models expect the `custom_fields.configuration` field of the `chat/completions` request to contain a JSON value that conforms to the schema.
The `custom_fields.configuration` field is optional if and only if each field in the schema is optional too.
OpenAI image generation models accept configurations with parameters specific for image generation such as image size, style, and quality.
The latest supported parameters can be found in the official OpenAI documentation for models capable of image generation or in the Azure OpenAI API documentation.
Alternatively, the configuration schema can be retrieved programmatically from the `/configuration` endpoint. Keep in mind, however, that this schema may lag behind the latest official one. More on that in the Forward compatibility section.
An example of a DALL-E 3 request with a configured style and image size:
Request
```json
{
  "model": "dall-e-3",
  "messages": [
    {
      "role": "user",
      "content": "forest meadow"
    }
  ],
  "custom_fields": {
    "configuration": {
      "size": "1024x1024",
      "style": "vivid"
    }
  }
}
```
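Assuming the body above is saved to request.json and the adapter runs locally, the request could be sent like this (the host, port, api-version value, and authentication header are assumptions that depend on your deployment):

```sh
curl "http://localhost:5000/openai/deployments/dall-e-3/chat/completions?api-version=2024-02-01" \
  -H "Content-Type: application/json" \
  -H "api-key: $DALLE3_API_KEY" \
  -d @request.json
```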
Similarly, the configuration can be preset on a per-deployment basis in the DIAL Core config:
DIAL Core Config
```json
{
  "models": {
    "dial-dall-e-3": {
      "type": "chat",
      "description": "...",
      "endpoint": "...",
      "defaults": {
        "custom_fields": {
          "configuration": {
            "size": "1024x1024",
            "style": "vivid"
          }
        }
      }
    }
  }
}
```
This way, the end user doesn't have to attach the configuration to each chat completion request: DIAL Core applies it automatically (if missing) to all incoming requests to this deployment.
The configuration schema in the adapter isn't fixed and allows for extra fields and arbitrary parameter values. This enables forward compatibility with future versions of the image generation API.
Say the next version of the GPT Image model introduces support for a negative prompt. Thanks to the permissive configuration schema, it will still be possible to use a version of the OpenAI adapter that is unaware of the latest developments in the GPT Image API.
Request
```json
{
  "model": "gpt-image-1",
  "messages": [
    {
      "role": "user",
      "content": "forest meadow"
    }
  ],
  "custom_fields": {
    "configuration": {
      "negative_prompt": "trees"
    }
  }
}
```
The adapter supports multiple upstream definitions in the DIAL Core config:
```json
{
  "models": {
    "gpt-4o-2024-11-20": {
      "type": "chat",
      "endpoint": "http://$OPENAI_ADAPTER_HOSTNAME/openai/deployments/gpt-4o-2024-11-20/chat/completions",
      "displayName": "GPT-4o",
      "upstreams": [
        {
          "endpoint": "https://$AZURE_OPENAI_SERVICE_NAME1.openai.azure.com/openai/deployments/gpt-4o-2024-11-20/chat/completions"
        },
        {
          "endpoint": "https://$AZURE_OPENAI_SERVICE_NAME2.openai.azure.com/openai/deployments/gpt-4o-2024-11-20/chat/completions"
        },
        {
          "endpoint": "https://$AZURE_OPENAI_SERVICE_NAME3.openai.azure.com/openai/deployments/gpt-4o-2024-11-20/chat/completions"
        }
      ]
    }
  }
}
```
Prompt caching can be enabled via the `autoCachingSupported` flag in the DIAL Core config.
```json
{
  "models": {
    "gpt-4o-2024-11-20": {
      "type": "chat",
      "endpoint": "http://$OPENAI_ADAPTER_HOSTNAME/openai/deployments/gpt-4o-2024-11-20/chat/completions",
      "displayName": "GPT-4o",
      "upstreams": [
        {
          "endpoint": "https://$AZURE_OPENAI_SERVICE_NAME1.openai.azure.com/openai/deployments/gpt-4o-2024-11-20/chat/completions"
        },
        {
          "endpoint": "https://$AZURE_OPENAI_SERVICE_NAME2.openai.azure.com/openai/deployments/gpt-4o-2024-11-20/chat/completions"
        },
        {
          "endpoint": "https://$AZURE_OPENAI_SERVICE_NAME3.openai.azure.com/openai/deployments/gpt-4o-2024-11-20/chat/completions"
        }
      ],
      "features": {
        "autoCachingSupported": true
      }
    }
  }
}
```
Important
Check that the deployment actually supports prompt caching before enabling it in the config.
Run linting before committing:

```sh
make lint
```
To auto-fix formatting issues, run:
```sh
make format
```
Run unit tests locally:
```sh
make test
```
To remove the virtual environment and build artifacts:
```sh
make clean
```