Batch Processing

Our Batch API is OpenAI-compatible and costs 50% less than the synchronous inference interfaces.

Supported Models

/v1/chat/completions

  • meta-llama/Meta-Llama-3.1-8B-Instruct

  • meta-llama/Llama-3.3-70B-Instruct

  • google/gemma-2-27b-it

  • google/gemma-2-9b-it

  • Qwen/Qwen2.5-7B-Instruct

/v1/embeddings

Coming soon.

Preparing Your Batch File

Batches start with a .jsonl file where each line contains the details of an individual request to the API. For now, the only available endpoint is /v1/chat/completions (Chat Completions API); /v1/embeddings (Embeddings API) is not yet supported. For a given input file, the parameters in each line's body field are the same as the parameters for the underlying endpoint. Each request must include a unique custom_id value, which you can use to reference results after completion. Here's an example of an input file with two requests. Note that each input file can only include requests to a single model.

{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-3.5-turbo-0125", "messages": [{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": "Hello world!"}],"max_tokens": 1000}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-3.5-turbo-0125", "messages": [{"role": "system", "content": "You are an unhelpful assistant."},{"role": "user", "content": "Hello world!"}],"max_tokens": 1000}}

Uploading Your Batch File

You need to use the Netmind platform's File API to upload the input file. Refer to the File API documentation for more information.

Creating the Batch

Once you've successfully uploaded your input file, you can use the input File object's ID to create a batch. In this case, let's assume the file ID is file-123456. For now, the completion window can only be set to 24h. You can also provide custom metadata via an optional metadata parameter.

Curl Example

export API_TOKEN={{API_TOKEN}}
curl -X POST 'https://api.netmind.ai/inference-api/openai/v1/batches' \
--header "Authorization: $API_TOKEN" \
--header 'Content-Type: application/json' \
--data-raw '{
    "input_file_id": "file-123456",
    "endpoint": "/v1/chat/completions",
    "completion_window": "24h"
}'

Python Example

from openai import OpenAI

client = OpenAI(
    base_url="https://api.netmind.ai/inference-api/openai/v1",
    api_key="{{API_TOKEN}}",
)

batch_input_file_id = "file-123456"
batch = client.batches.create(
    input_file_id=batch_input_file_id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
    metadata={
        "description": "nightly eval job"
    }
)
print(batch)

Example response

{
    "id": "batch_123456",
    "object": "batch",
    "endpoint": "",
    "errors": null,
    "input_file_id": null,
    "completion_window": "",
    "status": "pending",
    "output_file_id": null,
    "error_file_id": null,
    "created_at": null,
    "in_progress_at": null,
    "expires_at": null,
    "finalizing_at": null,
    "completed_at": null,
    "failed_at": null,
    "expired_at": null,
    "cancelling_at": null,
    "cancelled_at": null,
    "request_counts": {
        "total": 0,
        "completed": 0,
        "failed": 0
    },
    "metadata": null
}

Checking the Status of a Batch

You can check the status of a batch at any time, which will also return a Batch object.

Curl Example

export API_TOKEN={{API_TOKEN}}
curl -X GET 'https://api.netmind.ai/inference-api/openai/v1/batches/{{BATCH_ID}}' \
--header "Authorization: $API_TOKEN"

Python Example

from openai import OpenAI

client = OpenAI(
    base_url="https://api.netmind.ai/inference-api/openai/v1",
    api_key="{{API_TOKEN}}",
)

batch_id = "{{BATCH_ID}}"
batch = client.batches.retrieve(batch_id)
print(batch)
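In practice you usually poll until the batch stops changing. The helper below is a minimal sketch: the set of terminal statuses is an assumption inferred from the timestamp fields on the Batch object above (completed_at, failed_at, expired_at, cancelled_at), and the fetch_status callable is injected so the loop itself has no network dependency.

```python
import time

# Assumed terminal statuses, inferred from the Batch object's timestamp
# fields; a batch in one of these states will not change further.
TERMINAL_STATUSES = {"completed", "failed", "expired", "cancelled"}

def wait_for_batch(fetch_status, interval_s=60, sleep=time.sleep):
    """Poll fetch_status() until the batch reaches a terminal status.

    fetch_status is any zero-argument callable returning the current
    status string; sleep is injectable for testing.
    """
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(interval_s)
```

With the client above, you could call it as `wait_for_batch(lambda: client.batches.retrieve(batch_id).status)`.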

Retrieving the Results

You need to use the Netmind platform's File API to download the file content; refer to the File API documentation for more information. The result file's ID is in batch.output_file_id.
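Once you have downloaded the output file's content, each line is a JSON object for one request. The sketch below assumes each line carries the custom_id you supplied (the rest of the line's shape is not specified here); because output order is not guaranteed to match input order, indexing by custom_id is the reliable way to match results to requests.

```python
import json

def index_results(output_jsonl: str) -> dict:
    """Map custom_id -> result line from batch output JSONL content.

    Output lines may arrive in any order, so look results up by the
    custom_id assigned in the input file rather than by position.
    """
    return {
        line["custom_id"]: line
        for line in (json.loads(raw) for raw in output_jsonl.splitlines() if raw)
    }
```

For example, `index_results(content)["request-1"]` returns the result line for the first request in the input file above.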

Canceling the Batch

If necessary, you can cancel an ongoing batch. The batch's status will change to cancelling until in-flight requests are complete (up to 10 minutes), after which the status will change to cancelled.

Curl Example

export API_TOKEN={{API_TOKEN}}
curl -X POST 'https://api.netmind.ai/inference-api/openai/v1/batches/{{BATCH_ID}}/cancel' \
--header "Authorization: $API_TOKEN"

Python Example

from openai import OpenAI

client = OpenAI(
    base_url="https://api.netmind.ai/inference-api/openai/v1",
    api_key="{{API_TOKEN}}",
)

batch_id = "{{BATCH_ID}}"
client.batches.cancel(batch_id)

Getting a List of All Batches

At any time, you can see all your batches. For users with many batches, you can use the limit and after parameters to paginate your results.

Curl Example

export API_TOKEN={{API_TOKEN}}
curl -X GET 'https://api.netmind.ai/inference-api/openai/v1/batches?limit=10' \
--header "Authorization: $API_TOKEN"
Python Example

from openai import OpenAI

client = OpenAI(
    base_url="https://api.netmind.ai/inference-api/openai/v1",
    api_key="{{API_TOKEN}}",
)

batches = client.batches.list(limit=10)
print(batches)
