NetMind Power Documentation

Batch Processing

Our Batch API is OpenAI-compatible, and it costs 50% less than the synchronous interfaces.

Supported Models

/v1/chat/completions

  • meta-llama/Meta-Llama-3.1-8B-Instruct

  • meta-llama/Llama-3.3-70B-Instruct

  • google/gemma-2-27b-it

  • google/gemma-2-9b-it

  • Qwen/Qwen2.5-7B-Instruct

/v1/embeddings

coming soon

Preparing Your Batch File

Batches start with a .jsonl file in which each line contains the details of an individual request to the API. For now, the only available endpoint is /v1/chat/completions (Chat Completions API); /v1/embeddings (Embeddings API) is not yet supported. For a given input file, the parameters in each line's body field are the same as the parameters for the underlying endpoint. Each request must include a unique custom_id value, which you can use to reference results after completion. Here's an example of an input file with 2 requests. Note that each input file can only include requests to a single model.

{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-3.5-turbo-0125", "messages": [{"role": "system", "content": "You are a helpful assistant."},{"role": "user", "content": "Hello world!"}],"max_tokens": 1000}}
{"custom_id": "request-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-3.5-turbo-0125", "messages": [{"role": "system", "content": "You are an unhelpful assistant."},{"role": "user", "content": "Hello world!"}],"max_tokens": 1000}}

Uploading Your Batch File

You need to use the NetMind platform's File API to upload your batch file. Refer to the File API documentation for more information.

Creating the Batch

Once you've successfully uploaded your input file, you can use the returned File object's ID to create a batch. In this example, assume the file ID is file-123456. For now, the completion window can only be set to 24h. You can also provide custom metadata via the optional metadata parameter.

Curl Example

export API_TOKEN={{API_TOKEN}}
curl -X POST 'https://api.netmind.ai/inference-api/openai/v1/batches' \
--header "Authorization: $API_TOKEN" \
--header 'Content-Type: application/json' \
--data-raw '{
    "input_file_id": "file-123456",
    "endpoint": "/v1/chat/completions",
    "completion_window": "24h"
}'

Python Example

from openai import OpenAI

client = OpenAI(
    base_url="https://api.netmind.ai/inference-api/openai/v1",
    api_key="{{API_TOKEN}}",
)

batch_input_file_id = "file-123456"
client.batches.create(
    input_file_id=batch_input_file_id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
    metadata={
        "description": "nightly eval job"
    }
)

Example response

{
    "id": "batch_123456",
    "object": "batch",
    "endpoint": "",
    "errors": null,
    "input_file_id": null,
    "completion_window": "",
    "status": "pending",
    "output_file_id": null,
    "error_file_id": null,
    "created_at": null,
    "in_progress_at": null,
    "expires_at": null,
    "finalizing_at": null,
    "completed_at": null,
    "failed_at": null,
    "expired_at": null,
    "cancelling_at": null,
    "cancelled_at": null,
    "request_counts": {
        "total": 0,
        "completed": 0,
        "failed": 0
    },
    "metadata": null
}

Checking the Status of a Batch

You can check the status of a batch at any time, which will also return a Batch object.

Curl Example

export API_TOKEN={{API_TOKEN}}
curl -X GET 'https://api.netmind.ai/inference-api/openai/v1/batches/{{BATCH_ID}}' \
--header "Authorization: $API_TOKEN"

Python Example

from openai import OpenAI

client = OpenAI(
    base_url="https://api.netmind.ai/inference-api/openai/v1",
    api_key="{{API_TOKEN}}",
)

batch_id = "{{BATCH_ID}}"
batch = client.batches.retrieve(batch_id)
print(batch)
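For long-running batches you may want to poll until the batch settles rather than retrieve it once. A minimal polling sketch; the set of terminal statuses is an assumption inferred from the timestamp fields of the Batch object (completed_at, failed_at, expired_at, cancelled_at):

```python
import time

# Assumed terminal statuses, inferred from the Batch object's timestamp fields.
TERMINAL_STATUSES = {"completed", "failed", "expired", "cancelled"}

def wait_for_batch(client, batch_id, poll_interval=30):
    """Retrieve the batch repeatedly until it reaches a terminal status."""
    while True:
        batch = client.batches.retrieve(batch_id)
        if batch.status in TERMINAL_STATUSES:
            return batch
        time.sleep(poll_interval)
```

Call it with the client configured as above, e.g. wait_for_batch(client, "{{BATCH_ID}}").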

Retrieving the Results

Once the batch is complete, use the NetMind platform's File API to get the file content; the result file ID is in batch.output_file_id. Refer to the File API documentation for more information.
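Each line of the result file is a JSON object carrying the custom_id of the originating request. A minimal parsing sketch, assuming the OpenAI-compatible batch output format; the sample line below is illustrative, not real API output:

```python
import json

def parse_batch_results(raw: str) -> dict:
    """Map each custom_id to its response object from a result .jsonl payload."""
    results = {}
    for line in raw.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        results[record["custom_id"]] = record.get("response")
    return results

# Illustrative result line; the real content comes from the file at batch.output_file_id.
sample = '{"custom_id": "request-1", "response": {"status_code": 200, "body": {"choices": []}}}'
by_id = parse_batch_results(sample)
```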

Canceling the Batch

If necessary, you can cancel an ongoing batch. The batch's status will change to cancelling until in-flight requests are complete (up to 10 minutes), after which the status will change to cancelled.

Curl Example

export API_TOKEN={{API_TOKEN}}
curl -X POST 'https://api.netmind.ai/inference-api/openai/v1/batches/{{BATCH_ID}}/cancel' \
--header "Authorization: $API_TOKEN"

Python Example

from openai import OpenAI

client = OpenAI(
    base_url="https://api.netmind.ai/inference-api/openai/v1",
    api_key="{{API_TOKEN}}",
)

batch_id = "{{BATCH_ID}}"
client.batches.cancel(batch_id)

Getting a List of All Batches

At any time, you can see all your batches. For users with many batches, you can use the limit and after parameters to paginate your results.

Curl Example

export API_TOKEN={{API_TOKEN}}
curl -X GET 'https://api.netmind.ai/inference-api/openai/v1/batches?limit=10' \
--header "Authorization: $API_TOKEN"

Python Example

from openai import OpenAI

client = OpenAI(
    base_url="https://api.netmind.ai/inference-api/openai/v1",
    api_key="{{API_TOKEN}}",
)

batches = client.batches.list(limit=10)
print(batches)
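To walk every page rather than just the first, follow the cursor with the after parameter. A sketch, assuming the list response exposes OpenAI-style data and has_more fields:

```python
def iter_all_batches(client, page_size=10):
    """Yield every batch by following the `after` cursor page by page."""
    after = None
    while True:
        kwargs = {"limit": page_size}
        if after is not None:
            kwargs["after"] = after
        page = client.batches.list(**kwargs)
        for batch in page.data:
            yield batch
        # Stop when the page is empty or the server reports no further pages.
        if not page.data or not getattr(page, "has_more", False):
            break
        after = page.data[-1].id
```

Usage with the client configured as above: for batch in iter_all_batches(client): print(batch.id)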

Last updated 4 months ago
