Dedicated Endpoints

Pricing

Billing is based on the GPU type, the number of instances, and the duration for which the instances run.

Get inference flavor

Replace {{API_TOKEN}} with your actual token.

Example Request:

curl --location 'https://api.netmind.ai/v1/inference-service/flavor' \
--header 'Authorization: Bearer {{API_TOKEN}}'

import requests

url = "https://api.netmind.ai/v1/inference-service/flavor"
headers = {
  'Authorization': 'Bearer {{API_TOKEN}}'
}

response = requests.get(url, headers=headers)

print(response.text)

Example Response:

{
    "flavor_list": [
        {
            "flavor_id": "69475e82e81c4dd6be3467e2ca374e0c",
            "display_name": "NVIDIA_GeForce_RTX_4090",
            "cluster_id": "1",
            "cluster_flavor_id": "US_01_4090",
            "meta_info": {
                "cuda": "12.0",
                "region": "cn"
            },
            "billing": {
                "cny_price_unit": 2.2,
                "usd_price_unit": 0.3,
                "nmt_price_unit": 0.147164
            },
            "created_at": "2024-11-13 09:00:19",
            "updated_at": "2024-11-13 09:00:19",
            "deleted_at": null,
            "is_deleted": false,
            "available_num": 1,
            "node_max_gpu": 2
        },
        {
            "flavor_id": "7b2f36d30e0743debc6c60d5017e2d16",
            "display_name": "NVIDIA_GeForce_RTX_4090",
            "cluster_id": "1",
            "cluster_flavor_id": "US_01_4090",
            "meta_info": {
                "cuda": "12.0",
                "region": "other"
            },
            "billing": {
                "cny_price_unit": 2.2,
                "usd_price_unit": 0.3,
                "nmt_price_unit": 0.147164
            },
            "created_at": "2024-11-13 09:00:19",
            "updated_at": "2024-11-13 09:00:19",
            "deleted_at": null,
            "is_deleted": false,
            "available_num": 1,
            "node_max_gpu": 2
        }
    ]
}
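
The flavor list can be filtered client-side, for example to pick the cheapest flavor with capacity for a given currency. A minimal sketch, assuming the field names shown in the example response above (`cheapest_flavor` is an illustrative helper, not part of the API):

```python
def cheapest_flavor(flavor_list, currency="usd"):
    """Return the flavor dict with the lowest per-unit price in `currency`."""
    key = f"{currency}_price_unit"
    # Skip flavors with no capacity left.
    candidates = [f for f in flavor_list if f.get("available_num", 0) > 0]
    return min(candidates, key=lambda f: f["billing"][key])

# Example using (trimmed) entries from the response above:
flavors = [
    {"flavor_id": "69475e82e81c4dd6be3467e2ca374e0c",
     "billing": {"usd_price_unit": 0.3}, "available_num": 1},
    {"flavor_id": "7b2f36d30e0743debc6c60d5017e2d16",
     "billing": {"usd_price_unit": 0.3}, "available_num": 1},
]
best = cheapest_flavor(flavors)
```

In a real client, `flavor_list` would come from `response.json()["flavor_list"]` in the request above, and `best["flavor_id"]` would be passed to the create-endpoint call.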

Create endpoint

Replace {{API_TOKEN}} with your actual token.

Example Request:

curl --location 'https://api.netmind.ai/v1/inference-service' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer {{API_TOKEN}}' \
--data '{
    "name": "test-flask-server-9",
    "description": "test",
    "payment_type": "usd",
    "resource_metadata": {
        "flavor_id": "69475e82e81c4dd6be3467e2ca374e0c",
        "scale_type": "manual",
        "target_instance_number": 1
    },
    "deploy_metadata": {
        "image_url": "python:3.10-slim",
        "command": "pip install flask && apt update && apt install -y curl && curl -O https://raw.githubusercontent.com/huang-hf/share_data/refs/heads/main/app.py && python app.py",
        "port": 8080,
        "gpu_num": 0
    }
}'

import requests
import json

url = "https://api.netmind.ai/v1/inference-service"

payload = json.dumps({
  "name": "test-flask-server-9",
  "description": "test",
  "payment_type": "usd",
  "resource_metadata": {
    "flavor_id": "69475e82e81c4dd6be3467e2ca374e0c",
    "scale_type": "manual",
    "target_instance_number": 1
  },
  "deploy_metadata": {
    "image_url": "python:3.10-slim",
    "command": "pip install flask && apt update && apt install -y curl && curl -O https://raw.githubusercontent.com/huang-hf/share_data/refs/heads/main/app.py && python app.py",
    "port": 8080,
    "gpu_num": 0
  }
})
headers = {
  'Content-Type': 'application/json',
  'Authorization': 'Bearer {{API_TOKEN}}'
}

response = requests.post(url, headers=headers, data=payload)

print(response.text)

Example Response:

{
    "service_id": "...",
    "name": "test-flask-server-9",
    "description": "test",
    "user_id": "...",
    "status": "initializing",
    "status_info": null,
    "resource_metadata": {
        "resource_display_name": "NVIDIA_GeForce_RTX_4090",
        "scale_type": "manual",
        "flavor_id": "69475e82e81c4dd6be3467e2ca374e0c",
        "target_instance_number": 1,
        "scale_policy": null,
        "VRAM": "32GB",
        "image_size": "12GB"
    },
    "billing_metadata": {
        ...
    },
    "endpoint_metadata": {
        ...
    },
    "deploy_metadata": {
        ...
    },
    "service_type": "normal",
    "created_at": "...",
    "updated_at": "...",
    "deleted_at": null,
    "is_deleted": false,
    "payment_type": "usd"
}
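
Because the create payload has a fixed nested shape, it can help to assemble it with a small helper that fills in the common defaults. A sketch based on the fields shown in the request above (`build_create_payload` is illustrative, not part of any SDK):

```python
def build_create_payload(name, flavor_id, image_url, command, port,
                         gpu_num=0, instances=1, payment_type="usd",
                         description=""):
    """Assemble the JSON body for POST /v1/inference-service."""
    return {
        "name": name,
        "description": description,
        "payment_type": payment_type,
        "resource_metadata": {
            "flavor_id": flavor_id,
            "scale_type": "manual",           # manual scaling, as in the example
            "target_instance_number": instances,
        },
        "deploy_metadata": {
            "image_url": image_url,
            "command": command,
            "port": port,
            "gpu_num": gpu_num,
        },
    }
```

The resulting dict can be passed through `json.dumps` as the request body; the `service_id` in the response is what later calls refer to as `{{INFERENCE_ID}}`.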

Get endpoint

Replace {{API_TOKEN}} with your actual token.

Replace {{INFERENCE_ID}} with the "service_id" returned in the previous step.

Example Request:

curl --location 'https://api.netmind.ai/v1/inference-service/{{INFERENCE_ID}}' \
--header 'Authorization: Bearer {{API_TOKEN}}'

import requests

url = "https://api.netmind.ai/v1/inference-service/{{INFERENCE_ID}}"
headers = {
  'Authorization': 'Bearer {{API_TOKEN}}'
}

response = requests.get(url, headers=headers)

print(response.text)

Example Response:

{
    "service_id": "...",
    "name": "test-flask-server-9",
    "description": "test",
    "user_id": "...",
    "status": "available",
    "status_info": null,
    "resource_metadata": {
        "resource_display_name": "NVIDIA_GeForce_RTX_4090",
        "scale_type": "manual",
        "flavor_id": "69475e82e81c4dd6be3467e2ca374e0c",
        "target_instance_number": 1,
        "scale_policy": null,
        "VRAM": "32GB",
        "image_size": "12GB"
    },
    "billing_metadata": {
        ...
    },
    "endpoint_metadata": {
        ...
    },
    "deploy_metadata": {
        ...
    },
    "service_type": "normal",
    "created_at": "...",
    "updated_at": "...",
    "deleted_at": null,
    "is_deleted": false,
    "payment_type": "usd"
}

The value of "status" can be:

  • initializing: The status immediately after creating a new instance or redeploying a stopped instance. It means the instance is being initialized.

  • available: Typically follows the "initializing" status. This indicates the instance is running normally, can be scaled, and the endpoint is accessible.

  • stopped: The instance is stopped, and the number of workers will be reduced to zero.

  • unavailable: The instance deployment or scaling failed due to an error, which may cause the endpoint to become inaccessible.

When the instance is in the "available" state, it can be accessed at "https://api.deeptrin.com/inference-api/v1/inference_service/{{INFERENCE_ID}}".
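
Since a new service starts in "initializing", callers typically poll the Get endpoint until the status flips to "available" (or to a terminal failure). A minimal polling sketch; the status fetch is passed in as a callable so the loop itself stays network-free:

```python
import time

def wait_until_available(get_status, timeout=600, interval=5):
    """Poll `get_status()` until it returns "available".

    `get_status` should return the current "status" string, e.g. by
    GETting /v1/inference-service/{{INFERENCE_ID}} and reading
    response.json()["status"]. Raises on failure or timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status == "available":
            return status
        if status == "unavailable":
            raise RuntimeError("deployment failed")
        time.sleep(interval)
    raise TimeoutError("service did not become available in time")
```

In practice `get_status` would wrap the Get-endpoint request shown above; the timeout and interval values here are illustrative defaults, not documented limits.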

Update endpoint

Replace {{API_TOKEN}} with your actual token. Replace {{INFERENCE_ID}} with the "service_id" returned in the previous step.

Example Request:

curl --location --request PUT 'https://api.netmind.ai/v1/inference-service/{{INFERENCE_ID}}' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer {{API_TOKEN}}' \
--data '{
  "resource_metadata": {
    "target_instance_number": 2
  }
}'

import requests
import json

url = "https://api.netmind.ai/v1/inference-service/{{INFERENCE_ID}}"

payload = json.dumps({
  "resource_metadata": {
    "target_instance_number": 2
  }
})
headers = {
  'Content-Type': 'application/json',
  'Authorization': 'Bearer {{API_TOKEN}}'
}

response = requests.put(url, headers=headers, data=payload)

print(response.text)

Example Response:

{
    "service_id": "...",
    "name": "test-flask-server-9",
    "description": "test",
    "user_id": "...",
    "status": "initializing",
    "status_info": null,
    "resource_metadata": {
        "resource_display_name": "NVIDIA_GeForce_RTX_4090",
        "scale_type": "manual",
        "flavor_id": "69475e82e81c4dd6be3467e2ca374e0c",
        "target_instance_number": 1,
        "scale_policy": null,
        "VRAM": "32GB",
        "image_size": "12GB"
    },
    "billing_metadata": {
        ...
    },
    "endpoint_metadata": {
        ...
    },
    "deploy_metadata": {
        ...
    },
    "service_type": "normal",
    "created_at": "...",
    "updated_at": "...",
    "deleted_at": null,
    "is_deleted": false,
    "payment_type": "usd"
}

Delete endpoint

Replace {{API_TOKEN}} with your actual token. Replace {{INFERENCE_ID}} with the "service_id" returned in the previous step.

Example Request:

curl --location --request DELETE 'https://api.netmind.ai/v1/inference-service/{{INFERENCE_ID}}' \
--header 'Authorization: Bearer {{API_TOKEN}}'

import requests

url = "https://api.netmind.ai/v1/inference-service/{{INFERENCE_ID}}"
headers = {
  'Authorization': 'Bearer {{API_TOKEN}}'
}

response = requests.delete(url, headers=headers)

print(response.text)

Example Response:

Last updated 3 months ago