Dedicated Endpoints

Pricing

Billing is based on the GPU type, the number of instances, and how long the instances run.
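
As a rough illustration of how these three factors combine, the sketch below multiplies a unit price by the instance count and the running time. It assumes the unit price (for example the "usd_price_unit" field returned by the flavor endpoint) is charged per instance per hour; this page does not state the billing interval, so treat that as an assumption. The function name estimate_cost is hypothetical.

```python
def estimate_cost(price_unit: float, instances: int, hours: float) -> float:
    """Estimate a bill from a unit price, instance count, and duration.

    ASSUMPTION: price_unit is charged per instance per hour. The billing
    interval is not documented on this page, so verify it before relying
    on the result.
    """
    if instances < 0 or hours < 0:
        raise ValueError("instances and hours must be non-negative")
    return price_unit * instances * hours


# Two instances running for three hours at a unit price of 0.3.
cost = estimate_cost(0.3, instances=2, hours=3)
print(f"Estimated cost: {cost:.2f}")
```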

Get inference flavor

Replace {{API_TOKEN}} with your actual token.

Example Request:

curl --location 'https://api.netmind.ai/v1/inference-service/flavor' \
--header 'Authorization: Bearer {{API_TOKEN}}'

Example Response:

{
    "flavor_list": [
        {
            "flavor_id": "69475e82e81c4dd6be3467e2ca374e0c",
            "display_name": "NVIDIA_GeForce_RTX_4090",
            "cluster_id": "1",
            "cluster_flavor_id": "US_01_4090",
            "meta_info": {
                "cuda": "12.0",
                "region": "cn"
            },
            "billing": {
                "cny_price_unit": 2.2,
                "usd_price_unit": 0.3,
                "nmt_price_unit": 0.147164
            },
            "created_at": "2024-11-13 09:00:19",
            "updated_at": "2024-11-13 09:00:19",
            "deleted_at": null,
            "is_deleted": false,
            "available_num": 1,
            "node_max_gpu": 2
        },
        {
            "flavor_id": "7b2f36d30e0743debc6c60d5017e2d16",
            "display_name": "NVIDIA_GeForce_RTX_4090",
            "cluster_id": "1",
            "cluster_flavor_id": "US_01_4090",
            "meta_info": {
                "cuda": "12.0",
                "region": "other"
            },
            "billing": {
                "cny_price_unit": 2.2,
                "usd_price_unit": 0.3,
                "nmt_price_unit": 0.147164
            },
            "created_at": "2024-11-13 09:00:19",
            "updated_at": "2024-11-13 09:00:19",
            "deleted_at": null,
            "is_deleted": false,
            "available_num": 1,
            "node_max_gpu": 2
        }
    ]
}
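
A client will usually need to pick one "flavor_id" from this list before creating an endpoint. The following is a minimal sketch of one way to do that. It assumes that "available_num" counts units that are currently free and that "is_deleted" marks retired flavors; neither meaning is spelled out in the response. The helper name available_flavors is hypothetical, and the sample data is trimmed from the response above.

```python
import json


def available_flavors(response: dict) -> list[dict]:
    """Return usable flavors, cheapest (by USD unit price) first.

    ASSUMPTION: a flavor is usable when it is not deleted and
    available_num is greater than zero.
    """
    usable = [
        flavor
        for flavor in response.get("flavor_list", [])
        if not flavor.get("is_deleted") and flavor.get("available_num", 0) > 0
    ]
    return sorted(usable, key=lambda flavor: flavor["billing"]["usd_price_unit"])


# Trimmed copy of the example response.
sample = json.loads("""
{
  "flavor_list": [
    {"flavor_id": "69475e82e81c4dd6be3467e2ca374e0c",
     "meta_info": {"cuda": "12.0", "region": "cn"},
     "billing": {"usd_price_unit": 0.3},
     "is_deleted": false, "available_num": 1, "node_max_gpu": 2},
    {"flavor_id": "7b2f36d30e0743debc6c60d5017e2d16",
     "meta_info": {"cuda": "12.0", "region": "other"},
     "billing": {"usd_price_unit": 0.3},
     "is_deleted": false, "available_num": 1, "node_max_gpu": 2}
  ]
}
""")

for flavor in available_flavors(sample):
    print(flavor["flavor_id"], flavor["meta_info"]["region"])
```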

Create endpoint

Replace {{API_TOKEN}} with your actual token.

Example Request:

curl --location 'https://api.netmind.ai/v1/inference-service' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer {{API_TOKEN}}' \
--data '{
    "name": "test-flask-server-9",
    "description": "test",
    "payment_type": "usd",
    "resource_metadata": {
        "flavor_id": "69475e82e81c4dd6be3467e2ca374e0c",
        "scale_type": "manual",
        "target_instance_number": 1
    },
    "deploy_metadata": {
        "image_url": "python:3.10-slim",
        "command": "pip install flask && apt update && apt install -y curl && curl -O https://raw.githubusercontent.com/huang-hf/share_data/refs/heads/main/app.py && python app.py",
        "port": 8080,
        "gpu_num": 0
    }
}'

Example Response:

{
    "service_id": "...",
    "name": "test-flask-server-9",
    "description": "test",
    "user_id": "...",
    "status": "initializing",
    "status_info": null,
    "resource_metadata": {
        "resource_display_name": "NVIDIA_GeForce_RTX_4090",
        "scale_type": "manual",
        "flavor_id": "69475e82e81c4dd6be3467e2ca374e0c",
        "target_instance_number": 1,
        "scale_policy": null,
        "VRAM": "32GB",
        "image_size": "12GB"
    },
    "billing_metadata": {
        ...
    },
    "endpoint_metadata": {
        ...
    },
    "deploy_metadata": {
        ...
    },
    "service_type": "normal",
    "created_at": "...",
    "updated_at": "...",
    "deleted_at": null,
    "is_deleted": false,
    "payment_type": "usd"
}
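
The same create request can be assembled from Python with only the standard library. This is a sketch, not an official client: the function name build_create_request and its parameters are hypothetical, and the payload simply mirrors the fields in the curl example above. The request is built but not sent, so the snippet runs without a token.

```python
import json
import urllib.request

API_BASE = "https://api.netmind.ai/v1"


def build_create_request(token: str, name: str, flavor_id: str,
                         image_url: str, command: str, port: int,
                         gpu_num: int = 0, instances: int = 1,
                         description: str = "",
                         payment_type: str = "usd") -> urllib.request.Request:
    """Build (but do not send) a POST mirroring the curl example."""
    payload = {
        "name": name,
        "description": description,
        "payment_type": payment_type,
        "resource_metadata": {
            "flavor_id": flavor_id,
            "scale_type": "manual",
            "target_instance_number": instances,
        },
        "deploy_metadata": {
            "image_url": image_url,
            "command": command,
            "port": port,
            "gpu_num": gpu_num,
        },
    }
    return urllib.request.Request(
        f"{API_BASE}/inference-service",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_create_request(
    token="YOUR_API_TOKEN",  # placeholder, replace with a real token
    name="test-flask-server-9",
    flavor_id="69475e82e81c4dd6be3467e2ca374e0c",
    image_url="python:3.10-slim",
    command="python app.py",
    port=8080,
    description="test",
)
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request), then read "service_id"
# from the decoded JSON body.
```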

Get endpoint

Replace {{API_TOKEN}} with your actual token.

Replace {{INFERENCE_ID}} with the "service_id" returned by the create request.

Example Request:

curl --location 'https://api.netmind.ai/v1/inference-service/{{INFERENCE_ID}}' \
--header 'Authorization: Bearer {{API_TOKEN}}'

Example Response:

{
    "service_id": "...",
    "name": "test-flask-server-9",
    "description": "test",
    "user_id": "...",
    "status": "available",
    "status_info": null,
    "resource_metadata": {
        "resource_display_name": "NVIDIA_GeForce_RTX_4090",
        "scale_type": "manual",
        "flavor_id": "69475e82e81c4dd6be3467e2ca374e0c",
        "target_instance_number": 1,
        "scale_policy": null,
        "VRAM": "32GB",
        "image_size": "12GB"
    },
    "billing_metadata": {
        ...
    },
    "endpoint_metadata": {
        ...
    },
    "deploy_metadata": {
        ...
    },
    "service_type": "normal",
    "created_at": "...",
    "updated_at": "...",
    "deleted_at": null,
    "is_deleted": false,
    "payment_type": "usd"
}

The value of "status" can be:

  • initializing: The status immediately after creating a new instance or redeploying a stopped instance. The instance is still being initialized.

  • available: Typically follows the "initializing" status. The instance is running normally, can be scaled, and the endpoint is accessible.

  • stopped: The instance is stopped, and the number of workers will be reduced to zero.

  • unavailable: The instance deployment or scaling failed due to an error, which may cause the endpoint to become inaccessible.

When the instance is in an "available" state, it can be accessed via "https://api.deeptrin.com/inference-api/v1/inference_service/{{INFERENCE_ID}}".
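
Because a new endpoint starts in "initializing", a client typically polls the "Get endpoint" request until the status changes. The loop below is a sketch of that pattern. The callable fetch_status is hypothetical: it stands for any function that performs the GET request and returns the "status" string. Treating every status other than "initializing" as final is an assumption, and the interval and attempt limit are arbitrary defaults.

```python
import time
from typing import Callable


def wait_until_settled(fetch_status: Callable[[], str],
                       interval_s: float = 5.0,
                       max_attempts: int = 60,
                       sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the status is no longer "initializing".

    ASSUMPTION: "available", "stopped", and "unavailable" are all treated
    as settled states; the caller decides what to do with each.
    """
    for attempt in range(max_attempts):
        status = fetch_status()
        if status != "initializing":
            return status
        if attempt + 1 < max_attempts:
            sleep(interval_s)
    raise TimeoutError(f"still initializing after {max_attempts} checks")


# Offline demonstration with canned statuses instead of real HTTP calls.
canned = iter(["initializing", "initializing", "available"])
final_status = wait_until_settled(lambda: next(canned), sleep=lambda _s: None)
print(final_status)
```

Passing the sleep function as a parameter keeps the loop easy to exercise without real delays; in normal use the default time.sleep applies.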

Update endpoint

Replace {{API_TOKEN}} with your actual token. Replace {{INFERENCE_ID}} with the "service_id" returned by the create request.

Example Request:

curl --location --request PUT 'https://api.netmind.ai/v1/inference-service/{{INFERENCE_ID}}' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer {{API_TOKEN}}' \
--data '{
  "resource_metadata": {
    "target_instance_number": 2
  }
}'

Example Response:

{
    "service_id": "...",
    "name": "test-flask-server-9",
    "description": "test",
    "user_id": "...",
    "status": "initializing",
    "status_info": null,
    "resource_metadata": {
        "resource_display_name": "NVIDIA_GeForce_RTX_4090",
        "scale_type": "manual",
        "flavor_id": "69475e82e81c4dd6be3467e2ca374e0c",
        "target_instance_number": 1,
        "scale_policy": null,
        "VRAM": "32GB",
        "image_size": "12GB"
    },
    "billing_metadata": {
        ...
    },
    "endpoint_metadata": {
        ...
    },
    "deploy_metadata": {
        ...
    },
    "service_type": "normal",
    "created_at": "...",
    "updated_at": "...",
    "deleted_at": null,
    "is_deleted": false,
    "payment_type": "usd"
}
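
Scaling from a script follows the same shape as the curl request above. The sketch below builds the PUT request without sending it. The function name build_scale_request is hypothetical, and it sends the instance count as a JSON integer, matching the create request; whether the API also accepts a string value is not confirmed here.

```python
import json
import urllib.request

API_BASE = "https://api.netmind.ai/v1"


def build_scale_request(token: str, service_id: str,
                        target_instance_number: int) -> urllib.request.Request:
    """Build (but do not send) a PUT that changes the instance count."""
    # Reject bools explicitly, since bool is a subclass of int in Python.
    if (isinstance(target_instance_number, bool)
            or not isinstance(target_instance_number, int)
            or target_instance_number < 0):
        raise ValueError("target_instance_number must be a non-negative integer")
    payload = {
        "resource_metadata": {"target_instance_number": target_instance_number}
    }
    return urllib.request.Request(
        f"{API_BASE}/inference-service/{service_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="PUT",
    )


scale_request = build_scale_request("YOUR_API_TOKEN", "YOUR_SERVICE_ID", 2)
print(scale_request.get_method(), scale_request.full_url)
```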

Delete endpoint

Replace {{API_TOKEN}} with your actual token. Replace {{INFERENCE_ID}} with the "service_id" returned by the create request.

Example Request:

curl --location --request DELETE 'https://api.netmind.ai/v1/inference-service/{{INFERENCE_ID}}' \
--header 'Authorization: Bearer {{API_TOKEN}}'

Example Response:
