Dedicated Endpoints

Pricing

Billing is based on the GPU type, the number of instances, and the duration for which the instances run.

Get inference flavor

Replace {{API_TOKEN}} with your actual token.

Example Request:

curl --location 'https://api.netmind.ai/v1/inference-service/flavor' \
--header 'Authorization: Bearer {{API_TOKEN}}'

import requests

url = "https://api.netmind.ai/v1/inference-service/flavor"
headers = {
  'Authorization': 'Bearer {{API_TOKEN}}'
}

response = requests.get(url, headers=headers)

print(response.text)

Example Response:

{
    "flavor_list": [
        {
            "flavor_id": "69475e82e81c4dd6be3467e2ca374e0c",
            "display_name": "NVIDIA_GeForce_RTX_4090",
            "cluster_id": "1",
            "cluster_flavor_id": "US_01_4090",
            "meta_info": {
                "cuda": "12.0",
                "region": "cn"
            },
            "billing": {
                "cny_price_unit": 2.2,
                "usd_price_unit": 0.3,
                "nmt_price_unit": 0.147164
            },
            "created_at": "2024-11-13 09:00:19",
            "updated_at": "2024-11-13 09:00:19",
            "deleted_at": null,
            "is_deleted": false,
            "available_num": 1,
            "node_max_gpu": 2
        },
        {
            "flavor_id": "7b2f36d30e0743debc6c60d5017e2d16",
            "display_name": "NVIDIA_GeForce_RTX_4090",
            "cluster_id": "1",
            "cluster_flavor_id": "US_01_4090",
            "meta_info": {
                "cuda": "12.0",
                "region": "other"
            },
            "billing": {
                "cny_price_unit": 2.2,
                "usd_price_unit": 0.3,
                "nmt_price_unit": 0.147164
            },
            "created_at": "2024-11-13 09:00:19",
            "updated_at": "2024-11-13 09:00:19",
            "deleted_at": null,
            "is_deleted": false,
            "available_num": 1,
            "node_max_gpu": 2
        }
    ]
}
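
The flavor list can be filtered client-side, for example to pick the cheapest flavor with capacity for a given currency. A minimal sketch, assuming the field names shown in the example response above (`cheapest_flavor` is an illustrative helper, not part of the API):

```python
def cheapest_flavor(flavor_list, currency="usd"):
    """Return the flavor dict with the lowest per-unit price in `currency`."""
    key = f"{currency}_price_unit"
    # Skip flavors with no capacity left.
    candidates = [f for f in flavor_list if f.get("available_num", 0) > 0]
    return min(candidates, key=lambda f: f["billing"][key])

# Example using (trimmed) entries from the response above:
flavors = [
    {"flavor_id": "69475e82e81c4dd6be3467e2ca374e0c",
     "billing": {"usd_price_unit": 0.3}, "available_num": 1},
    {"flavor_id": "7b2f36d30e0743debc6c60d5017e2d16",
     "billing": {"usd_price_unit": 0.3}, "available_num": 1},
]
best = cheapest_flavor(flavors)
```

In a real client, `flavor_list` would come from `response.json()["flavor_list"]` in the request above, and `best["flavor_id"]` would be passed to the create-endpoint call.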

Create endpoint

Replace {{API_TOKEN}} with your actual token.

Example Request:

curl --location 'https://api.netmind.ai/v1/inference-service' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer {{API_TOKEN}}' \
--data '{
    "name": "test-flask-server-9",
    "description": "test",
    "payment_type": "usd",
    "resource_metadata": {
        "flavor_id": "69475e82e81c4dd6be3467e2ca374e0c",
        "scale_type": "manual",
        "target_instance_number": 1
    },
    "deploy_metadata": {
        "image_url": "python:3.10-slim",
        "command": "pip install flask && apt update && apt install -y curl && curl -O https://raw.githubusercontent.com/huang-hf/share_data/refs/heads/main/app.py && python app.py",
        "port": 8080,
        "gpu_num": 0
    }
}'

import requests
import json

url = "https://api.netmind.ai/v1/inference-service"

payload = json.dumps({
  "name": "test-flask-server-9",
  "description": "test",
  "payment_type": "usd",
  "resource_metadata": {
    "flavor_id": "69475e82e81c4dd6be3467e2ca374e0c",
    "scale_type": "manual",
    "target_instance_number": 1
  },
  "deploy_metadata": {
    "image_url": "python:3.10-slim",
    "command": "pip install flask && apt update && apt install -y curl && curl -O https://raw.githubusercontent.com/huang-hf/share_data/refs/heads/main/app.py && python app.py",
    "port": 8080,
    "gpu_num": 0
  }
})
headers = {
  'Content-Type': 'application/json',
  'Authorization': 'Bearer {{API_TOKEN}}'
}

response = requests.post(url, headers=headers, data=payload)

print(response.text)

Example Response:

{
    "service_id": "...",
    "name": "test-flask-server-9",
    "description": "test",
    "user_id": "...",
    "status": "initializing",
    "status_info": null,
    "resource_metadata": {
        "resource_display_name": "NVIDIA_GeForce_RTX_4090",
        "scale_type": "manual",
        "flavor_id": "69475e82e81c4dd6be3467e2ca374e0c",
        "target_instance_number": 1,
        "scale_policy": null,
        "VRAM": "32GB",
        "image_size": "12GB"
    },
    "billing_metadata": {
        ...
    },
    "endpoint_metadata": {
        ...
    },
    "deploy_metadata": {
        ...
    },
    "service_type": "normal",
    "created_at": "...",
    "updated_at": "...",
    "deleted_at": null,
    "is_deleted": false,
    "payment_type": "usd"
}
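
Because the create payload has a fixed nested shape, it can help to assemble it with a small helper that fills in the common defaults. A sketch based on the fields shown in the request above (`build_create_payload` is illustrative, not part of any SDK):

```python
def build_create_payload(name, flavor_id, image_url, command, port,
                         gpu_num=0, instances=1, payment_type="usd",
                         description=""):
    """Assemble the JSON body for POST /v1/inference-service."""
    return {
        "name": name,
        "description": description,
        "payment_type": payment_type,
        "resource_metadata": {
            "flavor_id": flavor_id,
            "scale_type": "manual",           # manual scaling, as in the example
            "target_instance_number": instances,
        },
        "deploy_metadata": {
            "image_url": image_url,
            "command": command,
            "port": port,
            "gpu_num": gpu_num,
        },
    }
```

The resulting dict can be passed through `json.dumps` as the request body; the `service_id` in the response is what later calls refer to as `{{INFERENCE_ID}}`.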

Get endpoint

Replace {{API_TOKEN}} with your actual token.

Replace {{INFERENCE_ID}} with the "service_id" returned in the previous step.

Example Request:

curl --location 'https://api.netmind.ai/v1/inference-service/{{INFERENCE_ID}}' \
--header 'Authorization: Bearer {{API_TOKEN}}'

import requests

url = "https://api.netmind.ai/v1/inference-service/{{INFERENCE_ID}}"
headers = {
  'Authorization': 'Bearer {{API_TOKEN}}'
}

response = requests.get(url, headers=headers)

print(response.text)

Example Response:

{
    "service_id": "...",
    "name": "test-flask-server-9",
    "description": "test",
    "user_id": "...",
    "status": "available",
    "status_info": null,
    "resource_metadata": {
        "resource_display_name": "NVIDIA_GeForce_RTX_4090",
        "scale_type": "manual",
        "flavor_id": "69475e82e81c4dd6be3467e2ca374e0c",
        "target_instance_number": 1,
        "scale_policy": null,
        "VRAM": "32GB",
        "image_size": "12GB"
    },
    "billing_metadata": {
        ...
    },
    "endpoint_metadata": {
        ...
    },
    "deploy_metadata": {
        ...
    },
    "service_type": "normal",
    "created_at": "...",
    "updated_at": "...",
    "deleted_at": null,
    "is_deleted": false,
    "payment_type": "usd"
}

The value of "status" can be:

  • initializing: The status immediately after creating a new instance or redeploying a stopped instance. It means the instance is being initialized.

  • available: Typically follows the "initializing" status. This indicates the instance is running normally, can be scaled, and the endpoint is accessible.

  • stopped: The instance is stopped, and the number of workers will be reduced to zero.

  • unavailable: The instance deployment or scaling failed due to an error, which may cause the endpoint to become inaccessible.

When the instance is in the "available" state, it can be accessed at "https://api.deeptrin.com/inference-api/v1/inference_service/{{INFERENCE_ID}}".
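
Since a new service starts in "initializing", callers typically poll the Get endpoint until the status flips to "available" (or to a terminal failure). A minimal polling sketch; the status fetch is passed in as a callable so the loop itself stays network-free:

```python
import time

def wait_until_available(get_status, timeout=600, interval=5):
    """Poll `get_status()` until it returns "available".

    `get_status` should return the current "status" string, e.g. by
    GETting /v1/inference-service/{{INFERENCE_ID}} and reading
    response.json()["status"]. Raises on failure or timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status == "available":
            return status
        if status == "unavailable":
            raise RuntimeError("deployment failed")
        time.sleep(interval)
    raise TimeoutError("service did not become available in time")
```

In practice `get_status` would wrap the Get-endpoint request shown above; the timeout and interval values here are illustrative defaults, not documented limits.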

Update endpoint

Replace {{API_TOKEN}} with your actual token. Replace {{INFERENCE_ID}} with the "service_id" returned in the previous step.

Example Request:

curl --location --request PUT 'https://api.netmind.ai/v1/inference-service/{{INFERENCE_ID}}' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer {{API_TOKEN}}' \
--data '{
  "resource_metadata": {
    "target_instance_number": 2
  }
}'

import requests
import json

url = "https://api.netmind.ai/v1/inference-service/{{INFERENCE_ID}}"

payload = json.dumps({
  "resource_metadata": {
    "target_instance_number": 2
  }
})
headers = {
  'Content-Type': 'application/json',
  'Authorization': 'Bearer {{API_TOKEN}}'
}

response = requests.put(url, headers=headers, data=payload)

print(response.text)

Example Response:

{
    "service_id": "...",
    "name": "test-flask-server-9",
    "description": "test",
    "user_id": "...",
    "status": "initializing",
    "status_info": null,
    "resource_metadata": {
        "resource_display_name": "NVIDIA_GeForce_RTX_4090",
        "scale_type": "manual",
        "flavor_id": "69475e82e81c4dd6be3467e2ca374e0c",
        "target_instance_number": 1,
        "scale_policy": null,
        "VRAM": "32GB",
        "image_size": "12GB"
    },
    "billing_metadata": {
        ...
    },
    "endpoint_metadata": {
        ...
    },
    "deploy_metadata": {
        ...
    },
    "service_type": "normal",
    "created_at": "...",
    "updated_at": "...",
    "deleted_at": null,
    "is_deleted": false,
    "payment_type": "usd"
}

Delete endpoint

Replace {{API_TOKEN}} with your actual token. Replace {{INFERENCE_ID}} with the "service_id" returned in the previous step.

Example Request:

curl --location --request DELETE 'https://api.netmind.ai/v1/inference-service/{{INFERENCE_ID}}' \
--header 'Authorization: Bearer {{API_TOKEN}}'

import requests

url = "https://api.netmind.ai/v1/inference-service/{{INFERENCE_ID}}"
headers = {
  'Authorization': 'Bearer {{API_TOKEN}}'
}

response = requests.delete(url, headers=headers)

print(response.text)

Example Response:

Last updated 3 months ago