Chat


Last updated 2 months ago


Overview

Netmind provides an OpenAI-compatible API, which simplifies integration into existing applications. The API supports the Chat Completion and Completion endpoints, in both streaming and regular modes.

Base URL

https://api.netmind.ai/inference-api/openai/v1
API Key

To use the API, you need to obtain a Netmind AI API Key. For detailed instructions, please refer to the authentication documentation.

Supported Models

You can find all the models supported by the platform on the "Model APIs" page, for example:

  • meta-llama/Meta-Llama-3.1-8B-Instruct

  • meta-llama/Meta-Llama-3.3-70B-Instruct

  • ...

As AI models continue to evolve, we will regularly update the list of supported models. While some models may be removed, we will strive to handle the transition in a way that preserves compatibility for users already integrated with these model APIs. For details on the transition process, please refer to the Deprecated Models page.

Supported APIs

  1. Chat Completion (streaming and regular)

  2. Completion (streaming and regular)

Usage Examples

Python Client

First, install the OpenAI Python client:

pip install 'openai>=1.0.0'

Chat Completions API

from openai import OpenAI

client = OpenAI(
    base_url="https://api.netmind.ai/inference-api/openai/v1",
    api_key="<YOUR Netmind AI API Key>",
)

model = "meta-llama/Meta-Llama-3.1-8B-Instruct"
stream = True  # or False
max_tokens = 512

chat_completion_res = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "system",
            "content": "Act like you are a helpful assistant.",
        },
        {
            "role": "user",
            "content": "Hi there!",
        }
    ],
    stream=stream,
    max_tokens=max_tokens,
)

if stream:
    for chunk in chat_completion_res:
        print(chunk.choices[0].delta.content or "", end="")
else:
    print(chat_completion_res.choices[0].message.content)

Completions API

from openai import OpenAI

client = OpenAI(
    base_url="https://api.netmind.ai/inference-api/openai/v1",
    api_key="<YOUR Netmind AI API Key>",
)

model = "meta-llama/Meta-Llama-3.1-8B-Instruct"
stream = True  # or False
max_tokens = 512

completion_res = client.completions.create(
    model=model,
    prompt="A chat between a curious user and an artificial intelligence assistant.\nYou are a cooking assistant.\nBe edgy in your cooking ideas.\nUSER: How do I make pasta?\nASSISTANT: First, boil water. Then, add pasta to the boiling water. Cook for 8-10 minutes or until al dente. Drain and serve!\nUSER: How do I make it better?\nASSISTANT:",
    stream=stream,
    max_tokens=max_tokens,
)

if stream:
    for chunk in completion_res:
        print(chunk.choices[0].text or "", end="")
else:
    print(completion_res.choices[0].text)

cURL Client

Chat Completions API

# Set your API key
export API_KEY="<YOUR Netmind AI API Key>"

curl "https://api.netmind.ai/inference-api/openai/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${API_KEY}" \
  -d $'{
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "messages": [
        {
            "role": "system",
            "content": "Act like you are a helpful assistant."
        },
        {
            "role": "user",
            "content": "Hi there!"
        }
    ],
    "max_tokens": 512
}'

Completions API

# Set your API key
export API_KEY="<YOUR Netmind AI API Key>"

curl "https://api.netmind.ai/inference-api/openai/v1/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${API_KEY}" \
  -d $'{
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "prompt": "A chat between a curious user and an artificial intelligence assistant.\\n You are a cooking assistant.\\n Be edgy in your cooking ideas.\\n USER: How do I make pasta?\\n ASSISTANT: First, boil water. Then, add pasta to the boiling water. Cook for 8-10 minutes or until al dente. Drain and serve!\\n USER: How do I make it better?\\n ASSISTANT:",
    "max_tokens": 512
}'

Model Parameters

Please note that we are not yet 100% compatible with all OpenAI parameters. If you encounter any issues, you can start a discussion in our Discord server channel #issues.

Supported Parameters

  • messages: (ChatCompletion only) An array of message objects with roles (system, user, assistant) and content.

  • prompt: (Completion only) The prompt to generate completions for.

  • max_tokens: The maximum number of tokens to generate.

  • stream: If set to true, partial message deltas will be sent as they become available.

  • temperature: Controls randomness in output generation (0-2).

  • top_p: Alternative to temperature, controls diversity via nucleus sampling.

  • stop: Up to 4 sequences where the API will stop generating further tokens.

  • n: Number of chat completion choices to generate for each input message.

  • presence_penalty: Penalizes new tokens based on their presence in the generated text so far.

  • frequency_penalty: Penalizes new tokens based on their frequency in the generated text so far.

  • repetition_penalty: Penalizes new tokens based on their appearance in the prompt and generated text.

  • logit_bias: Modifies the likelihood of specified tokens appearing in the output.
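
As a hedged illustration, the sketch below assembles the parameters listed above into the keyword arguments for `client.chat.completions.create`. The parameter names come from the list; the helper function and the specific values are illustrative only, not recommended defaults.

```python
# Illustrative only: build the keyword arguments for a chat completion
# request from the supported parameters listed above.

def build_chat_request(model, messages, **overrides):
    """Assemble kwargs for client.chat.completions.create(**request)."""
    request = {
        "model": model,
        "messages": messages,
        "max_tokens": 512,        # cap on generated tokens
        "temperature": 0.7,       # 0-2; higher means more random output
        "top_p": 0.9,             # nucleus sampling cutoff
        "stop": ["USER:"],        # up to 4 stop sequences
        "stream": False,          # set True to receive incremental deltas
    }
    request.update(overrides)     # per-call overrides win
    return request

request = build_chat_request(
    "meta-llama/Meta-Llama-3.1-8B-Instruct",
    [{"role": "user", "content": "Hi there!"}],
    temperature=0.2,              # override the example default above
)
# Then: client.chat.completions.create(**request)
```

Keeping the request as a plain dict makes it easy to log, reuse across streaming and non-streaming calls, and adjust per call without touching client setup.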

Migrating from OpenAI

If you're already using OpenAI's chat completion endpoint, you can easily switch to Netmind by:

  1. Setting the base URL to https://api.netmind.ai/inference-api/openai/v1

  2. Obtaining and setting your Netmind AI API Key

  3. Updating the model name according to your needs

model: Specify the model to use. All supported models are listed on the "Model APIs" page.
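
The steps above can often be done without code changes: the OpenAI Python client reads its key and base URL from environment variables when they are not passed explicitly. A minimal sketch, assuming the standard `OPENAI_API_KEY` and `OPENAI_BASE_URL` variables recognized by the client:

```shell
# Point the stock OpenAI Python client at Netmind via the environment;
# the client picks these up when base_url/api_key are not passed in code.
export OPENAI_BASE_URL="https://api.netmind.ai/inference-api/openai/v1"
export OPENAI_API_KEY="<YOUR Netmind AI API Key>"
```

After that, existing code that calls `OpenAI()` with no arguments will send requests to Netmind; only the model name still needs updating.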

For more information or support, please visit our website or join our Discord server.
