Model Inference API

Our Model Inference API is now available. Before running the examples, make sure you have created your API token. The examples below show how to call a model for predictions using CURL and Python.

For the full list of model inference APIs and more detailed parameter descriptions, please visit the inference section.
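To keep the token out of your source code, you can read it from an environment variable instead of hard-coding it. A minimal sketch — the variable name `NETMIND_API_TOKEN` and the helper itself are conventions used here for illustration, not part of the API:

```python
import os

def build_headers(json_body=True):
    """Build request headers, reading the API token from the environment.

    NETMIND_API_TOKEN is an arbitrary variable name chosen for this
    example; any environment variable works.
    """
    token = os.environ['NETMIND_API_TOKEN']  # raises KeyError if unset
    headers = {'Authorization': token}
    if json_body:
        headers['Content-Type'] = 'application/json'
    return headers
```

The same `headers` dictionary can then be passed to any of the `requests.post` calls below.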

Bart-cnn

This example also applies to other models that take text input.

CURL Example

export API_TOKEN=your_api_token
curl -X POST \
  -H "Authorization: $API_TOKEN" \
  -H "Content-Type: application/json" \
  -d $'{ 
      "input":"New York (CNN)When Liana Barrientos was 23 years old, she got married in Westchester County, New York. A year later, she got married again in Westchester County, but to a different man and without divorcing her first husband."
}' \
  https://inference-api.netmind.ai/inference-api/v1/bart-large-cnn

Python Example

import requests

api_token = 'your_api_token'
headers = {
    'Authorization': api_token,
    'Content-Type': 'application/json'
}
data = {
    'input': 'New York (CNN)When Liana Barrientos was 23 years old, she got married in Westchester County, New York. A year later, she got married again in Westchester County, but to a different man and without divorcing her first husband.'
}

response = requests.post('https://inference-api.netmind.ai/inference-api/v1/bart-large-cnn', headers=headers, json=data)
print(response.json())

rmbg

This example also applies to other models that take a file as input and return a file as output.

CURL Example

export API_TOKEN=your_api_token
curl -X POST \
  -H "Authorization: $API_TOKEN" \
  -H "Content-Type: application/json" \
  -d $'{
      "image":"https://i.postimg.cc/1XJMrxrT/giraffe.png"
}' \
  --output result.webp \
  https://inference-api.netmind.ai/inference-api/v1/rmbg-1-4

Python Example

import requests

api_token = 'your_api_token'
headers = {
    'Authorization': api_token,
    'Content-Type': 'application/json'
}
data = {
    'image': 'https://i.postimg.cc/1XJMrxrT/giraffe.png'
}

response = requests.post('https://inference-api.netmind.ai/inference-api/v1/rmbg-1-4', headers=headers, json=data)
if response.status_code == 200:
    with open('result.png', 'wb') as f:
        f.write(response.content)
else:
    print('Failed:', response.status_code, response.text)
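The CURL example saves the result as `result.webp` while the Python example saves it as `result.png`; the actual format depends on what the server returns. One way to avoid guessing is to derive the file extension from the response's `Content-Type` header. A sketch — the mapping below is an assumption about plausible image types, not a documented list of rmbg output formats:

```python
def extension_for(content_type):
    """Map a response Content-Type to a file extension (fallback: .bin)."""
    known = {
        'image/png': '.png',
        'image/webp': '.webp',
        'image/jpeg': '.jpg',
    }
    # Strip any parameters such as "; charset=..."
    base = content_type.split(';')[0].strip().lower()
    return known.get(base, '.bin')

# Usage with the response from the example above:
# filename = 'result' + extension_for(response.headers.get('Content-Type', ''))
# with open(filename, 'wb') as f:
#     f.write(response.content)
```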

Video-llava

This example also applies to other models that take a file as input.

video-llava supports two content types: application/json and multipart/form-data. The two input types exist mainly to offer different file-transfer methods, so choose whichever fits your situation. In general, if your file is large, use application/json and pass the file by its URL; if your file is small, multipart/form-data is convenient. Passing a URL with application/json may also give faster data transfer. Examples for both content types are given below.

CURL Example

multipart/form-data Example

export API_TOKEN=your_api_token
curl -X POST \
  -H "Authorization: $API_TOKEN" \
  -F 'video=@"/path/to/video.mp4"' \
  -F 'inp="How many people are in the video?"' \
  https://inference-api.netmind.ai/inference-api/v1/video-llava

application/json Example

export API_TOKEN=your_api_token
curl -X POST \
  -H "Authorization: $API_TOKEN" \
  -H "Content-Type: application/json" \
  -d $'{
      "video": "https://inference-api-dev.netmind.ai/example_file/baby_laugh.mp4",
      "inp": "Why is this video funny?"
}' \
  https://inference-api.netmind.ai/inference-api/v1/video-llava

Python Example

multipart/form-data Example

import requests

api_token = 'your_api_token'
headers = {
    'Authorization': api_token,
}
# Use a context manager so the file handle is closed after the request
with open('/path/to/video.mp4', 'rb') as video_file:
    files = {
        'video': video_file
    }
    data = {
        'inp': 'How many people are in the video?'
    }
    response = requests.post('https://inference-api.netmind.ai/inference-api/v1/video-llava', headers=headers, files=files, data=data)
print(response.json())

application/json Example

import requests

api_token = 'your_api_token'
headers = {
    'Authorization': api_token,
    'Content-Type': 'application/json'
}
data = {
    'video': 'https://inference-api-dev.netmind.ai/example_file/baby_laugh.mp4',
    'inp': 'Why is this video funny?'
}

response = requests.post('https://inference-api.netmind.ai/inference-api/v1/video-llava', headers=headers, json=data)
print(response.json())

llama3-70B

This example also applies to other chat-based models.

CURL Example

export API_TOKEN=your_api_token
curl -X POST \
  -H "Authorization: $API_TOKEN" \
  -H "Content-Type: application/json" \
  -d $'{ 
      "messages":[
        {
          "role": "system",
          "content": "You are a helpful assistant."
        },
        {
          "role": "user",
          "content": "Write a 100-word article on Benefits of Open-Source in AI research"
        }
      ],
      "max_new_tokens":1024,
      "temperature":0.6,
      "top_p":0.9,
      "top_k":50,
      "repetition_penalty":1.2  
}' \
  https://inference-api.netmind.ai/inference-api/v1/llama3-70B

Python Example

import requests

api_token = 'your_api_token'
headers = {
    'Authorization': api_token,
    'Content-Type': 'application/json'
}
data = {
    'messages': [
        {
            'role': 'system',
            'content': 'You are a helpful assistant.'
        },
        {
            'role': 'user',
            'content': 'Write a 100-word article on Benefits of Open-Source in AI research'
        }
    ],
    'max_new_tokens': 1024,
    'temperature': 0.6,
    'top_p': 0.9,
    'top_k': 50,
    'repetition_penalty': 1.2
}

# The model streams its answer back as server-sent events (SSE)
response = requests.post('https://inference-api.netmind.ai/inference-api/v1/llama3-70B', headers=headers, json=data, stream=True)
for line in response.iter_lines():
    if line:
        decoded_line = line.decode('utf-8')
        if decoded_line.startswith('data:'):
            print(decoded_line[len('data:'):].strip())
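The loop above parses the stream's `data:` lines by hand; that handling can be factored into a small helper so it is easy to test in isolation. A sketch, assuming the server emits standard `data:`-prefixed SSE lines as shown above:

```python
def parse_sse_data(raw_line):
    """Return the payload of an SSE 'data:' line, or None for other lines.

    Accepts bytes (as yielded by response.iter_lines()) or str.
    """
    if isinstance(raw_line, bytes):
        raw_line = raw_line.decode('utf-8')
    if raw_line.startswith('data:'):
        return raw_line[len('data:'):].strip()
    return None

# Usage with the streaming response above:
# for line in response.iter_lines():
#     payload = parse_sse_data(line)
#     if payload is not None:
#         print(payload)
```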
