Generate Response

POST
/v1/generate_response

Send a structured list of input messages with text, and the model will generate the next message in the conversation.

Parameters


model:required string

The identifier of the language model to use for generating the response. This should correspond to a valid model name from your available models (e.g., Llama-3.1-8b, DeepSeek-R1-8b, etc.).

You can also specify a model you have created.


messages:required Array of Message

Message Structure

Models are trained to operate on alternating user and assistant conversational turns. When creating a new Message, you specify the prior conversational turns with the messages parameter, and the model then generates the next Message in the conversation.

Message

Each input message must be an object with a role and content. You can specify three types of roles:

  • system - Sets instructions and context for the entire conversation. This is placed at the beginning of the messages array.
  • user - Represents input from the end user
  • assistant - Represents responses from the model
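
For example, a request that continues an existing conversation includes the prior assistant turn in messages, ending with the new user turn (the follow-up content here is illustrative):

{
  "model"    : "your-model-name",
  "messages" : [
    {
      "role"    : "user",
      "content" : "Why is the sky blue?"
    },
    {
      "role"    : "assistant",
      "content" : "Because sunlight scatters off gas molecules in the atmosphere (Rayleigh scattering)."
    },
    {
      "role"    : "user",
      "content" : "Does the same effect explain red sunsets?"
    }
  ]
}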

System Message

The system role allows you to provide context, instructions, and guidelines that influence the model's behavior throughout the conversation. System messages are optional but powerful for:

  • Defining the assistant's role, personality, or expertise
  • Setting response format requirements (e.g., "always respond in JSON")
  • Providing domain-specific context or constraints
  • Establishing behavioral guidelines

System messages are included in the messages array as the first message:

{
  "model"    : "your-model-name",
  "messages" : [
    {
      "role"    : "system",
      "content" : "You are a helpful API documentation assistant. Provide clear, concise explanations with code examples."
    },
    {
      "role"    : "user",
      "content" : "Why is the sky blue?"
    }
  ]
}
Message members

  • role - One of "system", "user", or "assistant"
  • content - The text content of the message

stream:optional boolean

Controls whether the response is returned as a complete message or streamed incrementally as it's generated.

  • false - Returns the complete response only after generation is finished
  • true - Streams response chunks as they're generated, allowing for real-time display
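
As a sketch, a streaming client might look like the following Python. The chunk format shown is an assumption (newline-delimited JSON with the same message shape as the standard response); this document does not specify the exact wire format, so adapt the parsing to what the server actually emits.

import json
import requests

# Minimal streaming sketch. Assumes newline-delimited JSON chunks --
# an assumption, not a format confirmed by this document.
resp = requests.post(
    "http://localhost:45678/v1/generate_response",
    headers={"Authorization": "Bearer YOUR_TIGER_API_KEY"},
    json={
        "model": "Llama-3.1-8b",
        "messages": [{"role": "user", "content": "Why is the sky blue?"}],
        "stream": True,
    },
    stream=True,  # tell requests not to buffer the whole body
)
resp.raise_for_status()

for line in resp.iter_lines():
    if not line:
        continue  # skip keep-alive blank lines
    chunk = json.loads(line)
    # Hypothetical chunk shape -- adjust the key path to the real payload.
    print(chunk.get("message", {}).get("content", ""), end="", flush=True)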

temperature:optional number

Controls the randomness and creativity of the model's responses. Lower values make output more focused and deterministic, while higher values increase randomness and creativity.

  • 0.0 - Most deterministic, repeatable responses
  • 1.0 - Balanced creativity and coherence (recommended for most use cases)
  • 2.0 - Maximum randomness and creativity

Note: For tasks requiring consistency (like data extraction or classification), use lower values (0.0-0.3). For creative tasks (like brainstorming or storytelling), higher values (0.7-1.5) work better.
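
To build intuition, here is a toy Python sketch of the standard softmax-with-temperature idea (illustration only, not the server's implementation): lower temperatures sharpen the distribution toward the most likely token, higher temperatures flatten it.

import math

# Toy token logits, not real model output.
logits = {"the": 2.0, "a": 1.0, "blue": 0.5}

def softmax_with_temperature(logits, temperature):
    # As temperature -> 0 this approaches argmax (deterministic);
    # temperature must be > 0 here to avoid division by zero.
    scaled = {tok: l / temperature for tok, l in logits.items()}
    total = sum(math.exp(v) for v in scaled.values())
    return {tok: math.exp(v) / total for tok, v in scaled.items()}

print(softmax_with_temperature(logits, 0.2))  # sharply peaked on "the"
print(softmax_with_temperature(logits, 1.5))  # flatter, more random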


top_p:optional number

Also known as "nucleus sampling," this parameter controls the diversity of responses by limiting the model to consider only the most probable tokens whose cumulative probability reaches the specified threshold.

  • 0.1 - Very focused, only highly probable tokens
  • 0.5 - Moderately diverse output
  • 1.0 - Considers all tokens based on their probability

Note: It's generally recommended to adjust either temperature OR top_p, but not both simultaneously. When top_p is less than 1.0, the model samples from the smallest set of tokens whose cumulative probability exceeds the threshold.
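
Here is a toy Python sketch of that selection rule (illustration only, not the server's implementation): tokens are taken in order of probability until their cumulative mass reaches the threshold.

# Toy probability distribution over the next token.
probs = {"the": 0.4, "a": 0.3, "blue": 0.2, "green": 0.1}

def nucleus(probs, top_p):
    # Keep the most probable tokens until their cumulative mass
    # reaches top_p; the model then samples only from this set.
    kept, cumulative = [], 0.0
    for token, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        kept.append(token)
        cumulative += p
        if cumulative >= top_p:
            break
    return kept

print(nucleus(probs, 0.5))  # ['the', 'a']
print(nucleus(probs, 1.0))  # all tokens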

Output


GenerateResponse

This is the standard (non-streaming) response format returned by the Generate Response endpoint. It contains the generated message along with usage statistics.


id:required string

A unique identifier for the conversation session.


message:required Message

The generated message from the assistant. Contains the role (always "assistant") and the content (the generated text).


usage:required Usage

Usage

Usage statistics for API requests, including token counts and processing duration.

Usage members

  • duration_ms - Time taken to generate the response, in milliseconds
  • input_tokens - Number of tokens in the input messages
  • output_tokens - Number of tokens in the generated message
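
For example, you can derive generation throughput from these fields (values taken from the sample response below):

usage = {"duration_ms": 1297, "input_tokens": 33, "output_tokens": 33}

# Output tokens generated per second of processing time.
tokens_per_second = usage["output_tokens"] / (usage["duration_ms"] / 1000)
print(f"{tokens_per_second:.1f} tokens/s")  # 25.4 tokens/s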

Generate Response (standard)

curl http://localhost:45678/v1/generate_response \
    -X POST \
    -H "Authorization: Bearer $TIGER_API_KEY" \
    -H 'Content-Type: application/json' \
    -d '{
  "model": "Llama-3.1-8b",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant. Answer questions briefly."
    },
    {
      "role": "user",
      "content": "Why is the sky blue?"
    }
  ],
  "stream": false,
  "temperature": 0.7,
  "top_p": 0.95
}'

Response (standard)

{
  "id"      : "7",
  "message" : {
    "content" : "The sky appears blue because of a phenomenon called Rayleigh scattering, where sunlight scatters off tiny molecules of gases in the atmosphere, such as nitrogen and oxygen.",
    "role"    : "assistant"
  },
  "usage"   : {
    "duration_ms"   : 1297,
    "input_tokens"  : 33,
    "output_tokens" : 33
  }
}