API Reference
Generate Response
Send a structured list of input messages with text, and the model will generate the next message in the conversation.
Parameters
model
The identifier of the language model to use for generating the response. This should correspond to a valid model name from your available models (e.g., Llama-3.1-8b, DeepSeek-R1-8b, etc.). It can also be a model you have created.
Message Structure
Models are trained to operate on alternating user and assistant conversational turns. When creating a new Message, you specify the prior conversational turns with the messages parameter, and the model then generates the next Message in the conversation.
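The alternating-turn structure described above can be sketched as follows. This is an illustrative sketch only; the helper names are not part of the API, and only the role/content message fields come from this reference:

```python
# Sketch: maintaining alternating user/assistant turns in the messages array.
# The model generates the next Message from the prior turns you supply.

history = []

def add_user_turn(text):
    history.append({"role": "user", "content": text})

def add_assistant_turn(text):
    # Append the model's reply so the next request includes it as context.
    history.append({"role": "assistant", "content": text})

add_user_turn("Why is the sky blue?")
add_assistant_turn("Because of Rayleigh scattering.")
add_user_turn("Explain that in one sentence.")

# Roles alternate: user, assistant, user, ...
assert [m["role"] for m in history] == ["user", "assistant", "user"]
```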
Message
Each input message must be an object with a role and content. You can specify three types of roles:
- system - Sets instructions and context for the entire conversation. This is placed at the beginning of the messages array.
- user - Represents input from the end user
- assistant - Represents responses from the model
System Message
The system role allows you to provide context, instructions, and guidelines that influence the model's behavior throughout the conversation. System messages are optional but powerful for:
- Defining the assistant's role, personality, or expertise
- Setting response format requirements (e.g., "always respond in JSON")
- Providing domain-specific context or constraints
- Establishing behavioral guidelines
System messages are included in the messages array as the first message:
{
"model" : "your-model-name",
"messages" : [
{
"role" : "system",
"content" : "You are a helpful API documentation assistant. Provide clear, concise explanations with code examples."
},
{
"role" : "user",
"content" : "Why is the sky blue?"
}
]
}
stream
Controls whether the response is returned as a complete message or streamed incrementally as it's generated.
- false - Returns the complete response only after generation is finished
- true - Streams response chunks as they're generated, allowing for real-time display
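With stream set to true, the same content arrives in pieces rather than all at once. The wire format of the chunks is not specified in this reference, so the sketch below only illustrates the client-side assembly step, with the fragments supplied as plain strings:

```python
# Sketch: assembling a streamed response from incremental content fragments.
# The actual chunk wire format is an assumption not covered by this reference.

def assemble(fragments):
    """Concatenate content fragments as they arrive, e.g. for live display."""
    text = ""
    for fragment in fragments:
        text += fragment
        # A real UI would re-render `text` here after each fragment.
    return text

pieces = ["The sky appears blue ", "because of Rayleigh ", "scattering."]
full = assemble(pieces)
```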
temperature
Controls the randomness and creativity of the model's responses. Lower values make output more focused and deterministic, while higher values increase randomness and creativity.
- 0.0 - Most deterministic, repeatable responses
- 1.0 - Balanced creativity and coherence (recommended for most use cases)
- 2.0 - Maximum randomness and creativity
Note: For tasks requiring consistency (like data extraction or classification), use lower values (0.0-0.3). For creative tasks (like brainstorming or storytelling), higher values (0.7-1.5) work better.
top_p
Also known as "nucleus sampling," this parameter controls the diversity of responses by limiting the model to the smallest set of most probable tokens whose cumulative probability reaches the specified threshold.
- 0.1 - Very focused, only highly probable tokens
- 0.5 - Moderately diverse output
- 1.0 - Considers all tokens based on their probability
Note: It's generally recommended to adjust either temperature OR top_p, but not both simultaneously. When top_p is less than 1.0, the model samples from the smallest set of tokens whose cumulative probability exceeds the threshold.
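The sampling guidance above can be encoded as a small helper when building request payloads. This is a sketch, not part of the API; the task names and the helper are hypothetical, and the temperature values follow the ranges recommended above:

```python
# Sketch: picking sampling parameters per task, per the guidance above.
# Adjust temperature OR top_p, not both simultaneously.

def sampling_params(task):
    if task == "extraction":        # consistency matters: low temperature
        return {"temperature": 0.2}
    if task == "storytelling":      # creativity matters: higher temperature
        return {"temperature": 1.2}
    return {"temperature": 1.0}     # balanced default

payload = {
    "model": "Llama-3.1-8b",
    "messages": [{"role": "user", "content": "Classify this ticket."}],
    **sampling_params("extraction"),
}
```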
Output
GenerateResponse
This is the standard (non-streaming) response format returned by the Generate Response endpoint. It contains the generated message along with usage statistics.
id
The conversation ID that uniquely identifies this conversation session.
message
The generated message from the assistant. Contains the role (always "assistant") and the content (the generated text).
Message
Each message must be an object with a role and content. You can specify three types of roles:
- system - Sets instructions and context for the entire conversation. This is placed at the beginning of the messages array.
- user - Represents input from the end user
- assistant - Represents responses from the model
Usage
Usage statistics for API requests, including token counts and processing duration.
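The usage block can be turned into simple throughput figures. A sketch using the field names and values from the example response on this page (the derived metrics are illustrative, not fields returned by the API):

```python
# Sketch: deriving throughput from a response's usage block.
usage = {"duration_ms": 1297, "input_tokens": 33, "output_tokens": 33}

total_tokens = usage["input_tokens"] + usage["output_tokens"]
tokens_per_second = usage["output_tokens"] / (usage["duration_ms"] / 1000)
```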
curl http://localhost:45678/v1/generate_response \
-X POST \
-H "Authorization: Bearer $TIGER_API_KEY" \
-H 'Content-Type: application/json' \
-d "{
\"model\": \"Llama-3.1-8b\",
\"messages\": [
{
\"role\": \"system\",
\"content\": \"You are a useful assistant. Answer questions briefly.\"
},
{
\"role\": \"user\",
\"content\": \"Why is the sky blue?\"
}
],
\"stream\": false,
\"temperature\": 0.7,
\"top_p\": 0.95
}"
{
"id" : "7",
"message" : {
"content" : "The sky appears blue because of a phenomenon called Rayleigh scattering, where sunlight scatters off tiny molecules of gases in the atmosphere, such as nitrogen and oxygen.",
"role" : "assistant"
},
"usage" : {
"duration_ms" : 1297,
"input_tokens" : 33,
"output_tokens" : 33
}
}
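For clients written in Python rather than shell, the curl example above translates to a short standard-library request. The endpoint URL, port, and field names are taken from the example; the network call itself is left commented out as a sketch:

```python
# Sketch: the same request as the curl example, using only the stdlib.
import json
import os
import urllib.request

payload = {
    "model": "Llama-3.1-8b",
    "messages": [
        {"role": "system",
         "content": "You are a useful assistant. Answer questions briefly."},
        {"role": "user", "content": "Why is the sky blue?"},
    ],
    "stream": False,
    "temperature": 0.7,
    "top_p": 0.95,
}

req = urllib.request.Request(
    "http://localhost:45678/v1/generate_response",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {os.environ.get('TIGER_API_KEY', '')}",
        "Content-Type": "application/json",
    },
    method="POST",
)

# With a server running locally, send the request and read the reply:
# response = json.load(urllib.request.urlopen(req))
# print(response["message"]["content"])
```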