Model Parameters

Model parameters control how an LLM generates text, including creativity, output length, and sampling strategy. Tuning them to the task at hand can noticeably improve output quality and make responses better suited to your use case.

When to Use

The default output is too random or too rigid and you want to adjust creativity
Responses are frequently truncated and you need to increase the output length limit
You are using a reasoning model and want to control the depth of thinking
You need different parameter combinations for different task scenarios (creative writing vs. code generation)

Where to Configure Parameters

Elftia provides multiple levels of parameter configuration, with the following priority (highest to lowest):

Session-level parameters (temporary adjustments in the chat window)
    |
    v  overrides
Global model parameters (Settings → Model Parameters)
    |
    v  overrides
Provider default parameters (defaultSettings in the provider configuration)
    |
    v  overrides
System defaults

Global Model Parameters

Open Settings → Model Parameters
Adjust the value of each parameter
Use the Enable/Disable toggle next to each parameter to control whether it applies

Session-Level Parameters

In the chat UI, click the Parameters button next to the model selection area
Adjust the parameters you want to change
These adjustments are temporary and apply only to the current session

Core Parameters Explained

temperature

Property	Value
Range	0 – 2
Default	0.7
Global control	Yes

Purpose: Controls the randomness and creativity of the output.

Low temperature (0 – 0.3): Output is more deterministic and consistent — suitable for code generation, factual Q&A, data extraction, and other precision-demanding scenarios
Mid temperature (0.4 – 0.8): Balances creativity and consistency — suitable for everyday conversation and documentation writing
High temperature (0.9 – 2.0): Output is more random and creative — suitable for creative writing and brainstorming

Recommended settings:

Scenario	Recommended value
Code generation	0 – 0.2
Technical documentation	0.3 – 0.5
Everyday conversation	0.6 – 0.8
Creative writing	0.8 – 1.2
Brainstorming	1.0 – 1.5
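
To make the effect of temperature concrete, here is a minimal sketch of the standard technique: the model's raw scores (logits) are divided by the temperature before being turned into probabilities. The function name and the example scores are made up for illustration, and how any particular provider implements sampling internally is not documented here.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, scaling by temperature first."""
    if temperature <= 0:
        # Treat 0 as greedy decoding: all probability on the top-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for three candidate tokens.
logits = [2.0, 1.0, 0.0]
for t in (0.2, 0.7, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Running this shows the top token taking almost all of the probability at 0.2 and the distribution flattening as the temperature rises, which is why low values suit code generation and high values suit brainstorming.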

max_tokens

Property	Value
Range	1 – (model limit)
Default	4096
Global control	Yes

Purpose: Limits the maximum length of the model's reply in a single response. Length is measured in tokens: roughly 1–2 tokens per Chinese character, and on average slightly more than 1 token per English word.

If a reply is truncated mid-sentence and a "maximum length reached" message appears, max_tokens is set too low
Setting a very high value does not force the model to always produce long responses — the model stops naturally when it considers the answer complete
Different models have different limits; values above the model's maximum are automatically clamped

Common model output limits:

Model	Max output tokens
GPT-4o	16,384
GPT-5	65,536
Claude Sonnet 4.5	65,536
Claude Haiku 4.5	65,536
Gemini 3 Flash	65,536
Qwen3 Max	65,536
DeepSeek V3	8,192
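
The clamping behavior described above can be sketched as follows. The limits are copied from the table, but the dictionary, the helper name `clamp_max_tokens`, and the handling of unknown models are assumptions for illustration, not Elftia's actual implementation.

```python
# Output limits copied from the table above; the lookup itself is illustrative.
MAX_OUTPUT_TOKENS = {
    "GPT-4o": 16_384,
    "GPT-5": 65_536,
    "DeepSeek V3": 8_192,
}

def clamp_max_tokens(model, requested):
    """Return the effective max_tokens for a model (hypothetical helper)."""
    if requested < 1:
        raise ValueError("max_tokens must be at least 1")
    limit = MAX_OUTPUT_TOKENS.get(model)
    if limit is None:
        # Assumption: an unknown model leaves the requested value untouched.
        return requested
    return min(requested, limit)

print(clamp_max_tokens("DeepSeek V3", 20_000))
print(clamp_max_tokens("GPT-4o", 4_096))
```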

top_p

Property	Value
Range	0 – 1
Default	1.0
Global control	Yes

Purpose: Nucleus sampling, another way to control randomness. When generating each token, the model ranks the candidates by probability and samples only from the smallest set of top candidates whose cumulative probability reaches top_p.

top_p = 1.0: All candidate tokens are considered (no filtering)
top_p = 0.9: Samples only from the highest-probability tokens that together account for 90% of the probability mass
top_p = 0.1: Samples only from the highest-probability tokens that together account for 10% — output is very deterministic

Note: It is generally recommended to adjust only one of temperature or top_p at a time; modifying both significantly can produce unpredictable effects.
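
As an illustration of nucleus sampling, the sketch below filters a toy probability distribution down to the candidates that survive a given top_p and renormalizes them. The function name and the example vocabulary are invented for this example; real models apply the same idea over their full vocabulary.

```python
def nucleus_filter(probs, top_p):
    """Keep the smallest set of top tokens whose cumulative probability
    reaches top_p, then renormalize so the kept probabilities sum to 1."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept = []
    cumulative = 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}

# Toy distribution over four candidate tokens.
probs = {"the": 0.5, "a": 0.3, "this": 0.15, "zebra": 0.05}
print(sorted(nucleus_filter(probs, 0.9)))  # unlikely "zebra" is filtered out
print(sorted(nucleus_filter(probs, 0.1)))  # only the single top token remains
```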

Thinking Budget

Property	Value
Options	none / low / medium / high
Default	low
Global control	Yes

Purpose: Controls the depth of "thinking" performed by reasoning models (such as Claude Sonnet 4.5, Gemini 3 Flash, DeepSeek R1, and other reasoning-capable models) before they reply.

Level	Description	Use case
none	Reasoning disabled; the model replies directly	Simple Q&A, casual chat
low	Light thinking	Everyday tasks — balances speed and quality
medium	Moderate thinking	Complex problems requiring some reasoning
high	Deep thinking with maximum token budget	Mathematical proofs, complex code, in-depth analysis

The thinking budget only takes effect for models marked reasoning: true. For models that do not support reasoning, this setting is ignored.
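
The rule above can be sketched as a small gate. The model dictionaries and the helper name are hypothetical; only the `reasoning` flag and the four budget levels come from this page.

```python
THINKING_LEVELS = ("none", "low", "medium", "high")

def effective_thinking_budget(model, budget="low"):
    """Return the budget to apply, or None when the setting is ignored."""
    if budget not in THINKING_LEVELS:
        raise ValueError(f"unknown thinking budget: {budget}")
    if not model.get("reasoning", False):
        # Model is not marked reasoning: true, so the setting has no effect.
        return None
    return budget

# Hypothetical model entries for illustration.
reasoning_model = {"name": "example-reasoner", "reasoning": True}
plain_model = {"name": "example-chat", "reasoning": False}
print(effective_thinking_budget(reasoning_model, "high"))
print(effective_thinking_budget(plain_model, "high"))
```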

reasoning_effort

This parameter is specific to OpenAI's o-series models (e.g. o1, o3, o4-mini).

Property	Value
Options	low / medium / high
Default	medium

low: Fast response, suitable for simple questions
medium: Balanced mode
high: Deep reasoning, suitable for complex tasks

Tool Max Turns

Property	Value
Range	1 – 50
Default	5
Global control	Yes

Purpose: Limits the maximum number of MCP tool-call iterations in Agent mode. If the model keeps calling tools within a single conversation and reaches this limit, the loop is forcibly stopped and the current result is returned.
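
A capped tool-calling loop of this kind might look like the sketch below. The callback shapes (`model_step`, `call_tool`) and the return convention are assumptions made for illustration and do not describe Elftia's internal agent loop.

```python
def run_agent(model_step, call_tool, tool_max_turns=5):
    """Run a tool-calling loop with a cap on tool-call iterations.

    model_step(history) returns ("tool", request) or ("final", text).
    Returns (result, stopped_by_limit).
    """
    history = []
    last_result = None
    for _ in range(tool_max_turns):
        kind, payload = model_step(history)
        if kind == "final":
            return payload, False
        last_result = call_tool(payload)
        history.append((payload, last_result))
    # Limit reached: stop and hand back the most recent tool result.
    return last_result, True

# A stand-in model that never stops asking for tools.
def always_calls_tool(history):
    return ("tool", f"lookup-{len(history)}")

result, stopped = run_agent(always_calls_tool, lambda req: f"result of {req}", tool_max_turns=3)
print(result, stopped)
```

Each extra iteration is another model request, which is why raising the limit can increase API costs.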

Configuration Reference

Global Model Parameters

Parameter	Type	Default	Range	Enable/Disable
temperature	Number	0.7	0 – 2	Enabled by default
topP	Number	1.0	0 – 1	Disabled by default
maxTokens	Number	4096	1 – (model limit)	Disabled by default
defaultThinkingBudget	Enum	`low`	none/low/medium/high	Always active
toolMaxTurns	Number	5	1 – 50	Always active

Provider Default Parameters

Parameter	Type	System default	Description
temperature	Number	0.7	Provider-level default temperature
topP	Number	1	Provider-level default top_p
maxTokens	Number	4096	Provider-level default max output
stream	Boolean	`true`	Whether to use streaming
presencePenalty	Number	(not set)	Presence penalty (–2 to 2)
frequencyPenalty	Number	(not set)	Frequency penalty (–2 to 2)
stop	String array	(not set)	Stop sequences
seed	Number	(not set)	Random seed (for reproducible output)
jsonMode	Boolean	`false`	Whether to force JSON-formatted output

Default Parameter Differences by Provider

Different provider templates ship with different default parameters:

Provider	temperature	maxTokens	topP	Other
OpenAI	0.7	4,096	--	--
Anthropic	0.7	65,536	--	--
Google Gemini	0.7	65,536	0.95	--
System default	0.7	4,096	1	stream: true

Behavior Notes

Parameter Priority

When the same parameter is set at multiple levels, the priority is:

Session-level parameters (temporarily adjusted during chat) — highest priority
Global model parameters (configured in the settings page and marked as "enabled")
Provider default parameters (defaultSettings in the provider configuration)
System defaults (temperature: 0.7, topP: 1, maxTokens: 4096)

Global model parameters marked as "disabled" do not override provider default values.
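
The priority rules can be sketched as a layered merge, applied from lowest to highest priority so that later levels win. The function name and the `{"value": ..., "enabled": ...}` shape for global parameters are assumptions for illustration; the system defaults are the ones listed above.

```python
SYSTEM_DEFAULTS = {"temperature": 0.7, "topP": 1, "maxTokens": 4096}

def resolve_parameters(session=None, global_params=None, provider_defaults=None):
    """Merge the four levels, lowest priority first.

    global_params maps a name to {"value": ..., "enabled": bool} (assumed shape).
    """
    resolved = dict(SYSTEM_DEFAULTS)
    resolved.update(provider_defaults or {})
    for name, setting in (global_params or {}).items():
        if setting.get("enabled"):
            # Disabled global parameters are skipped, so the provider
            # default (or system default) stays in place.
            resolved[name] = setting["value"]
    resolved.update(session or {})
    return resolved

params = resolve_parameters(
    session={"temperature": 1.0},
    global_params={
        "temperature": {"value": 0.3, "enabled": True},
        "maxTokens": {"value": 4096, "enabled": False},
    },
    provider_defaults={"temperature": 0.7, "maxTokens": 65536},
)
print(params)
```

In this example the session temperature wins over the enabled global value, while the disabled global `maxTokens` leaves the provider default of 65536 untouched.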

Parameter Compatibility with Models

Not all parameters are supported by all models:

Parameter	OpenAI	Anthropic	Gemini	Local models
temperature	Supported	Supported	Supported	Supported
maxTokens	Supported	Supported	Supported	Supported
topP	Supported	Supported	Supported	Supported
presencePenalty	Supported	Not supported	Not supported	Partially supported
frequencyPenalty	Supported	Not supported	Not supported	Partially supported
seed	Supported	Not supported	Not supported	Partially supported
thinking	Partial (o-series)	Supported	Supported	Not supported

Unsupported parameters are automatically stripped by the Transformer before the request is sent, so no error is produced.
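
A minimal sketch of this stripping step is shown below. The lookup mirrors the "Not supported" cells in the compatibility table; the data structure and function name are hypothetical, and local models are left out because their support is only partial.

```python
# Parameters marked "Not supported" in the compatibility table above.
UNSUPPORTED = {
    "openai": set(),
    "anthropic": {"presencePenalty", "frequencyPenalty", "seed"},
    "gemini": {"presencePenalty", "frequencyPenalty", "seed"},
}

def strip_unsupported(provider, params):
    """Return a copy of params without the keys the provider does not support."""
    blocked = UNSUPPORTED.get(provider, set())
    return {name: value for name, value in params.items() if name not in blocked}

request = {"temperature": 0.7, "maxTokens": 4096, "seed": 42, "presencePenalty": 0.5}
print(strip_unsupported("anthropic", request))
print(strip_unsupported("openai", request))
```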

Transformer Impact on Parameters

Elftia's Transformer system adapts parameters before sending a request:

Transformer	Parameter handling
`anthropic`	Maps `max_tokens` to Anthropic's field format
`gemini`	Converts parameters to Gemini's `generationConfig` format
`sampling`	Normalizes sampling parameters such as `temperature` and `top_p`
`maxtoken`	Forces a specific `max_tokens` value (overrides user setting)
`maxcompletiontokens`	Converts `max_tokens` to `max_completion_tokens` (required by some models)
`reasoning`	Handles the `reasoning_content` field for reasoning models
`forcereasoning`	Forces reasoning mode on (ignores the user's thinking budget setting)
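
To illustrate what two of these transformers do to a request, the sketch below models `maxcompletiontokens` and `maxtoken` as plain functions over a request dictionary. The function names, signatures, and request shape are assumptions for illustration; only the described behavior comes from the table.

```python
def max_completion_tokens_transform(request):
    """Rename max_tokens to max_completion_tokens; other fields are unchanged."""
    out = dict(request)
    if "max_tokens" in out:
        out["max_completion_tokens"] = out.pop("max_tokens")
    return out

def max_token_transform(request, forced_value):
    """Force a specific max_tokens value, overriding whatever the user set."""
    out = dict(request)
    out["max_tokens"] = forced_value
    return out

request = {"model": "example-model", "temperature": 0.7, "max_tokens": 4096}
print(max_completion_tokens_transform(request))
print(max_token_transform(request, 8192))
```

Because `maxtoken` overrides the user setting, it is worth checking the provider's transformer list when a `max_tokens` change appears to have no effect.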

Model Routing

Elftia supports configuring different models for different task types:

Routing role	Description	Where to configure
Default model	Primary chat model	Model selection in the chat UI
Background model	Used for lightweight tasks (summarization, title generation, etc.)	Settings → Agent Default Models
Vision model	Handles messages containing images (when the default model does not support vision)	Settings → Agent Default Models
Reasoning model	Tasks requiring deep thinking	Controlled via the thinking budget level

Follow Provider: When this option is enabled, the background model and vision model automatically use the corresponding models from the same provider as the current default model. For example, if you use Zhipu's GLM-5 as the default model, the background model will automatically use GLM-4.5 Air and the vision model will use GLM-4.6V.
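
The routing roles can be pictured as a small selection function. Everything in this sketch (the `routes` shape, the `vision` flag, the model names, and the task labels) is hypothetical and only mirrors the role descriptions in the table above.

```python
def pick_model(routes, task="chat", has_images=False):
    """Choose a model name for a request from the configured routing roles."""
    if task == "background":
        # Lightweight tasks such as summarization or title generation.
        return routes["background"]["name"]
    default = routes["default"]
    if has_images and not default["vision"]:
        # Fall back to the vision model only when the default cannot see images.
        return routes["vision"]["name"]
    return default["name"]

# Placeholder model names for illustration.
routes = {
    "default": {"name": "chat-model", "vision": False},
    "background": {"name": "small-model", "vision": False},
    "vision": {"name": "vision-model", "vision": True},
}
print(pick_model(routes))
print(pick_model(routes, has_images=True))
print(pick_model(routes, task="background"))
```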

Troubleshooting

Issue	Possible cause	Solution
Responses are truncated	`max_tokens` is set too low	Increase the `max_tokens` value, or enable it in global parameters and set a larger value
Responses are too random or irrelevant	`temperature` is too high	Lower the temperature (0.3–0.7 recommended)
Responses are too rigid or repetitive	`temperature` is too low	Raise the temperature slightly (0.5–0.8 recommended)
Reasoning model does not show chain-of-thought	`thinking budget` is set to none	Set `defaultThinkingBudget` to `low` or higher
Reasoning model thinks for too long	`thinking budget` is set to high	Lower it to `medium` or `low`
Parameter settings have no effect	Overridden by a higher-priority setting	Check whether session-level parameters are overriding; confirm the global parameter enable switch is on
Request returns a parameter error	The model does not support a particular parameter	Check parameter compatibility with the model; disable incompatible parameters
Tool calls stop mid-conversation	`toolMaxTurns` limit reached	Increase the `toolMaxTurns` value (note: this may increase API costs)
Output quality or length differs noticeably between providers	Provider default parameters differ	Enable global parameters and set key parameters to a consistent value

Related Pages

LLM Providers Overview - Understand the provider system and format transformers
Adding a Provider - Set default parameters when configuring a provider
API Key Pools - Multi-key management and load balancing
Custom Endpoints - Parameter compatibility notes for local models
