Security Settings

Elftia provides several layers of protection against two kinds of risk: unsafe Agent tool calls and prompt injection through external messages. This page describes every security-related configuration option.

Security Architecture Overview

Elftia's security system includes the following core components:

Component	Location	Function
GuardianAgent	Agent execution layer	Uses AI to review the security of tool calls
PromptGuardian	Channel message layer	Detects prompt injection attacks in external messages
Permission Mode	Agent interaction layer	Controls whether users need to confirm tool execution
InputSanitizer	Channel message layer	Rule-based input filtering (regex matching)
AuditLogger	Global	Records audit logs for all security events

GuardianAgent

GuardianAgent is an AI-based security review module. Before an Agent executes a tool call, GuardianAgent uses an independent LLM to evaluate the risk level of the operation, then allows or blocks it according to the configured mode.

Four Modes

Mode	Behavior	Recommended Scenario
Off	Completely disabled, zero performance overhead	Personal use, trust Agent behavior
Monitor	Reviews sensitive tool calls, logs only without blocking	Want to understand Agent behavior but avoid interruptions
Guard	Reviews sensitive tool calls, blocks high-risk and critical operations	Daily use, balance security and convenience
Strict	Reviews all tool calls, blocks operations at medium risk or above	Shared environment, maximum security requirements

Risk Levels

GuardianAgent categorizes tool calls into five risk levels:

Risk Level	Meaning	Example
None	Completely safe	Read files within project, list directories
Low	Minor risk	Write project files within workspace
Medium	Medium risk	Execute Shell commands that modify system state, install npm packages
High	Significant risk	Access sensitive paths, operate files outside workspace, network operations
Critical	Immediate danger	Recursive delete commands, data theft, privilege escalation

Blocking Behavior by Mode

Risk Level	Off	Monitor	Guard	Strict
None	Allow	Allow	Allow	Allow
Low	Allow	Log + Allow	Allow	Allow
Medium	Allow	Log + Allow	Allow	Block
High	Allow	Log + Allow	Block	Block
Critical	Allow	Log + Allow	Block	Block
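
The table above can be read as a threshold rule per mode. The following is a minimal sketch of that rule, not Elftia's actual implementation; the names `Risk`, `BLOCK_THRESHOLD`, and `decide` are hypothetical and exist only for illustration.

```python
from enum import IntEnum


class Risk(IntEnum):
    """The five risk levels, ordered from safest to most dangerous."""
    NONE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# Lowest risk level that each mode blocks; None means the mode never blocks.
# (Hypothetical lookup table derived from the documentation table.)
BLOCK_THRESHOLD = {
    "off": None,
    "monitor": None,
    "guard": Risk.HIGH,
    "strict": Risk.MEDIUM,
}


def decide(mode: str, risk: Risk) -> str:
    """Return "allow", "log+allow", or "block" for a mode and risk level."""
    threshold = BLOCK_THRESHOLD[mode]
    if threshold is not None and risk >= threshold:
        return "block"
    # Monitor mode logs anything above "none" but never blocks.
    if mode == "monitor" and risk > Risk.NONE:
        return "log+allow"
    return "allow"
```

For example, `decide("guard", Risk.MEDIUM)` yields `"allow"`, while `decide("strict", Risk.MEDIUM)` yields `"block"`.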

Sensitive Tools

In monitor and guard modes, only the following "sensitive tools" trigger security review:

Bash: Shell command execution
Write: File writing
Edit: File editing
Agent: Sub-Agent generation

In strict mode, all tool calls (including file reading) trigger review.
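
The rule for which calls get reviewed can be sketched as a small predicate. This is an illustration under assumptions: the function name `needs_review` and the lowercase mode strings are hypothetical, and `"Read"` in the usage note stands in for a file-reading tool whose real name this page does not give.

```python
# The four sensitive tools listed above.
SENSITIVE_TOOLS = frozenset({"Bash", "Write", "Edit", "Agent"})


def needs_review(mode: str, tool_name: str) -> bool:
    """Return True if a tool call should be sent to GuardianAgent for review."""
    if mode == "off":
        return False  # review is disabled entirely
    if mode == "strict":
        return True   # every tool call is reviewed, including reads
    # Monitor and Guard review only the sensitive tools.
    return tool_name in SENSITIVE_TOOLS
```

With this sketch, `needs_review("guard", "Read")` is `False`, while `needs_review("strict", "Read")` is `True`.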

Fault Tolerance Design

Mode	Behavior on Timeout/Error
Monitor / Guard	Allow (fail-open): normal operations are not blocked when review is unavailable
Strict	Block (fail-closed): tool calls are rejected by default when review is unavailable

The review timeout is 15 seconds. If the LLM does not return a review result within that time, the rules above apply.
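
One way to picture the fail-open and fail-closed behavior is a wrapper that runs the review with a deadline and falls back according to the mode. The sketch below assumes a hypothetical `review_fn` callable that returns `True` when the call may proceed; the function name and the thread-based timeout are illustrative choices, not a description of how Elftia implements it.

```python
import concurrent.futures

REVIEW_TIMEOUT_SECONDS = 15.0  # timeout stated in the documentation


def review_with_fallback(mode, review_fn, timeout=REVIEW_TIMEOUT_SECONDS):
    """Return True to allow the tool call, False to block it.

    review_fn is a hypothetical zero-argument callable that performs the
    LLM review and returns True (allow) or False (block).
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(review_fn)
    try:
        return bool(future.result(timeout=timeout))
    except Exception:
        # Timeout or review error: Strict fails closed, other modes fail open.
        return mode != "strict"
    finally:
        # Do not wait for a stuck review; the worker finishes in the background.
        pool.shutdown(wait=False)
```

The only mode-dependent line is the fallback: a review that times out or raises is treated as "allow" in Monitor and Guard, and as "block" in Strict.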

Configuration Location

Settings → Clawia → GuardianAgent Mode

Select the desired mode. Changes take effect immediately without restart.

Audit Log

All review results (whether blocked or not) are recorded in the audit log. You can view historical review records in Settings → Clawia → Audit Log, including:

Timestamp
Tool name and parameters
Review result (risk level, reason)
Whether blocked

PromptGuardian

PromptGuardian protects Channel (multi-platform channel) messages. When Elftia receives messages from external users through channels such as Discord and Telegram, PromptGuardian uses AI to detect whether a message contains a prompt injection attack.

Three Modes

Mode	Behavior	Recommended Scenario
Off	No injection detection	Use Channel only for yourself, trust all message sources
Monitor	Detect injections, log but do not block	Want to observe if there are suspicious messages
Block	Detect injections, intercept suspicious messages, and return a friendly rejection	Open Channel, need to defend against malicious users

Detection Content

PromptGuardian can detect the following types of attacks:

Instruction Override: Attempt to override or bypass system instructions (e.g., "Ignore all instructions above")
Role Manipulation: Impersonate other identities (e.g., "You are now DAN")
Prompt Marker Injection: Insert system/instruction markers (e.g., [INST], <|im_start|>)
Prompt Extraction: Attempt to obtain system prompt content
XML/Tag Injection: Inject tool call markers (e.g., <tool_use>, <function_calls>)
Encoding Bypass: Use base64 encoding, invisible characters, and other techniques
Social Engineering: Use "educational purposes" or "hypothetical scenarios" to bypass security rules
Multi-step Attacks: Build attacks gradually across multiple messages

Behavior When Blocked

When a message is blocked, PromptGuardian does not reveal the existence of security detection to the user. Instead, it returns a random friendly rejection message (e.g., "The signal dropped～let's change the topic"), preventing attackers from adjusting their strategy accordingly.
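
The idea of answering with a neutral, randomly chosen message can be sketched as follows. Only the first rejection string comes from this page; the other two are made-up placeholders, and `reply_for` is a hypothetical helper name.

```python
import random

# Only the first entry is quoted from the documentation; the rest are
# illustrative placeholders.
FRIENDLY_REJECTIONS = (
    "The signal dropped～let's change the topic",
    "Hmm, I lost my train of thought. What else is on your mind?",
    "Let's talk about something else!",
)


def reply_for(is_injection: bool, normal_reply: str, rng=random) -> str:
    """Return a random friendly rejection for blocked messages.

    The rejection never mentions security detection, so the sender gets
    no signal about why the message went unanswered.
    """
    if is_injection:
        return rng.choice(FRIENDLY_REJECTIONS)
    return normal_reply
```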

Fault Tolerance Design

PromptGuardian adopts a fail-open design: if a review times out (after 10 seconds) or fails, the message is passed through as usual, so a detection failure never interrupts the message flow.

Configuration Location

PromptGuardian is configured in the Channel security settings. You must first configure the LLM provider and model used for security review.

Permission Mode

Permission Mode controls whether an Agent needs user confirmation when executing tool calls.

Mode	Behavior	Use Case
Default	Sensitive operations require user confirmation	Daily use, recommended
AcceptEdits	File edits pass automatically, other operations still require confirmation	Trust Agent's code editing ability
BypassPermissions	All operations pass automatically	High trust in Agent, pursue maximum efficiency
Plan	Agent only makes plans, does not execute any operations	Review Agent's behavior plan
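
The four modes can be sketched as a small decision function. Several details here are assumptions rather than documented behavior: the set of file-edit tools (`Write`, `Edit`), the reading that AcceptEdits treats non-edit operations the same way Default does, and the names `permission_decision` and `FILE_EDIT_TOOLS`.

```python
# Assumption: these are the tools that count as "file edits".
FILE_EDIT_TOOLS = frozenset({"Write", "Edit"})


def permission_decision(mode: str, tool_name: str, is_sensitive: bool) -> str:
    """Return "run", "confirm", or "plan-only" for a pending tool call."""
    if mode == "Plan":
        return "plan-only"  # the Agent plans but executes nothing
    if mode == "BypassPermissions":
        return "run"        # everything passes automatically
    if mode == "AcceptEdits" and tool_name in FILE_EDIT_TOOLS:
        return "run"        # file edits pass automatically
    # Default, and everything AcceptEdits does not cover (assumed).
    return "confirm" if is_sensitive else "run"
```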

caution

"BypassPermissions" mode allows the Agent to execute all operations directly, including Shell commands and file deletion. Use it only if you fully trust the Agent's behavior and are working in a secure environment. If you enable it, pair it with GuardianAgent's "Guard" or "Strict" mode.

Channel Security Settings

When you connect external platforms through a Channel, you can configure the following security options:

Rate Limiting

Setting	Description
Message Rate Limit	Limit the maximum number of messages per user within a specific time window
Global Rate Limit	Limit the total message processing rate of the Channel
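
This page does not say which algorithm Elftia uses for rate limiting. As an illustration of the concept only, the sketch below uses a sliding-window counter; the class name, parameters, and the idea of using a single shared key for the global limit are all assumptions.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most max_messages per key within window_seconds."""

    def __init__(self, max_messages: int, window_seconds: float):
        self.max_messages = max_messages
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # key -> timestamps of accepted messages

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        """Record and accept the message if the key is under its limit."""
        now = time.monotonic() if now is None else now
        events = self._events[key]
        # Drop timestamps that have left the window.
        while events and now - events[0] >= self.window_seconds:
            events.popleft()
        if len(events) >= self.max_messages:
            return False
        events.append(now)
        return True
```

A per-user limit would pass the user ID as `key`; a Channel-wide limit could use a second limiter instance with one fixed key.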

Input Filtering

InputSanitizer provides regex-based input filtering (executed before PromptGuardian):

Filter known malicious patterns
Clean potential injection markers
Only produce warning logs, do not block messages
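
A warn-only regex scan of this kind might look like the sketch below. The patterns are illustrative, loosely based on the marker examples elsewhere on this page; they are not InputSanitizer's actual rule set, and the names `SUSPICIOUS_PATTERNS` and `scan` are hypothetical. The marker-cleaning step is left out.

```python
import logging
import re
from typing import List

logger = logging.getLogger("input_sanitizer")

# Illustrative patterns only; the real rule set is not documented here.
SUSPICIOUS_PATTERNS = {
    "prompt_marker": re.compile(r"\[INST\]|<\|im_start\|>", re.IGNORECASE),
    "tool_tag": re.compile(r"</?(tool_use|function_calls)>", re.IGNORECASE),
    "instruction_override": re.compile(r"ignore\s+all\s+instructions", re.IGNORECASE),
}


def scan(message: str) -> List[str]:
    """Return the names of matched patterns and log a warning for each.

    The message itself is never blocked; the caller passes it on to the
    next stage regardless of the result.
    """
    hits = [name for name, pattern in SUSPICIOUS_PATTERNS.items()
            if pattern.search(message)]
    for name in hits:
        logger.warning("suspicious input pattern matched: %s", name)
    return hits
```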

User Roles

Channels can assign roles to different users to control access permissions.

Security Configuration Recommendations

Personal Use

If Elftia is for your personal use only and does not provide services externally:

Setting	Recommended Value
GuardianAgent	Off or Monitor
PromptGuardian	Off
Permission Mode	Default

Shared Channel

If your Channel is open to external users who can send messages:

Setting	Recommended Value
GuardianAgent	Guard or Strict
PromptGuardian	Block
Permission Mode	Default
Rate Limiting	Enabled

Development and Debugging

In development and testing scenarios:

Setting	Recommended Value
GuardianAgent	Monitor
PromptGuardian	Monitor
Permission Mode	AcceptEdits

This allows you to observe security events in audit logs while not affecting development efficiency.

Audit Log

All security events are recorded in the audit log, which you can view in Settings → Clawia → Audit Log.

The log records the following information:

Field	Description
Time	Time when the event occurred
Type	GuardianAgent / PromptGuardian / Permission
Operation	Specific tool name or message source
Result	Allow / Block
Details	Risk level, reason explanation, etc.
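
To make the field list concrete, here is a sketch of an audit record with those five fields, serialized as one JSON object per line. The storage format, the class name `AuditEvent`, and the helpers `make_event` and `to_json_line` are assumptions for illustration; this page does not describe how Elftia stores its log.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional

EVENT_TYPES = ("GuardianAgent", "PromptGuardian", "Permission")
RESULTS = ("Allow", "Block")


@dataclass(frozen=True)
class AuditEvent:
    time: str       # ISO 8601 timestamp of the event
    type: str       # one of EVENT_TYPES
    operation: str  # tool name or message source
    result: str     # "Allow" or "Block"
    details: str    # risk level, reason, and so on


def make_event(type_: str, operation: str, result: str, details: str,
               now: Optional[datetime] = None) -> AuditEvent:
    """Validate the enumerated fields and stamp the event with a UTC time."""
    if type_ not in EVENT_TYPES:
        raise ValueError(f"unknown event type: {type_}")
    if result not in RESULTS:
        raise ValueError(f"unknown result: {result}")
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    return AuditEvent(stamp, type_, operation, result, details)


def to_json_line(event: AuditEvent) -> str:
    """Serialize one event as a single JSON line."""
    return json.dumps(asdict(event), ensure_ascii=False)
```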

tip

Regularly reviewing the audit log can help you understand the Agent's behavior patterns and potential security threats, and adjust your security policies accordingly.

For more configuration options, refer to Settings Reference. If you encounter security-related issues during use, please consult Common Issues.
