Which LLM do you use for agents?

That depends on the task. Claude (Anthropic) performs very well on tool use and has a large context window, which helps in long task loops. OpenAI's latest model is a good option if the environment already has an OpenAI integration. I advise based on the specific requirements.

How do you deal with hallucinations?

Agents are more susceptible to hallucinations than chatbots because errors in one step carry over to the next. I limit this by always returning tool results as ground truth to the model, by building output validation per step and by configuring the agent to stop and escalate when confidence drops below a threshold. Logging makes every decision verifiable after the fact.

Can the agent execute code itself?

Only if explicitly included in the tool whitelist and the scope requires it. Code execution is a high-risk tool: faulty code can damage systems. When needed, it is executed in an isolated container with resource limits and an allowlist of permitted commands. No generic code execution without explicit scope.
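As an illustration of the allowlist idea, here is a minimal sketch of a command check that could sit in front of a code-execution tool. The names `ALLOWED_COMMANDS`, `check_command` and `run_allowed` are hypothetical, and the sketch assumes the process already runs inside an isolated container with resource limits. The container setup itself is not shown.

```python
import shlex
import subprocess

# Hypothetical allowlist; a real one would come from the agent's tool config.
ALLOWED_COMMANDS = {"ls", "cat", "wc"}


def check_command(command_line: str) -> list[str]:
    """Split a command line and reject anything not on the allowlist."""
    try:
        argv = shlex.split(command_line)
    except ValueError as exc:  # e.g. unbalanced quotes
        raise PermissionError(f"unparseable command: {exc}") from exc
    if not argv:
        raise PermissionError("empty command")
    if argv[0] not in ALLOWED_COMMANDS:
        raise PermissionError(f"command not on allowlist: {argv[0]}")
    return argv


def run_allowed(command_line: str, timeout_s: float = 5.0) -> subprocess.CompletedProcess:
    """Run an allowed command without a shell, with a wall-clock limit."""
    argv = check_command(command_line)
    # No shell is involved, so ';' or '&&' are never interpreted as operators.
    return subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
```

Because the command is split into an argument list and run without a shell, an input such as `ls; rm -rf /` is rejected: its first token is `ls;`, which is not on the allowlist.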

How do you handle token costs in long chains?

I configure a max-iterations limit per agent run to prevent runaway loops, and I use prompt caching in the Anthropic API to make repeated context cheaper. For production agents I also set a budget alert at API level, so unexpected spikes are visible immediately.
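A rough sketch of what marking repeated context as cacheable can look like. It only builds the request payload: the function name `build_request` and the model identifier in the usage note are placeholders, and minimum cacheable prompt lengths and pricing should be checked against the current Anthropic documentation.

```python
def build_request(static_context: str, messages: list[dict], model: str,
                  max_tokens: int = 1024) -> dict:
    """Build a Messages API payload whose system prompt is marked cacheable."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        # The system prompt is sent as a content block so it can carry
        # cache_control; identical prefixes in later calls can then be
        # served from the cache instead of being processed again.
        "system": [
            {
                "type": "text",
                "text": static_context,
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": messages,
    }
```

With the official Python SDK, a payload like this could be passed as `client.messages.create(**build_request(context, history, model="<model-id>"))`, where `<model-id>` stands for whichever model is in use.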

What if the agent gets stuck?

Every agent has a fallback. If the max-iterations limit is reached, or if the agent encounters the same error three times in a row, the agent stops, writes the current state and sends a notification. The task enters a review queue. A human can then decide whether the task should be completed manually or restarted with additional context.
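The stop rule described above can be sketched as a small counter. This is an illustrative version, not a fixed implementation: the class name `StuckDetector`, the default limits and the `escalate` helper are assumptions, and the review queue is modelled as a plain list.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StuckDetector:
    max_iterations: int = 20
    max_repeated_errors: int = 3
    iterations: int = 0
    last_error: Optional[str] = None
    repeat_count: int = 0

    def record(self, error: Optional[str]) -> Optional[str]:
        """Record one step; return a stop reason, or None to continue."""
        self.iterations += 1
        if error is None:
            self.last_error, self.repeat_count = None, 0
        elif error == self.last_error:
            self.repeat_count += 1
        else:
            self.last_error, self.repeat_count = error, 1
        if self.repeat_count >= self.max_repeated_errors:
            return "repeated_error"
        if self.iterations >= self.max_iterations:
            return "max_iterations"
        return None


def escalate(task_id: str, reason: str, state: dict,
             review_queue: list, notify: Callable[[str], None]) -> None:
    """Persist the current state in the review queue and send a notification."""
    review_queue.append({"task_id": task_id, "reason": reason, "state": state})
    notify(f"Task {task_id} stopped: {reason}")
```

The orchestrator would call `record` after every step and call `escalate` as soon as it returns a reason.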

AI agent implementation: autonomous task execution

Chatbot versus agent: the difference in practice

A chatbot responds. You ask a question, you get an answer. A basic chatbot has no memory beyond the current conversation, no access to external systems and no ability to do anything outside the conversation. That is useful for customer service, FAQ handling and information delivery. But a chatbot executes no tasks.

An AI agent acts. You provide an instruction as a starting point. The agent analyses the instruction, determines which steps are needed, executes those steps via tool calls and checks whether the result is correct before moving to the next step. If something fails, the agent recovers on its own or asks for human input. The process stops only when the task is complete or when the agent indicates it cannot continue.

The distinction is architectural. A chatbot is an input-output system. An agent is a loop: plan, action, observation, correction. That loop can run multiple cycles for a single instruction. That makes agents suitable for tasks that are too complex for a script but too context-dependent for a fixed automation.
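To make the loop concrete, here is a toy version in which the model is replaced by a scripted `decide` function. Everything in it is illustrative: the action format, the `divide` tool and the function names are assumptions, and a real agent would call an LLM where `scripted_decide` is used.

```python
from typing import Any, Callable


def run_agent(instruction: str,
              decide: Callable[[str, list], dict],
              tools: dict[str, Callable[..., Any]],
              max_iterations: int = 10) -> dict:
    observations: list[dict] = []
    for _ in range(max_iterations):
        action = decide(instruction, observations)           # plan
        if action["type"] == "finish":
            return {"status": "done", "answer": action["answer"],
                    "steps": len(observations)}
        tool = tools.get(action["tool"])
        if tool is None:
            observations.append({"tool": action["tool"], "error": "unknown tool"})
            continue
        try:
            result = tool(**action["args"])                   # action
            observations.append({"tool": action["tool"], "result": result})  # observation
        except Exception as exc:
            # The error is fed back so the next plan can correct course.
            observations.append({"tool": action["tool"], "error": str(exc)})
    return {"status": "stopped", "reason": "max_iterations",
            "steps": len(observations)}


def scripted_decide(instruction: str, observations: list) -> dict:
    """Stand-in for the model: fails once, corrects, then finishes."""
    if not observations:
        return {"type": "tool", "tool": "divide", "args": {"a": 10, "b": 0}}
    last = observations[-1]
    if "error" in last:                                       # correction
        return {"type": "tool", "tool": "divide", "args": {"a": 10, "b": 2}}
    return {"type": "finish", "answer": last["result"]}


outcome = run_agent("divide ten", scripted_decide, {"divide": lambda a, b: a / b})
```

In this run the first tool call fails with a division by zero, the second succeeds, and the third cycle ends the task. A chatbot would have stopped after the first response.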

When an agent is the right choice

Not every task needs an agent. But for a specific category of tasks, an agent is the most effective solution:

▸Multi-step tasks where each step depends on the result of the previous step. A script cannot handle this well because the branching is too complex.
▸Tasks requiring context-dependent decisions. The agent must understand what the right action is based on the specific situation, not based on a fixed rule.
▸Tasks where multiple tools need to be combined. For example: fetch data from a system, analyse it, enrich via an external API and write the result back.
▸Tasks where error handling itself requires reasoning. If a tool returns an unexpected result, the agent must decide whether that is acceptable or whether it should approach the task differently.
▸Tasks where the output needs to be validated before use. The agent can evaluate its own result and retry if it does not meet the criteria.

The common characteristic is variability. If the task always proceeds in exactly the same way, a script or n8n workflow is more efficient. Once the task involves variation that calls for the kind of reasoning a person would apply, an agent is the better choice.

When you do not need an agent

Building an agent is more expensive than writing a script or configuring a workflow in n8n. That is not a problem when the task requires it, but it is a reason to be critical:

▸Fully deterministic tasks: if the steps are always identical and the input is structured, a script or workflow is faster, cheaper and more reliable than an agent.
▸Single API calls: if it comes down to fetching data and writing it back without decisions in between, I would rather build a direct integration or n8n node.
▸Rule-based logic: if the decisions can be fully written out in if-then rules without AI reasoning, an agent adds no value.
▸Situations with zero tolerance for variation: for financial transactions or medical data where any deviation is unacceptable, an agent is not the right choice. Deterministic code is safer.
▸If the budget does not match the complexity: agents consume more tokens per task than a chatbot. At large volumes the costs need to be weighed.

I will say this in an initial conversation if it applies. If a simpler solution fits better, I propose that. That is ultimately better for the collaboration.

Architecture: how I build an agent

The core of every agent I build consists of five components:

▸Reasoning engine: Claude (via the Anthropic API) or OpenAI's latest model (via the OpenAI API) as the model that plans, decides and generates text. The choice depends on the task, the budget and the required context window.
▸Tool layer via MCP servers: the agent has access to a set of tools I explicitly define. A tool can be a database query, a REST API call, a file operation or an external service. Every tool has a schema with parameters and a description. The agent decides which tool is relevant.
▸State layer in Postgres: the agent stores its current task state in a Postgres table. Every step is logged with timestamp, tool name, parameters, result and any errors. This makes the full execution reconstructable.
▸Audit log: in addition to the state table, I write all tool calls to a separate audit log. That log is read-only and immutable. Intended for debugging, compliance and post-mortem analysis.
▸Orchestrator: the application code that drives the loop. It sends the instruction to the model, processes the tool calls, returns results to the model and determines when the task is done or must be stopped.

The orchestrator is deliberately simple. Complex orchestration frameworks add abstractions that make debugging harder. I write the orchestrator myself in TypeScript or Python, depending on the context.
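A sketch of the step log behind the state layer. To keep the example self-contained it uses Python's built-in `sqlite3` as a stand-in for Postgres. The table name `agent_steps`, its columns and the helper names are assumptions based on the fields listed above.

```python
import json
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS agent_steps (
    id         INTEGER PRIMARY KEY,
    task_id    TEXT NOT NULL,
    created_at TEXT NOT NULL,
    tool_name  TEXT NOT NULL,
    parameters TEXT NOT NULL,
    result     TEXT,
    error      TEXT
)
"""


def log_step(conn, task_id, tool_name, parameters, result=None, error=None):
    """Append one step; rows are only ever inserted, never updated."""
    with conn:  # commit on success, roll back on failure
        conn.execute(
            "INSERT INTO agent_steps "
            "(task_id, created_at, tool_name, parameters, result, error) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (task_id, datetime.now(timezone.utc).isoformat(), tool_name,
             json.dumps(parameters), json.dumps(result), error),
        )


def replay(conn, task_id):
    """Return the steps of a task in execution order."""
    rows = conn.execute(
        "SELECT tool_name, parameters, result, error FROM agent_steps "
        "WHERE task_id = ? ORDER BY id",
        (task_id,),
    ).fetchall()
    return [
        {"tool": t, "parameters": json.loads(p), "result": json.loads(r), "error": e}
        for t, p, r, e in rows
    ]


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
log_step(conn, "task-1", "fetch_record", {"id": 7}, result={"name": "example"})
log_step(conn, "task-1", "write_record", {"id": 7}, error="timeout")
```

Calling `replay(conn, "task-1")` returns both steps in order, which is what makes an execution reconstructable afterwards. In Postgres, the immutability of a separate audit log would typically be enforced through database permissions rather than application code.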

Guardrails: how I keep the agent in check

An agent without limits is an agent you cannot trust. Every agent I build has explicit constraints:

▸Max iterations: the agent stops after a configurable maximum number of steps, even if the task is not complete. This prevents infinite loops on unexpected input or model failure.
▸Tool whitelist: the agent only has access to tools explicitly defined. No generic code execution, no access to systems outside the scope. Every tool is in the config.
▸Human approval on critical actions: actions with irreversible consequences, such as overwriting files, deleting data or sending external messages, require a human approval step. The agent pauses and waits.
▸Rollback transactions: write actions to a database run via transactions. If the agent makes an error halfway through, the entire operation is rolled back. No partially modified data.
▸Parameter validation per tool: every tool call is validated before execution. Incorrect parameters are rejected with a clear error message the agent can interpret.
▸Confidence threshold: for tasks where the agent makes an assessment, I configure a minimum confidence threshold. Below that threshold the agent asks for human input.
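Several of these constraints can be combined in one gate that every tool call passes through. The sketch below is one possible shape: `ToolSpec`, `guarded_call`, the `approve` callback and the way a confidence score reaches the gate are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable


class Rejected(Exception):
    """Raised with a message the agent can read and act on."""


@dataclass(frozen=True)
class ToolSpec:
    handler: Callable[..., Any]
    params: dict[str, type]          # required parameter name -> expected type
    needs_approval: bool = False     # True for irreversible actions


def guarded_call(tools: dict[str, ToolSpec], name: str, args: dict,
                 approve: Callable[[str, dict], bool],
                 confidence: float = 1.0, min_confidence: float = 0.8) -> Any:
    spec = tools.get(name)
    if spec is None:                                     # tool whitelist
        raise Rejected(f"tool not on whitelist: {name}")
    missing = sorted(set(spec.params) - set(args))       # parameter validation
    unknown = sorted(set(args) - set(spec.params))
    if missing or unknown:
        raise Rejected(f"bad parameters: missing={missing}, unknown={unknown}")
    for key, expected in spec.params.items():
        if not isinstance(args[key], expected):
            raise Rejected(f"parameter {key!r} must be {expected.__name__}")
    if confidence < min_confidence:                      # confidence threshold
        raise Rejected("confidence below threshold: ask for human input")
    if spec.needs_approval and not approve(name, args):  # human approval
        raise Rejected(f"human approval denied for {name}")
    return spec.handler(**args)
```

A rejected call raises `Rejected` with a plain-language message, which the orchestrator can return to the model as the tool result so the agent can correct its parameters or ask for input.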

Guardrails are not an afterthought. They are the reason you can deploy an agent on real tasks without constantly watching.
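The rollback guardrail can be shown in a few lines. This sketch again uses `sqlite3` as a self-contained stand-in for Postgres. The `quotes` table and the `apply_updates` function are hypothetical.

```python
import sqlite3


def apply_updates(conn: sqlite3.Connection, updates: list[tuple[int, int]]) -> None:
    """Apply all (quote_id, amount) updates, or none of them."""
    with conn:  # commits on success, rolls back if an exception escapes
        for quote_id, amount in updates:
            if amount < 0:
                raise ValueError(f"negative amount for quote {quote_id}")
            conn.execute("UPDATE quotes SET amount = ? WHERE id = ?",
                         (amount, quote_id))


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE quotes (id INTEGER PRIMARY KEY, amount INTEGER NOT NULL)")
db.executemany("INSERT INTO quotes (id, amount) VALUES (?, ?)", [(1, 100), (2, 200)])
db.commit()

try:
    # The second update is invalid, so the first one must be undone as well.
    apply_updates(db, [(1, 150), (2, -5)])
except ValueError:
    pass

amounts = db.execute("SELECT id, amount FROM quotes ORDER BY id").fetchall()
```

After the failed batch, `amounts` still holds the original values: the first update was executed but rolled back together with the rest of the transaction.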

-- Client case

Quote administration: agent that processes and uploads files

An SME in quote administration received daily batches of attachments: PDF files, spreadsheets and scanned documents. Staff had to manually determine which data was relevant per file, adjust the right fields and upload the processed file to an internal system.

I built an agent that handles this process autonomously. The agent receives a batch of files as input. Per file the model determines what adjustments are needed based on the content. The agent then calls a file-edit tool, validates whether the modified file meets the expected format and uploads it to the internal system via an API tool. Files that do not meet the threshold are flagged for human review.

Steps per file

Human review needed: < 5%

Logging: per step

The agent logs every decision: which file, which tool call, what the result was and whether a retry was needed. The audit log is directly available to staff. When errors occur, you can see exactly where it went wrong and what the agent tried.
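The per-file flow of such a case can be outlined roughly as follows. This is a simplified illustration, not the client's system: the `edit`, `validate` and `upload` callables stand in for the real tools, and the retry count is an assumed parameter.

```python
from typing import Callable


def process_batch(files: list[str],
                  edit: Callable[[str], str],
                  validate: Callable[[str], bool],
                  upload: Callable[[str], None],
                  max_retries: int = 1) -> tuple[list[dict], list[str]]:
    """Edit, validate and upload each file; flag failures for human review."""
    log: list[dict] = []
    review: list[str] = []
    for name in files:
        for attempt in range(1, max_retries + 2):
            edited = edit(name)
            valid = validate(edited)
            log.append({"file": name, "tool": "file_edit",
                        "attempt": attempt, "valid": valid})
            if valid:
                upload(edited)
                log.append({"file": name, "tool": "upload",
                            "attempt": attempt, "valid": True})
                break
        else:
            # No attempt produced a valid file: hand it to a human.
            review.append(name)
    return log, review
```

Each log entry records which file, which tool call, the outcome and the attempt number, so a retry is visible in the log as a second `file_edit` entry for the same file.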

What I do not build

There are deliberate limits to what I build:

▸No agents without logging. Without an audit trail there is no way to verify what the agent did. I always deliver a fully logged system.
▸No agents with unrestricted tool access. An agent with access to everything is a security risk. The tool whitelist is always explicit and minimal.
▸No agents without error recovery. An agent that stalls and stops without notification is not production-ready. Error handling and fallback to human-in-the-loop are standard.
▸No agents for tasks where deterministic code fits better. If a script or workflow covers the task, that is the better choice.
▸No agents without a clear stop condition. An agent must know when it is done. Without a stop condition an agent keeps iterating until the token budget runs out.

What does an agent implementation cost?

Scope determines price. A single agent for a well-defined task is less work than a multi-agent system with state management, rollback and approval flows. After a conversation I provide a concrete proposal.

On request

Schedule a call. I look at your task and give an honest picture of scope and cost.


AI agents that take work off your hands