AI QualityJune 23, 2026 6 min read

Testing AI agents: the new quality reflex for SMEs

The JFTL 2026 material points to a clear shift: the discussion is no longer only about using AI to generate test cases. Teams now need to test agents that can interpret a request, consult tools, propose an action or chain several steps. For an SME, this becomes very practical. Before giving a slice of a process to an agent, you need to know how to verify its behaviour.

An agent is not tested like a simple form

A form follows a fairly stable path: mandatory fields, expected formats, validation rules and error messages. An AI agent introduces interpretation. It can rephrase a request, choose a source, decide that information is missing, prepare a response or propose the next action.

The “Testing AI agents” session presented at JFTL 2026 highlights this change: an agent combines a model, tools, possible memory, a goal and limits. Acceptance testing therefore needs to look at the full behaviour, not only the final answer.

Start with the delegation contract

An SME can avoid a lot of confusion by writing down what the agent is allowed to do. This contract can stay simple: which goal is delegated, which sources it may use, which actions it may prepare, which actions remain forbidden, and which signals require human validation.

This connects with another JFTL 2026 thread on human-agent governance: automating a task and delegating an objective do not create the same risk. The more discretion the agent has, the more testing must verify boundaries, refusals, clarification requests and traces left in the process.

Build a test matrix with real cases

The right reflex is not to ask the agent for ten perfect examples. Select ordinary cases instead: a complete request, an ambiguous request, missing data, a badly named attachment, a priority customer, a regulatory exception, an action that must be refused.

For each case, write the expected answer, source used, confidence level, expected human decision and behaviour in case of uncertainty. This matrix creates an acceptance baseline that business teams can read, not only the technical team.

Also test what should go wrong

The material on adversarial testing and red teaming is a useful reminder: testing AI is not only confirming that it works in favourable cases. You must try to push it outside the frame: contradictory instruction, sensitive data, out-of-scope request, doubtful source, pressure to bypass a rule.

In an SME, this can stay lightweight. A few adversarial scenarios already show whether the agent refuses correctly, explains its limit, protects data and hands over to a person when the case becomes sensitive.

Measure what actually helps decisions

A correct-answer rate can be reassuring without being enough. To steer a business agent, it is better to track indicators that help decisions: clarification requests, errors caught before action, cases handed to humans, missing sources, time saved on simple cases, incidents avoided.

The point is not to turn an AI pilot into a heavy quality programme. It is to know whether the agent makes the process more reliable. If the metrics do not help correct the scope, instructions or sources, they become decorative.

Sources consulted

CFTL, Journée Française des Tests Logiciels 2026, support page: https://cftl.fr/actualites/jftl/. “Tester les agents IA : défis, techniques et retour d’expériences”, Bruno Legeard, Smartesting.

Additional JFTL 2026 material: “Vers une gouvernance antifragile : décider à l’ère des collectifs humain-agent”, IBM and Impacteev; “Redéfinir la qualité de l’IA grâce aux tests contradictoires et au red teaming”, Applause; “Arrêter les vanity metrics”, Shift Op Solutions and Alpharho.

Next step

Turn this reference point into a concrete project

If this topic resonates with a situation in your organisation, a short diagnostic lets us look at the process, the available data, the risks and the right initial scope.

Request a diagnostic