Fine-Tuning Agentic AI Systems
Key Points
- Fine‑tuning is presented as the next step to improve the performance, reliability, and domain alignment of agentic AI systems that combine large language models with specialized toolkits.
- Current agent designs suffer from token‑inefficient, heavyweight prompts, high execution costs, and error‑propagation across multi‑step tasks, leading to poor decision‑making and increased failure rates.
- Without deep, domain‑specific knowledge, agents may misuse tools or make decisions misaligned with organizational goals, highlighting the need for tighter integration between the language model and its toolkit.
- Effective fine‑tuning requires a structured data‑collection strategy that separates tool‑specific usage data from general reasoning, planning, and decision‑making examples to systematically address these shortcomings.
Sections
- Fine‑Tuning Agentic AI Systems - The segment explains the need to fine‑tune autonomous AI agents to overcome current design limitations, align tool use with organizational goals, and provides practical data‑collection strategies for effective model customization.
- Fine‑Tuning Agents with Tool Data - The speaker outlines how early mistakes can cascade into agent failures and recommends collecting two types of training data—tool‑specific examples that teach when and how to invoke each tool, and general reasoning/decision‑making samples—to fine‑tune models for more effective, domain‑aligned decisions.
- Iterative Policy Alignment via Data - The speaker outlines how to align an AI agent with organizational policies by leveraging documentation, case studies, execution trace analysis, role‑specific data, and iterative fine‑tuning to improve decision‑making.
Full Transcript
# Fine-Tuning Agentic AI Systems

**Source:** [https://www.youtube.com/watch?v=aQuCTWhiiPg](https://www.youtube.com/watch?v=aQuCTWhiiPg)
**Duration:** 00:09:13

## Sections

- [00:00:00](https://www.youtube.com/watch?v=aQuCTWhiiPg&t=0s) **Fine‑Tuning Agentic AI Systems** - The segment explains the need to fine‑tune autonomous AI agents to overcome current design limitations, align tool use with organizational goals, and provides practical data‑collection strategies for effective model customization.
- [00:03:14](https://www.youtube.com/watch?v=aQuCTWhiiPg&t=194s) **Fine‑Tuning Agents with Tool Data** - The speaker outlines how early mistakes can cascade into agent failures and recommends collecting two types of training data—tool‑specific examples that teach when and how to invoke each tool, and general reasoning/decision‑making samples—to fine‑tune models for more effective, domain‑aligned decisions.
- [00:06:24](https://www.youtube.com/watch?v=aQuCTWhiiPg&t=384s) **Iterative Policy Alignment via Data** - The speaker outlines how to align an AI agent with organizational policies by leveraging documentation, case studies, execution trace analysis, role‑specific data, and iterative fine‑tuning to improve decision‑making.

## Full Transcript
So you have built an agentic AI system,
but you're looking to boost its performance and reliability.
Today, we'll explore how model fine-tuning can be your next step in supercharging your AI agents' capabilities.
In this video, we'll explore key considerations for customizing the models
within your agentic system at different levels of autonomy.
We will discuss the shortcomings of current system designs
and how we can systematically address these challenges through fine-tuning.
And most importantly, we will focus on practical design tips for data collection that enable effective fine-tuning.
Keep in mind that this is a continuously evolving field with changing terminology.
First, why agentic systems?
Agentic systems are purpose-built to address complex multi-step problems that require a degree of autonomy and creativity.
This allows the systems to adapt and make context-aware decisions.
What grounds this approach is the use of a toolkit.
The unique blend of a large language model's generalization capabilities with the domain alignment of the toolkit
allows agentic systems to tackle problems where traditional automation falls short.
However, this flexibility comes with a trade-off.
Without deep, domain-specific knowledge, your large language model may fail to use the tools correctly.
Furthermore, it might make decisions that are not aligned with your organization's unique objectives and constraints.
This calls for a deeper integration between the large language model and the toolkit.
So what are the key limitations of the current designs?
The first is high token inefficiency.
Instead of spending tokens on solving your problem, you spend them on the heavyweight prompt the agent requires just for setup.
This limits the number of tokens the agent can use for execution, actually making decisions and trying to solve your problem.
Furthermore, it draws the focus away from the problem you are trying to solve.
The second is the high cost of execution.
Every time you run your agent, you process the same setup tokens again,
which adds computational overhead and results in higher costs.
And the most important problem with agentic systems,
since they work on complex multi-step problems, is error propagation.
If the agent makes an incorrect decision
at the beginning of the execution trace, all the following decisions might not lead to the correct answer.
This can drive the agent's failure rate up and leave the agent stuck in feedback loops, never completing the task successfully.
So over time you run into higher costs
just because your model has a shallow understanding that doesn't allow it
to make more effective, domain-aligned decisions.
If fine-tuning can address these challenges, how should we approach data collection?
Well, let's split this conversation into two parts:
first, tool-specific data,
and second, general reasoning, decision-making, and planning capabilities.
For tool-specific data collection, the most important things to explain to the model
are when to use the tool, how to call it, and what to do with the output.
For the first part, you are trying to explain the context.
Let's say you have two very similar search tools, but they have different contexts in which they should be applied.
Focus on creating examples that highlight these differences,
annotated with explanations, and try to provide the model with as much information as possible.
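As a minimal sketch of this idea, the records below contrast two hypothetical search tools with annotated explanations of when each applies. The tool names (`kb_search`, `web_search`), the queries, and the record schema are all illustrative assumptions, not from the video:

```python
# Hypothetical contrastive training examples for two similar search tools.
# Tool names and schema are invented for illustration.
contrastive_examples = [
    {
        "query": "What is our current parental-leave policy?",
        "tool": "kb_search",
        "annotation": (
            "Internal policy questions should go to the knowledge-base "
            "search, which indexes company documents."
        ),
    },
    {
        "query": "What changed in the EU AI Act this year?",
        "tool": "web_search",
        "annotation": (
            "Recent external events require the web search tool; the "
            "internal knowledge base would be stale."
        ),
    },
]

def to_chat_record(example: dict) -> list[dict]:
    """Render one contrastive example as a chat-style fine-tuning record."""
    return [
        {"role": "user", "content": example["query"]},
        {
            "role": "assistant",
            "content": f"Tool: {example['tool']}. Reason: {example['annotation']}",
        },
    ]
```

Pairing near-identical tools in the same dataset, each with an explicit rationale, is one way to give the model the differentiating signal the speaker describes.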
How to call the tool
is your opportunity to teach the model how to properly configure the call parameters:
where this reasoning should come from,
and how to use the tool effectively to achieve the specific task the model is trying to solve.
What to do with the output is about defining the model's expectations.
For example, is the tool the agent is trying to use deterministic? Can you trust its output?
Does the output require any post-processing once you get it back from the tool?
One category I would like to focus on specifically is write tools.
Write tools are the tools that modify your environment,
so make sure to be extra careful when you provide the guidelines for those tools.
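One way to encode that extra caution is to flag write tools in a registry and attach stricter guidance to their training examples. The registry, tool names, and field names below are assumptions made for this sketch:

```python
# Illustrative registry flagging which tools modify the environment.
# Tool names and fields are invented for this sketch.
TOOL_REGISTRY = {
    "search_docs": {"writes": False},
    "delete_record": {"writes": True},
}

def guidelines_for(tool_name: str) -> str:
    """Return extra guidance to embed in training examples for a tool."""
    spec = TOOL_REGISTRY[tool_name]
    if not spec["writes"]:
        return "Read-only tool: safe to call speculatively."
    # Write tools get the stricter, more explicit guidance.
    return (
        "Write tool: call only when the task explicitly requires a "
        "modification, and double-check parameters before invoking."
    )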
As a general approach to the fine-tuning data here, aim to show the full range of each tool's capabilities.
Focus on the edge cases and make sure
to provide annotations so your model can reason better about each of the tool-use steps.
This is especially important if you are building your agentic workflow
with custom tools that the model may not reason well about by default.
Building on the principles we used for collecting data for more effective tool use,
we can talk about how to enhance the model's reasoning, planning, and decision-making capabilities.
First, we should treat this as an opportunity to align the model with your organization's specific policies and objectives.
You can do that through the use of documentation.
If you structure it and use it as training data,
this gives the model a strong background to rely on when making decisions.
Furthermore, you can use case studies to showcase how decisions are made within your organization:
which policies are consulted in which scenarios,
so the model has this understanding when it tries to make decisions on its own.
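A case study can be reduced to a simple supervised record that pairs a scenario and its governing policy with the correct decision and rationale. The policy text, scenario, and schema below are invented examples, not material from the video:

```python
# Sketch: turning a policy excerpt plus a decision scenario into a
# fine-tuning record. Policy, scenario, and schema are invented.
def case_study_record(policy: str, scenario: str,
                      decision: str, rationale: str) -> dict:
    """One supervised example showing which policy governs a decision."""
    return {
        "prompt": (
            f"Scenario: {scenario}\n"
            f"Relevant policy: {policy}\n"
            "What should the agent do?"
        ),
        "completion": f"Decision: {decision}\nRationale: {rationale}",
    }

record = case_study_record(
    policy="Refunds over $500 require manager approval.",
    scenario="Customer requests a $750 refund for a defective unit.",
    decision="Escalate to a manager before issuing the refund.",
    rationale="The amount exceeds the $500 self-service threshold.",
)
```

Keeping the policy text inside the prompt teaches the model which document it should have consulted, not just what the final answer was.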
And most importantly, since your agent is already running, you can collect and analyze its execution traces.
You can then help the model by annotating successful and unsuccessful decisions,
explaining why, in certain scenarios, it is more beneficial to take one route over another.
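The annotation step above can be sketched as a small helper that labels a trace and attaches the explanation. The trace format, labels, and example content are assumptions for illustration, not a specific framework's schema:

```python
# Sketch of annotating execution traces as fine-tuning data.
# Trace steps, labels, and the example content are invented.
def annotate_trace(steps: list[str], successful: bool,
                   explanation: str) -> dict:
    """Label a recorded execution trace and attach the rationale."""
    return {
        "trace": steps,
        "label": "success" if successful else "failure",
        "explanation": explanation,
    }

# A hypothetical failure: the agent answered from memory instead of
# calling the tool it should have used.
bad = annotate_trace(
    steps=[
        "plan: answer from memory",
        "respond without checking inventory tool",
    ],
    successful=False,
    explanation=(
        "Skipping the inventory tool led to a stale answer; the agent "
        "should have called it before responding."
    ),
)
```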
Additionally, if your system has any role-specific components, for example judges, validators, or optimizers,
you can collect role-specific data and improve the robustness of your system
by highlighting how and when these decisions should be made
within the context of the specific role.
As with any AI system, agentic systems thrive on iterative improvement.
By collecting, analyzing,
and inspecting your execution data, you can find the failure modes of your AI system.
These lessons can be incorporated into your prompting techniques,
and these failure modes can also be a great source of fine-tuning data.
As with all the fine-tuning data you collect,
whether it concerns tool use or the general decision-making capabilities of the model,
make sure to provide very detailed annotations.
You can use ReAct or other structured reasoning frameworks
to help the model process these annotations and become more effective, robust, and reliable.
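For reference, a ReAct-style annotation interleaves a thought, an action (typically a tool call), and the resulting observation. The formatter and the example trace below are a minimal sketch with invented tool names, not a prescribed format:

```python
# Minimal ReAct-style formatting for one annotated tool-use step.
# The tool name and trace content are invented for illustration.
def react_step(thought: str, action: str, observation: str) -> str:
    """Format one reasoning step in the Thought/Action/Observation style."""
    return (
        f"Thought: {thought}\n"
        f"Action: {action}\n"
        f"Observation: {observation}"
    )

step = react_step(
    thought="The user asks about an order status, so I need the orders tool.",
    action="orders_lookup(order_id='12345')",
    observation="Order 12345 shipped on 2024-05-02.",
)
```

Formatting annotations this way keeps the reasoning, the tool invocation, and the expected output visible in every training example.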
Remember, the ultimate goal of fine-tuning is to create a system that is more aligned with your unique challenge.
It comes with additional benefits, such as reducing cost and making the system more efficient,
but most importantly, it transforms your agentic system from a novel solution into a trusted, reliable partner.
Applying the data-collection techniques we discussed today
can hopefully help you better customize your agentic workflow.