Balancing AI and Human Judgment
Key Points
- Deciding whether a human or an AI should make a particular decision depends on the task’s nature, with AI generally outperforming humans on many statistical decisions but humans excelling when nuanced judgment and context are needed.
- In fraud detection, AI can filter the bulk of alerts by assigning confidence scores, achieving high accuracy on clearly high‑ or low‑confidence cases, while human analysts handle the ambiguous alerts where the AI is uncertain.
- Performance curves show AI’s success rate rises sharply with confidence, whereas humans maintain a flatter curve, often outperforming AI at the mid‑range (around 50 % confidence) due to their ability to incorporate external information and flexible reasoning.
- The optimal solution is a hybrid system that routes high‑confidence alerts to AI for efficiency and delegates uncertain or complex cases to skilled analysts, leveraging the strengths of both.
Sections
- Human‑AI Decision Allocation in Fraud Detection - The speaker explains how to split fraud‑alert handling between analysts and an AI by using confidence‑score performance curves to route clearly high‑ or low‑confidence cases to the algorithm and ambiguous, mid‑confidence cases to humans.
- AI Confidence vs Human Judgment - The passage explains that AI outperforms humans when its confidence is clearly high or low, humans surpass AI when the AI is uncertain, and merging both via augmented intelligence creates a balanced, intermediate performance curve.
- Optional AI Display Reduces Bias - The speaker explains that showing AI fraud recommendations only when analysts request them mitigates automation bias and trust concerns by allowing a human first impression, while noting that displaying accuracy percentages can further diminish reliance on the AI.
**Source:** [https://www.youtube.com/watch?v=8lo1s29ODj8](https://www.youtube.com/watch?v=8lo1s29ODj8) · **Duration:** 00:08:54
Timestamps:
- [00:00:00](https://www.youtube.com/watch?v=8lo1s29ODj8&t=0s) Human‑AI Decision Allocation in Fraud Detection
- [00:03:09](https://www.youtube.com/watch?v=8lo1s29ODj8&t=189s) AI Confidence vs Human Judgment
- [00:06:18](https://www.youtube.com/watch?v=8lo1s29ODj8&t=378s) Optional AI Display Reduces Bias
Full Transcript
A decision needs to be made.
But who should make it?
Me, a human, ... or an artificial intelligence, an AI?
We've discussed before that humans can outperform AI at some tasks,
but that, statistically, AI will do a better job of deciding for other tasks.
So for one single decision, who should decide?
Well, the answer is a fascinating combination of holistic curves and human bias.
Let's get into it.
So, consider a fraud detection system.
Fraud detection.
The system generates alerts for potentially fraudulent transactions.
Financial analysts review each alert.
Now, there are thousands of alerts generated each day,
and the analysts are overwhelmed, with 90 percent of those alerts being false positives.
An AI system could help alleviate the workload.
But which alerts should the AI handle, and which should be processed by a skilled financial analyst?
Well, let's draw a graph to answer the question, "Is this a real alert?"
So, let's draw a graph with an X and Y axis.
The Y axis tracks the success rate.
So an alert comes in, we make a prediction as to if it is real or not,
and we track if that prediction turned out to be right.
Along the X axis is the confidence score.
So a confidence score of zero percent
says a prediction thinks that this is definitely not a real alert, it's a false positive.
A confidence score of 100 percent
means that a prediction is certain that it is a real alert.
Now a typical AI performance curve will look something like this.
So we've got very low confidence scores, this is not a real alert,
and very high confidence scores, this is a real alert.
Both are correlated with a high success rate.
That's these areas up here on the graph.
But when the AI is not sure about a given prediction, in the middle of the confidence range,
the success rate drops.
And so effectively the AI algorithm is saying, "I don't know".
Now, human performance curves are typically a little bit flatter than that.
So the human's performance curve might look something like this.
Often not quite as accurate as a very confident AI algorithm,
but a little better at making the right decision when the AI is unsure.
At a 50 percent confidence level, a human is likely to do a better job than an AI.
Now why is that?
Well, when an AI is certain of itself,
it's highly performant and beats out humans who can lose consistency and focus and attention.
AIs, they don't get distracted.
But on the other hand, when an AI is unsure,
often for cases that are complex or statistically rare,
humans can outperform an AI prediction by bringing in additional information and context.
They can look stuff up or ask a colleague,
whereas the AI sticks to its same old decision logic and information.
So when a new alert comes in, if the AI assigns a high or low confidence score,
then chances are that, statistically speaking, it will do a better job of determining whether that alert is real
or a false positive than a given financial analyst.
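The routing rule described here can be sketched as a simple threshold check. The 0.2 and 0.8 cutoffs below are hypothetical, not from the video; a real system would tune them against its measured performance curves.

```python
def route_alert(ai_confidence, low=0.2, high=0.8):
    """Route a fraud alert based on the AI's confidence score.

    Thresholds are illustrative: a real deployment would tune them
    against the system's own performance curves.
    """
    if ai_confidence <= low:
        # AI is confident this is a false positive: let it auto-close.
        return ("ai", "false positive")
    if ai_confidence >= high:
        # AI is confident this is a real alert: let it auto-flag.
        return ("ai", "real alert")
    # Mid-range confidence: the AI is effectively saying "I don't know",
    # so a skilled analyst reviews the case.
    return ("analyst", "needs review")
```

With these cutoffs, an alert scored at 0.95 is handled by the AI as a real alert, while one scored at 0.5 lands in the analyst's queue.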
But this is not a zero sum game.
It doesn't have to be AI or human.
We have one more option.
Augmented.
Augmented intelligence combines both a human decision, aided by AI,
and this performance curve falls somewhere between the two.
And for somewhat low and for somewhat high confidence scores,
which make up a significant number of predictions,
it's augmented intelligence that will have the highest success rate.
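One way to make the three performance curves concrete is to model each as a function of the confidence score. The shapes and numbers below are invented to mirror the description in the talk, a sharp AI curve, a flatter human curve, and an augmented combination that wins in the mid-to-high ranges; they are not measured data.

```python
def ai_success(c):
    # Sharp curve: near-certain at the extremes, weakest around 0.5.
    return 0.5 + 0.9 * abs(c - 0.5)

def human_success(c):
    # Flatter curve: steadier across the range, lower peak.
    return 0.6 + 0.3 * abs(c - 0.5)

def augmented_success(c):
    # Human aided by AI: the better individual performer plus a small
    # synergy bonus, largest where the two complement each other most.
    base = max(ai_success(c), human_success(c))
    synergy = 0.08 * (1 - 2 * abs(c - 0.5))
    return min(base + synergy, 1.0)

def best_decision_maker(c):
    """Return who is statistically most effective at confidence c."""
    scores = {"ai": ai_success(c),
              "human": human_success(c),
              "augmented": augmented_success(c)}
    return max(scores, key=scores.get)
```

Under these toy curves the AI alone wins at the extremes, the human beats the unaided AI around 50 percent confidence, and the augmented combination has the highest success rate for somewhat low and somewhat high confidence scores, matching the ordering the speaker describes.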
Except ...
... for augmented intelligence to be most effective, we need to account
for the messy business of human cognitive bias.
We're not always great at doing what we're told.
It turns out that how we present information from an AI algorithm to a human decision maker
has a significant influence on how effectively that information is used.
So, to illustrate that, let's consider forced display vs. optional display.
A forced display simultaneously displays an AI recommendation along with a given decision case.
So, for every fraud decision alert that I need to make a decision about,
I, as the analyst, also see the AI's recommendation.
And this can lead to something called automation bias,
which is the propensity for humans to favor suggestions from automated decision making systems
and to ignore contradictory information.
Effectively, the human decision maker is saying the AI knows best
and going with the AI prediction at the expense of their own judgment.
Optional display means the AI recommendation is only shown to the human decision maker when they request it.
So, a person sees a decision case and can then ask the AI to reveal its recommendation.
This overcomes automation bias
by giving a person time to consider the case for themselves before consulting an AI recommendation.
The human is not overwhelmingly influenced by what the AI thinks
because they've had a chance to make up their own first impression.
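The optional-display pattern can be enforced in software by refusing to reveal the AI's recommendation until the analyst has logged a judgment of their own. The class below is a hypothetical sketch of that workflow, not an API from any real product.

```python
class AlertReview:
    """Optional display: the AI recommendation stays hidden until the
    analyst records an independent first impression."""

    def __init__(self, alert_id, ai_recommendation):
        self.alert_id = alert_id
        self._ai_recommendation = ai_recommendation  # hidden initially
        self.analyst_judgment = None

    def record_judgment(self, judgment):
        # The analyst commits to their own first impression.
        self.analyst_judgment = judgment

    def reveal_ai_recommendation(self):
        # Guard against automation bias: no peeking before judging.
        if self.analyst_judgment is None:
            raise RuntimeError("Record your own judgment first.")
        return self._ai_recommendation
```

An analyst first calls `record_judgment(...)`, and only then can `reveal_ai_recommendation()` return the AI's view for comparison.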
And then there's the whole issue of trust, too.
When an AI recommendation is accompanied by an accuracy percentage,
which indicates how likely this prediction is to be correct,
humans are less likely to incorporate the AI recommendation into their decision,
no matter how high that percentage is.
Basically, we don't like recommendations that openly tell us that they might be wrong.
So, we've seen that who should make a decision, a human, an AI,
or a human assisted by an AI recommendation, is something that we can derive.
We can move from subjective judgment to a quantifiable answer:
for a given decision, who the most effective decision maker is likely to be.
And when the most effective decision maker is a combination of AI and human, that's augmented intelligence,
we must consider how that augmentation is presented, to minimize human cognitive bias in the decision making process.
Brought together, us humans and AI algorithms make a pretty powerful team.
We can improve decision making outcomes - if we just know who to ask.
If you have any questions, please drop us a line below,
and if you want to see more videos like this in the future, please like and subscribe.
Thanks for watching.