# Multimodal AI for Real-Time Fraud Detection

## Key Points
- Banks must decide within ≈200 ms whether a transaction is fraudulent, so they rely heavily on AI to automate this binary judgment.
- Traditional fraud‑detection models (logistic regression, decision trees, random forests, gradient‑boosting) are trained on large labeled datasets of structured features such as amount, time, location, and merchant category to output a risk score.
- These models struggle with novel fraud tactics and cannot process unstructured information (free‑form text, descriptions, images), causing many ambiguous cases to be escalated for manual review.
- An ensemble approach augments predictive ML with encoder‑style large language models (e.g., BERT, RoBERTa) that understand and extract signals from textual data, improving detection of subtle, context‑dependent fraud without generating new content.
## Sections
- AI-Powered Real-Time Fraud Screening - Banks use ultra‑fast AI models—traditionally predictive ML on structured transaction features—to decide fraud in under 200 ms, but emerging multimodal AI aims to catch subtle, novel fraud tactics that evade these conventional approaches.
- LLMs vs Predictive Models in Fraud - The excerpt contrasts encoder LLMs, which read unstructured text like memos to spot scam language and spoofing cues, with traditional predictive ML that relies on fast, low‑cost analysis of structured columns, highlighting each approach's strengths for fraud detection.
- On-Chip Multi-Model AI Fraud Detection - The speaker describes how on-chip AI acceleration enables a multi-model architecture that combines traditional machine learning with large-language-model reasoning to detect fraud in milliseconds, directly where the data resides.
**Source:** [https://www.youtube.com/watch?v=Mo7JMC_oDlI](https://www.youtube.com/watch?v=Mo7JMC_oDlI) · **Duration:** 00:10:50

Section timestamps: [00:00:00](https://www.youtube.com/watch?v=Mo7JMC_oDlI&t=0s) AI-Powered Real-Time Fraud Screening · [00:03:11](https://www.youtube.com/watch?v=Mo7JMC_oDlI&t=191s) LLMs vs Predictive Models in Fraud · [00:10:00](https://www.youtube.com/watch?v=Mo7JMC_oDlI&t=600s) On-Chip Multi-Model AI Fraud Detection

## Full Transcript
Every payment, transfer, or claim has to pass a single question before the money moves: is this fraud, yes or no? And we usually have less than 200 milliseconds to decide. That's why banks lean on AI models. They watch for patterns, learn from history, and make decisions fast. And when an AI model
is unsure of how to rate a given transaction, fraud or not fraud, the request can be escalated
for human evaluation. But multi-model AI is changing that. Exactly right. Most fraud detection
platforms today, they begin with traditional machine learning models. So let's call these
predictive ML. We're talking about algorithms like logistic regression and decision trees and
random forest. And don't forget my favorite, gradient boosting machines. Oh, how could we
forget those? Now, these models, they're trained on large labelled data sets of past transactions.
Some of those are fraudulent, some of those are legitimate. And that's so they can recognize
patterns that indicate fraud. So, for example, a gradient boosting model. Yeah, that might
use dozens of features like transaction amount and time, location, merchant category, and user
spending history in its analysis before outputting a risk score. Yeah, lots of well-defined
structured data that the model can process and access, which ultimately gets to an evaluation
that answers the question, is this transaction fraud or not fraud? But novel or subtle fraud
tactics can evade detection if they don't trigger one of the known indicators. And also these models
generally ignore unstructured information like free form text, descriptions, and images entirely.
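As a concrete sketch of this predictive-ML stage, here is a toy logistic-regression scorer in pure Python. The features (amount, hour of day, a new-merchant flag), the labeled examples, and the hyperparameters are all invented for illustration; a production system would use a real ML library and far richer features, but the idea is the same: train on labeled transactions, then output a fraud probability.

```python
import math

# Toy labeled dataset: each transaction is (amount_usd, hour_of_day,
# is_new_merchant) with a fraud label. All values are invented.
DATA = [
    ((20.0, 14, 0), 0), ((35.0, 11, 0), 0), ((12.0, 9, 0), 0),
    ((900.0, 3, 1), 1), ((750.0, 2, 1), 1), ((15.0, 13, 0), 0),
    ((1200.0, 4, 1), 1), ((40.0, 16, 0), 0), ((980.0, 1, 1), 1),
]

def normalize(x):
    # Scale features to comparable ranges so no single weight dominates.
    amount, hour, new_merchant = x
    return (amount / 1000.0, hour / 24.0, float(new_merchant))

def predict(weights, bias, x):
    # Sigmoid of the weighted features: a fraud probability in (0, 1).
    z = bias + sum(w * f for w, f in zip(weights, normalize(x)))
    return 1.0 / (1.0 + math.exp(-z))

def train(data, epochs=2000, lr=0.5):
    # Plain per-sample gradient descent on the logistic (log) loss.
    weights, bias = [0.0, 0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            err = predict(weights, bias, x) - y  # d(loss)/d(logit)
            feats = normalize(x)
            weights = [w - lr * err * f for w, f in zip(weights, feats)]
            bias -= lr * err
    return weights, bias

weights, bias = train(DATA)
# Small daytime purchase at a known merchant vs. large 2 a.m. purchase
# at a never-seen merchant.
low_risk = predict(weights, bias, (25.0, 15, 0))
high_risk = predict(weights, bias, (1100.0, 2, 1))
print(f"low-risk score:  {low_risk:.3f}")
print(f"high-risk score: {high_risk:.3f}")
```

The same interface, structured features in, risk score out, holds whether the model behind it is logistic regression, a random forest, or a gradient boosting machine.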
Cases with any of these attributes or where risk assessment is uncertain typically get escalated
for manual review since the automated system can't conclusively classify them. Which brings
us to an ensemble of AI. I was wondering if you'd ever get to that. So ensemble being multiple AI
models. Yeah, exactly. A second AI model for fraud detection. So in addition to the predictive ML,
this second model uses transformer-based large language models, and these are, yeah, encoder LLMs.
Thought you might say that. And so we're clear here on the distinction. Uh a decoder LLM is a
generative AI that can generate new content based on a given prompt. So chatbots and things like
that. Whereas an encoder LLM is a non-generative LLM that focuses on natural language
understanding. It's great for text classification, named entity recognition,
and sentiment analysis. Models like BERT and RoBERTa, a lovely couple, aren't they just? And an
encoder LLM is a great fit for fraud detection as these models can grasp nuanced language patterns.
They can detect contextual clues and they can extract key information from unstructured data.
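The shape of that encoder-style scoring can be sketched as tokenize, embed, mean-pool, classify. Everything below (the tiny vocabulary, the two-dimensional "embeddings", and the head weights) is a hand-built stand-in invented for illustration; a real deployment would run a fine-tuned BERT or RoBERTa, but the non-generative text-to-score flow is the same.

```python
import math

# Stand-in for an encoder LLM such as BERT or RoBERTa. Only the *shape* of
# the computation is kept: tokenize -> embed -> mean-pool -> linear head.
EMBED = {
    # (urgency signal, money-pressure signal) -- hypothetical dimensions
    "refund": (0.2, 0.6), "overpayment": (0.1, 0.7), "please": (0.3, 0.0),
    "rush": (0.9, 0.1), "urgent": (0.9, 0.2), "invoice": (0.0, 0.3),
    "monthly": (0.0, 0.0), "rent": (0.0, 0.1),
}
UNK = (0.0, 0.0)                    # out-of-vocabulary tokens
HEAD_W, HEAD_B = (3.0, 3.0), -1.0   # hypothetical classification head

def encode(text):
    # Mean-pool token embeddings into one fixed-size sentence vector.
    toks = [EMBED.get(t.strip(".,!").lower(), UNK) for t in text.split()]
    if not toks:
        return UNK
    return tuple(sum(dim) / len(toks) for dim in zip(*toks))

def risk_score(text):
    # Sigmoid of the head's logit: a fraud probability in (0, 1).
    z = HEAD_B + sum(w * v for w, v in zip(HEAD_W, encode(text)))
    return 1.0 / (1.0 + math.exp(-z))

scam = risk_score("Refund for overpayment. Please rush.")
benign = risk_score("Monthly rent invoice")
```

A real encoder's learned embeddings would capture far subtler patterns than these hand-set numbers, but the output is the same kind of signal: a score the decision engine can combine with the structured-feature model's score.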
For example, a bank might use an encoder LLM to read the description of an online funds transfer.
If the text says, "Refund for overpayment. Please rush." Well, the model might detect urgency and
phrasing common in scam scenarios and assign a higher risk score. Or an encoder LLM could analyze
the merchant name and free form address for signs of spoofing or association with known fraud cases,
which is something a traditional model might not capture. So, let's compare the two. Predictive ML
and encoder LLM. Predictive ML loves structured data and numbers. It's well suited to spotting sudden card-not-present spikes, bursts of spending, geolocation jumps, impossible-travel scenarios, and things like that: anything you can measure in neat columns, or in whatever those are. That's structured data, Jeff, of course. Now, whereas an encoder LLM, that can read between the lines of unstructured data, so images and the like, and, uh, I guess this is like a mountain vista, of course. Does he like that better? Much better. Much better. So think of a wire memo that says "urgent investment, guaranteed 200% ROI." That's a pretty obvious indication of a scam to humans, and also to an encoder LLM, because both can recognize linguistic patterns associated with fraud that don't nicely
fit into a spreadsheet column. In the pros column, predictive ML has certain advantages: for instance, microsecond latency, cheap compute requirements, simple scaling, and an easy-to-follow audit trail. Whereas the pros for encoder LLMs are being, well, context-aware and language-savvy, and
then being surprisingly good at connecting dots even in situations that humans might miss. A good
encoder LLM can greatly reduce false positives because it understands why something looks a
bit fishy. In the cons column for predictive ML, well, it's pattern-bound. A new scam using clever wording slips right through the defenses unless you manually craft a new sort of
detection scheme. And then the cons for encoder LLMs, well, they're more computationally intensive
than simpler ML models. They have millions or billions of parameters, and they require significant processing, often with GPU acceleration, to run inference. So that raises the question:
how do you use these two types of AI models in an ensemble solution? So let's build a multiple
model AI fraud detection workflow to answer that. So we're going to start at the top here with a box
that will represent incoming transaction data. All incoming transactions first go through the
predictive model. This model using ML algorithms like random forest receives structured data and
generates a fraud score, based on probability of fraud, and a confidence level, at which point we assess that confidence level at this stage here. Now, in most instances, the model's output is pretty
clearcut. Either the score is well below the risk threshold, so it's likely legitimate,
or it's well above it, so most likely fraud. And when the model has a high confidence level,
either way, the transaction is routed straight to the final decision, that's where an action
is taken. Either the transaction is auto approved or it's flagged as fraud. It's the low confidence,
ambiguous transactions that trigger the second stage. When the predictive model returns a score
in the borderline range indicating uncertainty, the system will not immediately decide. Instead,
the transaction is escalated to an encoder LLM like BERT for further analysis. And the encoder LLM receives the original structured features, but it can also process any unstructured
data or contextual information that's available. So that could be like the transactions description
text or customer profile notes and the like. And the encoder LLM, it ingests this composite input
and it compares it with millions of fraud patterns using a deeper context-aware lens,
outputting its own LLM assessment. The final decision engine combines the LLM findings with
the original model's input. So, a transaction that was borderline might be definitively flagged as
fraud because the LLM uncovered incriminating text or it might be cleared because the LLM found the
context to be innocuous. So, in this architecture, straightforward cases, they're processed with
minimal overhead, while trickier cases, they get kind of a second look through the AI rather than
being immediately handed off to a human evaluator. By not sending everything through the LLM,
the system stays efficient. This costly LLM here is only run when necessary. And by using
the LLM on truly ambiguous cases, the system improves overall accuracy. Fewer legitimate
transactions are falsely flagged because the LLM can recognize a benign explanation. And fewer
frauds slip through because the LLM can catch subtle cues the first model missed. And this
can really save a lot of time and resources. So, let's consider insurance claims processing. When
a natural disaster hits, well, lots of claims, they're all kind of filed at the same time, and
insurance agents are probably going to need to put in a bit of overtime to process the high number of
claims coming in. And there's probably a bunch of unstructured data in these claims here. So,
images of, uh, property damage and stuff like that. Now, an ensemble AI solution, using an encoder LLM that can look at that unstructured data here, it can extract insights like the cause of the claim and the urgency, and the predictive model can automatically rank and auto-adjudicate incoming claims together, reducing the burden on insurance agents: a bit less overtime for
them. But there is one more important piece and that's the infrastructure. Because running these
multiple models in real time, especially something as compute heavy as encoder LLMs, it requires
specialized hardware, right? You need a system that can handle low latency inference at scale,
ideally right at the point of transaction. That's where things like AI accelerator chips come in.
On-chip AI acceleration supports workloads like this, allowing fraud detection models to run
directly where the data lives. So while the models do the detecting, it's the hardware that makes
it all possible, especially when you're aiming to catch fraud in milliseconds and well, not minutes.
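The two-stage routing described in the workflow above can be sketched as plain control flow. The thresholds and both model stubs below are hypothetical; the point is the escalation pattern, where the expensive encoder LLM only runs on borderline scores.

```python
# Two-stage ensemble routing: a fast predictive model scores every
# transaction; only borderline scores escalate to the costly encoder LLM.
LOW, HIGH = 0.2, 0.8  # outside this band, the first model is "confident"

def predictive_score(txn):
    # Stand-in for a trained gradient-boosting / logistic-regression model.
    return min(txn["amount"] / 1000.0, 1.0)

def encoder_llm_score(txn):
    # Stand-in for a BERT-style encoder reading unstructured memo text.
    suspicious = ("urgent", "guaranteed", "rush")
    memo = txn.get("memo", "").lower()
    return 0.9 if any(word in memo for word in suspicious) else 0.1

def decide(txn):
    score = predictive_score(txn)
    if score < LOW:
        return "approve", score   # clearly legitimate: skip the LLM
    if score > HIGH:
        return "flag", score      # clearly fraudulent: skip the LLM
    # Borderline: escalate to the encoder LLM and blend both opinions.
    blended = (score + encoder_llm_score(txn)) / 2.0
    return ("flag" if blended > 0.5 else "approve"), blended

print(decide({"amount": 50, "memo": "coffee"}))              # fast path
print(decide({"amount": 500, "memo": "urgent investment"}))  # escalated
print(decide({"amount": 500, "memo": "monthly rent"}))       # escalated
```

The confidence band is the efficiency lever: widening it sends more transactions to the LLM (higher accuracy, higher cost), while narrowing it keeps more decisions on the cheap fast path.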
So that's multi-model AI for fraud detection. As fraudsters devise new tactics, banks and
businesses need to respond with smarter detection. And a multiple model AI architecture combines the
predictive power of traditional ML here with the contextual reasoning of large language models.