# Multimodal AI for Real-Time Fraud Detection

## Key Points
- Banks must decide within ≈200 ms whether a transaction is fraudulent, so they rely heavily on AI to automate this binary judgment.
- Traditional fraud‑detection models (logistic regression, decision trees, random forests, gradient‑boosting) are trained on large labeled datasets of structured features such as amount, time, location, and merchant category to output a risk score.
- These models struggle with novel fraud tactics and cannot process unstructured information (free‑form text, descriptions, images), causing many ambiguous cases to be escalated for manual review.
- An ensemble approach augments predictive ML with encoder‑style large language models (e.g., BERT, RoBERTa) that understand and extract signals from textual data, improving detection of subtle, context‑dependent fraud without generating new content.
## Sections
- AI-Powered Real-Time Fraud Screening - Banks use ultra‑fast AI models—traditionally predictive ML on structured transaction features—to decide fraud in under 200 ms, but emerging multimodal AI aims to catch subtle, novel fraud tactics that evade these conventional approaches.
- LLMs vs Predictive Models in Fraud - The excerpt contrasts encoder LLMs, which read unstructured text like memos to spot scam language and spoofing cues, with traditional predictive ML that relies on fast, low‑cost analysis of structured columns, highlighting each approach's strengths for fraud detection.
- On-Chip Multi-Model AI Fraud Detection - The speaker describes how on-chip AI acceleration enables a multi-model architecture that combines traditional machine learning with large-language-model reasoning to detect fraud in milliseconds, directly where the data resides.
**Source:** [https://www.youtube.com/watch?v=Mo7JMC_oDlI](https://www.youtube.com/watch?v=Mo7JMC_oDlI) · **Duration:** 00:10:50

Section timestamps: [00:00:00](https://www.youtube.com/watch?v=Mo7JMC_oDlI&t=0s) AI-Powered Real-Time Fraud Screening · [00:03:11](https://www.youtube.com/watch?v=Mo7JMC_oDlI&t=191s) LLMs vs Predictive Models in Fraud · [00:10:00](https://www.youtube.com/watch?v=Mo7JMC_oDlI&t=600s) On-Chip Multi-Model AI Fraud Detection

## Full Transcript
Every payment, transfer, or claim has to pass a single question before the money moves: is this fraud, yes or no? And we usually have less than 200 milliseconds to decide. That's why banks lean on AI models. They watch for patterns, learn from history, and make decisions fast. And when an AI model
is unsure of how to rate a given transaction, fraud or not fraud, the request can be escalated
for human evaluation. But multi-model AI is changing that. Exactly right. Most fraud detection
platforms today, they begin with traditional machine learning models. So let's call these
predictive ML. We're talking about algorithms like logistic regression and decision trees and
random forest. And don't forget my favorite, gradient boosting machines. Oh, how could we
forget those? Now, these models, they're trained on large labelled data sets of past transactions.
Some of those are fraudulent, some of those are legitimate. And that's so they can recognize
patterns that indicate fraud. So, for example, a gradient boosting model. Yeah, that might
use dozens of features like transaction amount and time, location, merchant category, and user
spending history in its analysis before outputting a risk score. Yeah, lots of well-defined
structured data that the model can process and access, which ultimately gets to an evaluation
that answers the question, is this transaction fraud or not fraud? But novel or subtle fraud
tactics can evade detection if they don't trigger one of the known indicators. And also these models
generally ignore unstructured information like free form text, descriptions, and images entirely.
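As a concrete sketch of this predictive-ML stage, here is a toy logistic-regression scorer in pure Python. The features (amount, hour of day, a new-merchant flag), the labeled examples, and the hyperparameters are all invented for illustration; a production system would use a real ML library and far richer features, but the idea is the same: train on labeled transactions, then output a fraud probability.

```python
import math

# Toy labeled dataset: each transaction is (amount_usd, hour_of_day,
# is_new_merchant) with a fraud label. All values are invented.
DATA = [
    ((20.0, 14, 0), 0), ((35.0, 11, 0), 0), ((12.0, 9, 0), 0),
    ((900.0, 3, 1), 1), ((750.0, 2, 1), 1), ((15.0, 13, 0), 0),
    ((1200.0, 4, 1), 1), ((40.0, 16, 0), 0), ((980.0, 1, 1), 1),
]

def normalize(x):
    # Scale features to comparable ranges so no single weight dominates.
    amount, hour, new_merchant = x
    return (amount / 1000.0, hour / 24.0, float(new_merchant))

def predict(weights, bias, x):
    # Sigmoid of the weighted features: a fraud probability in (0, 1).
    z = bias + sum(w * f for w, f in zip(weights, normalize(x)))
    return 1.0 / (1.0 + math.exp(-z))

def train(data, epochs=2000, lr=0.5):
    # Plain per-sample gradient descent on the logistic (log) loss.
    weights, bias = [0.0, 0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            err = predict(weights, bias, x) - y  # d(loss)/d(logit)
            feats = normalize(x)
            weights = [w - lr * err * f for w, f in zip(weights, feats)]
            bias -= lr * err
    return weights, bias

weights, bias = train(DATA)
# Small daytime purchase at a known merchant vs. large 2 a.m. purchase
# at a never-seen merchant.
low_risk = predict(weights, bias, (25.0, 15, 0))
high_risk = predict(weights, bias, (1100.0, 2, 1))
print(f"low-risk score:  {low_risk:.3f}")
print(f"high-risk score: {high_risk:.3f}")
```

The same interface, structured features in, risk score out, holds whether the model behind it is logistic regression, a random forest, or a gradient boosting machine.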
Cases with any of these attributes or where risk assessment is uncertain typically get escalated
for manual review since the automated system can't conclusively classify them. Which brings
us to an ensemble of AI. I was wondering if you'd ever get to that. So ensemble being multiple AI
models. Yeah, exactly. A second AI model for fraud detection. So in addition to the predictive ML,
this second model uses transformer-based large language models, and these are, yeah, encoder LLMs.
Thought you might say that. And so we're clear here on the distinction. Uh a decoder LLM is a
generative AI that can generate new content based on a given prompt. So chatbots and things like
that. Whereas an encoder LLM is a non-generative LLM that focuses on natural language
understanding. It's great for text classification, named entity recognition,
and sentiment analysis. Models like BERT and RoBERTa, a lovely couple, aren't they just? And an
encoder LLM is a great fit for fraud detection as these models can grasp nuanced language patterns.
They can detect contextual clues and they can extract key information from unstructured data.
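The shape of that encoder-style scoring can be sketched as tokenize, embed, mean-pool, classify. Everything below (the tiny vocabulary, the two-dimensional "embeddings", and the head weights) is a hand-built stand-in invented for illustration; a real deployment would run a fine-tuned BERT or RoBERTa, but the non-generative text-to-score flow is the same.

```python
import math

# Stand-in for an encoder LLM such as BERT or RoBERTa. Only the *shape* of
# the computation is kept: tokenize -> embed -> mean-pool -> linear head.
EMBED = {
    # (urgency signal, money-pressure signal) -- hypothetical dimensions
    "refund": (0.2, 0.6), "overpayment": (0.1, 0.7), "please": (0.3, 0.0),
    "rush": (0.9, 0.1), "urgent": (0.9, 0.2), "invoice": (0.0, 0.3),
    "monthly": (0.0, 0.0), "rent": (0.0, 0.1),
}
UNK = (0.0, 0.0)                    # out-of-vocabulary tokens
HEAD_W, HEAD_B = (3.0, 3.0), -1.0   # hypothetical classification head

def encode(text):
    # Mean-pool token embeddings into one fixed-size sentence vector.
    toks = [EMBED.get(t.strip(".,!").lower(), UNK) for t in text.split()]
    if not toks:
        return UNK
    return tuple(sum(dim) / len(toks) for dim in zip(*toks))

def risk_score(text):
    # Sigmoid of the head's logit: a fraud probability in (0, 1).
    z = HEAD_B + sum(w * v for w, v in zip(HEAD_W, encode(text)))
    return 1.0 / (1.0 + math.exp(-z))

scam = risk_score("Refund for overpayment. Please rush.")
benign = risk_score("Monthly rent invoice")
```

A real encoder's learned embeddings would capture far subtler patterns than these hand-set numbers, but the output is the same kind of signal: a score the decision engine can combine with the structured-feature model's score.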
For example, a bank might use an encoder LLM to read the description of an online funds transfer.
If the text says, "Refund for overpayment. Please rush." Well, the model might detect urgency and
phrasing common in scam scenarios and assign a higher risk score. Or an encoder LLM could analyze
the merchant name and free form address for signs of spoofing or association with known fraud cases,
which is something a traditional model might not capture. So, let's compare the two. Predictive ML
and encoder LLM. Predictive ML loves structured data and numbers. It's well suited to spotting sudden card-not-present spikes, bursts of spending, geolocation jumps, impossible-travel scenarios, and things like that: anything you can measure in neat columns, or in whatever those are. That's structured data, Jeff, of course. Now, whereas an encoder LLM, that can read between the lines of unstructured data, so images and the like, and, uh, I guess this is like a mountain vista, of course. Does he like that better? Much better. Much better. So think of a wire memo that says "urgent investment, guaranteed 200% ROI." That's a pretty obvious indication of a scam to humans, and also to an encoder LLM, because both can recognize linguistic patterns associated with fraud that don't nicely
fit into a spreadsheet column. In the pros column, predictive ML has certain advantages: for instance, microsecond latency, cheap compute requirements, simple scaling, and an easy-to-follow audit trail. Whereas the pros for encoder LLMs are being, well, context-aware and language-savvy, and
then being surprisingly good at connecting dots even in situations that humans might miss. A good
encoder LLM can greatly reduce false positives because it understands why something looks a
bit fishy. In the cons column for predictive ML, well, it's pattern-bound. A new scam using clever wording slips right through the defenses unless you manually craft a new sort of
detection scheme. And then the cons for encoder LLMs, well, they're more computationally intensive
than simpler ML models. They have millions or billions of parameters, and they require significant processing, often with GPU acceleration, to run inference. So that raises the question:
how do you use these two types of AI models in an ensemble solution? So let's build a multiple
model AI fraud detection workflow to answer that. So we're going to start at the top here with a box
that will represent incoming transaction data. All incoming transactions first go through the
predictive model. This model using ML algorithms like random forest receives structured data and
generates a fraud score, based on probability of fraud, and a confidence level, at which point we assess that confidence level at this stage here. Now, in most instances, the model's output is pretty
clearcut. Either the score is well below the risk threshold, so it's likely legitimate,
or it's well above it, so most likely fraud. And when the model has a high confidence level,
either way, the transaction is routed straight to the final decision, that's where an action
is taken. Either the transaction is auto approved or it's flagged as fraud. It's the low confidence,
ambiguous transactions that trigger the second stage. When the predictive model returns a score
in the borderline range indicating uncertainty, the system will not immediately decide. Instead,
the transaction is escalated to an encoder LLM like BERT for further analysis. And the encoder LLM receives the original structured features, but it can also process any unstructured
data or contextual information that's available. So that could be like the transactions description
text or customer profile notes and the like. And the encoder LLM, it ingests this composite input
and it compares it with millions of fraud patterns using a deeper context-aware lens,
outputting its own LLM assessment. The final decision engine combines the LLM findings with
the original model's input. So, a transaction that was borderline might be definitively flagged as
fraud because the LLM uncovered incriminating text or it might be cleared because the LLM found the
context to be innocuous. So, in this architecture, straightforward cases, they're processed with
minimal overhead, while trickier cases, they get kind of a second look through the AI rather than
being immediately handed off to a human evaluator. By not sending everything through the LLM,
the system stays efficient. This costly LLM here is only run when necessary. And by using
the LLM on truly ambiguous cases, the system improves overall accuracy. Fewer legitimate
transactions are falsely flagged because the LLM can recognize a benign explanation. And fewer
frauds slip through because the LLM can catch subtle cues the first model missed. And this
can really save a lot of time and resources. So, let's consider insurance claims processing. When
a natural disaster hits, well, lots of claims, they're all kind of filed at the same time, and
insurance agents are probably going to need to put in a bit of overtime to process the high number of
claims coming in. And there's probably a bunch of unstructured data in these claims here. So,
images of, uh, property damage and stuff like that. Now, an ensemble AI solution, using an encoder LLM that can look at that unstructured data here, it can extract insights like the cause of the claim and the urgency, and the predictive model can automatically rank and auto-adjudicate incoming claims together, reducing the burden on insurance agents: a bit less overtime for
them. But there is one more important piece and that's the infrastructure. Because running these
multiple models in real time, especially something as compute heavy as encoder LLMs, it requires
specialized hardware, right? You need a system that can handle low latency inference at scale,
ideally right at the point of transaction. That's where things like AI accelerator chips come in.
On-chip AI acceleration supports workloads like this, allowing fraud detection models to run
directly where the data lives. So while the models do the detecting, it's the hardware that makes
it all possible, especially when you're aiming to catch fraud in milliseconds and well, not minutes.
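The two-stage routing described in the workflow above can be sketched as plain control flow. The thresholds and both model stubs below are hypothetical; the point is the escalation pattern, where the expensive encoder LLM only runs on borderline scores.

```python
# Two-stage ensemble routing: a fast predictive model scores every
# transaction; only borderline scores escalate to the costly encoder LLM.
LOW, HIGH = 0.2, 0.8  # outside this band, the first model is "confident"

def predictive_score(txn):
    # Stand-in for a trained gradient-boosting / logistic-regression model.
    return min(txn["amount"] / 1000.0, 1.0)

def encoder_llm_score(txn):
    # Stand-in for a BERT-style encoder reading unstructured memo text.
    suspicious = ("urgent", "guaranteed", "rush")
    memo = txn.get("memo", "").lower()
    return 0.9 if any(word in memo for word in suspicious) else 0.1

def decide(txn):
    score = predictive_score(txn)
    if score < LOW:
        return "approve", score   # clearly legitimate: skip the LLM
    if score > HIGH:
        return "flag", score      # clearly fraudulent: skip the LLM
    # Borderline: escalate to the encoder LLM and blend both opinions.
    blended = (score + encoder_llm_score(txn)) / 2.0
    return ("flag" if blended > 0.5 else "approve"), blended

print(decide({"amount": 50, "memo": "coffee"}))              # fast path
print(decide({"amount": 500, "memo": "urgent investment"}))  # escalated
print(decide({"amount": 500, "memo": "monthly rent"}))       # escalated
```

The confidence band is the efficiency lever: widening it sends more transactions to the LLM (higher accuracy, higher cost), while narrowing it keeps more decisions on the cheap fast path.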
So that's multi-model AI for fraud detection. As fraudsters devise new tactics, banks and
businesses need to respond with smarter detection. And a multiple model AI architecture combines the
predictive power of traditional ML here with the contextual reasoning of large language models.