# Framework for Selecting Foundation Models

**Source:** [https://www.youtube.com/watch?v=pePAAGfh-IU](https://www.youtube.com/watch?v=pePAAGfh-IU)
**Duration:** 00:07:54

## Key Points

- Selecting a foundation model requires balancing factors like training data, parameter count, bias risks, and hallucination potential rather than simply opting for the largest model.
- A practical six-stage AI model selection framework involves (1) defining the use case, (2) listing available model options, (3) gathering each model's size, performance, cost, and risk metrics, (4) evaluating those characteristics against the use case, (5) testing candidates, and (6) choosing the model that delivers the greatest value.
- In the example of generating personalized marketing emails, the organization narrows its choices to two existing models, Meta's Llama 2-70B and IBM's Granite-13B, and assesses them based on model cards, fine-tuning relevance, and known performance for text generation.
- By comparing size, cost, deployment complexity, and risk profiles, and then running targeted tests, the team can select the model that best meets accuracy, efficiency, and business-value requirements for the specific email-writing task.

## Sections

- [00:00:00](https://www.youtube.com/watch?v=pePAAGfh-IU&t=0s) **Framework for Selecting Foundation Models** - The speaker outlines a six-stage process for choosing the appropriate generative AI model based on the specific use case, model characteristics, costs, risks, and testing.
- [00:03:11](https://www.youtube.com/watch?v=pePAAGfh-IU&t=191s) **Factors for Model Evaluation** - The passage explains how fine-tuned or zero-shot use of pre-trained foundation models impacts performance and outlines three key evaluation criteria: accuracy, reliability (including consistency, explainability, trustworthiness, and toxicity avoidance), and speed.
- [00:06:16](https://www.youtube.com/watch?v=pePAAGfh-IU&t=376s) **Balancing Cloud, On-Prem, Multi-Model Strategy** - The passage contrasts running an open-source Llama 2 model on public cloud versus fine-tuning it on-premises, emphasizing security, cost, and compute trade-offs, and proposes a framework for pairing different foundation models with varied enterprise use cases.

## Full Transcript
If you have a use case for generative AI,
how do you decide on which foundation model to pick to run it?
With the huge number of foundation models out there,
it's not an easy question.
Different models are trained on different data and have different parameter counts,
and picking the wrong model can have severe unwanted impact,
like biases originating from the training data or hallucinations that are just plain wrong.
Now, one approach is to just pick the largest,
most massive model out there to execute every task.
The largest models have huge parameter counts
and are usually pretty good generalists, but with large models come costs,
costs of compute, costs of complexity, and costs of variability.
So often the better approach is to pick the right size model for the specific use case you have.
So let me propose to you an AI model selection framework.
It has six pretty simple stages.
Let's take a look at what they are and then give some examples of how this might work.
Now, stage one, that is to clearly articulate your use case.
What exactly are you planning to use generative AI for?
From there you'll list some of the model options available to you.
Perhaps there are already a subset of foundation models running that you have access to.
With a short list of models you'll next want to identify each model's size,
performance, costs, risks, and deployment methods.
Next, evaluate those model characteristics for your specific use case.
Run some tests.
That's the next stage,
testing options based on your previously identified use case and deployment needs.
And then finally, choose the option that provides the most value.
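To make those stages a little more concrete, here is a minimal sketch in Python of how a team might record candidates and run the selection. The class and function names are hypothetical, not anything from the video, and `score_fn` stands in for whatever evaluation and testing you actually run.

```python
from dataclasses import dataclass, field

@dataclass
class CandidateModel:
    """Characteristics gathered in stage three for one foundation model."""
    name: str
    parameters_b: float           # size in billions of parameters
    cost_per_1k_tokens: float     # rough inference cost estimate
    deployment: str               # e.g. "public cloud" or "on-prem"
    risk_notes: list = field(default_factory=list)

def select_model(use_case, candidates, score_fn):
    """Stages four to six: evaluate and test each candidate, pick the most valuable one.

    score_fn(use_case, model) is a placeholder for your own evaluation
    and testing against your prompts and metrics.
    """
    return max(candidates, key=lambda model: score_fn(use_case, model))
```

In the example that follows, the two candidates would simply be two `CandidateModel` entries, one for Llama 2 70B and one for Granite 13B.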
So let's put this framework to the test.
Now, my use case, we're going to say that is a use case for text generation.
I need the AI to write personalized emails for my awesome marketing campaign.
That's stage one.
Now, my organization is already using two foundation models for other things,
so I'll evaluate those.
First of all, we've got Llama 2
and specifically the Llama 2 70B model, a fairly large model with 70 billion parameters.
It's from Meta, and I know it's quite good at some text generation use cases.
Then there's also Granite that we have deployed.
Granite is a smaller general purpose model and that's from IBM.
And I know there is a 13 billion parameter model
that I've heard does quite well with text generation as well.
So those are the models I'm going to evaluate, Llama 2 and Granite.
Next, we need to evaluate model size, performance, and risks.
And a good place to start here is with the model card.
The model cards might tell us if the model has been
trained on data specifically for our purposes.
Pre-trained foundation models are fine-tuned for specific use cases
such as sentiment analysis or document summarization or maybe text generation.
And that's important to know because if a model is pre-trained
on a use case close to ours, it may perform better when processing our prompts
and enable us to use zero-shot prompting to obtain our desired results.
And that means we can simply ask the model to perform tasks
without having to provide multiple completed examples first.
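As an illustration of that distinction, here is a minimal sketch of a zero-shot prompt versus a few-shot prompt for the email use case; the prompt text is invented purely for illustration.

```python
# Zero-shot: we simply describe the task; no completed examples are provided.
zero_shot_prompt = (
    "Write a short, friendly marketing email inviting the customer "
    "to our spring sale. Customer name: Ada. Product: running shoes."
)

# Few-shot: we prepend completed examples so a model that was NOT
# pre-trained on a similar use case can infer the expected output format.
few_shot_prompt = (
    "Example input: Customer name: Sam. Product: headphones.\n"
    "Example output: Hi Sam, our new headphones just arrived...\n\n"
    "Now write one for: Customer name: Ada. Product: running shoes."
)
```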
Now, when it comes to evaluating model performance for our use case, we can consider three factors.
The first factor that we would consider is accuracy.
Now, accuracy denotes how close the generated output is to the
desired output, and it can be measured objectively and repeatedly
by choosing evaluation metrics that are relevant to your use cases.
So for example, if your use case relates to text translation,
the BLEU benchmark - that's the Bilingual Evaluation Understudy -
can be used to indicate the quality of the generated translations.
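For instance, if reference translations were available, a BLEU score could be computed with an off-the-shelf library such as sacrebleu; the sentences below are placeholders, not real evaluation data.

```python
import sacrebleu  # pip install sacrebleu

# Model outputs and human reference translations (placeholder examples).
hypotheses = ["the cat sat on the mat"]
references = [["the cat is sitting on the mat"]]  # one list per reference stream

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU: {bleu.score:.1f}")  # higher means closer to the references
```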
Now the second factor relates to the reliability of the model.
Now that's a function of several factors actually, such as consistency,
explainability and trustworthiness,
as well as how well a model avoids toxicity like hate speech.
Reliability comes down to trust,
and trust is built through transparency and traceability of the training data
and accuracy and reliability of the output.
And then the third factor that is speed.
And specifically we're saying
how quickly does a user get a response to a submitted prompt?
Now, speed and accuracy are often a trade off here.
Larger models may be slower, but perhaps deliver a more accurate answer.
Or then again, maybe the smaller model is faster
and has minimal differences in accuracy to the larger model.
It really comes down to finding the sweet spot between performance, speed and cost.
A smaller, less expensive model may not offer
performance or accuracy metrics on par with an expensive one,
but it may still be preferable once you consider the additional benefits
the model can deliver, like lower latency
and greater transparency into the model's inputs and outputs.
The way to find out is to simply select the model that's likely
to deliver the desired output and well, test it.
Test that model with your prompts to see if it works,
and then assess the model's performance and the quality of its output using metrics.
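A minimal sketch of that test loop is below. It assumes you supply your own `generate_fn` wrapper around whichever inference endpoint you actually call, and a `quality_fn` that implements your chosen accuracy metric (BLEU, a rubric, human review); both names are hypothetical.

```python
import time

def evaluate_candidate(model_name, prompts, generate_fn, quality_fn):
    """Send the same prompts to one candidate model and record speed and quality.

    generate_fn(model_name, prompt) wraps your inference endpoint;
    quality_fn(prompt, output) scores an output with your own metric.
    """
    latencies, scores = [], []
    for prompt in prompts:
        start = time.perf_counter()
        output = generate_fn(model_name, prompt)
        latencies.append(time.perf_counter() - start)
        scores.append(quality_fn(prompt, output))
    return {
        "model": model_name,
        "avg_latency_s": sum(latencies) / len(latencies),
        "avg_quality": sum(scores) / len(scores),
    }
```

Running this for each shortlisted model with the same prompt set gives you comparable speed and quality numbers to weigh against cost.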
Now, I've mentioned deployment in passing, so a quick word on that.
As a decision factor, we need to evaluate where and how we want the model and data to be deployed.
So let's say that we're leaning towards Llama 2
as our chosen model based on our testing.
Right, cool. Llama 2.
That's an open source model and we could inference with it on a public cloud.
So we've got a public cloud already out here.
It gives us some choice, though a limited one: we can simply run inference against the model there.
But if we decide we want to fine tune the model with our own enterprise data,
we might need to deploy it on prem.
So this is where we have our own version of Llama 2,
and we are going to fine-tune it.
Now, deploying on premises gives you greater control
and more security benefits compared to a public cloud environment.
But it's an expensive proposition,
especially when factoring model size and compute power,
including the number of GPUs it takes to run a single large language model.
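As a rough illustration of the on-prem option, here is a minimal sketch of loading an open model for local inference with the Hugging Face transformers library. It assumes you have been granted access to the gated Llama 2 weights and have enough GPU memory; the 70B variant in particular needs multiple GPUs.

```python
# pip install transformers accelerate torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-70b-chat-hf"  # gated: requires accepting Meta's license

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # shard the weights across the available GPUs
    torch_dtype="auto",
)

prompt = "Write a short marketing email about running shoes."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```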
Now, everything we've discussed here is tied to a specific use case,
but of course it's quite likely that any given organization will have multiple use cases.
And as we run through this model selection framework,
we might find that each use case is better suited to a different foundation model.
That's called a multi-model approach.
Essentially, not all AI models are the same, and neither are your use cases.
And this framework might be just what you need to pair the models
and the use cases together to find a winning combination of both.
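As a closing illustration, a multi-model approach can be as simple as a routing table that pairs each use case with the model that tested best for it; the pairings below are illustrative, not recommendations.

```python
# Illustrative routing table: each use case points at the model that won
# the selection framework for that task, plus where it is deployed.
MODEL_ROUTING = {
    "marketing_email_generation": {"model": "llama-2-70b", "deployment": "public cloud"},
    "customer_sentiment_analysis": {"model": "granite-13b", "deployment": "on-prem"},
    "document_summarization":     {"model": "granite-13b", "deployment": "on-prem"},
}

def model_for(use_case):
    """Look up the chosen model; an unknown use case needs a new run of the framework."""
    return MODEL_ROUTING[use_case]
```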