Framework for Selecting Foundation Models

Key Points

  • Selecting a foundation model requires balancing factors like training data, parameter count, bias risks, and hallucination potential rather than simply opting for the largest model.
  • A practical six‑stage AI model selection framework involves (1) defining the use case, (2) listing available model options, (3) gathering each model’s size, performance, cost, and risk metrics, (4) evaluating those characteristics against the use case, (5) testing candidates, and (6) choosing the model that delivers the greatest value.
  • In the example of generating personalized marketing emails, the organization narrows its choices to two existing models—Meta’s Llama 2‑70B and IBM’s Granite‑13B—and assesses them based on model cards, fine‑tuning relevance, and known performance for text generation.
  • By comparing size, cost, deployment complexity, and risk profiles, and then running targeted tests, the team can select the model that best meets accuracy, efficiency, and business‑value requirements for the specific email‑writing task.

Source: [https://www.youtube.com/watch?v=pePAAGfh-IU](https://www.youtube.com/watch?v=pePAAGfh-IU)
Duration: 00:07:54

Sections

  • [00:00:00](https://www.youtube.com/watch?v=pePAAGfh-IU&t=0s) Framework for Selecting Foundation Models: The speaker outlines a six-stage process for choosing a generative AI model based on the specific use case, model characteristics, costs, risks, and testing.
  • [00:03:11](https://www.youtube.com/watch?v=pePAAGfh-IU&t=191s) Factors for Model Evaluation: Explains how fine-tuned or zero-shot use of pre-trained foundation models affects performance and outlines three evaluation criteria: accuracy, reliability (including consistency, explainability, trustworthiness, and toxicity avoidance), and speed.
  • [00:06:16](https://www.youtube.com/watch?v=pePAAGfh-IU&t=376s) Balancing Cloud, On-Prem, and a Multi-Model Strategy: Contrasts inferencing with the open-source Llama 2 model on a public cloud versus fine-tuning it on-premise, weighing security, cost, and compute trade-offs, and proposes pairing different foundation models with different enterprise use cases.

Full Transcript
[0:00] If you have a use case for generative AI, how do you decide which foundation model to pick to run it? With the huge number of foundation models out there, it's not an easy question. Different models are trained on different data and have different parameter counts, and picking the wrong model can have severe unwanted impact, like biases originating from the training data or hallucinations that are just plain wrong.

[0:25] Now, one approach is to just pick the largest, most massive model out there to execute every task. The largest models have huge parameter counts and are usually pretty good generalists, but with large models come costs: costs of compute, costs of complexity, and costs of variability. So often the better approach is to pick the right-size model for the specific use case you have.

[0:52] So let me propose to you an AI model selection framework. It has six pretty simple stages. Let's take a look at what they are and then give some examples of how this might work.

[1:06] Stage one is to clearly articulate your use case. What exactly are you planning to use generative AI for? From there, you'll list some of the model options available to you; perhaps there is already a subset of foundation models running that you have access to. With a short list of models, you'll next want to identify each model's size, performance, costs, risks, and deployment methods. Next, evaluate those model characteristics for your specific use case. Run some tests; that's the next stage: testing options based on your previously identified use case and deployment needs. And then finally, choose the option that provides the most value.

[1:51] So let's put this framework to the test. My use case, we're going to say, is text generation: I need the AI to write personalized emails for my awesome marketing campaign. That's stage one.
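The six stages just described can be sketched as a small selection pipeline. Everything below is illustrative, not from the transcript: the `Candidate` type, the toy evaluation and test functions, and the size-penalty scoring are assumptions made up for the sketch.

```python
from dataclasses import dataclass, field

@dataclass
class Candidate:
    """Characteristics gathered for one foundation model (stage 3)."""
    name: str
    params_b: int                      # parameter count, in billions
    metrics: dict = field(default_factory=dict)

def select_model(use_case, candidates, evaluate, test):
    """Stages 4-6: evaluate fit, test the survivors, choose the best."""
    scored = []
    for c in candidates:
        if not evaluate(use_case, c):  # stage 4: does it fit this use case?
            continue
        scored.append((test(use_case, c), c))   # stage 5: run tests
    # Stage 6: pick the option delivering the most value
    return max(scored, key=lambda pair: pair[0])[1]

# Stages 1-2: state the use case, list the models already available.
candidates = [Candidate("llama-2-70b", 70), Candidate("granite-13b", 13)]
winner = select_model(
    "marketing-email generation",
    candidates,
    evaluate=lambda uc, c: True,                   # toy: both pass on paper
    test=lambda uc, c: 0.9 - 0.001 * c.params_b,   # toy: quality minus size penalty
)
print(winner.name)
```

With this made-up scoring, the smaller model wins because the toy test penalizes parameter count; a real stage 5 would run actual prompts and metrics.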
[2:09] Now, my organization is already using two foundation models for other things, so I'll evaluate those. First, we've got Llama 2, specifically the Llama 2-70B model: a fairly large model at 70 billion parameters. It's from Meta, and I know it's quite good at some text generation use cases. Then there's also Granite, which we have deployed. Granite is a smaller general-purpose model from IBM, and I know there is a 13-billion-parameter version that I've heard does quite well with text generation as well. So those are the models I'm going to evaluate: Llama 2 and Granite.

[2:54] Next, we need to evaluate model size, performance, and risks, and a good place to start is with the model card. The model card might tell us whether the model has been trained on data specifically for our purposes. Pre-trained foundation models are fine-tuned for specific use cases such as sentiment analysis, document summarization, or text generation. That's important to know, because if a model is pre-trained on a use case close to ours, it may perform better when processing our prompts and enable us to use zero-shot prompting to obtain our desired results. That means we can simply ask the model to perform tasks without having to provide multiple completed examples first.

[3:42] Now, when it comes to evaluating model performance for our use case, we can consider three factors. The first is accuracy. Accuracy denotes how close the generated output is to the desired output, and it can be measured objectively and repeatably by choosing evaluation metrics that are relevant to your use case. For example, if your use case relates to text translation, BLEU (the BiLingual Evaluation Understudy benchmark) can be used to indicate the quality of the generated translations.
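To make the BLEU idea concrete, here is a deliberately simplified sketch: clipped unigram precision with a brevity penalty. Real BLEU geometrically averages clipped 1- to 4-gram precisions over a corpus, so treat this as an illustration of the mechanism, not the actual benchmark.

```python
import math
from collections import Counter

def unigram_bleu(candidate: str, reference: str) -> float:
    """Simplified BLEU-style score: clipped unigram precision times a
    brevity penalty. (Full BLEU also averages 2- to 4-gram precisions.)"""
    cand, ref = candidate.split(), reference.split()
    cand_counts, ref_counts = Counter(cand), Counter(ref)
    # Clip each candidate word's count by its count in the reference,
    # so repeating a correct word doesn't inflate the score.
    clipped = sum(min(n, ref_counts[w]) for w, n in cand_counts.items())
    precision = clipped / len(cand)
    # Penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision

score = unigram_bleu("the cat sat on the mat", "the cat is on the mat")
```

Here five of the six candidate words match the reference under clipping, so the score is 5/6; an exact match scores 1.0. Off-the-shelf implementations (e.g. in NLTK or sacreBLEU) are what you would use in practice.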
[4:23] Now, the second factor relates to the reliability of the model. That's a function of several factors, actually, such as consistency, explainability, and trustworthiness, as well as how well a model avoids toxicity like hate speech. Reliability comes down to trust, and trust is built through transparency and traceability of the training data, and accuracy and reliability of the output.

[4:50] And then the third factor is speed: specifically, how quickly does a user get a response to a submitted prompt? Now, speed and accuracy are often a trade-off here. Larger models may be slower but perhaps deliver a more accurate answer. Or then again, maybe the smaller model is faster and has minimal differences in accuracy from the larger model. It really comes down to finding the sweet spot between performance, speed, and cost. A smaller, less expensive model may not offer performance or accuracy metrics on par with an expensive one, but it may still be preferable once you consider the additional benefits it might deliver, like lower latency and greater transparency into the model's inputs and outputs. The way to find out is to simply select the model that's likely to deliver the desired output and, well, test it. Test that model with your prompts to see if it works, and then assess the model's performance and the quality of its output using metrics.

[5:54] Now, I've mentioned deployment in passing, so a quick word on that. As a decision factor, we need to evaluate where and how we want the model and data to be deployed. So let's say that we're leaning towards Llama 2 as our chosen model based on our testing. Right, cool: Llama 2. That's an open-source model, and we could inference with it on a public cloud. So we've got a public cloud already out here.
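One way to frame the sweet spot between accuracy, speed, and cost is as a weighted score, so that a slightly less accurate but much faster and cheaper model can still come out ahead. The weights and the numbers below are invented for illustration; in practice you would calibrate them to your use case.

```python
def value_score(accuracy: float, latency_s: float, cost_per_1k: float,
                w_acc: float = 1.0, w_lat: float = 0.3, w_cost: float = 0.2) -> float:
    """Higher is better: reward accuracy, penalize latency and token cost.
    Weights are hypothetical and should reflect the use case's priorities."""
    return w_acc * accuracy - w_lat * latency_s - w_cost * cost_per_1k

# Toy numbers: the large model is slightly more accurate, but slower and pricier.
large = value_score(accuracy=0.92, latency_s=2.0, cost_per_1k=1.5)
small = value_score(accuracy=0.89, latency_s=0.5, cost_per_1k=0.4)
```

With these made-up inputs the smaller model scores higher, illustrating the transcript's point that minimal accuracy differences can be outweighed by latency and cost benefits.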
[6:24] That gives us an element of choice, but a limited one: we can just run inference against the model there. But if we decide we want to fine-tune the model with our own enterprise data, we might need to deploy it on-premise. So this is where we have our own version of Llama 2, and we are going to fine-tune it. Now, deploying on-premise gives you greater control and more security benefits compared to a public cloud environment, but it's an expensive proposition, especially when factoring in model size and compute power, including the number of GPUs it takes to run a single large language model.

[7:07] Now, everything we've discussed here is tied to a specific use case, but of course it's quite likely that any given organization will have multiple use cases. And as we run through this model selection framework, we might find that each use case is better suited to a different foundation model. That's called a multi-model approach. Essentially, not all AI models are the same, and neither are your use cases. And this framework might be just what you need to pair the models and the use cases together to find a winning combination of both.
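In its simplest form, the multi-model approach is just a routing table from use case to the model that won the selection framework for that use case. The mappings below are hypothetical examples, not recommendations from the transcript.

```python
# Hypothetical use-case -> model routing for a multi-model strategy.
# Each entry would be the winner of the six-stage selection framework
# run for that particular use case.
ROUTES = {
    "marketing-email generation": "granite-13b",
    "document summarization": "llama-2-70b",
    "sentiment analysis": "granite-13b",
}

def route(use_case: str, default: str = "llama-2-70b") -> str:
    """Return the foundation model paired with a use case, with a fallback
    generalist model for use cases that haven't been evaluated yet."""
    return ROUTES.get(use_case, default)
```

A production router would sit behind an inference gateway and could also factor in deployment target (public cloud versus on-premise) per use case, but the pairing idea is the same.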