Foundation Models Driving Business Value

Key Points

  • LLMs like ChatGPT have sparked a rapid shift in AI capabilities, moving from niche, task‑specific models to versatile, enterprise‑driving solutions.
  • These models belong to a broader class called “foundation models,” which are pre‑trained on massive amounts of unstructured text data in an unsupervised, generative fashion.
  • The core generative skill—predicting the next word—allows the same model to be fine‑tuned with a small amount of labeled data for a variety of NLP tasks such as classification or named‑entity recognition.
  • This fine‑tuning process turns a single foundation model into a reusable asset that can power many different business applications without building separate models for each use case.
  • IBM Research’s Kate Soule highlights that leveraging foundation models can unlock new sources of enterprise value by dramatically reducing development time and data requirements.
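The next-word objective mentioned above can be sketched with a toy model. This is purely illustrative and not how a real foundation model is built: an actual LLM learns the same objective with billions of parameters over terabytes of text, while here a simple bigram counter stands in for the idea.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Count word -> next-word transitions across a list of sentences."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequently observed word following `word`, if any."""
    if word.lower() not in counts:
        return None
    return counts[word.lower()].most_common(1)[0][0]

# Tiny stand-in corpus; a foundation model would see terabytes of text.
model = train_bigrams([
    "no use crying over spilled milk",
    "she spilled milk on the floor",
])
print(predict_next(model, "spilled"))  # -> milk
```

The same "predict what comes next" skill, learned at vastly greater scale, is what fine-tuning and prompting later repurpose for tasks like classification.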

Full Transcript

**Source:** [https://www.youtube.com/watch?v=hfIUstzHs9A](https://www.youtube.com/watch?v=hfIUstzHs9A)
**Duration:** 00:08:40

Sections:

- [00:00:00](https://www.youtube.com/watch?v=hfIUstzHs9A&t=0s) **Foundation Models: A New AI Paradigm** - How large language models, termed foundation models, represent a shift from task-specific AI to versatile, transferable systems that can power a wide range of business applications and drive enterprise value.
- [00:03:12](https://www.youtube.com/watch?v=hfIUstzHs9A&t=192s) **Tuning and Prompting Foundation Models** - How foundation models can be adapted either by fine-tuning with a small dataset or by prompt engineering to handle tasks with minimal labeled data, along with their key advantages and disadvantages.
- [00:06:22](https://www.youtube.com/watch?v=hfIUstzHs9A&t=382s) **Ensuring Trust in Foundation Models** - How massive, often opaque training data makes it impossible to fully vet for bias or toxicity, prompting IBM Research to develop methods that improve the efficiency, reliability, and trustworthiness of language and multimodal foundation models for business use.

Transcript:
[0:00] Over the past couple of months, large language models, or LLMs, such as ChatGPT, have taken the world by storm. Whether it's writing poetry or helping plan your upcoming vacation, we are seeing a step change in the performance of AI and its potential to drive enterprise value.

[0:19] My name is Kate Soule. I'm a senior manager of business strategy at IBM Research, and today I'm going to give a brief overview of this new field of AI that's emerging and how it can be used in a business setting to drive value.

[0:32] Now, large language models are actually part of a broader class of models called foundation models. The term "foundation models" was first coined by a team from Stanford when they saw that the field of AI was converging to a new paradigm. Before, AI applications were built by training a library of different AI models, where each model was trained on very task-specific data to perform a very specific task. They predicted that we would start moving to a new paradigm, where a single foundational capability, or foundation model, would drive all of those same use cases and applications: the same exact applications we were envisioning with conventional AI, and the same model could drive any number of additional applications. The point is that this model can be transferred to any number of tasks.

[1:38] What gives this model the superpower to transfer to multiple different tasks and perform multiple different functions is that it has been trained, in an unsupervised manner, on a huge amount of unstructured data. In the language domain, that basically means I'll feed it a bunch of sentences -- and I'm talking terabytes of data here -- to train this model. The start of my sentence might be "no use crying over spilled" and the end of my sentence might be "milk".
[2:15] And I'm trying to get my model to predict the last word of the sentence based on the words that it saw before. It's this generative capability of the model -- predicting and generating the next word based on the words it has seen before -- that makes foundation models part of the field of AI called generative AI: we're generating something new, in this case the next word in a sentence.

[2:46] And even though these models are trained to perform, at their core, a generative task -- predicting the next word in the sentence -- we can actually take these models and, by introducing a small amount of labeled data to the equation, tune them to perform traditional NLP tasks -- things like classification or named-entity recognition -- tasks you don't normally associate with a generative model or capability. This process is called tuning: you tune your foundation model by introducing a small amount of data, you update the parameters of your model, and it can now perform a very specific natural language task.

[3:25] If you don't have data, or have only very few data points, you can still use these foundation models; they actually work very well in low-labeled-data domains. In a process called prompting, or prompt engineering, you can apply these models to some of those same exact tasks. An example of prompting a model to perform a classification task might be giving the model a sentence and then asking it a question: does this sentence have a positive sentiment or a negative sentiment? The model is going to try to finish generating words in that sentence, and the next natural word in that sentence would be the answer to your classification problem: it would respond either "positive" or "negative", depending on where it estimated the sentiment of the sentence to be.
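The prompting pattern just described can be sketched as follows. This is a hypothetical illustration, not a specific product API: `generate` stands in for any text-generation backend (for example, a hosted LLM endpoint), and the stub used in the demo is a placeholder so the example is self-contained. What the sketch shows is the prompt format, where the model's natural completion carries the classification answer.

```python
# Hedged sketch of prompt-based classification: append a question to the
# input and let a generative model "complete" the answer.

def build_sentiment_prompt(sentence):
    """Format a classification task as a text-completion prompt."""
    return (
        f"Sentence: {sentence}\n"
        "Question: Does this sentence have a positive or negative sentiment?\n"
        "Answer:"
    )

def classify(sentence, generate):
    """Classify sentiment by asking a generative model to finish the prompt.

    `generate` is assumed to be any callable mapping a prompt string to a
    completion string (a real LLM API would go here).
    """
    completion = generate(build_sentiment_prompt(sentence))
    return "positive" if "positive" in completion.lower() else "negative"

# Stub generator for demonstration; a real model's next words would
# carry the answer.
demo = lambda prompt: " Positive."
print(classify("I loved this movie!", demo))  # -> positive
```

No parameters are updated here: unlike tuning, prompting leaves the model untouched and encodes the task entirely in the input text.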
[4:17] And these models work surprisingly well when applied to these new settings and domains. This is where a lot of the advantages of foundation models come into play. The chief advantage is performance. These models have seen so much data -- again, data with a capital D, terabytes of data -- that by the time they're applied to small tasks, they can drastically outperform a model that was trained on only a few data points.

[4:54] The second advantage of these models is the productivity gains. As I said earlier, through prompting or tuning you need far less labeled data to get to a task-specific model than if you had to start from scratch, because your model takes advantage of all the unlabeled data it saw in its pre-training on the generative task.

[5:23] With these advantages come some disadvantages that are important to keep in mind. The first is compute cost. The penalty for having these models see so much data is that they're very expensive to train, making it difficult for smaller enterprises to train a foundation model on their own. By the time they reach a huge size -- a couple billion parameters -- they're also very expensive to run inference on. You might require multiple GPUs at a time just to host these models and run inference, making them a more costly method than traditional approaches.

[6:10] The second disadvantage of these models is on the trustworthiness side. Just as data is a huge advantage for these models, the sheer amount of unstructured data they've seen also comes at a cost, especially in a domain like language. A lot of these models are trained on language data that's been scraped from the Internet, and there's so much data that these models have been trained on.
[6:31] Even if you had a whole team of human annotators, you wouldn't be able to go through and actually vet every single data point to make sure it wasn't biased and didn't contain hate speech or other toxic content. And that's just assuming you actually know what the data is. Often we don't even know -- for a lot of the open-source models that have been posted -- what the exact datasets are that these models have been trained on, leading to trustworthiness issues.

[6:55] So IBM recognizes the huge potential of these technologies, and my partners in IBM Research are working on multiple different innovations to improve both the efficiency of these models and their trustworthiness and reliability, to make them more relevant in a business setting.

[7:12] All of the examples I've talked through so far have been on the language side. But the reality is, there are a lot of other domains that foundation models can be applied to. Famously, we've seen foundation models for vision -- models such as DALL-E 2, which takes text data and uses it to generate a custom image. We've seen models for code, with products like Copilot that can help complete code as it's being authored.

[7:40] And IBM is innovating across all of these domains: language models that we're building into products like Watson Assistant and Watson Discovery, vision models that we're building into products like Maximo Visual Inspection, and Ansible code models that we're building with our partners at Red Hat under Project Wisdom. We're innovating across all of these domains and more. We're working on chemistry: for example, we just published and released MoLFormer, a foundation model to promote molecule discovery for different targeted therapeutics.
[8:16] And we're working on models for climate change, building Earth science foundation models that use geospatial data to improve climate research.

[8:26] I hope you found this video both informative and helpful. If you're interested in learning more, particularly about how IBM is working to address some of these disadvantages -- making foundation models more trustworthy and more efficient -- please take a look at the links below. Thank you.