
Foundation Model Development Workflow

Key Points

  • Deep learning traditionally requires collecting, labeling, and training large, domain‑specific datasets for each new AI application, such as chatbots or fraud detection.
  • Foundation models serve as a central, pre‑trained base that can be fine‑tuned with smaller, specialized data sets, dramatically accelerating the creation of niche AI solutions (e.g., predictive maintenance or code translation).
  • The AI model development workflow begins with Stage 1: preparing a massive, filtered “base data pile” from open‑source and proprietary sources, categorizing content, removing profanity, copyrighted material, sensitive information, and duplicates to ensure governance and data quality.
  • Stage 2 involves selecting an appropriate foundation model type (generative, encoder‑only, lightweight, high‑parameter, etc.), tokenizing the curated data pile, and training the model, setting the foundation for subsequent fine‑tuning and deployment stages.
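The full workflow described in the transcript below has five stages: prepare data, train, validate, tune, and deploy. As a minimal sketch, the stages can be chained as plain functions; every body here is an illustrative placeholder, not a real training framework.

```python
# Hypothetical sketch of the five-stage workflow: prepare data -> train ->
# validate -> tune -> deploy. All bodies are illustrative stubs.

def prepare_data(sources):
    """Stage 1: curate and filter raw sources into a base data pile."""
    return [doc.lower() for doc in sources]  # placeholder curation

def train(data_pile):
    """Stage 2: train a foundation model on the curated pile."""
    return {"name": "base-model", "trained_on": len(data_pile)}

def validate(model):
    """Stage 3: benchmark the model and record scores in a model card."""
    return {**model, "benchmarks": {"toy_score": 1.0}}

def tune(model, local_data):
    """Stage 4: adapt the base model with small, specialized data."""
    return {**model, "tuned_with": len(local_data)}

def deploy(model):
    """Stage 5: expose the model as a service (stub)."""
    return f"serving {model['name']}"

model = validate(train(prepare_data(["Doc A", "Doc B"])))
status = deploy(tune(model, ["maintenance logs"]))
```

The point of the shape is that Stages 1-3 produce a reusable base model once, while Stages 4-5 can be repeated cheaply per application.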

Full Transcript

# Foundation Model Development Workflow

**Source:** [https://www.youtube.com/watch?v=jcgaNrC4ElU](https://www.youtube.com/watch?v=jcgaNrC4ElU)
**Duration:** 00:06:55

## Sections

- [00:00:00](https://www.youtube.com/watch?v=jcgaNrC4ElU&t=0s) **Foundation Models Streamline AI Development** - Foundation models serve as versatile base models that can be fine-tuned with specialized data, dramatically accelerating the creation of niche AI applications such as predictive maintenance or language translation by reducing the need for extensive data gathering and training from scratch.
- [00:03:06](https://www.youtube.com/watch?v=jcgaNrC4ElU&t=186s) **Selecting, Training, and Tuning Foundation Models** - An overview of picking a suitable foundation model, tokenizing massive data piles, training and validating the model, and finally fine-tuning it with application developers.
- [00:06:20](https://www.youtube.com/watch?v=jcgaNrC4ElU&t=380s) **Watsonx Governance Enables Structured AI Development** - How watsonx.governance handles data and model cards across stages to ensure a governed AI lifecycle, while watsonx.ai lets developers engage with models in Stage 4, supporting a five-stage workflow that makes AI model creation both faster and more sophisticated.

## Full Transcript
0:00 Deep learning has enabled us to build detailed, specialized AI models, and we can do that provided we gather enough data, label it, and use that to train and deploy those models. Models like customer service chatbots or fraud detection in banking.

0:18 Now, in the past if you wanted to build a new model for your specialization - so, say, a model for predictive maintenance in manufacturing - well, you'd need to start again with data selection and curation, labeling, model development, training, and validation. But foundation models are changing that paradigm.

0:37 So what is a foundation model? A foundation model is a more focused, centralized effort to create a base model. And, through fine tuning, that base foundation model can be adapted to a specialized model. Need an AI model for programming language translation? Well, start with a foundation model and then fine tune it with programming language data. Fine tuning and adapting base foundation models rapidly speeds up AI model development.

1:10 So, how do we do that? Let's look at the five stages of the workflow to create an AI model.

1:17 Stage 1 is to prepare the data. Now in this stage we need to prepare the data we're going to train our AI model on, and we're going to need a lot of data. Potentially petabytes of data across dozens of domains. The data can combine both available open source data and proprietary data.

1:41 Now this stage performs a series of data processing tasks. Those include categorization, which describes what the data is. So which data is English, which is German? Which is Ansible, which is Java? That sort of thing.

1:58 Then the data is also run through filters. Filtering allows us to, for example, apply filters for hate speech, profanity, and abuse - stuff we want to filter out of the system so that we don't train the model on it.
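The categorization and filtering steps just described can be sketched with simple keyword rules; this is only a toy illustration, since production data pipelines typically use trained classifiers and much richer filter sets.

```python
# Minimal sketch of Stage-1 categorization and filtering, assuming
# simple keyword rules; real pipelines use trained classifiers.

BLOCKLIST = {"badword"}  # stand-in for hate-speech/profanity filter terms

def categorize(doc):
    """Tag a document with a (toy) language/content category."""
    return "java" if "class " in doc else "english"

def passes_filters(doc):
    """Drop documents containing blocklisted terms."""
    words = set(doc.lower().split())
    return words.isdisjoint(BLOCKLIST)

raw = ["public class Main {}", "a badword example", "plain English text"]
kept = [(categorize(d), d) for d in raw if passes_filters(d)]
```

Here the middle document is filtered out, and each surviving document carries its category tag into the next step.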
2:15 Other filters may flag copyrighted material and private or sensitive information. Something else we're going to take out is duplicate data as well, so we're going to remove that too. And that leaves us with something called a base data pile. That's really the output of Stage 1.

2:39 And this base data pile can be versioned and tagged, and that allows us to say, "This is what I'm training the AI model on, and here are the filters I used." It's perfect for governance.

2:52 Now, Stage 2 is to train the model, and we're going to train the model on those base data piles. So we start this stage by picking the foundation model we want to use. So we will select our model.

3:10 Now, there are many types of foundation models. There are generative foundation models, encoder-only models, lightweight models, high-parameter models. Are you looking to build an AI model to use as a chatbot, or as a classifier? So pick the foundation model that matches your use case, then match the data pile with that model.

3:30 Next we take the data pile and we tokenize it. Foundation models work with tokens rather than words, and a data pile could result in potentially trillions of tokens. And now we can engage the process of training using all of those tokens. This process can take a long time, depending on the size of the model. Large-scale foundation models can take months with many thousands of GPUs. But, once it's done, the longest and highest computational costs are behind us.

4:06 Stage 3 is "validate". When training is finished we benchmark the model. And this involves running the model and assessing its performance against a set of benchmarks that help define the quality of the model. And then from here we can create a model card that says, "This is the model I've trained, and these are the benchmark scores it has achieved."
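The deduplication and tokenization steps above can be sketched as follows; the exact-duplicate check hashes document content, and the whitespace split is a deliberately naive stand-in for a real subword tokenizer (BPE or similar).

```python
# Hedged sketch: exact deduplication via content hashing, then a naive
# whitespace tokenizer standing in for a real subword tokenizer.
import hashlib

def dedupe(docs):
    """Keep only the first copy of each exact-duplicate document."""
    seen, pile = set(), []
    for doc in docs:
        digest = hashlib.sha256(doc.encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            pile.append(doc)
    return pile

def tokenize(doc):
    """Toy tokenizer; foundation models use subword tokens, not words."""
    return doc.split()

pile = dedupe(["same text", "same text", "other text"])
n_tokens = sum(len(tokenize(d)) for d in pile)  # a real pile yields trillions
```

Real pipelines also apply near-duplicate detection (e.g. MinHash-style similarity) rather than only exact matching, but the versioned output plays the same role as the base data pile in the transcript.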
4:32 Now, up until this point the main persona that has performed these tasks is the data scientist. Now Stage 4 is "tune", and this is where we bring in the persona of the application developer. This persona does not need to be an AI expert. They engage with the model, generating - for example - prompts that elicit good performance from the model. They can provide additional local data to fine tune the model to improve its performance. And this stage is something that you can do in hours or days - much quicker than building a model from scratch.

5:12 And now we're ready for Stage 5, which is to deploy the model. Now this model could run as a service offering deployed to a public cloud. Or we could, alternatively, embed the model into an application that runs much closer to the edge of the network. Either way, we can continue to iterate and improve the model over time.

5:38 Now here at IBM we've announced a platform that enables all five stages of this workflow. It's called watsonx, and it's composed of three elements: watsonx.data, watsonx.governance, and watsonx.ai, all built on IBM's hybrid cloud platform, which is Red Hat OpenShift.

6:10 Now watsonx.data is a modern data lakehouse and establishes connections with the data repositories that make up the data in Stage 1. Watsonx.governance manages the data cards from Stage 1 and model cards from Stage 3, enabling a collection of fact sheets that ensure a well-governed AI process and lifecycle. And watsonx.ai provides a means for the application developer persona to engage with the model in Stage 4.

6:40 Overall, foundation models are changing the way we build specialized AI models, and this five-stage workflow allows teams to create AI and AI-derived applications with greater sophistication while rapidly speeding up AI model development.
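The Stage-4 prompting described above, where an application developer elicits good performance without retraining, can be illustrated with a few-shot prompt template. The task, example pair, and model call are all hypothetical stand-ins, not any real watsonx API.

```python
# Illustrative sketch of Stage-4 prompt engineering: an application
# developer wraps the base model with worked examples rather than
# retraining it. The example pair below is hypothetical.

FEW_SHOT = [
    ("Translate COBOL to Java: DISPLAY 'HI'.", 'System.out.println("HI");'),
]

def build_prompt(task):
    """Prepend worked examples so the model sees the desired format."""
    shots = "\n".join(f"Q: {q}\nA: {a}" for q, a in FEW_SHOT)
    return f"{shots}\nQ: {task}\nA:"

prompt = build_prompt("Translate COBOL to Java: DISPLAY 'BYE'.")
# prompt would then be sent to the deployed foundation model endpoint
```

Because only the prompt (or a small local fine-tuning set) changes, this loop takes hours or days rather than the months of GPU time spent in Stage 2.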