Leveraging Open Source in Watson X
Key Points
- IBM is extending its long‑standing open‑source heritage to Watson X, using community‑driven tools to deliver the best AI models and innovation.
- Watson X’s model‑training and validation layer is built on the open‑source CodeFlare project, which abstracts scaling, queuing and deployment by integrating Ray, Kubernetes (OpenShift) and PyTorch.
- CodeFlare automatically provisions clusters, queues jobs, scales resources up or down when needed, and tears down the environment after training, freeing data scientists from infrastructure concerns.
- The platform represents and runs models with PyTorch, leveraging its tensor operations, GPU support and distributed‑training capabilities for large foundation models.
- Complementary open‑source components also handle model tuning/inferencing and data gathering/analytics, completing an end‑to‑end AI lifecycle in Watson X.
**Source:** [https://www.youtube.com/watch?v=Cgiqx0pJuLo](https://www.youtube.com/watch?v=Cgiqx0pJuLo) **Duration:** 00:07:31

Sections
- [00:00:00](https://www.youtube.com/watch?v=Cgiqx0pJuLo&t=0s) **IBM Watson X Open‑Source Stack** - The segment explains how IBM's Watson X platform uses open-source tools, especially the CodeFlare project integrating Ray, Kubernetes (OpenShift) and PyTorch, to streamline model training, tuning/inferencing, and data analytics for large foundation models.

Full Transcript
IBM has a rich history of both contributing to open source and leveraging open source in its offerings, and IBM continues that tradition with Watson X.

What is Watson X? Well, that's our new enterprise platform for AI and data. And why do we leverage open source in Watson X? Well, open source gives us the best AI, it gives us the best innovation, and it gives us the best models.
So today we're going to look at the open source that's in Watson X, and we're going to look at it from three different aspects: model training and validation, model tuning and inferencing, and data gathering and analytics.

Okay, let's get started with model training and validation.
Training and validating models can take a large amount of cluster resources, especially when the models we're looking at are those huge multi-billion-parameter foundation models that everyone's talking about. So to efficiently use a cluster, and to make it easier for data scientists, we have an open source project called CodeFlare.

CodeFlare provides user-friendly abstractions for scaling, queuing and deploying machine learning workloads. It integrates Ray, KubeRay and PyTorch to provide these features. With Ray, it provides a job abstraction; KubeRay allows Ray to run on Kubernetes platforms like OpenShift; and we'll talk a little bit more about PyTorch in a minute.
Let's look at a typical CodeFlare use case. The first thing it allows us to do is spin up a Ray cluster. It then allows the data scientist to submit training jobs to that cluster. If the OpenShift cluster is heavily used and there aren't resources available, CodeFlare is able to queue the jobs and wait until resources free up to run them. In some cases, if the cluster is full, it's even possible to scale up the cluster from CodeFlare. And then, when all the training and validation is done, it can delete the Ray jobs and take them off the cluster.

So again, what's nice about CodeFlare is that it enables the data scientist to efficiently use a cluster, or in some cases multiple OpenShift clusters, without worrying about the infrastructure underneath.
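As a rough sketch, that spin-up/tear-down workflow looks something like the following with the open-source CodeFlare SDK. The cluster name, namespace, and worker count here are hypothetical, parameter names vary by SDK version, and a live OpenShift cluster with the CodeFlare stack installed is assumed:

```python
# Hedged sketch only: assumes the codeflare-sdk package and a live
# OpenShift cluster with the CodeFlare operator installed.
from codeflare_sdk import Cluster, ClusterConfiguration

# Hypothetical cluster spec; exact parameter names vary by SDK version.
cluster = Cluster(ClusterConfiguration(
    name="fm-train",
    namespace="default",
    num_workers=2,
))

cluster.up()    # provision the Ray cluster; jobs queue if resources are scarce
# ... the data scientist submits training jobs to the Ray cluster here ...
cluster.down()  # tear the cluster down when training and validation are done
```

The point is that the data scientist only describes the cluster they want; provisioning, queuing, and cleanup are handled for them.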
We just looked at how we run model training and validation on a cluster, but now let's look at how we actually represent those models. The open source project that we use to represent the models is PyTorch.

PyTorch provides some key features for representing models, one of which is tensor support. What's a tensor? Well, it's a huge multi-dimensional array that holds all the weighted values, or probabilities, in the model, the ones we tweak over time to get the model to predict things correctly.
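As a small illustration of tensor support (the values here are invented, and real models hold billions of them):

```python
import torch

# Illustrative only: a tiny 2x3 tensor of "weights"; a real foundation
# model holds billions of such values.
weights = torch.tensor([[0.10, 0.50, -0.20],
                        [0.70, -0.30, 0.05]])

# Training tweaks these values over time; here, a small uniform step
# stands in for what a real gradient update would do.
weights = weights - 0.01 * torch.ones_like(weights)
```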
The other key features that PyTorch provides are GPU support and distributed training. When we train models, we're doing large amounts of computation, and the GPUs that PyTorch is able to use effectively let us do that very efficiently. PyTorch also provides distributed training, so for those large foundation models that wouldn't fit on a single machine, PyTorch enables us to train across a large number of machines.
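A minimal, hedged sketch of that distributed-training API, run here as a single CPU process purely for illustration (real runs launch one process per GPU or machine, for example with torchrun; the address, port, and tiny model are hypothetical):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process illustration only; real jobs set rank/world_size per worker.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

device = "cuda" if torch.cuda.is_available() else "cpu"  # uses the GPU when present
model = torch.nn.Linear(10, 1).to(device)
ddp_model = DDP(model)  # gradients are averaged across all processes

out = ddp_model(torch.randn(4, 10, device=device)).sum()
out.backward()          # the cross-process gradient all-reduce happens here

dist.destroy_process_group()
```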
Let's look at the key features that PyTorch provides. One is neural network creation: there are different types of neural networks, and PyTorch makes it easy to create all the popular ones. PyTorch also provides easy loading of data. Another key feature of PyTorch is training loops: straightforward, easy-to-use training loops that adjust the model's weights to improve its ability to provide accurate inferencing. And finally, PyTorch provides built-in model adjustments; the key one here is automatic gradient calculation. Think back to your calculus days, when you were calculating gradients by hand: having that built in, making the minor tweaks that improve the model over time and turn it into a better predictor, is what PyTorch provides.
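Putting those features together, a toy training loop might look like the following sketch; the data, network shape, and hyperparameters are invented for illustration:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Invented toy regression problem: learn y = 2x + 1.
X = torch.linspace(-1, 1, 64).unsqueeze(1)
y = 2 * X + 1

loader = DataLoader(TensorDataset(X, y), batch_size=16, shuffle=True)  # easy data loading
model = nn.Sequential(nn.Linear(1, 8), nn.ReLU(), nn.Linear(8, 1))     # easy network creation
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

first_loss, last_loss = None, None
for epoch in range(50):                 # the training loop
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()                 # autograd computes the gradients for us
        optimizer.step()                # the minor tweaks that improve the model
    if first_loss is None:
        first_loss = loss.item()
    last_loss = loss.item()
```

Over the epochs the loss drops, which is exactly the "tweak the weights over time to become a better predictor" story above.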
We just looked at how to represent models, but now let's look at model tuning and inferencing. What do we mean by this? Well, we want to be able to serve a large number of AI models, and be able to do it at scale on OpenShift.

The first key open source project is KServe ModelMesh. This is what we use to actually serve up the models. Originally there was just KServe, which would put one model in a single pod, so one pod per model. That's not very efficient at all, so KServe was merged with another open source project called ModelMesh, and ModelMesh is much better at efficiently packing thousands of models into a single pod. Between these two technologies, we're able to serve up thousands of models efficiently on an OpenShift cluster.
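For illustration, models served through KServe are typically called over its v2 (Open Inference Protocol) REST API. Here is a hedged sketch of building such a request body; the input name, shape, and values are hypothetical:

```python
import json

# Sketch of a v2 "Open Inference Protocol" request body that
# KServe-served models accept; all concrete values are hypothetical.
payload = {
    "inputs": [{
        "name": "input-0",
        "shape": [1, 4],
        "datatype": "FP32",
        "data": [[0.1, 0.2, 0.3, 0.4]],
    }]
}
body = json.dumps(payload)

# A client would POST this body to the model's inference endpoint,
# e.g. /v2/models/<model-name>/infer
```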
Now, where are we going to find all these models? Well, Hugging Face has over 200,000 open source models. It's typically referred to as the GitHub of models, and IBM has a partnership with Hugging Face, so again, it's a great place to find great models to use with our IBM Watson X offerings.

The other key open source technology we have is Caikit. Caikit is an open source project that provides APIs for prompt tuning. Typically, on the inferencing side you're serving up the models, but in some cases you also need to do a little bit of tuning to improve the results, and Caikit provides tuning APIs to do that.
The next technology is Kubeflow. Kubeflow provides orchestration of machine learning workloads, allowing you to build the machine learning pipelines that you need to make life easy. So again, we have a wonderfully large set of open source projects that provide our prompt tuning and inferencing, all running on OpenShift.
Now let's switch gears and look at data gathering and analytics. The open source project that we use for that is Presto.

What is Presto? Presto is a SQL query engine, and it's used for open data analytics and for the open data lakehouse. Let's look at the key features it provides: it's high performance, it's highly scalable, it provides federated queries, and it's able to query the data where it lives.
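As an illustration of what a federated query looks like, here is a sketch; the catalog, schema, and table names (hive.warehouse.sales, postgresql.crm.customers) are all hypothetical:

```python
# Illustrative federated SQL: one query joins a table stored in a Hive
# catalog with a table stored in a PostgreSQL catalog. All names here
# are hypothetical.
query = """
SELECT c.name, SUM(s.amount) AS total_sales
FROM hive.warehouse.sales AS s
JOIN postgresql.crm.customers AS c
  ON s.customer_id = c.id
GROUP BY c.name
"""

# With a Presto Python client one would connect to the coordinator and
# execute this query; Presto pushes work down to each data source, so
# the data is queried where it lives rather than copied out first.
```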
I hope I've convinced you that Watson X has continued IBM's long tradition of contributing to open source and leveraging open source in its offerings. If you'd like to learn more, please check out the links below.