Data Scientist vs AI Engineer

Key Points

  • Generative AI’s rapid breakthroughs have spun off a distinct discipline, AI engineering, which is now vying with data science for the title of “sexiest job of the 21st century.”
  • Data scientists act as “data storytellers,” using descriptive (EDA, clustering) and predictive (regression, classification) analytics to turn messy raw data into insights about past and future events.
  • AI engineers are “AI system builders” who leverage foundation models to create generative AI solutions that reshape business processes.
  • Their main use cases are prescriptive (decision optimization, recommendation engines), which determine the best course of action for an organization, and generative (intelligent assistants, chatbots).
  • The speaker, a former data scientist turned AI engineer at IBM, outlines four key areas where the roles differ (use cases, data, models, and processes), emphasizing the shift from insight generation to actionable AI‑driven system design.

Full Transcript

**Source:** [https://www.youtube.com/watch?v=Vxw0nE1qfZc](https://www.youtube.com/watch?v=Vxw0nE1qfZc)

**Duration:** 00:10:32

## Sections

- [00:00:00](https://www.youtube.com/watch?v=Vxw0nE1qfZc&t=0s) **Data Scientist vs Generative AI Engineer** - The speaker explains how the rise of generative AI has birthed a distinct AI engineering role, contrasting it with traditional data scientists across four key areas of work.

0:00 For many years, data science has been called the sexiest job of the 21st century. But in recent years, it seems like there's a new job vying for that title: the AI engineer. So who even are these new kids on the block? Are they just data scientists in disguise?

0:12 What's up, y'all? I'm Isaac Key, and I'm a former data scientist turned AI engineer at IBM. To answer these questions, I'm going to lay out four key areas in which the work of a data scientist differs from that of an AI engineer, specifically a generative AI engineer. But before I dive into these differences, we first have to understand more about what's happening in the industry.

0:33 Traditionally, data scientists have always used AI models to do their analysis. So what's changed? Well, with the advent of generative AI, the boundaries of what AI can do are being pushed in ways that we've never seen before. These breakthroughs have been so groundbreaking that generative AI has split off into its own distinct field, and we call that AI engineering.

0:58 OK, so now that we understand the landscape, let's dive into the differences. The first area of difference lies in the use cases. At a very high level, think of a data scientist as a data storyteller: they take massive amounts of messy real-world data and use mathematical models to translate this data into insights. On the other hand, think of an AI engineer as an AI system builder: they use foundation models to build generative AI systems that help to transform business processes.

1:31 Since data scientists are fantastic storytellers, they use a lot of descriptive analytics to describe the past. One example of this is exploratory data analysis, or EDA, which is all about graphing the data and doing statistical inference. They can also do this through what's called
clustering, which groups similar data points based on similar characteristics, for example to do customer segmentation.

1:59 Now, every good story has the reader trying to figure out what's going to come next, and that's where predictive use cases come in. Unlike a book, however, a data scientist does not have the end already written, so they have to use what are called machine learning models to make their predictions. One example is regression models, which predict a numeric value such as a temperature or revenue. Another type is classification models, which predict a categorical value such as a success or a failure.

2:33 Putting on the AI engineering hat now, one of the main use cases that AI engineers work on is the prescriptive use case, which is all about choosing the best course of action. An example of this is a technique called decision optimization, which enables businesses to assess a set of possible actions and then choose the optimal path based on a set of requirements or standards. Another example of a prescriptive use case is creating what are called recommendation engines. This can involve, for example, suggesting targeted marketing campaigns for a select customer base.

3:16 In addition to prescriptive use cases, there are also generative use cases, hence the name generative AI. Foundation models, which I will touch on more in a bit, enable the creation of what are called intelligent assistants, for example a coding assistant or a digital adviser. They also enable the creation of chatbots, which enable conversational search through information retrieval and the summarization of various content.

3:46 So after we have a use case identified, we need data. Now, people say that data is the new
oil because, like oil, you have to search for and find the right data, and then use the right processes to transform it into various products, which then power various processes.

4:04 For a data scientist, the oil of choice is often structured data, a.k.a. tabular data. Do note that data scientists still work with unstructured data, but not as much as AI engineers. These tables are often on the order of hundreds to hundreds of thousands of observations, and they require a lot of cleaning and preprocessing before the data can be modeled. Some of the cleaning involves, for example, removing outliers, joining and filtering on a new table, or even creating new features altogether. This clean data is then used to train various machine learning models.

4:48 On the other hand, for an AI engineer the oil of choice is mainly unstructured data, such as text, images, videos, audio files, etc. Let's take a text-based foundation model called an LLM, or large language model, as an example. These models require anywhere between billions and trillions of tokens of text to be trained on, which is a much larger scale compared to traditional machine learning models. This leads me to the next area of difference, which is the underlying models.

5:24 The data science toolbox consists of hundreds of different models and algorithms to choose from. Due to the nature of these models, each different use case requires gathering a different dataset and thus training a different model. As a result, the scope of these individual models is a lot narrower, meaning that it's harder for them to generalize past the domain of data that they've been trained on. Generally speaking, these models are a lot smaller in size in terms of the number of parameters, they take less
compute power to train and do inference, and they require less time to train: anywhere from seconds to hours.

6:13 On the other hand, the generative AI toolbox is a lot less cluttered, and it really only contains one type of model, called the foundation model. Foundation models are revolutionary because they allow one single type of model to generalize to a wide range of tasks without having to be retrained; thus their scope is much wider. Due to the sophistication of these models, they are a lot larger in size, often billions of parameters. They require a lot more compute power to train (we're talking hundreds to thousands of GPUs), and they require a lot more training time, anywhere from weeks to months.

7:01 Due to the differences in the intrinsic nature of traditional machine learning models and foundation models, the underlying processes and techniques used to develop solutions with them also differ. A typical data science process looks something like this: you start off with a use case, and from that use case you pick the right data. After that data is prepared, you use it to train and validate a model, using techniques such as feature engineering, cross-validation, or hyperparameter tuning. This model is then deployed at some endpoint, for example in the cloud, to do real-time prediction and inference.

7:47 On the other hand, the generative AI process also starts off with a use case, but then we can skip directly to working with a pre-trained model. What makes this possible is a phenomenon called AI democratization, which is a big fancy word that simply means making AI more widely accessible to everyday users. Some of the best foundation models out there are
published to open-source communities such as Hugging Face, and since these models are so generalizable and so powerful out of the box, they make it easy for developers to get started. AI engineers interact with these foundation models via natural language instructions to prompt them to do various tasks, and this process is known as prompt engineering.

8:34 Prompt engineering can be used in conjunction with different frameworks to then build larger AI systems. Examples of these frameworks include chaining different prompts together, doing what's called parameter-efficient fine-tuning, or PEFT, on domain-specific data, doing retrieval-augmented generation, a.k.a. RAG, to ground answers in truth, or even creating autonomous agents to reason through very complex multi-step problems. These are just a few examples of the building blocks that can be used to build larger AI applications.

9:14 The last step is to then embed the AI in a larger system or workflow. This can take the form of creating assistants or virtual agents, building a larger application with a UI, or even doing some sort of automation.

9:31 So OK, let's take a step back and look at all the differences at a very high level. As we can see, the breakthroughs in generative AI underpin many of the differences in the use cases, data, models, and processes that data scientists and AI engineers work on. It's important to note that there is still overlap between the two fields. For example, data scientists will still work on prescriptive use cases, and an AI engineer will still work with structured data.

10:00 Regardless of these differences, both of these fields are continuing to evolve at a blazing-fast pace, with new research papers, new models, and new tools coming out every single day. With data, AI, and a
creative mind, really anything is possible. Thank you for tuning in. I hope this was helpful. Until next time, peace.

10:20 If you like this video and want to see more like it, please like and subscribe. If you have any questions or want to share your thoughts about this topic, please leave a comment below.
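
The generative AI process described in the transcript (start from a pre-trained model, prompt it with natural language, and ground its answers with retrieval) can be sketched roughly as follows. This is a minimal illustration under stated assumptions: `DOCS`, `retrieve`, `build_prompt`, `stub_model`, and `answer` are hypothetical names, the word-overlap ranking stands in for real semantic search, and `stub_model` is a placeholder for an actual foundation model call, which the transcript does not specify.

```python
import re
from typing import Callable

# Hypothetical mini knowledge base; a real system would query a document
# or vector store instead of a hard-coded list.
DOCS = [
    "Foundation models generalize to many tasks without retraining.",
    "Regression models predict a numeric value such as revenue.",
    "Clustering groups similar data points for customer segmentation.",
]


def tokens(text: str) -> set[str]:
    """Lowercased set of words, ignoring punctuation."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    """Retrieval step: return the k documents sharing the most words
    with the question (a naive stand-in for semantic search)."""
    q = tokens(question)
    return sorted(docs, key=lambda d: len(q & tokens(d)), reverse=True)[:k]


def build_prompt(question: str, passages: list[str]) -> str:
    """Prompt-engineering step: instructions plus retrieved context."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


def stub_model(prompt: str) -> str:
    """Placeholder for a foundation model: echoes the first context passage."""
    for line in prompt.splitlines():
        if line.startswith("- "):
            return line[2:]
    return ""


def answer(question: str, model: Callable[[str], str]) -> str:
    """Chain the steps: retrieve, build the prompt, call the model."""
    return model(build_prompt(question, retrieve(question, DOCS)))


if __name__ == "__main__":
    print(answer("What do foundation models generalize to?", stub_model))
```

Because `answer` accepts any callable that maps a prompt string to a response string, the stub could be swapped for a real model client without changing the retrieval or prompt-building code.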