Choosing Python vs R for Data Science
Key Points
- Your choice between Python and R should depend on factors like prior programming experience, the importance of visualizations, the type of analysis (ML vs. statistical), and what your teammates are already using.
- Python, whose development began in 1989 (first public release in 1991), is a general‑purpose, object‑oriented language prized for readability and backed by popular libraries such as NumPy, pandas, TensorFlow, and a Jupyter notebook workflow.
- R, introduced in 1992, is purpose‑built for statistical analysis and graphics, offering a rich ecosystem of CRAN packages, strong data‑modeling tools, and an RStudio IDE for reporting and visualization.
- Both languages are open source with vibrant communities, so the real advantage lies in leveraging each where it excels—e.g., using R for customer‑behavior analytics and Python for building machine‑learning or computer‑vision applications.
- Ultimately, rather than picking a single "best" language, most data scientists benefit from a hybrid approach, selecting the tool that best fits the specific problem at hand.
**Source:** [https://www.youtube.com/watch?v=4lcwTGA7MZw](https://www.youtube.com/watch?v=4lcwTGA7MZw)
**Duration:** 00:07:08
Sections
- [00:00:00](https://www.youtube.com/watch?v=4lcwTGA7MZw&t=0s) **Choosing Between Python and R** - A quick decision guide helps listeners pick Python or R for data science based on their programming background, visualization needs, problem type, and team usage, while also discussing how to leverage both languages together.
Full Transcript
Python is an open source programming language commonly used in data science, as is R. Which one should you be using? At this point you might be expecting a fence-sitting, "well, it depends" kind of answer. But no, I'm going to tell you exactly which one to pick, right now. So here goes: I ask you a question, and based on your answer you'll know which language to go for. Ready?
Okay, so do you have much in the way of programming experience? None? Use R. Some? Go for Python. Lots? R again. I'll explain. Okay, question two: do you care about awesome-looking visualizations and graphics? If yes, go with R. What about the problem you're trying to solve? Machine learning stuff, go with Python. Statistical learning, R is your best bet. And finally, what do most of your colleagues use? Use that.
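
As a rough illustration, the four questions above can be written down as a small lookup. This is only a sketch: the function name `recommend`, its parameters, and the accepted string values are hypothetical, and because the video does not say which question wins when the answers disagree, the function returns one suggestion per question rather than a single verdict.

```python
def recommend(experience, wants_visuals, problem, team_language=None):
    """Return one suggestion per question from the quick guide.

    experience:    "none", "some", or "lots" (hypothetical labels)
    wants_visuals: True if polished graphics matter to you
    problem:       "machine learning" or "statistical learning"
    team_language: what most colleagues use, if known
    """
    if experience not in {"none", "some", "lots"}:
        raise ValueError("experience must be 'none', 'some', or 'lots'")
    if problem not in {"machine learning", "statistical learning"}:
        raise ValueError("unknown problem type")

    answers = {}
    # Question 1: none or lots -> R, somewhere in between -> Python.
    answers["experience"] = "Python" if experience == "some" else "R"
    # Question 2: the guide only gives an answer for "yes".
    if wants_visuals:
        answers["visualization"] = "R"
    # Question 3: machine learning -> Python, statistical learning -> R.
    answers["problem"] = "Python" if problem == "machine learning" else "R"
    # Question 4: use whatever the team already uses.
    if team_language is not None:
        answers["team"] = team_language
    return answers


print(recommend("some", False, "machine learning", "Python"))
```

Returning a dictionary keeps the guide's answers separate, so mixed results (say, R for visualization but Python for the problem type) stay visible instead of being hidden behind an arbitrary tie-break.
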
Glad to get that off of my chest. Now, we could all just finish here and go about our day, but I'd like to explain a little bit more about what these two languages are and how they're best put to use, because increasingly the question isn't which to choose but how to make the best use of both programming languages for your specific use cases.
So let's start with the slightly older of the two, which is Python. Now, Python was released in 1989, and it's a general-purpose, object-oriented programming language that emphasizes code readability through its oh-so-generous use of white space. And it's super popular, just behind Java and C in popularity, in fact.
There are some awesome libraries that support data science tasks. So, for example, we have "Numpty". It's actually NumPy. "Numpty", that's British slang for an idiot. And NumPy is used for large dimensional arrays, and then for data manipulation we have pandas. There are also specialized tools for deep learning, so you can use things like TensorFlow. And you'll often find yourself working with Python in Jupyter notebooks as your IDE.
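
To make the NumPy mention concrete, here is a minimal sketch of a small two-dimensional array and a couple of whole-array operations. The variable names and the numbers are made up for illustration; only the NumPy calls themselves are real.

```python
import numpy as np

# A 2 x 3 array: two rows of three readings each (made-up values).
readings = np.array([[1.0, 2.0, 3.0],
                     [4.0, 5.0, 6.0]])

print(readings.shape)         # the array's dimensions
print(readings.mean(axis=0))  # mean of each column

# Arithmetic applies to every element at once, with no explicit loop.
scaled = readings * 10
print(scaled)
```

The same style carries over to arrays with more dimensions, which is where working on the whole array at once pays off most.
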
Now let's compare that to R, which is optimized for statistical analysis and data visualization. So it was developed just a little later, in 1992, and it has a rich ecosystem with complex data models and elegant tools for data reporting. There are thousands of packages available via the Comprehensive R Archive Network, otherwise known as CRAN, and these things are for deep analytical tasks.
Now, R provides a broad variety of libraries and tools for things like cleansing data, creating visualizations, and training deep learning algorithms. And R is commonly used with RStudio, which is an integrated development environment for simplified statistical analysis, visualization, and reporting.
Both R and Python are open source and are supported by large communities continuously extending their libraries and tools. Really, the biggest differentiator is how they are used. R, as I've mentioned, is mainly used for statistical analysis, while Python provides a more general approach to data wrangling. You might use R for customer behavior analysis, and then you might use Python to build a facial recognition application.
Now, right up front I said if you have no programming experience, or quite a lot of programming experience, R was the better bet. If you fall somewhere in between, then Python is easier to pick up. But how can that be? Well, Python is considered a multi-purpose language, much like C++ and Java are, and it has a readable syntax that's easy to learn. It's considered a good language for beginner programmers or those with experience in similar languages. Now R, on the other hand, is built by statisticians and leans heavily into statistical models and specialized analytics. Now, novices can be running data analysis tasks within minutes with just a few lines of code using R, but the complexity of advanced functionality in R makes it more difficult to develop expertise.
Now, a few other considerations to keep in mind, and they all relate specifically to data. Now, when it comes to data collection, so actually gathering the data in the first place, Python supports all kinds of data formats, from comma-separated value files, or CSV files, to JSON sourced from the web. In contrast, R is designed for data analysts to import data from things like Excel and text files.
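
As a small sketch of the data collection point, the snippet below reads the same kind of records from CSV text and from JSON text using only Python's standard library. The sample data and field names are invented, and in-memory strings stand in for a real file or web response so that the example runs on its own.

```python
import csv
import io
import json

# Invented sample data standing in for a CSV file and a JSON web response.
csv_text = "name,visits\nana,3\nben,5\n"
json_text = '[{"name": "cy", "visits": 2}]'

# csv.DictReader yields one dict per row, keyed by the header line.
rows = list(csv.DictReader(io.StringIO(csv_text)))
for row in rows:
    row["visits"] = int(row["visits"])  # CSV fields always arrive as strings

# json.loads returns native Python lists and dicts with typed values.
rows.extend(json.loads(json_text))

total_visits = sum(row["visits"] for row in rows)
print(rows)
print(total_visits)
```

Once both sources are ordinary lists of dictionaries, they can be combined and summarized the same way regardless of the original format.
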
Now, for data exploration, you can use the pandas library to filter, sort, and display data in a matter of seconds if you use Python. And R, on the other hand, is optimized for statistical analysis, so you can build probability distributions or apply different statistical models.
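
The filter, sort, and display steps mentioned for pandas might look like the following minimal sketch. The `orders` table, its column names, and the threshold of 50 are all invented for illustration.

```python
import pandas as pd

# An invented table of orders.
orders = pd.DataFrame({
    "customer": ["ana", "ben", "cy", "dee"],
    "region": ["east", "west", "east", "west"],
    "amount": [120.0, 80.0, 45.0, 200.0],
})

# Filter: keep only orders of at least 50.
large = orders[orders["amount"] >= 50]

# Sort: largest orders first.
ranked = large.sort_values("amount", ascending=False)

# Display: show the first few rows.
print(ranked.head())
```

Each step returns a new DataFrame, so the original `orders` table is left unchanged and the steps can be chained or inspected one at a time.
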
And then finally, data modeling has some differences too. Python has libraries for data modeling, like NumPy. In R, you'll sometimes have to rely on packages outside of R's core functionality.
Did I say finally? There's one more, and that's visualization. And with visualization, R has the clear edge, with a base graphics module allowing you to easily create basic charts and plots, and you can use ggplot2 for more advanced plots, such as complex scatter plots with regression lines.
R and Python have their strengths, but in truth most organizations use a combination of both languages. You might conduct early-stage data analysis and exploration in R and then switch to Python when it's time to ship some data products. So which should you use? Both. You're probably going to use a bit of both. And if you want to see more videos like this in the future, please like and subscribe. Thanks for watching.