Bag of Words: Concept & Applications
Key Points
- Bag‑of‑Words is a feature‑extraction method that transforms text into numerical vectors by counting word occurrences, enabling machine‑learning models to process language data.
- A common application is email spam detection, where word frequency patterns help classify messages as legitimate or spam.
- The concept extends to visuals as “bag of visual words,” breaking images into key features (e.g., ears, whiskers) for tasks like object detection.
- Typical downstream uses include text classification, document similarity, and search‑query matching, though the approach has trade‑offs such as loss of word order and context.
Sections
- Bag‑of‑Words Feature Extraction Overview - The speaker introduces the Bag‑of‑Words technique, explaining how it converts text into numeric vectors for machine‑learning tasks such as spam detection, and outlines its advantages, limitations, applications, and possible improvements.
- Bag-of-Words Feature Construction - The speaker explains text classification and document similarity, then demonstrates building a vocabulary from two example sentences and converting those words into numeric bag‑of‑words features for a machine‑learning model.
- Bag-of-Words Feature Overview - The speaker explains how sentences are transformed into word‑count vectors, highlights the method’s simplicity and interpretability, and briefly points out its inherent drawbacks.
- Bag‑of‑Words Model Drawbacks - The speaker outlines how bag‑of‑words cannot capture word correlations, context, or order and suffers from sparsity, causing ambiguous and ineffective representations for machine‑learning tasks.
- N‑grams and Text Normalization - The speaker explains how to enhance a bag‑of‑words model by using n‑grams to capture consecutive word sequences and applying text normalization techniques like stemming to reduce words to their base forms.
- IDF Role in TF‑IDF - The speaker explains how inverse document frequency reduces the TF‑IDF weight of words that appear in many documents—preventing common stop‑words from dominating—and illustrates its application in tasks like classifying support tickets and other document‑classification problems.
- Word Embeddings and Sentiment Analysis - The speaker describes representing words as vectors in an n‑dimensional space to gauge semantic similarity, explains that such embeddings commonly use bag‑of‑words, and shows how bag‑of‑words models can identify sentiment and flag negative or discriminatory language.
Source & Timestamps
- **Source:** [https://www.youtube.com/watch?v=pF9wCgUbRtc](https://www.youtube.com/watch?v=pF9wCgUbRtc)
- **Duration:** 00:21:07
- [00:00:00](https://www.youtube.com/watch?v=pF9wCgUbRtc&t=0s) Bag‑of‑Words Feature Extraction Overview
- [00:03:06](https://www.youtube.com/watch?v=pF9wCgUbRtc&t=186s) Bag‑of‑Words Feature Construction
- [00:06:12](https://www.youtube.com/watch?v=pF9wCgUbRtc&t=372s) Bag‑of‑Words Feature Overview
- [00:09:19](https://www.youtube.com/watch?v=pF9wCgUbRtc&t=559s) Bag‑of‑Words Model Drawbacks
- [00:12:23](https://www.youtube.com/watch?v=pF9wCgUbRtc&t=743s) N‑grams and Text Normalization
- [00:15:32](https://www.youtube.com/watch?v=pF9wCgUbRtc&t=932s) IDF Role in TF‑IDF
- [00:18:38](https://www.youtube.com/watch?v=pF9wCgUbRtc&t=1118s) Word Embeddings and Sentiment Analysis
Full Transcript
We are going shopping for a new concept to learn.
Keep your hands free because we are going to have a lot of bags to deal with.
You guessed it.
The topic for today is Bag of Words.
Bag of Words is a feature extraction
technique to convert text into numbers,
and it's exactly what it sounds like.
A collection of different words.
A great use case for Bag of Words is spam filters in your emails.
For example, you might be receiving different emails
about the latest news,
maybe some interesting messages from your friends,
and perhaps some spammy content
saying that you have won a lottery and you're about to become a millionaire.
Bag of words looks at the different words present
and their frequency in each of these emails, and tries to tell
which of these would be spam.
So today we are going to be looking at
what bag of words means, as well as some examples.
We will be looking at the pros and cons of bag of words,
certain applications,
and also modifications that we can use
to improve our bag of words algorithm.
Like I said,
bag of words is a feature extraction technique, which means that
all of your different texts or different words
are converted into numbers.
After all,
numbers are what our machine learning models understand.
I like to think of Bag of Words as a bag of popcorn.
Let's think of the different words as kernels of popcorn.
And each word represents a kernel.
Or rather, each kernel represents a different word.
The cool thing about Bag of Words is that it's not just limited to words, but
it can also be applied to visual elements,
which is bag of visual words.
Let's say, for instance, you have an image of a cat.
And yes, this is how I draw a cat,
but you can break down this image of a cat
into multiple different key features.
You could have an ear, you could have
whiskers, a body,
legs and a tail.
And each of these different elements help in multiple
computer vision techniques, such as object detection.
So you can use bag of words, not just in words,
but also on visual words, which is images.
Next, let's take a look at what bag of words looks like
for different sentences, and see the pros and cons for it.
A common NLP task where bag of words comes in handy is
text classification.
Let's say, for example, spam detection:
you could have your email,
and depending on what words are in it, you could identify whether it is spam or not.
So this is an example of text classification.
Another example could be
that of document similarity
where perhaps you want to compare two different documents
and check how similar they are to each other.
Or maybe you have a particular query,
like the type you put in a search engine,
and you want to find the most relevant
documents.
Both of these examples, text classification and document similarity,
use bag of words in the back end.
Now let's take an example of two sentences
and see how we can convert the text, or the words,
into features, or numbers, for our machine learning model
to understand.
Consider two sentences.
Sentence number one:
I think, therefore I am.
And sentence number two:
I love learning Python.
Now that we have our two examples sentences,
what we are going to begin with is creating our vocabulary
or a dictionary, which is the set of unique words
set up in all of the given documents.
In our case, here are only two sentences that we are looking at.
But let's take a look at all the unique words present in here.
So we have
"I" as a unique word.
Think.
Therefore.
"I" has already been covered over here, so we move on to the next one:
am. Moving on to the next sentence,
"I" is also included here.
Love, learning,
Python.
Let's count: one, two, three, four, five, six, seven.
Seven unique words is what makes up our dictionary or
our vocabulary based on these two sentences.
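The vocabulary-building step described here can be sketched in a few lines of Python. This is a minimal illustration, assuming we lowercase and strip punctuation before splitting (a real pipeline would use a proper tokenizer):

```python
# Build a vocabulary of unique words from the two example sentences.
import string

sentences = [
    "I think, therefore I am.",
    "I love learning Python.",
]

def tokenize(text):
    """Lowercase, drop punctuation, and split on whitespace."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return cleaned.split()

# Preserve first-seen order so the vocabulary is reproducible.
vocabulary = []
for sentence in sentences:
    for word in tokenize(sentence):
        if word not in vocabulary:
            vocabulary.append(word)

print(vocabulary)  # 7 unique words: i, think, therefore, am, love, learning, python
```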
Let's look at what the text representation of the bag
of words representation for each of those sentences would be,
and what we are constructing over here is called a document term matrix.
So here are our documents.
We consider our first document.
And these are the different terms or the vocabulary present in here.
So going over the first sentence, "I" occurs a total of two times.
So you look at the count of each of the particular words,
and you try to see how many times it occurs in that particular sentence.
So "I" appears a total of two times.
Think, once.
Therefore, once.
Am, once.
And in our first sentence, love learning and Python do not appear,
which is why they get a score of zero.
Applying the same technique to our second sentence,
"I" appears a total of one time.
Think, therefore, and am are absent in that sentence, which is why
they get zero; and love, learning, and Python each occur once,
which is why they get one.
So what you're seeing over here is a vector of numbers
that represent the first sentence.
So we have now taken words and converted it into
a feature representation.
That is, we have numbers over here, which is what our machine
learning models understand.
And similarly
this is the feature representation for our second sentence.
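The document-term matrix walked through here can be reproduced in plain Python. This is a sketch of the counting logic only; libraries such as scikit-learn provide a CountVectorizer that does the same thing at scale:

```python
# Build bag-of-words count vectors (a document-term matrix) for two sentences.
import string

def tokenize(text):
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return cleaned.split()

sentences = ["I think, therefore I am.", "I love learning Python."]
vocabulary = ["i", "think", "therefore", "am", "love", "learning", "python"]

# Each row counts how often each vocabulary word occurs in one sentence.
doc_term_matrix = [
    [tokenize(s).count(word) for word in vocabulary] for s in sentences
]

print(doc_term_matrix[0])  # [2, 1, 1, 1, 0, 0, 0]
print(doc_term_matrix[1])  # [1, 0, 0, 0, 1, 1, 1]
```

Each row is the numeric feature vector for one sentence, exactly as in the walkthrough.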
Now that we've seen
what bag of words looks like or how to calculate it,
the pros are kind of obvious.
It's simple, as you just saw.
You count the number of times particular word occurs, and you denote that count
to that particular position for that sentence.
It's easy, which is what we did over here.
And it's explainable
as opposed to certain other algorithms
that maybe are not as intuitive.
Unfortunately, as with all things in life.
There are going to be pros and there are going to be cons.
Next, we'll take a look at the cons of this simplistic algorithm
and see if we can modify it to make it work better for us.
Let's look at some of the drawbacks associated with bag of words.
The first one being compound words.
Think about words like AI,
artificial intelligence, or New York.
In a simplistic bag of words approach,
you break down "artificial intelligence" into artificial and intelligence, and now they are treated
as two separate words with no correlation or no meaning between the two.
That would apply to New York as well, where new is
one word and York is another word.
In this case, we are losing this semantic or the meaning
that exists between the two words, which is a drawback.
Let's look at another example.
Perhaps cake.
And baking.
Maybe racing as well.
Given these three words
cake, baking, and racing, cake and baking are more likely to co-occur,
in the same context, in the same documents
as opposed to cake and racing.
Well, of course, if tomorrow somebody invents
a new sport called cake racing, that's going to change.
But let's hope it doesn't.
In this case, our Bag of words model is not able
to associate the correlations that exist among the words,
which might pose a problem to our machine learning models.
Let's look at another
drawback of bag of words: polysemous words, words with multiple meanings.
Consider the word Python.
Looking at just this word, it's hard to tell
if I'm talking about Python the programming language or Python
the animal.
Maybe there's another word
like "content".
It could mean content as in satisfied, or content as in material,
but just looking at the spelling, it's hard to tell which is which.
Another drawback that exists
is that we lose the order associated between the words.
Like I mentioned, Bag of words is nothing but a bag of popcorn,
with each of the kernels being a specific word.
And when you shake that bag, you lose all of the relationships that exist.
As far as the order of the words is concerned.
Let's say, for example,
I have a sentence whose words come out as: flight,
San Francisco,
Mumbai, from,
to.
What does this mean?
Am I trying to fly from
San Francisco to Mumbai?
Am I trying to fly the other way around
from Mumbai to San Francisco?
It's hard to tell when we have only
the bag of words available.
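This loss of order is easy to demonstrate: once the words are shaken into a bag, the two opposite itineraries become indistinguishable. A minimal sketch:

```python
# Two sentences with opposite meanings produce identical bags of words,
# because counting discards word order.
from collections import Counter

a = "flight from San Francisco to Mumbai".lower().split()
b = "flight from Mumbai to San Francisco".lower().split()

# The multisets of words are identical, so the count vectors are too.
print(Counter(a) == Counter(b))  # True
```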
Last but not the least
is the problem of sparsity
in our bag of words approach.
We look at each of the unique words which makes up our vocabulary,
and denote the presence of that particular word in a sentence.
Given a large number of documents,
you could have a very, very high number of vocabulary words.
Yet in each of the sentences,
there could be maybe only three words or a very, very small proportion of words
that actually are present with most of the other spaces being zeros.
This leads to the problem of sparsity.
Since our matrix, or our vectors here, are sparse,
in the sense that most of the elements are unoccupied,
denoted by zeros, and very few of them are actually filled in.
This could also pose a challenge with our models.
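Sparsity can be quantified as the fraction of zero entries in the document-term matrix. The matrix below is a hypothetical example (three short documents over a ten-word vocabulary), just to show the measurement:

```python
# Measure how sparse a document-term matrix is: the fraction of zero entries.
def sparsity(matrix):
    cells = [value for row in matrix for value in row]
    return cells.count(0) / len(cells)

# Hypothetical matrix: 3 short documents over a 10-word vocabulary.
# Each document uses only 3 of the 10 vocabulary words.
dtm = [
    [1, 0, 0, 2, 0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 2, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 1, 0, 0, 1, 0],
]

print(sparsity(dtm))  # 0.7 -> 70% of the entries are zeros
```

With a realistic vocabulary of tens of thousands of words, this fraction approaches 1.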
Fear not though.
Despite these drawbacks, we do have certain modifications in mind.
Let's take a look at some of the modifications that can help
improve our bag of words approach.
Our first modification is n grams.
Instead of looking at each individual word,
you can now look at a combination of words that occur together.
For example, with "artificial intelligence"
being the phrase, we don't break it into artificial and intelligence,
but now we look at the presence of artificial and intelligence together
and denote how many times they occurred in a particular document.
Similarly, for New York, we look at the presence
of New and York right after each other
and denote the number of counts or the times it occurs in that document.
In this case, since our phrases
are made up of two words,
n is equal to two. You could extend this with n
equal to three, n equal to five, and so on and so forth,
in which case you would look at, for example, if n is equal to three,
three words that occur right next to each other.
So maybe it is Python
artificial intelligence.
And any time these three words occur in your document,
you would count the number of times that happens.
And denote that occurrence here.
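The n-gram extraction described here is a simple sliding window over the token list. A minimal sketch, using "new york" from the earlier example:

```python
# Extract n-grams: sequences of n consecutive words, treated as single tokens.
def ngrams(tokens, n):
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "new york is in the united states".split()

print(ngrams(tokens, 2))
# ['new york', 'york is', 'is in', 'in the', 'the united', 'united states']
print(ngrams(tokens, 3)[:2])
# ['new york is', 'york is in']
```

Counting these bigrams instead of single words lets the model keep "new york" together as one feature.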
Another modification
that we can do is text normalization.
Text normalization refers to certain preprocessing activities
that you can do before you pass on the text to your bag of words
model.
A good example for this is the process of stemming,
in which case
you're trying to remove the ends of the words
in the hope of getting back to its base word or its base stem.
Consider the words coding
coded
codes and code.
When you start removing the ends of the words,
you can try to get to the base word,
which is "code" in this case.
This is a way to reduce the number of vocabulary
or reduce your dictionary words,
and hopefully that will help with the sparsity issue.
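The stemming idea can be sketched with a toy suffix-stripper. This is only an illustration; real stemmers such as the Porter algorithm use far more careful rules:

```python
# A toy stemmer: strip common suffixes to map related words to one base form.
# Real stemmers (e.g. the Porter algorithm) use far more careful rules.
def crude_stem(word):
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip if a reasonably long stem remains.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)]
    return word

words = ["coding", "coded", "codes", "code"]
print([crude_stem(w) for w in words])  # ['cod', 'cod', 'cod', 'code']
```

Even this crude version collapses coding, coded, and codes onto one shared stem, shrinking the vocabulary (note it imperfectly leaves "code" distinct, which is exactly why production stemmers need better rules).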
An important concept that builds upon bag of words
is TF-IDF, or term frequency
inverse document frequency.
You can think of Tf-Idf
as a weight or a score associated with words,
or perhaps even a feature scaling technique.
TF is the term frequency,
or the number of times a particular word occurs in your document.
Let's say the words votes,
president, and
government occur a lot of times in your document.
It probably has something to do with
maybe elections or some other government matter.
So the higher the term frequency,
the higher the score or the weight associated with that word.
That makes sense. With inverse document frequency,
however, you look at the number of documents
that that particular word occurs in.
And if that word occurs in multiple documents
or a huge proportion of documents, you actually give it a lower score.
So the more documents the word occurs in, the lower
the IDF score, and the lower the whole TF-IDF score becomes.
This may seem a little counterintuitive, right?
It's opposite of the term frequency.
But consider the example of words like "the",
"an", and "some".
Words like these
don't really have any meaning on their own,
but they're used to create grammatically correct sentences.
As you can imagine, in an English language or a lot of documents
with English language in it, these words would occur a lot of times.
Perhaps, maybe even the most frequently occurring word.
In that case, we do not want
these words to have a high TF-IDF score,
which is where the IDF component lowers their score,
as these words are not representative of the topics
or the essence of the documents.
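The TF and IDF pieces described here can be sketched directly. This uses one common formulation (raw count over document length for TF, natural log for IDF); many variants exist, and the three toy documents are made up for illustration:

```python
# Compute TF-IDF: term frequency weighted down by how many documents
# contain the term. One common formulation; variants exist.
import math

docs = [
    ["the", "votes", "are", "in"],
    ["the", "president", "won", "the", "votes"],
    ["the", "weather", "is", "nice"],
]

def tf(term, doc):
    """Term frequency: share of the document made up by this term."""
    return doc.count(term) / len(doc)

def idf(term, docs):
    """Inverse document frequency: log of total docs over docs containing the term."""
    containing = sum(1 for d in docs if term in d)
    return math.log(len(docs) / containing)

def tf_idf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)

# "the" occurs in every document, so its IDF (and hence TF-IDF) is zero.
print(tf_idf("the", docs[1], docs))  # 0.0
# "president" occurs in only one document, so it gets a positive weight.
print(round(tf_idf("president", docs[1], docs), 3))
```

This is exactly the stop-word effect from the transcript: "the" is frequent but uninformative, and IDF drives its score to zero.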
Let's take a look
at some applications of tf IDF.
Let's consider
document classification as an example.
Perhaps you have a company and a product that you're selling to your customers,
and you have a support channel for them to come and raise
certain concerns, complaints, or questions about your product.
Maybe you have a chat associated
with your customers or some support tickets,
and you could use the bag of words approach
to understand which of the teams
are associated with the problem that is there in the ticket.
Maybe you have a billing team
or an onboarding team.
Or a trial team.
Or maybe it's a documentation issue.
Looking at the vocabulary
that is present, that is looking at the bag of words, representation
of what is entailed in the customer chat or the support ticket.
You will then be able to identify which of these teams is the right
and appropriate team to deal with and resolve the customer's issue.
Another example of bag of words
is word2vec.
You might have heard of it.
These are word embeddings
that exist in an n dimensional space.
Your words are represented
as vectors in this n dimensional space.
For example, king and queen are two words,
and the closer the words are in this n dimensional space,
that means they are more related to each other.
In this case, king and queen would be fairly close to each other,
as you would find documents or sentences where king and queen
appear together.
Maybe you have another word
swim, that comes in those documents as well,
but you wouldn't really associate swim with king or queen
as much as you would associate king and queen with each other.
So swim would be further away from the vectors of king and queen.
This is called word2vec, or word embeddings,
and it does use bag of words in the back end to create this n dimensional space.
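Closeness in that n-dimensional space is commonly measured with cosine similarity. A sketch with made-up three-dimensional vectors (real embeddings are learned from data and have hundreds of dimensions):

```python
# Cosine similarity: how closely two word vectors point in the same direction.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Made-up toy vectors for illustration only.
king  = [0.9, 0.8, 0.1]
queen = [0.8, 0.9, 0.1]
swim  = [0.1, 0.1, 0.9]

print(round(cosine(king, queen), 3))  # close to 1 -> very similar
print(round(cosine(king, swim), 3))   # much smaller -> unrelated
```

King and queen point in nearly the same direction, while swim sits far away, matching the intuition in the transcript.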
Another example where
bag of words comes in handy is for sentiment analysis.
You could look at the collection of words in a given text,
and understand if a lot of those words are positive.
Maybe words like happy, joy,
excited, or words that are negative,
frustrated, angry, hate, terrible.
And depending on the bag of words representation, you would be able
to identify whether the sentiment is positive or negative.
You could even take this further and try to create a model
that helps detect hate speech.
So you would look
at the negative sentiments or the negative words present in there,
and maybe extend it with other words, for example, words related to racism
or other forms of discrimination, and try to create a model that helps
you flag these hateful or unwanted texts on the internet.
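The lexicon-based sentiment idea described here can be sketched with a tiny made-up word list. A real system would use a large sentiment lexicon or a trained classifier; this only shows the counting mechanism:

```python
# Lexicon-based sentiment: count positive vs. negative words in the bag.
# These word lists are tiny illustrative samples, not a real lexicon.
POSITIVE = {"happy", "joy", "excited", "great"}
NEGATIVE = {"frustrated", "angry", "hate", "terrible"}

def sentiment(text):
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I am so happy and excited about this"))  # positive
print(sentiment("this is terrible and I hate it"))        # negative
```

Note that, true to the drawbacks discussed earlier, this bag-of-words scorer ignores order and context, so "not happy" would still count as positive.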
Now that you have this concept in the bag,
I hope this helps you understand a little more about natural language
processing and encourages you
to continue your journey into the field of artificial intelligence.
If you like this video and want to see more like it, please like and subscribe.
If you have any questions or want to share your thoughts about this topic,
please leave a comment below.