# NoSQL: Practical Flexibility Guidelines

**Source:** [https://www.youtube.com/watch?v=pYK4No7ACRE](https://www.youtube.com/watch?v=pYK4No7ACRE)

**Duration:** 00:14:39

## Summary

- NoSQL databases embrace flexible, semi-structured JSON documents (collections of JSON objects) instead of rigid rows and columns, allowing them to handle real-time, unpredictable data and evolving user behavior.
- As the “Not Only SQL” name suggests, NoSQL systems still support relational features such as joins, lookups, and indexing, but they store data as collections (similar to tables) of unique JSON objects.
- For use cases like product catalogs with highly variable attributes, NoSQL avoids the relational pitfalls of multiple tables or massive denormalized tables full of nulls, offering a more natural, schema-agnostic representation.
- This flexibility makes NoSQL especially suited for event-driven, highly transactional workloads, APIs, data pipelines, and AI model integrations where rapid iteration and performance are critical.
- By applying practical NoSQL guidelines, developers can leverage its performance and adaptability today rather than treating it as merely a buzzword.

## Sections

- [00:00:00](https://www.youtube.com/watch?v=pYK4No7ACRE&t=0s) **Embracing NoSQL Flexibility for Modern Apps** - The speaker outlines practical guidelines for using semi-structured, JSON-based NoSQL databases, highlighting their ability to handle relations, joins, indexes, and dynamic data, to improve flexibility and performance in real-time applications and data pipelines.
- [00:03:08](https://www.youtube.com/watch?v=pYK4No7ACRE&t=188s) **JSON Flexibility Over Relational Schemas** - The speaker explains how storing product data as JSON in a NoSQL database avoids null-filled columns and rigid schemas, allowing new products and variable attributes to be added without altering table structures.
- [00:06:14](https://www.youtube.com/watch?v=pYK4No7ACRE&t=374s) **Optimizing Sensor Data Grouping** - The speaker explains how aggregating sensor JSON records into time- or ID-based groups reduces read latency while preserving write performance.
- [00:09:18](https://www.youtube.com/watch?v=pYK4No7ACRE&t=558s) **JSON Comment Overflow Strategy** - The speaker suggests using an overflow sub-object to shift comments beyond a set threshold (e.g., 50) into a separate section, keeping the main JSON compact and improving read/write performance.
- [00:12:27](https://www.youtube.com/watch?v=pYK4No7ACRE&t=747s) **Precomputing Comment Metrics in NoSQL** - The speaker explains how to avoid costly real-time calculations by incrementally updating aggregate values like comment counts, leveraging NoSQL’s flexible data models for high-throughput applications.

## Full Transcript
Data doesn't always fit into neat rows and columns, so why force it?
Let's explore how NoSQL databases offer the flexibility that modern applications need.
Our applications aren't built in a vacuum.
They run on real-time data, unpredictable inputs, and ever-changing user behavior.
So whether we're building APIs, crafting different data pipelines,
or even building AI models, we want to make sure that we're leveraging NoSQL
to help us create all these great, efficient systems.
So in this video, we're gonna define practical NoSQL database guidelines
so that you can bring the flexibility and performance that comes from NoSQL technologies into your applications.
By the end, you'll be an expert in NoSQL and understand
how it isn't just a buzzword and how it can actually apply to your workloads today.
So first of all, there's been a lot of information about NoSQL.
And just to clear the air, it actually stands for Not Only SQL.
So that means that, of course, it can still handle relations.
It can still do joins.
It can also do lookups.
There are indexes.
There are all sorts of NoSQL databases.
And for the context of today's conversation, we're going to talk about a semi-structured, JSON-object-focused database.
So this is going to have groups of JSON objects that essentially represent the NoSQL database structure.
So these groups are referred to as collections or sets.
That's very similar to a table in a relational database.
And then instead of a row, there's gonna be a unique JSON object.
So these are great for event-driven, highly transactional workloads
so that we can move quickly with our data and have enough flexibility.
So let's go into some examples.
This is where NoSQL can really do well.
So say you have a product catalog, right?
So with a product catalog, there might be a variety of different products.
And in a relational database, for example, because they're different,
you're actually going to store them in completely separate tables.
So even though you only have three different products, you're going to actually have to break them out.
Even though they maybe have some fields that overlap,
there's just enough that's different that you have to keep them separate.
Let's just represent these by shapes.
If they weren't that different, another pattern we see in relational databases
is to keep everything denormalized, as it's called, in one big table.
So your standard columns are here.
Every product has a name.
Every product has a manufacturer.
But then what happens is that each unique
set of attributes for each different kind of product would have its own unique columns. So you would see that
kind of broken out three different ways.
And then let's say that this row was a particular kind of product.
It would basically fill in its own attributes.
And then it wouldn't fill in anything for the other ones, and the other kinds of products would follow the same pattern.
So the problem with this is that there are a lot of nulls that end up happening.
And basically this table can just continue to grow.
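To make the null problem concrete, here is a minimal sketch in plain Python. The column names, products, and values are made up for illustration and do not come from the talk; the point is only to count how many cells end up empty when three kinds of products share one wide table.

```python
# Hypothetical columns for one wide, denormalized product table:
# two shared columns plus two attribute columns per kind of product.
COLUMNS = [
    "name", "manufacturer",
    "screen_size", "battery_hours",   # laptops only
    "fabric", "sleeve_length",        # shirts only
    "page_count", "author",           # books only
]

# Each product fills in only the columns that apply to it.
rows = [
    {"name": "Laptop", "manufacturer": "Acme", "screen_size": 14, "battery_hours": 10},
    {"name": "Shirt", "manufacturer": "Threadly", "fabric": "cotton", "sleeve_length": "short"},
    {"name": "Novel", "manufacturer": "PrintCo", "page_count": 320, "author": "A. Writer"},
]


def to_wide_row(row):
    """Expand a product to every column, using None (null) where nothing applies."""
    return [row.get(column) for column in COLUMNS]


wide_table = [to_wide_row(row) for row in rows]
null_count = sum(cell is None for wide_row in wide_table for cell in wide_row)
total_cells = len(wide_table) * len(COLUMNS)
print(f"{null_count} of {total_cells} cells are null")
```

In this toy table half of the cells are null, and every new kind of product adds more columns that the existing rows leave empty.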
So this is where NoSQL can come in handy, because when things are held in a JSON,
we have much more flexibility with what actually needs
to be held, and the schema isn't as strict, as in we don't have to come up with a
set list of columns that always have to be written to or accounted for.
So with a JSON, basically what you would do is just have your intro information
on the product.
And then, because you have these variable differences, you might have your details
as just another sub-object, and that's where you would put them, and then you'd be able to close it all out.
And so this is a much more flexible model because this is the area here where you have your variation,
so that you can make sure that that variable data is maintained in the JSON.
But this is also much more maintainable overall. Go through the scenario of, let's say, a new product comes online.
Now in the relational scenario, you have to either build a whole new table or you have to build new columns.
In the NoSQL case, you have to do nothing.
You can still insert the JSON just like normal.
You're just gonna change the product name to whatever this new product is,
and you have enough flexibility that you can just add in these details as needed.
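The product document described here can be sketched as follows. This is an illustration only: a plain Python list stands in for a collection, and the field names (`name`, `manufacturer`, `details`) and the products themselves are assumptions, not taken from the talk or from any specific database.

```python
import json

# Stand-in for a collection: a plain list of JSON-like documents.
products = []


def insert_product(collection, name, manufacturer, details):
    """Insert one product document; the variable attributes live under 'details'."""
    doc = {"name": name, "manufacturer": manufacturer, "details": details}
    collection.append(doc)
    return doc


insert_product(products, "Laptop", "Acme", {"screen_size": 14, "battery_hours": 10})
insert_product(products, "Shirt", "Threadly", {"fabric": "cotton"})

# A new kind of product comes online: no new table and no new columns,
# just another document with different details.
insert_product(products, "Kettle", "BoilCo", {"capacity_liters": 1.7, "cordless": True})

print(json.dumps(products[-1], indent=2))
```

The intro fields stay the same for every document, and all of the variation is confined to the `details` sub-object.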
Now, if you wanted to make certain fields required, you can make this as strict as you
want so that it basically could reflect the relational table, but then you lose some of that good flexibility.
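One way to picture making certain fields required is a small check that runs before a document is written. The sketch below is an application-side check in plain Python with a hypothetical list of required fields; many document databases offer their own schema validation features, which are not shown here.

```python
# Hypothetical choice of required fields; everything else stays optional.
REQUIRED_FIELDS = ("name", "manufacturer")


def validate_product(doc):
    """Raise ValueError if a required field is missing; otherwise return the document."""
    missing = [field for field in REQUIRED_FIELDS if field not in doc]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return doc


# Passes: required fields are present, details are free-form.
validate_product({"name": "Kettle", "manufacturer": "BoilCo", "details": {"cordless": True}})

# Fails: no manufacturer.
try:
    validate_product({"name": "Mystery item"})
except ValueError as error:
    print(error)
```

The more fields move into the required list, the closer the document gets to a fixed set of columns, which is the trade-off being described.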
So as you can see,
there are some situations where a NoSQL storage method
is really going to be more advantageous to use from a maintainability perspective alone.
Another great use case for NoSQL is when working with sensor data or the Internet of Things.
Let's imagine we have a system where we have tons of sensors
that are constantly pinging your system with some kind of update.
Maybe it's taking the temperature, maybe it's just checking a status, but regardless,
if you multiply that every so many seconds by 100 different sensors,
you're gonna have a lot of information constantly streaming in.
So how would this be solved in a relational database?
You would probably see just one row per transaction.
In this case, we're going to see an individual JSON that's going to come in
so that it basically will continue to be very optimized for your writes.
So you can get that data in very quickly.
So that's really going to help not only with the speed that things
need to operate at, but we can actually optimize our reads as well with NoSQL.
This is where we have the opportunity to actually group these.
So let's say, with our sensor data, we wanna look across time.
We can actually aggregate all these little JSONs as an array or a nested group
within one larger JSON that represents all the transactions that came in in the last hour.
So basically we can break this out in a couple different ways so that when we are doing our reads,
we don't have to iterate through millions of objects to check which timestamps match.
We can actually be much more focused and just pull one JSON that has
every single transaction that happened in the last hour, and that'd be a much more effective lookup
than going through and trying to find each individual row.
So this is actually going to help not only optimize our reads, but it's not going to compromise your writes at all.
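The hourly grouping can be sketched like this. It is one possible way to do it, not necessarily the speaker's: a Python dictionary stands in for the collection, and the key format, field names, and sample readings are all assumptions made for the example.

```python
from datetime import datetime, timezone

# Stand-in for a collection: bucket key -> one larger document per hour.
buckets = {}


def hour_key(ts):
    """Truncate a timestamp to the hour, e.g. '2024-01-01T09'."""
    return ts.strftime("%Y-%m-%dT%H")


def record_reading(sensor_id, temperature, ts):
    """Write path: append one small reading to the document for its hour."""
    key = hour_key(ts)
    bucket = buckets.setdefault(key, {"hour": key, "readings": []})
    bucket["readings"].append(
        {"sensor_id": sensor_id, "temperature": temperature, "ts": ts.isoformat()}
    )


def readings_for_hour(ts):
    """Read path: fetch every reading for an hour with a single key lookup."""
    bucket = buckets.get(hour_key(ts))
    return bucket["readings"] if bucket else []


record_reading("s-001", 21.5, datetime(2024, 1, 1, 9, 5, tzinfo=timezone.utc))
record_reading("s-002", 19.0, datetime(2024, 1, 1, 9, 40, tzinfo=timezone.utc))
record_reading("s-001", 22.1, datetime(2024, 1, 1, 10, 2, tzinfo=timezone.utc))

nine_oclock = readings_for_hour(datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc))
print(len(nine_oclock), "readings in the 09:00 bucket")
```

The read pulls one document by key instead of scanning every reading and comparing timestamps, while each write is still a single small append.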
And there's another great way to do this if you don't want to partition it or break it up by timestamps.
Another example is to break it up by categories.
You could do this by unique sensor, or let's say each sensor has an ID that's like a number.
If you want to just randomize the groups,
you can key each group on just the first two or three digits of the unique ID
so that you're not actually creating a separate group for every single sensor,
but for a group of sensors.
And that's another way that you can then tune it so you get the right grouping to optimize your reads.
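Grouping by the leading digits of a sensor ID might look like the sketch below. The IDs, the default prefix length of two, and the field names are hypothetical; the prefix length is the tuning knob the talk refers to.

```python
# Stand-in for a collection: group key -> one document per group of sensors.
groups = {}


def group_key(sensor_id, prefix_len=2):
    """Use the first few digits of the numeric sensor ID as the group key."""
    return str(sensor_id)[:prefix_len]


def record_status(sensor_id, status):
    """Append a status update to the document for the sensor's group."""
    key = group_key(sensor_id)
    group = groups.setdefault(key, {"group": key, "readings": []})
    group["readings"].append({"sensor_id": sensor_id, "status": status})


record_status(10432, "ok")
record_status(10977, "ok")
record_status(25001, "offline")

for key, group in sorted(groups.items()):
    print(key, len(group["readings"]))
```

A longer prefix gives more, smaller groups and a shorter prefix gives fewer, larger ones, so the prefix length can be adjusted until the groups match how the data is read.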
All right, let's keep thinking about ways that we can use NoSQL databases to help us support modern data workloads.
So here's a common example: social media posts.
Think of this as any kind of situation where, with a relational database,
you might see what we call a one-to-many relationship.
So that's usually where you have one parent table,
and then it has a child table, and it's usually depicted in this way if you've ever seen an architectural diagram for a database.
So that doesn't really exist within NoSQL.
It can to some degree, but generally what we wanna do first is we wanna nest certain objects.
So let's think about a social media post.
There are usually comments, right?
And those comments can only relate to one post.
So that relationship is pretty well defined.
And why might we want to actually nest the comments in with the post?
Because they're displayed together.
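As a rough illustration of nesting, here is a post document with its comments embedded, written as a plain Python dictionary. The field names and content are invented for the example.

```python
# One document holds the post and the comments that belong to it.
post = {
    "post_id": "p1",
    "author": "alice",
    "text": "First post",
    "comments": [
        {"author": "bob", "text": "Nice!"},
        {"author": "carol", "text": "Welcome."},
    ],
}


def render(post):
    """Build the display text from a single document, with no join step."""
    lines = [f'{post["author"]}: {post["text"]}']
    lines += [f'  {comment["author"]}: {comment["text"]}' for comment in post["comments"]]
    return "\n".join(lines)


rendered = render(post)
print(rendered)
```

Everything needed to display the post arrives in one read, which is the reason for nesting data that is shown together.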
So in the world of relational databases, you're going to be constantly joining between these two tables.
Now, relational databases are built for joins, so there really isn't going to be a problem with that.
However, there might be a better way to do this with NoSQL.
With one-to-many relationships, nesting is fine if it's a social media post by you or by me,
and we don't get that many comments.
There's only a couple, and generally with this you want to make sure your JSON doesn't get blown out.
But what happens when there's 500 comments? In the case of a relational database,
you just continue to have this one-to-many relationship,
but you're gonna run into trouble here with your NoSQL solution.
So what we can actually do here is we can build what's called an expansion set of the comments.
So this is just an example of one JSON,
but we'll be creating a new one just for a subset of the comments. So, we can call this, like, overflow.
And from here, you would just then have the exact same structure.
So we'd have our comments
just like before.
And basically from here, we can actually have different amounts of comments.
So you can kind of decide where this threshold really is going to be.
For example, let's say we don't want to display more than 50 comments at once.
And anything over 50, so 51, that's where you're going to create this overflow object,
so that we can basically move them over here.
And this allows you to optimize most posts because they aren't going to be hitting this overflow amount,
but we have a control for those exceptional cases where we see a lot.
And this is going to help optimize not only the reads but also the writes as well.
Now it will have to do a little bit more logic in the writes to kind of ask, "Oh, am I at 50?
Is it 51?" to decide whether or not it goes here,
but that's kind of a small price to pay to actually have the optimization that's needed
to allow for this kind of really fast readability that you're going to get out of this object.
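The write-side check can be sketched as below. The threshold of 50 comes from the talk; everything else is an assumption. In particular, the overflow is modeled here as a separate document linked by `post_id`, which is one reading of the description, and it could equally be kept as a nested sub-object.

```python
COMMENT_THRESHOLD = 50  # the example threshold from the talk

posts = {}      # post_id -> post document
overflows = {}  # post_id -> overflow document (hypothetical separate collection)


def create_post(post_id, text):
    """Create a post with an empty nested comment list."""
    posts[post_id] = {"post_id": post_id, "text": text, "comments": []}


def add_comment(post_id, comment):
    """Nest the comment in the post until the threshold, then use the overflow."""
    post = posts[post_id]
    if len(post["comments"]) < COMMENT_THRESHOLD:
        post["comments"].append(comment)
        return
    # Same structure as the post's comment list, just held elsewhere.
    overflow = overflows.setdefault(post_id, {"post_id": post_id, "comments": []})
    overflow["comments"].append(comment)


create_post("p1", "Hello")
for i in range(53):
    add_comment("p1", {"author": f"user{i}", "text": f"comment {i}"})

print(len(posts["p1"]["comments"]), len(overflows["p1"]["comments"]))
```

Posts that never reach the threshold never get an overflow document at all, so the common case stays a single small document.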
So then, thinking about it in another way, maybe we don't want to display comments in chronological order,
but we want to actually display the top three ranked.
You can actually do this in the exact same way.
You would just basically limit
these to only showing three instead of 50,
and then you can take anything that isn't in those top three and push it to this overflow.
Again, this is going to seriously optimize your reads because then the actual post JSON is going to remain very, very small,
and it basically prevents you from having to read every single comment every single time you display these posts
and then recalculate what the top three ranked comments are based on whatever algorithm you have.
So by having this basically pre-filtered and pre-ranked or sorted, you're going to get a lot of benefit.
Again, the complexity here comes in your writes: every time
a new comment comes in, you'll have to recalculate that ranking.
But if you do the equation of how often am I writing a comment versus reading one,
because the read volume is so much higher and more important,
it's a, you know, it's a fine trade-off.
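A sketch of keeping only the top three comments on the post is shown below. The `score` field is a made-up placeholder for whatever ranking algorithm is in use, and the sketch only handles new comments arriving; it does not cover the case where the score of an existing overflow comment changes later.

```python
TOP_N = 3  # show three comments on the post, as in the talk's example


def add_ranked_comment(post, overflow, comment):
    """Re-rank on write: keep the best TOP_N on the post, push the rest to overflow."""
    candidates = post["top_comments"] + [comment]
    # 'score' is a hypothetical stand-in for the real ranking algorithm.
    candidates.sort(key=lambda c: c["score"], reverse=True)
    post["top_comments"] = candidates[:TOP_N]
    overflow["comments"].extend(candidates[TOP_N:])


post = {"post_id": "p1", "text": "Hello", "top_comments": []}
overflow = {"post_id": "p1", "comments": []}

for score in (5, 1, 9, 7, 3):
    add_ranked_comment(post, overflow, {"text": f"scored {score}", "score": score})

print([c["score"] for c in post["top_comments"]])
print([c["score"] for c in overflow["comments"]])
```

Each write sorts at most four comments, while every read gets the pre-ranked top three directly from the post document.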
Now, lastly in this example, let's think about maybe we want some kind of summary statistics
because, similar to what I was saying with the ranking, we don't want to have to calculate that every time.
So maybe like a total, you know, comment count would be needed.
So that might be something that we end up adding over here.
So, you know, let's say
there's 4,200 different comments that are written out.
And from there, we can do that math ahead of time every time we add a comment.
We can just basically add a plus 1 every time that we add one in,
and that's going to help us.
And by pre-writing out all these calculations,
they can get displayed very easily, and no recalculations really need to be done.
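The running count can be sketched in a few lines. The `comment_count` field name is an assumption, and the sketch is single-threaded in-memory Python; with a real database and concurrent writers, an atomic increment operation would likely be the safer way to apply the plus 1.

```python
post = {"post_id": "p1", "text": "Hello", "comments": [], "comment_count": 0}


def add_comment(post, comment):
    """Store the comment and bump the precomputed total in the same write."""
    post["comments"].append(comment)
    post["comment_count"] += 1


for text in ("first", "second", "third"):
    add_comment(post, {"text": text})

# Display path: read the stored number instead of counting comments.
print(post["comment_count"])
```

The stored total stays useful even when most comments live in an overflow document, because displaying it never requires loading them.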
So from there, you can see how we're able to do these kind of optimizations
so that we're well aligned to the kind of high demand workflows that we are supporting in our modern applications.
Today, we've demonstrated the strength of NoSQL and that the strength here really lies in its flexibility.
Variable data models, nested objects, and efficient grouping strategies allow NoSQL to really shine.
These patterns allow you to build fast, scalable applications that adapt to dynamic, real-time data.
Mastering these strategies means you're not just reacting to data, you're staying ahead of it.
So I encourage you to keep refining these skills because the future of data engineering,
software development, and data science will depend on building systems that move as fast as the data itself.