Elasticsearch: Scalable Distributed JSON Database

9m • databases • tutorial • intermediate

Key Points

Elasticsearch is a distributed, NoSQL JSON‑based datastore that scales automatically and continuously ingests large volumes of data.
It is accessed via a RESTful API, allowing you to create indexes, query, and manage data entirely through HTTP calls.
Common use cases include aggregating logs, metrics, and application tracing data into searchable JSON documents for real‑time retrieval.
Compared to relational databases, Elasticsearch uses “indexes” or “indices” (instead of databases), “index patterns” (called “types” in some earlier versions) instead of tables, “documents” (instead of rows), and “fields” (instead of columns).
This terminology shift reflects its schema‑flexible nature and highlights how Elasticsearch differs from traditional RDBMSs like MySQL or PostgreSQL.
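To make the REST-based access and the terminology mapping concrete, here is a minimal Python sketch that builds (but does not send) three HTTP requests: create an index, add a document, and search it. The host `http://localhost:9200`, the index name `app-logs`, and the field names are assumptions chosen for illustration; a real cluster may also require HTTPS and authentication.

```python
import json
from urllib.request import Request

# Assumption: a local, unsecured node on Elasticsearch's default HTTP port.
BASE_URL = "http://localhost:9200"

# Relational term -> Elasticsearch term, as listed in the key points.
TERMINOLOGY = {
    "database": "index",
    "table": "index pattern (type in earlier versions)",
    "row": "document",
    "column": "field",
}


def build_request(method, path, body=None):
    """Build (but do not send) a JSON-over-HTTP request for Elasticsearch."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data else {}
    return Request(BASE_URL + path, data=data, headers=headers, method=method)


# "app-logs" and the field names below are hypothetical.
create_index = build_request("PUT", "/app-logs")
index_document = build_request(
    "POST",
    "/app-logs/_doc",
    {"level": "error", "message": "disk full", "host": "web-1"},
)
search = build_request(
    "POST",
    "/app-logs/_search",
    {"query": {"match": {"message": "disk"}}},
)

# Show the method and URL of each request that would be sent.
for req in (create_index, index_document, search):
    print(req.get_method(), req.full_url)
```

Passing any of these objects to `urllib.request.urlopen` would send it to a running node; the sketch stops short of that so it can be read and run without a cluster.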

Sections

00:00:00 Introducing Elasticsearch: Scalable NoSQL Store - IBM developer advocate Jamil Spang outlines Elasticsearch as a distributed, JSON‑based NoSQL datastore, contrasts it with relational systems, highlights its RESTful API, and cites common use cases such as log aggregation, metric collection, and application tracing.

Full Transcript

**Source:** [https://www.youtube.com/watch?v=ZP0NmfyfsoM](https://www.youtube.com/watch?v=ZP0NmfyfsoM)
**Duration:** 00:09:54

0:00 Would you believe me if I told you that there's a database out there that can continuously handle large volumes of information, scale automagically, and stay available to keep taking on data? Hello, my name is Jamil Spang, developer advocate with IBM, and today's topic is the answer to that: Elasticsearch.

0:24 It's a great database, a great data store, and I want to talk a little more about some of its characteristics. We're going to compare it to a relational database management system and then talk about the ecosystem that comes with it.

0:39 So to get started, let's talk about what Elasticsearch is exactly. First, it is distributed in nature, and it is a NoSQL, JSON-based datastore. We're going to abbreviate that with "DS" there as well. On the spectrum of where databases fall, with Postgres and MySQL being the most structured type of databases, I'd put this on the outer sphere, past MongoDB, when it comes to how unstructured and NoSQL it can be.

1:18 When it comes to interacting with Elasticsearch, interestingly enough, it's done through a RESTful API. So all your queries happen that way, you program all your indexes that way, and pretty much anything you need to do to interact with it would be through REST URLs.

1:45 A lot of the major use cases for this: you can take many different data sources, from logs, to any type of metrics you have from different systems, and maybe even some application trace data that comes in, and you can have one system that combines all of this. Think about data coming from all these different sources, and it being able to push them into JSON documents and then give you the ability to search and get that information back in real time.

2:23 So it sounds like a big job that it has to do. Let's start from what we know of relational databases, to see how that compares and how the lingo and the context change. We know that with relational database management systems, they are called databases, and in Elasticsearch these are known as indexes, or indices (i-n-d-i-c-e-s).

3:03 Also, in a relational database we have the term tables. Here they could be called index patterns, and in some of the earlier versions they were known as types. So a relational database has many tables, and the obvious next ones we're going to look at (I'm going to put both of these down as we're getting to the bottom of the screen here) are rows and columns.

3:43 Okay, let me get my other marker here. Rows, just like we know from most NoSQL data sources, are going to be documents. And normally in a relational database, you have tables, you have the rows, and then individual columns; these are going to be called fields.

4:08 So that's just a quick comparison. If you have a lot of familiarity with relational databases like MySQL or Postgres, this is a way to transition your understanding and know how things map together. When you start planning out your structure, these are things you need to consider: how you can translate that over.

4:29 So we know that it's a JSON-based data store, you're going to interact with it with REST, and it's very powerful: it has the capability to ingest data from many, many data sources and scale out. If I think about the CAP theorem concepts, I would probably put this on A and P, for availability and partition tolerance, already built in. And depending on how you want to configure it, you could probably achieve some different levels of consistency as well.

4:59 But let's move on to the whole ecosystem. You hear the name Elasticsearch out there, but often you will hear this term: ELK, the ELK stack. This is how you will hear it referenced. I think the easiest way to break down how the stack works is to diagram it out, and then we'll talk about each component and the place that it fits. That would be a great way to really help understand this.

5:27 So let's put Elasticsearch, which I'm going to abbreviate as ES, in the center of everything here. The K is for Kibana, and Kibana is a web-based UI. This will be how you actually interact with a lot of the data that Elasticsearch prepares and indexes for you to use. You can build your dashboard, and you can build different widgets or visualizations that continuously update as data comes in. So this could really be your main interface that you use to keep looking at your data as it flows in.

6:29 Now let's talk about the other side. We talked about the output: we have this great data store, Elasticsearch, and we're going to be visualizing things with Kibana, our gateway to view our data and how things are running. Now let's talk about how data gets in, and there are two parts that I would like to talk about here. We have something called Logstash, and you'll also hear something called Beats.

7:00 For Logstash, think of this as (well, it actually is) an open source, server-side processing pipeline. Its main job is to take data in (input data from many different sources), transform that data, and then, as we like to call it so eloquently, stash it somewhere.

7:35 Now the inputs can be from a variety of things. You can actually just put it in a format; most of the time you can add SDKs or things to your code or different systems, and they push the data into Logstash. Transformations may be to do some formatting on the data, minor structuring before it comes in, if you would like. And then you can output that, to stash it somewhere. You can imagine one of the first plugins that are there is Elasticsearch. So let's complete our triangle here: we'll go from Logstash into Elasticsearch, and so you can continuously feed things in.

8:16 Now we mentioned Beats. Unlike the headphones, these Beats are set up to be agents on different servers. So say you have something in serverless, or you have some files that you want to process, or maybe something on Windows Server. It's a complementary component that's very Logstash in nature, but it has plug-ins to many other services, and one of its outputs is to go directly into Logstash. So collectively you're building this consistent pipeline that keeps going. As you visualize, you can program more things to come in and keep this circular nature flowing.

9:05 Now, this can scale up to massive amounts of information and nodes; it's really already set up to be distributed in nature and handle a variety of scenarios. But one great thing is there are containers available, so you can set up this complete infrastructure all on your laptop to test things out on a much smaller scale and have it grow to a much larger scale, effectively making it a great component in your architecture to be how you visualize the data that will be in the data lake that you are building.

9:40 Thank you very much for your time. If you have any questions, please drop us a line below, and if you want to see more videos like this in the future, please like and subscribe.
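
The Logstash flow described in the transcript (take data in, transform it, stash it somewhere) can be sketched in plain Python. This is only an illustration of the three stages, not Logstash's configuration language or API: the log line format, the function names, and the in-memory list standing in for Elasticsearch are all assumptions.

```python
import json
import re

# Hypothetical log format: "<timestamp> <LEVEL> <message>".
LINE_PATTERN = re.compile(r"^(?P<timestamp>\S+) (?P<level>[A-Z]+) (?P<message>.*)$")


def read_input(lines):
    """Input stage: yield non-empty raw events from any iterable source."""
    for line in lines:
        stripped = line.strip()
        if stripped:
            yield stripped


def transform(raw_events):
    """Transform stage: give each raw line some minor structure."""
    for raw in raw_events:
        match = LINE_PATTERN.match(raw)
        if match:
            yield match.groupdict()
        else:
            # Keep unparsed lines, tagged, rather than dropping them.
            yield {"message": raw, "tags": ["parse_failure"]}


def stash(documents, store):
    """Output stage: append JSON documents to a store.

    A plain list stands in for Elasticsearch here.
    """
    for doc in documents:
        store.append(json.dumps(doc))
    return store


sample_lines = [
    "2024-01-01T00:00:00Z ERROR disk full",
    "",
    "not a structured line",
]
stored = stash(transform(read_input(sample_lines)), [])
for doc in stored:
    print(doc)
```

Because each stage is a generator feeding the next, events flow through one at a time, which loosely mirrors the continuous feed into Elasticsearch that the talk describes.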