Elasticsearch: Scalable Distributed JSON Database
Key Points
- Elasticsearch is a distributed, NoSQL JSON‑based datastore that scales automatically and continuously ingests large volumes of data.
- It is accessed via a RESTful API, allowing you to create indexes, query, and manage data entirely through HTTP calls.
- Common use cases include aggregating logs, metrics, and application tracing data into searchable JSON documents for real‑time retrieval.
- Compared to relational databases, Elasticsearch uses “indexes” (instead of databases), “index patterns” or “types” (instead of tables), “documents” (instead of rows), and “fields” (instead of columns).
- This terminology shift reflects its schema‑flexible nature and highlights how Elasticsearch differs from traditional RDBMSs like MySQL or PostgreSQL.
**Source:** [https://www.youtube.com/watch?v=ZP0NmfyfsoM](https://www.youtube.com/watch?v=ZP0NmfyfsoM)
**Duration:** 00:09:54
Sections
- [00:00:00](https://www.youtube.com/watch?v=ZP0NmfyfsoM&t=0s) **Introducing Elasticsearch: Scalable NoSQL Store** - IBM developer advocate Jamil Spain outlines Elasticsearch as a distributed, JSON‑based NoSQL database, contrasts it with relational systems, highlights its RESTful API, and cites common use cases such as log aggregation, metric collection, and application tracing.
Full Transcript
Would you believe me if I told you that there's a database out there that can continuously handle large volumes of information, scale automagically, and stay available to keep on taking in data? Hello, my name is Jamil Spain, developer advocate with IBM, and today's topic is the answer to that: Elasticsearch.
All right, it's a great data store, and I want to talk a little more about some of its characteristics. We're going to compare it to a relational database management system and then talk about the ecosystem that comes with it.
So to get started, what is Elasticsearch exactly? Well, first, it is distributed in nature, and it is a NoSQL, JSON-based datastore (we're going to abbreviate that as DS). On the spectrum of where databases fall, with Postgres and MySQL being the most structured type of databases, I'd put this on the outer edge, past MongoDB, when it comes to how unstructured and NoSQL it can be.
When it comes to interacting with Elasticsearch, interestingly enough, it's done through a RESTful API. All your queries happen that way, you programmatically set up all your indexes, and pretty much anything else you need to do to interact with it goes through REST URLs.
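Editor's sketch: to make "everything goes through REST URLs" concrete, the following Python builds (but does not send) three HTTP requests against Elasticsearch's index, document, and search endpoints. The base URL and the index name `app-logs` are assumptions for illustration, not something named in the talk, and a real cluster would normally also require authentication.

```python
import json
from typing import Optional
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumption: a local, unsecured test node
INDEX = "app-logs"                  # hypothetical index name


def es_request(method: str, path: str, body: Optional[dict] = None) -> Request:
    """Build (but do not send) an HTTP request for an Elasticsearch REST endpoint."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return Request(f"{BASE_URL}{path}", data=data, headers=headers, method=method)


# Create an index: PUT /<index>
create_index = es_request("PUT", f"/{INDEX}")

# Add a JSON document: POST /<index>/_doc
add_document = es_request(
    "POST", f"/{INDEX}/_doc", {"level": "error", "message": "disk full"}
)

# Search the index: POST /<index>/_search with a query body
search = es_request(
    "POST", f"/{INDEX}/_search", {"query": {"match": {"message": "disk"}}}
)

for req in (create_index, add_document, search):
    print(req.get_method(), req.full_url)
```

Passing any of these objects to `urllib.request.urlopen` would send it; the sketch stops short of that so it runs without a cluster.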
A lot of the major use cases for this involve taking in many different data sources: logs, any type of metrics you have from different systems, and maybe even some application trace data. You can have one system that combines all of this. Think about data coming in from all these different sources, being pushed into JSON documents, and then being searchable so you can get that information back in real time.
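Editor's sketch: the idea of combining logs, metrics, and trace data in one system can be pictured as wrapping each record in a common JSON document shape. The field names below (`@timestamp`, `source`, and the payload keys) are illustrative assumptions, not a schema given in the talk.

```python
import json
from datetime import datetime, timezone


def to_document(source: str, payload: dict, timestamp: datetime) -> str:
    """Wrap a record from any source in one common JSON document shape."""
    doc = {
        "@timestamp": timestamp.isoformat(),
        "source": source,  # "log", "metric", or "trace" in this sketch
        **payload,
    }
    return json.dumps(doc, sort_keys=True)


# Fixed timestamp so the example is repeatable.
now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)

documents = [
    to_document("log", {"level": "error", "message": "payment service timeout"}, now),
    to_document("metric", {"host": "web-1", "cpu_percent": 91.5}, now),
    to_document("trace", {"trace_id": "abc123", "span": "checkout", "duration_ms": 840}, now),
]

for line in documents:
    print(line)
```

Because every document carries the same timestamp and source fields, records from very different systems can sit in one store and be searched together.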
So it sounds like a big job that it has to do. Let's compare it with what we know from relational databases to see how the lingo and the context change. In relational database management systems, the top-level containers are called databases. In Elasticsearch these are known as indexes, or indices (i-n-d-i-c-e-s).
In a relational database we also have tables. In Elasticsearch these could be called index patterns, and in some of the earlier versions they were known as types.
All right, so a relational database has many tables, and the obvious next things to look at are rows and columns. I'm going to put both of these down, as we're getting to the bottom of the screen here. Rows, just like in most NoSQL data stores, are going to be documents. And where a relational table has rows with individual columns, here the columns are going to be called fields.
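Editor's sketch: the terminology mapping described above can be written down as a small lookup table, with one relational row translated into a document whose fields are the column names. The `users` columns and values are hypothetical.

```python
import json

# Relational term -> Elasticsearch term, as described in the talk.
TERMINOLOGY = {
    "database": "index",
    "table": "index pattern (formerly type)",
    "row": "document",
    "column": "field",
}


def row_to_document(columns: list, row: tuple) -> dict:
    """Turn one relational row into a document whose fields are the column names."""
    if len(columns) != len(row):
        raise ValueError("row and column list must have the same length")
    return dict(zip(columns, row))


# Hypothetical row from a `users` table: (id, name, email)
columns = ["id", "name", "email"]
row = (42, "Ada", "ada@example.com")

document = row_to_document(columns, row)
print(json.dumps(document))
```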
So that's just a quick comparison. If you have a lot of familiarity with relational databases like MySQL or Postgres, this is a way to transition your understanding and know how things map together. When you start planning out your structure, these are the things you need to consider in translating it over.

So we know that it's a JSON-based data store and that you're going to interact with it over REST. It's very powerful: it has the capability to ingest data from many, many data sources and to scale out. If I think about the CAP theorem, I would probably put this on the A and the P, for availability and partition tolerance, which are already built in. Depending on how you configure it, you could probably achieve some different consistency guarantees as well. But let's move on to the whole ecosystem.
You hear the name Elasticsearch out there, but often you will hear the term ELK, or the ELK Stack. That's how you will hear it referenced. I think the easiest way to break down how the stack works is to diagram it out and then talk about each component and the place where it fits. That would be a great way to really help understand this.
So let's put Elasticsearch, which I'm going to abbreviate as ES, in the center of everything here. The K is for Kibana, and Kibana is a web-based UI. This is how you actually interact with a lot of the data that Elasticsearch prepares and indexes for you to use. You can build your dashboard, and you can build different widgets or visualizations that continuously update as data comes in. So this could really be your main interface for looking at your data as it flows in.

Now let's talk about the other side. We talked about the output: we have this great data store, Elasticsearch, and we're going to be visualizing things with Kibana, our gateway to view our data and how things are running. Now let's talk about how data gets in. There are two parts I would like to talk about here: we have something called Logstash, and you'll also hear about something called Beats.
For Logstash, think of this as (well, it actually is) an open-source, server-side processing pipeline. Its main job is to take input data from many different sources, transform that data, and then, as we like to put it so eloquently, stash it somewhere. The inputs can come from a variety of things. You can just send data in a given format; most of the time you can add SDKs or other things to your code or to different systems, and they push the data into Logstash. Transformations may do some formatting on the data, some minor structuring before it comes in, if you would like. And then you can output that, to stash it somewhere. As you can imagine, one of the first output plugins available is Elasticsearch. So let's complete our triangle here: we go from Logstash into Elasticsearch, and so you can continuously feed things in.
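Editor's sketch: the input, transform, and stash stages described above can be pictured as three chained generator functions. This is a conceptual illustration in plain Python, not Logstash's own configuration language or API; the `LEVEL text` line format and the in-memory list standing in for Elasticsearch are assumptions.

```python
import json
from typing import Callable, Iterable, Iterator


def read_input(lines: Iterable[str]) -> Iterator[dict]:
    """Input stage: turn raw lines into events, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield {"message": line}


def transform(events: Iterable[dict]) -> Iterator[dict]:
    """Transform stage: split 'LEVEL text' into structured fields."""
    for event in events:
        level, _, text = event["message"].partition(" ")
        yield {"level": level.lower(), "message": text}


def stash(events: Iterable[dict], sink: Callable[[str], None]) -> int:
    """Output stage: serialize each event to JSON and hand it to the sink."""
    count = 0
    for event in events:
        sink(json.dumps(event))
        count += 1
    return count


raw_lines = ["ERROR disk full", "", "INFO backup finished"]
stored = []  # stands in for Elasticsearch in this sketch
count = stash(transform(read_input(raw_lines)), stored.append)
print(count, stored)
```

Because each stage only consumes and yields events, swapping the sink for something that posts to Elasticsearch would not change the input or transform stages.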
Now, we mentioned Beats. Unlike the headphones, these Beats are set up to be agents on different servers. So say you have something running in serverless, or you have some files you want to collect, or maybe something on Windows Server. It's a complementary component that's very Logstash-like in nature, but it has plug-ins for many other services, and one of its outputs goes directly into Logstash. So collectively you're building this consistent pipeline that keeps going. As you visualize, you can program more things to come in and keep this circular flow going.
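Editor's sketch: the idea of a lightweight agent sitting on a server and forwarding new data can be illustrated with a small class that remembers how far it has read in a log and ships only complete new lines. This is a conceptual stand-in, not how Beats is implemented; `FileShipper`, the host name `web-1`, and the in-memory log are hypothetical.

```python
import io
from typing import Callable, TextIO


class FileShipper:
    """Remembers its position in a stream and ships only lines added since the last poll."""

    def __init__(self, stream: TextIO, forward: Callable[[dict], None], host: str):
        self.stream = stream
        self.forward = forward  # e.g. a function that sends the event to a pipeline
        self.host = host
        self.offset = 0

    def poll(self) -> int:
        """Ship every complete new line; return how many were shipped."""
        self.stream.seek(self.offset)
        shipped = 0
        while True:
            line = self.stream.readline()
            if not line.endswith("\n"):
                break  # nothing new, or a partial line: retry on the next poll
            self.offset = self.stream.tell()
            self.forward({"host": self.host, "message": line.rstrip("\n")})
            shipped += 1
        return shipped


log = io.StringIO()  # stands in for a log file on the server
received = []        # stands in for the downstream pipeline
shipper = FileShipper(log, received.append, host="web-1")

log.write("first line\n")
first = shipper.poll()

log.seek(0, io.SEEK_END)
log.write("second line\npartial")
second = shipper.poll()  # "partial" waits until its newline arrives

print(first, second, received)
```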
Now, this can scale up to massive amounts of information and nodes; it's already set up to be distributed in nature and to handle a variety of scenarios. But one great thing is that there are containers available, so you can set up this complete infrastructure on your laptop to taste-test things on a much smaller scale and then have it grow to a much larger scale. That effectively makes it a great component in your architecture for visualizing the data in the data lake that you are building.
Thank you very much for your time. If you have any questions, please drop us a line below, and if you want to see more videos like this in the future, please like and subscribe.