Learning Library

Ensuring Consistent Distributed Data with etcd

Key Points

  • etcd is an open‑source, fully replicated key‑value store that acts as the single source of truth for Kubernetes state, configuration, and metadata.
  • It achieves strong consistency by using the Raft consensus algorithm: a leader node coordinates writes and commits each one only after a majority of the cluster's nodes have persisted the change.
  • Clients can send read and write requests to any cluster node; a follower that receives a read consults the leader so that the most up‑to‑date value is returned.
  • The system is highly available—if the leader fails, remaining nodes hold an election to promptly select a new leader, allowing the cluster to continue operating without a single point of failure.
  • This replication and consensus design enables etcd to provide reliable, low‑latency data storage across distributed environments.
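
The "majority" in the points above comes down to simple arithmetic. The following is a minimal sketch of that arithmetic only; the function names `quorum` and `tolerated_failures` are illustrative and are not part of etcd.

```python
def quorum(cluster_size: int) -> int:
    """Smallest number of nodes that forms a majority of the cluster."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """How many nodes can be lost while a majority is still reachable."""
    return cluster_size - quorum(cluster_size)


if __name__ == "__main__":
    for n in (1, 3, 4, 5):
        print(f"{n} nodes: quorum={quorum(n)}, "
              f"tolerated failures={tolerated_failures(n)}")
```

Under this arithmetic, a four-node cluster like the one used in the video needs three nodes for a majority and tolerates one failure, the same as a three-node cluster.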

Full Transcript

**Source:** [https://www.youtube.com/watch?v=OmphHSaO1sE](https://www.youtube.com/watch?v=OmphHSaO1sE)
**Duration:** 00:06:19

## Sections

- [00:00:00](https://www.youtube.com/watch?v=OmphHSaO1sE&t=0s) **Untitled Section**
- [00:03:23](https://www.youtube.com/watch?v=OmphHSaO1sE&t=203s) **etcd: Consistent, Highly Available Store** - The speaker explains that etcd provides strong consistency, automatic leader election for high availability, fast writes constrained by disk speed, TLS-secured communication, a simple HTTP/JSON API, and a watch feature that Kubernetes uses to detect state drift.

## Full Transcript
0:00 How can you ensure that your data is stored consistently and reliably across a distributed system? My name is Whitney Lee and I'm a Cloud Developer here at IBM.

0:09 etcd is an open source key-value data store used to manage and store data that help keep distributed systems running. etcd is most well known for being one of the core components of Kubernetes, where it stores and manages Kubernetes state data, configuration data, and metadata. etcd can be relied upon to be a single source of truth at any given point in time.

0:41 Today I'm going to go over some of the features of etcd that allow it to be so effective in this way.

0:48 etcd is fully replicated. This means that every node in an etcd cluster has access to the full data store.

0:59 etcd is also reliably consistent. Every data read in an etcd cluster is going to return the most recent data write. Let's talk about how this works.

1:15 etcd is built on top of the Raft algorithm that is used for distributed consensus. So, let's make a very simple etcd cluster of only four nodes. An etcd cluster always has a leader, and then the other nodes in the cluster are followers. It's a key-value data store, so in this case at key one we have the value of seven.

1:44 Let's say a web application comes in and lets the leader node know at key one we want to store the value of 17 instead of 7. The leader node does not change its own local data store; instead it forwards that request to each of the followers. When a follower changes its local data store, it returns that to the leader, so the leader knows. When our leader node can see that the majority of the nodes have been updated to the most current data, that's when the leader will update its own current data store and return a successful write to the client.

2:29 Now the client doesn't actually have to concern itself about which node in the cluster is the leader.
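
The write path described above (change key one from 7 to 17, report success only once a majority holds the change) can be sketched as a toy model. This is a simplified illustration under stated assumptions, not etcd's or Raft's actual implementation: it ignores logs, terms, and the rollback of uncommitted changes, and the names `Node`, `replicate`, and `leader_write` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    reachable: bool = True
    store: dict = field(default_factory=dict)

    def replicate(self, key: str, value: int) -> bool:
        """Persist the change locally and acknowledge it, if reachable."""
        if not self.reachable:
            return False
        self.store[key] = value
        return True


def leader_write(leader: Node, followers: list[Node], key: str, value: int) -> bool:
    """Return True only if a majority of the whole cluster holds the change."""
    cluster_size = len(followers) + 1
    quorum = cluster_size // 2 + 1
    # Assumption of this toy model: the leader counts itself toward the
    # majority, and each reachable follower adds one acknowledgement.
    acks = 1 + sum(f.replicate(key, value) for f in followers)
    if acks < quorum:
        return False  # too few acknowledgements: the write is not reported as successful
    leader.store[key] = value  # the leader updates its own store last
    return True


if __name__ == "__main__":
    leader = Node("n1", store={"key1": 7})
    followers = [Node(f"n{i}", store={"key1": 7}) for i in (2, 3, 4)]
    followers[2].reachable = False  # one of four nodes is down
    print(leader_write(leader, followers, "key1", 17))
    print(leader.store)
```

With one of the four nodes unreachable, three acknowledgements still form a majority, so the write succeeds; with two unreachable, it would not.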
The client can make read and write requests to any node in the cluster.

2:40 So, let's say, this all happens over a matter of milliseconds, but let's say that the client makes a read request to the node that hasn't updated yet and says, what's the value at key one? Well, this follower node knows it's a follower node and knows it's not authorized to answer the client directly. So what it's going to do is forward that request to the leader node, which will then respond that the cluster's current value at key 1 is 17. And so it will get a response of 17 to the client.

3:14 And that's how etcd is replicated. So every node in the cluster has access to the full data store, and it's consistent: every data read is going to return the most recent data write.

3:32 etcd is also highly available. This means that there's no single point of failure in the etcd cluster. It can gracefully tolerate network partitions and hardware failure too. So, let's say that our leader node goes down. The followers can declare themselves a candidate; they'll hold an election where each one votes based on availability, and a new node will be elected the leader. That leader will go on to manage the replication for the cluster, and the data is unaffected.

4:18 etcd is also fast. etcd is benchmarked at 10,000 writes per second. With that said, etcd does persist data to disk. So, etcd's performance is tied to your storage disk speed.

4:37 etcd is secure. etcd uses transport layer security with optional SSL client certificate authentication. etcd stores vital and highly sensitive configuration data, so it's important to keep it protected.

4:53 Finally, etcd is simple to use. A web application can read and write data to etcd using simple HTTP JSON tools.

5:12 So the other thing to talk about in etcd that's important is the watch function. Kubernetes leverages this.
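
To make the "simple HTTP JSON tools" remark concrete, here is a hedged sketch that only builds request URLs and JSON bodies without sending them. It assumes a recent etcd v3 release whose JSON gateway serves `/v3/kv/put` and `/v3/kv/range` and expects base64-encoded keys and values; the video may have had a different API version in mind. The address `http://localhost:2379` and the helper names are illustrative.

```python
import base64
import json
from typing import Tuple


def b64(text: str) -> str:
    """Base64-encode a string, as the JSON gateway expects for keys and values."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def put_request(base_url: str, key: str, value: str) -> Tuple[str, str]:
    """Return (url, json_body) for storing value at key."""
    body = json.dumps({"key": b64(key), "value": b64(value)})
    return f"{base_url}/v3/kv/put", body


def range_request(base_url: str, key: str) -> Tuple[str, str]:
    """Return (url, json_body) for reading the value stored at key."""
    body = json.dumps({"key": b64(key)})
    return f"{base_url}/v3/kv/range", body


if __name__ == "__main__":
    # Hypothetical local endpoint; nothing is sent over the network here.
    url, body = put_request("http://localhost:2379", "key1", "17")
    print("POST", url, body)
    url, body = range_request("http://localhost:2379", "key1")
    print("POST", url, body)
```

Each pair could then be sent as an HTTP POST with any client, for example `urllib.request` from the standard library.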
5:19 So, as I talked about at the beginning, etcd stores Kubernetes configuration data and its state data. So, etcd can use this watch function to compare these to each other. If they ever go out of sync, etcd will let the Kubernetes API know, and the Kubernetes API will reconfigure the cluster accordingly.

5:49 etcd can be used to store your data reliably and consistently across your distributed system.

5:57 Thank you. If you have questions, please drop us a line below. If you want to see more videos like this in the future, please like and subscribe. And don't forget you can grow your skills and earn a badge with IBM CloudLabs, which are free browser-based, interactive Kubernetes labs.
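
The watch idea from the transcript (get notified on a change, compare configuration against state, act when they differ) can be illustrated with a toy in-memory store. This is a sketch of the pattern only: `ToyStore`, `make_reconciler`, and the key names are hypothetical and are not etcd's or Kubernetes' API.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional


class ToyStore:
    """In-memory stand-in for a key-value store with per-key watchers."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._watchers: Dict[str, List[Callable[[str, str], None]]] = defaultdict(list)

    def watch(self, key: str, callback: Callable[[str, str], None]) -> None:
        """Register a callback that is called on every put to this key."""
        self._watchers[key].append(callback)

    def put(self, key: str, value: str) -> None:
        """Store the value, then notify every watcher of this key."""
        self._data[key] = value
        for callback in self._watchers[key]:
            callback(key, value)

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


def make_reconciler(store: ToyStore, desired_key: str, actual_key: str,
                    actions: List[str]) -> Callable[[str, str], None]:
    """Build a watcher that records an action whenever desired != actual."""
    def on_change(key: str, value: str) -> None:
        desired, actual = store.get(desired_key), store.get(actual_key)
        if desired is not None and desired != actual:
            actions.append(f"scale {actual} -> {desired}")
    return on_change


if __name__ == "__main__":
    store, actions = ToyStore(), []
    store.put("actual/replicas", "3")
    reconcile = make_reconciler(store, "desired/replicas", "actual/replicas", actions)
    store.watch("desired/replicas", reconcile)
    store.watch("actual/replicas", reconcile)
    store.put("desired/replicas", "5")  # configuration and state now differ
    print(actions)
```

Using a callback per key keeps the comparison logic outside the store, which loosely mirrors the division of labor the speaker describes between the data store and the component that reconfigures the cluster.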