etcd operations guide on etcd

Clustering Guide

Mon, 01 Jan 0001 00:00:00 +0000

Overview

Starting an etcd cluster statically requires that each member knows another in the cluster. In a number of cases, the IPs of the cluster members may be unknown ahead of time. In these cases, the etcd cluster can be bootstrapped with the help of a discovery service.

Once an etcd cluster is up and running, adding or removing members is done via runtime reconfiguration. To better understand the design behind runtime reconfiguration, we suggest reading the runtime configuration design document.

Configuration flags

Mon, 01 Jan 0001 00:00:00 +0000

etcd is configurable through command-line flags and environment variables. Options set on the command line take precedence over those from the environment.

The format of environment variable for flag --my-flag is ETCD_MY_FLAG. It applies to all flags.

The official etcd ports are 2379 for client requests and 2380 for peer communication. The etcd ports can be set to accept TLS traffic, non-TLS traffic, or both TLS and non-TLS traffic.

To start etcd automatically using custom settings at startup in Linux, using a systemd unit is highly recommended.

Design of runtime reconfiguration

Mon, 01 Jan 0001 00:00:00 +0000

Runtime reconfiguration is one of the hardest and most error prone features in a distributed system, especially in a consensus based system like etcd.

Read on to learn about the design of etcd’s runtime reconfiguration commands and how we tackled these problems.

Two phase config changes keep the cluster safe

In etcd, every runtime reconfiguration has to go through two phases for safety reasons. For example, to add a member, first inform cluster of new configuration and then start the new member.

Disaster recovery

Mon, 01 Jan 0001 00:00:00 +0000

etcd is designed to withstand machine failures. An etcd cluster automatically recovers from temporary failures (e.g., machine reboots) and tolerates up to (N-1)/2 permanent failures for a cluster of N members. When a member permanently fails, whether due to hardware failure or disk corruption, it loses access to the cluster. If the cluster permanently loses more than (N-1)/2 members then it disastrously fails, irrevocably losing quorum. Once quorum is lost, the cluster cannot reach consensus and therefore cannot continue accepting updates.

etcd gateway

Mon, 01 Jan 0001 00:00:00 +0000

What is etcd gateway

etcd gateway is a simple TCP proxy that forwards network data to the etcd cluster. The gateway is stateless and transparent; it neither inspects client requests nor interferes with cluster responses.

The gateway supports multiple etcd server endpoints. When the gateway starts, it randomly picks one etcd server endpoint and forwards all requests to that endpoint. This endpoint serves all requests until the gateway detects a network failure. If the gateway detects an endpoint failure, it will switch to a different endpoint, if available, to hide failures from its clients. Other retry policies, such as weighted round-robin, may be supported in the future.

gRPC proxy

Mon, 01 Jan 0001 00:00:00 +0000

This is an alpha feature, we are looking for early feedback.

The gRPC proxy is a stateless etcd reverse proxy operating at the gRPC layer (L7). The proxy is designed to reduce the total processing load on the core etcd cluster. For horizontal scalability, it coalesces watch and lease API requests. To protect the cluster against abusive clients, it caches key range requests.

The gRPC proxy supports multiple etcd server endpoints. When the proxy starts, it randomly picks one etcd server endpoint to use. This endpoint serves all requests until the proxy detects an endpoint failure. If the gRPC proxy detects an endpoint failure, it switches to a different endpoint, if available, to hide failures from its clients. Other retry policies, such as weighted round-robin, may be supported in the future.

Hardware recommendations

Mon, 01 Jan 0001 00:00:00 +0000

etcd usually runs well with limited resources for development or testing purposes; it’s common to develop with etcd on a laptop or a cheap cloud machine. However, when running etcd clusters in production, some hardware guidelines are useful for proper administration. These suggestions are not hard rules; they serve as a good starting point for a robust production deployment. As always, deployments should be tested with simulated workloads before running in production.

Maintenance

Mon, 01 Jan 0001 00:00:00 +0000

Overview

An etcd cluster needs periodic maintenance to remain reliable. Depending on an etcd application’s needs, this maintenance can usually be automated and performed without downtime or significantly degraded performance.

All etcd maintenance manages storage resources consumed by the etcd keyspace. Failure to adequately control the keyspace size is guarded by storage space quotas; if an etcd member runs low on space, a quota will trigger cluster-wide alarms which will put the system into a limited-operation maintenance mode. To avoid running out of space for writes to the keyspace, the etcd keyspace history must be compacted. Storage space itself may be reclaimed by defragmenting etcd members. Finally, periodic snapshot backups of etcd member state makes it possible to recover any unintended logical data loss or corruption caused by operational error.

Migrate applications from using API v2 to API v3

Mon, 01 Jan 0001 00:00:00 +0000

The data store v2 is still accessible from the API v2 after upgrading to etcd3. Thus, it will work as before and require no application changes. With etcd 3, applications use the new grpc API v3 to access the mvcc store, which provides more features and improved performance. The mvcc store and the old store v2 are separate and isolated; writes to the store v2 will not affect the mvcc store and, similarly, writes to the mvcc store will not affect the store v2.

Monitoring etcd

Mon, 01 Jan 0001 00:00:00 +0000

Each etcd server exports metrics under the /metrics path on its client port.

The metrics can be fetched with curl:

$ curl -L http://localhost:2379/metrics

# HELP etcd_debugging_mvcc_keys_total Total number of keys.
# TYPE etcd_debugging_mvcc_keys_total gauge
etcd_debugging_mvcc_keys_total 0
# HELP etcd_debugging_mvcc_pending_events_total Total number of pending events to be sent.
# TYPE etcd_debugging_mvcc_pending_events_total gauge
etcd_debugging_mvcc_pending_events_total 0
...

Prometheus

Running a Prometheus monitoring service is the easiest way to ingest and record etcd’s metrics.

First, install Prometheus:

Performance

Mon, 01 Jan 0001 00:00:00 +0000

Understanding performance

etcd provides stable, sustained high performance. Two factors define performance: latency and throughput. Latency is the time taken to complete an operation. Throughput is the total operations completed within some time period. Usually average latency increases as the overall throughput increases when etcd accepts concurrent client requests. In common cloud environments, like a standard n-4 on Google Compute Engine (GCE) or a comparable machine type on AWS, a three member etcd cluster finishes a request in less than one millisecond under light load, and can complete more than 30,000 requests per second under heavy load.

Run etcd clusters inside containers

Mon, 01 Jan 0001 00:00:00 +0000

The following guide shows how to run etcd with rkt and Docker using the static bootstrap process.

rkt

Running a single node etcd

The following rkt run command will expose the etcd client API on port 2379 and expose the peer API on port 2380.

Use the host IP address when configuring etcd.

export NODE1=192.168.1.21

Trust the CoreOS App Signing Key.

sudo rkt trust --prefix coreos.com/etcd
# gpg key fingerprint is: 18AD 5014 C99E F7E3 BA5F 6CE9 50BD D3E0 FC8A 365E

Run the v3.0.6 version of etcd or specify another release version.

Runtime reconfiguration

Mon, 01 Jan 0001 00:00:00 +0000

etcd comes with support for incremental runtime reconfiguration, which allows users to update the membership of the cluster at run time.

Reconfiguration requests can only be processed when the majority of the cluster members are functioning. It is highly recommended to always have a cluster size greater than two in production. It is unsafe to remove a member from a two member cluster. The majority of a two member cluster is also two. If there is a failure during the removal process, the cluster might not able to make progress and need to restart from majority failure.

Security model

Mon, 01 Jan 0001 00:00:00 +0000

etcd supports automatic TLS as well as authentication through client certificates for both clients to server as well as peer (server to server / cluster) communication.

To get up and running, first have a CA certificate and a signed key pair for one member. It is recommended to create and sign a new key pair for every member in a cluster.

For convenience, the cfssl tool provides an easy interface to certificate generation, and we provide an example using the tool here. Alternatively, try this guide to generating self-signed key pairs.

Supported platforms

Mon, 01 Jan 0001 00:00:00 +0000

Current support

The following table lists etcd support status for common architectures and operating systems,

Architecture	Operating System	Status	Maintainers
amd64	Darwin	Experimental	etcd maintainers
amd64	Linux	Stable	etcd maintainers
amd64	Windows	Experimental
arm64	Linux	Experimental
arm	Linux	Unstable
386	Linux	Unstable

etcd-maintainers are listed in https://github.com/etcd-io/etcd/blob/main/OWNERS.

Experimental platforms appear to work in practice and have some platform specific code in etcd, but do not fully conform to the stable support policy. Unstable platforms have been lightly tested, but less than experimental. Unlisted architecture and operating system pairs are currently unsupported; caveat emptor.

Understand failures

Mon, 01 Jan 0001 00:00:00 +0000

Failures are common in a large deployment of machines. A machine fails when its hardware or software malfunctions. Multiple machines fail together when there are power failures or network issues. Multiple kinds of failures can also happen at once; it is almost impossible to enumerate all possible failure cases.

In this section, we catalog kinds of failures and discuss how etcd is designed to tolerate these failures. Most users, if not all, can map a particular failure into one kind of failure. To prepare for rare or unrecoverable failures, always back up the etcd cluster.

Versioning

Mon, 01 Jan 0001 00:00:00 +0000

Service versioning

etcd uses semantic versioning New minor versions may add additional features to the API.

Get the running etcd cluster version with etcdctl:

ETCDCTL_API=3 etcdctl --endpoints=127.0.0.1:2379 endpoint status

API versioning

The v3 API responses should not change after the 3.0.0 release but new features will be added over time.