<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>etcd operations guide on etcd</title><link>https://etcd.io/docs/v3.1/op-guide/</link><description>Recent content in etcd operations guide on etcd</description><generator>Hugo</generator><language>en-us</language><atom:link href="https://etcd.io/docs/v3.1/op-guide/index.xml" rel="self" type="application/rss+xml"/><item><title>Clustering Guide</title><link>https://etcd.io/docs/v3.1/op-guide/clustering/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://etcd.io/docs/v3.1/op-guide/clustering/</guid><description>&lt;h2 id="overview"&gt;Overview&lt;/h2&gt;
&lt;p&gt;Starting an etcd cluster statically requires that each member know the other members of the cluster. In many cases, the IPs of the cluster members are not known ahead of time. In these cases, the etcd cluster can be bootstrapped with the help of a discovery service.&lt;/p&gt;
&lt;p&gt;Once an etcd cluster is up and running, adding or removing members is done via &lt;a href="../runtime-configuration/"&gt;runtime reconfiguration&lt;/a&gt;. To better understand the design behind runtime reconfiguration, we suggest reading &lt;a href="../runtime-reconf-design/"&gt;the runtime configuration design document&lt;/a&gt;.&lt;/p&gt;</description></item><item><title>Configuration flags</title><link>https://etcd.io/docs/v3.1/op-guide/configuration/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://etcd.io/docs/v3.1/op-guide/configuration/</guid><description>&lt;p&gt;etcd is configurable through command-line flags and environment variables. Options set on the command line take precedence over those from the environment.&lt;/p&gt;
&lt;p&gt;The environment variable for flag &lt;code&gt;--my-flag&lt;/code&gt; is named &lt;code&gt;ETCD_MY_FLAG&lt;/code&gt;. This format applies to all flags.&lt;/p&gt;
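&lt;p&gt;As a small illustration of this naming rule, the following sketch pairs a few flags with their environment-variable equivalents. The member name, data directory, and URLs are placeholder values chosen for the example, not recommended settings.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;# Flag form (all values are illustrative placeholders)
etcd --name infra0 --data-dir /var/lib/etcd --listen-client-urls http://127.0.0.1:2379 --advertise-client-urls http://127.0.0.1:2379

# Equivalent environment form: add the ETCD_ prefix, upper-case the name,
# and turn dashes into underscores
export ETCD_NAME=infra0
export ETCD_DATA_DIR=/var/lib/etcd
export ETCD_LISTEN_CLIENT_URLS=http://127.0.0.1:2379
export ETCD_ADVERTISE_CLIENT_URLS=http://127.0.0.1:2379
etcd
&lt;/code&gt;&lt;/pre&gt;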
&lt;p&gt;The &lt;a href="http://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.txt" target="_blank" rel="noopener"&gt;official etcd ports&lt;/a&gt; are 2379 for client requests and 2380 for peer communication. The etcd ports can be set to accept TLS traffic, non-TLS traffic, or both TLS and non-TLS traffic.&lt;/p&gt;
&lt;p&gt;To start etcd automatically with custom settings at boot on Linux, a &lt;a href="http://freedesktop.org/wiki/Software/systemd/" target="_blank" rel="noopener"&gt;systemd&lt;/a&gt; unit is highly recommended.&lt;/p&gt;</description></item><item><title>Design of runtime reconfiguration</title><link>https://etcd.io/docs/v3.1/op-guide/runtime-reconf-design/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://etcd.io/docs/v3.1/op-guide/runtime-reconf-design/</guid><description>&lt;p&gt;Runtime reconfiguration is one of the hardest and most error-prone features in a distributed system, especially in a consensus-based system like etcd.&lt;/p&gt;
&lt;p&gt;Read on to learn about the design of etcd&amp;rsquo;s runtime reconfiguration commands and how we tackled these problems.&lt;/p&gt;
&lt;h2 id="two-phase-config-changes-keep-the-cluster-safe"&gt;Two phase config changes keep the cluster safe&lt;/h2&gt;
&lt;p&gt;In etcd, every runtime reconfiguration has to go through &lt;a href="../runtime-configuration/#add-a-new-member"&gt;two phases&lt;/a&gt; for safety reasons. For example, to add a member, first inform the cluster of the new configuration and then start the new member.&lt;/p&gt;</description></item><item><title>Disaster recovery</title><link>https://etcd.io/docs/v3.1/op-guide/recovery/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://etcd.io/docs/v3.1/op-guide/recovery/</guid><description>&lt;p&gt;etcd is designed to withstand machine failures. An etcd cluster automatically recovers from temporary failures (e.g., machine reboots) and tolerates up to &lt;em&gt;(N-1)/2&lt;/em&gt; permanent failures for a cluster of N members. When a member permanently fails, whether due to hardware failure or disk corruption, it loses access to the cluster. If the cluster permanently loses more than &lt;em&gt;(N-1)/2&lt;/em&gt; members, it disastrously fails, irrevocably losing quorum. Once quorum is lost, the cluster cannot reach consensus and therefore cannot continue accepting updates.&lt;/p&gt;</description></item><item><title>etcd gateway</title><link>https://etcd.io/docs/v3.1/op-guide/gateway/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://etcd.io/docs/v3.1/op-guide/gateway/</guid><description>&lt;h2 id="what-is-etcd-gateway"&gt;What is etcd gateway&lt;/h2&gt;
&lt;p&gt;etcd gateway is a simple TCP proxy that forwards network data to the etcd cluster. The gateway is stateless and transparent; it neither inspects client requests nor interferes with cluster responses.&lt;/p&gt;
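&lt;p&gt;A minimal sketch of starting a gateway in front of a single member might look like the following. The hostname is a hypothetical placeholder; &lt;code&gt;--listen-addr&lt;/code&gt; is the local address on which the gateway accepts client connections.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;# infra0.example.com is a placeholder for a real etcd member
etcd gateway start --endpoints=infra0.example.com:2379 --listen-addr=127.0.0.1:23790

# Clients then connect to the gateway address instead of the member
ETCDCTL_API=3 etcdctl --endpoints=127.0.0.1:23790 endpoint status
&lt;/code&gt;&lt;/pre&gt;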
&lt;p&gt;The gateway supports multiple etcd server endpoints. When the gateway starts, it randomly picks one etcd server endpoint and forwards all requests to that endpoint. This endpoint serves all requests until the gateway detects a network failure. If the gateway detects an endpoint failure, it switches to a different endpoint, if available, to hide failures from its clients. Other retry policies, such as weighted round-robin, may be supported in the future.&lt;/p&gt;</description></item><item><title>gRPC proxy</title><link>https://etcd.io/docs/v3.1/op-guide/grpc_proxy/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://etcd.io/docs/v3.1/op-guide/grpc_proxy/</guid><description>&lt;p&gt;&lt;em&gt;This is an alpha feature; we are looking for early feedback.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;The gRPC proxy is a stateless etcd reverse proxy operating at the gRPC layer (L7). The proxy is designed to reduce the total processing load on the core etcd cluster. For horizontal scalability, it coalesces watch and lease API requests. To protect the cluster against abusive clients, it caches key range requests.&lt;/p&gt;
&lt;p&gt;The gRPC proxy supports multiple etcd server endpoints. When the proxy starts, it randomly picks one etcd server endpoint to use. This endpoint serves all requests until the proxy detects an endpoint failure. If the gRPC proxy detects an endpoint failure, it switches to a different endpoint, if available, to hide failures from its clients. Other retry policies, such as weighted round-robin, may be supported in the future.&lt;/p&gt;</description></item><item><title>Hardware recommendations</title><link>https://etcd.io/docs/v3.1/op-guide/hardware/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://etcd.io/docs/v3.1/op-guide/hardware/</guid><description>&lt;p&gt;etcd usually runs well with limited resources for development or testing purposes; it’s common to develop with etcd on a laptop or a cheap cloud machine. However, when running etcd clusters in production, some hardware guidelines are useful for proper administration. These suggestions are not hard rules; they serve as a good starting point for a robust production deployment. As always, deployments should be tested with simulated workloads before running in production.&lt;/p&gt;</description></item><item><title>Maintenance</title><link>https://etcd.io/docs/v3.1/op-guide/maintenance/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://etcd.io/docs/v3.1/op-guide/maintenance/</guid><description>&lt;h2 id="overview"&gt;Overview&lt;/h2&gt;
&lt;p&gt;An etcd cluster needs periodic maintenance to remain reliable. Depending on an etcd application&amp;rsquo;s needs, this maintenance can usually be automated and performed without downtime or significantly degraded performance.&lt;/p&gt;
&lt;p&gt;All etcd maintenance manages storage resources consumed by the etcd keyspace. Storage space quotas guard against failure to adequately control the keyspace size; if an etcd member runs low on space, a quota will trigger cluster-wide alarms which will put the system into a limited-operation maintenance mode. To avoid running out of space for writes to the keyspace, the etcd keyspace history must be compacted. Storage space itself may be reclaimed by defragmenting etcd members. Finally, periodic snapshot backups of etcd member state make it possible to recover from any unintended logical data loss or corruption caused by operational error.&lt;/p&gt;</description></item><item><title>Migrate applications from using API v2 to API v3</title><link>https://etcd.io/docs/v3.1/op-guide/v2-migration/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://etcd.io/docs/v3.1/op-guide/v2-migration/</guid><description>&lt;p&gt;The data store v2 is still accessible from the API v2 after upgrading to etcd3. Thus, it will work as before and require no application changes. With etcd 3, applications use the new gRPC API v3 to access the mvcc store, which provides more features and improved performance. The mvcc store and the old store v2 are separate and isolated; writes to the store v2 will not affect the mvcc store and, similarly, writes to the mvcc store will not affect the store v2.&lt;/p&gt;</description></item><item><title>Monitoring etcd</title><link>https://etcd.io/docs/v3.1/op-guide/monitoring/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://etcd.io/docs/v3.1/op-guide/monitoring/</guid><description>&lt;p&gt;Each etcd server exports metrics under the &lt;code&gt;/metrics&lt;/code&gt; path on its client port.&lt;/p&gt;
&lt;p&gt;The metrics can be fetched with &lt;code&gt;curl&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sh" data-lang="sh"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;$ curl -L http://localhost:2379/metrics
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8f5902;font-style:italic"&gt;# HELP etcd_debugging_mvcc_keys_total Total number of keys.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8f5902;font-style:italic"&gt;# TYPE etcd_debugging_mvcc_keys_total gauge&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;etcd_debugging_mvcc_keys_total &lt;span style="color:#0000cf;font-weight:bold"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8f5902;font-style:italic"&gt;# HELP etcd_debugging_mvcc_pending_events_total Total number of pending events to be sent.&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#8f5902;font-style:italic"&gt;# TYPE etcd_debugging_mvcc_pending_events_total gauge&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;etcd_debugging_mvcc_pending_events_total &lt;span style="color:#0000cf;font-weight:bold"&gt;0&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;...
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="prometheus"&gt;Prometheus&lt;/h2&gt;
&lt;p&gt;Running a &lt;a href="https://prometheus.io/" target="_blank" rel="noopener"&gt;Prometheus&lt;/a&gt; monitoring service is the easiest way to ingest and record etcd&amp;rsquo;s metrics.&lt;/p&gt;
&lt;p&gt;First, install Prometheus:&lt;/p&gt;</description></item><item><title>Performance</title><link>https://etcd.io/docs/v3.1/op-guide/performance/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://etcd.io/docs/v3.1/op-guide/performance/</guid><description>&lt;h2 id="understanding-performance"&gt;Understanding performance&lt;/h2&gt;
&lt;p&gt;etcd provides stable, sustained high performance. Two factors define performance: latency and throughput. Latency is the time taken to complete an operation. Throughput is the total operations completed within some time period. Usually average latency increases as the overall throughput increases when etcd accepts concurrent client requests. In common cloud environments, like a standard &lt;code&gt;n-4&lt;/code&gt; on Google Compute Engine (GCE) or a comparable machine type on AWS, a three member etcd cluster finishes a request in less than one millisecond under light load, and can complete more than 30,000 requests per second under heavy load.&lt;/p&gt;</description></item><item><title>Run etcd clusters inside containers</title><link>https://etcd.io/docs/v3.1/op-guide/container/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://etcd.io/docs/v3.1/op-guide/container/</guid><description>&lt;p&gt;The following guide shows how to run etcd with rkt and Docker using the &lt;a href="../clustering/#static"&gt;static bootstrap process&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id="rkt"&gt;rkt&lt;/h2&gt;
&lt;h3 id="running-a-single-node-etcd"&gt;Running a single node etcd&lt;/h3&gt;
&lt;p&gt;The following rkt run command will expose the etcd client API on port 2379 and expose the peer API on port 2380.&lt;/p&gt;
&lt;p&gt;Use the host IP address when configuring etcd.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;export NODE1=192.168.1.21
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Trust the CoreOS &lt;a href="https://coreos.com/security/app-signing-key/" target="_blank" rel="noopener"&gt;App Signing Key&lt;/a&gt;.&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;sudo rkt trust --prefix coreos.com/etcd
# gpg key fingerprint is: 18AD 5014 C99E F7E3 BA5F 6CE9 50BD D3E0 FC8A 365E
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;Run the &lt;code&gt;v3.0.6&lt;/code&gt; version of etcd or specify another release version.&lt;/p&gt;</description></item><item><title>Runtime reconfiguration</title><link>https://etcd.io/docs/v3.1/op-guide/runtime-configuration/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://etcd.io/docs/v3.1/op-guide/runtime-configuration/</guid><description>&lt;p&gt;etcd comes with support for incremental runtime reconfiguration, which allows users to update the membership of the cluster at run time.&lt;/p&gt;
&lt;p&gt;Reconfiguration requests can only be processed when the majority of the cluster members are functioning. It is &lt;strong&gt;highly recommended&lt;/strong&gt; to always have a cluster size greater than two in production. It is unsafe to remove a member from a two member cluster. The majority of a two member cluster is also two. If there is a failure during the removal process, the cluster might not be able to make progress and may need to &lt;a href="#restart-cluster-from-majority-failure"&gt;restart from majority failure&lt;/a&gt;.&lt;/p&gt;</description></item><item><title>Security model</title><link>https://etcd.io/docs/v3.1/op-guide/security/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://etcd.io/docs/v3.1/op-guide/security/</guid><description>&lt;p&gt;etcd supports automatic TLS as well as authentication through client certificates for both client-to-server and peer (server-to-server / cluster) communication.&lt;/p&gt;
&lt;p&gt;To get up and running, first have a CA certificate and a signed key pair for one member. It is recommended to create and sign a new key pair for every member in a cluster.&lt;/p&gt;
&lt;p&gt;For convenience, the &lt;a href="https://github.com/cloudflare/cfssl" target="_blank" rel="noopener"&gt;cfssl&lt;/a&gt; tool provides an easy interface to certificate generation, and we provide an example using the tool &lt;a href="https://github.com/etcd-io/etcd/tree/master/hack/tls-setup" target="_blank" rel="noopener"&gt;here&lt;/a&gt;. Alternatively, try this &lt;a href="https://github.com/coreos/docs/blob/master/os/generate-self-signed-certificates.md" target="_blank" rel="noopener"&gt;guide to generating self-signed key pairs&lt;/a&gt;.&lt;/p&gt;</description></item><item><title>Supported platforms</title><link>https://etcd.io/docs/v3.1/op-guide/supported-platform/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://etcd.io/docs/v3.1/op-guide/supported-platform/</guid><description>&lt;h3 id="current-support"&gt;Current support&lt;/h3&gt;
&lt;p&gt;The following table lists etcd support status for common architectures and operating systems:&lt;/p&gt;
&lt;table&gt;
 &lt;thead&gt;
 &lt;tr&gt;
 &lt;th&gt;Architecture&lt;/th&gt;
 &lt;th&gt;Operating System&lt;/th&gt;
 &lt;th&gt;Status&lt;/th&gt;
 &lt;th&gt;Maintainers&lt;/th&gt;
 &lt;/tr&gt;
 &lt;/thead&gt;
 &lt;tbody&gt;
 &lt;tr&gt;
 &lt;td&gt;amd64&lt;/td&gt;
 &lt;td&gt;Darwin&lt;/td&gt;
 &lt;td&gt;Experimental&lt;/td&gt;
 &lt;td&gt;etcd maintainers&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;amd64&lt;/td&gt;
 &lt;td&gt;Linux&lt;/td&gt;
 &lt;td&gt;Stable&lt;/td&gt;
 &lt;td&gt;etcd maintainers&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;amd64&lt;/td&gt;
 &lt;td&gt;Windows&lt;/td&gt;
 &lt;td&gt;Experimental&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;arm64&lt;/td&gt;
 &lt;td&gt;Linux&lt;/td&gt;
 &lt;td&gt;Experimental&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;arm&lt;/td&gt;
 &lt;td&gt;Linux&lt;/td&gt;
 &lt;td&gt;Unstable&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;tr&gt;
 &lt;td&gt;386&lt;/td&gt;
 &lt;td&gt;Linux&lt;/td&gt;
 &lt;td&gt;Unstable&lt;/td&gt;
 &lt;td&gt;&lt;/td&gt;
 &lt;/tr&gt;
 &lt;/tbody&gt;
&lt;/table&gt;
&lt;ul&gt;
&lt;li&gt;etcd-maintainers are listed in &lt;a href="https://github.com/etcd-io/etcd/blob/main/OWNERS" target="_blank" rel="noopener"&gt;https://github.com/etcd-io/etcd/blob/main/OWNERS&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Experimental platforms appear to work in practice and have some platform-specific code in etcd, but do not fully conform to the stable support policy. Unstable platforms have been lightly tested, but less so than experimental platforms. Unlisted architecture and operating system pairs are currently unsupported; caveat emptor.&lt;/p&gt;</description></item><item><title>Understand failures</title><link>https://etcd.io/docs/v3.1/op-guide/failures/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://etcd.io/docs/v3.1/op-guide/failures/</guid><description>&lt;p&gt;Failures are common in a large deployment of machines. A machine fails when its hardware or software malfunctions. Multiple machines fail together when there are power failures or network issues. Multiple kinds of failures can also happen at once; it is almost impossible to enumerate all possible failure cases.&lt;/p&gt;
&lt;p&gt;In this section, we catalog kinds of failures and discuss how etcd is designed to tolerate these failures. Most users, if not all, can map a particular failure into one kind of failure. To prepare for rare or &lt;a href="../recovery/"&gt;unrecoverable failures&lt;/a&gt;, always &lt;a href="../maintenance/#snapshot-backup"&gt;back up&lt;/a&gt; the etcd cluster.&lt;/p&gt;</description></item><item><title>Versioning</title><link>https://etcd.io/docs/v3.1/op-guide/versioning/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://etcd.io/docs/v3.1/op-guide/versioning/</guid><description>&lt;h3 id="service-versioning"&gt;Service versioning&lt;/h3&gt;
&lt;p&gt;etcd uses &lt;a href="http://semver.org" target="_blank" rel="noopener"&gt;semantic versioning&lt;/a&gt;.
New minor versions may add features to the API.&lt;/p&gt;
&lt;p&gt;Get the running etcd cluster version with &lt;code&gt;etcdctl&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-sh" data-lang="sh"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;&lt;span style="color:#000"&gt;ETCDCTL_API&lt;/span&gt;&lt;span style="color:#ce5c00;font-weight:bold"&gt;=&lt;/span&gt;&lt;span style="color:#0000cf;font-weight:bold"&gt;3&lt;/span&gt; etcdctl --endpoints&lt;span style="color:#ce5c00;font-weight:bold"&gt;=&lt;/span&gt;127.0.0.1:2379 endpoint status
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="api-versioning"&gt;API versioning&lt;/h3&gt;
&lt;p&gt;The &lt;code&gt;v3&lt;/code&gt; API responses should not change after the 3.0.0 release, but new features will be added over time.&lt;/p&gt;</description></item></channel></rss>