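Overview
What is etcd
etcd is a distributed, consistent key-value store used mainly for shared configuration and service discovery.
- Secure: automatic TLS, with optional client certificate authentication
- Fast: benchmarked at 10,000 writes per second
- Consistent: uses Raft to guarantee consistency
Advantages of etcd
- Simple. It is written in Go and easy to deploy; it exposes an HTTP interface that is easy to use (the v3 API is served over gRPC); and it relies on the Raft algorithm for strong consistency, which is easy for users to understand.
- Persistent. By default etcd persists data as soon as it is updated.
- Secure. etcd supports TLS authentication.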
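Glossary
- Raft: the algorithm etcd uses to guarantee strong consistency in a distributed system.
- Node: an instance of the Raft state machine.
- Member: an etcd instance. It manages one Node and can serve client requests.
- Cluster: an etcd cluster made up of multiple Members that work together.
- Peer: what a Member calls another Member of the same etcd cluster.
- Client: a client that sends HTTP requests to the etcd cluster.
- WAL: write-ahead log, the log format etcd uses for persistent storage.
- snapshot: a snapshot of the etcd data state, taken to keep the number of WAL files from growing too large.
- Proxy: an etcd mode that provides reverse-proxy service for an etcd cluster.
- Leader: the node elected in Raft to handle all data commits.
- Follower: a node that lost the election and acts as a subordinate node in Raft, providing the algorithm's strong-consistency guarantee.
- Candidate: when a Follower receives no heartbeat from the Leader for a certain time, it becomes a Candidate and starts an election.
- Term: the period from a node becoming Leader until the next election.
- Index: the sequence number of a data entry. Raft locates data by Term and Index.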
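Architecture
A user request is forwarded by the HTTP Server to the Store for transaction processing. If the request modifies a node, it is handed to the Raft module, which changes the state and records the log entry, then replicates it to the other etcd nodes to confirm the commit; finally the data is committed and synchronized once more.
HTTP Server
Handles API requests sent by users as well as synchronization and heartbeat requests from other etcd nodes.
Raft
The concrete implementation of the Raft strong-consistency algorithm; it is the core of etcd.
WAL
The Write Ahead Log is how etcd stores its data; it belongs to a family of techniques that give a system atomicity and durability. Apart from keeping the state of all data and the node index in memory, etcd persists everything through the WAL. In the WAL, every change is logged before it is committed.
- Entry [log content]: stores the content of each log record.
- Snapshot [snapshot content]: a snapshot of the state, taken to keep the amount of data from growing too large; it saves the Raft state when the log content changes.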
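The write-ahead idea can be illustrated with a toy sketch in bash. This is not etcd's real WAL format (etcd stores binary Raft entries); the names `STATE`, `WAL_FILE`, `wal_put`, and `wal_replay` are hypothetical and exist only for this illustration. The point is the ordering: the change is appended to the log before the in-memory state is touched, so the state can be rebuilt by replaying the log.

```shell
#!/usr/bin/env bash
# Toy write-ahead log, not etcd's real WAL format.
declare -A STATE=()      # in-memory state (hypothetical name)
WAL_FILE=$(mktemp)       # log file on disk (hypothetical name)

# Record the change in the log first, then apply it to the in-memory state.
wal_put() {
  local key=$1 value=$2
  printf '%s\t%s\n' "$key" "$value" >> "$WAL_FILE"
  STATE[$key]=$value
}

# Rebuild the in-memory state by replaying the log from the beginning,
# as a restarted process would.
wal_replay() {
  local key value
  STATE=()
  while IFS=$'\t' read -r key value; do
    STATE[$key]=$value
  done < "$WAL_FILE"
}

wal_put foo "Hello World!"
wal_put foo "Hello again"
STATE=()        # simulate losing the in-memory state
wal_replay
echo "foo=${STATE[foo]}"
```

Replaying applies the records in order, so the last write to a key wins. A snapshot would let a real implementation start the replay from a later point instead of from the first record.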
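Store
Handles the transactions behind the features etcd supports, including data indexing, node state changes, monitoring and feedback, and event handling and execution. It is the concrete implementation of most of the API functionality etcd offers to users.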
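The Raft algorithm
Raft involves three roles:
- follower
- candidate: an intermediate role during an election
- leader
Election
Two timeouts control elections. The first is the election timeout, the time a node waits as a follower before becoming a candidate; it is a random value between 150 and 300 ms (the range used in the Raft paper; etcd's own default election timeout is 1000 ms, with a 100 ms heartbeat interval). The other is the heartbeat timeout.
- When a node's election timeout expires, it becomes a candidate and starts a new election term. It sends vote requests (Request Vote) to the other nodes; a node that receives the request and has not yet voted in that term votes for the candidate and then resets its own election timeout.
- Once the candidate has votes from a majority of nodes, it becomes leader.
- The leader then starts sending Append Entries messages to the followers at intervals set by the heartbeat timeout, and each follower responds to the leader.
- The term continues until a follower stops receiving heartbeats and becomes a candidate.
- If two candidates receive the same number of votes in a term, no leader is elected, and the nodes wait for a new term to vote again.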
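The randomized election timeout can be sketched in a few lines of bash. This is only an illustration of the idea, using the 150 to 300 ms range quoted above; the function names `random_election_timeout_ms` and `first_candidate` are hypothetical, and a real Raft node runs timers rather than comparing numbers.

```shell
#!/usr/bin/env bash
# Pick a random election timeout in milliseconds within [150, 300].
random_election_timeout_ms() {
  local min=150 max=300
  # RANDOM yields 0..32767; the modulo maps it into [0, max - min].
  echo $(( min + RANDOM % (max - min + 1) ))
}

# Given one timeout per node, print the 1-based index of the node whose
# timeout expires first, i.e. the node that becomes a candidate first.
# A tie goes to the lowest index here; in Raft a tie can lead to a split vote.
first_candidate() {
  local best=1 best_val=$1 i=1 t
  for t in "$@"; do
    if [ "$t" -lt "$best_val" ]; then
      best=$i
      best_val=$t
    fi
    i=$(( i + 1 ))
  done
  echo "$best"
}

t1=$(random_election_timeout_ms)
t2=$(random_election_timeout_ms)
t3=$(random_election_timeout_ms)
echo "timeouts: $t1 $t2 $t3, node $(first_candidate "$t1" "$t2" "$t3") becomes a candidate first"
```

Because every node draws its own timeout, it is unlikely that two nodes become candidates at the same moment, which keeps split votes rare.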
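Synchronization
Once the election is over and a leader has been chosen, the leader must replicate every change to the other nodes in the system, again by sending Append Entries messages.
- First a client sends an update to the leader, which appends it to its log.
- The leader then sends the update to the followers with its next heartbeat.
- Once a majority of followers have received the update and acknowledged it, the leader commits it and replies to the client.
Network partitions
- During a network partition, nodes that are cut off from the leader stop receiving its heartbeats and start an election, so the partitions end up with different leaders.
- When clients send updates to these different leaders, a partition that holds fewer than half of the original cluster's nodes cannot commit the update, while a partition that holds more than half can.
- After the partition heals, the committed updates are replicated to the other nodes; those nodes roll back their uncommitted log entries and match the new leader's log, so the data is consistent across the whole cluster.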
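The majority rule behind these partition cases fits in a short bash sketch. The function names `quorum` and `can_commit` are hypothetical; the arithmetic is simply "more than half of the original cluster size".

```shell
#!/usr/bin/env bash
# Smallest number of nodes that forms a majority of a cluster.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Succeeds when a partition holding partition_size of the cluster's
# cluster_size nodes can still commit updates.
can_commit() {
  local cluster_size=$1 partition_size=$2
  [ "$partition_size" -ge "$(quorum "$cluster_size")" ]
}

# A 5-node cluster split into partitions of 3 and 2 nodes.
for part in 3 2; do
  if can_commit 5 "$part"; then
    echo "partition with $part of 5 nodes: updates are committed"
  else
    echo "partition with $part of 5 nodes: updates are not committed"
  fi
done
```

The same arithmetic shows why an even split is the worst case: with 4 nodes divided 2 and 2, neither side reaches the quorum of 3, so no update is committed until the partition heals.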
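etcd startup configuration parameters
Core parameters
Option | Description
ETCD_DATA_DIR | Data storage directory
ETCD_NAME | Name of this node in the etcd cluster; any value works as long as it is distinguishable and unique
ETCD_LISTEN_PEER_URLS | URLs to listen on for peer traffic (more than one is allowed); used for data exchange inside the cluster, such as elections and data replication
ETCD_INITIAL_ADVERTISE_PEER_URLS | Peer URLs advertised to the other members; the members use this value to communicate with this node
ETCD_LISTEN_CLIENT_URLS | URLs to listen on for client traffic; more than one is allowed here as well
ETCD_ADVERTISE_CLIENT_URLS | Client URLs advertised to clients; etcd proxies and etcd members use this value to reach this node
ETCD_INITIAL_CLUSTER | The collection of the initial-advertise-peer-urls of all members in the cluster
ETCD_INITIAL_CLUSTER_TOKEN | The cluster token; from this value the cluster generates a unique cluster ID and a unique ID for each node
ETCD_INITIAL_CLUSTER_STATE | Cluster initialization flag: new when creating a cluster, existing when joining an existing one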
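Since ETCD_INITIAL_CLUSTER must list every member as a name=peer-URL pair and be identical on all nodes, it can help to generate it rather than type it by hand. The sketch below is one possible way to do that; the function name `build_initial_cluster` is hypothetical, and it assumes every member uses https and peer port 2380, as the cluster set up later in this article does.

```shell
#!/usr/bin/env bash
# build_initial_cluster name=ip [name=ip ...]
# Prints a comma-separated list of name=https://ip:2380 pairs.
# Assumption: all members use the https scheme and peer port 2380.
build_initial_cluster() {
  local scheme=https port=2380 out="" pair name ip
  for pair in "$@"; do
    name=${pair%%=*}   # text before the first '='
    ip=${pair#*=}      # text after the first '='
    out+="${out:+,}${name}=${scheme}://${ip}:${port}"
  done
  echo "$out"
}

ETCD_INITIAL_CLUSTER=$(build_initial_cluster \
  etcd-1=192.168.116.15 etcd-2=192.168.116.16 etcd-3=192.168.116.17)
echo "ETCD_INITIAL_CLUSTER=\"${ETCD_INITIAL_CLUSTER}\""
```

The resulting line can be pasted into each node's environment file, so the member list cannot drift between nodes.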
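The etcdctl tool
etcdctl is a command-line client that provides a set of concise commands. It makes it easy to test the service or to modify the database contents by hand. It works much like other xxxctl tools (for example kubectl and systemctl).
Usage: etcdctl [global options] command [command options] [args…]
v2
Database commands
etcd organizes keys in a hierarchical space (similar to directories in a file system). The database commands cover the full CRUD lifecycle (Create, Read, Update, Delete, in REST style) of keys and directories.
Run etcdctl command --help for the options of a specific command.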
Operating on keys
set [create; works whether or not the key exists]:
etcdctl set key value
mk [create; the key must not exist yet]:
etcdctl mk key value
rm [delete]:
etcdctl rm key
update [modify]:
etcdctl update key value
get [read]:
etcdctl get key
Operating on directories
setdir [create; works whether or not the directory exists]:
etcdctl setdir dir
mkdir [create; the directory must not exist yet]:
etcdctl mkdir dir
rmdir [delete]:
etcdctl rmdir dir
updatedir [modify]:
etcdctl updatedir dir
ls [list]:
etcdctl ls
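Non-database commands
backup [back up the etcd data]
watch [watch a key; as soon as it is updated, print the new value and exit]
exec-watch [watch a key; as soon as it is updated, run the given command]
etcdctl exec-watch key -- sh -c "ls"
member [list, add, remove, or update etcd instances in the cluster with the list, add, remove, and update subcommands]
List
etcdctl member list
Add
etcdctl member add <instance>
Remove
etcdctl member remove <instance>
Update
etcdctl member update <instance>
etcdctl cluster-health [check the health of the cluster]
Note: this command exists only in v2; it was removed in v3.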
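v3
To use the v3 API, etcdctl needs the environment variable ETCDCTL_API=3 (etcdctl 3.4 and later default to v3). Apart from the operations below, everything else works the same.
Specify the etcd API version and the cluster endpoints
export ETCDCTL_API=3
ENDPOINTS=10.240.0.17:2379,10.240.0.18:2379,10.240.0.19:2379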
Database operations
1. Create
etcdctl --endpoints=$ENDPOINTS put foo "Hello World!"
2. Read
etcdctl --endpoints=$ENDPOINTS get foo
# Print the result as JSON
etcdctl --endpoints=$ENDPOINTS --write-out="json" get foo
Look up keys that share a prefix
etcdctl --endpoints=$ENDPOINTS put web1 value1
etcdctl --endpoints=$ENDPOINTS put web2 value2
etcdctl --endpoints=$ENDPOINTS put web3 value3
etcdctl --endpoints=$ENDPOINTS get web --prefix
List all keys under the / prefix (use "" as the prefix to match every key)
etcdctl --endpoints=$ENDPOINTS get / --prefix --keys-only
3. Delete
etcdctl --endpoints=$ENDPOINTS put key myvalue
etcdctl --endpoints=$ENDPOINTS del key
etcdctl --endpoints=$ENDPOINTS put k1 value1
etcdctl --endpoints=$ENDPOINTS put k2 value2
etcdctl --endpoints=$ENDPOINTS del k --prefix
Cluster status
Cluster status is checked mainly with two commands, etcdctl endpoint status and etcdctl endpoint health.
etcdctl --write-out=table --endpoints=$ENDPOINTS endpoint status
+------------------+------------------+---------+---------+-----------+-----------+------------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+------------------+------------------+---------+---------+-----------+-----------+------------+
| 10.240.0.17:2379 | 4917a7ab173fabe7 | 3.0.0 | 45 kB | true | 4 | 16726 |
| 10.240.0.18:2379 | 59796ba9cd1bcd72 | 3.0.0 | 45 kB | false | 4 | 16726 |
| 10.240.0.19:2379 | 94df724b66343e6c | 3.0.0 | 45 kB | false | 4 | 16726 |
+------------------+------------------+---------+---------+-----------+-----------+------------+
etcdctl --endpoints=$ENDPOINTS endpoint health
10.240.0.17:2379 is healthy: successfully committed proposal: took = 3.345431ms
10.240.0.19:2379 is healthy: successfully committed proposal: took = 3.767967ms
10.240.0.18:2379 is healthy: successfully committed proposal: took = 4.025451ms
Cluster members
The member-related commands are:
member add Adds a member into the cluster
member remove Removes a member from the cluster
member update Updates a member in the cluster
member list Lists all members in the cluster
For example, etcdctl member list lists the cluster members:
etcdctl --endpoints=http://172.16.5.4:12379 member list -w table
+-----------------+---------+-------+------------------------+-----------------------------------------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS |
+-----------------+---------+-------+------------------------+-----------------------------------------------+
| c856d92a82ba66a | started | etcd0 | http://172.16.5.4:2380 | http://172.16.5.4:2379,http://172.16.5.4:4001 |
+-----------------+---------+-------+------------------------+-----------------------------------------------+
Backup and restore
Backup
# mkdir -p /tmp/backup/etcd/ # directory that holds the backup data
# ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --endpoints="https://192.168.1.92:2379" snapshot save /tmp/backup/etcd/etcd-snapshot-`date +%Y%m%d`.db
Backup script
#!/usr/bin/env bash
# etcd client certificates and the endpoint to back up
CACERT="/etc/kubernetes/pki/etcd/ca.crt"
CERT="/etc/kubernetes/pki/etcd/server.crt"
KEY="/etc/kubernetes/pki/etcd/server.key"
ENDPOINTS="https://192.168.1.92:2379"
BACKUP_DIR="/tmp/backup/etcd"
# Take a snapshot named after today's date
ETCDCTL_API=3 etcdctl \
--cacert="${CACERT}" --cert="${CERT}" --key="${KEY}" \
--endpoints="${ENDPOINTS}" \
snapshot save "${BACKUP_DIR}/etcd-snapshot-$(date +%Y%m%d).db"
# Keep backups for 30 days (the pattern is quoted so the shell does not expand it)
find "${BACKUP_DIR}" -name '*.db' -mtime +30 -exec rm -f {} \;
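The naming and 30-day retention logic of such a backup script can be tried out safely in a throwaway directory before pointing it at real backups. The sketch below does that; the function names `snapshot_name` and `prune_snapshots` are hypothetical, and `touch -d` assumes GNU coreutils.

```shell
#!/usr/bin/env bash
# Print today's snapshot file name, e.g. etcd-snapshot-20210824.db.
snapshot_name() {
  echo "etcd-snapshot-$(date +%Y%m%d).db"
}

# Delete *.db files in a directory that were last modified more than
# the given number of days ago. The pattern is quoted so that find,
# not the shell, expands it.
prune_snapshots() {
  local dir=$1 days=$2
  find "$dir" -name '*.db' -mtime +"$days" -exec rm -f {} \;
}

# Dry run in a temporary directory: one stale snapshot, one fresh one.
demo_dir=$(mktemp -d)
touch -d '40 days ago' "$demo_dir/etcd-snapshot-old.db"   # GNU touch
touch "$demo_dir/$(snapshot_name)"
prune_snapshots "$demo_dir" 30
ls "$demo_dir"
```

After the run only the fresh snapshot remains in the temporary directory.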
Restore
Single-node restore
The original cluster is a single-node cluster, and the backup file is now used to restore it. The single etcd node of the test environment was backed up above; here it is restored onto another etcd node.
Install
[root@etcd1 ~]# wget https://mirrors.huaweicloud.com/etcd/v3.4.10/etcd-v3.4.10-linux-amd64.tar.gz
[root@etcd1 ~]# tar -xf etcd-v3.4.10-linux-amd64.tar.gz
[root@etcd1 ~]# cp /root/etcd-v3.4.10-linux-amd64/{etcd,etcdctl} /usr/bin/
[root@etcd1 ~]# systemctl disable --now firewalld
[root@etcd1 ~]# vim /usr/lib/systemd/system/etcd.service
[Unit]
Description=Etcd Server
After=network.target
[Service]
Type=simple
WorkingDirectory=/data/etcd
EnvironmentFile=-/etc/etcd/etcd.conf
ExecStart=/usr/bin/etcd
[Install]
WantedBy=multi-user.target
[root@etcd1 ~]# mkdir -p /data/etcd
[root@etcd1 ~]# mkdir -p /etc/etcd/
[root@etcd1 ~]# vim /etc/etcd/etcd.conf
ETCD_NAME=default
ETCD_DATA_DIR="/data/etcd/default.etcd/"
ETCD_LISTEN_CLIENT_URLS="http://0.0.0.0:2379"
ETCD_ADVERTISE_CLIENT_URLS="http://0.0.0.0:2379"
[root@etcd1 ~]# systemctl enable --now etcd
1. Stop the etcd service
[root@etcd1 ~]# systemctl stop etcd
2. Copy the backup file to the current directory
3. Back up the previous data directory
[root@etcd1 ~]# mv /data/etcd/default.etcd{,.bak}
4. Restore
[root@etcd1 ~]# ETCDCTL_API=3 etcdctl snapshot restore etcd-snapshot-20210824.db --data-dir=/data/etcd/default.etcd
{"level":"info","ts":1629804137.2476945,"caller":"snapshot/v3_snapshot.go:296","msg":"restoring snapshot","path":"etcd-snapshot-20210824.db","wal-dir":"/data/etcd/default.etcd/member/wal","data-dir":"/data/etcd/default.etcd","snap-dir":"/data/etcd/default.etcd/member/snap"}
{"level":"info","ts":1629804141.1843638,"caller":"mvcc/kvstore.go:380","msg":"restored last compact revision","meta-bucket-name":"meta","meta-bucket-name-key":"finishedCompactRev","restored-compact-revision":193110114}
{"level":"info","ts":1629804142.0297222,"caller":"membership/cluster.go:392","msg":"added member","cluster-id":"cdf818194e3a8c32","local-member-id":"0","added-peer-id":"8e9e05c52164694d","added-peer-peer-urls":["http://localhost:2380"]}
{"level":"info","ts":1629804142.0416465,"caller":"snapshot/v3_snapshot.go:309","msg":"restored snapshot","path":"etcd-snapshot-20210824.db","wal-dir":"/data/etcd/default.etcd/member/wal","data-dir":"/data/etcd/default.etcd","snap-dir":"/data/etcd/default.etcd/member/snap"}
5. Start
[root@etcd1 ~]# systemctl start etcd
6. Verify
[root@etcd1 ~]# ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 get /registry/configmaps/kube-system/kube-flannel-cfg
/registry/configmaps/kube-system/kube-flannel-cfg
k8s
v1 ConfigMap
¬
kube-flannel-cfg
kube-system"*$54eb4186-a47b-11ea-b08d-000c293ad7922¤������
appflannelZ
tiernodeb²
0kubectl.kubernetes.io/last-applied-configuration������iVersion":"v1","data":{"cni-conf.json":"{\n \"name\": \"cbr0\",\n \"cniVersion\": \"0.3.1\",\n \"plugins\": [\n {\n \"type\": \"flannel\",\n \"delegate\": {\n \"hairpinMode\": true,\n \"isDefaultGateway\": true\n }\n },\n {\n \"type\": \"portmap\",\n \"capabilities\": {\n \"portMappings\": true\n }\n }\n ]\n}\n","net-conf.json":"{\n \"Network\": \"10.244.0.0/16\",\n \"Backend\": {\n \"Type\": \"vxlan\"\n }\n}\n"},"kind":"ConfigMap","metadata":{"annotations":{},"labels":{"app":"flannel","tier":"node"},"name":"kube-flannel-cfg","namespace":"kube-system"}}
z¶
cni-conf.json¤{
"name": "cbr0",
"cniVersion": "0.3.1",
"plugins": [
{
"type": "flannel",
"delegate": {
"hairpinMode": true,
"isDefaultGateway": true
}
},
{
"type": "portmap",
"capabilities": {
"portMappings": true
}
}
]
}
Z
net-conf.jsonI{
"Network": "10.244.0.0/16",
"Backend": {
"Type": "vxlan"
}
}
"
Migrating single-node data to a cluster
A new requirement: a single etcd node is not reliable enough, so the data on the single etcd node above has to be migrated to a cluster.
Node | IP
etcd-1 | 192.168.116.15
etcd-2 | 192.168.116.16
etcd-3 | 192.168.116.17
Create certificates
Use cfssl to generate self-signed certificates. First download the cfssl tools:
[root@etcd-1 ~]# wget https://pkg.cfssl.org/R1.2/cfssl_linux-amd64
[root@etcd-1 ~]# wget https://pkg.cfssl.org/R1.2/cfssljson_linux-amd64
[root@etcd-1 ~]# wget https://pkg.cfssl.org/R1.2/cfssl-certinfo_linux-amd64
[root@etcd-1 ~]# chmod +x cfssl_linux-amd64 cfssljson_linux-amd64 cfssl-certinfo_linux-amd64
[root@etcd-1 ~]# mv cfssl_linux-amd64 /usr/local/bin/cfssl
[root@etcd-1 ~]# mv cfssljson_linux-amd64 /usr/local/bin/cfssljson
[root@etcd-1 ~]# mv cfssl-certinfo_linux-amd64 /usr/local/bin/cfssl-certinfo
# Create the CA (Certificate Authority)
# Export the default configuration template
[root@etcd-1 ~]# mkdir ssl
[root@etcd-1 ~]# cd ssl/
[root@etcd-1 ssl]# cfssl print-defaults config > config.json
# Export the default certificate signing request (CSR) template
[root@etcd-1 ssl]# cfssl print-defaults csr > csr.json
# Following the template format, edit config.json into the CA signing configuration (passed below as -config=config.json)
[root@etcd-1 ssl]# cat config.json
{
"signing": {
"default": {
"expiry": "438000h"
},
"profiles": {
"server": {
"expiry": "438000h",
"usages": [
"signing",
"key encipherment",
"server auth",
"client auth"
]
},
"client": {
"expiry": "438000h",
"usages": [
"signing",
"key encipherment",
"client auth"
]
},
"peer": {
"expiry": "438000h",
"usages": [
"signing",
"key encipherment",
"server auth",
"client auth"
]
}
}
}
}
# Create ca-csr.json following the format of the csr.json template
[root@etcd-1 ssl]# cat ca-csr.json
{
"CN": "etcd",
"key": {
"algo": "ecdsa",
"size": 256
}
}
# Generate the CA certificate and private key
[root@etcd-1 ssl]# cfssl gencert -initca ca-csr.json | cfssljson -bare ca
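This command generates the files needed to run the CA, ca-key.pem (private key) and ca.pem (certificate), as well as ca.csr (certificate signing request), which is used for cross-signing or re-signing.
# Create the certificate signing request (CSR) file for the client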
[root@etcd-1 ssl]# cat client.json
{
"CN": "client",
"key": {
"algo": "ecdsa",
"size": 256
}
}
[root@etcd-1 ssl]# cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=config.json -profile=client client.json | cfssljson -bare client
# Create the CSR file for the etcd server and peer certificates
[root@etcd-1 ssl]# cat etcd.json
{
"CN": "etcd",
"hosts": [
"192.168.116.15",
"192.168.116.16",
"192.168.116.17",
"192.168.116.18",
"192.168.116.19",
"192.168.116.20",
"192.168.116.21",
"192.168.116.22",
"192.168.116.23",
"192.168.116.24",
"192.168.116.25"
],
"key": {
"algo": "ecdsa",
"size": 256
},
"names": [
{
"C": "CN",
"L": "GG",
"ST": "SZ"
}
]
}
The additional addresses in hosts are reserved for nodes added when the cluster is scaled out later.
# Generate the server certificate
[root@etcd-1 ssl]# cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=config.json -profile=server etcd.json | cfssljson -bare server
# Generate the peer certificate
[root@etcd-1 ssl]# cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=config.json -profile=peer etcd.json | cfssljson -bare peer
Cluster installation
[root@etcd-1 ~]# wget https://mirrors.huaweicloud.com/etcd/v3.4.10/etcd-v3.4.10-linux-amd64.tar.gz
[root@etcd-1 ~]# scp etcd-v3.4.10-linux-amd64.tar.gz 192.168.116.16:
[root@etcd-1 ~]# scp etcd-v3.4.10-linux-amd64.tar.gz 192.168.116.17:
# Copy the certificate directory to the other nodes
[root@etcd-1 ~]# scp -r ssl/ 192.168.116.16:
[root@etcd-1 ~]# scp -r ssl/ 192.168.116.17:
Unpack the binary package; do the same on every machine
[root@etcd-1 ~]# mkdir /opt/etcd/{bin,cfg,ssl,data} -p
[root@etcd-1 ~]# tar -xf etcd-v3.4.10-linux-amd64.tar.gz -C /opt/
[root@etcd-1 ~]# mv /opt/etcd-v3.4.10-linux-amd64/{etcd,etcdctl} /opt/etcd/bin/
[root@etcd-1 ~]# cp /root/ssl/*.pem /opt/etcd/ssl/
[root@etcd-1 ~]# systemctl disable --now firewalld
On etcd-1
# Create the etcd environment file
[root@etcd-1 ~]# cat /opt/etcd/cfg/etcd.conf
NAME="etcd-1"
DATA_DIR="/opt/etcd/data"
LISTEN_PEER_URLS="https://192.168.116.15:2380"
LISTEN_CLIENT_URLS="https://192.168.116.15:2379"
INITIAL_ADVERTISE_PEER_URLS="https://192.168.116.15:2380"
ADVERTISE_CLIENT_URLS="https://192.168.116.15:2379"
INITIAL_CLUSTER="etcd-1=https://192.168.116.15:2380,etcd-2=https://192.168.116.16:2380"
INITIAL_CLUSTER_TOKEN="etcd-cluster"
INITIAL_CLUSTER_STATE="new"
# Manage etcd with systemd
[root@etcd-1 ~]# cat /usr/lib/systemd/system/etcd.service
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target
[Service]
Type=notify
EnvironmentFile=/opt/etcd/cfg/etcd.conf
ExecStart=/opt/etcd/bin/etcd \
--name=${NAME} \
--data-dir=${DATA_DIR} \
--listen-peer-urls=${LISTEN_PEER_URLS} \
--listen-client-urls=${LISTEN_CLIENT_URLS},https://127.0.0.1:2379 \
--advertise-client-urls=${ADVERTISE_CLIENT_URLS} \
--initial-advertise-peer-urls=${INITIAL_ADVERTISE_PEER_URLS} \
--initial-cluster=${INITIAL_CLUSTER} \
--initial-cluster-token=${INITIAL_CLUSTER_TOKEN} \
--initial-cluster-state=${INITIAL_CLUSTER_STATE} \
--cert-file=/opt/etcd/ssl/server.pem \
--key-file=/opt/etcd/ssl/server-key.pem \
--peer-cert-file=/opt/etcd/ssl/peer.pem \
--peer-key-file=/opt/etcd/ssl/peer-key.pem \
--trusted-ca-file=/opt/etcd/ssl/ca.pem \
--peer-trusted-ca-file=/opt/etcd/ssl/ca.pem
Restart=on-failure
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
# The data directory must have mode 700
[root@etcd-1 ~]# chmod 700 /opt/etcd/data/
On etcd-2
[root@etcd-2 ~]# cat /opt/etcd/cfg/etcd.conf
NAME="etcd-2"
DATA_DIR="/opt/etcd/data"
LISTEN_PEER_URLS="https://192.168.116.16:2380"
LISTEN_CLIENT_URLS="https://192.168.116.16:2379"
INITIAL_ADVERTISE_PEER_URLS="https://192.168.116.16:2380"
ADVERTISE_CLIENT_URLS="https://192.168.116.16:2379"
INITIAL_CLUSTER="etcd-1=https://192.168.116.15:2380,etcd-2=https://192.168.116.16:2380"
INITIAL_CLUSTER_TOKEN="etcd-cluster"
INITIAL_CLUSTER_STATE="new"
[root@etcd-2 ~]# cat /usr/lib/systemd/system/etcd.service
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target
[Service]
Type=notify
EnvironmentFile=/opt/etcd/cfg/etcd.conf
ExecStart=/opt/etcd/bin/etcd \
--name=${NAME} \
--data-dir=${DATA_DIR} \
--listen-peer-urls=${LISTEN_PEER_URLS} \
--listen-client-urls=${LISTEN_CLIENT_URLS},https://127.0.0.1:2379 \
--advertise-client-urls=${ADVERTISE_CLIENT_URLS} \
--initial-advertise-peer-urls=${INITIAL_ADVERTISE_PEER_URLS} \
--initial-cluster=${INITIAL_CLUSTER} \
--initial-cluster-token=${INITIAL_CLUSTER_TOKEN} \
--initial-cluster-state=${INITIAL_CLUSTER_STATE} \
--cert-file=/opt/etcd/ssl/server.pem \
--key-file=/opt/etcd/ssl/server-key.pem \
--peer-cert-file=/opt/etcd/ssl/peer.pem \
--peer-key-file=/opt/etcd/ssl/peer-key.pem \
--trusted-ca-file=/opt/etcd/ssl/ca.pem \
--peer-trusted-ca-file=/opt/etcd/ssl/ca.pem
Restart=on-failure
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
[root@etcd-2 ~]# chmod 700 /opt/etcd/data/
Start etcd-1 and etcd-2
[root@etcd-1 ~]# systemctl enable --now etcd
[root@etcd-2 ~]# systemctl enable --now etcd
Check the cluster status
[root@etcd-1 ~]# /opt/etcd/bin/etcdctl --cacert=/opt/etcd/ssl/ca.pem --cert=/opt/etcd/ssl/server.pem --key=/opt/etcd/ssl/server-key.pem --endpoints="https://192.168.116.15:2379,https://192.168.116.16:2379" -w table endpoint --cluster status
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.116.16:2379 | 38e677eb51b6e690 | 3.4.10 | 25 kB | false | false | 78 | 7 | 7 | |
| https://192.168.116.15:2379 | b5779181c59c3700 | 3.4.10 | 25 kB | true | false | 78 | 7 | 7 | |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
Or
[root@etcd-1 ~]# /opt/etcd/bin/etcdctl --cacert=/opt/etcd/ssl/ca.pem --cert=/opt/etcd/ssl/server.pem --key=/opt/etcd/ssl/server-key.pem --endpoints="https://192.168.116.15:2379,https://192.168.116.16:2379" -w table endpoint status
Note that this form lists only the nodes given in --endpoints, whereas endpoint --cluster status lists every node in the cluster.
Adding etcd-3 to the cluster
Register the etcd-3 node as a new member of the cluster
[root@etcd-1 ~]# /opt/etcd/bin/etcdctl --cacert=/opt/etcd/ssl/ca.pem --cert=/opt/etcd/ssl/server.pem --key=/opt/etcd/ssl/server-key.pem --endpoints="https://192.168.116.15:2379,https://192.168.116.16:2379" member add etcd-3 --peer-urls=https://192.168.116.17:2380
Member 4238c12f7fcf2ceb added to cluster 908202c1add782e4
ETCD_NAME="etcd-3"
ETCD_INITIAL_CLUSTER="etcd-2=https://192.168.116.16:2380,etcd-3=https://192.168.116.17:2380,etcd-1=https://192.168.116.15:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://192.168.116.17:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"
# Note: the new node's etcd configuration file must include the output above
List the cluster members
[root@etcd-1 ~]# /opt/etcd/bin/etcdctl --cacert=/opt/etcd/ssl/ca.pem --cert=/opt/etcd/ssl/server.pem --key=/opt/etcd/ssl/server-key.pem --endpoints="https://192.168.116.15:2379,https://192.168.116.16:2379" -w table member list
+------------------+-----------+--------+-----------------------------+-----------------------------+------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS | IS LEARNER |
+------------------+-----------+--------+-----------------------------+-----------------------------+------------+
| 4da14f0c9981c64 | unstarted | | https://192.168.116.17:2380 | | false |
| 38e677eb51b6e690 | started | etcd-2 | https://192.168.116.16:2380 | https://192.168.116.16:2379 | false |
| b5779181c59c3700 | started | etcd-1 | https://192.168.116.15:2380 | https://192.168.116.15:2379 | false |
+------------------+-----------+--------+-----------------------------+-----------------------------+------------+
Install and configure etcd on etcd-3
[root@etcd-3 ~]# cat /opt/etcd/cfg/etcd.conf
DATA_DIR="/opt/etcd/data"
LISTEN_PEER_URLS="https://192.168.116.17:2380"
LISTEN_CLIENT_URLS="https://192.168.116.17:2379"
ADVERTISE_CLIENT_URLS="https://192.168.116.17:2379"
INITIAL_CLUSTER_TOKEN="etcd-cluster"
NAME="etcd-3"
INITIAL_CLUSTER="etcd-2=https://192.168.116.16:2380,etcd-3=https://192.168.116.17:2380,etcd-1=https://192.168.116.15:2380"
INITIAL_ADVERTISE_PEER_URLS="https://192.168.116.17:2380"
INITIAL_CLUSTER_STATE="existing"
[root@etcd-3 ~]# cat /usr/lib/systemd/system/etcd.service
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target
[Service]
Type=notify
EnvironmentFile=/opt/etcd/cfg/etcd.conf
ExecStart=/opt/etcd/bin/etcd \
--name=${NAME} \
--data-dir=${DATA_DIR} \
--listen-peer-urls=${LISTEN_PEER_URLS} \
--listen-client-urls=${LISTEN_CLIENT_URLS},https://127.0.0.1:2379 \
--advertise-client-urls=${ADVERTISE_CLIENT_URLS} \
--initial-advertise-peer-urls=${INITIAL_ADVERTISE_PEER_URLS} \
--initial-cluster=${INITIAL_CLUSTER} \
--initial-cluster-token=${INITIAL_CLUSTER_TOKEN} \
--initial-cluster-state=${INITIAL_CLUSTER_STATE} \
--cert-file=/opt/etcd/ssl/server.pem \
--key-file=/opt/etcd/ssl/server-key.pem \
--peer-cert-file=/opt/etcd/ssl/peer.pem \
--peer-key-file=/opt/etcd/ssl/peer-key.pem \
--trusted-ca-file=/opt/etcd/ssl/ca.pem \
--peer-trusted-ca-file=/opt/etcd/ssl/ca.pem
Restart=on-failure
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
[root@etcd-3 ~]# chmod 700 /opt/etcd/data/
Start etcd on etcd-3
[root@etcd-3 ~]# systemctl enable --now etcd
# Listing the members and the cluster status again shows that etcd-3 has joined the cluster
[root@etcd-1 ~]# /opt/etcd/bin/etcdctl --cacert=/opt/etcd/ssl/ca.pem --cert=/opt/etcd/ssl/server.pem --key=/opt/etcd/ssl/server-key.pem --endpoints="https://192.168.116.15:2379,https://192.168.116.16:2379" -w table member list
+------------------+---------+--------+-----------------------------+-----------------------------+------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS | IS LEARNER |
+------------------+---------+--------+-----------------------------+-----------------------------+------------+
| 1d1a009f14d49f5d | started | etcd-3 | https://192.168.116.17:2380 | https://192.168.116.17:2379 | false |
| 38e677eb51b6e690 | started | etcd-2 | https://192.168.116.16:2380 | https://192.168.116.16:2379 | false |
| b5779181c59c3700 | started | etcd-1 | https://192.168.116.15:2380 | https://192.168.116.15:2379 | false |
+------------------+---------+--------+-----------------------------+-----------------------------+------------+
[root@etcd-1 ~]# /opt/etcd/bin/etcdctl --cacert=/opt/etcd/ssl/ca.pem --cert=/opt/etcd/ssl/server.pem --key=/opt/etcd/ssl/server-key.pem --endpoints="https://192.168.116.15:2379,https://192.168.116.16:2379" -w table endpoint --cluster status
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.116.17:2379 | 1d1a009f14d49f5d | 3.4.10 | 25 kB | false | false | 3 | 8 | 8 | |
| https://192.168.116.16:2379 | 38e677eb51b6e690 | 3.4.10 | 25 kB | false | false | 3 | 8 | 8 | |
| https://192.168.116.15:2379 | b5779181c59c3700 | 3.4.10 | 20 kB | true | false | 3 | 8 | 8 | |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
Data migration
Run the following on any one node of the cluster to migrate the data of the single-node etcd1 into the cluster
[root@etcd-1 ~]# ETCDCTL_API=3 /opt/etcd/bin/etcdctl make-mirror --endpoints="http://192.168.116.10:2379" https://192.168.116.15:2379 --dest-cacert=/opt/etcd/ssl/ca.pem --dest-cert=/opt/etcd/ssl/client.pem --dest-key=/opt/etcd/ssl/client-key.pem
Verify after the migration
For example, list all keys; there are too many to show, so only the first 20 lines are printed
[root@etcd-2 ~]# ETCDCTL_API=3 /opt/etcd/bin/etcdctl --cacert=/opt/etcd/ssl/ca.pem --cert=/opt/etcd/ssl/client.pem --key=/opt/etcd/ssl/client-key.pem --endpoints="https://192.168.116.15:2379,https://192.168.116.16:2379,https://192.168.116.17:2379" get / --prefix --keys-only | head -20
/registry/apiregistration.k8s.io/apiservices/v1.
/registry/apiregistration.k8s.io/apiservices/v1.apps
/registry/apiregistration.k8s.io/apiservices/v1.authentication.k8s.io
/registry/apiregistration.k8s.io/apiservices/v1.authorization.k8s.io
/registry/apiregistration.k8s.io/apiservices/v1.autoscaling
/registry/apiregistration.k8s.io/apiservices/v1.batch
/registry/apiregistration.k8s.io/apiservices/v1.coordination.k8s.io
/registry/apiregistration.k8s.io/apiservices/v1.networking.k8s.io
/registry/apiregistration.k8s.io/apiservices/v1.rbac.authorization.k8s.io
/registry/apiregistration.k8s.io/apiservices/v1.scheduling.k8s.io
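Note that the mirror never stops by itself. Every 30 seconds it prints the number of keys synchronized so far; when that number has stayed unchanged for a long time, the synchronization is complete.
Cluster backup and restore
Backup
Taking the backup on any one node of the cluster is enough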
[root@etcd-1 ~]# ETCDCTL_API=3 /opt/etcd/bin/etcdctl --cacert=/opt/etcd/ssl/ca.pem --cert=/opt/etcd/ssl/client.pem --key=/opt/etcd/ssl/client-key.pem --endpoints="https://192.168.116.15:2379,https://192.168.116.16:2379,https://192.168.116.17:2379" snapshot save etcd-snapshot-`date +%Y%m%d`.db
Restore
Stop the etcd service on every node of the cluster
[root@etcd-1 ~]# systemctl stop etcd
Back up the previous data directory on every node
[root@etcd-1 ~]# mv /opt/etcd/data{,.bak}
Prepare the restore script
[root@etcd-1 ~]# cat restore.sh
#!/bin/bash
count=1
for ip in 192.168.116.{15..17}
do
ETCDCTL_API=3 /opt/etcd/bin/etcdctl snapshot restore /root/etcd-snapshot-20210824.db --name etcd-${count} --initial-cluster "etcd-1=https://192.168.116.15:2380,etcd-2=https://192.168.116.16:2380,etcd-3=https://192.168.116.17:2380" \
--initial-cluster-token etcd-cluster --initial-advertise-peer-urls https://${ip}:2380 --data-dir=/root/node${count}
rsync -av --delete /root/node${count}/ $ip:/opt/etcd/data # The trailing / on /root/node${count}/ is required: without it the node directory itself is copied into /opt/etcd/data; with it, the member directory inside /root/node${count}/ is copied into /opt/etcd/data
let count++
done
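Run the script to restore (it relies on bash features such as brace expansion and let, so run it with bash)
[root@etcd-1 ~]# bash restore.sh
Set mode 700 on the data directory of every node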
[root@etcd-1 ~]# chmod 700 /opt/etcd/data
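Start etcd on each node in turn
Note that the configuration files need to be updated first: the cluster started with two nodes, so etcd-3 must now be added to the configuration as well.
On etcd-1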
[root@etcd-1 ~]# cat /opt/etcd/cfg/etcd.conf
NAME="etcd-1"
DATA_DIR="/opt/etcd/data"
LISTEN_PEER_URLS="https://192.168.116.15:2380"
LISTEN_CLIENT_URLS="https://192.168.116.15:2379"
INITIAL_ADVERTISE_PEER_URLS="https://192.168.116.15:2380"
ADVERTISE_CLIENT_URLS="https://192.168.116.15:2379"
INITIAL_CLUSTER="etcd-1=https://192.168.116.15:2380,etcd-2=https://192.168.116.16:2380,etcd-3=https://192.168.116.17:2380"
INITIAL_CLUSTER_TOKEN="etcd-cluster"
INITIAL_CLUSTER_STATE="new"
On etcd-2
[root@etcd-2 ~]# cat /opt/etcd/cfg/etcd.conf
NAME="etcd-2"
DATA_DIR="/opt/etcd/data"
LISTEN_PEER_URLS="https://192.168.116.16:2380"
LISTEN_CLIENT_URLS="https://192.168.116.16:2379"
INITIAL_ADVERTISE_PEER_URLS="https://192.168.116.16:2380"
ADVERTISE_CLIENT_URLS="https://192.168.116.16:2379"
INITIAL_CLUSTER="etcd-1=https://192.168.116.15:2380,etcd-2=https://192.168.116.16:2380,etcd-3=https://192.168.116.17:2380"
INITIAL_CLUSTER_TOKEN="etcd-cluster"
INITIAL_CLUSTER_STATE="new"
On etcd-3
[root@etcd-3 ~]# cat /opt/etcd/cfg/etcd.conf
DATA_DIR="/opt/etcd/data"
LISTEN_PEER_URLS="https://192.168.116.17:2380"
LISTEN_CLIENT_URLS="https://192.168.116.17:2379"
ADVERTISE_CLIENT_URLS="https://192.168.116.17:2379"
INITIAL_CLUSTER_TOKEN="etcd-cluster"
NAME="etcd-3"
INITIAL_CLUSTER="etcd-2=https://192.168.116.16:2380,etcd-3=https://192.168.116.17:2380,etcd-1=https://192.168.116.15:2380"
INITIAL_ADVERTISE_PEER_URLS="https://192.168.116.17:2380"
INITIAL_CLUSTER_STATE="new"
Start the nodes one after another
[root@etcd-1 ~]# systemctl start etcd
# Verify
[root@etcd-1 ~]# /opt/etcd/bin/etcdctl --cacert=/opt/etcd/ssl/ca.pem --cert=/opt/etcd/ssl/server.pem --key=/opt/etcd/ssl/server-key.pem --endpoints="https://192.168.116.15:2379,https://192.168.116.16:2379,https://192.168.116.17:2379" -w table endpoint status
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.116.15:2379 | b5779181c59c3700 | 3.4.10 | 25 kB | false | false | 41 | 9 | 9 | |
| https://192.168.116.16:2379 | 38e677eb51b6e690 | 3.4.10 | 20 kB | false | false | 41 | 9 | 9 | |
| https://192.168.116.17:2379 | 630a75ff7591bbf7 | 3.4.10 | 25 kB | true | false | 41 | 9 | 9 | |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
Check that the data has been restored
[root@etcd-2 ~]# ETCDCTL_API=3 /opt/etcd/bin/etcdctl --cacert=/opt/etcd/ssl/ca.pem --cert=/opt/etcd/ssl/client.pem --key=/opt/etcd/ssl/client-key.pem --endpoints="https://192.168.116.15:2379,https://192.168.116.16:2379,https://192.168.116.17:2379" get / --prefix --keys-only | egrep -v '^$' | wc -l
24172
The key count shows that the data has been restored.
Kubernetes etcd restore order
Stop kube-apiserver -> stop etcd -> restore the data -> start etcd -> start kube-apiserver
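Compacting historical data
If the keyspace keeps growing for a long time without compaction, then once it reaches the storage quota the cluster enters a state in which only reads and deletes are allowed and writes are rejected. Compacting the cluster's history is therefore necessary.
Compaction does not remove current data. It only removes historical versions of the data; after compaction those old versions can no longer be read, but access to the latest data is unaffected.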
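A simple way to decide when to act is to compare the database size reported by etcdctl endpoint status with the storage quota. The sketch below only does the arithmetic; the function names `quota_usage_percent` and `near_quota`, the 80% default threshold, and the sample numbers (a 2 GiB quota and a 1.9 GB database) are all assumptions made for illustration, not values taken from this cluster.

```shell
#!/usr/bin/env bash
# Percentage of the quota that the database currently occupies (integer).
quota_usage_percent() {
  local db_bytes=$1 quota_bytes=$2
  echo $(( db_bytes * 100 / quota_bytes ))
}

# Succeeds when usage has reached the threshold (default: 80 percent).
near_quota() {
  local db_bytes=$1 quota_bytes=$2 threshold=${3:-80}
  [ "$(quota_usage_percent "$db_bytes" "$quota_bytes")" -ge "$threshold" ]
}

QUOTA_BYTES=$(( 2 * 1024 * 1024 * 1024 ))   # assumed quota: 2 GiB
DB_BYTES=1900000000                         # assumed DB size in bytes

if near_quota "$DB_BYTES" "$QUOTA_BYTES"; then
  echo "usage $(quota_usage_percent "$DB_BYTES" "$QUOTA_BYTES")%: compact and defragment soon"
else
  echo "usage $(quota_usage_percent "$DB_BYTES" "$QUOTA_BYTES")%: no action needed"
fi
```

In practice the two inputs would come from the cluster itself, for example from the DB SIZE column of endpoint status and from the quota etcd was started with.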
Manual compaction
Compact away the history before revision 10
[root@etcd-2 ~]# ETCDCTL_API=3 /opt/etcd/bin/etcdctl --cacert=/opt/etcd/ssl/ca.pem --cert=/opt/etcd/ssl/client.pem --key=/opt/etcd/ssl/client-key.pem --endpoints="https://192.168.116.15:2379,https://192.168.116.16:2379,https://192.168.116.17:2379" compact 10
The revision can be looked up with the following command
[root@etcd-2 ~]# ETCDCTL_API=3 /opt/etcd/bin/etcdctl --cacert=/opt/etcd/ssl/ca.pem --cert=/opt/etcd/ssl/client.pem --key=/opt/etcd/ssl/client-key.pem --endpoints="https://192.168.116.15:2379,https://192.168.116.16:2379,https://192.168.116.17:2379" get /registry/configmaps/kube-system/kube-flannel-cfg -w json
# Reading data from before revision 10 now reports that it no longer exists
[root@etcd-2 ~]# ETCDCTL_API=3 /opt/etcd/bin/etcdctl --cacert=/opt/etcd/ssl/ca.pem --cert=/opt/etcd/ssl/client.pem --key=/opt/etcd/ssl/client-key.pem --endpoints="https://192.168.116.15:2379,https://192.168.116.16:2379,https://192.168.116.17:2379" get /registry/configmaps/kube-system/kube-flannel-cfg --rev=9
{"level":"warn","ts":"2021-08-24T23:02:30.151+0800","caller":"clientv3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"endpoint://client-26a4a8df-de68-459d-9999-2ef8a949b43e/192.168.116.15:2379","attempt":0,"error":"rpc error: code = OutOfRange desc = etcdserver: mvcc: required revision has been compacted"}
Error: etcdserver: mvcc: required revision has been compacted
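Automatic compaction
Start etcd with --auto-compaction-retention=1 to keep one hour of history and compact the data automatically every hour.
Defragmentation
After a compaction, the old revisions are gone, which leaves internal fragmentation: space that is free for the backend to reuse but still occupies disk. Defragmentation returns that space to the file system.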
[root@etcd-2 ~]# ETCDCTL_API=3 /opt/etcd/bin/etcdctl --cacert=/opt/etcd/ssl/ca.pem --cert=/opt/etcd/ssl/client.pem --key=/opt/etcd/ssl/client-key.pem --endpoints="https://192.168.116.15:2379,https://192.168.116.16:2379,https://192.168.116.17:2379" defrag