etcd fails to start after the machine unexpectedly loses power

Author: 风 哥  Category: Kubernetes  Published: 2019-11-21 16:24

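The server room on our company intranet suddenly lost power. After power was restored the machine booted normally, but the etcd service was down. Judging by the following log line, a damaged file appears to be what prevents startup: ignored file 0000000000000152-00000000011aee3e.wal.broken in wal. The process itself exits a few lines later with panic: failed to unmarshal lease proto item, after which systemd marks the unit as failed.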

The full error log follows:

11月 21 13:56:49 feiba-etcd2-apiserver-192-168-1-14 etcd[10820]: listening for peers on https://192.168.1.14:2380
11月 21 13:56:49 feiba-etcd2-apiserver-192-168-1-14 etcd[10820]: listening for client requests on 127.0.0.1:2379
11月 21 13:56:49 feiba-etcd2-apiserver-192-168-1-14 etcd[10820]: listening for client requests on 192.168.1.14:2379
11月 21 13:56:49 feiba-etcd2-apiserver-192-168-1-14 etcd[10820]: recovered store from snapshot at index 144149548
11月 21 13:56:49 feiba-etcd2-apiserver-192-168-1-14 etcd[10820]: restore compact to 62375722
11月 21 13:56:49 feiba-etcd2-apiserver-192-168-1-14 etcd[10820]: name = etcd2
11月 21 13:56:49 feiba-etcd2-apiserver-192-168-1-14 etcd[10820]: data dir = /var/lib/etcd/default.etcd
11月 21 13:56:49 feiba-etcd2-apiserver-192-168-1-14 etcd[10820]: member dir = /var/lib/etcd/default.etcd/member
11月 21 13:56:49 feiba-etcd2-apiserver-192-168-1-14 etcd[10820]: heartbeat = 100ms
11月 21 13:56:49 feiba-etcd2-apiserver-192-168-1-14 etcd[10820]: election = 1000ms
11月 21 13:56:49 feiba-etcd2-apiserver-192-168-1-14 etcd[10820]: snapshot count = 100000
11月 21 13:56:49 feiba-etcd2-apiserver-192-168-1-14 etcd[10820]: advertise client URLs = https://127.0.0.1:2379,https://192.168.1.14:2379
11月 21 13:56:49 feiba-etcd2-apiserver-192-168-1-14 etcd[10820]: ignored file 0000000000000152-00000000011aee3e.wal.broken in wal
11月 21 13:56:49 feiba-etcd2-apiserver-192-168-1-14 etcd[10820]: restarting member 399b596d3999ba01 in cluster 2d30d048a16b1ee1 at commit index 144182292
11月 21 13:56:49 feiba-etcd2-apiserver-192-168-1-14 etcd[10820]: 399b596d3999ba01 became follower at term 340908
11月 21 13:56:49 feiba-etcd2-apiserver-192-168-1-14 etcd[10820]: newRaft 399b596d3999ba01 [peers: [399b596d3999ba01,7b170c49403f28a9,be31245426ce45a9], term: 340908, commit: 144182292, applied: 144149548, lastinde
11月 21 13:56:49 feiba-etcd2-apiserver-192-168-1-14 etcd[10820]: enabled capabilities for version 3.2
11月 21 13:56:49 feiba-etcd2-apiserver-192-168-1-14 etcd[10820]: added member 399b596d3999ba01 [https://192.168.1.14:2380] to cluster 2d30d048a16b1ee1 from store
11月 21 13:56:49 feiba-etcd2-apiserver-192-168-1-14 etcd[10820]: added member 7b170c49403f28a9 [https://192.168.1.12:2380] to cluster 2d30d048a16b1ee1 from store
11月 21 13:56:49 feiba-etcd2-apiserver-192-168-1-14 etcd[10820]: added member be31245426ce45a9 [https://192.168.1.13:2380] to cluster 2d30d048a16b1ee1 from store
11月 21 13:56:49 feiba-etcd2-apiserver-192-168-1-14 etcd[10820]: set the cluster version to 3.2 from store
11月 21 13:56:49 feiba-etcd2-apiserver-192-168-1-14 bash[10820]: panic: failed to unmarshal lease proto item
11月 21 13:56:49 feiba-etcd2-apiserver-192-168-1-14 bash[10820]: goroutine 1 [running]:
11月 21 13:56:49 feiba-etcd2-apiserver-192-168-1-14 bash[10820]: github.com/coreos/etcd/lease.(*lessor).initAndRecover(0xc4204dce40)
11月 21 13:56:49 feiba-etcd2-apiserver-192-168-1-14 bash[10820]: /builddir/build/BUILD/etcd-1674e682fe9fbecd66e9f20b77da852ad7f517a9/src/github.com/coreos/etcd/lease/lessor.go:532 +0x4b1
11月 21 13:56:49 feiba-etcd2-apiserver-192-168-1-14 bash[10820]: github.com/coreos/etcd/lease.newLessor(0x5594d3fa3980, 0xc420163860, 0x2, 0xc420b667c0)
11月 21 13:56:49 feiba-etcd2-apiserver-192-168-1-14 bash[10820]: /builddir/build/BUILD/etcd-1674e682fe9fbecd66e9f20b77da852ad7f517a9/src/github.com/coreos/etcd/lease/lessor.go:170 +0x1b5
11月 21 13:56:49 feiba-etcd2-apiserver-192-168-1-14 bash[10820]: github.com/coreos/etcd/lease.NewLessor(0x5594d3fa3980, 0xc420163860, 0x2, 0xc420b66780, 0x2)
11月 21 13:56:49 feiba-etcd2-apiserver-192-168-1-14 bash[10820]: /builddir/build/BUILD/etcd-1674e682fe9fbecd66e9f20b77da852ad7f517a9/src/github.com/coreos/etcd/lease/lessor.go:156 +0x41
11月 21 13:56:49 feiba-etcd2-apiserver-192-168-1-14 bash[10820]: github.com/coreos/etcd/etcdserver.NewServer(0xc42016ba00, 0xc42016ba00, 0x0, 0x0)
11月 21 13:56:49 feiba-etcd2-apiserver-192-168-1-14 bash[10820]: /builddir/build/BUILD/etcd-1674e682fe9fbecd66e9f20b77da852ad7f517a9/src/github.com/coreos/etcd/etcdserver/server.go:441 +0x1413
11月 21 13:56:49 feiba-etcd2-apiserver-192-168-1-14 bash[10820]: github.com/coreos/etcd/embed.StartEtcd(0xc42022a000, 0xc4201eea80, 0x0, 0x0)
11月 21 13:56:49 feiba-etcd2-apiserver-192-168-1-14 bash[10820]: /builddir/build/BUILD/etcd-1674e682fe9fbecd66e9f20b77da852ad7f517a9/src/github.com/coreos/etcd/embed/etcd.go:157 +0x78e
11月 21 13:56:49 feiba-etcd2-apiserver-192-168-1-14 bash[10820]: github.com/coreos/etcd/etcdmain.startEtcd(0xc42022a000, 0x6, 0x5594d3529e05, 0x6, 0x1)
11月 21 13:56:49 feiba-etcd2-apiserver-192-168-1-14 bash[10820]: /builddir/build/BUILD/etcd-1674e682fe9fbecd66e9f20b77da852ad7f517a9/src/github.com/coreos/etcd/etcdmain/etcd.go:186 +0x75
11月 21 13:56:49 feiba-etcd2-apiserver-192-168-1-14 bash[10820]: github.com/coreos/etcd/etcdmain.startEtcdOrProxyV2()
11月 21 13:56:49 feiba-etcd2-apiserver-192-168-1-14 bash[10820]: /builddir/build/BUILD/etcd-1674e682fe9fbecd66e9f20b77da852ad7f517a9/src/github.com/coreos/etcd/etcdmain/etcd.go:103 +0x14df
11月 21 13:56:49 feiba-etcd2-apiserver-192-168-1-14 bash[10820]: github.com/coreos/etcd/etcdmain.Main()
11月 21 13:56:49 feiba-etcd2-apiserver-192-168-1-14 bash[10820]: /builddir/build/BUILD/etcd-1674e682fe9fbecd66e9f20b77da852ad7f517a9/src/github.com/coreos/etcd/etcdmain/main.go:39 +0x137
11月 21 13:56:49 feiba-etcd2-apiserver-192-168-1-14 bash[10820]: main.main()
11月 21 13:56:49 feiba-etcd2-apiserver-192-168-1-14 bash[10820]: /builddir/build/BUILD/etcd-1674e682fe9fbecd66e9f20b77da852ad7f517a9/src/github.com/coreos/etcd/main.go:28 +0x22
11月 21 13:56:49 feiba-etcd2-apiserver-192-168-1-14 systemd[1]: etcd.service: main process exited, code=exited, status=2/INVALIDARGUMENT
11月 21 13:56:49 feiba-etcd2-apiserver-192-168-1-14 systemd[1]: Failed to start Etcd Server.
-- Subject: Unit etcd.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit etcd.service has failed.
-- 
-- The result is failed.
11月 21 13:56:49 feiba-etcd2-apiserver-192-168-1-14 systemd[1]: Unit etcd.service entered failed state.
11月 21 13:56:49 feiba-etcd2-apiserver-192-168-1-14 systemd[1]: etcd.service failed.
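
A journal this long is easier to read once it is cut down to the decisive lines. The following is a minimal sketch, not part of the original troubleshooting: the function name filter_etcd_failure is hypothetical, and the unit name etcd and the wal path are taken from the log above.

```shell
#!/usr/bin/env bash
# Keep only the lines that matter for this failure:
#   - the Go panic message
#   - any ignored *.wal.broken file
#   - systemd's report of the process exit
filter_etcd_failure() {
  grep -E 'panic:|\.wal\.broken|main process exited'
}

# Usage on the affected host (assumes the unit is named "etcd"):
#   journalctl -u etcd --no-pager | filter_etcd_failure
#
# To look at the WAL files themselves (member dir as printed in the log):
#   ls -l /var/lib/etcd/default.etcd/member/wal
```

Run against the log above, the filter should leave the wal.broken line, the panic line, and the "main process exited" line.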

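Changing --initial-cluster-state=new to --initial-cluster-state=existing in the etcd unit file and restarting the service brought etcd back up. The unit file after the change: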

[root@docker03 ~]# cat /usr/lib/systemd/system/etcd.service
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target
 
[Service]
Type=notify
EnvironmentFile=-/opt/kubernetes/cfg/etcd
ExecStart=/opt/kubernetes/bin/etcd \
--name=${ETCD_NAME} \
--data-dir=${ETCD_DATA_DIR} \
--listen-peer-urls=${ETCD_LISTEN_PEER_URLS} \
--listen-client-urls=${ETCD_LISTEN_CLIENT_URLS},http://127.0.0.1:2379 \
--advertise-client-urls=${ETCD_ADVERTISE_CLIENT_URLS} \
--initial-advertise-peer-urls=${ETCD_INITIAL_ADVERTISE_PEER_URLS} \
--initial-cluster=${ETCD_INITIAL_CLUSTER} \
--initial-cluster-token=${ETCD_INITIAL_CLUSTER} \
--initial-cluster-state=existing \
--cert-file=/opt/kubernetes/ssl/server.pem \
--key-file=/opt/kubernetes/ssl/server-key.pem \
--peer-cert-file=/opt/kubernetes/ssl/server.pem \
--peer-key-file=/opt/kubernetes/ssl/server-key.pem \
--trusted-ca-file=/opt/kubernetes/ssl/ca.pem \
--peer-trusted-ca-file=/opt/kubernetes/ssl/ca.pem
Restart=on-failure
LimitNOFILE=65536
 
[Install]
WantedBy=multi-user.target
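
The flag change can also be scripted instead of edited by hand. This is a sketch under assumptions rather than the procedure used in the post: the helper name set_initial_cluster_state is hypothetical, the unit path is the one shown in the listing above, and the service name etcd is taken from the log.

```shell
#!/usr/bin/env bash
# Rewrite the value of --initial-cluster-state in a unit file.
# $1 = path to the unit file, $2 = "new" or "existing".
# A backup copy is kept next to the file with a .bak suffix.
set_initial_cluster_state() {
  local unit_file=$1 state=$2
  case "$state" in
    new|existing) ;;
    *) echo "state must be 'new' or 'existing'" >&2; return 1 ;;
  esac
  sed -i.bak -E "s/(--initial-cluster-state=)(new|existing)/\1${state}/" "$unit_file"
}

# Usage on the affected host (run as root):
#   set_initial_cluster_state /usr/lib/systemd/system/etcd.service existing
#   systemctl daemon-reload      # make systemd re-read the edited unit file
#   systemctl restart etcd
#   systemctl status etcd
```

The same helper covers the later step of switching the value back: call it with new instead of existing, then reload and restart again.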

 

Finally, change --initial-cluster-state=existing back to --initial-cluster-state=new, restart the service again, and check that data is syncing normally.
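
One way to check synchronization is to compare the status of all three members with etcdctl. The sketch below rests on several assumptions that the post does not confirm: that etcdctl is installed on the host, that every member serves clients on port 2379 (the log only shows this for 192.168.1.14), and that the server certificate from the unit file is also accepted as a client certificate. The helper names build_endpoints and check_etcd_sync are hypothetical.

```shell
#!/usr/bin/env bash
# Build a comma-separated list of https endpoints.
# $1 = client port, remaining arguments = member IPs.
build_endpoints() {
  local port=$1 ip out=""
  shift
  for ip in "$@"; do
    out="${out:+$out,}https://${ip}:${port}"
  done
  printf '%s\n' "$out"
}

# Print the status of every member as a table (v3 API).
# Certificate paths are the ones from the unit file; adjust if yours differ.
check_etcd_sync() {
  local endpoints
  endpoints=$(build_endpoints 2379 "$@")
  ETCDCTL_API=3 etcdctl --endpoints="$endpoints" \
    --cacert=/opt/kubernetes/ssl/ca.pem \
    --cert=/opt/kubernetes/ssl/server.pem \
    --key=/opt/kubernetes/ssl/server-key.pem \
    endpoint status --write-out=table
}

# Usage, with the member IPs from the log:
#   check_etcd_sync 192.168.1.12 192.168.1.13 192.168.1.14
```

In the resulting table, members that have caught up should report the same raft term and raft index values that are equal or very close; a recovered member that lags far behind the others has not finished syncing.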

 

 
