# Installing and Using prometheus-operator
## Installing the Helm 3 package manager
### Installing the client
```
wget https://get.helm.sh/helm-v3.2.0-linux-amd64.tar.gz
tar -xf helm-v3.2.0-linux-amd64.tar.gz
cp linux-amd64/helm /usr/local/bin/
```
### Viewing the configuration
```
[root@kube-mas ~]# helm env
HELM_BIN="helm"
HELM_DEBUG="false"
HELM_KUBEAPISERVER=""
HELM_KUBECONTEXT=""
HELM_KUBETOKEN=""
HELM_NAMESPACE="default"
HELM_PLUGINS="/root/.local/share/helm/plugins"
HELM_REGISTRY_CONFIG="/root/.config/helm/registry.json"
HELM_REPOSITORY_CACHE="/root/.cache/helm/repository"
HELM_REPOSITORY_CONFIG="/root/.config/helm/repositories.yaml"
```
### Adding a public repository
```
[root@kube-mas ~]# helm repo add aliyuncs https://apphub.aliyuncs.com
[root@kube-mas ~]# helm repo update
```
### Searching for prometheus-operator
```
[root@kube-mas ~]# helm search repo prometheus-operator
NAME                           CHART VERSION   APP VERSION   DESCRIPTION
aliyuncs/prometheus-operator   8.7.0           0.35.0        Provides easy monitoring definitions for Kubern...
```
## Installing prometheus-operator
### Downloading the chart
```
[root@kube-mas ~]# helm pull aliyuncs/prometheus-operator
[root@kube-mas ~]# tar -xf prometheus-operator-8.7.0.tgz -C /opt/
```
### Creating the Kubernetes namespace
```
[root@kube-mas ~]# kubectl create ns mon
```
### Editing values.yaml
Create the secret that Prometheus needs in order to scrape etcd:
```
[root@kube-mas ~]# cd /etc/kubernetes/pki/etcd/
[root@kube-mas etcd]# kubectl create secret generic etcd-cert --from-file=ca.crt --from-file=ca.key --from-file=server.crt --from-file=server.key -n mon
[root@kube-mas ~]# cd /opt/prometheus-operator/
[root@kube-mas prometheus-operator]# vim values.yaml
```
Configure the kubeEtcd section:
```
kubeEtcd:
  enabled: true

  ## If your etcd is not deployed as a pod, specify IPs it can be found on
  ##
  endpoints: []
  # - 10.141.4.22
  # - 10.141.4.23
  # - 10.141.4.24

  ## Etcd service. If using kubeEtcd.endpoints only the port and targetPort are used
  ##
  service:
    port: 2379
    targetPort: 2379
    # selector:
    #   component: etcd

  ## Configure secure access to the etcd cluster by loading a secret into prometheus and
  ## specifying security configuration below. For example, with a secret named etcd-client-cert
  ##
  ## serviceMonitor:
  ##   scheme: https
  ##   insecureSkipVerify: false
  ##   serverName: localhost
  ##   caFile: /etc/prometheus/secrets/etcd-client-cert/etcd-ca
  ##   certFile: /etc/prometheus/secrets/etcd-client-cert/etcd-client
  ##   keyFile: /etc/prometheus/secrets/etcd-client-cert/etcd-client-key
  ##
  serviceMonitor:
    ## Scrape interval. If not set, the Prometheus default scrape interval is used.
    ##
    interval: ""
    scheme: https
    insecureSkipVerify: true
    serverName: ""
    # Secrets are mounted at /etc/prometheus/secrets/<secret-name>/, so the paths include etcd-cert
    caFile: "/etc/prometheus/secrets/etcd-cert/ca.crt"
    certFile: "/etc/prometheus/secrets/etcd-cert/server.crt"
    keyFile: "/etc/prometheus/secrets/etcd-cert/server.key"
....
# under prometheus.prometheusSpec:
    secrets: ["etcd-cert"]
```
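As an alternative to editing the chart's `values.yaml` in place, the same etcd settings could be kept in a small override file and passed to Helm with `-f`. The sketch below assumes the secret is named `etcd-cert` as above; the file name `etcd-values.yaml` is hypothetical. The Prometheus Operator mounts each secret listed in `prometheusSpec.secrets` under `/etc/prometheus/secrets/<secret-name>/`, which is why the certificate paths include the secret name.

```shell
# Write a minimal override file (hypothetical name) holding only the etcd-related settings.
cat > etcd-values.yaml <<'EOF'
kubeEtcd:
  enabled: true
  service:
    port: 2379
    targetPort: 2379
  serviceMonitor:
    scheme: https
    insecureSkipVerify: true
    # Paths follow the mount convention /etc/prometheus/secrets/<secret-name>/<key>
    caFile: /etc/prometheus/secrets/etcd-cert/ca.crt
    certFile: /etc/prometheus/secrets/etcd-cert/server.crt
    keyFile: /etc/prometheus/secrets/etcd-cert/server.key
prometheus:
  prometheusSpec:
    # Secrets in the release namespace that should be mounted into the Prometheus pods
    secrets:
      - etcd-cert
EOF

# The file could then be supplied at install time, for example:
#   helm install promethus-operator --namespace=mon -f etcd-values.yaml .
```

Keeping overrides in a separate file leaves the unpacked chart untouched, which tends to make later chart upgrades easier to compare.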
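Change the kube-proxy `metricsBindAddress` in the cluster so that Prometheus can reach the metrics endpoint, then recreate the kube-proxy pods so they pick up the change: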
```
[root@kube-mas prometheus-operator]# kubectl edit cm kube-proxy -n kube-system
kind: KubeProxyConfiguration
metricsBindAddress: "0.0.0.0:10249"
[root@kube-mas prometheus-operator]# kubectl get po -n kube-system | awk '/kube-proxy/{print "kubectl delete po -n kube-system "$1}' | sh
```
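The restart pipeline above works by having `awk` turn each kube-proxy pod name into a `kubectl delete` command, which is then piped to `sh`. The sketch below runs the same `awk` program against made-up sample output (the pod names are hypothetical) so the generated commands can be inspected before anything is executed.

```shell
# Hypothetical sample of `kubectl get po -n kube-system` output.
sample_pods='coredns-5c98db65d4-abcde   1/1   Running   0   3d
kube-proxy-7xk2p            1/1   Running   0   3d
kube-proxy-q9w4m            1/1   Running   0   3d'

# Same awk program as above: for every line matching kube-proxy, emit a delete command
# built from the first column (the pod name).
generated=$(printf '%s\n' "$sample_pods" \
  | awk '/kube-proxy/{print "kubectl delete po -n kube-system "$1}')

# Print the commands instead of piping them to sh.
printf '%s\n' "$generated"
```

On kubectl 1.15 or newer, `kubectl rollout restart daemonset kube-proxy -n kube-system` should achieve the same restart without generating commands by hand.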
Note that no PersistentVolume is used here. If you need persistent storage, configure the PVC settings under the `storage` sections of both Alertmanager and Prometheus.
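A minimal sketch of what such PVC settings might look like, written as a separate override file. The file name `storage-values.yaml`, the storage class `standard`, and the sizes are all assumptions to adapt to your cluster; the key layout follows the commented examples in the chart's `values.yaml`.

```shell
# Write a hypothetical override file that requests PVCs for Prometheus and Alertmanager.
cat > storage-values.yaml <<'EOF'
prometheus:
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: standard   # assumption: replace with a class that exists in your cluster
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi            # assumption: size to your retention needs
alertmanager:
  alertmanagerSpec:
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: standard   # assumption
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 5Gi             # assumption
EOF
```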
If Alertmanager needs a custom notification template, create a ConfigMap containing the template and then add its name to Alertmanager's `configMaps` list.
For example:
```
[root@kube-mas prometheus-operator]# cat alertmanager-cm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: wechat-tmpl
  namespace: mon
data:
  wechat.tmpl: |
    {{ define "wechat.default.message" }}
    {{- if gt (len .Alerts.Firing) 0 -}}
    {{ range .Alerts.Firing }}
    Firing
    Alert type: {{ .Labels.alertname }}
    Severity: {{ .Labels.severity }}
    =====================
    ===Alert details===
    Summary: {{ .Annotations.summary }}
    Description: {{ .Annotations.description }}
    Started at: {{ (.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
    ===Reference===
    {{- if gt (len .Labels.instance) 0 -}}
    Instance IP: {{ .Labels.instance }}
    {{ end -}}
    {{- if gt (len .Labels.namespace) 0 -}}
    Namespace: {{ .Labels.namespace }}
    {{ end -}}
    {{- if gt (len .Labels.node) 0 -}}
    Node IP: {{ .Labels.node }}
    {{ end -}}
    {{- if gt (len .Labels.pod) 0 -}}
    Pod name: {{ .Labels.pod }}
    {{ end -}}
    =====================
    {{ end }}
    {{ end -}}
    {{- if gt (len .Alerts.Resolved) 0 -}}
    {{ range .Alerts.Resolved }}
    Resolved
    Alert type: {{ .Labels.alertname }}
    Severity: {{ .Labels.severity }}
    =====================
    ===Alert details===
    Description: {{ .Annotations.message }}
    Started at: {{ (.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
    Resolved at: {{ (.EndsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
    ===Reference===
    {{- if gt (len .Labels.instance) 0 -}}
    Instance IP: {{ .Labels.instance }}
    {{ end -}}
    {{- if gt (len .Labels.namespace) 0 -}}
    Namespace: {{ .Labels.namespace }}
    {{ end -}}
    {{- if gt (len .Labels.node) 0 -}}
    Node IP: {{ .Labels.node }}
    {{ end -}}
    {{- if gt (len .Labels.pod) 0 -}}
    Pod name: {{ .Labels.pod }}
    {{ end -}}
    =====================
    {{ end }}
    {{ end -}}
    {{- end }}
[root@kube-mas prometheus-operator]# kubectl apply -f alertmanager-cm.yaml
```
Configure Alertmanager:
```
# under alertmanager:
  config:
    global:
      resolve_timeout: 5m
      wechat_api_url: 'https://qyapi.weixin.qq.com/cgi-bin/'
      wechat_api_secret: 'xxxxxxxxxxxxxxxxxxxxxxxxx'
      wechat_api_corp_id: 'xxxxxxxxxxxxxxxxxxxxxxxx'
    templates:
    - '/etc/alertmanager/configmaps/wechat-tmpl/*.tmpl'
    route:
      group_by: ['job']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      receiver: 'wechat'
    receivers:
    - name: 'wechat'
      wechat_configs:
      - send_resolved: true
        corp_id: 'xxxxxxxxxxxxxxxxxx'
        to_user: '@all'
        message: '{{ template "wechat.default.message" . }}'
        agent_id: '1000010'
        api_secret: 'xxxxxxxxxxxxxxxxxxxxxx'
    inhibit_rules:
    - source_match:
        severity: 'critical'
      target_match:
        severity: 'warning'
      equal: ['alertname', 'dev', 'instance']
....
# under alertmanager.alertmanagerSpec:
    configMaps: ["wechat-tmpl"]
```
### Installing
```
[root@kube-mas prometheus-operator]# helm install promethus-operator --namespace=mon .
# If you see: manifest_sorter.go:192: info: skipping unknown hook: "crd-install"
# it can be ignored.
# List the installed releases.
# Across all namespaces:
[root@kube-mas prometheus-operator]# helm list -A
NAME                 NAMESPACE   REVISION   UPDATED                                   STATUS     CHART                       APP VERSION
promethus-operator   mon         1          2021-08-06 12:13:03.616675919 +0800 CST   deployed   prometheus-operator-8.7.0   0.35.0
# In a single namespace:
[root@kube-mas prometheus-operator]# helm list -n mon
NAME                 NAMESPACE   REVISION   UPDATED                                   STATUS     CHART                       APP VERSION
promethus-operator   mon         1          2021-08-06 12:13:03.616675919 +0800 CST   deployed   prometheus-operator-8.7.0   0.35.0
```
Change the Prometheus service (port 9090) from ClusterIP to NodePort, exposed on node port 30090:
```
[root@master1 ~]# kubectl patch svc -n mon promethus-operator-prometh-prometheus -p '{"spec":{"type":"NodePort","ports":[{"name":"web","port":9090,"nodePort":30090}]}}'
service/promethus-operator-prometh-prometheus patched
```
Expose Grafana's port 80 on node port 30080:
```
kubectl patch svc -n mon promethus-operator-grafana -p '{"spec":{"type":"NodePort","ports":[{"name":"service","port":80,"nodePort":30080}]}}'
```
Expose Alertmanager's port 9093 on node port 30093:
```
kubectl patch svc -n mon promethus-operator-prometh-alertmanager -p '{"spec":{"type":"NodePort","ports":[{"name":"web","port":9093,"nodePort":30093}]}}'
```
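The patches above differ only in the port name, the service port, and the node port. As a sketch, a small helper could build the JSON so the three values are the only things typed by hand. The function name `nodeport_patch` is hypothetical. By default Kubernetes only accepts node ports in the range 30000-32767.

```shell
# Hypothetical helper: build a strategic-merge patch that switches a Service to NodePort.
#   $1 = port name as defined in the Service, $2 = service port, $3 = desired nodePort
nodeport_patch() {
  printf '{"spec":{"type":"NodePort","ports":[{"name":"%s","port":%s,"nodePort":%s}]}}' "$1" "$2" "$3"
}

# Show the patch that would be sent for the Prometheus service.
nodeport_patch web 9090 30090
echo

# Example use with kubectl (not executed here):
#   kubectl patch svc -n mon promethus-operator-prometh-prometheus -p "$(nodeport_patch web 9090 30090)"
```

The port name should match the name already defined on the Service, because the chart's ServiceMonitors select ports by name.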
```
# View the Grafana username and password
kubectl get secret promethus-operator-grafana -n mon -o=jsonpath='{.data.admin-user}' | base64 -d
admin
kubectl get secret promethus-operator-grafana -n mon -o=jsonpath='{.data.admin-password}' | base64 -d
prom-operator
```
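The `base64 -d` step is needed because Kubernetes stores the values under a Secret's `data` field base64-encoded. A small round trip, using the password shown above as sample input, illustrates what the pipeline does (the variable names are only for illustration):

```shell
# Encode the sample value the way it would appear in the Secret's .data field.
encoded=$(printf '%s' 'prom-operator' | base64)

# Decode it again, as the kubectl pipeline above does.
decoded=$(printf '%s' "$encoded" | base64 -d)

printf 'encoded: %s\ndecoded: %s\n' "$encoded" "$decoded"
```

Base64 is an encoding, not encryption, so anyone who can read the Secret can recover the password.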
### Uninstalling
```
[root@kube-mas prometheus-operator]# helm uninstall promethus-operator -n mon
```