
With the observability service enabled, you can use Red Hat Advanced Cluster Management for Kubernetes (RHACM) to gain insight into and optimize your managed clusters. If the managed clusters are Red Hat OpenShift Container Platform (RHOCP) 4.8+ or KS clusters, you can see alerts from all the managed clusters in the hub cluster.


You can also configure RHACM to forward alerts to an external notification system.

In this blog post, I show how to use amtool to manage RHACM alerts.

amtool

amtool is a CLI tool for interacting with the Alertmanager API. It's bundled with all releases of Alertmanager. With a recent Go toolchain, you can install it locally with the following command:
go install github.com/prometheus/alertmanager/cmd/amtool@latest

Note: If you can access the observability-alertmanager-0 pod directly, you can use the amtool binary that is bundled with that pod. Use the amtool alert --alertmanager.url=http://localhost:9093 command to list alerts.

Connect to RHACM Alertmanager

RHACM exposes the alertmanager API through a route. You can get the alertmanager URL by using the following command:
oc get route alertmanager -n open-cluster-management-observability -o jsonpath="{.spec.host}"

Before you connect to Alertmanager, you also need to pass a bearer token to amtool. You can get the bearer token from Configure client in the RHACM console or, if you are logged in to your OCP cluster, fetch it with the following command:
oc whoami -t

Then create a config file in YAML format in one of the two default config locations: $HOME/.config/amtool/config.yml or /etc/amtool/config.yml. View the following example syntax:

alertmanager.url: https://alertmanager-open-cluster-management-observability.apps.xxx
http.config.file: $HOME/.config/amtool/http_config.yml

The file that you specify for the http.config.file parameter must be in http_config format. View the following example syntax:

authorization:
  type: Bearer
  credentials: sha256~xxxxxxx
tls_config:
  insecure_skip_verify: true
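
Both files can be generated in one step. The following is a minimal sketch, assuming a POSIX shell. The function name write_amtool_config and its three arguments are hypothetical, not part of amtool or RHACM. It writes an absolute path for http.config.file so that the result does not depend on whether $HOME is expanded inside the config file, and it keeps insecure_skip_verify: true only because the example above does.

```shell
#!/bin/sh
# Hypothetical helper: write amtool's config.yml and http_config.yml.
# Usage: write_amtool_config <config-dir> <alertmanager-host> <bearer-token>
write_amtool_config() {
    dir=$1 host=$2 token=$3
    mkdir -p "$dir" || return 1

    # The token is a credential, so newly created files are readable
    # only by the current user. The subshell keeps the umask change local.
    (
        umask 077
        cat > "$dir/http_config.yml" <<EOF
authorization:
  type: Bearer
  credentials: $token
tls_config:
  insecure_skip_verify: true
EOF
        cat > "$dir/config.yml" <<EOF
alertmanager.url: https://$host
http.config.file: $dir/http_config.yml
EOF
    )
}

# Example call, using the two oc commands shown earlier:
#   write_amtool_config "$HOME/.config/amtool" \
#     "$(oc get route alertmanager -n open-cluster-management-observability -o jsonpath='{.spec.host}')" \
#     "$(oc whoami -t)"
```

Because tokens returned by oc whoami -t expire, rerunning the helper after each login is one way to keep the credentials current.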

Configuration

You can use amtool to view the current Alertmanager configuration, for example with the amtool config show command. View the following Alertmanager configuration sample:

global:
  resolve_timeout: 5m
  http_config: {}
  smtp_hello: localhost
  smtp_require_tls: true
  slack_api_url: <secret>
  pagerduty_url: https://events.pagerduty.com/v2/enqueue
  hipchat_api_url: https://api.hipchat.com/
  opsgenie_api_url: https://api.opsgenie.com/
  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/
  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/
route:
  receiver: default-receiver
  group_by:
  - alertname
  - cluster
  repeat_interval: 45m
receivers:
- name: default-receiver
  slack_configs:
  - send_resolved: true
    http_config: {}
    api_url: <secret>
...

Examples

Continue reading to learn how you can use amtool to manage alerts.

1. View all active alerts with the following command:

$ amtool alert

Alertname          Starts At                Summary                                           State
KubeCPUOvercommit  2021-10-27 07:47:32 UTC  Cluster has overcommitted CPU resource requests.  active

2. View all active alerts with extended outputs by running the following command:

$ amtool alert -o extended

Labels  Annotations  Starts At  Ends At  Generator URL  State
alertname="KubeCPUOvercommit" cluster="cyang2-kind" severity="warning"  description="Cluster has overcommitted CPU resource requests for Pods and cannot tolerate node failure." summary="Cluster has overcommitted CPU resource requests."  2021-10-27 07:47:32 UTC  2021-11-10 13:50:02 UTC  http://prometheus-k8s-0:9090/graph?g0.expr=sum%28namespace_cpu%3Akube_pod_container_resource_requests%3Asum%29+%2F+sum%28kube_node_status_allocatable%7Bresource%3D%22cpu%22%7D%29+%3E+%28count%28kube_node_status_allocatable%7Bresource%3D%22cpu%22%7D%29+-+1%29+%2F+count%28kube_node_status_allocatable%7Bresource%3D%22cpu%22%7D%29&g0.tab=1  active
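
Because the extended output includes the cluster label, it can be summarized per managed cluster. The following is a rough sketch, assuming the output layout shown above (labels first, each printed as name="value") and cluster names without spaces. The function name alerts_per_cluster is hypothetical. For anything more robust than a quick overview, parsing the output of amtool alert -o json is probably the better choice.

```shell
#!/bin/sh
# Hypothetical helper: count alerts per cluster label.
# Reads the output of "amtool alert -o extended" on standard input.
alerts_per_cluster() {
    awk '
        {
            # Labels come first on each row; stop at the first cluster label.
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^cluster="/) {
                    name = $i
                    gsub(/^cluster="|"$/, "", name)   # strip cluster=" and the closing quote
                    count[name]++
                    break
                }
            }
        }
        END { for (name in count) print count[name], name }
    ' | sort -k1,1nr -k2
}

# Example call:
#   amtool alert -o extended | alerts_per_cluster
```

The header row contains no cluster label, so it is skipped without special handling.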

3. Silence a specific alert with the following command:

$ amtool silence add alertname=KubeCPUOvercommit --comment=acked
290bb29e-6457-47b0-b10d-140b10418c4c

4. Silence all alerts that match a label.

RHACM adds a cluster label to each alert to identify which cluster the alert came from, so you can silence all alerts from one cluster by using the following command:

$ amtool silence add cluster="local-cluster" --comment=acked
48ccecdc-abb1-4196-83fd-593ba010ddf3

Regex matching is also supported. The =~ syntax (similar to Prometheus) represents a regex match, and you can combine it with a direct match. The following command adds a silence that matches alerts whose alertname is KubeCPUOvercommit and whose cluster label value ends in 1:

$ amtool silence add alertname="KubeCPUOvercommit" cluster=~".+1" --comment=acked
18abf36d-b01e-46a0-ba1a-b814acb4bae0
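
The silence commands in this step do not say how long the silence lasts, so amtool's default duration applies. The following is a small sketch of a wrapper that makes the duration explicit through the --duration flag of amtool silence add. The function name silence_cluster and the AMTOOL override variable are hypothetical conveniences, not part of amtool.

```shell
#!/bin/sh
# Hypothetical wrapper around "amtool silence add" for one managed cluster.
# Set AMTOOL=echo to preview the command instead of running it.
# Usage: silence_cluster <cluster-name> <duration> <comment>
silence_cluster() {
    if [ "$#" -ne 3 ]; then
        echo "usage: silence_cluster <cluster-name> <duration> <comment>" >&2
        return 2
    fi
    # cluster=<name> is an exact-match matcher on the label that RHACM adds.
    "${AMTOOL:-amtool}" silence add "cluster=$1" --duration="$2" --comment="$3"
}

# Example call: silence everything from local-cluster for two hours.
#   silence_cluster local-cluster 2h "planned maintenance"
```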

5. View silences with the following command:

$ amtool silence query

ID                                    Matchers                       Ends At                  Created By  Comment
290bb29e-6457-47b0-b10d-140b10418c4c  alertname="KubeCPUOvercommit"  2021-11-10 14:54:44 UTC  chuyang     acked

6. Expire a silence with the following command:

$ amtool silence expire 290bb29e-6457-47b0-b10d-140b10418c4c

7. Expire all silences using the following command:

$ amtool silence expire $(amtool silence query -q)
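
When no silences exist, the command substitution above expands to nothing and amtool silence expire is invoked without any IDs. The following is a guarded sketch that checks for that case first. The function name expire_all_silences and the AMTOOL override variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: expire every silence, doing nothing when none exist.
# AMTOOL can be overridden to point at a different binary or a test stub.
expire_all_silences() {
    # -q limits the query output to the silence IDs.
    ids=$("${AMTOOL:-amtool}" silence query -q) || return 1
    if [ -z "$ids" ]; then
        echo "no active silences to expire" >&2
        return 0
    fi
    # Intentionally unquoted: each ID must become a separate argument.
    "${AMTOOL:-amtool}" silence expire $ids
}
```

Storing the IDs in a variable also means that a failed query stops the function before anything is expired.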

Conclusion

In conclusion, RHACM supports the use of amtool to manage RHACM alerts. I hope this blog post is helpful to you!

