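cAdvisor (Container Advisor) is Google's open-source solution for real-time monitoring and performance data collection of the resources and containers on a node. It also provides a basic query UI and an HTTP interface, so that other components such as Prometheus can scrape its data, or it can be used in a cAdvisor + InfluxDB + Grafana stack. The data it collects covers CPU usage, memory usage, network throughput, and filesystem usage. cAdvisor is written in Go and uses Linux cgroups to obtain container resource usage. In Kubernetes it is integrated into the kubelet and starts by default.

Deployment and Installation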

DaemonSet

apiVersion: apps/v1  # apps/v1beta2 was removed in Kubernetes 1.16
kind: DaemonSet
metadata:
  name: cadvisor
  namespace: monitoring
  labels:
    app: cadvisor
spec:
  selector:
    matchLabels:
      name: cadvisor
  template:
    metadata:
      labels:
        name: cadvisor
    spec:
      containers:
      - name: cadvisor
        image: google/cadvisor:v0.33.0
        volumeMounts:
        - name: rootfs
          mountPath: /rootfs
          readOnly: true
        - name: var-run
          mountPath: /var/run
          readOnly: false
        - name: sys
          mountPath: /sys
          readOnly: true
        - name: docker
          mountPath: /var/lib/docker
          readOnly: true
        ports:
          - name: http
            containerPort: 8080
            protocol: TCP
        args:
          - --housekeeping_interval=10s
          - --disable_metrics=disk
      terminationGracePeriodSeconds: 30
      volumes:
      - name: rootfs
        hostPath:
          path: /
      - name: var-run
        hostPath:
          path: /var/run
      - name: sys
        hostPath:
          path: /sys
      - name: docker
        hostPath:
          path: /var/lib/docker

Service

apiVersion: v1
kind: Service
metadata:
  name: cadvisor
  namespace: monitoring
spec:
  selector:
    name: cadvisor
  ports:
  - name: cadvisor
    protocol: TCP
    port: 8080

Ingress

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: cadvisor
  namespace: monitoring
  annotations:
    kubernetes.io/ingress.class: traefik
spec:
  rules:
  - host: cadvisor.houmin
    http:
      paths:
      - path: /
        backend:
          serviceName: cadvisor
          servicePort: cadvisor

After adding the host name to your local hosts file, open http://cadvisor.houmin:<TraefikNode>/ to see the cAdvisor UI.

Integrating with Prometheus

Step 1: Edit the Prometheus configuration and add the cAdvisor address:

# prometheus.yml
    scrape_configs:
      - job_name: 'node'
        static_configs:
        - targets: ['node-exporter:9100']
      - job_name: 'container'
        static_configs:
        - targets: ['cadvisor:8080']  # local cAdvisor address

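Reload the Prometheus configuration, then open http://prometheus.houmin:30869/targets to confirm that the newly added cAdvisor target is up.

Now open the Prometheus graph page at http://prometheus.houmin:30869/graph and search for "container" to see the container metrics.

To view memory usage across the cluster in Prometheus: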

sum by (name)(container_memory_usage_bytes{image!=""})
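
To make the query concrete, here is a rough Go sketch of what it computes, applied to a few lines in the Prometheus text format. The sample lines, container names, and the helper names parseSample and sumMemoryByName are made up for illustration, and the parser only handles simple label values without commas or escaped quotes.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sample is one parsed line of the Prometheus text exposition format.
type sample struct {
	metric string
	labels map[string]string
	value  float64
}

// parseSample parses lines shaped like: name{k="v",k2="v2"} 123
// Simplification: label values must not contain commas or escaped quotes.
func parseSample(line string) (sample, bool) {
	line = strings.TrimSpace(line)
	if line == "" || strings.HasPrefix(line, "#") {
		return sample{}, false
	}
	open := strings.Index(line, "{")
	end := strings.LastIndex(line, "}")
	if open < 0 || end < open {
		return sample{}, false
	}
	s := sample{metric: line[:open], labels: map[string]string{}}
	for _, pair := range strings.Split(line[open+1:end], ",") {
		k, v, ok := strings.Cut(pair, "=")
		if !ok {
			continue
		}
		s.labels[strings.TrimSpace(k)] = strings.Trim(strings.TrimSpace(v), `"`)
	}
	fields := strings.Fields(line[end+1:])
	if len(fields) == 0 {
		return sample{}, false
	}
	val, err := strconv.ParseFloat(fields[0], 64)
	if err != nil {
		return sample{}, false
	}
	s.value = val
	return s, true
}

// sumMemoryByName mirrors sum by (name)(container_memory_usage_bytes{image!=""}):
// keep only series with a non-empty image label, then add values per name.
func sumMemoryByName(exposition string) map[string]float64 {
	totals := map[string]float64{}
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		s, ok := parseSample(sc.Text())
		if !ok || s.metric != "container_memory_usage_bytes" || s.labels["image"] == "" {
			continue
		}
		totals[s.labels["name"]] += s.value
	}
	return totals
}

func main() {
	// Hypothetical scrape output; real cAdvisor series carry more labels.
	const text = `
# TYPE container_memory_usage_bytes gauge
container_memory_usage_bytes{id="/",image="",name=""} 8.0e+09
container_memory_usage_bytes{id="/docker/a1",image="nginx:1.25",name="web"} 1048576
container_memory_usage_bytes{id="/docker/b2",image="redis:7",name="cache"} 2097152
`
	for name, usage := range sumMemoryByName(text) {
		fmt.Printf("%s: %.0f bytes\n", name, usage)
	}
}
```

The image!="" filter is what drops the root cgroup and other non-container series, which have an empty image label.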

Metrics

Category    Field               Description
cpu         cpu_usage_total
            cpu_usage_system
            cpu_usage_user
            cpu_usage_per_cpu
            load_average        Smoothed average of number of runnable threads x 1000
memory      memory_usage        Memory usage
            memory_working_set  Working set size
network     rx_bytes            Cumulative count of bytes received
            rx_errors           Cumulative count of receive errors encountered
            tx_bytes            Cumulative count of bytes transmitted
            tx_errors           Cumulative count of transmit errors encountered
filesystem  fs_device           Filesystem device
            fs_limit            Filesystem limit
            fs_usage            Filesystem usage
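
Fields such as rx_bytes and tx_bytes are cumulative counters, so a throughput figure has to be derived from the difference between two samples. A minimal Go sketch of that calculation follows. The counterRate helper and the sample values are hypothetical and are not part of cAdvisor.

```go
package main

import (
	"errors"
	"fmt"
)

// counterRate turns two readings of a cumulative counter (for example
// rx_bytes) into a per-second rate over the elapsed time between them.
func counterRate(prev, cur uint64, elapsedSeconds float64) (float64, error) {
	if elapsedSeconds <= 0 {
		return 0, errors.New("elapsed time must be positive")
	}
	if cur < prev {
		// A counter that goes backwards usually means it was reset.
		return 0, errors.New("counter went backwards")
	}
	return float64(cur-prev) / elapsedSeconds, nil
}

func main() {
	// Two made-up rx_bytes readings taken 10 seconds apart.
	rate, err := counterRate(1_000_000, 1_500_000, 10)
	if err != nil {
		fmt.Println("cannot compute rate:", err)
		return
	}
	fmt.Printf("%.0f bytes/s\n", rate) // prints: 50000 bytes/s
}
```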

Source Code Walkthrough

Overall Architecture

[Figure: cAdvisor overall architecture]

Main Function

func main() {
    defer glog.Flush()
    flag.Parse()
    if *versionFlag {
        fmt.Printf("cAdvisor version %s (%s)\n", version.Info["version"], version.Info["revision"])
        os.Exit(0)
    }
    setMaxProcs()
    memoryStorage, err := NewMemoryStorage()
    if err != nil {
        glog.Fatalf("Failed to initialize storage driver: %s", err)
    }
    sysFs, err := sysfs.NewRealSysFs()
    if err != nil {
        glog.Fatalf("Failed to create a system interface: %s", err)
    }
    collectorHttpClient := createCollectorHttpClient(*collectorCert, *collectorKey)
    containerManager, err := manager.New(memoryStorage, sysFs, *maxHousekeepingInterval, *allowDynamicHousekeeping, ignoreMetrics.MetricSet, &collectorHttpClient)
    if err != nil {
        glog.Fatalf("Failed to create a Container Manager: %s", err)
    }
    mux := http.NewServeMux()
    if *enableProfiling {
        mux.HandleFunc("/debug/pprof/", pprof.Index)
        mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
        mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
        mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
    }
    // Register all HTTP handlers.
    err = cadvisorhttp.RegisterHandlers(mux, containerManager, *httpAuthFile, *httpAuthRealm, *httpDigestFile, *httpDigestRealm)
    if err != nil {
        glog.Fatalf("Failed to register HTTP handlers: %v", err)
    }
    cadvisorhttp.RegisterPrometheusHandler(mux, containerManager, *prometheusEndpoint, nil)
    // Start the manager.
    if err := containerManager.Start(); err != nil {
        glog.Fatalf("Failed to start container manager: %v", err)
    }
    // Install signal handler.
    installSignalHandler(containerManager)
    glog.Infof("Starting cAdvisor version: %s-%s on port %d", version.Info["version"], version.Info["revision"], *argPort)
    addr := fmt.Sprintf("%s:%d", *argIp, *argPort)
    glog.Fatal(http.ListenAndServe(addr, mux))
}

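Using the newly created memoryStorage and sysFs instances, main builds a manager instance. The manager interface defines many functions for retrieving container and machine information. The core code: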

memoryStorage, err := NewMemoryStorage()
sysFs, err := sysfs.NewRealSysFs()
// Create the containerManager.
containerManager, err := manager.New(memoryStorage, sysFs, *maxHousekeepingInterval, *allowDynamicHousekeeping, ignoreMetrics.MetricSet, &collectorHttpClient)
// Start the containerManager (error checks omitted in this excerpt).
err = containerManager.Start()

Core functions:

[Figure: core functions]

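Creating the manager instance also requires two additional parameters:

  • maxHousekeepingInterval: the largest interval allowed between two housekeeping passes (collections) for a container; default 60s
  • allowDynamicHousekeeping: whether the housekeeping interval may be adjusted dynamically, that is, whether the time of the next collection can change; default true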
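
The way these two parameters interact can be sketched in Go. This is a simplified illustration under stated assumptions (double the interval while stats are unchanged, fall back to the base interval when they change, never exceed the maximum). It is not cAdvisor's actual code, and the function name nextInterval is hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// nextInterval picks the delay before the next collection.
// Assumed policy for this sketch:
//   - dynamic housekeeping off, or stats changed: use the base interval
//   - stats unchanged: double the interval, but only if the result
//     stays within maxInterval
func nextInterval(current, base, maxInterval time.Duration, allowDynamic, statsChanged bool) time.Duration {
	if !allowDynamic || statsChanged {
		return base
	}
	if doubled := current * 2; doubled <= maxInterval {
		return doubled
	}
	return current
}

func main() {
	base, maxInterval := 10*time.Second, 60*time.Second
	interval := base
	// An idle container: stats do not change for three rounds.
	for i := 0; i < 3; i++ {
		interval = nextInterval(interval, base, maxInterval, true, false)
		fmt.Println("idle, next interval:", interval)
	}
	// Activity resumes: fall back to the base interval.
	interval = nextInterval(interval, base, maxInterval, true, true)
	fmt.Println("stats changed, next interval:", interval)
}
```

With a 10s base and a 60s cap, an idle container goes from 10s to 20s to 40s and then stays at 40s, because doubling again would exceed the cap.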

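Because the data has to be exposed as a service, the containerManager created above is registered with the HTTP handlers (cadvisor/http/handler.go). After that the manager is started: its Start method begins the loop of collecting and storing information.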
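
The register-then-start pattern can be shown with a toy example in Go. Everything here is a stand-in invented for illustration: the manager type, its read function, and the /latest path are not cAdvisor's real types or routes.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync"
	"time"
)

// manager is a toy stand-in for a container manager: it collects one
// value at a time and keeps only the latest one in memory.
type manager struct {
	mu   sync.Mutex
	last uint64
	have bool
	read func() uint64 // hypothetical measurement source
	stop chan struct{}
}

func newManager(read func() uint64) *manager {
	return &manager{read: read, stop: make(chan struct{})}
}

// collectOnce takes one measurement and stores it.
func (m *manager) collectOnce() {
	v := m.read()
	m.mu.Lock()
	m.last, m.have = v, true
	m.mu.Unlock()
}

// latest returns the most recent measurement, if any.
func (m *manager) latest() (uint64, bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.last, m.have
}

// Start collects once immediately, then keeps collecting in the background.
func (m *manager) Start(interval time.Duration) {
	m.collectOnce()
	go func() {
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				m.collectOnce()
			case <-m.stop:
				return
			}
		}
	}()
}

// Stop ends the background loop.
func (m *manager) Stop() { close(m.stop) }

// registerHandlers exposes the manager's data over HTTP.
func registerHandlers(mux *http.ServeMux, m *manager) {
	mux.HandleFunc("/latest", func(w http.ResponseWriter, r *http.Request) {
		v, ok := m.latest()
		if !ok {
			http.Error(w, "no samples yet", http.StatusServiceUnavailable)
			return
		}
		fmt.Fprintf(w, "%d\n", v)
	})
}

func main() {
	m := newManager(func() uint64 { return 42 }) // fixed reading for the demo
	mux := http.NewServeMux()
	registerHandlers(mux, m)
	m.Start(100 * time.Millisecond)
	defer m.Stop()

	// A throwaway test server stands in for http.ListenAndServe.
	srv := httptest.NewServer(mux)
	defer srv.Close()
	resp, err := http.Get(srv.URL + "/latest")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Printf("GET /latest -> %s", body) // prints: GET /latest -> 42
}
```

The handlers only read what the loop has already stored, so a request never triggers a collection itself.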

Taking memory collection as an example:

[Figure: memory collection]

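The actual values are still obtained through runc/libcontainer, which is a wrapper around cgroups. The /sys/fs/cgroup/memory directory contains a large amount of memory-related information (see the article on Docker's native monitoring).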
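
As a rough idea of what reading one of those files involves, here is a minimal Go sketch that reads a single-value cgroup file directly. It assumes the cgroup v1 layout and the memory.usage_in_bytes file; cgroup v2 hosts use a different layout and file names. The helper names are hypothetical, and this bypasses libcontainer entirely.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// parseCgroupValue parses the single unsigned integer found in files
// such as memory.usage_in_bytes (the value is followed by a newline).
func parseCgroupValue(raw string) (uint64, error) {
	return strconv.ParseUint(strings.TrimSpace(raw), 10, 64)
}

// readCgroupValue reads and parses one single-value file in a cgroup directory.
func readCgroupValue(cgroupDir, file string) (uint64, error) {
	data, err := os.ReadFile(filepath.Join(cgroupDir, file))
	if err != nil {
		return 0, err
	}
	return parseCgroupValue(string(data))
}

func main() {
	// Assumption: cgroup v1 memory controller mounted at this path.
	usage, err := readCgroupValue("/sys/fs/cgroup/memory", "memory.usage_in_bytes")
	if err != nil {
		fmt.Println("could not read memory usage:", err)
		return
	}
	fmt.Printf("memory usage: %d bytes\n", usage)
}
```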


The Prometheus collector (cadvisor/metrics/prometheus.go)

[Figure: Prometheus collector code]
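
What a collector ultimately produces is a set of lines in the Prometheus text format. The Go sketch below renders one such line by hand to show the shape of the output. It is only an illustration: cAdvisor's collector is built on the Prometheus Go client library rather than string formatting, and renderSample and the sample labels are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// labelEscaper applies the escapes the text format requires in label values.
var labelEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)

// renderSample formats one sample as: name{k1="v1",k2="v2"} value
// Labels are sorted by key so the output is deterministic.
func renderSample(name string, labels map[string]string, value float64) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, k+`="`+labelEscaper.Replace(labels[k])+`"`)
	}
	return name + "{" + strings.Join(pairs, ",") + "} " + strconv.FormatFloat(value, 'g', -1, 64)
}

func main() {
	// Made-up container labels and value.
	line := renderSample(
		"container_memory_usage_bytes",
		map[string]string{"name": "web", "image": "nginx"},
		1024,
	)
	fmt.Println(line)
	// prints: container_memory_usage_bytes{image="nginx",name="web"} 1024
}
```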

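Summary

Pros and cons:

  • Pros: an open-source Google product with a comprehensive set of metrics; easy to deploy, with an official Docker image.
  • Cons: limited integration out of the box. By default it keeps only about 1 minute of data locally, although it can be paired with storage back ends such as InfluxDB.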

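Note:

iQiyi built dadvisor, modeled on cAdvisor. It writes its data to Graphite, which makes it roughly equivalent to cAdvisor + InfluxDB, but dadvisor has not been open-sourced.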

References