Garden | Node Exporter

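Exporter is the umbrella term for Prometheus's data collection components. An exporter gathers data from a target and converts it into a format that Prometheus supports. Unlike traditional collection agents, it does not send data to a central server; it waits for the central server to come and scrape it. For Node Exporter the default scrape address is http://current_ip:9100/metrics. Node Exporter collects server-level runtime metrics, covering basic monitoring such as the machine's loadavg, filesystem, and meminfo, much like zabbix-agent in traditional host monitoring. Node Exporter is provided and maintained by the Prometheus project. It is not bundled with Prometheus, but it is an exporter that almost every setup needs.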

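Features

node-exporter exposes hardware and operating system metrics for *NIX kernels.

On Windows, you can use WMI exporter (since renamed windows_exporter)
To collect NVIDIA GPU metrics, you can use prometheus-dcgm

Which metrics node-exporter can collect depends on the *NIX operating system, for example:

diskstats supports Darwin, Linux
cpu supports Darwin, Dragonfly, FreeBSD, Linux, Solaris, and others

For details, see: node_exporter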

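The --collector.<name> flags select which collector modules node_exporter enables, and --no-collector.<name> turns off a module you do not need (releases before 0.15 took a single -collectors.enabled list instead). If neither is given, the default set of collectors is used.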

Deployment

Service

apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  labels:
    app: node-exporter
    name: node-exporter
  name: node-exporter
  namespace: monitoring
spec:
  clusterIP: None
  ports:
  - name: scrape
    port: 9100
    protocol: TCP
  selector:
    app: node-exporter
  type: ClusterIP

DaemonSet

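# extensions/v1beta1 DaemonSets were removed in Kubernetes 1.16; use apps/v1
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring
spec:
  # apps/v1 requires a selector that matches the pod template labels
  selector:
    matchLabels:
      app: node-exporter
  template: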
    metadata:
      labels:
        app: node-exporter
      name: node-exporter
    spec:
      containers:
      - image: prom/node-exporter:v1.0.0
        name: node-exporter
        ports:
        - containerPort: 9100
          hostPort: 9100
          name: scrape
      hostNetwork: true
      hostPID: true

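This gives us a DaemonSet and a Service object. Once they are deployed, Prometheus still has to be told where to scrape node exporter, which means changing the Prometheus configuration file. Edit prometheus.yml and add the following under the scrape_configs section. The localhost target assumes Prometheus runs on the same host as node exporter; otherwise substitute the node's address: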

scrape_configs:
  # Scrape node exporter metrics
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']

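Alternatively, the prometheus.io/scrape: 'true' annotation on the Service can be used to discover its metrics endpoint automatically. With Kubernetes service discovery, the relabeling rule is built on this source label: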

- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]

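Once the configuration is in place, restart Prometheus and the corresponding metrics will show up.

Viewing directly:

With a binary or Docker deployment, once node-exporter is running you can visit: http://${IP}:9100/metrics

It returns output in the format below, which contains every metric that node-exporter exposes: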

# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 6.1872e-05
go_gc_duration_seconds{quantile="0.25"} 0.000119463
go_gc_duration_seconds{quantile="0.5"} 0.000151156
go_gc_duration_seconds{quantile="0.75"} 0.000198764
go_gc_duration_seconds{quantile="1"} 0.009889647
go_gc_duration_seconds_sum 0.257232201
go_gc_duration_seconds_count 1187

# HELP node_cpu Seconds the cpus spent in each mode.
# TYPE node_cpu counter
node_cpu{cpu="cpu0",mode="guest"} 0
node_cpu{cpu="cpu0",mode="guest_nice"} 0
node_cpu{cpu="cpu0",mode="idle"} 68859.19
node_cpu{cpu="cpu0",mode="iowait"} 167.22
node_cpu{cpu="cpu0",mode="irq"} 0
node_cpu{cpu="cpu0",mode="nice"} 19.92
node_cpu{cpu="cpu0",mode="softirq"} 17.05
node_cpu{cpu="cpu0",mode="steal"} 28.1
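The layout of each sample line (metric name, optional labels in braces, then the value) can be illustrated with a small parser. This is a deliberately simplified sketch, not the official parser: the Sample type and parseLine function are hypothetical names, and the code assumes label values contain no commas, braces, or escaped quotes and that lines carry no trailing timestamp.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Sample is one parsed line of the text exposition format.
type Sample struct {
	Name   string
	Labels map[string]string
	Value  float64
}

// parseLine parses a single sample line. Comment lines (# HELP, # TYPE)
// and blank lines are skipped and reported with ok == false.
// Simplification: label values must not contain commas, braces, or
// escaped quotes, and optional timestamps are not handled.
func parseLine(line string) (Sample, bool) {
	line = strings.TrimSpace(line)
	if line == "" || strings.HasPrefix(line, "#") {
		return Sample{}, false
	}
	// The value is everything after the last space.
	sp := strings.LastIndex(line, " ")
	if sp < 0 {
		return Sample{}, false
	}
	value, err := strconv.ParseFloat(line[sp+1:], 64)
	if err != nil {
		return Sample{}, false
	}
	head := strings.TrimSpace(line[:sp])
	s := Sample{Name: head, Labels: map[string]string{}, Value: value}
	// Split name{k="v",...} into the name and its label pairs.
	if open := strings.Index(head, "{"); open >= 0 && strings.HasSuffix(head, "}") {
		s.Name = head[:open]
		for _, pair := range strings.Split(head[open+1:len(head)-1], ",") {
			kv := strings.SplitN(pair, "=", 2)
			if len(kv) != 2 {
				continue
			}
			s.Labels[strings.TrimSpace(kv[0])] = strings.Trim(strings.TrimSpace(kv[1]), `"`)
		}
	}
	return s, true
}

func main() {
	lines := []string{
		"# TYPE node_cpu counter",
		`node_cpu{cpu="cpu0",mode="idle"} 68859.19`,
		"go_gc_duration_seconds_count 1187",
	}
	for _, l := range lines {
		if s, ok := parseLine(l); ok {
			fmt.Printf("%s %v %g\n", s.Name, s.Labels, s.Value)
		}
	}
}
```

For anything beyond illustration, a real parser is the safer choice, since the format also allows escaping and timestamps that this sketch ignores.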

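Viewing in Prometheus:

Names such as go_gc_duration_seconds and node_cpu are metric names. If you use Prometheus, you can search for them among the metrics on the http://${IP}:9090/ page. Note that node_exporter 0.16 and later append units to many metric names, so on the v1.0.0 image used above node_cpu is reported as node_cpu_seconds_total and node_time as node_time_seconds.

Commonly used metrics include:

node_cpu: system CPU usage
node_disk*: disk I/O
node_filesystem*: filesystem usage
node_load1: system load
node_memory*: memory usage
node_network*: network bandwidth
node_time: current system time
go_*: Go runtime metrics of node exporter
process_*: runtime metrics of the node exporter process itself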
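Because node_cpu is a counter of cumulative seconds per mode, a usage percentage has to be derived from the difference between two scrapes. The sketch below shows that arithmetic in Go; the cpuSeconds type and busyFraction function are hypothetical names, and the sample values in main are invented for illustration. In PromQL the same idea is usually expressed with rate() over the idle mode.

```go
package main

import "fmt"

// cpuSeconds maps a CPU mode (idle, user, system, ...) to cumulative
// seconds, i.e. one scrape of the node_cpu counter for a single core.
type cpuSeconds map[string]float64

// busyFraction returns the share of time the core was not idle between
// two scrapes. ok is false when no time elapsed or a counter went
// backwards (for example after a reboot).
func busyFraction(prev, cur cpuSeconds) (fraction float64, ok bool) {
	var total float64
	for mode, v := range cur {
		// On Linux, guest time is already included in user and nice,
		// so counting it again would inflate the total.
		if mode == "guest" || mode == "guest_nice" {
			continue
		}
		delta := v - prev[mode]
		if delta < 0 {
			return 0, false
		}
		total += delta
	}
	if total == 0 {
		return 0, false
	}
	idle := cur["idle"] - prev["idle"]
	return 1 - idle/total, true
}

func main() {
	// Invented values for two scrapes of one core.
	prev := cpuSeconds{"idle": 100, "user": 10, "system": 5}
	cur := cpuSeconds{"idle": 150, "user": 40, "system": 25}
	if f, ok := busyFraction(prev, cur); ok {
		fmt.Printf("busy: %.2f\n", f)
	}
}
```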

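Viewing in Grafana:

Prometheus ships with its own web page, but it is usually paired with the more specialized Grafana for visualizing metrics. Grafana offers many templates that present the metrics in a friendlier way, such as Node Exporter for Prometheus

After configuring the variables and importing the template in Grafana, you get the result shown in the figure above.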

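In depth

node-exporter is an exporter officially recommended by Prometheus, and there are other similar ones.

The officially recommended exporters all live under https://github.com/prometheus. The exporter recommendation page also lists many third-party exporters developed and published by individuals or organizations. If you have custom collection needs, you can write your own exporter.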

Collector

The head of node-exporter's collector package:

// Package collector includes all individual collectors to gather and export system metrics.
package collector

import (
    "fmt"
    "sync"
    "time"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/common/log"
    "gopkg.in/alecthomas/kingpin.v2"
)


// Namespace defines the common namespace to be used by all metrics.
const namespace = "node"

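As shown, implementing an exporter requires the github.com/prometheus/client_golang/prometheus library. client_golang is the official Prometheus Go library. It can be used to instrument existing applications, and it also serves as the base library for talking to the Prometheus HTTP API.

For example, it defines the basic metric types and their methods:

Counter: collects monotonically increasing data such as event counts
Gauge: collects a current state that can go up or down, such as the number of database connections
Histogram: samples observations such as response latency into configurable buckets, with no assumption about how the data is distributed
Summary: also samples observations, similar to Histogram, but calculates quantiles on the client side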
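The difference between these types is easiest to see in how each one updates its state. The following is a standard-library sketch of those semantics only; it is not the client_golang implementation, although the method names Inc, Add, Set, and Observe mirror the ones that library uses. The constructor NewHistogram and the field names are assumptions made for this example, and the types are not safe for concurrent use.

```go
package main

import "fmt"

// Counter only ever goes up.
type Counter struct{ value float64 }

func (c *Counter) Inc() { c.value++ }

// Add rejects negative deltas, because a counter must be monotonic.
func (c *Counter) Add(delta float64) {
	if delta < 0 {
		panic("counter cannot decrease")
	}
	c.value += delta
}

// Gauge holds a current value that may rise or fall.
type Gauge struct{ value float64 }

func (g *Gauge) Set(v float64) { g.value = v }

// Histogram counts observations into cumulative buckets: bucket i holds
// the number of observations less than or equal to bounds[i].
type Histogram struct {
	bounds []float64 // upper bounds, ascending
	counts []uint64  // cumulative count per upper bound
	sum    float64
	count  uint64
}

// NewHistogram is a hypothetical constructor for this sketch.
func NewHistogram(bounds ...float64) *Histogram {
	return &Histogram{bounds: bounds, counts: make([]uint64, len(bounds))}
}

func (h *Histogram) Observe(v float64) {
	for i, b := range h.bounds {
		if v <= b {
			h.counts[i]++
		}
	}
	h.sum += v
	h.count++
}

func main() {
	requests := &Counter{}
	requests.Inc()
	requests.Add(2)

	connections := &Gauge{}
	connections.Set(12)
	connections.Set(7)

	latency := NewHistogram(0.1, 0.5, 1)
	for _, v := range []float64{0.05, 0.3, 0.7, 2} {
		latency.Observe(v)
	}
	fmt.Println(requests.value, connections.value, latency.counts, latency.count)
}
```

An observation above the largest bound only shows up in the total count, which corresponds to the implicit +Inf bucket in the exposition format.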