cc-metric-store

ClusterCockpit Metric Store References

Reference information regarding the ClusterCockpit component “cc-metric-store” (GitHub Repo).

1 - Command Line

ClusterCockpit Metric Store Command Line Options

This page describes the command line options for the cc-metric-store executable.


  -config <path>

Function: Specifies an alternative path to the application configuration file.

Default: ./config.json

Example: -config ./configfiles/configuration.json


  -dev

Function: Enables the Swagger UI REST API documentation and playground at /swagger/.


  -gops

Function: Starts the gops agent (github.com/google/gops/agent) so the running process can be inspected for debugging.


  -loglevel <level>

Function: Sets the logging level.

Options: debug, info, warn (default), err, crit

Example: -loglevel debug


  -logdate

Function: Adds the date and time to log messages.


  -version

Function: Shows version information and exits.


Running

./cc-metric-store                              # Uses ./config.json
./cc-metric-store -config /path/to/config.json # Custom config path
./cc-metric-store -dev                         # Enable Swagger UI at /swagger/
./cc-metric-store -loglevel debug              # Verbose logging

Example Configuration

See Configuration Reference for detailed descriptions of all options.

{
  "main": {
    "addr": "localhost:8080",
    "jwt-public-key": "kzfYrYy+TzpanWZHJ5qSdMj5uKUWgq74BWhQG6copP0="
  },
  "metrics": {
    "clock": {
      "frequency": 60,
      "aggregation": "avg"
    },
    "cpu_idle": {
      "frequency": 60,
      "aggregation": "avg"
    },
    "cpu_iowait": {
      "frequency": 60,
      "aggregation": "avg"
    },
    "cpu_irq": {
      "frequency": 60,
      "aggregation": "avg"
    },
    "cpu_system": {
      "frequency": 60,
      "aggregation": "avg"
    },
    "cpu_user": {
      "frequency": 60,
      "aggregation": "avg"
    },
    "acc_utilization": {
      "frequency": 60,
      "aggregation": "avg"
    },
    "acc_mem_used": {
      "frequency": 60,
      "aggregation": "sum"
    },
    "acc_power": {
      "frequency": 60,
      "aggregation": "sum"
    },
    "flops_any": {
      "frequency": 60,
      "aggregation": "sum"
    },
    "flops_dp": {
      "frequency": 60,
      "aggregation": "sum"
    },
    "flops_sp": {
      "frequency": 60,
      "aggregation": "sum"
    },
    "ib_recv": {
      "frequency": 60,
      "aggregation": "sum"
    },
    "ib_xmit": {
      "frequency": 60,
      "aggregation": "sum"
    },
    "cpu_power": {
      "frequency": 60,
      "aggregation": "sum"
    },
    "mem_power": {
      "frequency": 60,
      "aggregation": "sum"
    },
    "ipc": {
      "frequency": 60,
      "aggregation": "avg"
    },
    "cpu_load": {
      "frequency": 60,
      "aggregation": null
    },
    "mem_bw": {
      "frequency": 60,
      "aggregation": "sum"
    },
    "mem_used": {
      "frequency": 60,
      "aggregation": null
    }
  },
  "metric-store": {
    "checkpoints": {
      "interval": "12h",
      "directory": "./var/checkpoints"
    },
    "memory-cap": 100,
    "retention-in-memory": "48h",
    "cleanup": {
      "mode": "archive",
      "interval": "48h",
      "directory": "./var/archive"
    }
  }
}

2 - Configuration

ClusterCockpit Metric Store Configuration Option References

Configuration options are located in a JSON file. The default path is config.json in the current working directory. An alternative path to the configuration file can be specified using the command line switch -config <path>.

All durations are specified as strings consisting of a number followed by a unit suffix, for example "12h" or "48h" (allowed suffixes: s, m, h, …).
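To illustrate the duration format, the following Python sketch converts a duration string into seconds. It covers only the s, m, and h suffixes named above and only a single number-plus-suffix value. The function name parse_duration is hypothetical, and this is not the parser cc-metric-store itself uses.

```python
import re

# Seconds per unit for the suffixes named in this reference (s, m, h).
# Assumption: any further suffixes accepted by cc-metric-store are not handled.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text: str) -> int:
    """Convert a duration string such as "12h" into a number of seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text)
    if match is None:
        raise ValueError(f"invalid duration: {text!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]


if __name__ == "__main__":
    # Values taken from the example configuration.
    print(parse_duration("12h"))  # checkpoint interval in seconds
    print(parse_duration("48h"))  # retention-in-memory in seconds
```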

The configuration is organized into four main sections: main, metrics, nats, and metric-store.

Main Section

  • main: Server configuration (required)
    • addr: Address to bind to, for example localhost:8080 or 0.0.0.0:443 (required)
    • https-cert-file: Path to the SSL certificate file. HTTPS is used if https-key-file is also set (optional)
    • https-key-file: Path to the SSL key file. HTTPS is used if https-cert-file is also set (optional)
    • user: Drop root permissions to this user once the port has been bound. Only applicable when using a privileged port (optional)
    • group: Drop root permissions to this group once the port has been bound. Only applicable when using a privileged port (optional)
    • backend-url: URL of cc-backend for querying job information, e.g., https://localhost:8080 (optional)
    • jwt-public-key: Base64-encoded Ed25519 public key used to verify requests to the HTTP API (required)
    • debug: Debug options (optional)
      • dump-to-file: Path to file for dumping internal state (optional)
      • gops: Enable gops agent for debugging (optional)

Metrics Section

  • metrics: Map of metric-name to objects with the following properties (required)
    • frequency: Timestep/Interval/Resolution of this metric in seconds (required)
    • aggregation: Can be "sum", "avg" or null (required)
      • null means aggregation across topology levels is disabled for this metric (use for node-scope-only metrics)
      • "sum" means that values from the child levels are summed up for the parent level
      • "avg" means that values from the child levels are averaged for the parent level
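The three aggregation settings can be pictured with a small Python sketch. It shows how the values of child levels (for example the hardware threads of a node) would be combined into one parent value under each setting. The function name aggregate and the sample numbers are illustrative assumptions, not code from cc-metric-store.

```python
from typing import Optional, Sequence


def aggregate(children: Sequence[float], mode: Optional[str]) -> Optional[float]:
    """Combine child-level values into a single parent-level value."""
    if mode is None:
        # Aggregation is disabled: no parent value is derived from children.
        return None
    if not children:
        raise ValueError("no child values to aggregate")
    if mode == "sum":
        return sum(children)
    if mode == "avg":
        return sum(children) / len(children)
    raise ValueError(f"unknown aggregation mode: {mode!r}")


if __name__ == "__main__":
    per_core = [1.0, 2.0, 3.0, 6.0]  # hypothetical per-core readings
    print(aggregate(per_core, "sum"))  # additive metrics such as flops_any
    print(aggregate(per_core, "avg"))  # averaged metrics such as clock
    print(aggregate(per_core, None))   # node-scope-only metrics such as cpu_load
```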

NATS Section

  • nats: NATS server connection configuration (optional)
    • address: URL of NATS.io server, example: nats://localhost:4222 (required if nats section present)
    • username: NATS username for authentication (optional)
    • password: NATS password for authentication (optional)

Metric-Store Section

  • metric-store: Storage engine configuration (required)
    • checkpoints: Checkpoint configuration (required)
      • interval: Create checkpoints every X seconds/minutes/hours (required)
      • directory: Path to checkpoint directory (required)
    • retention-in-memory: Keep all values in memory for at least this amount of time. Should be long enough to cover common job durations (required)
    • memory-cap: Maximum percentage of system memory to use (optional)
    • cleanup: Cleanup/archiving configuration (required)
      • mode: Either "archive" (move and compress old checkpoints) or "delete" (remove old checkpoints) (required)
      • interval: Perform cleanup every X seconds/minutes/hours (required)
      • directory: Path to archive directory (required if mode is "archive")
    • nats-subscriptions: Array of NATS subscription configurations (optional, requires nats section)
      • subscribe-to: NATS subject to subscribe to (required)
      • cluster-tag: Default cluster tag for incoming metrics (required)
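The required and conditionally required options listed in this reference can be checked before starting the service. The Python sketch below reports which of them are missing from a parsed configuration. The helper name missing_required is hypothetical, the check only mirrors the rules written on this page, and cc-metric-store performs its own validation independently of it.

```python
from typing import Any, Dict, List


def missing_required(config: Dict[str, Any]) -> List[str]:
    """Return dotted paths of required options that are absent from config."""
    required = [
        "main.addr",
        "main.jwt-public-key",
        "metrics",
        "metric-store.checkpoints.interval",
        "metric-store.checkpoints.directory",
        "metric-store.retention-in-memory",
        "metric-store.cleanup.mode",
        "metric-store.cleanup.interval",
    ]
    # The archive directory is only required in "archive" mode.
    cleanup = config.get("metric-store", {}).get("cleanup", {})
    if cleanup.get("mode") == "archive":
        required.append("metric-store.cleanup.directory")
    # The NATS address is only required when a nats section is present.
    if "nats" in config:
        required.append("nats.address")

    missing = []
    for path in required:
        node: Any = config
        for key in path.split("."):
            if not isinstance(node, dict) or key not in node:
                missing.append(path)
                break
            node = node[key]
    # Each metric needs both keys; "aggregation" may be null but must exist.
    for name, spec in config.get("metrics", {}).items():
        for key in ("frequency", "aggregation"):
            if key not in spec:
                missing.append(f"metrics.{name}.{key}")
    return missing
```

A typical use would be to load the file with json.load and print the returned list; an empty list means every option marked as required on this page is present.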

3 - Metric Store REST API

ClusterCockpit Metric Store RESTful API Endpoint description

Authentication

JWT tokens

cc-metric-store only supports JWT tokens signed with the EdDSA/Ed25519 method. The token is passed in the Authorization header using the Bearer scheme.

Example script to test the endpoint:

# Only pass a JWT if JWT authentication has been set up
JWT="eyJ0eXAiOiJKV1QiLCJhbGciOiJFZERTQSJ9.eyJ1c2VyIjoiYWRtaW4iLCJyb2xlcyI6WyJST0xFX0FETUlOIiwiUk9MRV9BTkFMWVNUIiwiUk9MRV9VU0VSIl19.d-3_3FZTsadPjDEdsWrrQ7nS0edMAR4zjl-eK7rJU3HziNBfI9PDHDIpJVHTNN5E5SlLGLFXctWyKAkwhXL-Dw"

curl -X 'GET' 'http://localhost:8080/api/query/' -H "Authorization: Bearer $JWT" \
  -d '{ "cluster": "alex", "from": 1720879275, "to": 1720964715, "queries": [{"metric": "cpu_load","host": "a0124"}] }'
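When a request is rejected, it can help to look at what a token actually contains. The Python sketch below decodes the header and payload of a JWT using only the standard library. It does not verify the signature, so it is a debugging aid only; the function name decode_jwt_unverified is hypothetical.

```python
import base64
import json
from typing import Any, Dict, Tuple


def _b64url_decode(segment: str) -> bytes:
    # JWT segments omit the trailing "=" padding; restore it before decoding.
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def decode_jwt_unverified(token: str) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (header, payload) of a JWT WITHOUT checking its signature."""
    header_b64, payload_b64, _signature = token.split(".")
    header = json.loads(_b64url_decode(header_b64))
    payload = json.loads(_b64url_decode(payload_b64))
    return header, payload
```

Applied to the example token from the script above, the header's alg field decodes to EdDSA, which matches the signing method cc-metric-store expects.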

NATS

As an alternative to the REST API, cc-metric-store can receive metrics via NATS messaging. See the NATS configuration for setup details.

Usage of Swagger UI

The Swagger UI is available as part of cc-metric-store if you start it with the -dev option:

./cc-metric-store -dev

You may access it at http://localhost:8080/swagger/ (adjust port to match your main.addr configuration).

API Endpoints

The following REST endpoints are available:

Endpoint            Method      Description
/api/query/         GET/POST    Query metrics with selectors
/api/write/         POST        Write metrics (InfluxDB line protocol)
/api/free/          POST        Free buffers up to timestamp
/api/debug/         GET         Dump internal state (debugging)
/api/healthcheck/   GET         Node health status

Payload format for write endpoint

The data comes in InfluxDB line protocol format.

<metric>,cluster=<cluster>,hostname=<hostname>,type=<node/hwthread/etc> value=<value> <epoch_time_in_ns_or_s>

Real example:

proc_run,cluster=fritz,hostname=f2163,type=node value=4i 1725620476214474893

A more detailed description of the ClusterCockpit-flavored InfluxDB line protocol and its types can be found in the CC specification.
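As a minimal sketch of the payload layout shown above, the following Python function assembles one line from its parts. The function name format_line is hypothetical. It assumes that none of the tag values contain characters that need escaping (spaces, commas, equals signs), and it follows the InfluxDB line protocol convention of marking integers with an i suffix.

```python
from typing import Union


def format_line(metric: str, cluster: str, hostname: str, type_: str,
                value: Union[int, float], timestamp: int) -> str:
    """Build one line: <metric>,cluster=..,hostname=..,type=.. value=.. <time>."""
    if isinstance(value, bool):
        raise TypeError("boolean values are not handled by this sketch")
    # Integers carry an "i" suffix in InfluxDB line protocol; floats do not.
    rendered = f"{value}i" if isinstance(value, int) else repr(float(value))
    tags = f"cluster={cluster},hostname={hostname},type={type_}"
    return f"{metric},{tags} value={rendered} {timestamp}"


if __name__ == "__main__":
    # Reproduces the real example given above.
    print(format_line("proc_run", "fritz", "f2163", "node", 4,
                      1725620476214474893))
```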

Example script to test the endpoint:

# Only pass a JWT if JWT authentication has been set up
JWT="eyJ0eXAiOiJKV1QiLCJhbGciOiJFZERTQSJ9.eyJ1c2VyIjoiYWRtaW4iLCJyb2xlcyI6WyJST0xFX0FETUlOIiwiUk9MRV9BTkFMWVNUIiwiUk9MRV9VU0VSIl19.d-3_3FZTsadPjDEdsWrrQ7nS0edMAR4zjl-eK7rJU3HziNBfI9PDHDIpJVHTNN5E5SlLGLFXctWyKAkwhXL-Dw"

curl -X 'POST' 'http://localhost:8080/api/write/' -H "Authorization: Bearer $JWT" \
  -d "proc_run,cluster=fritz,hostname=f2163,type=node value=4i 1725620476214474893"
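The same write request can be prepared from Python with the standard library. The sketch below only builds the request object, mirroring the curl call above; the helper name build_write_request is hypothetical, and the base URL must match the main.addr setting of the running instance.

```python
import urllib.request
from typing import Optional


def build_write_request(base_url: str, payload: str,
                        jwt: Optional[str] = None) -> urllib.request.Request:
    """Prepare a POST to /api/write/ carrying line protocol data."""
    headers = {}
    if jwt:
        # Only attach the token if JWT authentication has been set up.
        headers["Authorization"] = f"Bearer {jwt}"
    return urllib.request.Request(
        f"{base_url}/api/write/",
        data=payload.encode("utf-8"),
        headers=headers,
        method="POST",
    )


if __name__ == "__main__":
    request = build_write_request(
        "http://localhost:8080",
        "proc_run,cluster=fritz,hostname=f2163,type=node value=4i 1725620476214474893",
    )
    print(request.get_method(), request.full_url)
```

Passing the returned object to urllib.request.urlopen sends the request; that step is left out here so the sketch can be run without a server.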

Testing with the Metric Generator

For comprehensive testing of the write endpoint, a Metric Generator Script is available. This script simulates high-frequency metric data and supports both REST and NATS transport modes, as well as internal (integrated into cc-backend) and external (standalone) cc-metric-store deployments.

Swagger API Reference