Reference information regarding the ClusterCockpit component “cc-metric-store” (GitHub Repo).
cc-metric-store
1 - Command Line
This page describes the command line options for the cc-metric-store executable.
-config <path>
Function: Specifies an alternative path to the application configuration file.
Default: ./config.json
Example: -config ./configfiles/configuration.json
-dev
Function: Enables the Swagger UI REST API documentation and playground at /swagger/.
-gops
Function: Starts the gops agent (github.com/google/gops/agent) so that the running process can be inspected with the gops tool (for debugging).
-loglevel <level>
Function: Sets the logging level.
Options: debug, info, warn (default), err, crit
Example: -loglevel debug
-logdate
Function: Adds date and time to log messages.
-version
Function: Shows version information and exits.
Running
./cc-metric-store # Uses ./config.json
./cc-metric-store -config /path/to/config.json # Custom config path
./cc-metric-store -dev # Enable Swagger UI at /swagger/
./cc-metric-store -loglevel debug # Verbose logging
Example Configuration
See Configuration Reference for detailed descriptions of all options.
{
"main": {
"addr": "localhost:8080",
"jwt-public-key": "kzfYrYy+TzpanWZHJ5qSdMj5uKUWgq74BWhQG6copP0="
},
"metrics": {
"clock": {
"frequency": 60,
"aggregation": "avg"
},
"cpu_idle": {
"frequency": 60,
"aggregation": "avg"
},
"cpu_iowait": {
"frequency": 60,
"aggregation": "avg"
},
"cpu_irq": {
"frequency": 60,
"aggregation": "avg"
},
"cpu_system": {
"frequency": 60,
"aggregation": "avg"
},
"cpu_user": {
"frequency": 60,
"aggregation": "avg"
},
"acc_utilization": {
"frequency": 60,
"aggregation": "avg"
},
"acc_mem_used": {
"frequency": 60,
"aggregation": "sum"
},
"acc_power": {
"frequency": 60,
"aggregation": "sum"
},
"flops_any": {
"frequency": 60,
"aggregation": "sum"
},
"flops_dp": {
"frequency": 60,
"aggregation": "sum"
},
"flops_sp": {
"frequency": 60,
"aggregation": "sum"
},
"ib_recv": {
"frequency": 60,
"aggregation": "sum"
},
"ib_xmit": {
"frequency": 60,
"aggregation": "sum"
},
"cpu_power": {
"frequency": 60,
"aggregation": "sum"
},
"mem_power": {
"frequency": 60,
"aggregation": "sum"
},
"ipc": {
"frequency": 60,
"aggregation": "avg"
},
"cpu_load": {
"frequency": 60,
"aggregation": null
},
"mem_bw": {
"frequency": 60,
"aggregation": "sum"
},
"mem_used": {
"frequency": 60,
"aggregation": null
}
},
"metric-store": {
"checkpoints": {
"interval": "12h",
"directory": "./var/checkpoints"
},
"memory-cap": 100,
"retention-in-memory": "48h",
"cleanup": {
"mode": "archive",
"interval": "48h",
"directory": "./var/archive"
}
}
}
2 - Configuration
Configuration options are located in a JSON file. The default path is config.json
in the current working directory. An alternative path to the configuration file can be
specified using the command line switch -config <filename>.
All durations are specified as strings, for example "12h" or "48h"
(allowed suffixes: s, m, h, …).
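For quick sanity checks, such as how many samples a retention window holds, the bash sketch below converts simple duration strings to seconds. It is an illustration, not the parser cc-metric-store uses: it only handles integer values with the suffixes h, m and s, a subset of what the configuration accepts (the suffix list above is open-ended). The function name duration_to_seconds is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a simple duration string ("48h", "30m",
# "1h30m") to a number of seconds. Only integer values with the suffixes
# h, m and s are handled; anything else is rejected with exit status 1.
duration_to_seconds() {
  local rest="$1" total=0 num unit
  [[ -n "$rest" ]] || { echo "empty duration" >&2; return 1; }
  while [[ -n "$rest" ]]; do
    # Take one <number><unit> pair off the front of the string.
    if [[ "$rest" =~ ^([0-9]+)([hms])(.*)$ ]]; then
      num=$((10#${BASH_REMATCH[1]}))   # force base 10 (e.g. "08m")
      unit="${BASH_REMATCH[2]}"
      rest="${BASH_REMATCH[3]}"
      case "$unit" in
        h) total=$((total + num * 3600)) ;;
        m) total=$((total + num * 60)) ;;
        s) total=$((total + num)) ;;
      esac
    else
      echo "unsupported duration: $1" >&2
      return 1
    fi
  done
  echo "$total"
}
```

For example, duration_to_seconds 48h prints 172800 and duration_to_seconds 1h30m prints 5400.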
The configuration is organized into four main sections: main, metrics,
nats, and metric-store.
Main Section
- main: Server configuration (required)
  - addr: Address to bind to, for example localhost:8080 or 0.0.0.0:443 (required)
  - https-cert-file: Filepath to the SSL certificate. If https-key-file is also set, HTTPS is used (optional)
  - https-key-file: Filepath to the SSL key file. If https-cert-file is also set, HTTPS is used (optional)
  - user: Drop root permissions to this user once the port has been bound. Only applicable when using a privileged port (optional)
  - group: Drop root permissions to this group once the port has been bound. Only applicable when using a privileged port (optional)
  - backend-url: URL of cc-backend for querying job information, for example https://localhost:8080 (optional)
  - jwt-public-key: Base64-encoded Ed25519 public key used to verify requests to the HTTP API (required)
  - debug: Debug options (optional)
    - dump-to-file: Path to a file for dumping internal state (optional)
    - gops: Enable the gops agent for debugging (optional)
Metrics Section
- metrics: Map of metric names to objects with the following properties (required)
  - frequency: Timestep/interval/resolution of this metric in seconds (required)
  - aggregation: Can be "sum", "avg" or null (required)
    - null means that aggregation across topology levels is disabled for this metric (use for node-scope-only metrics)
    - "sum" means that values from the child levels are summed up for the parent level
    - "avg" means that values from the child levels are averaged for the parent level
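To make the three aggregation modes concrete, the bash sketch below combines child-level values (for example, one value per hardware thread) into a single parent-level value. It only illustrates the semantics described above and is not how cc-metric-store computes aggregates internally; the function name aggregate is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper illustrating the "aggregation" setting of a metric:
# combine the values of all child levels into one parent-level value.
#   aggregate <sum|avg|null> <value>...
aggregate() {
  local mode="$1"
  shift
  case "$mode" in
    sum|avg) ;;
    null)
      echo "aggregation is disabled for this metric" >&2
      return 1 ;;
    *)
      echo "unknown aggregation mode: $mode" >&2
      return 2 ;;
  esac
  [[ $# -gt 0 ]] || { echo "no values given" >&2; return 1; }
  # awk provides the floating-point arithmetic that bash lacks.
  printf '%s\n' "$@" | awk -v mode="$mode" '
    { total += $1; n++ }
    END { if (mode == "avg") print total / n; else print total }'
}
```

For example, four hardware threads reporting flops_any values of 1.5, 2.0, 0.5 and 1.0 give 5 at the parent level with "sum" (aggregate sum 1.5 2.0 0.5 1.0), while cpu_user values of 80, 60, 100 and 40 give 70 with "avg" (aggregate avg 80 60 100 40).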
NATS Section
- nats: NATS server connection configuration (optional)
  - address: URL of the NATS.io server, for example nats://localhost:4222 (required if the nats section is present)
  - username: NATS username for authentication (optional)
  - password: NATS password for authentication (optional)
Metric-Store Section
- metric-store: Storage engine configuration (required)
  - checkpoints: Checkpoint configuration (required)
    - interval: Create checkpoints every X seconds/minutes/hours (required)
    - directory: Path to the checkpoint directory (required)
  - retention-in-memory: Keep all values in memory for at least this amount of time. Should be long enough to cover common job durations (required)
  - memory-cap: Maximum percentage of system memory to use (optional)
  - cleanup: Cleanup/archiving configuration (required)
    - mode: Either "archive" (move and compress old checkpoints) or "delete" (remove old checkpoints) (required)
    - interval: Perform cleanup every X seconds/minutes/hours (required)
    - directory: Path to the archive directory (required if mode is "archive")
  - nats-subscriptions: Array of NATS subscription configurations (optional, requires the nats section)
    - subscribe-to: NATS subject to subscribe to (required)
    - cluster-tag: Default cluster tag for incoming metrics (required)
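As a rough sizing aid, the number of samples kept in memory for one metric series is the retention window divided by the metric's frequency. The bash sketch below does this arithmetic with both values given in seconds. The function name samples_in_memory is hypothetical, and the result says nothing about the byte size of a sample, which this page does not specify.

```shell
#!/usr/bin/env bash
# Hypothetical helper: number of samples of one metric series covered by
# the in-memory retention window.
#   samples_in_memory <retention-seconds> <frequency-seconds>
samples_in_memory() {
  local retention="$1" frequency="$2"
  # Retention must be a non-negative integer, frequency a positive one.
  if ! [[ "$retention" =~ ^[0-9]+$ && "$frequency" =~ ^[1-9][0-9]*$ ]]; then
    echo "usage: samples_in_memory <retention-seconds> <frequency-seconds>" >&2
    return 1
  fi
  # Integer division; 10# avoids octal interpretation of leading zeros.
  echo $(( 10#$retention / 10#$frequency ))
}
```

With the example configuration ("retention-in-memory": "48h", i.e. 172800 seconds, and "frequency": 60), samples_in_memory 172800 60 prints 2880. A job running longer than the retention window may therefore no longer be completely in memory.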
3 - Metric Store REST API
Authentication
JWT tokens
cc-metric-store supports only JWT tokens using the EdDSA/Ed25519 signing
method. The token is passed in the Authorization header using the Bearer scheme.
Example script to test the endpoint:
# Only use a JWT token if JWT authentication has been set up
JWT="eyJ0eXAiOiJKV1QiLCJhbGciOiJFZERTQSJ9.eyJ1c2VyIjoiYWRtaW4iLCJyb2xlcyI6WyJST0xFX0FETUlOIiwiUk9MRV9BTkFMWVNUIiwiUk9MRV9VU0VSIl19.d-3_3FZTsadPjDEdsWrrQ7nS0edMAR4zjl-eK7rJU3HziNBfI9PDHDIpJVHTNN5E5SlLGLFXctWyKAkwhXL-Dw"
curl -X 'GET' 'http://localhost:8080/api/query/' -H "Authorization: Bearer $JWT" \
-d '{ "cluster": "alex", "from": 1720879275, "to": 1720964715, "queries": [{"metric": "cpu_load","host": "a0124"}] }'
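Instead of hard-coding from and to, the request body can be built from the current time. The bash sketch below queries the last hour of one metric on one host and uses only the fields already shown in the example request. The function names query_payload and query_last_hour and the CCMS_URL variable are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a query body for one metric on one host
# between two Unix timestamps (in seconds).
#   query_payload <cluster> <from> <to> <metric> <host>
query_payload() {
  printf '{ "cluster": "%s", "from": %d, "to": %d, "queries": [{"metric": "%s", "host": "%s"}] }\n' \
    "$1" "$2" "$3" "$4" "$5"
}

# Hypothetical helper: query the last hour for <cluster> <metric> <host>.
# CCMS_URL defaults to the address used on this page; JWT must be set.
query_last_hour() {
  local to from
  to=$(date +%s)
  from=$(( to - 3600 ))
  curl -sS -X GET "${CCMS_URL:-http://localhost:8080}/api/query/" \
    -H "Authorization: Bearer ${JWT:?set JWT first}" \
    -d "$(query_payload "$1" "$from" "$to" "$2" "$3")"
}

# Example:
#   query_last_hour alex cpu_load a0124
```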
NATS
As an alternative to the REST API, cc-metric-store can receive metrics via
NATS messaging. See the NATS configuration
for setup details.
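For a quick test of a NATS setup, a message can be published with the NATS command-line client nats (from the natscli project). The bash sketch below assumes that NATS messages use the same line protocol as the REST write endpoint, that the subject matches one of the configured subscribe-to subjects, and that the server URL matches the nats address setting. The function name publish_metric, the NATS_URL variable and the example subject hpc-nats are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: publish one line-protocol message to a NATS subject
# with the natscli "nats" client.
#   publish_metric <subject> <line>
# NATS_URL defaults to the example address from the nats configuration.
publish_metric() {
  local subject="$1" line="$2"
  nats pub --server "${NATS_URL:-nats://localhost:4222}" "$subject" "$line"
}

# Example (the subject must match a configured subscribe-to value):
#   publish_metric hpc-nats \
#     "cpu_load,cluster=fritz,hostname=f2163,type=node value=2.5 $(date +%s)"
```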
Usage of Swagger UI
The Swagger UI is available as part of cc-metric-store if you start it
with the -dev option:
./cc-metric-store -dev
You may access it at http://localhost:8080/swagger/ (adjust port to match your
main.addr configuration).
API Endpoints
The following REST endpoints are available:
| Endpoint | Method | Description |
|---|---|---|
| /api/query/ | GET/POST | Query metrics with selectors |
| /api/write/ | POST | Write metrics (InfluxDB line protocol) |
| /api/free/ | POST | Free buffers up to timestamp |
| /api/debug/ | GET | Dump internal state (debugging) |
| /api/healthcheck/ | GET | Node health status |
Payload format for write endpoint
The request body uses the InfluxDB line protocol format.
<metric>,cluster=<cluster>,hostname=<hostname>,type=<node/hwthread/etc> value=<value> <epoch_time_in_ns_or_s>
Real example:
proc_run,cluster=fritz,hostname=f2163,type=node value=4i 1725620476214474893
A more detailed description of the ClusterCockpit-flavored InfluxDB line protocol and its message types can be found in the CC specification.
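When generating test data, it can help to assemble lines in this format programmatically. The bash sketch below prints one line, defaulting to the current time in seconds (the format above also allows nanoseconds). The value is passed through unchanged, so integer values keep the i suffix as in the example. The function name metric_line is hypothetical, and it does not escape tag values; values containing spaces or commas would need escaping under the line protocol rules.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one line in the format
#   <metric>,cluster=<cluster>,hostname=<hostname>,type=<type> value=<value> <epoch>
# Usage: metric_line <metric> <cluster> <hostname> <type> <value> [epoch]
# The timestamp defaults to the current Unix time in seconds.
metric_line() {
  if [[ $# -lt 5 ]]; then
    echo "usage: metric_line <metric> <cluster> <hostname> <type> <value> [epoch]" >&2
    return 1
  fi
  local ts="${6:-$(date +%s)}"
  printf '%s,cluster=%s,hostname=%s,type=%s value=%s %s\n' \
    "$1" "$2" "$3" "$4" "$5" "$ts"
}
```

For example, metric_line proc_run fritz f2163 node 4i 1725620476214474893 reproduces the example line above.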
Example script to test the endpoint:
# Only use a JWT token if JWT authentication has been set up
JWT="eyJ0eXAiOiJKV1QiLCJhbGciOiJFZERTQSJ9.eyJ1c2VyIjoiYWRtaW4iLCJyb2xlcyI6WyJST0xFX0FETUlOIiwiUk9MRV9BTkFMWVNUIiwiUk9MRV9VU0VSIl19.d-3_3FZTsadPjDEdsWrrQ7nS0edMAR4zjl-eK7rJU3HziNBfI9PDHDIpJVHTNN5E5SlLGLFXctWyKAkwhXL-Dw"
curl -X 'POST' 'http://localhost:8080/api/write/' -H "Authorization: Bearer $JWT" \
-d "proc_run,cluster=fritz,hostname=f2163,type=node value=4i 1725620476214474893"
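To send several metrics in one request, the lines can be separated by newlines, assuming the write endpoint accepts multi-line bodies as the InfluxDB line protocol generally allows. When the body is read from a file or pipe, use curl's --data-binary rather than -d: with -d @file, curl strips newlines, which would merge the lines. In the bash sketch below, the function name write_batch and the CCMS_URL variable are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: POST newline-separated line-protocol data read from
# stdin to the write endpoint. --data-binary keeps the newlines intact.
# CCMS_URL defaults to the address used on this page; JWT must be set.
write_batch() {
  curl -sS --fail -X POST "${CCMS_URL:-http://localhost:8080}/api/write/" \
    -H "Authorization: Bearer ${JWT:?set JWT first}" \
    --data-binary @-
}

# Example:
#   now=$(date +%s)
#   printf '%s\n' \
#     "cpu_load,cluster=fritz,hostname=f2163,type=node value=2.5 $now" \
#     "mem_used,cluster=fritz,hostname=f2163,type=node value=120 $now" | write_batch
```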
Testing with the Metric Generator
For comprehensive testing of the write endpoint, a Metric Generator Script is available. This script simulates high-frequency metric data and supports both REST and NATS transport modes, as well as internal (integrated into cc-backend) and external (standalone) cc-metric-store deployments.
Swagger API Reference
Non-Interactive Documentation
This reference is rendered using the swagger-ui plugin based on the original definition file found in the ClusterCockpit
repository,
but without a serving backend. This means that interactive requests (“Try It Out”) will not return actual data.
However, a curl call and a compiled request URL will still be displayed
when an API endpoint is executed.