Configuration

ClusterCockpit Metric Store Configuration Option References

Configuration options are located in a JSON file. Default path is config.json in current working directory. Alternative paths to the configuration file can be specified using the command line switch -config <filename>.

All durations are specified as string that will be parsed like this (Allowed suffixes: s, m, h, …).

The configuration is organized into four main sections: main, metrics, nats, and metric-store.

Main Section

  • main: Server configuration (required)
    • addr: Address to bind to, for example localhost:8080 or 0.0.0.0:443 (required)
    • https-cert-file: Filepath to SSL certificate. If also https-key-file is set, use HTTPS (optional)
    • https-key-file: Filepath to SSL key file. If also https-cert-file is set, use HTTPS (optional)
    • user: Drop root permissions to this user once the port was bound. Only applicable if using privileged port (optional)
    • group: Drop root permissions to this group once the port was bound. Only applicable if using privileged port (optional)
    • backend-url: URL of cc-backend for querying job information, e.g., https://localhost:8080 (optional)
    • jwt-public-key: Base64 encoded Ed25519 public key, use this to verify requests to the HTTP API (required)
    • debug: Debug options (optional)
      • dump-to-file: Path to file for dumping internal state (optional)
      • gops: Enable gops agent for debugging (optional)

Metrics Section

  • metrics: Map of metric-name to objects with the following properties (required)
    • frequency: Timestep/Interval/Resolution of this metric in seconds (required)
    • aggregation: Can be "sum", "avg" or null (required)
      • null means aggregation across topology levels is disabled for this metric (use for node-scope-only metrics)
      • "sum" means that values from the child levels are summed up for the parent level
      • "avg" means that values from the child levels are averaged for the parent level

NATS Section

  • nats: NATS server connection configuration (optional)
    • address: URL of NATS.io server, example: nats://localhost:4222 (required if nats section present)
    • username: NATS username for authentication (optional)
    • password: NATS password for authentication (optional)

Metric-Store Section

  • metric-store: Storage engine configuration (required)
    • retention-in-memory: Keep all values in memory for at least that amount of time. Should be long enough to cover common job durations (required)
    • memory-cap: Upper memory capacity limit used by the metric store in GB (required). If exceeded, buffers still in use by long-running jobs will be freed.
    • num-workers: Number of concurrent workers for checkpoint and archive operations (optional). Defaults to min(NumCPU/2+1, 10).
    • checkpoints: Checkpoint configuration (optional)
      • file-format: Format for checkpoint files. Either "json" (human-readable, periodic) or "wal" (binary snapshot + Write-Ahead Log, crash-safe). Default: "wal" (optional)
      • directory: Path to checkpoint directory. Default: ./var/checkpoints (optional)
    • cleanup: Cleanup/archiving configuration (optional). The cleanup interval always equals retention-in-memory. Defaults to mode: delete if not set.
      • mode: Either "archive" (move old checkpoints to archive directory) or "delete" (remove old checkpoints). Default: "delete" (optional)
      • directory: Path to archive directory (required if mode is "archive")
    • nats-subscriptions: Array of NATS subscription configurations (optional, requires nats section)
      • subscribe-to: NATS subject to subscribe to (required)
      • cluster-tag: Default cluster tag for metrics without a cluster tag (optional)