Release specific infos

Settings and issues specific to the current release

    Major changes

    • Metric store integration: The previously external cc-metric-store component was integrated into cc-backend. In the process, the metric store configuration was simplified considerably. It is not possible to use an external time-series database. It is possible, though, either to send the metric data to multiple time-series backends or to forward all metric data to cc-backend. We also dropped support for the Prometheus metric database.
    • Drop support for MySQL/MariaDB: We only support SQLite from now on. SQLite performs better and requires less administration.
    • New Slurm adapter: We now provide an official Slurm batch job adapter with tighter Slurm integration. The REST API should still work but was extended to also provide Slurm node and job states. The job and node-state API is offered as a REST API or via NATS.
    • Revised configuration: The structure of the configuration was unified and consolidated. It can now be distributed via multiple files. The UI configuration can be selectively configured. Defaults for the metric plots can be configured per cluster/subcluster.
    • Switch to more flexible .env handling: In previous releases the environment variables had to be provided in an .env file, and that file had to exist. We switched to the godotenv package, which is more flexible about where and how the environment variables are provided.

    New experimental features

    • Automatic Job taggers: It is possible to automatically detect application types and classify pathological jobs and tag jobs accordingly. The tagger rules are specified in rules.
    • Alternative job-archive backends: As alternatives to the file-based job archive there are now an SQLite backend and an S3-compatible object store backend.

    What you need to do

    You need to:

    • Adapt your central config.json to the new configuration scheme.
    • Revise all of your cluster.json files in the job archive to reflect the current options.
    • Migrate your job database to version 10 (see Database migration).
    • Migrate your job archive to version 3 (see Job Archive migration).
    • Transfer the checkpoints from the external cc-metric-store instance to the cc-backend ./var/checkpoints directory.

    The database migration can take more than one day. To minimize downtime, you can copy the existing SQLite database and perform the migration on the copy while the production instance is still running. cc-slurm-adapter will synchronize any missing jobs afterwards. The archive migration should take only 1-2 hours, provided you run it on a fast storage medium, e.g. an NVMe disk.
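The copy step described above could look roughly like the following sketch. The function name `copy_db_for_migration` and the example paths are hypothetical and not part of cc-backend. The sketch assumes that a plain file copy (including SQLite's `-wal` and `-shm` sidecar files, if present) is acceptable for your setup; a copy taken while the database is being written to may not be fully consistent, so SQLite's own backup mechanisms may be preferable.

```shell
#!/bin/bash

# Hypothetical helper (not part of cc-backend): copy an SQLite database
# file into a work directory so the migration can run on the copy.
# Prints the path of the copy on success.
copy_db_for_migration() {
    local src="$1" workdir="$2"

    if [ ! -f "$src" ]; then
        echo "Error: database '$src' not found" >&2
        return 1
    fi

    mkdir -p "$workdir" || return 1
    local dst="$workdir/$(basename "$src")"

    # Refuse to overwrite an earlier copy
    if [ -e "$dst" ]; then
        echo "Error: '$dst' already exists" >&2
        return 1
    fi

    cp -p "$src" "$dst" || return 1

    # Copy SQLite sidecar files (write-ahead log, shared memory) if present
    local suffix
    for suffix in -wal -shm; do
        if [ -f "$src$suffix" ]; then
            cp -p "$src$suffix" "$dst$suffix" || return 1
        fi
    done

    echo "$dst"
}

# Example call (both paths are placeholders):
# copy_db_for_migration ./var/job.db /tmp/ccb-migration
```

The migration itself is then run against the copy as described in Database migration, and the migrated file replaces the production database during a short downtime.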

    Configuration changes

    Complete configuration examples are available in a GitHub repository. All configuration options are now checked against a JSON schema. The number of required options has been reduced significantly.

    Transfer cc-metric-store checkpoints

    cc-backend currently offers the option to run cc-metric-store as an attached, internal component. In this mode cc-backend and cc-metric-store share the same configuration and run on the same server. The checkpoints of the internal cc-metric-store reside in the var directory of cc-backend. If you choose cc-metric-store-internal as your metric store, you can bring over the old checkpoints from your external cc-metric-store as follows:

    Look for the “checkpoints” key in the config.json of both cc-metric-store (CCMS) and cc-backend (CCB).

    "checkpoints": {
      "interval": "12h",
      "directory": "./var/checkpoints",
      "restore": "48h"
    },
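To find the configured directory without opening each file by hand, a text-based lookup along the following lines may help. This is a minimal sketch: the function name `checkpoint_dir` is hypothetical, and the lookup is a plain `sed` pattern match, not a JSON parser. It assumes the "checkpoints" block is formatted with one key per line, as in the example above.

```shell
#!/bin/bash

# Hypothetical helper: print the "directory" value found inside the
# "checkpoints" block of a config.json.
# Assumption: the block is laid out with one key per line.
checkpoint_dir() {
    local config="$1"
    # Restrict the search to the lines from "checkpoints" up to the next
    # closing brace, then extract the quoted value of "directory".
    sed -n '/"checkpoints"/,/}/ s/.*"directory"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$config" | head -n 1
}

# Example call (the path is a placeholder):
# checkpoint_dir /home/dummy/cc-metric-store/config.json
```

Running it once against the CCMS config.json and once against the CCB config.json yields the source and target directories for the transfer.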
    

    You can either move the checkpoints manually or use the following script.

    #!/bin/bash
    
    # The paths to the "directory" configured in the CCMS and CCB config.json.
    # Replace the dummy paths below with your own.
    CCMS_CHECKPOINTS_DIR="/home/dummy/cc-metric-store/var/checkpoints"
    CCB_CHECKPOINTS_DIR="/home/dummy/cc-backend/var/checkpoints"
    
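    # Check if the source directory actually exists
    if [ -d "$CCMS_CHECKPOINTS_DIR" ]; then
        # Create the target directory (including parents) if it is missing
        mkdir -p "$CCB_CHECKPOINTS_DIR"
    
        # Move the directory contents, not the directory itself
        if mv -f "$CCMS_CHECKPOINTS_DIR"/* "$CCB_CHECKPOINTS_DIR"/; then
            echo "Success: checkpoints moved from $CCMS_CHECKPOINTS_DIR to $CCB_CHECKPOINTS_DIR"
        else
            echo "Error: Moving the checkpoints failed." >&2
            exit 1
        fi
    else
        echo "Error: Directory '$CCMS_CHECKPOINTS_DIR' does not exist." >&2
        exit 1
    fi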
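After the transfer it can be worth confirming that the target directory is populated before starting cc-backend. The following check is only a sketch: the function name `verify_checkpoint_move` is hypothetical, and it only tests whether the directories are empty; it does not validate the checkpoint files themselves.

```shell
#!/bin/bash

# Hypothetical check: succeeds if the target directory contains at least
# one entry and the source directory is gone or empty.
verify_checkpoint_move() {
    local src="$1" dst="$2"

    if [ ! -d "$dst" ]; then
        echo "Error: target directory '$dst' does not exist" >&2
        return 1
    fi

    # ls -A lists all entries except . and ..
    if [ -z "$(ls -A "$dst")" ]; then
        echo "Error: target directory '$dst' is empty" >&2
        return 1
    fi

    if [ -d "$src" ] && [ -n "$(ls -A "$src")" ]; then
        echo "Error: source directory '$src' still contains entries" >&2
        return 1
    fi

    echo "OK: checkpoints present in $dst"
}

# Example call (paths are the dummy paths from the script above):
# verify_checkpoint_move /home/dummy/cc-metric-store/var/checkpoints \
#                        /home/dummy/cc-backend/var/checkpoints
```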
    

    Known issues

    • Currently energy footprint metrics of type energy are ignored for calculating total energy.
    • With energy footprint metrics of type power the unit is ignored and it is assumed the metric has the unit Watt.