Hands-On Demo
Categories:
Prerequisites
- perl
- go
- npm
- Optional: curl
- Script migrateTimestamp.pl
Documentation
You find READMEs or api docs in
- ./cc-backend/configs
- ./cc-backend/init
- ./cc-backend/api
ClusterCockpit configuration files
cc-backend
- ./.envPasswords and Tokens set in the environment
- ./config.jsonConfiguration options for cc-backend
cc-metric-store
- ./config.jsonOptional to overwrite configuration options
cc-metric-collector
Not yet included in the hands-on setup.
Setup Components
Start by creating a base folder for all of the following steps.
- mkdir clustercockpit
- cd clustercockpit
Setup cc-backend
- Clone Repository- git clone https://github.com/ClusterCockpit/cc-backend.git
- cd cc-backend
 
- Build- make
 
- Activate & configure environment for cc-backend- cp configs/env-template.txt .env
- Optional: Have a look via vim .env
- Copy the config.jsonfile included in this tarball into the root directory of cc-backend:cp ../../config.json ./
 
- Back to toplevel clustercockpit- cd ..
 
- Prepare Datafolder and Database file- mkdir var
- ./cc-backend -migrate-db
 
Setup cc-metric-store
- Clone Repository- git clone https://github.com/ClusterCockpit/cc-metric-store.git
- cd cc-metric-store
 
- Build Go Executable- go get
- go build
 
- Prepare Datafolders- mkdir -p var/checkpoints
- mkdir -p var/archive
 
- Update Config- vim config.json
- Exchange existing setting in metricswith the following:
 
"clock":      { "frequency": 60, "aggregation": null },
"cpi":        { "frequency": 60, "aggregation": null },
"cpu_load":   { "frequency": 60, "aggregation": null },
"flops_any":  { "frequency": 60, "aggregation": null },
"flops_dp":   { "frequency": 60, "aggregation": null },
"flops_sp":   { "frequency": 60, "aggregation": null },
"ib_bw":      { "frequency": 60, "aggregation": null },
"lustre_bw":  { "frequency": 60, "aggregation": null },
"mem_bw":     { "frequency": 60, "aggregation": null },
"mem_used":   { "frequency": 60, "aggregation": null },
"rapl_power": { "frequency": 60, "aggregation": null }
- Back to toplevel clustercockpit- cd ..
 
Setup Demo Data
- mkdir source-data
- cd source-data
- Download JobArchive-Source:- wget https://hpc-mover.rrze.uni-erlangen.de/HPC-Data/0x7b58aefb/eig7ahyo6fo2bais0ephuf2aitohv1ai/job-archive-dev.tar.xz
- tar xJf job-archive-dev.tar.xz
- mv ./job-archive ./job-archive-source
- rm ./job-archive-dev.tar.xz
 
- Download CC-Metric-Store Checkpoints:- mkdir -p cc-metric-store-source/checkpoints
- cd cc-metric-store-source/checkpoints
- wget https://hpc-mover.rrze.uni-erlangen.de/HPC-Data/0x7b58aefb/eig7ahyo6fo2bais0ephuf2aitohv1ai/cc-metric-store-checkpoints.tar.xz
- tar xf cc-metric-store-checkpoints.tar.xz
- rm cc-metric-store-checkpoints.tar.xz
 
- Back to source-data- cd ../..
 
- Run timestamp migration script. This may take tens of minutes!- cp ../migrateTimestamps.pl .
- ./migrateTimestamps.pl
- Expected output:
 
Starting to update start- and stoptimes in job-archive for emmy
Starting to update start- and stoptimes in job-archive for woody
Done for job-archive
Starting to update checkpoint filenames and data starttimes for emmy
Starting to update checkpoint filenames and data starttimes for woody
Done for checkpoints
- Copy cluster.jsonfiles from source to migrated folders- cp source-data/job-archive-source/emmy/cluster.json cc-backend/var/job-archive/emmy/
- cp source-data/job-archive-source/woody/cluster.json cc-backend/var/job-archive/woody/
 
- Initialize Job-Archive in SQLite3 job.db and add demo user- cd cc-backend
- ./cc-backend -init-db -add-user demo:admin:demo
- Expected output:
 
<6>[INFO]    new user "demo" created (roles: ["admin"], auth-source: 0)
<6>[INFO]    Building job table...
<6>[INFO]    A total of 3936 jobs have been registered in 1.791 seconds.
- Back to toplevel clustercockpit- cd ..
 
Startup both Apps
- In cc-backend root: $./cc-backend -server -dev- Starts Clustercockpit at http:localhost:8080- Log: <6>[INFO] HTTP server listening at :8080...
 
- Log: 
- Use local internet browser to access interface- You should see and be able to browse finished Jobs
- Metadata is read from SQLite3 database
- Metricdata is read from job-archive/JSON-Files
 
- Create User in settings (top-right corner)- Name apiuser
- Username apiuser
- Role API
- Submit & Refresh Page
 
- Name 
- Create JTW for apiuser- In Userlist, press Gen. JTWforapiuser
- Save JWT for later use
 
- In Userlist, press 
 
- Starts Clustercockpit at 
- In cc-metric-store root: $./cc-metric-store- Start the cc-metric-store on http:localhost:8081, Log:
 
- Start the cc-metric-store on 
2022/07/15 17:17:42 Loading checkpoints newer than 2022-07-13T17:17:42+02:00
2022/07/15 17:17:45 Checkpoints loaded (5621 files, 319 MB, that took 3.034652s)
2022/07/15 17:17:45 API http endpoint listening on '0.0.0.0:8081'
- Does not have a graphical interface
- Otpional: Test function by executing:
$ curl -H "Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJFZERTQSJ9.eyJ1c2VyIjoiYWRtaW4iLCJyb2xlcyI6WyJST0xFX0FETUlOIiwiUk9MRV9BTkFMWVNUIiwiUk9MRV9VU0VSIl19.d-3_3FZTsadPjDEdsWrrQ7nS0edMAR4zjl-eK7rJU3HziNBfI9PDHDIpJVHTNN5E5SlLGLFXctWyKAkwhXL-Dw" -D - "http://localhost:8081/api/query" -d "{ \"cluster\": \"emmy\", \"from\": $(expr $(date +%s) - 60), \"to\": $(date +%s), \"queries\": [{
  \"metric\": \"flops_any\",
  \"host\": \"e1111\"
}] }"
HTTP/1.1 200 OK
Content-Type: application/json
Date: Fri, 15 Jul 2022 13:57:22 GMT
Content-Length: 119
{"results":[[JSON-DATA-ARRAY]]}
Development API web interfaces
The -dev flag enables web interfaces to document and test the apis:
- Local GQL Playgorund - A GraphQL playground. To use it you must have a authenticated session in the same browser.
- Local Swagger Docs - A Swagger UI. To use it you have to be logged out, so no user session in the same browser. Use the JWT token with role Api generate previously to authenticate via http header.
Use cc-backend API to start job
- Enter the URL - http://localhost:8080/swagger/index.htmlin your browser.
- Enter your JWT token you generated for the API user by clicking the green Authorize button in the upper right part of the window. 
- Click the - /job/start_jobendpoint and click the Try it out button.
- Enter the following json into the request body text area and fill in a recent start timestamp by executing - date +%s.:
{
    "jobId":         100000,
    "arrayJobId":    0,
    "user":          "ccdemouser",
    "subCluster":    "main",
    "cluster":       "emmy",
    "startTime":    <date +%s>,
    "project":       "ccdemoproject",
    "resources":  [
        {"hostname":  "e0601"},
        {"hostname":  "e0823"},
        {"hostname":  "e0337"},
        {"hostname": "e1111"}],
    "numNodes":      4,
    "numHwthreads":  80,
    "walltime":      86400
}
- The response body should be the database id of the started job, for example:
{
  "id": 3937
}
- Check in ClusterCockpit- User ccdemousershould appear in Users-Tab with one running job
- It could take up to 5 Minutes until the Job is displayed with some current data (5 Min Short-Job Filter)
- Job then is marked with a green runningtag
- Metricdata displayed is read from cc-metric-store!
 
- User 
Use cc-backend API to stop job
- Enter the URL http://localhost:8080/swagger/index.htmlin your browser.
- Enter your JWT token you generated for the API user by clicking the green Authorize button in the upper right part of the window.
- Click the /job/stop_job/{id}endpoint and click the Try it out button.
- Enter the database id at id that was returned by start_joband copy the following into the request body. Replace the timestamp with a recent one:
{
  "cluster": "emmy",
  "jobState": "completed",
  "stopTime": <RECENT TS>
}
- On success a json document with the job meta data is returned. 
- Check in ClusterCockpit - User ccdemousershould appear in Users-Tab with one completed job
- Job is no longer marked with a green runningtag -> Completed!
- Metricdata displayed is now read from job-archive!
 
- User 
- Check in job-archive - cd ./cc-backend/var/job-archive/emmy/100/000
- cd $STARTTIME
- Inspect meta.jsonanddata.json
 
Helper scripts
- In this tarball you can find the perl script generate_subcluster.plthat helps to generate the subcluster section for your system. Usage:
- Log into an exclusive cluster node.
- The LIKWID tools likwid-topology and likwid-bench must be in the PATH!
- $./generate_subcluster.ploutputs the subcluster section on- stdout
Please be aware that
- You have to enter the name and node list for the subCluster manually.
- GPU detection only works if LIKWID was build with Cuda avalable and you run likwid-topology also with Cuda loaded.
- Do not blindly trust the measured peakflops values.
- Because the script blindly relies on the CSV format output by likwid-topology this is a fragile undertaking!
Feedback
Was this page helpful?
Glad to hear it! Please tell us how we can improve.
Sorry to hear that. Please tell us how we can improve.