Sharing HPM Metrics Between Monitoring and User Profiling
Overview
Hardware performance monitoring (HPM) counters can only be accessed by one process at a time.
This creates a conflict when both a system monitoring daemon (e.g. using LIKWID) and user
jobs (using LIKWID, VTune, or other perf_event-based tools) need access on the same node.
The approach described here allows sharing counter access by:
- Restricting HPM access to node-exclusive jobs via a SLURM constraint (hwperf)
- Handing ownership of /var/run/likwid.lock to the job user in the prologue
- Returning ownership to the monitoring user in the epilogue
This is the approach used at NHR@FAU.
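From the user's side, a job that wants counter access requests both the hwperf constraint and exclusive node access. The batch script below is a minimal sketch under assumptions, not a site-provided template: the application name ./my_app, the pinned core 0, and the FLOPS_DP event group are placeholders (available groups depend on the CPU architecture), and the helper lock_is_mine is a hypothetical name introduced here. It only uses likwid-perfctr when the lock file has actually been handed to the job user.

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --exclusive
#SBATCH --constraint=hwperf

# Hypothetical application binary; replace with your own program.
APP=./my_app

# Returns 0 if the given lock file is a regular file owned by the current user.
# Numeric UIDs are compared so the check also works without a passwd entry.
lock_is_mine() {
    local lock="$1"
    [ -f "$lock" ] && [ "$(stat -c '%u' "$lock")" = "$(id -u)" ]
}

if lock_is_mine /var/run/likwid.lock && command -v likwid-perfctr >/dev/null 2>&1; then
    # Pin to core 0 and measure the FLOPS_DP group (assumed to exist on this CPU).
    likwid-perfctr -C 0 -g FLOPS_DP "$APP"
else
    echo "HPM access was not granted on this node; skipping counter measurement" >&2
fi
```

Checking the lock owner first gives a clear message if the job landed on a node where the prologue could not hand over the lock file.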
Security Note
Hardware performance counter access must be restricted to node-exclusive jobs. Allowing shared-node jobs to access performance counters is a security risk, as counters can leak information across processes.
SLURM Submit Filter
A SLURM submit filter (Lua) enforces that any job requesting the hwperf constraint
must also request exclusive node access.
-- all jobs with constraint hwperf need to allocate nodes exclusively
for feature in string.gmatch(job_desc.features or "", "[^,]*") do
    if ( feature == "hwperf" and job_desc.shared ~= 0 ) then
        slurm.log_info("slurm_job_submit: job from uid %u with constraint hwperf but not exclusive", job_desc.user_id )
        slurm.user_msg("--constraint=hwperf only available for node-exclusive jobs with --exclusive")
        return 2029 --- slurm.ERROR ESLURM_INVALID_FEATURE
    end
end
Place this snippet in your site’s submit filter script (typically /etc/slurm/job_submit.lua).
Prologue: /etc/slurm/slurm.prolog
The prologue grants the job user access to /var/run/likwid.lock and opens
perf_event access when the hwperf constraint is set. Otherwise it ensures
the monitoring user retains ownership.
#
# Only change permissions if /var/run/likwid.lock is a regular file.
# Grant permissions to user (if requested) or keep them with the monitoring user.
#
if [ -f /var/run/likwid.lock ]; then
    if [[ "$SLURM_JOB_CONSTRAINTS" =~ "hwperf" ]]; then
        chown "$SLURM_JOB_USER" /var/run/likwid.lock
        # Also grant permission to use performance counters via perf interface (e.g. with VTune)
        echo 0 > /proc/sys/kernel/perf_event_paranoid
    elif [ "$(stat -c "%U" /var/run/likwid.lock)" != "monitoring" ]; then
        chown monitoring /var/run/likwid.lock
    fi
elif [[ "$SLURM_JOB_CONSTRAINTS" =~ "hwperf" ]]; then
    echo "ATTENTION: requested access to performance counters cannot be granted as /var/run/likwid.lock does not exist or is not a regular file"
fi
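Because the right-hand side of =~ is quoted, bash treats it as a literal string, so the test in the prologue is a substring match: any constraint string that merely contains hwperf (for example a feature named nohwperf) would also match. Whether that matters depends on the feature names defined at your site. If it does, an exact token comparison could look like the sketch below. The function name has_feature is hypothetical, and splitting on the separators , & | is an assumption that covers only simple constraint expressions.

```shell
#!/bin/bash
# Returns 0 if WANTED appears as a complete token in CONSTRAINTS.
# Tokens are split on ',', '&' and '|'; more complex constraint syntax
# (brackets, node counts) is not handled by this sketch.
has_feature() {
    local constraints="$1" wanted="$2" token
    local -a tokens
    # read -a splits on IFS without performing pathname expansion.
    IFS=',&|' read -r -a tokens <<< "$constraints"
    for token in "${tokens[@]}"; do
        if [ "$token" = "$wanted" ]; then
            return 0
        fi
    done
    return 1
}

# Possible use in the prologue or epilogue instead of the =~ test:
if has_feature "${SLURM_JOB_CONSTRAINTS:-}" hwperf; then
    echo "hwperf requested"
fi
```

With this helper, has_feature "gpu,hwperf" hwperf succeeds while has_feature "nohwperf" hwperf fails.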
Epilogue: /etc/slurm/slurm.epilog
The epilogue returns ownership of the lock file to the monitoring user and
restores the restrictive perf_event_paranoid setting.
#
# Return permission to the monitoring user for system monitoring.
#
if [ -f /var/run/likwid.lock ]; then
    if [[ "$SLURM_JOB_CONSTRAINTS" =~ "hwperf" ]]; then
        chown monitoring /var/run/likwid.lock
        # Disable permission to use performance counters via perf interface (e.g. with VTune)
        echo 2 > /proc/sys/kernel/perf_event_paranoid
    fi
    if [ "$(stat -c "%U" /var/run/likwid.lock)" != "monitoring" ]; then
        chown monitoring:root /var/run/likwid.lock
    fi
fi
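To check that the prologue and epilogue leave a node in the expected state, the lock file owner and the current perf_event_paranoid value can be inspected before, during, and after a test job. The helper below is a small sketch for that purpose; the function name hpm_state is hypothetical, and both paths can be overridden for testing.

```shell
#!/bin/bash
# Print the owner of the LIKWID lock file and the current perf_event_paranoid value.
# Both paths default to the locations used in the prologue and epilogue.
hpm_state() {
    local lock="${1:-/var/run/likwid.lock}"
    local paranoid="${2:-/proc/sys/kernel/perf_event_paranoid}"

    if [ -f "$lock" ]; then
        echo "lock_owner=$(stat -c '%U' "$lock")"
    else
        echo "lock_owner=<missing>"
    fi

    if [ -r "$paranoid" ]; then
        echo "perf_event_paranoid=$(cat "$paranoid")"
    else
        echo "perf_event_paranoid=<unreadable>"
    fi
}

hpm_state
```

After the epilogue has run, the expected result on a node configured as described here is the monitoring user as lock owner and a perf_event_paranoid value of 2.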
Note
perf_event_paranoid=2 is the standard restrictive setting. Some Linux distributions
also support the stricter value 4. Check your kernel version and distribution before
using it.
Nvidia GPU Profiling
For nodes with Nvidia GPUs, extend the prologue/epilogue with the relevant
nvidia-smi permission calls to grant and revoke GPU performance counter access
in the same manner.
How It Works
| Phase | hwperf constraint set? | Result |
|---|---|---|
| Submit filter | Yes, but --exclusive missing | Job rejected |
| Prologue | Yes | Lock file owned by job user, perf_event_paranoid=0 |
| Prologue | No | Lock file stays with monitoring user |
| Epilogue | Yes | Lock file returned to monitoring user, perf_event_paranoid=2 |
| Epilogue | No | No change needed |
During a job with hwperf, the system monitoring daemon loses counter access for
the duration of that job on the affected node. This is expected and acceptable
since the node is used exclusively by the job.