1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138
|
.TH ATOPGPUD 8 "January 2024" "Linux"
.SH NAME
.B atopgpud
- GPU statistics daemon
.SH SYNOPSIS
.P
.B atopgpud [-v]
.PP
.SH DESCRIPTION
The
.I atopgpud
daemon gathers statistical information from all Nvidia GPUs in the
current system. With a sampling rate of one second, it maintains
the statistics of every GPU, globally (system level) and per process.
When
.I atopgpud
is active on the target system,
.I atop
connects to this daemon via a TCP socket and obtains all GPU statistics
with every interval.
.PP
The approach to gather all GPU statistics in a separate daemon is required,
because the Nvidia driver only offers the GPU busy
percentage of the last second. Suppose that
.I atop
runs with a 10-minute interval and would fetch the GPU busy percentage
directly from the Nvidia driver, it would reflect the busy percentage of
the last second instead of the average busy percentage during 600 seconds.
Therefore, the
.I atopgpud
daemon fetches the GPU busy percentage every second and accumulates this
into a counter that can be retrieved by
.I atop
regularly. The same approach applies to other GPU statistics.
.PP
When the
.I atopgpud
daemon runs with root privileges, more process level counters (i.e.
GPU busy and GPU memory busy per process) are provided that are
otherwise not applicable.
.PP
Notice that certain GPU statistics are only delivered for specific GPU types.
For older or less sophisticated GPUs, the value -1 is returned for counters
that are not maintained. In the output of
.I atop
these counters are shown as 'N/A'.
.PP
When no (Nvidia) GPUs can be found in the target system,
.I atopgpud
immediately terminates with exit code 0.
.PP
Log messages are written via the
.I rsyslogd
daemon with facility 'daemon'.
With the -v flag (verbose),
.I atopgpud
also logs debug messages.
.PP
.SH INSTALLATION
The
.I atopgpud
daemon is written in Python, so
a Python interpreter should be installed on the target system.
This can either be Python version 2 or Python version 3 (the code of
.I atopgpud
is written in a generic way). Take care that the first line of the
.I atopgpud
script contains the proper command name to activate a Python interpreter
that is installed on the target system!
.PP
The
.I atopgpud
daemon depends on the Python module
.I pynvml
to interface with the Nvidia driver.
This module can be installed by the
.I pip
or
.I pip3
command and is usually packaged under the name
.I nvidia-ml-py
.br
Finally, the
.I pynvml
module is a Python wrapper around the
.I libnvidia-ml
shared library that needs to be installed as well.
.PP
After installing the
.I atop
package, the
.I atopgpud
is not automatically started, nor will
the service be enabed by default.
When you want to activate this service (permanently),
enter the following commands (as root):
.PP
.B \ systemctl enable atopgpu
.br
.B \ systemctl start atopgpu
.PP
.SH INTERFACE DESCRIPTION
Client processes can connect to the
.I atopgpud
daemon on TCP port 59123.
Subsequently, such client can send a request of two bytes,
consisting of one byte request code followed by one byte
integer being the API version number.
.br
The request code in the first byte can be 'T' to obtain information
about the GPU types installed in this system (usually only requested once).
.br
The request code can be 'S' to obtain all statistical counter values
(requested for every interval).
.PP
The response of the daemon starts with a 4-byte integer. The
first byte is the API version number that determines the response format
while the subsequent three bytes indicate the length (big endian order) of the
response string that follows.
.br
In the response strings the character '@' introduces system level information
of one specific GPU and the character '#' introduces process level information
related to that GPU.
.br
For further details about the meaning of the counters in a response string,
please consult the source code.
.PP
.SH SEE ALSO
.B atop(1),
.B atopsar(1),
.B atoprc(5),
.B netatop(4),
.B netatopd(8),
.B atopacctd(8)
.br
.B https://www.atoptool.nl
.SH AUTHOR
Gerlof Langeveld (gerlof.langeveld@atoptool.nl)
|