.TH ATOPGPUD 8 "January 2024" "Linux"
.SH NAME
.B atopgpud
\- GPU statistics daemon
.SH SYNOPSIS
.P
.B atopgpud [-v]
.PP
.SH DESCRIPTION
The
.I atopgpud
daemon gathers statistical information from all Nvidia GPUs in the
current system. Sampling once per second, it maintains the statistics
of every GPU, both globally (system level) and per process.
When
.I atopgpud
is active on the target system,
.I atop
connects to this daemon via a TCP socket and obtains all GPU statistics
at every interval.
.PP
Gathering all GPU statistics in a separate daemon is necessary
because the Nvidia driver only provides the GPU busy
percentage of the last second. If
.I atop
ran with a 10-minute interval and fetched the GPU busy percentage
directly from the Nvidia driver, the value would reflect the busy percentage of
the last second instead of the average busy percentage over 600 seconds.
Therefore, the
.I atopgpud
daemon fetches the GPU busy percentage every second and accumulates it
into a counter that
.I atop
can retrieve at every interval. The same approach applies to other GPU statistics.
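.PP
The accumulation principle can be illustrated with the following minimal
Python 3 sketch. It is not the code of
.IR atopgpud :
the class
.I BusyCounter
and the function
.I interval_average
are hypothetical, and keeping a sample count next to the accumulated sum
(so that a client can derive the average from two successive readings)
is an assumption made for illustration.
.PP
.nf
```python
class BusyCounter:
    """Hypothetical accumulator for one GPU's busy percentage."""

    def __init__(self) -> None:
        self.accum = 0    # sum of all per-second busy percentages so far
        self.samples = 0  # number of per-second samples taken so far

    def add_sample(self, busy_percent: int) -> None:
        """Called once per second with the busy percentage of the last second."""
        self.accum += busy_percent
        self.samples += 1

    def snapshot(self) -> tuple:
        """The (accum, samples) pair a client would read at the end of an interval."""
        return (self.accum, self.samples)


def interval_average(prev: tuple, curr: tuple) -> float:
    """Average busy percentage between two snapshots (0.0 if no samples were taken)."""
    delta_samples = curr[1] - prev[1]
    if delta_samples == 0:
        return 0.0
    return (curr[0] - prev[0]) / delta_samples


if __name__ == "__main__":
    gpu = BusyCounter()
    start = gpu.snapshot()
    # One 10-minute interval: busy during the first 300 seconds, idle afterwards
    for second in range(600):
        gpu.add_sample(100 if second < 300 else 0)
    print(interval_average(start, gpu.snapshot()))
```
.fi
.PP
Run as a script, it prints 50.0, the average over all 600 seconds, whereas
the busy percentage of the last second alone would have been 0.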
.PP
When the
.I atopgpud
daemon runs with root privileges, additional process-level counters (i.e.
GPU busy and GPU memory busy per process) are provided that are
otherwise not available.
.PP
Note that certain GPU statistics are only delivered for specific GPU types.
For older or less sophisticated GPUs, the value \-1 is returned for counters
that are not maintained. In the output of
.IR atop ,
these counters are shown as 'N/A'.
.PP
When no (Nvidia) GPUs can be found in the target system,
.I atopgpud
immediately terminates with exit code 0.
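.PP
A minimal Python 3 sketch of such a startup check via the
.I pynvml
module (described under INSTALLATION) follows. The function name
.I count_nvidia_gpus
is hypothetical, and treating a missing
.I pynvml
module or a failing NVML initialization as "no GPUs found" is an
assumption; the actual
.I atopgpud
code may handle these cases differently.
.PP
.nf
```python
import sys


def count_nvidia_gpus() -> int:
    """Return the number of Nvidia GPUs visible through NVML, or 0 if none.

    Assumption: a missing pynvml module or an NVML failure counts as "no GPUs".
    """
    try:
        import pynvml  # installed from the nvidia-ml-py package
    except ImportError:
        return 0
    try:
        pynvml.nvmlInit()  # loads libnvidia-ml and connects to the driver
    except pynvml.NVMLError:
        return 0
    try:
        return pynvml.nvmlDeviceGetCount()
    except pynvml.NVMLError:
        return 0
    finally:
        pynvml.nvmlShutdown()


if __name__ == "__main__":
    ngpus = count_nvidia_gpus()
    if ngpus == 0:
        sys.exit(0)  # nothing to monitor: terminate with exit code 0
    print(f"{ngpus} Nvidia GPU(s) found")
```
.fi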
.PP
Log messages are written via the
.I rsyslogd
daemon with facility 'daemon'.
With the -v flag (verbose),
.I atopgpud
also logs debug messages.
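.PP
This logging behaviour can be sketched with the standard Python 3
.I syslog
module. It is not the code of
.IR atopgpud :
the function
.I setup_logging
is hypothetical, and the ident string, the LOG_PID option and the choice of
LOG_INFO as the threshold without \-v are assumptions.
.PP
.nf
```python
import argparse
import syslog


def setup_logging(verbose: bool) -> int:
    """Log via syslog with facility 'daemon'; keep debug messages only if verbose.

    Returns the priority mask that was installed.
    """
    # Assumed ident/options; the local syslog daemon (e.g. rsyslogd) receives the messages
    syslog.openlog(ident="atopgpud", logoption=syslog.LOG_PID,
                   facility=syslog.LOG_DAEMON)
    # Assumed threshold: without -v, anything less important than LOG_INFO is masked
    mask = syslog.LOG_UPTO(syslog.LOG_DEBUG if verbose else syslog.LOG_INFO)
    syslog.setlogmask(mask)
    return mask


if __name__ == "__main__":
    parser = argparse.ArgumentParser(prog="atopgpud")
    parser.add_argument("-v", action="store_true", help="also log debug messages")
    args = parser.parse_args()
    setup_logging(args.v)
    syslog.syslog(syslog.LOG_INFO, "daemon started")
    syslog.syslog(syslog.LOG_DEBUG, "debug logging enabled")  # dropped unless -v
```
.fi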
.PP
.SH INSTALLATION
The
.I atopgpud
daemon is written in Python, so
a Python interpreter should be installed on the target system.
This can be either Python version 2 or Python version 3 (the code of
.I atopgpud
is written in a version-independent way). Make sure that the first line of the
.I atopgpud
script (the '#!' line) names a Python interpreter
that is installed on the target system!
.PP
The
.I atopgpud
daemon depends on the Python module
.I pynvml
to interface with the Nvidia driver.
This module can be installed with the
.I pip
or
.I pip3
command and is usually packaged under the name
.IR nvidia-ml-py .
.br
Finally, the
.I pynvml
module is a Python wrapper around the
.I libnvidia-ml
shared library, which needs to be installed as well.
.PP
After installing the
.I atop
package, the
.I atopgpud
daemon is neither started automatically nor enabled by default.
To activate this service (permanently),
enter the following commands (as root):
.PP
.B \  systemctl enable atopgpu
.br
.B \  systemctl start  atopgpu
.PP
.SH INTERFACE DESCRIPTION
Client processes can connect to the
.I atopgpud
daemon on TCP port 59123.
Subsequently, such a client can send a two-byte request,
consisting of a one-byte request code followed by a one-byte
integer holding the API version number.
.br
The request code in the first byte can be 'T' to obtain information
about the GPU types installed in this system (usually only requested once).
.br
The request code can be 'S' to obtain all statistical counter values
(requested for every interval).
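.PP
Encoding such a request takes two bytes, as in this Python 3 sketch. The
function
.I build_request
is hypothetical, and the value of
.I API_VERSION
is only a placeholder: the API version number that a particular
.I atopgpud
release expects is not stated here.
.PP
.nf
```python
API_VERSION = 1  # hypothetical placeholder, not taken from atopgpud


def build_request(code: str, version: int = API_VERSION) -> bytes:
    """Encode one request: a request-code byte followed by an API-version byte."""
    if code not in ("T", "S"):
        raise ValueError("request code must be 'T' (GPU types) or 'S' (statistics)")
    if not 0 <= version <= 255:
        raise ValueError("API version must fit in a single byte")
    return bytes([ord(code), version])


if __name__ == "__main__":
    print(build_request("T"))  # usually sent once, to learn the GPU types
    print(build_request("S"))  # sent at every interval
```
.fi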
.PP
The response of the daemon starts with a 4-byte integer. The
first byte is the API version number, which determines the response format,
while the subsequent three bytes indicate the length (in big-endian order) of the
response string that follows.
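.PP
The following Python 3 sketch shows how a client could connect to port
59123, send a request and read the complete response: first the 4-byte
header, then exactly as many bytes as the header announces (TCP may deliver
them in several pieces). The function names are hypothetical, the API
version to send is left to the caller, and decoding the response string as
UTF-8 is an assumption.
.PP
.nf
```python
import socket

ATOPGPUD_PORT = 59123  # TCP port of atopgpud


def parse_header(header: bytes) -> tuple:
    """Split the 4-byte response header into (API version, string length)."""
    if len(header) != 4:
        raise ValueError("response header must be exactly 4 bytes")
    value = int.from_bytes(header, "big")
    return value >> 24, value & 0xFFFFFF  # first byte, then 3-byte big-endian length


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from the socket."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("atopgpud closed the connection early")
        buf += chunk
    return bytes(buf)


def query(code: str, version: int, host: str = "localhost",
          port: int = ATOPGPUD_PORT, timeout: float = 5.0) -> tuple:
    """Send one request and return (API version, response string)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(bytes([ord(code), version]))  # request code + API version
        api_version, length = parse_header(recv_exact(sock, 4))
        body = recv_exact(sock, length)
        return api_version, body.decode("utf-8", errors="replace")
```
.fi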
.br
In the response strings the character '@' introduces system level information
of one specific GPU and the character '#' introduces process level information
related to that GPU.
.br
For further details about the meaning of the counters in a response string,
please consult the source code.
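.PP
Splitting a response string on these two characters could look like the
following Python 3 sketch. The function names are hypothetical, and the
layout of the fields within each part is deliberately treated as opaque,
since it is defined by the source code. The helper
.I show
renders a counter with value \-1 as 'N/A', as described above for the output of
.IR atop .
.PP
.nf
```python
def split_gpus(response: str) -> tuple:
    """Group a response string by GPU.

    Returns (prefix, gpus): prefix is any text before the first '@', and gpus
    is a list of (system_part, [process_part, ...]) tuples, one per '@'.
    """
    prefix, *gpu_chunks = response.split("@")
    gpus = []
    for chunk in gpu_chunks:
        # '#' separates the GPU's system-level part from its per-process parts
        system_part, *process_parts = chunk.split("#")
        gpus.append((system_part, process_parts))
    return prefix, gpus


def show(value: str) -> str:
    """Render one counter value; -1 means the GPU does not maintain it."""
    return "N/A" if value == "-1" else value
```
.fi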
.PP
.SH SEE ALSO
.B atop(1),
.B atopsar(1),
.B atoprc(5),
.B netatop(4),
.B netatopd(8),
.B atopacctd(8)
.br
.B https://www.atoptool.nl
.SH AUTHOR
Gerlof Langeveld (gerlof.langeveld@atoptool.nl)