************************ TIMING AND TESTING ATLAS ***************************
The ATLAS distribution has several different testing and timing methods. For
testing, the most important testers are the standard API testers for the
C and Fortran77 BLAS libraries, and the Fortran77 lapack tester. Sections
1, 2, and 3 deal with performing these tests.
ATLAS also provides its own timer programs that do some rudimentary testing
as well as performing relatively sophisticated timings (involving cache
flushing, etc). The remaining sections deal with using these timer/testers.
1. THE FORTRAN77 INTERFACE BLAS TESTERS
The official BLAS testers for the Fortran77 interface to the legacy BLAS
can be run in BLDdir/interfaces/blas/F77/testing/. Typing "make" with
no arguments will compile all of the testers (all levels & precisions).
The user may then run the testers by:
./xsblat1
./xdblat1
./xcblat1
./xzblat1
./xsblat2 < SRCdir/interfaces/blas/F77/sblat2.dat
./xdblat2 < SRCdir/interfaces/blas/F77/dblat2.dat
./xcblat2 < SRCdir/interfaces/blas/F77/cblat2.dat
./xzblat2 < SRCdir/interfaces/blas/F77/zblat2.dat
./xsblat3 < SRCdir/interfaces/blas/F77/sblat3.dat
./xdblat3 < SRCdir/interfaces/blas/F77/dblat3.dat
./xcblat3 < SRCdir/interfaces/blas/F77/cblat3.dat
./xzblat3 < SRCdir/interfaces/blas/F77/zblat3.dat
The user may edit the input files to perform more or less comprehensive
tests. For more information on the legacy BLAS testers, go to:
www.netlib.org/blas/faq.html
2. THE ANSI/ISO C INTERFACE BLAS TESTERS
The official BLAS testers for the ANSI/ISO C interface to the legacy BLAS
can be run in BLDdir/interfaces/blas/C/testing/. Typing "make" with
no arguments will compile all of the testers (all levels & precisions).
The user may then run the testers by:
./xscblat1
./xdcblat1
./xccblat1
./xzcblat1
./xscblat2 < SRCdir/interfaces/blas/C/testing/c_sblat2.dat
./xdcblat2 < SRCdir/interfaces/blas/C/testing/c_dblat2.dat
./xccblat2 < SRCdir/interfaces/blas/C/testing/c_cblat2.dat
./xzcblat2 < SRCdir/interfaces/blas/C/testing/c_zblat2.dat
./xscblat3 < SRCdir/interfaces/blas/C/testing/c_sblat3.dat
./xdcblat3 < SRCdir/interfaces/blas/C/testing/c_dblat3.dat
./xccblat3 < SRCdir/interfaces/blas/C/testing/c_cblat3.dat
./xzcblat3 < SRCdir/interfaces/blas/C/testing/c_zblat3.dat
The user may edit the input files to perform more or less comprehensive
tests. For more information on the legacy BLAS testers, go to:
www.netlib.org/blas/faq.html
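The tester command lines in sections 1 and 2 follow a regular pattern
(precision s/d/c/z, level 1/2/3, with an input file for levels 2 and 3 only).
The following is a minimal sketch that generates those command lines so they
can be reviewed or pasted into a shell. The function name blat_commands and
its arguments are hypothetical helpers, not part of ATLAS; "SRCdir" is the
same placeholder used above and must be replaced by your source directory.

```python
def blat_commands(interface="F77", srcdir="SRCdir"):
    """Return (executable, input_file_or_None) pairs for one BLAS interface.

    interface is "F77" (section 1) or "C" (section 2). The paths simply
    mirror the command lines listed in this document.
    """
    if interface == "F77":
        exe = "./x{p}blat{l}"
        dat = "{s}/interfaces/blas/F77/{p}blat{l}.dat"
    elif interface == "C":
        exe = "./x{p}cblat{l}"
        dat = "{s}/interfaces/blas/C/testing/c_{p}blat{l}.dat"
    else:
        raise ValueError("interface must be 'F77' or 'C'")

    commands = []
    for level in (1, 2, 3):
        for prec in "sdcz":
            # Level 1 testers take no input file; levels 2 and 3 read one on stdin.
            data = None if level == 1 else dat.format(s=srcdir, p=prec, l=level)
            commands.append((exe.format(p=prec, l=level), data))
    return commands


# Print the twelve Fortran77 tester command lines from section 1.
for exe, data in blat_commands("F77"):
    print(exe if data is None else f"{exe} < {data}")
```

Calling blat_commands("C") yields the corresponding section 2 command lines.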
3. TESTING THE FORTRAN77 INTERFACE TO LAPACK
You will need to throw the --with-netlib-lapack-tarfile=<lapack.tgz>
flag at configure time to use this feature, since ATLAS does not natively
provide a complete LAPACK implementation. Here are some important targets,
all of which can be issued from the BLDdir directory:
make lapack_test_al_ab : test ATLAS's serial lapack
make lapack_test_pt_pt : test ATLAS's threaded lapack
make lapack_test_fl_fb : test the F77 LAPACK with F77 BLAS
To get a summary of the output of such tests, add "scope_" to the name, e.g.:
make scope_test_fl_fb
Since the LAPACK testers report some failed tests no matter which BLAS is
used, it is typical to contrast an optimized install with a reference install
to narrow down the cases that must be investigated for true errors.
See atlas_install.pdf for a few more details.
4. USING ATLAS BLAS TIMER/TESTERS WITH A SYSTEM BLAS LIBRARY
The ATLAS Level 1-3 tester/timer programs all test one BLAS
implementation against another. These programs compute the Mflop/s
rate for each routine called. In addition, they check the result
matrices computed by calls to the system BLAS and ATLAS library
routines. For more information about the testing implementation in
the Level 3 programs, read section 6.1.
To properly build the programs with your BLAS library, make sure to
set the BLASlib variable in the BLDdir/Make.inc include file correctly:
BLASlib = /path/to/library/libblas.a
By default, this will be set to $(FBLASlib), which means ATLAS will
test and time itself against the Fortran reference BLAS which ATLAS
autocompiles during the install. You can reset this to a vendor-supplied
BLAS if you like.
On some machines, the compiler will recognize certain flags that link
in the vendor-optimized BLAS library. You can place these in the BLASlib
variable as well. There are too many of these to list in detail here, but
here are a few examples of vendor-supplied BLAS:
BLASlib = -xlic_lib=sunperf # on sun machines using Sun workshop compiler
BLASlib = -ldxml # Using Dec/Compaq's compiler
BLASlib = -lcxml # Using Compaq/Dec's compiler
BLASlib = -lessl # IBM machines using IBM compiler
BLASlib = -lesslp2 # IBM Power2 machines using IBM compiler
BLASlib = -lesslp3 # IBM Power3 machines using IBM compiler
BLASlib = -lblas # IRIX using SGI's compiler
After you're sure that the BLASlib variable is set properly, read section
6 on the ATLAS LEVEL 3 TIMER/TESTER PROGRAMS to learn how to build
and run them.
5. TESTING _WITHOUT_ A BLAS LIBRARY
You may still build and run the ATLAS TESTER/TIMER programs without a
system BLAS library by testing against the ATLAS-provided C reference BLAS.
Just leave the BLASlib variable in the ATLAS/Make.<arch> makefile blank:
BLASlib =
Then, edit ATLAS/bin/l3blastst.c, and change line 87 from:
#define USE_F77_BLAS
to:
#define USE_L3_REFERENCE
Edit ATLAS/bin/l2blastst.c and change line 56 from:
#define USE_F77_BLAS
to:
#define USE_L2_REFERENCE
6. THE ATLAS LEVEL 3 TIMER/TESTER PROGRAMS
To make the single, double, single complex, and double complex
programs, type:
make xsl3blastst
make xdl3blastst
make xcl3blastst
make xzl3blastst
Running the programs without arguments will time _GEMM with square
problem sizes from 100 to 1000 in steps of 100, with alpha=1.0, beta=1.0,
and A and B not transposed:
./xdl3blastst
DGEMM
TEST TA TB M N K alpha beta Time Mflop SpUp PASS
==== == == === === === ===== ===== ====== ===== ==== ====
1 N N 100 100 100 1.0 0.0 0.02 200.0 1.00 ---
1 N N 100 100 100 1.0 0.0 0.01 200.0 1.00 YES
2 N N 200 200 200 1.0 0.0 0.09 177.8 1.00 ---
2 N N 200 200 200 1.0 0.0 0.09 177.8 1.00 YES
3 N N 300 300 300 1.0 0.0 0.35 154.3 1.00 ---
3 N N 300 300 300 1.0 0.0 0.29 186.2 1.21 YES
4 N N 400 400 400 1.0 0.0 0.73 175.3 1.00 ---
4 N N 400 400 400 1.0 0.0 0.68 188.2 1.07 YES
5 N N 500 500 500 1.0 0.0 1.48 168.9 1.00 ---
5 N N 500 500 500 1.0 0.0 1.35 185.2 1.10 YES
6 N N 600 600 600 1.0 0.0 2.47 174.9 1.00 ---
6 N N 600 600 600 1.0 0.0 2.30 187.8 1.07 YES
7 N N 700 700 700 1.0 0.0 4.01 171.1 1.00 ---
7 N N 700 700 700 1.0 0.0 3.65 187.9 1.10 YES
8 N N 800 800 800 1.0 0.0 5.74 178.4 1.00 ---
8 N N 800 800 800 1.0 0.0 5.43 188.6 1.06 YES
9 N N 900 900 900 1.0 0.0 8.38 174.0 1.00 ---
9 N N 900 900 900 1.0 0.0 7.68 189.8 1.09 YES
10 N N 1000 1000 1000 1.0 0.0 11.25 177.8 1.00 ---
10 N N 1000 1000 1000 1.0 0.0 10.58 189.0 1.06 YES
NTEST=10, NUMBER PASSED=10, NUMBER FAILURES=0
Notice that there are two entries for each run. The first entry
corresponds to a call to the library that you supply, and the second
entry corresponds to a call to the ATLAS library.
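The Mflop and SpUp columns in the sample output above can be reproduced by
hand. The following is a minimal sketch, assuming the usual 2*M*N*K operation
count for a real _GEMM and assuming SpUp is the ATLAS rate divided by the
rate of the supplied library; both assumptions agree with the sample rows
shown, but the function names are hypothetical and not taken from the ATLAS
sources. Printed times are rounded, so recomputed values match only
approximately for the smallest problems.

```python
def gemm_mflops(m, n, k, seconds):
    """Mflop/s for a real GEMM, counting 2*M*N*K floating point operations."""
    return 2.0 * m * n * k / seconds / 1.0e6


def speedup(reference_mflops, atlas_mflops):
    """Ratio of the second (ATLAS) entry to the first (supplied library) entry."""
    return atlas_mflops / reference_mflops


# Test 3 from the sample output: 300x300x300, 0.35 s versus 0.29 s.
ref = gemm_mflops(300, 300, 300, 0.35)
atl = gemm_mflops(300, 300, 300, 0.29)
print(f"{ref:.1f} {atl:.1f} {speedup(ref, atl):.2f}")  # prints "154.3 186.2 1.21"
```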
An explanation of each argument follows:
./xd3blastst -help
USAGE: ./xd3blastst -R <rout> -Side <nsides> L/R -Uplo <nuplo> L/U
-Atrans <ntrans> n/t/c -Btrans <ntrans> n/t/c -Diag <ndiags> N/U
-M <m1> <mN> <minc> -N <n1> <nN> <ninc> -K <k1> <kN> <kinc>
-n <n> -m <m> -k <k> -a <nalphas> <alpha1> ... <alphaN>
-b <nbetas> <beta1> ... <betaN> -Test <0/1>
-R <rout> Specifies the routines which you would like to
test/time. The routines for the single and double
precision programs are gemm, symm, syrk, syr2k, trmm,
and trsm (note the omission of the prefix s and d). The
additional routines for the single complex and double
complex programs are hemm, herk, and her2k. You can
also specify the argument like this:
./xd3blastst -R all
which will time all the routines. Or you can specify
some of the routines like this:
./xd3blastst -R 1 symm
./xd3blastst -R 4 syrk trsm symm gemm
but NOT like this:
./xd3blastst -R 2 syr2k all
-Side <nsides> L/R
Specifies the number of Side parameters you would like
to test for the appropriate routines. If a routine does
not take the side parameter, then the argument is ignored.
You can specify the argument like this:
./xd3blastst -R symm -Side 1 L
./xd3blastst -R symm -Side 2 L R
./xd3blastst -R symm -Side 3 R R L
The <nsides> argument is not optional; it must be present.
-Uplo <nuplo> L/U
Specifies the number of Uplo parameters you would like to
test. Its use follows the same behavior as -Side, like this:
./xd3blastst -R 2 syrk syr2k -Uplo 1 U
./xd3blastst -R 2 syrk syr2k -Uplo 2 U L
./xd3blastst -R 2 syrk syr2k -Uplo 4 U U U U
-Diag <ndiag> N/U
Specifies the number of Diag parameters you would like to
test. Its use follows the same behavior as -Side, like this:
./xd3blastst -R trmm -Diag 1 N
./xd3blastst -R trmm -Diag 2 U N
./xd3blastst -R trmm -Diag 4 U N U U
-Btrans <ntrans> N/T/C
Specifies the number of Btrans parameters you would like to
test (only used with gemm). Its use follows the same
behavior as -Side, like this:
./xd3blastst -R gemm -Btrans 1 N
./xd3blastst -R gemm -Btrans 2 T N
./xd3blastst -R gemm -Btrans 4 T N T T
-Atrans <ntrans> N/T/C
Specifies the number of Atrans parameters you would like to
test. Its use follows the same behavior as -Side, like this:
./xd3blastst -R gemm -Atrans 1 N
./xd3blastst -R gemm -Atrans 2 T N
./xd3blastst -R gemm -Atrans 4 T N N T
Also, use -Atrans for routines which only take one TRANS
argument:
./xd3blastst -R trmm -Atrans 2 T N
-M <m1> <mN> <mInc>
-N <n1> <nN> <nInc>
-K <k1> <kN> <kInc>
Specifies the combination of problem sizes to run.
To specify square problem sizes, use -N:
./xd3blastst -R gemm -N 1 10 1
will time all square matrices from dimension 1 to 10.
./xd3blastst -R gemm -M 10 100 10 -N 10 100 10 -K 10 100 10
will time every combination of M, N, and K between
10 and 100 in steps of 10.
-m <m>
-n <n>
-k <k>
Fixes the dimension in question to one value:
./xd3blastst -R gemm -K 1 100 1 -m 100 -n 100
-a <nalphas> <alpha1> ... <alphan>
-b <nbetas> <beta1> ... <betan>
Specifies the number and the value of alphas/betas to try.
./xd3blastst -R gemm -a 4 -1.0 0.0 1.0 2.0 -b 1 0.0
For the complex precision programs, you must specify both
the real and imaginary parts for alpha and beta.
./xz3blastst -R gemm -a 2 -1.0 0.0 1.0 0.0 -b 1 0.0 0.0
For those complex routines that take a real scalar
alpha/beta instead of a complex scalar alpha/beta, the
imaginary part must still be specified, but is
ignored.
./xz3blastst -R her2k -a 1 2.0 3.0
will time her2k with alpha=2.0.
-Test 0/1
Specifies whether or not to test the results of each run.
A brief explanation of testing is provided below.
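To get a feel for how many cases a given set of size flags produces, the
following sketch enumerates them. It assumes that -M, -N and -K ranges are
combined as a full cross product, that -N alone means square problems, and
that -m/-n/-k pin one dimension to a single value, as the descriptions above
suggest. The helper names are hypothetical, and the order in which the real
programs visit the cases is not modeled.

```python
from itertools import product


def size_range(first, last, inc):
    """Sizes first, first+inc, ... up to and including last."""
    return list(range(first, last + 1, inc))


def square_sizes(first, last, inc):
    """Model of '-N first last inc' alone: M = N = K for every size."""
    return [(n, n, n) for n in size_range(first, last, inc)]


def all_sizes(m_sizes, n_sizes, k_sizes):
    """Model of separate -M, -N, -K ranges: every (M, N, K) combination."""
    return list(product(m_sizes, n_sizes, k_sizes))


# -N 1 10 1
print(len(square_sizes(1, 10, 1)))                # 10 cases
# -M 10 100 10 -N 10 100 10 -K 10 100 10
r = size_range(10, 100, 10)
print(len(all_sizes(r, r, r)))                    # 1000 cases
# -K 1 100 1 -m 100 -n 100
print(len(all_sizes([100], [100], size_range(1, 100, 1))))  # 100 cases
```

Each case is run once per combination of the other flags (alphas, betas,
transpose settings), so large ranges multiply quickly.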
6.1 TESTING IMPLEMENTATION
The LEVEL 3 TESTER/TIMER programs were created to make performance
analysis easier, not as a validation tool, thus the testing
implementation is modest. For a complete test of ATLAS's LEVEL 3
BLAS implementation, run the CBLAS TESTER described in section 2.
For all routines, except _TRSM, we compute:
    x = ||C-D|| / ( ||A|| * ||B|| * |alpha| * eps * max(M,N,K) )
where A, B, and alpha are arguments to the routine, C is the result
matrix from the call to a trusted BLAS library, D is the result matrix from
the call to ATLAS, eps is the epsilon value for the machine, and
max(M,N,K) is the largest value of M, N, K which describe the
dimensions for the argument and result matrices to the routine. The
operation ||.|| is the column norm of its matrix argument, and x is
expected to be O(1) or smaller.
For _TRSM, we compute:
    x = ||B-A*X|| / ( ||A|| * ||X|| * |alpha| * eps * max(M,N) )
where A, B, and alpha are arguments to the routine, X is the result
matrix from the ATLAS _TRSM call, and max(M,N) is the larger
value of M and N.
The data for the argument matrices are generated internally, using the
ANSI C rand() function, and are distributed over the interval (-.5,+.5).
In any case, if x > 1 then an error will be output:
DGEMM
TEST TA TB M N K alpha beta Time Mflop SpUp PASS
==== == == === === === ===== ===== ====== ===== ==== ====
1 N N 100 100 100 1.0 0.0 0.01 259.7 1.00 ---
ERROR: ferr is 4860974538.606986
1 N N 100 100 100 1.0 0.0 0.01 227.9 0.88 NO
2 N N 200 200 200 1.0 0.0 0.05 291.6 1.00 ---
ERROR: ferr is 8411267408.031064
2 N N 200 200 200 1.0 0.0 0.06 274.5 0.94 NO
3 N N 300 300 300 1.0 0.0 0.17 327.2 1.00 ---
ERROR: ferr is 2895940442.476244
3 N N 300 300 300 1.0 0.0 0.20 272.5 0.83 NO
Ferr is the value of x.
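The non-_TRSM check can be sketched in a few lines. This is an illustration
only, not the code in l3blastst.c: it assumes "column norm" means the largest
absolute column sum (the matrix 1-norm), uses beta=0, and uses a plain
triple loop in place of the trusted BLAS. All function names are
hypothetical.

```python
import random
import sys


def column_norm(a):
    """Largest absolute column sum (matrix 1-norm) of a list-of-rows matrix."""
    return max(sum(abs(row[j]) for row in a) for j in range(len(a[0])))


def matmul(a, b, alpha=1.0):
    """Plain alpha*A*B, standing in for a call to a trusted _GEMM."""
    k = len(b)
    return [[alpha * sum(row[p] * b[p][j] for p in range(k))
             for j in range(len(b[0]))] for row in a]


def gemm_residual(a, b, c, d, alpha, eps=sys.float_info.epsilon):
    """Scaled difference x between trusted result C and tested result D."""
    m, k, n = len(a), len(b), len(b[0])
    diff = [[cij - dij for cij, dij in zip(crow, drow)]
            for crow, drow in zip(c, d)]
    scale = column_norm(a) * column_norm(b) * abs(alpha) * eps * max(m, n, k)
    return column_norm(diff) / scale


def rand_matrix(rows, cols):
    """Entries drawn from [-0.5, 0.5), similar to the generator described above."""
    return [[random.random() - 0.5 for _ in range(cols)] for _ in range(rows)]


random.seed(0)
a, b = rand_matrix(8, 8), rand_matrix(8, 8)
c = matmul(a, b)                 # "trusted" result
d = [row[:] for row in c]        # "tested" result, identical so far
print(gemm_residual(a, b, c, d, 1.0))        # 0.0, so the test passes
d[0][0] += 1.0e-3                # corrupt a single entry
print(gemm_residual(a, b, c, d, 1.0) > 1.0)  # True, so the test fails
```

Because the denominator is scaled by eps, even a small absolute error in one
entry yields a very large x, which is why failing runs report values such as
those shown above.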
What can we infer from the error? Not much. If the two result
matrices are 'roughly the same', then no error is
produced. Otherwise, the result matrices are 'not roughly the same'.
However, if you see this error message it's best to test both
libraries (if ATLAS doesn't fail, test your ``trusted'' BLAS)
with the BLAS testers from netlib:
www.netlib.org/blas/sblat3
www.netlib.org/blas/dblat3
www.netlib.org/blas/cblat3
www.netlib.org/blas/zblat3
7. Timing the Level 2 BLAS
The level 2 timer/tester is very similar in action to the level 3 timer.
To build it, in BLDdir/bin/, type:
make xsl2blastst
make xdl2blastst
make xcl2blastst
make xzl2blastst
The flags are very similar to those accepted by the level 3 BLAS timer.
For usage help, type
./xdl2blastst -h
8. Timing the Level 1 BLAS
The level 1 timer/tester is very similar in action to the level 2 timer.
To build it, in BLDdir/bin/, type:
make xsl1blastst
make xdl1blastst
make xcl1blastst
make xzl1blastst
The flags are very similar to those accepted by the level 2 BLAS timer.
For usage help, type
./xdl1blastst -h
9. Timing the factorization/solves
You can time and test the factor and solve of square linear systems by:
make xsslvtst
make xdslvtst
make xcslvtst
make xzslvtst
You can vary the type of factor timed by setting the -U flag:
l/u : perform Cholesky factor of Lower or Upper positive def matrix
g : perform a LU factor and solve
q : perform a QR factor and solve
You can test all factors at once using this flag, i.e.
-U 4 l u g q
If you want to test/time non-square cases, then you will need to use
the individual factorization testers described next.
10. Timing ATLAS LU, QR and Cholesky
The factor timers may be built in ATLAS/bin/<arch> by:
make xslutst
make xdlutst
make xclutst
make xzlutst
make xsqrtst
make xdqrtst
make xcqrtst
make xzqrtst
make xsllttst
make xdllttst
make xcllttst
make xzllttst
These timers time ATLAS's LU and Cholesky. If you wish to time LAPACK or
some other library's LU and Cholesky for comparison purposes, set your
Make.inc macro FLAPACKlib to point to the appropriate library, and then
make xslutstF
make xdlutstF
make xclutstF
make xzlutstF
make xsllttstF
make xdllttstF
make xcllttstF
make xzllttstF
Both LU and Cholesky testers will run default cases between 100 and 1000
if no arguments are supplied. Both will supply terse usage information
if the -h flag is thrown. These testers are similar to the level 3 tester
in the flags they accept (i.e., -m, -M, -n -N, etc. all work the same). In
addition, the user may pass:
-O <norders> <order1>...<orderN> :
Whether Row-Major or Column-major storage LU/LLt is to be tested
(i.e., R and C are the only legal values for orderX). Note that
non-ATLAS implementations (such as provided by x<pre>lutstF) can only
test Column-major arrays (the default).
-T <thresh> :
Supply a floating point threshold the residual must pass. If set to
negative, no testing is done (saving time and space). If set to zero,
all tests will be flagged as failed.
The QR tester is similar.
11. More detailed LAPACK timings with latime.
The factor/solve timers above are relatively crude. We have a general
LAPACK timer which is more sophisticated, but does only timing and no
testing. The most important targets are:
x<pre>[s,t]latime
x<pre>[s,t]latime_al_ab
x<pre>[s,t]latime_sl_sb
x<pre>[s,t]latime_fl_fb
Where:
<pre> : selects type/precision: s/d/c/z
[s,t]: s: serial, t: threaded
_al : ATLAS's lapack
_sl : The system LAPACK
_fl : F77 reference LAPACK
_ab : ATLAS's BLAS
_sb : The system BLAS
_fb : F77 reference BLAS
There are many more variants, as you can find in BLDdir/bin/Makefile.
12. Other timers/testers, including threading.
ATLAS provides other timer/testers. In particular, note that the timers
in the bin directory have versions to test the threaded interface. To
build these, one simply adds the "_pt" suffix to the timer/tester name
(eg., "make xdlutst_pt" rather than "make xdlutst"). Many of these
timers also have a "_dyn" suffix, which allows you to test against
the dynamically-linked ATLAS libs, assuming you have built them.
In addition to the lu and llt tests mentioned above, we also have
an inversion tester ("make xdinvtst"), a U*U' tester ("make xduumtst"),
and a solver tester ("make xdslvtst"). These work similarly to the
LU and LLt testers covered above. The solve tester allows for testing
LU, Cholesky, and for some cases, QR solves.