Release Notes for QUDA v0.2.2                         16 February 2009
-----------------------------

Overview:

QUDA is a library for performing calculations in lattice QCD on
graphics processing units (GPUs) using NVIDIA's "C for CUDA" API.
This release includes optimized kernels for applying the Wilson Dirac
operator and clover-improved Wilson Dirac operator, kernels for
performing various BLAS-like operations, and full inverters built on
these kernels.  Mixed-precision implementations of both CG and
BiCGstab are provided, with support for double, single, and half
(16-bit fixed-point) precision.
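
To illustrate what 16-bit fixed-point storage means in practice, here
is a minimal C sketch.  It is not QUDA's kernel code.  It assumes each
value is bounded in magnitude by a known positive scale, and the names
to_fixed16, from_fixed16, and FIXED_MAX are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define FIXED_MAX 32767  /* largest magnitude stored in 16 bits */

/* Quantize x, assumed to lie in [-scale, scale], to a 16-bit integer. */
int16_t to_fixed16(float x, float scale)
{
    assert(scale > 0.0f);
    float r = x / scale * FIXED_MAX;
    /* Clamp in case x falls slightly outside the assumed range. */
    if (r > FIXED_MAX) r = FIXED_MAX;
    if (r < -FIXED_MAX) r = -FIXED_MAX;
    /* Round to nearest; the cast truncates toward zero. */
    return (int16_t)(r >= 0.0f ? r + 0.5f : r - 0.5f);
}

/* Recover an approximation of the original value. */
float from_fixed16(int16_t q, float scale)
{
    return (float)q / FIXED_MAX * scale;
}
```

With this scheme the round-trip error is at most about scale/65534,
which is why lower precision is normally paired with a
higher-precision correction step in a mixed-precision solver.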


Software compatibility:

The library has been tested under Linux (CentOS 5.3 and Ubuntu 8.04)
using release 2.3 of the CUDA toolkit.  There are known issues with
releases 2.1 and 2.2, but 2.0 should work if one is forced to use an
older version (for compatibility with an old driver, for example).

Under Mac OS X, the library fails to compile due to bugs in CUDA 2.3.
It might work with CUDA 2.2 or 2.0, but this hasn't been tested.


Hardware compatibility:

For a list of supported devices, see

http://www.nvidia.com/object/cuda_learn_products.html

Before building the library, you should determine the "compute
capability" of your card, either from NVIDIA's documentation or by
running the deviceQuery example in the CUDA SDK, and set GPU_ARCH in
make.inc appropriately.  Setting 'GPU_ARCH = sm_13' will enable double
precision support.
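
As a rough illustration of how a compute capability maps onto the
GPU_ARCH setting, the C sketch below formats a (major, minor) pair as
an "sm_XY" string and checks it against the 1.3 threshold for double
precision.  The helper names are hypothetical and not part of QUDA;
the major and minor numbers are assumed to be read off the deviceQuery
output.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not part of QUDA): format a compute capability
   such as 1.3 as the string "sm_13".  Returns 0 on success, or -1 if
   the inputs are out of range or the buffer is too small. */
int gpu_arch_string(int major, int minor, char *buf, size_t len)
{
    assert(buf != NULL);
    if (major < 1 || minor < 0 || minor > 9)
        return -1;
    int n = snprintf(buf, len, "sm_%d%d", major, minor);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Double precision needs compute capability 1.3 or higher. */
int supports_double(int major, int minor)
{
    return major > 1 || (major == 1 && minor >= 3);
}
```

For a card reporting capability 1.3, gpu_arch_string fills the buffer
with the value to assign to GPU_ARCH in make.inc.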


Installation:

In the source directory, copy 'make.inc.example' to 'make.inc', and
edit the first few lines to specify the CUDA install path, the
platform (x86 or x86_64), and the GPU architecture (see "Hardware
compatibility" above).  Then type 'make' to build the library.

As an optional step, 'make tune' will invoke tests/blas_test to
perform autotuning of the various BLAS-like functions needed by the
inverters.  This involves testing many combinations of parameters
(corresponding to different numbers of CUDA threads per block and
blocks per grid for each kernel) and writing the optimal values to
lib/blas_param.h.  The new values will take effect the next time the
library is built.  Ideally, the autotuning should be performed on the
machine where the library is to be used, since the optimal parameters
will depend on the CUDA device and host hardware.
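
The search described above can be pictured with the following C
sketch.  It is an illustration only, not the code in tests/blas_test:
the type launch_params, the function tune, and the cost callback are
hypothetical, and toy_cost stands in for timing a real kernel launch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical launch configuration for one kernel. */
typedef struct {
    int threads;  /* CUDA threads per block */
    int blocks;   /* blocks per grid */
} launch_params;

/* Callback returning the measured cost of one configuration. */
typedef double (*cost_fn)(int threads, int blocks);

/* Try every (threads, blocks) pair and keep the cheapest one. */
launch_params tune(const int *threads, size_t nt,
                   const int *blocks, size_t nb, cost_fn cost)
{
    assert(nt > 0 && nb > 0);
    launch_params best = { threads[0], blocks[0] };
    double best_cost = cost(best.threads, best.blocks);

    for (size_t i = 0; i < nt; i++) {
        for (size_t j = 0; j < nb; j++) {
            double c = cost(threads[i], blocks[j]);
            if (c < best_cost) {
                best_cost = c;
                best.threads = threads[i];
                best.blocks = blocks[j];
            }
        }
    }
    return best;
}

/* Stand-in cost with its minimum at 128 threads and 64 blocks; a real
   tuner would time the kernel on the target device instead. */
double toy_cost(int threads, int blocks)
{
    double dt = threads - 128, db = blocks - 64;
    return dt * dt + db * db;
}
```

Because the measured cost depends on the device and host, the winning
pair found on one machine need not be the best on another.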

In summary, for an optimized install, run

    make && make tune && make

By default, the autotuning is performed using CUDA device 0.  To
select a different device number, set DEVICE in make.inc
appropriately.


Using the library:

Include the header file include/quda.h in your application, link
against lib/libquda.a, and study tests/invert_test.c for an example of
the interface.  The various inverter options are enumerated in
include/enum_quda.h.


Known issues:

* When building for the 'sm_13' GPU architecture (which enables double
  precision support), one of the stages in the build process requires
  over 5 GB of memory.  If too little memory is available, the
  compilation will either take a very long time (given enough swap
  space) or fail completely.  In addition, the CUDA C compiler
  requires over 1 GB of disk space in /tmp for the creation of
  temporary files.

* For compatibility with CUDA, on 32-bit platforms the library is
  compiled with the GCC option -malign-double.  This differs from the
  GCC default and may affect the alignment of various structures,
  notably those of type QudaGaugeParam and QudaInvertParam, defined in
  quda.h.  Therefore, any code to be linked against QUDA should also
  be compiled with this option.
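
The effect of -malign-double on structure layout can be seen with a
small C sketch.  The struct below is hypothetical and much simpler
than QudaGaugeParam or QudaInvertParam; it only shows how the offset
of a double member shifts with the alignment rule.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct (not a QUDA type): an int followed by a double. */
struct example_param {
    int flag;
    double value;
};

/* Offset of the double member.  It is 8 when doubles are 8-byte
   aligned (x86_64, or 32-bit x86 with -malign-double) and 4 under the
   32-bit x86 GCC default. */
size_t value_offset(void)
{
    return offsetof(struct example_param, value);
}

/* Returns 1 if this translation unit lays out doubles on 8-byte
   boundaries, 0 otherwise. */
int doubles_are_8_byte_aligned(void)
{
    return value_offset() == 8;
}

/* Guard an application could call once at startup. */
void require_aligned_doubles(void)
{
    assert(doubles_are_8_byte_aligned());
}
```

If an application and the library disagree on this layout, a
structure passed between them would have its members read at the
wrong offsets, which is why the option must match on both sides.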


Contact information:

For help or to report a bug, please contact Mike Clark
(mikec@seas.harvard.edu) or Ron Babich (rbabich@bu.edu).

If you find this code useful in your work, please cite:

M. A. Clark, R. Babich, K. Barros, R. Brower, and C. Rebbi, "Solving
Lattice QCD systems of equations using mixed precision solvers on
GPUs" (2009), arXiv:0911.3191 [hep-lat].

Please also drop us a note so that we may inform you of updates and
bug-fixes.  The most recent public release will always be available
online at http://lattice.bu.edu/quda/