# libsmm_acc

`libsmm_acc` is a library for small matrix-matrix multiplication on a GPU accelerator. Stacks of matrix-matrix multiplication indices are passed from DBCSR to `libsmm_acc`, which performs the multiplications on the GPU.
For a description of the library (some details are outdated, but it nevertheless provides a very good introduction), see Chapter 8.4 of:

> WALKER, R. C., & GOETZ, A. W. (2016). Electronic structure calculations on graphics processing units: from quantum chemistry to condensed matter physics.
## Compilation

`libsmm_acc` is compiled from within DBCSR; there is no separate compilation.
## Directory Organization

- `kernels/`: GPU kernels (CUDA- and HIP-compatible) for matrix-matrix multiplication, and the python interface to the autotuning and predictive code
- `notebooks/`: jupyter notebooks for exploring data generated from autotuning and prediction
- `generate_*.py`: utility scripts for `libsmm_acc` compilation
- `libsmm_acc*`: libsmm_acc C++ and CUDA / HIP code
- `parameters/`: contains `parameters_GPU.json` files. These are sets of matrix-matrix multiplication parameters for different (m, n, k)-triplets, optimized for a given GPU card. You can explore these parameters interactively using the provided jupyter notebook
- `predict/`: scripts for prediction of optimal parameter sets, see predictive modeling of kernel parameters
- `tune/`: scripts for autotuning of optimal parameter sets, see autotuning of kernel parameters

## Matrix-matrix Multiplication Kernels and Parameters

For a given matrix-matrix multiplication triplet characterized by dimensions (m, n, k),
`libsmm_acc` can run 5 different matrix-matrix multiplication kernels:

- tiny
- small
- medium
- largeDB1 ("large double-buffering 1")
- largeDB2 ("large double-buffering 2")

which take between 3 and 7 parameters (see figure at the top):
- `threads`: number of threads per block in the execution configuration of the CUDA/HIP kernels
- `grouping`: how many stack entries are grouped together into a thread block (if `grouping` is bigger, fewer blocks are launched)
- `minblocks`: the desired minimum number of resident blocks per multiprocessor
- `tile_m`, `tile_n`: `tile_m` * `tile_n` = dimensions of the result block `T`
- `w`: input slab width (width of slabs `P_A` and `P_B`)
- `v`: output slab width (width of slab `P_C`)

The performance of the matrix-matrix multiplication kernels is highly dependent on the choice of algorithm and parameters. For this reason, `libsmm_acc` provides lists of optimal parameters for different GPU cards and different (m, n, k)-triplets. These sets of optimal parameters can be found either through autotuning or predictive modeling.
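As an illustration of how such a per-triplet parameter list might be consumed, here is a small python sketch. The flat record layout below is an assumption for illustration only, not the actual schema of the `parameters_GPU.json` files:

```python
import json

# Hypothetical layout of a parameter file: a list of records, one per
# (m, n, k)-triplet (the real parameters_GPU.json schema may differ).
records = json.loads("""
[
  {"m": 4, "n": 4, "k": 4, "algorithm": "tiny",
   "threads": 64, "grouping": 16, "minblocks": 12},
  {"m": 8, "n": 8, "k": 8, "algorithm": "small",
   "threads": 128, "grouping": 16, "minblocks": 4,
   "tile_m": 2, "tile_n": 2}
]
""")

# Index the optimal parameter sets by their (m, n, k)-triplet.
by_triplet = {(r["m"], r["n"], r["k"]): r for r in records}

params = by_triplet[(8, 8, 8)]
print(params["algorithm"], params["threads"])  # small 128
```

Keying the records by triplet makes the lookup at multiplication time a single dictionary access.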
## Contributing to libsmm_acc

### Autotuning procedure

Follow the autotuning procedure in `tune/`.

### Predictive modeling of kernel parameters

Follow the predictive modeling procedure in `predict/`.
### Adding a new kernel

1. Choose a kernel `name`.
2. Add the kernel's code (it must compile with both `nvcc` and HIP) in file `kernels/smm_acc_dnt_name.h`.
3. Add a python kernel class inheriting from the base class in `kernels/smm_acc_dnt_name.py`.
4. Add the new kernel to the `kernel_algorithm` data structure in `kernels/smm_acc_predict.py`.
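A minimal sketch of what the python side of steps 3 and 4 might look like. The base-class name `Kernel`, its constructor, and the `launch_parameters` method are hypothetical stand-ins, not the actual libsmm_acc interface:

```python
# Hypothetical sketch of a python kernel class; the real base class and
# its interface in kernels/ may differ.
class Kernel:
    """Base class: holds launch parameters common to all kernels."""
    def __init__(self, m, n, k, threads, grouping, minblocks):
        self.m, self.n, self.k = m, n, k
        self.threads = threads
        self.grouping = grouping
        self.minblocks = minblocks


class Kernel_dnt_name(Kernel):
    """New kernel 'name', taking two extra tiling parameters."""
    algorithm = "name"

    def __init__(self, m, n, k, threads, grouping, minblocks,
                 tile_m, tile_n):
        super().__init__(m, n, k, threads, grouping, minblocks)
        self.tile_m = tile_m
        self.tile_n = tile_n

    def launch_parameters(self):
        # A parameter order a generated C++ launcher might expect.
        return [self.m, self.n, self.k, self.threads,
                self.grouping, self.minblocks, self.tile_m, self.tile_n]


k = Kernel_dnt_name(13, 13, 13, threads=96, grouping=16, minblocks=1,
                    tile_m=2, tile_n=2)
print(k.launch_parameters())  # [13, 13, 13, 96, 16, 1, 2, 2]
```

Keeping the shared parameters in the base class means a new kernel only has to declare whatever extra tuning parameters its algorithm introduces.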
### Adding support for a new GPU card

1. Add the GPU's compute architecture properties to `kernels/gpu_properties.json`. For more information on where to find these properties, please refer to the "info" field of `kernels/gpu_properties.json`.
2. Add the GPU to the `gpu_architectures` data structure in `kernels/smm_acc.py`.
3. Add the necessary code for setting `ARCH_NUMBER` correctly in the `CMakeLists`. Also add this GPU to the list of `SUPPORTED_CUDA_ARCHITECTURES` or `SUPPORTED_HIP_ARCHITECTURES` in the `CMakeLists`.
4. Add a minimal JSON file `parameters_GPU.json`, containing:

   ```json
   {
   }
   ```

   then add matrix-matrix multiplication parameters for this GPU using autotuning and predictive modeling.
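The bootstrap step above might be scripted roughly as follows. The file name `parameters_NewGPU.json`, the top-level `"parameters"` key, and the tuned-record layout are assumptions for illustration, not the actual file format:

```python
import json
import os
import tempfile

# Step 4: bootstrap the minimal parameter file for the new card
# ("parameters_NewGPU.json" is a placeholder name).
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "parameters_NewGPU.json")
with open(path, "w") as f:
    json.dump({}, f)

# Autotuning / predictive modeling later produce parameter records;
# the "parameters" key and record layout are assumed for illustration.
tuned = {"m": 4, "n": 4, "k": 4, "algorithm": "tiny",
         "threads": 64, "grouping": 16, "minblocks": 12}

with open(path) as f:
    data = json.load(f)
data.setdefault("parameters", []).append(tuned)
with open(path, "w") as f:
    json.dump(data, f, indent=2)

with open(path) as f:
    print(len(json.load(f)["parameters"]))  # 1
```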