# libsmm_acc

`libsmm_acc` is a library for small matrix-matrix multiplication on a GPU accelerator. Stacks of matrix-matrix multiplication indices are passed from DBCSR to `libsmm_acc`, which performs the multiplications on the GPU.
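To make the notion of a "stack" concrete, here is a minimal pure-Python sketch of stack processing. The function names (`run_stack`, `matmul_accumulate`) and data layout are invented for illustration only; the real library performs this work in C++/CUDA/HIP kernels on the GPU.

```python
# Illustration only: a host-side model of processing a stack of small
# matrix-matrix multiplications. The names and data layout here are
# invented; libsmm_acc runs the equivalent work in GPU kernels.

def matmul_accumulate(A, B, C):
    """C += A * B for small dense matrices stored as lists of rows."""
    m, k, n = len(A), len(B), len(B[0])
    for i in range(m):
        for j in range(n):
            C[i][j] += sum(A[i][p] * B[p][j] for p in range(k))

def run_stack(stack, a_blocks, b_blocks, c_blocks):
    """Each stack entry is a triple of block indices (ia, ib, ic):
    multiply block ia of A by block ib of B and accumulate into block
    ic of C. All entries of one stack share the same (m, n, k)."""
    for ia, ib, ic in stack:
        matmul_accumulate(a_blocks[ia], b_blocks[ib], c_blocks[ic])

# Example: one stack entry, C starts at zero, so C ends up equal to A * B.
A = [[1, 2], [3, 4]]
B = [[1, 0], [0, 1]]          # identity, for an easy check
C = [[0, 0], [0, 0]]
run_stack([(0, 0, 0)], [A], [B], [C])
# C == [[1, 2], [3, 4]]
```

Note that results are accumulated (`C +=`), so several stack entries may target the same result block.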
For a description of the library (some details are outdated, but it nevertheless provides a very good introduction), see Chapter 8.4 of:

> Walker, R. C., & Goetz, A. W. (2016). *Electronic structure calculations on graphics processing units: from quantum chemistry to condensed matter physics.*
`libsmm_acc` is compiled from within DBCSR; there is no separate compilation.
## Directory organization

- `kernels/`: GPU kernels (CUDA- and HIP-compatible) for matrix-matrix multiplication, and the Python interface to the autotuning and predictive code.
- `notebooks/`: Jupyter notebooks for exploring data generated from autotuning and prediction.
- `generate_*.py`: utility scripts for `libsmm_acc` compilation.
- `libsmm_acc*`: `libsmm_acc` C++ and CUDA / HIP code.
- `parameters/`: contains `parameters_GPU.json` files. These are sets of matrix-matrix multiplication parameters for different (m, n, k)-triplets optimized for a given GPU card. You can explore these parameters interactively using the provided Jupyter notebooks.
- `predict/`: scripts for prediction of optimal parameter sets; see predictive modeling of kernel parameters.
- `tune/`: scripts for autotuning of optimal parameter sets; see autotuning of kernel parameters.

## Kernels and parameters

For a given matrix-matrix multiplication triplet characterized by dimensions (m, n, k),
`libsmm_acc` can run 5 different matrix-matrix multiplication kernels:

- `tiny`
- `small`
- `medium`
- `largeDB1` ("large double-buffering 1")
- `largeDB2` ("large double-buffering 2")

which take between 3 and 7 parameters (see figure at the top):
- `grouping`: how many stack entries are grouped together into one thread block (the bigger `grouping` is, the fewer blocks are launched)
- `tile_m`, `tile_n`: `tile_m` * `tile_n` = dimensions of the result block `T`
- `w`: input slab width (width of the slabs `P_A` and `P_B`)
- `v`: output slab width (width of the slab `P_C`)

The performance of the matrix-matrix multiplication kernels is highly dependent on the choice of algorithm and parameters. For this reason, `libsmm_acc` provides lists of optimal parameters for different GPU cards and different (m, n, k)-triplets. These sets of optimal parameters can be found either through autotuning or predictive modeling.
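The lookup itself can be pictured as a simple mapping from triplets to parameter sets. The sketch below is purely illustrative: the dictionary entries, field values, and fallback rule are invented here, and the real tuned data lives per card in `parameters/parameters_GPU.json`.

```python
# Hypothetical sketch of per-triplet parameter selection. The entries and
# the fallback are made up; real tuned values are stored per GPU card in
# parameters/parameters_GPU.json.

TUNED = {
    # (m, n, k) -> kernel parameters found by autotuning (invented values)
    (4, 4, 4):    {"algorithm": "tiny",   "tile_m": 1, "tile_n": 1},
    (13, 13, 13): {"algorithm": "medium", "tile_m": 2, "tile_n": 2},
}

DEFAULT = {"algorithm": "medium", "tile_m": 1, "tile_n": 1}  # invented fallback

def parameters_for(m, n, k):
    """Return the tuned parameter set for a triplet, or a generic
    fallback when the triplet was never tuned on this card."""
    return TUNED.get((m, n, k), DEFAULT)
```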
## Contributing to libsmm_acc

We expect users to contribute to the library by providing new optimized kernels and support for new GPUs.

### Autotuning of kernel parameters

Follow the autotuning procedure in `tune/`.

### Predictive modeling of kernel parameters

Follow the predictive modeling procedure in `predict/`.
### Adding a new kernel

1. Choose a kernel `name`.
2. Add the kernel's code (it must compile with both `nvcc` and `hip`) in the file `kernels/smm_acc_dnt_name.h`.
3. Add a Python kernel class inheriting from the base class in `kernels/smm_acc_dnt_name.py`.
4. Add the new kernel to the `kernel_algorithm` data structure in `kernels/smm_acc_predict.py`.
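As a rough sketch of steps 3 and 4 — the base-class name, attributes, and registration dictionary below are placeholders, not the repository's actual API; take the real structure from the existing kernel classes in `kernels/`:

```python
# Hypothetical sketch only: base-class name, attributes, and the
# registration dictionary are placeholders, not libsmm_acc's API.

class Kernel:
    """Stand-in base class: records the launch parameters of one kernel."""
    def __init__(self, **params):
        self.params = params

class KernelName(Kernel):
    """One class per kernel algorithm; mirrors kernels/smm_acc_dnt_name.h."""
    algorithm = "name"
    launch_parameters = ["m", "n", "k", "grouping", "tile_m", "tile_n"]

# Step 4: register the kernel so the predictive tooling can find it.
kernel_algorithm = {KernelName.algorithm: KernelName}
```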
### Adding support for a new GPU card

1. Add the GPU's compute architecture properties to `kernels/gpu_properties.json`. For more information on where to find these properties, please refer to the "info" field of `kernels/gpu_properties.json`.
2. Add the GPU to the `gpu_architectures` data structure in `kernels/smm_acc.py`.
3. Add the necessary code for setting `ARCH_NUMBER` correctly in the `CMakeLists`. Also add this GPU to the list of `SUPPORTED_CUDA_ARCHITECTURES` or `SUPPORTED_HIP_ARCHITECTURES` in the `CMakeLists`.
4. Add a minimal JSON file `parameters_GPU.json`, containing:

   ```json
   {
   }
   ```

   then add matrix-matrix multiplication parameters for this GPU using autotuning and predictive modeling.
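For instance, the starting file can be created and sanity-checked with a few lines of Python. The file name `parameters_NEWGPU.json` used here is a placeholder for the actual card name:

```python
# Sketch: write the minimal parameter file from step 4 and check that it
# parses as valid JSON. "parameters_NEWGPU.json" is a placeholder name.
import json
from pathlib import Path

def write_minimal_parameter_file(path):
    """Create an empty parameter set, the starting point before
    autotuning and predictive modeling fill in real (m, n, k) entries."""
    Path(path).write_text("{\n}\n")
    return json.loads(Path(path).read_text())   # an empty dict
```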