libsmm_acc is a library for small matrix-matrix multiplication on a GPU accelerator. Stacks of matrix-matrix multiplication indices are passed from DBCSR to
libsmm_acc, which performs the multiplications on the GPU.
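To make the idea of a "stack" concrete, here is a minimal CPU reference sketch in plain Python. The stack layout used here (three 0-based offsets per entry into flat, row-major buffers) and the function name `process_stack` are assumptions made for illustration; they are not taken from the libsmm_acc or DBCSR sources.

```python
# Reference sketch of processing a stack of small matrix-matrix multiplications.
# Assumed layout (illustrative only): each stack entry holds three 0-based offsets
# into flat, row-major buffers for A (m x k), B (k x n) and C (m x n).

def process_stack(stack, a_data, b_data, c_data, m, n, k):
    """Accumulate C += A * B for every (a_off, b_off, c_off) entry in the stack."""
    for a_off, b_off, c_off in stack:
        for i in range(m):
            for j in range(n):
                acc = 0.0
                for p in range(k):
                    acc += a_data[a_off + i * k + p] * b_data[b_off + p * n + j]
                c_data[c_off + i * n + j] += acc
    return c_data


m, n, k = 2, 2, 2
a_data = [1.0, 2.0, 3.0, 4.0]        # one 2x2 block of A
b_data = [1.0, 0.0, 0.0, 1.0,        # first B block: identity
          2.0, 0.0, 0.0, 2.0]        # second B block: 2 * identity
c_data = [0.0] * 4                   # one 2x2 block of C, initially zero
stack = [(0, 0, 0), (0, 4, 0)]       # both products accumulate into the same C block
result = process_stack(stack, a_data, b_data, c_data, m, n, k)
```

On a GPU, the entries of such a stack would be distributed over thread blocks rather than processed in a sequential loop.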
For a description of the library (some details are outdated, but this nevertheless provides a very good introduction), see Chapter 8.4 of:
WALKER, R. C., & GOETZ, A. W. (2016). Electronic structure calculations on graphics processing units: from quantum chemistry to condensed matter physics.
libsmm_acc is compiled from within DBCSR; there is no separate compilation step.
kernels/: GPU kernels (CUDA- and HIP-compatible) for matrix-matrix multiplication and the Python interface to the autotuning and predictive code.
notebooks/: Jupyter notebooks for exploring data generated from autotuning and prediction.
generate_*.py: utility scripts for
libsmm_acc*: libsmm_acc C++ and CUDA / HIP code
parameters_GPU.json files. These are sets of matrix-matrix multiplication parameters for different (m, n, k)-triplets optimized for a given GPU card. You can explore these parameters interactively using the provided Jupyter notebook.
predict/: scripts for prediction of optimal parameter sets, see predictive modeling of kernel parameters
tune/: scripts for autotuning of optimal parameter sets, see autotuning of kernel parameters
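As an illustration of how a parameter file could be queried by (m, n, k)-triplet, here is a small sketch. The JSON field names (`algorithm`, `perf`), the kernel names and all values are hypothetical placeholders; consult an actual parameters_GPU.json file for the real schema.

```python
import json

# Hypothetical excerpt of a parameter file; field names and values are illustrative only.
PARAMETERS_JSON = """
[
  {"m": 4, "n": 4, "k": 4, "algorithm": "kernel_a", "grouping": 16, "perf": 100.0},
  {"m": 5, "n": 5, "k": 5, "algorithm": "kernel_b", "grouping": 8, "perf": 150.0}
]
"""


def index_by_triplet(entries):
    """Build a dictionary mapping (m, n, k) to the corresponding parameter set."""
    return {(e["m"], e["n"], e["k"]): e for e in entries}


params = index_by_triplet(json.loads(PARAMETERS_JSON))
best = params.get((5, 5, 5))       # parameter set stored for this triplet
missing = params.get((7, 7, 7))    # None: no parameters stored for this triplet
```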
For a given matrix-matrix multiplication triplet characterized by dimensions (m, n, k),
libsmm_acc can run 5 different matrix-matrix multiplication kernels:
which take between 3 and 7 parameters (see figure at the top):
grouping is bigger, fewer blocks are launched)
tile_n = dimensions of the result block
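The effect of `grouping` on the launch configuration can be sketched as follows. This assumes a simple model in which each thread block processes `grouping` stack entries; the function name `blocks_launched` is hypothetical and the model is an assumption, not a statement about the actual kernels.

```python
def blocks_launched(stack_size, grouping):
    """Number of thread blocks needed if each block handles `grouping` stack entries.

    Assumed model for illustration: ceil(stack_size / grouping).
    """
    if grouping <= 0:
        raise ValueError("grouping must be a positive integer")
    return (stack_size + grouping - 1) // grouping  # integer ceiling division


few_entries_per_block = blocks_launched(1000, 4)
many_entries_per_block = blocks_launched(1000, 16)
```

Under this model, a larger `grouping` yields fewer, longer-running blocks, which is one of the trade-offs the parameter search has to balance.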
The performance of the matrix-matrix multiplication kernels is highly dependent on the choice of algorithm and parameters. For this reason,
libsmm_acc provides lists of optimal parameters for different GPU cards and different (m, n, k)-triplets. These sets of optimal parameters can be found either through autotuning or predictive modeling.
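Conceptually, autotuning can be thought of as an exhaustive search: benchmark every candidate parameter set and keep the fastest. The sketch below shows that idea with a toy benchmark function standing in for a real GPU measurement. The names `autotune` and `toy_benchmark` and the search space are hypothetical; the actual procedure lives in the tune/ scripts.

```python
import itertools


def autotune(benchmark, search_space):
    """Return (best_params, best_perf) over the Cartesian product of search_space.

    `benchmark` is any callable mapping a parameter dict to a performance
    figure where higher is better.
    """
    names = sorted(search_space)
    best_params, best_perf = None, float("-inf")
    for values in itertools.product(*(search_space[name] for name in names)):
        candidate = dict(zip(names, values))
        perf = benchmark(candidate)
        if perf > best_perf:
            best_params, best_perf = candidate, perf
    return best_params, best_perf


def toy_benchmark(p):
    """Stand-in for a real kernel timing; peaks at grouping=8, tile_n=2."""
    return -abs(p["grouping"] - 8) - abs(p["tile_n"] - 2)


space = {"grouping": [4, 8, 16], "tile_n": [1, 2, 4]}
best_params, best_perf = autotune(toy_benchmark, space)
```

Because the search space grows multiplicatively with each parameter, exhaustive search is expensive, which is presumably why predictive modeling is offered as an alternative.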
Follow the autotuning procedure
Follow the predictive modeling procedure
Choose a kernel
Add the kernel's code (must be able to compile by both
hip) in file
Add a Python kernel class inheriting from base class
Add the new kernel to the
kernel_algorithm data structure in
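The Python side of the steps above might look roughly like the sketch below. Only the name `kernel_algorithm` comes from this document; the base class `Kernel`, the subclass `KernelExample`, their attributes and the naming scheme are hypothetical and will differ from the real classes in kernels/.

```python
class Kernel:
    """Hypothetical base class holding a triplet and its tunable parameters."""

    algorithm = None          # short algorithm name, set by subclasses
    parameter_names = ()      # tunable parameters this kernel expects

    def __init__(self, m, n, k, **params):
        missing = [p for p in self.parameter_names if p not in params]
        if missing:
            raise ValueError("missing parameters: " + ", ".join(missing))
        self.m, self.n, self.k = m, n, k
        self.params = {p: params[p] for p in self.parameter_names}

    @property
    def name(self):
        """Unique kernel identifier built from the algorithm, triplet and parameters."""
        parts = [self.algorithm, f"{self.m}x{self.n}x{self.k}"]
        parts += [f"{p}{self.params[p]}" for p in self.parameter_names]
        return "_".join(parts)


class KernelExample(Kernel):
    """Hypothetical new kernel taking two parameters."""

    algorithm = "example"
    parameter_names = ("grouping", "tile_n")


# Registry mapping algorithm names to kernel classes (structure assumed).
kernel_algorithm = {"example": KernelExample}

kernel = kernel_algorithm["example"](4, 4, 4, grouping=16, tile_n=2)
```

A registry of this kind lets the tuning and prediction scripts instantiate a kernel from the algorithm name stored in a parameter file.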
Add the GPU to the
gpu_architectures data structure in
Add a minimal JSON file
then add matrix-matrix multiplication parameters for this GPU using autotuning and predictive modeling