Compile the DBCSR library, using
The examples require MPI. Furthermore, if you are using threading, MPI_THREAD_FUNNELED mode is required.
You can run the examples, for instance from the
build directory, as follows:
srun -N 1 --ntasks-per-core 2 --ntasks-per-node 12 --cpus-per-task 2 ./examples/dbcsr_example_1
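When running with threading, the OpenMP thread count should normally agree with the number of CPUs given to each task. The following is a minimal sketch of one way to keep the two in sync, not the project's official launch script. It assumes a Slurm environment and the standard OpenMP variable `OMP_NUM_THREADS`; the variables `CPUS_PER_TASK` and `LAUNCH_CMD` are hypothetical names introduced here for illustration.

```shell
#!/bin/sh
# Sketch (assumption): keep the OpenMP thread count consistent with --cpus-per-task.
CPUS_PER_TASK=2   # matches --cpus-per-task in the command above

# SLURM_CPUS_PER_TASK is set by Slurm inside a job when --cpus-per-task is given;
# fall back to our own value when it is not defined.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-$CPUS_PER_TASK}"

# Same launch line as above, with the thread count taken from one place.
LAUNCH_CMD="srun -N 1 --ntasks-per-core 2 --ntasks-per-node 12 --cpus-per-task $CPUS_PER_TASK ./examples/dbcsr_example_1"

# Printed rather than executed so the sketch can be tried on any machine;
# replace the echo with: $LAUNCH_CMD
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS $LAUNCH_CMD"
```

Defining the CPU count once avoids a mismatch between the scheduler request and the number of threads the example actually starts.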
Run tensor examples
How to run (this example and DBCSR for tensors in general):
- Best performance is obtained by running with MPI and one OpenMP thread per rank.
- Ideally, the number of MPI ranks should be composed of small prime factors (e.g. powers of 2).
- For sparse data and heterogeneous block sizes, DBCSR should be run on CPUs with the LIBXSMM backend.
- For dense data, best performance is obtained by choosing homogeneous block sizes of 64 and compiling with GPU support.
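The first two recommendations can be checked before launching. The sketch below is only an illustration under stated assumptions: `largest_prime_factor`, `NRANKS`, and `EXAMPLE` are hypothetical names that are not part of DBCSR, the threshold of 5 for a "small" prime factor is an arbitrary choice made here, and `EXAMPLE` should be replaced with the path to the tensor example binary in your build.

```shell
#!/bin/sh
# Hypothetical helper (not part of DBCSR): print the largest prime factor of $1.
largest_prime_factor() {
    n=$1
    p=2
    largest=1
    while [ "$n" -gt 1 ]; do
        if [ $((n % p)) -eq 0 ]; then
            largest=$p          # p divides n, so p is a prime factor
            n=$((n / p))
        else
            p=$((p + 1))        # try the next candidate divisor
        fi
    done
    echo "$largest"
}

# One OpenMP thread per MPI rank, as recommended above.
export OMP_NUM_THREADS=1

NRANKS=16                              # assumed rank count (a power of 2)
EXAMPLE=./examples/dbcsr_example_1     # placeholder: substitute the tensor example binary

# Assumption: treat prime factors larger than 5 as "not small".
if [ "$(largest_prime_factor "$NRANKS")" -gt 5 ]; then
    echo "warning: $NRANKS ranks has a large prime factor" >&2
fi

# Printed rather than executed; remove the echo to launch under Slurm.
echo "srun -N 1 --ntasks-per-node $NRANKS --cpus-per-task 1 $EXAMPLE"
```

With `NRANKS=16` no warning is issued; a count such as 14 (largest prime factor 7) would trigger it.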