- dbcsr_example_1 : how to create a dbcsr matrix (fortran)
- dbcsr_example_2 : how to set a dbcsr matrix (fortran)
- dbcsr_example_3: how to multiply two dbcsr matrices (in fortran: dbcsr_example_3) and in c++: dbcsr_example_3)
- dbcsr_tensor_example_1 : how to create a dbcsr matrix (fortran)
- the example can be run with different parameters, controlling block size, sparsity, verbosity and more

- dbcsr_tensor_example_2: tensor contraction example (cpp)
- tensor1 x tensor2 = tensor3, (13|2)x(54|21)=(3|45)

Compile the DBCSR library, using `-DUSE_MPI=ON -DWITH_EXAMPLES=ON`

.

The examples require MPI. Furthermore, if you are using threading, MPI_THREAD_FUNNELED mode is required.

You can run the examples, for instance from the `build`

directory, as follows:

```
srun -N 1 --ntasks-per-core 2 --ntasks-per-node 12 --cpus-per-task 2 ./examples/dbcsr_example_1
```

How to run (this example and DBCSR for tensors in general):

- best performance is obtained by running with mpi and one openmp thread per rank.
- ideally number of mpi ranks should be composed of small prime factors (e.g. powers of 2).
- for sparse data & heterogeneous block sizes, DBCSR should be run on CPUs with libxsmm backend.
- for dense data best performance is obtained by choosing homogeneous block sizes of 64 and by compiling with GPU support.