Sets up an iterator
**Contiguous pointers**

Contiguous pointers may incur reallocation penalties but enable quick passing of arrays to routines with unspecified interfaces (e.g., direct calls to BLACS or MPI).
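For illustration, a minimal sketch of why contiguous pointers matter. It assumes the companion routines dbcsr_iterator_blocks_left, dbcsr_iterator_next_block, and dbcsr_iterator_stop with their usual signatures; send_block is a hypothetical external routine with an unspecified interface:

```fortran
SUBROUTINE send_all_blocks(matrix)
   ! Sketch only: USE statements and the working-precision kind dp are
   ! assumed to be provided by the surrounding module.
   TYPE(dbcsr_type), INTENT(IN)       :: matrix
   TYPE(dbcsr_iterator)               :: iter
   INTEGER                            :: row, col
   REAL(dp), DIMENSION(:, :), POINTER :: block

   ! Contiguous, read-only block pointers can be handed straight to routines
   ! with unspecified interfaces (e.g., BLACS or MPI) without an extra copy.
   CALL dbcsr_iterator_start(iter, matrix, contiguous_pointers=.TRUE., &
                             read_only=.TRUE.)
   DO WHILE (dbcsr_iterator_blocks_left(iter))
      CALL dbcsr_iterator_next_block(iter, row, col, block)
      CALL send_block(block, SIZE(block))   ! hypothetical external routine
   END DO
   CALL dbcsr_iterator_stop(iter)
END SUBROUTINE send_all_blocks
```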
**Threading**

The TYPE(dbcsr_iterator) variable should be thread-private.
The iterator has several modes of operation when used with OpenMP. Two options can be set to influence the behavior.
**Threading: shared vs. non-shared**

The "shared" flag specifies that several threads will be iterating through the same matrix. Both modes are sketched after this list.

- Sharing is the default when called from an active parallel region. In shared mode no two threads will receive the same block; i.e., the work is split among the threads.
- If each (or one) thread needs to iterate through all blocks, then shared should be set to .FALSE. (e.g., when called from an enclosing MASTER region or when each thread has its own matrix).
- It is safe to use an iterator in non-shared mode with only one thread; no thread synchronization constructs are used in this case.
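A hedged sketch of the two modes. It assumes the companion routines dbcsr_iterator_blocks_left, dbcsr_iterator_next_block, and dbcsr_iterator_stop with their usual signatures, and that declarations such as the kind parameter dp come from the surrounding module:

```fortran
TYPE(dbcsr_iterator)               :: iter   ! thread-private, as required
INTEGER                            :: row, col
REAL(dp), DIMENSION(:, :), POINTER :: block

! Shared mode (the default inside an active parallel region): the threads
! split the work, and no block is handed to more than one thread.
!$OMP PARALLEL PRIVATE(iter, row, col, block)
CALL dbcsr_iterator_start(iter, matrix)
DO WHILE (dbcsr_iterator_blocks_left(iter))
   CALL dbcsr_iterator_next_block(iter, row, col, block)
   ! ... each block is visited by exactly one thread ...
END DO
CALL dbcsr_iterator_stop(iter)
!$OMP END PARALLEL

! Non-shared mode: a single thread (here the master) visits every block.
!$OMP MASTER
CALL dbcsr_iterator_start(iter, matrix, shared=.FALSE.)
DO WHILE (dbcsr_iterator_blocks_left(iter))
   CALL dbcsr_iterator_next_block(iter, row, col, block)
END DO
CALL dbcsr_iterator_stop(iter)
!$OMP END MASTER
```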
**Threading in shared mode**

When in shared mode there are three possibilities to select how the blocks are distributed to the threads (see the sketch after this list):

- **Thread distribution.** The default is to use the thread distribution, which statically maps rows to threads and should be used whenever retaining a consistent mapping among subsequent iterations is important.
- **Dynamic scheduling, grouped by rows.** If the dynamic flag is .TRUE., blocks are given to threads dynamically; by default the assignment is grouped by rows to minimize synchronization.
- **Fully dynamic scheduling.** If, in addition, the dynamic_byrows flag is .FALSE., every block is assigned dynamically.
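The three choices map onto the optional flags as follows (a sketch; iter and matrix as in the example above):

```fortran
! 1. Static thread distribution (default): rows are mapped to threads by the
!    matrix's thread distribution, so the mapping is stable across iterations.
CALL dbcsr_iterator_start(iter, matrix)

! 2. Dynamic scheduling grouped by rows (minimizes synchronization).
CALL dbcsr_iterator_start(iter, matrix, dynamic=.TRUE.)

! 3. Fully dynamic scheduling: every block is assigned on demand.
CALL dbcsr_iterator_start(iter, matrix, dynamic=.TRUE., dynamic_byrows=.FALSE.)
```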
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| type(dbcsr_iterator) | intent(out) | | :: | iterator | The iterator |
| type(dbcsr_type) | intent(in) | | :: | matrix | DBCSR matrix |
| logical | intent(in) | optional | :: | shared | The matrix is shared between several iterators; default is .TRUE. |
| logical | intent(in) | optional | :: | dynamic | Threads are given blocks regardless of the thread distribution; default is .FALSE. |
| logical | intent(in) | optional | :: | dynamic_byrows | Threads are given blocks regardless of the thread distribution, but still grouped by rows; default is .FALSE. |
| logical | intent(in) | optional | :: | contiguous_pointers | Whether returned pointers need to be contiguous; default is .TRUE. |
| logical | intent(in) | optional | :: | read_only | User promises not to change returned data; default is .FALSE. |
```fortran
SUBROUTINE dbcsr_iterator_start(iterator, matrix, shared, dynamic, &
                                dynamic_byrows, contiguous_pointers, read_only)
   !! Sets up an iterator
   !!
   !! Contiguous pointers
   !! Contiguous pointers may incur reallocation penalties but enable quick
   !! passing of arrays to routines with unspecified interfaces (e.g., direct
   !! calls to BLACS or MPI).
   !!
   !! Threading
   !! The TYPE(dbcsr_iterator) variable should be thread-private.
   !!
   !! The iterator has several modes of operation when used with
   !! OpenMP. Two options can be set to influence the behavior.
   !!
   !! Threading: shared vs. non-shared
   !! The "shared" flag specifies that several threads will be
   !! iterating through the same matrix.
   !! - Sharing is the default when called from an active parallel
   !!   region. In the shared mode no two threads will receive the
   !!   same block; i.e., the work is split among the threads.
   !! - If each (or one) thread needs to iterate through all blocks
   !!   then shared should be set to .FALSE. (E.g., when called
   !!   from an enclosing MASTER region or when each thread has its
   !!   own matrix.)
   !! - It is safe to use an iterator in non-shared mode with only
   !!   one thread. (No thread synchronization constructs are used
   !!   in this case.)
   !!
   !! Threading in shared mode
   !! When in shared mode there are three possibilities to select
   !! how the blocks are distributed to the threads.
   !! <DL>
   !! <DT>Thread distribution</DT>
   !! <DD>The default is to use the thread distribution. The thread
   !!     distribution statically maps rows to threads and should be
   !!     used whenever retaining a consistent mapping among
   !!     subsequent iterations is important.</DD>
   !! <DT>Dynamic scheduling</DT>
   !! <DD>If the dynamic flag is .TRUE., then blocks are given to
   !!     threads dynamically. By default the assignment is grouped
   !!     by rows (to minimize synchronization); however, if the
   !!     dynamic_byrows flag is .FALSE. then every block is
   !!     assigned dynamically.</DD></DL>
   TYPE(dbcsr_iterator), INTENT(OUT)                  :: iterator
      !! the iterator
   TYPE(dbcsr_type), INTENT(IN)                       :: matrix
      !! DBCSR matrix
   LOGICAL, INTENT(IN), OPTIONAL                      :: shared, dynamic, dynamic_byrows, &
                                                         contiguous_pointers, read_only
      !! The matrix is shared between several iterators. Default is .TRUE.
      !! Threads are given blocks regardless of the thread distribution; default is .FALSE.
      !! Threads are given blocks regardless of the thread distribution, but still grouped by rows; default is .FALSE.
      !! Whether returned pointers need to be contiguous; default is .TRUE.
      !! User promises not to change returned data; default is .FALSE.

   CHARACTER(len=*), PARAMETER :: routineN = 'dbcsr_iterator_start'

   INTEGER                                            :: error_handle
   TYPE(dbcsr_distribution_obj)                       :: dist

!   ---------------------------------------------------------------------------

   MARK_USED(dynamic) ! only used with OMP

   CALL timeset(routineN, error_handle)

   ! Resolve the sharing and scheduling modes; lines prefixed with !$ are
   ! active only when compiled with OpenMP.
   iterator%shared = .TRUE.
!$ iterator%shared = omp_in_parallel()
   IF (PRESENT(shared)) iterator%shared = shared
   iterator%dynamic = .TRUE.
!$ iterator%dynamic = .FALSE.
!$ IF (PRESENT(dynamic)) iterator%dynamic = dynamic
   IF (PRESENT(dynamic_byrows)) THEN
      iterator%dynamic_byrows = dynamic_byrows
      IF (iterator%dynamic_byrows) iterator%dynamic = .TRUE.
   ELSE
      iterator%dynamic_byrows = iterator%dynamic
!$    iterator%dynamic_byrows = iterator%dynamic
   END IF
!$ IF (.NOT. iterator%shared) THEN
!$    iterator%dynamic = .FALSE.
!$ END IF

   ! Sanity checks against the matrix's thread distribution.
   dist = dbcsr_distribution(matrix)
!$ IF (.NOT. dbcsr_distribution_has_threads(dist)) &
!$    DBCSR_WARN("Thread distribution should be defined for OpenMP.")
   IF (.NOT. iterator%dynamic .AND. .NOT. dbcsr_distribution_has_threads(dist)) &
      DBCSR_ABORT("Thread distribution must be defined for non-dynamic iterator.")
!$ IF (omp_in_parallel() .AND. omp_get_num_threads() /= dbcsr_distribution_num_threads(dist)) &
!$    CALL dbcsr_abort(__LOCATION__, &
!$                     "Number of threads has changed from "// &
!$                     stringify(dbcsr_distribution_num_threads(dist))// &
!$                     " to "//stringify(omp_get_num_threads())//"!")

   !Synchronize the positions
   NULLIFY (iterator%common_pos)
   IF (iterator%dynamic) THEN
      ! All threads point into the master thread's data space
      ! (temporarily using the common_int_pointer variable). This is
      ! not the nicest OpenMP way of doing this but it is also not
      ! explicitly forbidden.
      !
!$OMP BARRIER
!$OMP MASTER
      ALLOCATE (iterator%common_pos)
      common_int_pointer => iterator%common_pos
      common_int_pointer = 0
!$OMP FLUSH (common_int_pointer)
!$OMP END MASTER
!$OMP BARRIER
      IF (.NOT. ASSOCIATED(iterator%common_pos)) THEN
         iterator%common_pos => common_int_pointer
      END IF
!$OMP BARRIER
   END IF
   !
   ! Pointer behavior requested by the caller.
   IF (PRESENT(contiguous_pointers)) THEN
      iterator%contiguous_pointers = contiguous_pointers
   ELSE
      iterator%contiguous_pointers = .TRUE.
   END IF
   IF (PRESENT(read_only)) THEN
      iterator%read_only = read_only
   ELSE
      iterator%read_only = .FALSE.
   END IF
   ! Cache block sizes, offsets, and index structures from the matrix.
   iterator%row = 0
   iterator%pos = 0
   iterator%rbs => array_data(matrix%row_blk_size)
   iterator%cbs => array_data(matrix%col_blk_size)
   iterator%roff => array_data(matrix%row_blk_offset)
   iterator%coff => array_data(matrix%col_blk_offset)

   iterator%local_indexing = matrix%local_indexing
   !IF(iterator%local_indexing .AND. .NOT. iterator%dynamic) &
   !   DBCSR_ABORT("Locally-indexed matrices can only have a dynamic iterator.")
   IF (iterator%local_indexing .AND. .NOT. array_exists(matrix%local_rows)) &
      CALL dbcsr_abort(__LOCATION__, &
                       "Local rows mapping array should exist when local indexing is used.")
   IF (iterator%local_indexing .AND. .NOT. array_exists(matrix%global_rows)) &
      CALL dbcsr_abort(__LOCATION__, &
                       "Global rows mapping array should exist when local indexing is used.")
   iterator%global_rows => array_data(matrix%global_rows)
   iterator%local_rows => array_data(matrix%local_rows)

   iterator%transpose = .FALSE. !matrix%transpose
   iterator%nblks = matrix%nblks
   IF (iterator%transpose) THEN
      iterator%nblkrows_total = matrix%nblkcols_total
   ELSE
      iterator%nblkrows_total = matrix%nblkrows_total
   END IF

   iterator%row_p => matrix%row_p
   iterator%col_i => matrix%col_i
   iterator%blk_p => matrix%blk_p
!$OMP CRITICAL (crit_data)
   iterator%data_area = matrix%data_area
   CALL dbcsr_data_hold(iterator%data_area)
!$OMP END CRITICAL (crit_data)
   iterator%row_size = 0

   IF (.NOT. iterator%dynamic) THEN
      iterator%tdist => array_data(dbcsr_distribution_thread_dist(dist))
   ELSE
      NULLIFY (iterator%tdist)
   END IF
   ! Position each thread at its first block.
!$ IF (iterator%dynamic) THEN
!$OMP SINGLE
!$    IF (iterator%dynamic_byrows) THEN
!$       iterator%common_pos = omp_get_num_threads()
!$    END IF
!$OMP END SINGLE
!$    CALL dbcsr_iterator_seek(iterator, omp_get_thread_num() + 1)
!$ ELSE
   CALL dbcsr_iterator_seek(iterator, 1)
!$ END IF
   CALL timestop(error_handle)
END SUBROUTINE dbcsr_iterator_start
```