Sets up an iterator
**Contiguous pointers**

Contiguous pointers may incur reallocation penalties but enable quick passing of arrays to routines with unspecified interfaces (i.e., direct calls to BLACS or MPI).

**Threading**

The TYPE(dbcsr_iterator) variable should be thread-private.

The iterator has several modes of operation when used with OpenMP. Two options can be set to influence the behavior.

**Threading: shared vs. non-shared**

The "shared" flag specifies that several threads will be iterating through the same matrix.

- Sharing is the default when called from an active parallel region. In shared mode no two threads will receive the same block; i.e., the work is split among the threads.
- If each (or one) thread needs to iterate through all blocks, then shared should be set to .FALSE. (e.g., when called from an enclosing MASTER region or when each thread has its own matrix).
- It is safe to use an iterator in non-shared mode with only one thread; no thread synchronization constructs are used in this case.

**Threading in shared mode**

When in shared mode there are three possibilities to select how the blocks are distributed to the threads:

- *Thread distribution*: the default is to use the thread distribution, which statically maps rows to threads and should be used whenever retaining a consistent mapping among subsequent iterations is important.
- *Dynamic scheduling*: if the dynamic flag is .TRUE., blocks are given to threads dynamically. By default the assignment is grouped by rows (to minimize synchronization); if the dynamic_byrows flag is .FALSE., every block is assigned dynamically (see the sketch below).
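The following is a minimal sketch of the default shared mode from inside an active parallel region. It assumes the companion iterator routines dbcsr_iterator_blocks_left, dbcsr_iterator_next_block (the index-only variant) and dbcsr_iterator_stop, omits the USE statements for the relevant DBCSR modules, and uses the illustrative name visit_all_blocks:

```fortran
! Minimal sketch of shared-mode iteration; USE statements for the
! DBCSR modules are omitted, and the loop body is illustrative only.
SUBROUTINE visit_all_blocks(matrix)
   TYPE(dbcsr_type), INTENT(IN) :: matrix
   TYPE(dbcsr_iterator)         :: iter  ! thread-private, as required
   INTEGER                      :: row, col, blk
!$OMP PARALLEL DEFAULT(NONE) SHARED(matrix) PRIVATE(iter, row, col, blk)
   ! Inside an active parallel region "shared" defaults to .TRUE.,
   ! so the blocks are split among the threads.
   CALL dbcsr_iterator_start(iter, matrix, read_only=.TRUE.)
   DO WHILE (dbcsr_iterator_blocks_left(iter))
      CALL dbcsr_iterator_next_block(iter, row, col, blk)
      ! ... work on block (row, col) ...
   END DO
   CALL dbcsr_iterator_stop(iter)
!$OMP END PARALLEL
END SUBROUTINE visit_all_blocks
```

Because the iterator is declared PRIVATE, each thread carries its own iteration state while the block assignment is coordinated through the shared matrix.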
| Type | Intent | Optional | Attributes | Name | Description |
|---|---|---|---|---|---|
| type(dbcsr_iterator) | intent(out) | | | iterator | The iterator. |
| type(dbcsr_type) | intent(in) | | | matrix | DBCSR matrix. |
| logical | intent(in) | optional | | shared | The matrix is shared between several iterators; default is .TRUE. |
| logical | intent(in) | optional | | dynamic | Threads are given blocks regardless of the thread distribution; default is .FALSE. |
| logical | intent(in) | optional | | dynamic_byrows | Threads are given blocks regardless of the thread distribution, but still grouped by rows; default is .FALSE. |
| logical | intent(in) | optional | | contiguous_pointers | Whether returned pointers need to be contiguous; default is .FALSE. |
| logical | intent(in) | optional | | read_only | User promises not to change returned data; default is .FALSE. |
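The calls below are illustrative (not taken from this file) and show how the optional flags select among the modes in the table; iter, matrix and the thread-local my_matrix are assumed declarations:

```fortran
! Illustrative calls; "iter", "matrix" and the thread-local "my_matrix"
! are assumed to be declared with the types from the table above.
! Static thread distribution (the default in shared mode):
CALL dbcsr_iterator_start(iter, matrix)
! Dynamic scheduling, assignment grouped by rows:
CALL dbcsr_iterator_start(iter, matrix, dynamic=.TRUE.)
! Fully dynamic scheduling, block by block:
CALL dbcsr_iterator_start(iter, matrix, dynamic=.TRUE., dynamic_byrows=.FALSE.)
! Non-shared mode: every thread traverses all blocks of its own matrix:
CALL dbcsr_iterator_start(iter, my_matrix, shared=.FALSE.)
```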
SUBROUTINE dbcsr_iterator_start(iterator, matrix, shared, dynamic, &
dynamic_byrows, contiguous_pointers, read_only)
!! Sets up an iterator
!!
!! Contiguous pointers
!! Contiguous pointers may incur reallocation penalties but enable quick
!! passing of arrays to routines with unspecified interfaces (i.e., direct
!! calls to BLACS or MPI).
!!
!! Threading
!! The TYPE(dbcsr_iterator) variable should be thread-private.
!!
!! The iterator has several modes of operation when used with
!! OpenMP. Two options can be set to influence the behavior.
!!
!! Threading: shared vs. non-shared
!! The "shared" flag specifies that several threads will be
!! iterating through the same matrix.
!! - Sharing is the default when called from an active parallel
!! region. In the shared mode no two threads will receive the
!! same block; i.e., the work is split among the threads.
!! - If each (or one) thread needs to iterate through all blocks
!! then shared should be set to .FALSE.. (E.g., when called
!! from an enclosing MASTER region or when each thread has its
!! own matrix.)
!! - It is safe to use an iterator in non-shared mode with only
!! one thread. No thread synchronization constructs are used
!! in this case.
!!
!! Threading in shared mode
!! When in shared mode there are three possibilities to select
!! how the blocks are distributed to the threads.
!! <DL>
!! <DT>Thread distribution</DT>
!! <DD>The default is to use the thread distribution. The thread
!! distribution statically maps rows to threads and should be
!! used whenever retaining a consistent mapping among
!! subsequent iterations is important.</DD>
!! <DT>Dynamic scheduling</DT>
!! <DD>If the dynamic flag is .TRUE., then blocks are given to
!! threads dynamically. By default the assignment is grouped
!! by rows (to minimize synchronization); however, if the
!! dynamic_byrows flag is .FALSE. then every block is
!! assigned dynamically.</DD></DL>
TYPE(dbcsr_iterator), INTENT(OUT) :: iterator
!! the iterator
TYPE(dbcsr_type), INTENT(IN) :: matrix
!! DBCSR matrix
LOGICAL, INTENT(IN), OPTIONAL :: shared, dynamic, dynamic_byrows, &
contiguous_pointers, read_only
!! The matrix is shared between several iterators. Default is .TRUE.
!! Threads are given blocks regardless of the thread distribution; default is .FALSE.
!! Threads are given blocks regardless of the thread distribution, but still grouped by rows; default is .FALSE.
!! Whether returned pointers need to be contiguous; default is .FALSE.
!! User promises not to change returned data; default is .FALSE.
CHARACTER(len=*), PARAMETER :: routineN = 'dbcsr_iterator_start'
INTEGER :: error_handle
TYPE(dbcsr_distribution_obj) :: dist
! ---------------------------------------------------------------------------
MARK_USED(dynamic) ! only used with OMP
CALL timeset(routineN, error_handle)
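! Resolve the threading flags. With OpenMP, sharing defaults to
! omp_in_parallel() and dynamic scheduling is off unless requested;
! without OpenMP the defaults are shared and dynamic.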
iterator%shared = .TRUE.
!$ iterator%shared = omp_in_parallel()
IF (PRESENT(shared)) iterator%shared = shared
iterator%dynamic = .TRUE.
!$ iterator%dynamic = .FALSE.
!$ IF (PRESENT(dynamic)) iterator%dynamic = dynamic
IF (PRESENT(dynamic_byrows)) THEN
iterator%dynamic_byrows = dynamic_byrows
IF (iterator%dynamic_byrows) iterator%dynamic = .TRUE.
ELSE
iterator%dynamic_byrows = iterator%dynamic
!$ iterator%dynamic_byrows = iterator%dynamic
END IF
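! A non-shared iterator does not split work among threads, so dynamic
! scheduling is disabled for it.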
!$ IF (.NOT. iterator%shared) THEN
!$ iterator%dynamic = .FALSE.
!$ END IF
dist = dbcsr_distribution(matrix)
!$ IF (.NOT. dbcsr_distribution_has_threads(dist)) &
!$ DBCSR_WARN("Thread distribution should be defined for OpenMP.")
IF (.NOT. iterator%dynamic .AND. .NOT. dbcsr_distribution_has_threads(dist)) &
DBCSR_ABORT("Thread distribution must be defined for non-dynamic iterator.")
!$ IF (omp_in_parallel() .AND. omp_get_num_threads() /= dbcsr_distribution_num_threads(dist)) &
!$ CALL dbcsr_abort(__LOCATION__, &
!$ "Number of threads has changed from "// &
!$ stringify(dbcsr_distribution_num_threads(dist))// &
!$ " to "//stringify(omp_get_num_threads())//"!")
! Synchronize the positions
NULLIFY (iterator%common_pos)
IF (iterator%dynamic) THEN
! All threads point into the master thread's data space
! (temporarily using the common_int_pointer variable). This is
! not the nicest OpenMP way of doing this but it is also not
! explicitly forbidden.
!
!$OMP BARRIER
!$OMP MASTER
ALLOCATE (iterator%common_pos)
common_int_pointer => iterator%common_pos
common_int_pointer = 0
!$OMP FLUSH (common_int_pointer)
!$OMP END MASTER
!$OMP BARRIER
IF (.NOT. ASSOCIATED(iterator%common_pos)) THEN
iterator%common_pos => common_int_pointer
END IF
!$OMP BARRIER
END IF
!
IF (PRESENT(contiguous_pointers)) THEN
iterator%contiguous_pointers = contiguous_pointers
ELSE
iterator%contiguous_pointers = .TRUE.
END IF
IF (PRESENT(read_only)) THEN
iterator%read_only = read_only
ELSE
iterator%read_only = .FALSE.
END IF
iterator%row = 0
iterator%pos = 0
iterator%rbs => array_data(matrix%row_blk_size)
iterator%cbs => array_data(matrix%col_blk_size)
iterator%roff => array_data(matrix%row_blk_offset)
iterator%coff => array_data(matrix%col_blk_offset)
iterator%local_indexing = matrix%local_indexing
!IF(iterator%local_indexing .AND. .NOT. iterator%dynamic) &
! DBCSR_ABORT("Locally-indexed matrices can only have a dynamic iterator.")
IF (iterator%local_indexing .AND. .NOT. array_exists(matrix%local_rows)) &
CALL dbcsr_abort(__LOCATION__, &
"Local rows mapping array should exist when local indexing is used.")
IF (iterator%local_indexing .AND. .NOT. array_exists(matrix%global_rows)) &
CALL dbcsr_abort(__LOCATION__, &
"Global rows mapping array should exist when local indexing is used.")
iterator%global_rows => array_data(matrix%global_rows)
iterator%local_rows => array_data(matrix%local_rows)
iterator%transpose = .FALSE. !matrix%transpose
iterator%nblks = matrix%nblks
IF (iterator%transpose) THEN
iterator%nblkrows_total = matrix%nblkcols_total
ELSE
iterator%nblkrows_total = matrix%nblkrows_total
END IF
iterator%row_p => matrix%row_p
iterator%col_i => matrix%col_i
iterator%blk_p => matrix%blk_p
!$OMP CRITICAL (crit_data)
iterator%data_area = matrix%data_area
CALL dbcsr_data_hold(iterator%data_area)
!$OMP END CRITICAL (crit_data)
iterator%row_size = 0
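! Static iteration follows the thread distribution, so keep a pointer
! to it; a dynamic iterator does not use it.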
IF (.NOT. iterator%dynamic) THEN
iterator%tdist => array_data(dbcsr_distribution_thread_dist(dist))
ELSE
NULLIFY (iterator%tdist)
END IF
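! Position each thread at its first block. With dynamic-by-rows
! scheduling the shared counter starts past the rows initially claimed
! by the threads.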
!$ IF (iterator%dynamic) THEN
!$OMP SINGLE
!$ IF (iterator%dynamic_byrows) THEN
!$ iterator%common_pos = omp_get_num_threads()
!$ END IF
!$OMP END SINGLE
!$ CALL dbcsr_iterator_seek(iterator, omp_get_thread_num() + 1)
!$ ELSE
CALL dbcsr_iterator_seek(iterator, 1)
!$ END IF
CALL timestop(error_handle)
END SUBROUTINE dbcsr_iterator_start