Brief Background on Preconditioners
The choice of a preconditioner involves the selection of a
matrix Q, called the splitting matrix, such that the
preconditioned system
$Q^{-1}Au = Q^{-1}b$
is better conditioned than the original system, Au = b. Clearly,
one requirement for Q is that it be easily invertible (i.e., linear
systems having Q as the coefficient matrix can be solved with much
less effort than solving the original system Au = b). To reduce the
number of iterations, it is also desirable that Q be "close" to A
in the sense that the spectral radius of $I - Q^{-1}A$ be small. Thus,
in choosing a preconditioner, the user must select between methods
which usually perform a large number of "cheap" iterations or a
small number of "expensive" iterations. NSPCG provides a great
number of preconditioning methods to aid the user in selecting a
preconditioner for a production code.
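To make the convergence criterion concrete, the following sketch (illustrative NumPy, not NSPCG's Fortran; the model problem and all names are our own) computes the spectral radius of $I - Q^{-1}A$ for the Jacobi choice Q = D on a standard 1-D model problem:

    import numpy as np

    n = 50
    # 1-D Laplacian: a common model problem from finite differences
    A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

    Q = np.diag(np.diag(A))                  # Jacobi splitting matrix Q = D
    G = np.eye(n) - np.linalg.solve(Q, A)    # iteration matrix I - Q^{-1} A
    rho = np.max(np.abs(np.linalg.eigvals(G)))
    print(rho)                               # just under 1: convergent but slow

For this matrix the radius is $\cos(\pi/(n+1)) \approx 0.998$: the Jacobi choice is very cheap per iteration but converges slowly, illustrating the trade-off described above.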
Left preconditioning is illustrated above. Many nonsymmetric
accelerators in the package allow other orientations for the
preconditioner, such as right preconditioning,
$(AQ^{-1})(Qu) = b$
or two-sided preconditioning,
$(Q_L^{-1} A Q_R^{-1})(Q_R u) = Q_L^{-1} b$
in the event that the preconditioner can be split into
$Q = Q_L Q_R .$
The variable IQLR [= IPARM(22)] is set by the user to determine
the orientation of the preconditioner (left, right, or two-sided)
for those accelerators which allow orientations of the preconditioner
other than left preconditioning. Left preconditioning is the
default and is available for all accelerators.
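All three orientations yield the same solution u; only the intermediate quantities differ. A small NumPy sketch (illustrative only; the diagonal split $Q_L = Q_R = D^{1/2}$ is our own choice, not an NSPCG default) verifies this:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 6
    A = rng.random((n, n)) + n * np.eye(n)   # diagonally dominant, nonsymmetric
    b = rng.random(n)
    u = np.linalg.solve(A, b)                # reference solution

    d = np.diag(A)
    QL = np.diag(np.sqrt(d))                 # split Q = QL QR with QL = QR
    QR = np.diag(np.sqrt(d))
    Q = QL @ QR

    # Left: Q^{-1} A u = Q^{-1} b
    u_left = np.linalg.solve(np.linalg.solve(Q, A), np.linalg.solve(Q, b))
    # Right: (A Q^{-1})(Qu) = b, then recover u from Qu
    qu = np.linalg.solve(A @ np.linalg.inv(Q), b)
    u_right = np.linalg.solve(Q, qu)
    # Two-sided: (QL^{-1} A QR^{-1})(QR u) = QL^{-1} b, then recover u
    qru = np.linalg.solve(np.linalg.inv(QL) @ A @ np.linalg.inv(QR),
                          np.linalg.solve(QL, b))
    u_two = np.linalg.solve(QR, qru)

    assert np.allclose(u_left, u) and np.allclose(u_right, u) and np.allclose(u_two, u)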
NSPCG supplies preconditioners which can be classified as "point"
or "line" preconditioners. Iterative methods are often used in
the solution of large sparse linear systems which arise from the
discretization of partial differential equations using finite
differences or finite elements. The solution to the partial
differential equation is approximated on a discrete set of unknowns
associated with a mesh defined on the domain of the equation. The terms
"point" and "line" (or "block") refer to whether the unknowns are
updated one at a time, as with a single node in the mesh, or
several at a time, as with a line of nodes in the mesh.
Let matrix A be written as
$A = \begin{pmatrix} A_{1,1} & A_{1,2} & \cdots & A_{1,n} \\ A_{2,1} & A_{2,2} & \cdots & A_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ A_{n,1} & A_{n,2} & \cdots & A_{n,n} \end{pmatrix} .$
If matrix A is viewed as a point matrix, then n is the order of
the system and each $A_{i,j}$ is a scalar. In that case, A can be
written as a sum of point matrices
$A = D - C_L - C_U$
where D is a diagonal matrix, $C_L$ is a strictly lower triangular
matrix, and $C_U$ is a strictly upper triangular matrix.
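In NumPy terms (an illustrative sketch, not NSPCG code), the point splitting can be formed and checked as follows:

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.random((5, 5))

    D  = np.diag(np.diag(A))    # diagonal part
    CL = -np.tril(A, k=-1)      # minus the strictly lower triangle
    CU = -np.triu(A, k=1)       # minus the strictly upper triangle

    assert np.allclose(A, D - CL - CU)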
If matrix A is viewed as a block matrix corresponding to a
partitioning of the unknowns, then n is the number of groups in
the partition and each $A_{i,j}$ is itself a matrix. In that case,
A can be written as a sum of block matrices
$A = D - C_L - C_U$
where D is a block diagonal matrix, $C_L$ is a strictly lower
triangular block matrix, and $C_U$ is a strictly upper triangular
block matrix. In NSPCG, the following restrictions apply to block
matrices:
- If the matrix is not permuted by NSPCG, then block preconditioners
can only be used with symmetric or nonsymmetric diagonal
storage format of the matrix. In that case, each $A_{i,i}$
is required to be a dense banded matrix, and all submatrices
must be of equal size. Also, the user must specify the
variable KBLSZ [= IPARM(19)]. KBLSZ is the order of the
1-D block diagonal submatrices $A_{i,i}$, which must be
constant for all i.
- If the matrix is permuted by NSPCG, then block preconditioners
can be used with any storage format for the matrix. In that case,
each $A_{i,i}$ is required to be a dense banded matrix, and the
submatrices can be of unequal size. In the case of primary
storage format, each $A_{i,i}$ is required to be a diagonal
matrix.
In the line versions of the preconditioners, a common operation is the
solution of banded subsystems corresponding to the $A_{i,i}$ diagonal
block submatrices. NSPCG does not employ a cyclic reduction algorithm
to solve such systems, but will attempt to vectorize the solution if
such a system is actually composed of multiple independent subsystems
of the same size. In this case, the algorithm used is a scalar
forward and back substitution for each individual subsystem, but
with each operation being done on all the subsystems in a vectorizable
way.
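The following NumPy sketch illustrates this strategy (the function name and the (m, k) data layout are our own; NSPCG's actual routines are Fortran and operate on its internal storage formats):

    import numpy as np

    def solve_many_tridiag(lower, diag, upper, rhs):
        # Solve m independent tridiagonal subsystems of equal order k.
        # All four arguments have shape (m, k); lower[:, 0] and
        # upper[:, -1] are unused.  Each scalar elimination step is
        # applied to all m subsystems at once (vectorized over axis 0).
        m, k = diag.shape
        d = diag.copy()
        r = rhs.copy()
        for i in range(1, k):                      # forward elimination
            w = lower[:, i] / d[:, i - 1]
            d[:, i] -= w * upper[:, i - 1]
            r[:, i] -= w * r[:, i - 1]
        x = np.empty_like(r)
        x[:, -1] = r[:, -1] / d[:, -1]             # back substitution
        for i in range(k - 2, -1, -1):
            x[:, i] = (r[:, i] - upper[:, i] * x[:, i + 1]) / d[:, i]
        return x

    # Verify against dense solves on random diagonally dominant data
    m, k = 4, 6
    rng = np.random.default_rng(2)
    lo, up = rng.random((m, k)), rng.random((m, k))
    di = 4.0 + rng.random((m, k))
    rhs = rng.random((m, k))
    x = solve_many_tridiag(lo, di, up, rhs)
    for j in range(m):
        T = np.diag(di[j]) + np.diag(lo[j, 1:], -1) + np.diag(up[j, :-1], 1)
        assert np.allclose(T @ x[j], rhs[j])

Note that each loop iteration does only a handful of arithmetic operations, but each operates on vectors of length m, which is what makes the approach attractive on vector hardware.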
POINT PRECONDITIONERS
The point preconditioners available in the package are the
following:
- Richardson method (RF):
- For this method, Q = I, so, in effect, this represents
the null preconditioner.
- Jacobi method:
- For this method, Q = D, the diagonal of A. If the
matrix is known to have positive diagonal elements,
the Jacobi method may be more efficiently applied by
requesting diagonal scaling of the matrix and applying
the RF method to the system.
- Successive Overrelaxation method (SOR):
- For this method,
$Q = \frac{1}{\omega} (D - \omega C_L) .$
The SOR iteration parameter $\omega$ can be determined by an
automatic adaptive procedure. This method allows no
acceleration.
- Symmetric Successive Overrelaxation method (SSOR):
- For this method,
$Q = \frac{1}{\omega(2-\omega)} (D - \omega C_L)\, D^{-1} (D - \omega C_U) .$
The SSOR iteration parameter $\omega$ can be determined by an
automatic adaptive procedure if SSOR is used with either
conjugate gradient or Chebyshev acceleration. Otherwise,
a user-supplied fixed $\omega$ is required.
- Incomplete LU Decomposition method (ILU(k)):
- This preconditioner uses an incomplete LU decomposition of
the matrix as a preconditioner. The parameter k denotes
the level of fill-in which is to be allowed in the
factorization. The form of Q is:
$Q = (\Delta - C_L)\, \Delta^{-1} (\Delta - C_U)$   (Case I)
$Q = (\Delta - S)\, \Delta^{-1} (\Delta - T)$   (Case II)
Case I occurs when matrix A has Property A (i.e., is 2-cyclic)
and k = 0. Case II occurs if either condition fails.
Here, $\Delta$ is a diagonal matrix containing the factorization
pivots, S is a strictly lower triangular matrix, and T is
a strictly upper triangular matrix.
It can be seen that if Case I is true, then a considerable
savings in storage is possible since only a vector of pivots
of length N has to be stored. Q can then be implicitly
represented from just $\Delta$ and the given matrix elements,
which are already stored. If Case II is true, then
it is necessary to store T as well (and S, if A is
nonsymmetric).
A fill-in level of k=0 indicates that no fill-in beyond
the original matrix pattern of nonzeros is to be allowed
in the factorization. For k=1, fill-in resulting from
the original nonzero pattern is allowed but no fill-in
resulting from this newly-created fill-in is allowed. In
general, fill-in at level k results from fill-in from levels
less than k. As k grows, the number of
iterations should decrease but at the expense of increased
storage and time per iteration.
- Modified Incomplete LU Decomposition method (MILU(k)):
- This factorization is similar to the
ILU(k) preconditioner
except that the diagonal pivots of the factorization are
adjusted so that $Q - A$ has zero row sums. For many matrices,
this requirement produces a better condition number for
$Q^{-1}A$ than for the ILU(k) preconditioner. Also, this
requirement forces $Q^{-1}A$ to have at least one eigenvalue
equal to one. As in the previous preconditioner, a variable
level of fill-in is allowed.
- Neumann Polynomial method:
- For this method, a truncated Neumann series approximation
to $A^{-1}$ is used. The user can specify the degree of the
polynomial to be used. Assuming A is written as $A = D - C$,
where D is the diagonal of A and $C = C_L + C_U$,
then $A = (I - CD^{-1})D$, so
$A^{-1} = D^{-1} (I - CD^{-1})^{-1} = D^{-1} \sum_{j=0}^{\infty} (CD^{-1})^j .$
A truncated form of this series is then used for $Q^{-1}$.
$Q^{-1}$ is not explicitly computed; $Q^{-1}p$ is
evaluated for a vector p by using a sequence of matrix-vector
multiplications. This method will be effective if the spectral
radius of $CD^{-1}$ is less than one.
- Least Squares Polynomial method:
- For this method, a least squares polynomial is used to
approximate the inverse of A:
$Q^{-1} = p_s(A) ,$
where $p_s$ is a polynomial of degree s.
Since it is desired that the spectral radius of the iteration
matrix $G = I - Q^{-1}A$ be small, $p_s$ is selected so that the
function $f(x) = 1 - p_s(x)\,x$ has minimum 2-norm over a domain
which contains the spectrum of A. Note that $G = f(A)$ is the
iteration matrix. This preconditioner is effective only if
A is SPD (symmetric and positive definite) or nearly so,
in which case the domain is an interval on the positive real
axis containing the eigenvalues of A.
- Reduced System method (RS):
- This method first requires that the system Au = b be permuted to a red-black system:
$\begin{pmatrix} D_R & H \\ K & D_B \end{pmatrix} \begin{pmatrix} u_R \\ u_B \end{pmatrix} = \begin{pmatrix} b_R \\ b_B \end{pmatrix}$
where $D_R$ and $D_B$ are diagonal matrices. This will only
be possible if matrix A has Property A [35]. The
black unknowns can be eliminated from the system to yield the
reduced system:
$(D_R - H D_B^{-1} K)\, u_R = b_R - H D_B^{-1} b_B$
which becomes the new matrix problem to be solved. Once
$u_R$ has converged to an answer, $u_B$ is found by
$u_B = D_B^{-1} (b_B - K u_R) .$
The reduced system preconditioner is $D_R$. Note that the
reduced system is not explicitly computed with this
method. However, NSPCG contains a facility for explicitly
computing the reduced system and applying any of the
package preconditioners to it. See Section 11
for more details. (A sketch of this elimination appears
just after this list.)
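As promised above, here is a NumPy sketch of the red-black elimination used by the RS method (illustrative only; the sizes, names, and random test data are our own):

    import numpy as np

    rng = np.random.default_rng(3)
    nr, nb = 4, 4
    DR = np.diag(4.0 + rng.random(nr))      # red diagonal block
    DB = np.diag(4.0 + rng.random(nb))      # black diagonal block
    H  = rng.random((nr, nb))
    K  = rng.random((nb, nr))

    A = np.block([[DR, H], [K, DB]])        # red-black ordered system
    b = rng.random(nr + nb)
    bR, bB = b[:nr], b[nr:]

    # Reduced system: (DR - H DB^{-1} K) uR = bR - H DB^{-1} bB
    DBinv = np.diag(1.0 / np.diag(DB))
    uR = np.linalg.solve(DR - H @ DBinv @ K, bR - H @ DBinv @ bB)
    uB = DBinv @ (bB - K @ uR)              # recover the black unknowns

    assert np.allclose(A @ np.concatenate([uR, uB]), b)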
LINE PRECONDITIONERS
The line preconditioners available in the package are the
following:
- Line Jacobi method:
- For this method, Q = D, the block diagonal part
of A. For many matrices resulting from finite difference
discretizations of partial differential equations, each
diagonal block $A_{i,i}$ is a tridiagonal matrix. The solution
to the preconditioning equation $Dz = p$ is accomplished by a
recursive forward and back solve. If, however, D consists
of multiple independent subsystems of size KBLSZ, NSPCG will
perform each step of the factorization and solution process
across all the subsystems in an independent manner. This
method will vectorize on computers allowing constant stride
vector operations.
- Line Jacobi (approximate inverse) method:
- This method uses a banded approximation to the inverse of D.
The solution of $Dz = p$ is accomplished by computing
$z = [D^{-1}]_b\, p ,$
where the $[\,\cdot\,]_b$ operator indicates a truncation of the
matrix to a banded system. The variable LTRUNC [= IPARM(17)]
is used to determine the half-bandwidth to be used
in the truncation. If D is diagonally dominant, then
the diagonals of $D^{-1}$ decay at an exponential
rate the further the diagonal is away from the main diagonal.
Hence for diagonally dominant D, a banded approximation
to the inverse of D will suffice. Thus, the
preconditioning step has been changed from a solve to a
matrix-vector multiply, a vectorizable operation.
- Line SOR (LSOR):
- For this method,
$Q = \frac{1}{\omega} (D - \omega C_L) ,$
where D and $C_L$ are the block diagonal and strictly lower
triangular block parts of A.
- Line SSOR (LSSOR):
- For this method,
$Q = \frac{1}{\omega(2-\omega)} (D - \omega C_L)\, D^{-1} (D - \omega C_U) ,$
with D, $C_L$, and $C_U$ the block quantities as in LSOR.
- Incomplete Block LU Decomposition, Version 1:
- For this method, Q takes the form:
$Q = (M - C_L)\, M^{-1} (M - C_U)$   (Case I)
$Q = (M - S)\, M^{-1} (M - T)$   (Case II)
Case I occurs when matrix A has block Property A and no
fill-in of blocks is allowed. Case II occurs otherwise.
M here is a block diagonal matrix, and S and T are strictly
lower and strictly upper triangular block matrices,
respectively. Truncated inverses of
diagonal pivot block matrices are used in the construction
of the factorization. We illustrate the factorization
process in the case that A is block tridiagonal:
$A = \begin{pmatrix} A_{1,1} & A_{1,2} & & \\ A_{2,1} & A_{2,2} & A_{2,3} & \\ & \ddots & \ddots & \ddots \\ & & A_{n,n-1} & A_{n,n} \end{pmatrix} .$
Then A has block Property A, so Case I is used. Thus we seek
a factorization of the form
$Q = (M - C_L)\, M^{-1} (M - C_U) .$
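A hedged NumPy sketch of this Case I construction for a block tridiagonal matrix follows (the names and the model problem are our own; NSPCG truncates each pivot inverse to a banded matrix, whereas this sketch uses exact inverses, for which the factorization reproduces A exactly):

    import numpy as np

    def block_pivots(A, kb):
        # Pivot recursion for block tridiagonal A with kb x kb blocks:
        # M_1 = A_{1,1};  M_i = A_{i,i} - A_{i,i-1} M_{i-1}^{-1} A_{i-1,i}
        nblk = A.shape[0] // kb
        M = []
        for i in range(nblk):
            s = slice(i * kb, (i + 1) * kb)
            Mi = A[s, s].copy()
            if i > 0:
                p = slice((i - 1) * kb, i * kb)
                Mi -= A[s, p] @ np.linalg.solve(M[-1], A[p, s])
            M.append(Mi)
        return M

    # Block tridiagonal test matrix: 2-D Laplacian on a 4x4 grid
    kb, nblk = 4, 4
    T = 4.0 * np.eye(kb) - np.eye(kb, k=1) - np.eye(kb, k=-1)
    A = (np.kron(np.eye(nblk), T)
         - np.kron(np.eye(nblk, k=1) + np.eye(nblk, k=-1), np.eye(kb)))

    M = block_pivots(A, kb)
    Mmat = np.zeros_like(A)                 # assemble block diagonal M
    for i, Mi in enumerate(M):
        s = slice(i * kb, (i + 1) * kb)
        Mmat[s, s] = Mi

    Dblk = np.zeros_like(A)                 # block diagonal part of A
    for i in range(nblk):
        s = slice(i * kb, (i + 1) * kb)
        Dblk[s, s] = A[s, s]
    CL = -np.tril(A - Dblk)                 # strictly lower block part, negated
    CU = -np.triu(A - Dblk)                 # strictly upper block part, negated

    Q = (Mmat - CL) @ np.linalg.solve(Mmat, Mmat - CU)
    assert np.allclose(Q, A)                # exact inverses give an exact factorization

Replacing the exact pivot inverses with banded truncations turns this exact factorization into the incomplete block factorization used as the preconditioner.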