Parallel execution in Abaqus/Explicit

Domain-level parallelization

The domain-level method splits the model into a number of topological domains. These domains are referred to as parallel domains to distinguish them from other domains associated with the analysis. The domains are distributed evenly among the available processors. The analysis is then carried out independently in each domain. However, information must be passed between the domains in each increment because the domains share common boundaries. Both MPI and thread-based parallelization modes are supported with the domain-level method.

During initialization, the domain-level method divides the model so that the resulting domains take approximately the same amount of computational expense. The load balance is defined as the ratio of the computational expense of all domains in the most expensive process to that of all domains in the least expensive process. For cases exhibiting significant load imbalance, either because the initial load balancing is not adequate (static imbalance) or because imbalance develops over time (dynamic imbalance), the dynamic load balancing technique may be applied (see Abaqus/Standard and Abaqus/Explicit execution). Dynamic load balancing is based on over-decomposition: the user selects a number of domains that is a multiple of the number of processors. During the calculation, Abaqus/Explicit will regularly measure the computational expense and redistribute the domains over the processors so as to minimize the load imbalance. The following functionality is not supported with dynamic load balancing:

Selective subcycling (Selective subcycling)
Co-simulation (About co-simulation)
Predefined fields using a results file (Predefined Fields)

The efficiency of the dynamic load balancing scheme depends on the load imbalance inherent to the problem, on the degree of over-decomposition, and on the efficiency of the hardware. Most imbalanced problems will see optimal performance improvement when the number of domains is two to four times the number of processors. The efficiency may be significantly reduced on systems with a slow interconnect, such as Gigabit Ethernet clusters. Best results are obtained when an external interconnect is not needed, such as within a multicore node of a cluster or on a shared-memory system. Applications most likely to benefit from dynamic load balancing are problems with a strongly time-dependent and/or spatially varying computational load. Examples are models containing airbags, where contact-impact activity is highly localized and time dependent, and coupled Eulerian-Lagrangian problems, where constitutive activity follows the material as it moves through empty space.
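The effect of over-decomposition on the load balance ratio defined above can be illustrated with a small sketch. The greedy assignment used here (costliest domain to the least-loaded process) is an assumption chosen for illustration only; it is not the algorithm used by Abaqus/Explicit, and the cost values are hypothetical.

```python
import heapq


def load_balance(process_costs):
    """Ratio of the most expensive process to the least expensive one (1.0 is ideal)."""
    return max(process_costs) / min(process_costs)


def distribute(domain_costs, n_procs):
    """Assign domains to processes greedily and return the cost per process.

    Illustrative heuristic only (an assumption, not the Abaqus/Explicit scheme):
    the costliest remaining domain goes to the currently least-loaded process.
    """
    heap = [(0.0, p) for p in range(n_procs)]  # (accumulated cost, process index)
    heapq.heapify(heap)
    loads = [0.0] * n_procs
    for cost in sorted(domain_costs, reverse=True):
        load, p = heapq.heappop(heap)
        loads[p] = load + cost
        heapq.heappush(heap, (loads[p], p))
    return loads


# One domain per process: an uneven split cannot be repaired by redistribution.
coarse = distribute([6.0, 2.0], n_procs=2)

# Same total work over-decomposed into two domains per process.
fine = distribute([3.0, 3.0, 1.0, 1.0], n_procs=2)

print(load_balance(coarse), load_balance(fine))
```

With one domain per process there is nothing to redistribute; with several domains per process, expensive and inexpensive domains can be paired so that the ratio moves toward 1.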

Element and node sets are created for each domain and can be inspected in Abaqus/CAE. The sets are named domain_n, where n is the domain number.

During the analysis, separate state (job-name.abq) and selected results (job-name.sel) files are created. There will be one state and one selected results file for each processor. The naming convention is to append the processor number to the file name. For example, the state files are named job-name.abq.n, where n is the processor number. At the completion of the analysis the individual files are merged automatically into a single file (for example, job-name.abq), and the individual files are deleted.

Input File Usage

Enter the following input on the command line:

abaqus job=job-name cpus=n parallel=domain domains=m dynamic_load_balancing

For example, the following input will run the job “beam” on two processors with the domain-level parallelization method:

abaqus job=beam cpus=2 parallel=domain domains=2

The domain-level parallelization method can also be set in the environment file using the environment file parameters parallel=DOMAIN and domains.

Abaqus/CAE Usage

Job module: job editor: Parallelization: toggle on Use multiple processors and specify the number of processors, n; Number of domains: m; toggle on Activate dynamic load balancing; Parallelization method: Domain

You can activate dynamic load balancing when the number of domains is a multiple of the number of processors.
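As a minimal sketch, the command-line options shown above can be assembled and checked in a small script before a job is submitted. The function name `build_command` is hypothetical; only the options `job`, `cpus`, `parallel`, `domains`, and `dynamic_load_balancing` come from this section.

```python
def build_command(job, cpus, domains=None, dynamic_load_balancing=False):
    """Return the abaqus command line as a list of arguments.

    Hypothetical helper for illustration; it only assembles options that
    are documented in this section and does not run anything.
    """
    if domains is None:
        # Documented default: the number of domains equals the number of processors.
        domains = cpus
    if dynamic_load_balancing and domains % cpus != 0:
        # Dynamic load balancing requires domains to be a multiple of cpus.
        raise ValueError("domains must be a multiple of cpus for dynamic load balancing")
    args = ["abaqus", f"job={job}", f"cpus={cpus}", "parallel=domain", f"domains={domains}"]
    if dynamic_load_balancing:
        args.append("dynamic_load_balancing")
    return args


print(" ".join(build_command("beam", cpus=2, domains=2)))
# prints: abaqus job=beam cpus=2 parallel=domain domains=2
```

Returning a list rather than a single string makes it straightforward to pass the result to a process launcher without shell quoting issues.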

Consistency of results

The analysis results are independent of the number of processors used for the analysis. However, the results do depend on the number of parallel domains used during the domain decomposition. Except for cases in which the single- and multiple-domain models are different due to features that are not yet available with multiple parallel domains (discussed below), these differences should be triggered only by finite precision effects. For example, the order of the nodal force assembly may depend on the number of parallel domains, which can result in differences in trailing digits in the computed force. Some physical systems are highly sensitive to small perturbations, so a tiny difference in the force applied in one increment can result in noticeable differences in results in subsequent increments. Simulations involving buckling and other bifurcations tend to be sensitive to small perturbations.
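The finite precision effect described above can be reproduced in a few lines. The sketch below sums the same three hypothetical force contributions in two different orders; it illustrates floating-point arithmetic in general and is not taken from Abaqus/Explicit.

```python
import math

# Hypothetical contributions to the force at one node.
contributions = [0.1, 0.2, 0.3]

# Same values, two assembly orders.
forward = (contributions[0] + contributions[1]) + contributions[2]
backward = (contributions[2] + contributions[1]) + contributions[0]

print(forward == backward)              # the sums differ in the trailing digits
print(math.isclose(forward, backward))  # but agree to within rounding error
```

The two sums agree to many significant digits, yet they are not bitwise identical, which is enough to trigger diverging results in systems that are sensitive to small perturbations.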

To obtain consistent analysis results from run to run, the number of domains used in the domain decomposition should be constant. Increasing the number of domains increases the computational cost slightly; therefore, unless dynamic load balancing is being applied, it is recommended that the number of domains be set equal to the maximum number of processors used for analysis execution for optimal performance. If you do not specify the number of domains, the number defaults to the number of processors.

Features that do not allow domain-level parallelization

The use of the domain-level parallelization method is not allowed with the following features:

Extreme value output.
Steady-state detection.

If these features are included, an error message will be issued.

Features that cannot be split across domains

Certain features cannot be split across domains. The domain decomposition algorithm automatically takes this into account and forces these features to be contained entirely within one domain. If fewer domains than requested processors are created, Abaqus/Explicit issues an error message. Even if the algorithm succeeds in creating the requested number of domains, the load may be balanced unevenly. If this behavior is not acceptable, the job should be run with the loop-level parallelization method.

Adaptive smoothing domains cannot span parallel domain boundaries. The nodes on the boundary between an adaptive smoothing domain and a nonadaptive domain, as well as the adaptive nodes on the surface of the adaptive smoothing domain, cannot be shared with another parallel domain. To enforce this in a consistent manner when parallel domains are specified, all nodes shared by adjacent adaptive smoothing domains will be set as nonadaptive. In this case the analysis results may be significantly different from those of a serial run with no parallel domains. If this behavior is undesirable, set the number of parallel domains to 1 and switch to the loop-level parallelization method. See Defining ALE adaptive mesh domains in Abaqus/Explicit for details.

A contact pair cannot be split across parallel domains, but separate contact pairs are not restricted to be in the same parallel domain. A contact pair that uses the kinematic contact algorithm requires that all of the nodes associated with the involved surfaces be within a single parallel domain and not be shared with any other parallel domains. A contact pair that uses the penalty contact algorithm requires that the associated nodes be part of a single parallel domain, but these nodes may also be part of other parallel domains. Analyses in which a large percentage of nodes are involved in contact may not scale well if contact pairs are used, especially with kinematic enforcement of contact constraints. General contact does not limit the domain decomposition boundaries.

Nodes involved in kinematic constraints (About Kinematic Constraints), with the exception of surface-based shell-to-solid constraints, will be within a single parallel domain; and they will not be shared with another parallel domain. However, two kinematic constraints that do not share nodes can be placed within different parallel domains.
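One consequence of this rule is that constraints linked through shared nodes end up in the same parallel domain, while unlinked constraints are free to be separated. The sketch below groups constraints accordingly with a union-find structure. It is an illustration under assumptions: the constraint names and node labels are hypothetical, the exception for surface-based shell-to-solid constraints is not modeled, and this is not a description of the Abaqus/Explicit decomposition code.

```python
def colocated_groups(constraints):
    """Group constraints that are linked through shared nodes.

    constraints: dict mapping a constraint name to the set of node labels it uses.
    Returns a list of sets of constraint names; each set must share one domain.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # Link every constraint to each of its nodes; shared nodes merge constraints.
    for name, nodes in constraints.items():
        for node in nodes:
            union(("constraint", name), ("node", node))

    groups = {}
    for name in constraints:
        groups.setdefault(find(("constraint", name)), set()).add(name)
    return sorted(groups.values(), key=sorted)


# Hypothetical model: tie_1 and rigid_body share node 3; mpc_7 is independent.
example = {"tie_1": {1, 2, 3}, "rigid_body": {3, 4}, "mpc_7": {10, 11}}
groups = colocated_groups(example)
print(groups)
```

In this example `tie_1` and `rigid_body` form one group because they share a node, while `mpc_7` could be placed in a different parallel domain.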

In some cases beam elements that share a node may be forced into the same parallel domain. This happens only for beams whose center of mass does not coincide with the location of the beam node or for beams with additional inertia (see Adding inertia to the beam section behavior for Timoshenko beams).

User influence on domain decomposition

You can influence the domain decomposition by specifying one or more regions that are independently decomposed into a user-specified number of parallel domains or by specifying that an element set should be constrained to the same parallel domain.

Specifying a domain decomposition region can be useful when a local region of the model is computationally intensive. Performance gains may be achieved by identifying the local region as an independent domain decomposition region, thereby distributing computation of the local region among all processors. You can specify the domain decomposition region by defining an element set directly, or Abaqus/Explicit can generate the domain decomposition region consisting of all elements within a user-specified box. The part of the model that is not included in any user-specified domain decomposition region is treated as the global region and is also decomposed into the user-specified number of parallel domains. You can decompose each region using either a recursive coordinate bisection (RCB) algorithm or a graph partitioning algorithm that minimizes the number of shared nodes. The RCB algorithm is the default for all domain decomposition regions. You can also decompose each domain decomposition region into $N \times n_{\mathrm{domainsUser}}$ domains by specifying a decompose factor $N$. The domains from each independent domain decomposition are distributed evenly among the available processors, but these domains can be reassigned to different processors during the analysis if dynamic load balancing is activated. The total number of parallel domains for the simulation is

$$n_{\mathrm{domains}} = \left( n_{\mathrm{globalRegions}} \times N_{g} + \sum_{i=1}^{n_{\mathrm{localRegions}}} N_{i} \right) \times n_{\mathrm{domainsUser}},$$

where

$n_{\mathrm{localRegions}}$ is the number of local regions identified as independent domain decomposition regions;
$n_{\mathrm{globalRegions}}$ is equal to 1 if any elements are not included in local regions identified as independent domain decomposition regions; otherwise, $n_{\mathrm{globalRegions}}$ is 0;
$N_{i}$ is the decompose factor for domain decomposition region $i$;
$N_{g}$ is the decompose factor for the global domain decomposition region; and
$n_{\mathrm{domainsUser}}$ is the user-specified number of domains per domain decomposition region (see Domain-level parallelization).
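The formula above can be evaluated directly. The following sketch is a plain transcription of it; the function and argument names are hypothetical, and the numbers in the usage lines are made up for illustration.

```python
def total_domains(n_domains_user, local_factors, global_factor=1, has_global_region=True):
    """Total number of parallel domains for the simulation.

    n_domains_user    -- user-specified number of domains per decomposition region
    local_factors     -- decompose factors N_i, one per local decomposition region
    global_factor     -- decompose factor N_g of the global region
    has_global_region -- True if any elements lie outside all local regions
    """
    n_global_regions = 1 if has_global_region else 0
    return (n_global_regions * global_factor + sum(local_factors)) * n_domains_user


# One local region and a global region, both with the default factor of 1.
print(total_domains(4, local_factors=[1]))
# Same model, but the local region is decomposed twice as finely.
print(total_domains(4, local_factors=[2]))
```

With one local region, a global region, and all decompose factors equal to 1, the total is twice the user-specified number of domains.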

Separate domain decomposition regions may be desired, for example, in bird-strike models (where contact-impact activity is highly localized and time dependent) and coupled Eulerian-Lagrangian problems with localized adaptive mesh refinement (where elements are refined, adding to the computational cost). The example below (Figure 1) shows a spherical projectile impacting a flat plate with a failure model, thus allowing the projectile to perforate the plate. One of the domains contains the projectile as well as a significant portion of the impact area. Specifying a domain decomposition region consisting of the projectile as well as the computationally intensive impact area results in more balanced parallel processing (Figure 2). In this example $n_{\mathrm{localRegions}} = 1$ and $n_{\mathrm{globalRegions}} = 1$; therefore, $n_{\mathrm{domains}} = 2 \times n_{\mathrm{domainsUser}}$.

Figure 1. Original domain decomposition.

Figure 2. Modified domain decomposition.

Multiple domain decomposition regions can be specified. In the case of overlap between the domain decomposition regions, by default, the first specified decomposition keeps the overlapped elements. Some modeling features cannot be split across domains, and Abaqus/Explicit automatically merges the domain decomposition regions that contain features that cannot be split.

Input File Usage

Use the following option to define a domain decomposition region using a user-specified element set:

*DOMAIN DECOMPOSITION, DEFINITION=ELSET, ELSET=element_set_name

Use the following option to define a domain decomposition region by specifying that Abaqus/Explicit should generate the domain decomposition region consisting of all elements contained within the user-specified box:

*DOMAIN DECOMPOSITION, DEFINITION=BOX, ELSET=element_set_name

Use the following option to decompose the region using a recursive coordinate bisection algorithm (default):

*DOMAIN DECOMPOSITION, METHOD=RCB

Use the following option to decompose the region using a graph partitioning algorithm:

*DOMAIN DECOMPOSITION, METHOD=GRAPH PARTITIONING

Use the following option to specify a decompose factor for the region (default is 1):

*DOMAIN DECOMPOSITION, DECOMPOSE FACTOR=N

Use the following option to constrain an element set to the same parallel domain:

*DOMAIN DECOMPOSITION
element_set_name, SAME DOMAIN
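For readers unfamiliar with recursive coordinate bisection, the idea behind the default method can be sketched as follows. This is a generic textbook version that splits points (for example, element centroids) by count along the longest coordinate direction. It is an assumption-laden illustration, not the Abaqus/Explicit implementation, which also has to account for computational expense and for features that cannot be split across domains.

```python
def rcb(points, n_domains):
    """Split points into n_domains groups by recursive coordinate bisection.

    points    -- list of coordinate tuples, e.g. element centroids (x, y, z)
    n_domains -- number of groups to create (must not exceed len(points))
    """
    if n_domains > len(points):
        raise ValueError("more domains requested than points available")
    if n_domains == 1:
        return [points]

    # Cut perpendicular to the coordinate direction with the largest extent.
    dims = range(len(points[0]))
    axis = max(dims, key=lambda d: max(p[d] for p in points) - min(p[d] for p in points))
    ordered = sorted(points, key=lambda p: p[axis])

    # Share the points in proportion to the number of domains on each side.
    left_domains = n_domains // 2
    cut = len(ordered) * left_domains // n_domains
    return rcb(ordered[:cut], left_domains) + rcb(ordered[cut:], n_domains - left_domains)


# Eight hypothetical centroids along the x-axis, split into four domains.
centroids = [(float(i), 0.0, 0.0) for i in range(8)]
domains = rcb(centroids, 4)
print([len(d) for d in domains])
```

Because each cut is made along a coordinate direction, the resulting domains are spatially compact boxes; a graph partitioning method instead works from mesh connectivity to reduce the number of shared nodes.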

Restart

There are certain restrictions for restart when using domain-level parallelization. To ensure that optimal parallel speedup is achieved, the number of processors used for the restart analysis must be chosen so that the number of parallel domains used during the original analysis can be distributed evenly among the processors. Because the domain decomposition is based only on the features specified in the original analysis and steps defined therein, features that affect domain decomposition are restricted from being defined in restart steps only if they would invalidate the original domain decomposition. Because the newly added features will be added to existing domains, there is a potential for load imbalance and a corresponding degradation of parallel performance.
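The even-distribution requirement means that the processor count for the restart should divide the number of parallel domains of the original analysis. A minimal sketch that lists the suitable processor counts (the function name is hypothetical):

```python
def restart_cpu_choices(n_domains):
    """Processor counts over which n_domains can be distributed evenly."""
    return [cpus for cpus in range(1, n_domains + 1) if n_domains % cpus == 0]


# An original analysis that used 12 parallel domains.
print(restart_cpu_choices(12))
# prints: [1, 2, 3, 4, 6, 12]
```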

The restart analysis requires that the separate state and selected results files created during the original analysis be converted into single files, as described in Abaqus/Standard and Abaqus/Explicit execution. This should be done automatically at the conclusion of the original analysis. If the original analysis fails to complete successfully, you must convert the state and selected results files prior to restart. An Abaqus/Explicit analysis packaged to run with a domain-level parallelization technique cannot be restarted or continued with a loop-level parallelization technique.

Co-simulation

The co-simulation technique (About co-simulation) for run-time coupling of Abaqus/Explicit to Abaqus/Standard or to third-party analysis programs can be used with Abaqus/Explicit running either in serial or parallel.
