I'm using Rocks (a CentOS 4 based cluster distribution) and trying to get the
parallel version of Gale working with our InfiniPath MPI stack.
petsc-2.3.2-pl10 seems to work fine. But when I run:
/usr/bin/mpirun -q 0 -np 4 /share/apps/gale-1.2.2/bin/Gale \
input/benchmarks/falling_sphere/sphere_in_cylinder.xml
I get, after 30-120 minutes of running on 4, 8, 16 or 20 Opteron CPUs,
no normal output whatsoever, just the error:
Gale: build/StgFEM/SLE/SystemSetup/src/StiffnessMatrix.c:545:
_StiffnessMatrix_Build: Assertion `self->rowLocalSize' failed.
Gale:30908 terminated with signal 6 at PC=3c8972e21d SP=7fbffff268. Backtrace:
/lib/../lib64/tls/libc.so.6(gsignal+0x3d)[0x3c8972e21d]
/lib/../lib64/tls/libc.so.6(abort+0xfe)[0x3c8972fa1e]
/lib/../lib64/tls/libc.so.6(__assert_fail+0xf1)[0x3c89727ae1]
/share/apps/gale-1.2.2/bin/Gale(_StiffnessMatrix_Build+0x179)[0x537cde]
Gale: build/StgFEM/SLE/SystemSetup/src/StiffnessMatrix.c:545:
_StiffnessMatrix_Build: Assertion `self->rowLocalSize' failed.
Gale:7810 terminated with signal 6 at PC=3b2f22e21d SP=7fbfffeed8. Backtrace:
/lib/../lib64/tls/libc.so.6(gsignal+0x3d)[0x3b2f22e21d]
/lib/../lib64/tls/libc.so.6(abort+0xfe)[0x3b2f22fa1e]
/lib/../lib64/tls/libc.so.6(__assert_fail+0xf1)[0x3b2f227ae1]
/share/apps/gale-1.2.2/bin/Gale(_StiffnessMatrix_Build+0x179)[0x537cde]
MPIRUN.icompute-4-19: 2 ranks have not yet exited 60 seconds after rank 3 (node
icompute-3-11) exited without reaching MPI_Finalize().
MPIRUN.icompute-4-19: Waiting at most another 60 seconds for the remaining ranks
to do a clean shutdown before terminating 2 node processes
Any idea if this is a problem with Gale? PETSc? Or something specific to my
install? I tried another example, which seems to get a normal response from:
/usr/bin/mpirun -q 0 -np 4 /share/apps/gale-1.2.2/bin/Gale \
input/benchmarks/extension.xml
TimeStep = 1, Start time = 0 + 0 prev timeStep dt
TimeStep = 1, Start time = 0 + 0 prev timeStep dt
TimeStep = 1, Start time = 0 + 0 prev timeStep dt
TimeStep = 1, Start time = 0 + 0 prev timeStep dt
(it's still running)