Loudwater Systems Management
Notes on an Interesting Failure
Today the CTC failed, bringing S2 down. One of our purge jobs
was running at the time.
As DB2D started coming back up, it automatically rolled back all
changes that were uncommitted at the moment of failure. This included
our purge.
Clearly we will have to rerun the purge -- no worries. Pity about
DB2D diligently rebuilding data we don't need.
Ah, BUT! You see, the time DB2D takes to restart depends on how
much there is to roll back. Unfortunately our purge job was using
the SQL DELETE command to purge MVS_ADDRSPACE_T, so every row it
had deleted so far was still uncommitted when S2 went down.
The restart has been rolling back for 2 hours now and isn't yet complete.
MORAL OF THE STORY
The key to improving this is to know that it is only UNCOMMITTED
changes that get rolled back. So I've put SQL COMMIT commands into the
stream of DELETE commands that the job executes. If this ever happens
again, at least the job won't go all the way back to the beginning,
and DB2D won't take quite so long to restart.
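
For illustration only, here is a minimal sketch of the "COMMIT between DELETEs" idea. It is not the actual purge job: it uses Python's built-in sqlite3 module as a stand-in for DB2D, and the SAMPLE_DATE column, the cutoff value and the batch size are all hypothetical. Only the table name comes from the notes above. The point it shows is that each batch is committed before the next one starts, so a failure can only undo the batch in flight.

```python
import sqlite3


def purge_in_batches(conn, cutoff, batch_size=1000):
    """Delete rows older than `cutoff`, committing after every batch.

    Assumption: SAMPLE_DATE is a hypothetical column holding ISO dates;
    the real MVS_ADDRSPACE_T layout is not given in the notes.
    """
    total = 0
    while True:
        # Delete at most `batch_size` qualifying rows in one unit of work.
        cur = conn.execute(
            "DELETE FROM MVS_ADDRSPACE_T WHERE rowid IN ("
            " SELECT rowid FROM MVS_ADDRSPACE_T"
            " WHERE SAMPLE_DATE < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        deleted = cur.rowcount
        # Commit now: these rows can no longer be rolled back.
        conn.commit()
        if deleted == 0:
            break
        total += deleted
    return total


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MVS_ADDRSPACE_T (SAMPLE_DATE TEXT, JOBNAME TEXT)")
    conn.executemany(
        "INSERT INTO MVS_ADDRSPACE_T VALUES (?, ?)",
        [("2000-01-01", "OLD")] * 2500 + [("2030-01-01", "NEW")] * 500,
    )
    conn.commit()
    print(purge_in_batches(conn, "2020-01-01"), "rows purged")
```

The batch size is the trade-off: smaller batches mean less to roll back after a failure, at the cost of more commits while the job runs.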