Bob Lawrence knows the pain that comes with accidentally deleting several production IMS databases while they are online in a CICS production system. Now 20 years older and somewhat wiser, Lawrence recalled the scenario, and his recovery from the trauma, for SearchDatabase.com's True DBA Blooper series.
"At my previous company, we were doing development work for another operating company. My application project manager insisted that I create the production databases for them. This request came two days before I was to go on vacation, and I expected it would take four days to complete.
So, I dutifully created new delete/define parameters by copying the production parameters, and I created a job to run the delete/define pointing to the new members (I had not yet changed to the new IMS dataset names). I was rushing to get this done. I hit the submit PFK rather than the end PFK.
No problem, I thought. I will get a 'dataset in use by another user' error, because the online CICS region has the datasets allocated. I was wrong! Tech support, that day, had separated our batch production from online production to our other system (we had two systems with shared spool and DASD, but no GRS or other global resource sharing systems). The datasets were gone.
Now, it's major panic time. Trembling with fear, I told my boss the news. We shut down the databases from the CICS region, copied the CICS journal and notified the help desk. I thought the databases would be empty. There were only a few records in them when the in-flight transactions were flushed to the databases at shutdown.
How do I recover the databases, I wondered? We did not have DBRC (database recovery control) implemented; it may not have been available at that time with IMS 1.3.
I had to track down every job that had updated the databases since the previous night's backup. Luckily, computer operations logged every job submitted and the time it was submitted. I had to scan every job submitted for the databases affected and determine whether log files had been created. I hoped they had been, because it was our requirement, but IMS did not require it, and sometimes the programmers skipped creating log files.
I gathered the names of all of the logs and the time they were created from our tape management system. I then built restore jobs using the previous night's backups and the logs in chronological order. I then ran the recovery jobs. After the databases were recovered, I ran our SMU pointer checker jobs and prayed to the DBA gods.
There were no reported pointer errors. Everyone involved breathed a sigh of relief. I then image copied the databases, brought them back online and prayed. I did not know then whether or not I was out of the woods, so I told operations to let me know if there were any problems overnight with the applications that used those databases. I woke up every hour during the night.
At 5AM, a junior operator came to my door and said that I was needed in operations for a problem. He did not know what it was; he only knew that I was needed. I started worrying about how I could fix any other problem with the databases.
I got to work 15 minutes later, and the operations supervisor told me that all of the batch processing of the affected databases had finished two hours earlier, but that one of the programmers needed my assistance on an unrelated problem. In the end, it took me 10 seconds to solve the programmer's problem!"
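The recovery sequence Lawrence describes (restore the previous night's backup, then apply every log created since then, oldest first) can be illustrated with a small sketch. This is not IMS utility input or anything Lawrence ran; it is a hypothetical Python illustration of the ordering step only. The names LogTape and plan_recovery, the log names and the timestamps are all invented for the example.

```python
from datetime import datetime
from typing import List, NamedTuple


class LogTape(NamedTuple):
    """One log found in the tape management system (hypothetical record)."""
    name: str
    created: datetime


def plan_recovery(backup_time: datetime, logs: List[LogTape]) -> List[LogTape]:
    """Return the logs to apply after restoring the backup, oldest first.

    Logs created at or before the backup are already reflected in it,
    so they are left out of the plan.
    """
    needed = [log for log in logs if log.created > backup_time]
    return sorted(needed, key=lambda log: log.created)


# Invented sample data: one nightly backup and three logs, listed out of order.
backup_time = datetime(2000, 1, 10, 1, 0)
logs = [
    LogTape("LOG.C", datetime(2000, 1, 10, 14, 30)),
    LogTape("LOG.A", datetime(2000, 1, 9, 22, 0)),   # predates the backup
    LogTape("LOG.B", datetime(2000, 1, 10, 8, 15)),
]

plan = plan_recovery(backup_time, logs)
print([log.name for log in plan])  # ['LOG.B', 'LOG.C']
```

The order matters because each log records changes made on top of the ones before it; applying them out of sequence, or missing one that a job never wrote, would leave the restored databases inconsistent, which is why the pointer checker run afterward was worth the wait.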
Have your own tale of woe to share? Submit your backup/recovery snafus, tuning disasters and ugly upgrades. Stories of good intentions gone bad, over-ambitious and under-trained newbies, clueless consultants, and even more clueless managers will all be accepted. The submitter of the most amusing or wince-inducing blooper of the month will receive a free copy of Craig Mullins' new book, Database Administration: The Complete Guide to Practices and Procedures. Send your bloopers to us today!