Every now and then, you come across a special project. You know the sort: some business user decides that they know exactly what they need and exactly how it should be built. They get the buy-in of some C-level shmoe by making sure that their lips have intimate knowledge of said C-level butt. Once they have funding, they hire people and begin to bark orders.

Toonces, the Driving Cat

About 8 years ago, I had the dubious privilege of being on such a project. When we were given the phase-I specs, all of the senior tech people immediately said that there was no way to perform a sane daily backup and data roll for the next day. The response was, "We're not going to worry about backups and daily book-rolls until later." We all just cringed, made like good little HPCs, and followed our orders to march onward.

Fast forward about 10 months, and the project had enough infrastructure that the business user had no choice but to start thinking about how to close the books each day and roll things forward for the next day. The solution he came up with was as follows:

   1. Shut down all application servers and the DB
   2. Remove PK/FK relationships and rename all the tables in the database from xxx to xxx.yyyymmdd
   3. Create all new empty tables in the database (named xxx)
   4. Create all the PK/FK relationships, indices, triggers, etc.
   5. Prime the new xxx tables with data from the xxx.<prev-business-date> tables
   6. Run a job to mirror the whole thing to offsite DB servers
   7. Run the nightly backups (to tape)
   8. Fire up the DB and application servers
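The core of that nightly roll can be sketched in miniature. This is a hypothetical illustration using SQLite from Python, not the actual system: the trades table, its columns, and the carry-forward rule are all invented, and since SQLite table names can't contain dots, the dated archives are named trades_yyyymmdd rather than xxx.yyyymmdd.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical stand-in for one of the xxx tables.
cur.execute("CREATE TABLE trades (id INTEGER PRIMARY KEY, symbol TEXT,"
            " qty INTEGER, open_position INTEGER)")
cur.executemany("INSERT INTO trades (symbol, qty, open_position) VALUES (?, ?, ?)",
                [("ACME", 100, 1), ("INITECH", 50, 0)])
conn.commit()

business_date = "20090101"  # yyyymmdd of the day being closed

# Step 2: rename xxx to the dated archive name.
cur.execute(f"ALTER TABLE trades RENAME TO trades_{business_date}")

# Steps 3-4: recreate an empty xxx table (the real system also rebuilt
# every PK/FK, index, and trigger here, from scratch, every night).
cur.execute("CREATE TABLE trades (id INTEGER PRIMARY KEY, symbol TEXT,"
            " qty INTEGER, open_position INTEGER)")

# Step 5: prime the new table from the previous business date,
# e.g. carrying only open positions forward.
cur.execute(f"""INSERT INTO trades (symbol, qty, open_position)
                SELECT symbol, qty, open_position
                FROM trades_{business_date} WHERE open_position = 1""")
conn.commit()

carried = cur.execute("SELECT COUNT(*) FROM trades").fetchone()[0]
archived = cur.execute(f"SELECT COUNT(*) FROM trades_{business_date}").fetchone()[0]
print(carried, archived)  # 1 2
```

Even in this toy form you can see the objection: every table in the schema gets renamed, recreated, and re-indexed nightly, and nothing ties the sequence of renames together transactionally.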

Naturally, all the tech people groaned, mentioning things like history tables, wasted time regenerating indices, nightmares if errors occurred while renaming tables, etc., but they were ignored.

Then it happened. As is usually the case when non-technical people try to do technical designs, the business user found himself designed into a corner.

The legitimate business-need came up to make adjustments to transactions for the current business day after the table-roll to the next business day had completed.

The business user pondered it for a bit and came up with the following:

    1. Shut down all application servers and the DB
    2. Remove PK/FK relationships and rename the post-roll tables of tomorrow from xxx to xxx.tomorrow
    3. Copy SOME of the xxx.yyyymmdd tables from the pre-roll current day back to xxx
       (leaving the PK's and indices notably absent)
    4. Restart the DB and application servers (with some tables rolled and some not rolled)
    5. Let the users make changes as needed
    6. Shut down the application and DB servers
    7. Manually run ad-hoc SQL to propagate all changes to the xxx.tomorrow table(s)
    8. Rename the xxx tables to xxx.yyyymmdd.1
       (or 2 or 3, depending upon how many times this happened per day)
    9. Rename the xxx.tomorrow tables back to xxx
   10. Rebuild all the PK/FK relationships, create new indices and re-associate triggers, etc.
   11. Rerun the mirroring and backup scripts
   12. Restart the whole thing

When we pointed out the insanity of all of this, and the extremely high likelihood that any failure during the table renaming, copying, or manual updating would leave an uncorrectable mess and cost the entire day of transactions, we were summarily terminated. Our services were no longer required; they needed people who knew how to get things done.
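The failure mode we warned about is easy to demonstrate. In this hypothetical sketch (again SQLite from Python, with invented table names), the process dies partway through renaming a set of tables, leaving the schema half-rolled: some tables wear tomorrow's names, some wear today's, and no single ROLLBACK can restore a consistent state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Two hypothetical tables that must be rolled together.
cur.execute("CREATE TABLE trades (id INTEGER PRIMARY KEY)")
cur.execute("CREATE TABLE positions (id INTEGER PRIMARY KEY)")
conn.commit()

def roll(tables, date):
    # Rename each table xxx -> xxx_<date>, one statement at a time.
    for name in tables:
        if name == "positions":
            raise RuntimeError("disk full")  # simulated crash mid-roll
        cur.execute(f"ALTER TABLE {name} RENAME TO {name}_{date}")
        conn.commit()  # many engines auto-commit DDL anyway

try:
    roll(["trades", "positions"], "20090101")
except RuntimeError:
    pass  # the process died; the renames already committed stay committed

names = sorted(row[0] for row in
               cur.execute("SELECT name FROM sqlite_master WHERE type = 'table'"))
print(names)  # ['positions', 'trades_20090101'] -- neither old schema nor new
```

Untangling that by hand, under time pressure, across dozens of tables with manually propagated changes in flight, is exactly how you lose a day of transactions.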

I'm the first to admit that there are countless things that I do not know, and the older I get, the more that list seems to grow.

I'm also adamant about not making mistakes that I know will absolutely blow up in my face, even if it costs me a job. If you need to see inside a gas tank, throwing a lit match into it will illuminate the inside, but you probably won't like how that works out for you.

Five of us walked out of there, unemployed and laughing hysterically. We went to our favorite watering hole and decided to keep tabs on the place for the inevitable explosion.

Sure enough, 5 weeks after they had junior offshore developers (who didn't have the spine to say "No") build what they wanted, someone goofed in the rollback, and then goofed again while trying to unroll the rollback.

It took them three days to figure out what to restore and in what sequence, then restore it, rebuild everything and manually re-enter all of the transactions since the last backup. During that time, none of their customers got the data files that they were paying for, and had to find alternate sources for the information.

When they finally got everything restored, rebuilt and updated, they went to their customers and said "We're back". In response, the customers told them that they had found other ways of getting the time-sensitive information and no longer required their data product.

Not only were the business users not fired, they got big bonuses for handling the disaster that they had created.
