• August 17, 2018

Feel the burn! What to expect during iCluster burn-in

Once iCluster is up and running and the product is “burnt in,” there is usually very little administration required. But what can you expect during the burn-in period? In fitness terms, instructors talk about “no pain, no gain” and “feel the burn.” What this usually means is that unless you feel a little pain, i.e. feel the burn, you can never be sure that things are getting better and that you are getting fit. The same is true of High Availability and Disaster Recovery testing.

Unless you test your HA environment and experience a few issues, you can never really be sure that things are working and that you are protected. My favourite gotcha story is from Deb Saugen at IBM. For many years one of her customers had always come to the disaster recovery test site with one tape. They would restore the OS, restore the data from the tape, and test, and everything worked. Then one year the customer arrived with two tapes; obviously they had had some business growth. When they received the mount request for the second tape, the restore finished almost immediately, but the remaining libraries were missing from the backup box. Many reruns later, Deb asked the customer exactly what happens during the weekly backup. The customer replied that when the operator gets the message to mount the second tape, he replies C to Continue! Of course, the proper response is G for Go; C means Cancel. And they had never tested restoring from the second tape.


What does this have to do with the burn-in period for iCluster? Not much… but it is a fun story, and it does stress the need to test, test, test. During the first few weeks after installing the product you should be looking for out-of-sync objects and suspended objects, and fixing the underlying issues: why are the objects getting out of sync or suspended? Check for latency: are there periods of time when latency is above your accepted thresholds, and if so, why? You may need to look into the journal receivers using the DSPJRN command, or even turn on journaling for the file on the backup node, to see what’s going on. You may need to change your iCluster configuration. Once things are working as expected, you can turn on automation and, if you are at iCluster 7.1 TR1 or later, try a roleswitch simulator. Feel the burn so that when you do perform that first DR test or roleswitch, you know what to expect.
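As a rough illustration of the latency check described above, here is a minimal Python sketch that finds the periods when latency stayed above a threshold. It assumes you have already collected latency readings as (timestamp, seconds) pairs by some means of your own; that sample format, the function name `latency_breaches`, and the 60-second threshold are all hypothetical and are not iCluster output or an iCluster API.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Hypothetical sample format: (time of reading, latency in seconds).
Sample = Tuple[datetime, float]
# (first reading above threshold, last reading above threshold, peak latency)
Breach = Tuple[datetime, datetime, float]


def latency_breaches(samples: List[Sample], threshold_s: float) -> List[Breach]:
    """Group consecutive readings above threshold_s into breach periods."""
    breaches: List[Breach] = []
    start = end = None
    peak = 0.0
    for ts, latency in sorted(samples):
        if latency > threshold_s:
            if start is None:
                # A new breach period begins with this reading.
                start, peak = ts, latency
            end = ts
            peak = max(peak, latency)
        elif start is not None:
            # Latency dropped back under the threshold: close the period.
            breaches.append((start, end, peak))
            start = None
    if start is not None:
        # The readings ended while still above the threshold.
        breaches.append((start, end, peak))
    return breaches


if __name__ == "__main__":
    # Made-up readings taken every five minutes, for illustration only.
    base = datetime(2018, 8, 17, 2, 0)
    readings = [5, 8, 90, 240, 130, 10, 7, 75, 6]
    samples = [(base + timedelta(minutes=5 * i), float(v))
               for i, v in enumerate(readings)]
    for start, end, peak in latency_breaches(samples, threshold_s=60):
        print(f"{start:%H:%M} to {end:%H:%M}: peak {peak:.0f}s")
```

Knowing when each breach started and ended gives you a time window to line up against batch jobs, backups, or journal activity when you go looking for the cause.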


