Rocket UniData and UniVerse Replication best practices: preparation
Part 1 of 3
This blog post offers best practices for implementing and using U2 Replication as part of a High Availability/Disaster Recovery (HA/DR) strategy. In Part 1, I’ll focus on preparation.
I make no apology for saying this: U2 Replication is NOT an administration-free option. In the event of a failure, your organization will absolutely depend on Replication.
This blog series will help you to:
- Avoid common pitfalls that can be encountered in U2 Replication
- Understand the methods and strategies that have worked well for others
- Understand the tools that are available for monitoring and tuning of replication
HA/DR 101
The first thing to do when setting up your Replication machines – primary and standby being the most common configuration – is to make sure that both machines have fixed IP addresses, i.e. addresses that never change. Then set up a high-level name at the DNS level, such as ‘live’, that clients use to connect to the current primary (publishing) machine.
While the primary machine is the publisher, ‘live’ is mapped to that machine’s IP address by means of a network routing change (e.g. DNS or router). When the standby (subscribing) machine becomes the failed-over publisher, you re-map ‘live’ to its IP address.
This is a really easy way to keep one configuration for all your client connections (web servers, SBClient, wIntegrate, Dynamic Connect, Telnet, etc.): nothing has to change in the event of a failover. The alternative is having to reconfigure every client connection, or instruct every user to switch to an alternative configuration file, when a failover happens.
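For illustration only, the idea looks like this (the addresses and the hosts-file style presentation are hypothetical; in practice the switch would be a DNS record or router change):

```
Normal operation:   live -> 192.0.2.10   (primary, current publisher)
After failover:     live -> 192.0.2.20   (standby, now the publisher)
```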
When you look at the systems involved in Replication (because of some of the calls we make), it’s a best practice to identify the systems by their host names rather than their IP addresses. Make sure the host names you’re using don’t have any problems. Here’s how you do it:
- Use the OS command ‘hostname’ to see what the OS thinks the system is called.
- Use a ping from an outside source to ensure the machines can be pinged.
- Use the OS command ‘nslookup’ to check for mismatches between server names and IP addresses.
- Look in the /etc/hosts file to verify that the systems involved are listed and correct. If entries have been added for these machines, that’s a very good indication that someone has already experienced a problem identifying the machines involved in Replication, so please check carefully. Treat /etc/hosts as a last resort; best practice is to rely on your OS and DNS lookups to resolve host names and their IP addresses.
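The checks above can be sketched in a small script. This is a minimal illustration, not a U2 tool: it cross-checks forward and reverse lookups for each host, and the host names used are hypothetical examples to be replaced with your own primary and standby names.

```python
# Illustrative sketch: sanity-check name resolution for the replication hosts.
import socket

def check_host(name):
    """Report the forward lookup for a host and warn on reverse-lookup mismatches."""
    try:
        ip = socket.gethostbyname(name)                  # forward lookup
        print(f"{name} -> {ip}")
        rev, _aliases, _addrs = socket.gethostbyaddr(ip)  # reverse lookup
        if rev.split(".")[0].lower() != name.split(".")[0].lower():
            print(f"  WARNING: reverse lookup returned {rev}, expected {name}")
    except OSError as exc:
        print(f"  lookup failed for {name}: {exc}")

# What does the OS think this machine is called? (equivalent of 'hostname')
print("this machine thinks it is:", socket.gethostname())

for host in ("primary.example.com", "standby.example.com"):  # hypothetical names
    check_host(host)
```

A warning or lookup failure here is exactly the kind of problem to resolve at the DNS level before turning Replication on.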
When you’re looking at the two systems, there are some simple, basic checks to help you verify that the two systems that have been identified can talk to each other.
- Can the machines ‘ping’ each other?
- Note: “ping” may be blocked on some networks by Administrator choice.
- Can you ‘telnet’ from one machine to the other?
- A connection should succeed, though there will not be any visible output.
- Can both systems ‘telnet’ to the unirpcd listening port – each to the other?
- Do you need to open the unirpcd port in the firewall?
- Some systems may have both a “hosts.allow” and “hosts.deny” file that you may need to check.
- Make sure the unirpcservices file has been populated correctly, i.e. the replication services are listed (this is handled by the installation and upgrade processes)
- rmconn’nn’ and unirep’nn’ for UniData; uvsub and uvrmconn for UniVerse – without these, Replication won’t be able to sync the subscriber with the publisher
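The port check in that list can be scripted rather than done with an interactive telnet. The sketch below is an illustration, not a U2 utility: 31438 is the commonly used default unirpcd port, but confirm the real value in /etc/services on your systems, and the host name shown is a hypothetical placeholder.

```python
# Illustrative sketch: test TCP connectivity to the unirpcd listening port.
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Hypothetical host; 31438 is the usual unirpcd default - verify in /etc/services.
print("unirpcd reachable:", port_open("standby.example.com", 31438))
```

Run it from the primary pointing at the standby and vice versa; a False on either side usually means a firewall rule, a hosts.allow/hosts.deny restriction, or a stopped unirpcd.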
Another best practice is to understand how many accounts are going to be replicated. Doing this will give you a good understanding of the complexity, structure, size, volume throughput, and inter-linking of those accounts. I recommend this because many of us are used to putting entries in our VOC files that point all over the place (i.e. a lot of inter-linking between accounts and files). As I said, understanding this at an early stage will give you an early indication of how complicated the group configuration of Replication will need to be, and of the tuning that will be necessary.
Currently, the key information needed to tune the configuration is only available via Monitoring once Replication is turned on. We are planning to provide a ‘dry run’ tool that will collect the file traffic statistics needed for configuration prior to implementation. Until then, this information can only be estimated, and achieving the best result requires an iterative cycle of configuration changes and monitoring.
Another thing to consider, from an accounts perspective, is the list of files that don’t need to be replicated. A standard file exclusion list is included in the repacct.def file, but think about other files that don’t need to be replicated. Pattern matching on file exclusions is supported: you can use a three-dot wildcard (...) before or after any piece of text. A common example is temporary or work files that would not be needed in the event of failing over to another machine (e.g. TMP001 or WORKFILE104 can be excluded). These files, and any patterns matching them, can be added to the repacct.def file so you don’t have to repeat the exclusion in multiple account-level groups.
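As an illustration of the wildcard idea only – the file names are made up, the parenthetical annotations are not part of the file, and you should check the comments in the repacct.def shipped with your release for the exact entry syntax:

```
TMP...        (matches TMP001, TMP002, and any other name starting TMP)
WORKFILE...   (matches WORKFILE104 and any other name starting WORKFILE)
...SAVE       (matches any file name ending in SAVE)
```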
One thing we allow during Replication is the ability to create or delete files on the fly. Because we store a map in shared memory, called the Replication Object table, and the size of that table is fixed when UniData or UniVerse starts, you need to understand how many files you will create and delete so you can set the reserved file space parameter accurately. If you exhaust the reserved file space, Replication will fail with an error message that the Replication Object table is full. If this happens, issue a reconfiguration command to get Replication up and running again without having to restart UniData or UniVerse. In short, being prepared helps, especially with more advanced configurations; for example, you can specify file names that don’t exist yet, so that when the files are created they come under the control of a particular group. I’d also like to point you to the documentation on RESERVED_FILE_SPACE: see page 32 of the UniVerse U2 Data Replication User Guide and page 62 of the UniData U2 Data Replication User Guide.
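As a purely illustrative sketch – the parameter name comes from the documentation cited above, but the placement and value shown here are assumptions to be checked against the replication configuration file for your release:

```
RESERVED_FILE_SPACE 200    (head-room in the Replication Object table for files created or deleted after startup)
```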
This next point is not essential, but it is useful to know whether the accounts to be replicated all have entries in the UD.ACCOUNT (UniData) or UV.ACCOUNT (UniVerse) file. These entries help because the path resolution for an account name can be defined in several places for U2 Replication, and having the same account name defined in multiple places will result in confusion. In addition, to use XAdmin, the account names need to be defined in the UD.ACCOUNT/UV.ACCOUNT file or in the repsys file. Defining the account names in the repsys file has an advantage: you can develop a configuration that is very portable to your customers’ machines. One caveat – don’t define an account name in the repsys file that already exists in the UD.ACCOUNT/UV.ACCOUNT file, because this will prevent Replication from starting. You can contact Support for a tool I wrote, part of the monitor phantom / script deployment package, to help you avoid this from the start.
Another best practice when setting up Replication is to make sure the systems involved can send an email to someone in the organization. Although not essential, this ability makes it easy to deploy the replication monitoring tools and exception action scripts. Please note that Support can’t help you set up email, because of the number of variables involved (it depends on your OS). Example exception action scripts ship with the latest versions of UniVerse and UniData for Unix; if you’re running on Windows, they are available on request from the Support SharePoint drive.
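To make the idea concrete, here is a minimal sketch of the kind of notification an exception action script might send. This is not code shipped with U2: the addresses and the SMTP relay name are hypothetical placeholders.

```python
# Illustrative sketch: compose an alert email an exception action script might send.
import smtplib
from email.message import EmailMessage

def build_alert(subject, body):
    """Compose a plain-text alert message (addresses are placeholders)."""
    msg = EmailMessage()
    msg["From"] = "replication@example.com"
    msg["To"] = "oncall@example.com"
    msg["Subject"] = subject
    msg.set_content(body)
    return msg

alert = build_alert("Replication suspended on PRIMARY",
                    "A replication group has entered the suspended state - investigate.")
print(alert["Subject"])

# Sending is entirely environment-specific; enable once your relay is known to work:
# with smtplib.SMTP("mailhub.example.com") as smtp:
#     smtp.send_message(alert)
```

The point of verifying email early is that the monitoring tools and example scripts assume a working relay; a script like this is an easy end-to-end test of that assumption.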
Replication Log File Locations
To finish up the Replication preparation best practices, think about where your Replication Log Files are going to live. If Replication becomes suspended, or your configuration isn’t set up properly, updates are stored in the Replication Log Files before being sent to the subscriber.
In terms of permissions, all users need write access to the LEF (log extension) files, and the U2 Replication processes need write access to all the others. Experience shows customers can be overzealous with security on these files, which will introduce problems.
The size of the Log directory determines how long a system can remain suspended and the level of overflow it can sustain. For most customers our recommendation is a large, separate disk or disk volume for the Log Files. How much space is needed? We can’t answer that exactly at this stage, but generally the larger the better. Although not essential for a proof of concept, experience shows this can become a critical factor very quickly.
I hope you found Part 1 useful. Look for information on Monitoring in Part 2. In the meantime, if you want to hear the entire Replication Best Practices webinar, please feel free to listen and share it within your organization.