Saturday, July 16, 2016

Retrospective Automatic Image Replication in NetBackup

In version 7.x of NetBackup, VERITAS added the Automatic Image Replication functionality, more commonly referred to as "AIR". Its primary use case is to let a NetBackup administrator easily configure data replication between two different (typically geographically dispersed) NetBackup domains.

Like many tools that are designed for a given use-case, AIR can be used for things it wasn't specifically designed for. The primary down-side to these not-designed-for use-cases is that the documentation and tool-sets for such usage are generally pretty thin.

A customer I was assisting wanted to upgrade their appliance-based NetBackup system but didn't want to have to give up their old data. Because NetBackup appliances use Media Server Deduplication Pools (MSDP), I had a couple of choices in how to handle their upgrade. I opted to try to use AIR to help me quickly and easily migrate data from their old appliance's MSDP to their new appliance's.

Sadly, as is typical of a not-designed-for use-case, documentation for doing it was kind of thin on the ground. Worse, because Symantec had recently spun VERITAS back off as its own entity, many of the forums that survived the transition had reference- and discussion-links that pointed to nowhere. Fortunately, I had access to a set of laboratory systems (AWS/Azure/Google Cloud/etc. are great for this, both from the standpoint of setup speed and "ready to go" OS templates). I was also able to borrow some of my customer's NetBackup 7.7 keys to use for the testing.

I typically prefer to work with UNIX/Linux-based systems to host NetBackup. However, my customer is a Windows-based shop. My customer's planned migration was also going to have the new NetBackup domain hosted on a different VLAN from their legacy NetBackup domain. This guided my lab design: I created a cloud-based "lab" configuration using two subnets and two Windows Server 2012 instance-templates. I set up each of my instances with enough storage to host the NetBackup software on one disk and the MSDPs on another disk, and provisioned each of my test master servers with four CPUs and 16GiB of RAM. This is considerably smaller than both their old and new appliances, but I also wasn't trying to simulate an enterprise outpost's worth of backup traffic. I also set up a mix of about twenty Windows and Linux instances to act as testing clients (the customer is beginning to add Linux systems as virtualization and Linux-based "appliances" have started to creep into their enterprise-stacks).

I set up two very generic NetBackup domains. Into each, I built an MSDP. I also set up a couple of very generic backup policies on the source domain's NetBackup Master Server to back up all of the testing clients to the MSDP. I configured the policies for daily fulls and hourly incrementals, and set up each of the clients to continuously regenerate random data-sets in their filesystems. I let this run for forty-eight hours so that I could get a nice amount of seed-data into the source NBU domain's MSDP.

Note: If you're not familiar with MSDP setup, the SETTLERSOMAN website has a good, generic walkthrough.

After populating the source site's MSDP, I converted from using the MSDP by way of a direct Storage Unit (STU) definition to using it by way of a two-stage Storage Lifecycle Policy (SLP). I configured the SLP to use the source-site MSDP as the stage-one destination in the lifecycle and added the second NBU domain's MSDP as the stage-two destination in the lifecycle. I then seeded the second NBU domain's MSDP with data by executing a full backup of all clients against the SLP.

Note: For a discussion on setting up an AIR-based replication SLP, again, the SETTLERSOMAN website has a good, generic walkthrough.

All of the above is fairly straight-forward and well documented (both within the NBU documentation and sites like SETTLERSOMAN). However, it only addresses the issue of how you get newly-generated data from one NBU domain's MSDP to another's. Getting older data from an existing MSDP to a new MSDP is a bit more involved ...and not for the command-line phobic (or, in my case, PowerShell-phobic.)

At a high level, what you do is:
  1. Use the `bpimmedia` tool to enumerate all of the backup images stored on the source-site's MSDP
  2. Grab only the backup IDs (image-IDs) of the enumerated backup images
  3. Feed that list of backup IDs to the `nbreplicate` tool so that it can copy that old data to the new MSDP

Note: The vendor documentation for the `bpimmedia` and `nbreplicate` tools can be found at the VERITAS website.

When using the `bpimmedia` tool to automate image-ID enumeration, using the `-l` flag puts the output into a script-parsable format. The desired capture-item is the fourth field in all lines that begin 'IMAGE':
  • In UNIX/Linux shell, use an invocation similar to: `bpimmedia -l | awk '/^IMAGE/{print $4}'`
  • In PowerShell, use an invocation similar to: `bpimmedia -l | select-string -pattern "IMAGE *" | ForEach-Object { $data = $_ -split " " ; "{0}" -f $data[3] }`
The above output can then be either captured to a file, so that one `nbreplicate` job can be launched to handle all of the images, or each individual image-ID can be passed to an individual `nbreplicate` job (typically via a command-pipeline in a foreach script). I ended up doing the latter because, even though the documentation indicates that the tool supports specifying an image-file, when executed under PowerShell, `nbreplicate` did not seem to know what to do with said file.
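
For the capture-to-a-file option, a minimal sketch of the UNIX/Linux side might look like the following. The function name `extract_backup_ids` and the output path in the comment are hypothetical names chosen for illustration, and the `sort -u` is only a precaution in case an ID shows up more than once. The function reads from standard input so it can be tried against saved `bpimmedia -l` output before pointing it at a live master server.

```shell
#!/bin/sh
# Print the unique backup IDs found on stdin.
# Assumes the layout described above: the backup ID is the fourth
# whitespace-separated field of every line that begins with IMAGE.
extract_backup_ids() {
    awk '/^IMAGE/ { print $4 }' | sort -u
}

# On a master server the function would be fed by the real tool, e.g.:
#   bpimmedia -l | extract_backup_ids > /tmp/backup_ids.txt
# (the output path is just an example)
```

Keeping the list in a file also makes it easy to eyeball how many images are about to be replicated before kicking anything off.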

The `nbreplicate` command has several key flags we're interested in for this exercise:
  • -backupid: The backup-identifier captured via the `bpimmedia` tool
  • -cn: The copy-number to replicate — in most circumstances, this should be "1"
  • -rcn: The copy-number to assign to the replicated backup-image — in most circumstances, this should be "1"
  • -slp_name: The name of the SLP hosted on the destination NetBackup domain
  • -target_sts: The FQDN of the destination storage-server (use `nbemmcmd -listhosts` to verify names, or the replication jobs will fail with a status 191, sub-status 174)
  • -target_user: The username of a user that has administrative rights to the destination storage-server
  • -target_pwd: The password of the -target_user username
 If you don't care about minimizing the number of replication operations, this can all be put together similar to the following:
  • For Unix:
    for ID in $(bpimmedia -l | awk '/^IMAGE/{print $4}')
    do
       nbreplicate -backupid ${ID} -cn 1 -slp_name <REMOTE_SLP_NAME> \
         -target_sts <REMOTE_STORAGE_SERVER> -target_user <REMOTE_USER> \
         -target_pwd <REMOTE_USER_PASSWORD>
    done
    
  • For Windows:
    @(bpimmedia -l | select-string -pattern "IMAGE *" |
       ForEach-Object { $data = $_ -split " " ; "{0}" -f $data[3] }) |
       ForEach-Object { nbreplicate -backupid $_ -cn 1 `
         -slp_name <REMOTE_SLP_NAME> -target_sts <REMOTE_STORAGE_SERVER> `
         -target_user <REMOTE_USER> -target_pwd <REMOTE_USER_PASSWORD> }
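
With a lot of old images, it can help to keep track of which replications failed so they can be retried later. The following is a rough sketch of the UNIX loop reworked to do that, not something I ran as part of the migration. It assumes `nbreplicate` returns a non-zero exit status when a replication fails. The function name `replicate_ids`, the two file arguments and the `SLP_NAME`, `TARGET_STS`, `TARGET_USER` and `TARGET_PWD` variables are all hypothetical names made up for the example.

```shell
#!/bin/sh
# Replicate every backup ID listed in a file (one ID per line) and
# record the IDs whose nbreplicate call failed.
# Assumption: nbreplicate exits non-zero on failure.
# SLP_NAME, TARGET_STS, TARGET_USER and TARGET_PWD are placeholder
# variables that must be set by the caller.
replicate_ids() {
    id_file=$1      # file of backup IDs, one per line
    fail_log=$2     # file that collects the IDs that failed
    : > "$fail_log" # start with an empty failure log
    while IFS= read -r id; do
        [ -n "$id" ] || continue   # skip blank lines
        # </dev/null keeps the command from consuming the ID list on stdin
        if ! nbreplicate -backupid "$id" -cn 1 -slp_name "$SLP_NAME" \
               -target_sts "$TARGET_STS" -target_user "$TARGET_USER" \
               -target_pwd "$TARGET_PWD" < /dev/null; then
            echo "$id" >> "$fail_log"
        fi
    done < "$id_file"
}
```

Called as `replicate_ids ids.txt failed_ids.txt`, any IDs left in the second file can be fed back through the same function once the underlying problem is fixed.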