Tuesday, May 28, 2013

Fixing CIFS ACLs On a DataDomain

On my current project, our customer makes use of DataDomain storage to provide nearline backups of system data. Backups to these devices are primarily done through tools like Veeam (for ESX-hosted virtual machines and associated application data) and NetBackup.

However, a small percentage of the tenants my customer hosts are "self backup" tenants. In this case, "self backup" means that, rather than leveraging the enterprise backup frameworks, the tenants are given an NFS or CIFS share directly off of the DataDomain that is: A) closest to their system(s); and, B) has the most free space to accommodate their data.

"Self backup" tenants that use NFS shares are minimally problematic. Most of the backup problems come from the fact that DataDomains weren't really designed for multi-tenancy. Things like quota controls are fairly lacking. So, it's possible for the tenants of a shared DataDomain to screw each other over by either soaking up all of the device's bandwidth or soaking up all the space.

Still, those problems aside, providing NFS service to tenants is fairly straight-forward. You create a directory, you go into the NFS share-export interface, create the share and access controls and call it a day. CIFS shares, on the other hand...

While we'd assumed that providing CIFS service would be on a par with providing NFS service, it's proven to be otherwise. While the DataDomains provide an NTFS-style ACL capability in their filesystem, it hasn't proven to work quite as one might expect.

The interface for creating shares allows you to set share-level access controls based on both calling-host as well as assigned users and/or groups. One would reasonably assume that this would mean that the correct way to set up a share is to export it with appropriate client-allow lists and user/group-allow lists and that the shares would be set with appropriate filesystem permissions automagically. This isn't exactly how it's turned out to work.

What we've discovered is that you pretty much have to set the shares up as being universally accessible from all CIFS clients and that you grant global "full control" access to the top-level share-folder. Normally, this would be a nightmare, but, once created, you can lock the shares down. You just have to manage the NTFS attributes from a Windows-based host. Basically, you create the share, present it to a Windows-based administrative host, then use the Windows folder security tools to modify the permissions on the share (e.g., remove all the "Everyone" rights, then manually assign appropriate ownerships and POSIX groups to the folder and set up the correct DACLs).

From an engineering perspective, it means that you have to document the hell out of things and try your best to train the ops folks on how to do things The Right Way™. Then, with frequent turnovers in Operations and other "shit happens" kind of things, you have to go back and periodically audit configurations for correctness and repair the brokenness that has crept in.

Unfortunately, one of the biggest sources of brokenness that creeps in is broken permissions structures. When doing the initial folder-setup, it's absolutely critical that the person setting up the folder remembers to click the "Replace all child object permissions with inheritable permissions from this object" checkbox (accessed by clicking on the "Change Permissions" button within the "Advanced Security Settings" section for the folder). Failure to do so means that each folder, subfolder and file created (by tenants) in the share has its own, tenant-created permissions structure. What this results in is a share whose permissions are not easily maintainable by the array-operators. Ultimately, it results in trouble tickets opened by tenants whose applications and/or operational folks eventually break access for themselves.

Once those tickets come in, there's not much that can be easily done if the person who "owns" the share has left the organization. If you find yourself needing to fix such a situation, you need to either involve DataDomain's support staff to fix it (assuming your environment is reachable via a WebEx-type of support session) or get someone to slip you instructions on how to access the array's "Engineering Mode".

There are actually two engineering modes: there's the regular SE shell and the BASH shell. The SE shell is basically a super-set of the regular system administration CLI. The BASH shell is basically a Linux BASH shell with DataDomain-specific management commands enabled. For the most part, the two modes are interchangeable. However, if you need the ability to do mass modifications or script on your array, you'll need to access the DataDomain's BASH shell mode to do it. See my prior article on accessing the DataDomain's BASH shell mode.

Once you've gotten the engineering BASH shell, you have pretty much unfettered access to the guts of the DataDomain. The BASH shell is pretty much the same as you'd encounter on a stock Linux system. Most of the GNU utilities you're used to using will be there and will work the same way they do on Linux. You won't have man pages, so, if you forget flags to a given shell command, look them up on a Linux host that has the man pages installed. In addition to the standard Linux commands will be some DataDomain-specific commands. For the purposes of fixing your NTFS ACL mess, you'll be wanting to use the "dd_xcacls" command:

  • Use "dd_xcacls -O '[DomainObject]' [Object]" to set the Ownership of an object. For example, to set the ownership attribute to your AD domain account, issue the command "dd_xcacls -O 'MDOMAIN\MYUSER' /data/col1/backup/ShareName".
  • Use "dd_xcacls -G '[DomainObject]' [Object]" to set the POSIX group of an object. For example, to set the POSIX group attribute to your AD domain group, issue the command "dd_xcacls -G 'MDOMAIN\MYGROUP' /data/col1/backup/ShareName".
  • Use "dd_xcacls -D '[ActiveDirectorySID]:[Setting]/[ScopeMask]/[RightsList]' [Object]" to set the DACL of an object. For example, to give "Full Control" rights to your domain account, issue the command "dd_xcacls -D 'MDOMAIN\MYUSER:ALLOW/4/FullControl' /data/col1/backup/ShareName".

A couple of notes apply to the immediately preceding:

  1. While the "dd_xcacls" command can notionally set rights-inheritance, I've discovered that this isn't 100% reliable in the DDOS 5.1.x family. It will likely be necessary that once you've placed the desired DACLs on the filesystem objects, you'll need to use a Windows system to set/force inheritance onto objects lower in the filesystem hierarchy.
  2. When you set a DACL with "dd_xcacls -D", it replaces whatever DACLs are in place. Any permissions previously on the filesystem object will be removed. If you want more than one user/group DACL applied to the filesystem-object, you need to apply them all at once. Use the ";" token to separate DACLs within the quoted text-argument to the "-D" flag.
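
To illustrate the second note, here is a minimal sketch of assembling several DACL entries into the single ";"-separated argument before handing it to "dd_xcacls -D". The `build_dacl_arg` helper and the `MDOMAIN\MYGROUP` group are hypothetical names made up for the illustration; the scope mask and rights are simply reused from the example above. It only prints the resulting command (a dry run) so you can eyeball it first:

```shell
#!/bin/bash
# Join several DACL entries into one ';'-separated string, so that a single
# "dd_xcacls -D" call applies all of them (separate calls would overwrite
# each other). build_dacl_arg and MDOMAIN\MYGROUP are illustrative names.

build_dacl_arg() {
    local IFS=';'          # "$*" joins the arguments with the first character of IFS
    printf '%s\n' "$*"
}

dacl_arg=$(build_dacl_arg \
    'MDOMAIN\MYUSER:ALLOW/4/FullControl' \
    'MDOMAIN\MYGROUP:ALLOW/4/FullControl')

# Dry run: print the command rather than executing it.
printf "dd_xcacls -D '%s' %s\n" "$dacl_arg" /data/col1/backup/ShareName
```

Once the printed command looks right, paste it into the engineering BASH shell (or drop the `printf` wrapper) to apply it.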

Because you'll need to fix all of your permissions, one at a time, from this mode, you'll want to use the Linux `find` command to power your use of the "dd_xcacls" command. On a normal Linux system, when dealing with filesystems that have spaces in directory or file object-names, you'd do something like `find [DIRECTORY] -print0 | xargs -0 [ACTION]` to handle this more efficiently. However, that doesn't seem to work the way it does on a generic Linux system, at least not on the DDOS 5.x systems I've used. Instead, you'll need to use `find [Directory] -exec [dd_xcacls command-string] {} \;`. This is very slow and resource-intensive. On a directory structure with thousands of files, this can take hours to run. Further, because of how resource-intensive this method is, you won't be able to run more than one such job at a time. Attempting to do so will result in SEGFAULTs - and the more you attempt to run concurrently, the more frequent the SEGFAULTs will be. These SEGFAULTs will cause individual "dd_xcacls" iterations to fail, potentially leaving random filesystem objects' permissions unmodified.
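
Since failed iterations can silently leave objects untouched, it can help to record which objects failed so they can be retried afterwards. The following is a rough sketch of the `find ... -exec ... \;` approach with a failure log bolted on. The `fix_tree` function, the `DD_XCACLS` variable and the log path are my own inventions for the illustration, and it assumes both that `sh` is available in the array's BASH shell and that "dd_xcacls" exits non-zero when it fails (a process killed by a SEGFAULT does, but I haven't verified the other failure cases). By default it does a dry run that only echoes the commands:

```shell
#!/bin/bash
# Apply one dd_xcacls call per filesystem object, strictly one at a time,
# and log every object whose call failed so it can be retried later.
# DD_XCACLS defaults to a dry run ("echo dd_xcacls"); run
# "export DD_XCACLS=dd_xcacls" on the array to make real changes.

fix_tree() {
    local top=$1 dacl=$2 faillog=$3
    : > "$faillog"                      # start with an empty failure log
    # -exec ... \; spawns one command per object; the sh -c wrapper lets us
    # append the path to the log whenever that command exits non-zero.
    find "$top" -exec sh -c '
        ${DD_XCACLS:-echo dd_xcacls} -D "$1" "$2" || printf "%s\n" "$2" >> "$3"
    ' sh "$dacl" {} "$faillog" \;
}

# Example (keep the log outside the tree being fixed):
#   fix_tree /data/col1/backup/ShareName 'MDOMAIN\MYUSER:ALLOW/4/FullControl' /tmp/dacl-failures.log
```

After a run, any paths listed in the failure log are the ones to re-run by hand. Because each object still costs one process launch, this is no faster than the plain `find -exec` form; it just makes the failures visible.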

Friday, May 24, 2013

DataDomain Bash Shell (or: So You Want to Wreck Your DataDomain)

When dealing with the care and feeding of DataDomain arrays, there are occasions where it helps to know how to access the array's "Engineering Mode". In actuality, there are two levels of engineering mode for DataDomains:

  • SE Shell: the SE (system engineer) shell mode is a superset of the normal system administration shell. It includes all of the management commands of the normal administration shell plus some powerful utilities for doing lower-level maintenance tasks on your DataDomain. These include things like fixing ACLs on your CIFS shares, changing networking settings (e.g., timeouts related to OST sessions) and other knobs that are nice to be able to twizzle.
  • BASH Shells: While the SE shell mode gives you more utilities for managing the array, they're still wrapped in the overall DDOS command-shell construct. The BASH shell mode is pretty much just like a normal root shell on a Linux system: you're able to script tasks in it, use tools like `find`, etc. Take all the damage you can do in the SE mode and add on the capability of doing those tasks on a massive, automated scale.
While enabling SE mode can be likened to enabling you to shoot your foot off with a .22, the BASH mode could be likened to enabling you to shoot your foot off with a howitzer. Where SE mode is merely dangerous, I can't really begin to characterize the level of risk you expose yourself to when you start taking full advantage of the DataDomain's BASH shell.

Since accessing either of these modes isn't well-documented (though there's a decent number of Google searches that will turn up the basic "SE" mode) and I use this site as a personal reminder on how to do things, I'm going to put the procedures here.

Please note: use of engineering mode allows you to do major amounts of damage to your data with a frightening degree of ease and rapidity. Don't try to access engineering mode unless you're fully prepared to have to re-install your DataDomain - inclusive of destroying what's left of the data on it.

Accessing SE Mode:
  1. SSH to the DataDomain.
  2. Log in with an account that has system administrator privileges (this may be one of the default accounts your array was installed with, a local account you've set up for the purpose or an Active Directory managed account that has been placed into an Active Directory security-group that has been granted the system administrator role on the DataDomain).
  3. Get the array's serial number. The easiest way to do this is to type `system show serialno` at the default command prompt.
  4. Access SE mode by typing `priv set se`. You will be prompted for a password - the password is the serial number from the prior step.
At this point, your command prompt will change to "SE@<ARRAYNAME>" where "<ARRAYNAME>" will be the nodename of your DataDomain. While in this mode, an additional command-set will be enabled. These commands are accessed by typing "se". You can get a further listing of the "se" sub-commands in much the same way you can get help at the normal system administration shell (in this particular case: by typing "se ?").


Accessing the SE BASH Shell:
Once you're in SE mode, the following command-sequence will allow you to access the engineering mode's BASH shell:

  1. Type "fi st"
  2. Type "df"
  3. Type <CTRL>-C three times
  4. Type "shell-escape"
At this point, a warning banner will come up to remind you of the jeopardy you've put your configuration in. The prompt will also change to include a warning. This is DataDomain's way of reminding you, at every step, the danger of the access-level you've entered.

Once you've gotten the engineering BASH shell, you have pretty much unfettered access to the guts of the DataDomain. The BASH shell is pretty much the same as you'd encounter on a stock Linux system. Most of the GNU utilities you're used to using will be there and will work the same way they do on Linux. You won't have man pages, so, if you forget flags to a given shell command, look them up on a Linux host that has the man pages installed.

In addition to the standard Linux commands will be some DataDomain-specific commands. These are the commands that are accessible from the "se" command and its subcommands. The primary use-case for exercising these commands in BASH mode is that the BASH mode is pretty much as fully-scriptable as a root prompt on a normal Linux host. In other words, take all the danger and power of SE mode and wrap it in the sweaty-dynamite of an automated script (you can do a lot of modifications/damage by horsing the se sub-commands to a BASH `find` command or script).