Showing posts with label authentication. Show all posts

Wednesday, July 3, 2024

Implementing (Pseudo) Profiles in Git (Part 2!)

As noted in my first Implementing (Pseudo) Profiles in Git post:


I'm an automation consultant for an IT contracting company. Using git is a daily part of my work-life. … Then things started shifting, a bit. Some customers wanted me to use my corporate email address as my ID. Annoying, but not an especially big deal, by itself. Then, some wanted me to use their privately-hosted repositories and wanted me to use identities issued by them.

This led me down the path of setting up multiple git "profiles", which I captured in my first article on this topic. To better support such per-project identities, it's also a good habit to use per-project authentication methods. I generally prefer to do git-over-SSH – rather than git-over-http(s) – when interfacing with remote Git repositories. Because I don't like having to keep re-entering my password, I use an SSH agent to manage my keys. When you only have one or two projects you regularly interface with, that means a key-agent that only needs to store a couple of authentication keys.

Unfortunately, if you have more than one key in your SSH agent, when you attempt to connect to a remote SSH service, the agent will iteratively present keys until the remote accepts one of them. If you've got three or more keys in your agent, the agent could present 3+ keys to the remote SSH server. By itself, this isn't a problem: the remote logs the earlier-presented keys as authentication failures, but otherwise lets you go about your business. However, if the remote SSH server is hardened, it very likely will be configured to lock your account after the third authentication failure. As such, if you've got four or more keys in your agent and the remote requires a key that your agent doesn't present in the first three authentication attempts, you'll find your account for that remote SSH service getting locked out.

What to do? Launch multiple ssh-agent instantiations.

Unfortunately, without modifying the default behavior, when you invoke the ssh-agent service, it will create a (semi) randomly-named UNIX domain-socket to listen for requests on. If you've only got a single ssh-agent instance running, this is a non-problem. If you've got multiple – particularly if you're using a tool like direnv – setting up your SSH_AUTH_SOCK in your .envrc files is problematic if you don't have predictably-named socket-paths.

How to solve this conundrum? Well, I finally got tired of having to run `eval $( ssh-agent )` in per-project Xterms every time I rebooted my dev-console. So, I started googling and, ultimately, just dug through the man page for ssh-agent. In doing the latter, I found:

DESCRIPTION
     ssh-agent is a program to hold private keys used for public key authentication.  Through use of environment variables the
     agent can be located and automatically used for authentication when logging in to other machines using ssh(1).

     The options are as follows:

     -a bind_address
             Bind the agent to the UNIX-domain socket bind_address.  The default is $TMPDIR/ssh-XXXXXXXXXX/agent..
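That -a option is the key: it lets each agent listen on a predictable, per-customer socket. A minimal sketch (the socket and key paths below are hypothetical; the target directory has to exist and the socket must not already be present):

$ mkdir -p ~/.ssh/agents
$ eval "$( ssh-agent -a ~/.ssh/agents/customer_1.sock )"   # agent now listens on a known socket-path
$ ssh-add ~/.ssh/customer_1_rsa                            # load only this customer's key into it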

So, now I can add appropriate command-aliases to my bash profile(s) (which I've already moved to ~/.bash_profile.d/<PROJECT>) that can be referenced based on where I am in my dev-console's filesystem hierarchy, and can set up my .envrcs, too. Result: if I'm in <CUSTOMER_1>/<PROJECT>/<TREE>, I get attached to an ssh-agent set up for that customer's project(s); if I'm in <CUSTOMER_2>/<PROJECT>/<TREE>, I get attached to an ssh-agent set up for that customer's project(s); etc. For example:

$ cd ~/GIT/work/Customer_1/
direnv: loading ~/GIT/work/Customer_1/.envrc
direnv: export +AWS_CONFIG_FILE +AWS_DEFAULT_PROFILE +AWS_DEFAULT_REGION ~SSH_AUTH_SOCK
ferric@fountain:~/GIT/work/Customer_1 $ ssh-add -l
3072 SHA256:oMI+47EiStyAGIPnMfTRNliWrftKBIxMfzwYuxspy2E SH512-signed RSAv2 key for Customer 1's Project (RSA)
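Behind that output are two small pieces. A minimal sketch of both – the socket path, profile-file name, and key name here are hypothetical stand-ins:

# ~/.bash_profile.d/Customer_1 - alias to start this customer's agent if its socket isn't live yet
alias agent-customer1='[[ -S ~/.ssh/agents/customer_1.sock ]] || eval "$( ssh-agent -a ~/.ssh/agents/customer_1.sock )"'

# ~/GIT/work/Customer_1/.envrc - direnv points every shell in this tree at that same agent
export SSH_AUTH_SOCK="${HOME}/.ssh/agents/customer_1.sock"

The alias only needs running once per reboot; after that, direnv swaps SSH_AUTH_SOCK automatically whenever you change into a given customer's tree.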

Tuesday, October 2, 2018

S3 And Impacts of Using an IAM Role Instead of an IAM User

I work for a group that manages a number of AWS accounts. To keep things somewhat simpler from a user-management perspective, we use IAM roles to access the various accounts rather than IAM users. This means that we can manage users via Active Directory: as new team members are added, existing team members' responsibilities change, or team members leave, all we have to do is update their AD user-objects and the breadth and depth of their access-privileges change accordingly.

However, the use of IAM Roles isn't without its limitations. Role-based users' accesses are granted by way of ephemeral tokens. Tokens last, at most, 3600 seconds. If you're a GUI user, it's kind of annoying because it means that, across an 8+ hour work-session, you'll be prompted to refresh your tokens at least seven times (on top of the initial login). If you're working mostly via the CLI, you won't necessarily notice, as each command you run silently refreshes your token.

I use the caveat "won't necessarily notice" because there are places where using an IAM Role can bite you in the ass. Worse, it will bite you in the ass in a way that may leave you scratching your head wondering "WTF isn't this working as I expected".

The most recent adventure in "WTF isn't this working as I expected" was in the context of trying to use S3 pre-signed URLs. If you haven't used them before, pre-signed URLs allow you to provide temporary access to S3-hosted objects that you otherwise don't want to make anonymously-available. In the grand scheme of things, one can do one of the following to provide access to S3-hosted data for automated tools:
  • Public-read object hosted in a public S3 bucket: this is a great way to end up in news stories about accidental data leaks
  • Public-read object hosted in a private S3 bucket: you're still providing unfettered access to a specific S3-hosted object/object-set, but some rando can't simply explore the hosting S3 bucket for interesting data. You can still end up in the newspapers, but, the scope of the damage is likely to be much smaller
  • Private-read object hosted in a private S3 bucket: the least-likely to get you in the newspapers, but requires some additional "magic" to allow your automation access to S3-hosted files:
    • IAM User Credentials stored on the EC2 instance requiring access to the S3-hosted files: a good method, right until someone compromises that EC2 instance and steals the credentials. Then, the attacker has access to everything those credentials had access to (until such time that you discover the breach, and have deactivated or changed the credentials)
    • IAM Instance-role: a good way of providing broad-scope access to S3 buckets to an instance or group of instances sharing a role. Note that, absent some additional configuration trickery, every process running on an enabled instance has access to everything that the Instance-role provides access to. Thus, probably not a good choice for systems that allow interactive logins or that run more than one, attackable service.
    • Pre-signed URLs: a way to provide fine-grained, temporary access to S3-hosted objects. The primary downfall is that there's significant overhead in setting up access to a large collection of files or providing continuing access to said files. Thus, likely suitable for providing basic CloudFormation EC2 access to configuration files, but not if the EC2s are going to need ongoing access to said files (as they would in, say, an AutoScaling Group type of use-case)
There are likely other access methods one can play with - each with its own security, granularity and overhead tradeoffs. The above are simply the ones I've used.

The automation I write tends to include a degree of user-customizable content. Which is to say, I write my automation to take care of 90% or so of a given service's configuration, then hand the automation off to others to use as they see fit. To help prevent the need for significant code-divergence in these specific use-cases, my automation generally allows a user to specify the ingestion of configuration-specification files and/or secondary configuration scripts. These files or scripts often contain data that you wouldn't want to put in a public GitHub repository. Thus, I generally recommend to the automation's users "put that data someplace safe - here's methods for doing it relatively safely via S3".

Circling back so that the title of this post makes sense... Typically, I recommend the use of pre-signed URLs for automation-users that want to provide secure access to these once-in-an-instance-lifecycle files. A pre-signed URL's access-grant can be as short as a couple of seconds or as long as seven days. Without specifying a desired duration, access is granted for two hours.

However, that degree of flexibility depends greatly on what type of IAM object is creating the ephemeral access-grant. A standard IAM user can grant with all of the previously-mentioned time-flexibility. An IAM role, however, is constrained to however long its currently-active token is good for. Thus, if executing from the CLI using an IAM role, the grantable lifetime for a pre-signed URL is 0-3600 seconds.
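For illustration, a minimal sketch with the AWS CLI (the bucket and key names are hypothetical):

# request the SigV4 maximum of seven days; signed with long-lived IAM user keys,
# the URL lasts the full requested duration
aws s3 presign s3://my-config-bucket/bootstrap/app.env --expires-in 604800

# run under an assumed IAM role, the same command still emits a URL, but that URL
# stops working once the role's temporary credentials expire (3600 seconds or less),
# regardless of what --expires-in asked for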

If you're unsure whether the pre-signed URL you've created/were given came from an IAM user or an IAM role, look for X-Amz-Security-Token in the URL's query string. Its presence means the URL was signed with temporary credentials - i.e., generated by an IAM role - and will only last up to 3600 seconds.

Tuesday, August 7, 2012

Why Google's Two-Factor Authentication Is Junk

To be fair, I understand the goals that Google was trying to achieve. And, they're starting down a good path. However, there are some serious flaws (as I see it) with how they've decided to treat services that don't support two-factor authentication.

  • Google advertises that you can set per-service passwords for each application. That is to say, if you use third-party mail clients such as Thunderbird, third-party calendaring clients such as Lightning, and third-party chat clients such as Trillian, you can set up a password for each service. Conceivably, one could set one password for IMAP, one password for SMTP, one password for iCAL and yet another password for GoogleTalk. However, Google doesn't actually sandbox the passwords. By "sandbox" I mean restrict a given password to a specific protocol. If I generate four passwords with the intention of using each one for a single service - as Google's per-application passwords would logically be inferred to work - I've actually weakened the security of the associated services. Instead of each service having its own password, each of the four generated passwords can be used with any of the four targeted services. Instead of having one guessable password, there are now four guessable passwords.
  • Google's "per-application" passwords do not allow you to set your own password strings. You have to use their password generator. While I can give Google credit for generating 16-character passwords, the strength of the generated passwords is abysmally low. Google's generated passwords are comprised solely of lower-case characters. When you go to a "how strong is my password" site, Google's generated passwords rate as ridiculously easy to crack. The Google password is rated at "very weak" - a massive cracking array would take 14 years to break it. By contrast, the password I used on my work systems last December is estimated to take the best part of 16,000 centuries. For the record, my work password from last year is two characters shorter than the ones Google generates.

So, what you end up with is X number of services, each of which can be authenticated against with any of X incredibly weak passwords.

All in all, I'd have to rate Google's efforts, at this point, pretty damned close to #FAIL: you have all the inconvenience of two-factor authentication and you actually broaden your attack surfaces if you use anything that's not HTTP/HTTPS based.

Resources:

  1. GRC Password Cracker Estimator
  2. PasswordMeter Password Strength Checker

Tuesday, July 31, 2012

Finding Patterns

I know that most of my posts are of the nature "if you're trying to accomplish 'X', here's a way that you can do it". Unfortunately, this time, my post is more of a "I've got yet-to-be-solved problems going on". So, there's no magic-bullet fix currently available for those with similar problems who found this article. That said, if you are suffering similar problems, know that you're not alone. Hopefully that's some small consolation, and what follows may even help you in investigating your own problem.

Oh: if you have suggestions, I'm all ears. Given the nature of my configuration, there's not much in the way of useful information I've yet found via Google. Any help would be much appreciated...



The shop I work for uses Symantec's Veritas NetBackup product to perform backups of physical servers. As part of our effort to make the infrastructure tools we use more enterprise-friendly, I opted to leverage NetBackup 7.1's NetBackup Access Control (NBAC) subsystem. On its own, it provides fine-grained rights-delegation and role-based access control. Hitch it to Active Directory and you're able to roll out a global backup system with centralized authentication and rights-management. That is, you have all that when things work.

For the past couple months, we've been having issues with one of the dozen NetBackup domains we've deployed into our global enterprise. When I first began troubleshooting the NBAC issues, the authentication and/or authorization failures had always been associated with a corruption of LikeWise's sqlite cachedb files. At the time the issues first cropped up, these corruptions always seemed to coincide with DRS moving the NBU master server from one ESX host to another. It seemed like, under sufficiently heavy load - the kind of load that would trigger a DRS event - LikeWise didn't respond well to having the system paused and moved. Probably something to do with the sudden apparent time-jump that happens when a VM is paused for the last parts of the DRS action. My "solution" to the problem was to disable automated relocation for my VM.

This seemed to stabilize things. LikeWise was no longer getting corrupted and it seemed like I'd been able to stabilize NBAC's authentication and authorization issues. Well, they stabilized for a few weeks.

Unfortunately, the issues have begun to manifest themselves again in recent weeks. We've now had enough errors that some patterns are starting to emerge. Basically, it looks like something is horribly bogging the system down around the time that the nbazd crashes are happening. I'd located all the instances of nbazd crashing from its log files ("ACI" events are logged to the /usr/openv/netbackup/logs/nbazd logfiles), and then began to try correlating them with the system load shown by the host's sadc collections. I found two things: 1) I probably need to increase my sample frequency - it's currently at the default 10-minute interval - if I want to more-thoroughly pin down and/or profile the events; 2) when the crashes have happened within a minute or two of an sadc poll, I've found that the corresponding poll was either delayed by a few seconds to a couple minutes or was completely missing. So, something is causing the server to grind to a standstill and nbazd is a casualty of it.
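Bumping the sample frequency is a quick change on a RedHat-type box - a sketch, assuming the stock sysstat cron entry (exact paths and intervals may differ by release):

# /etc/cron.d/sysstat - the default collects a sample every ten minutes:
#   */10 * * * * root /usr/lib64/sa/sa1 1 1
# collect every minute instead:
*/1 * * * * root /usr/lib64/sa/sa1 1 1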

For the sake of thoroughness (and what's likely to have matched on a Google-search and brought you here), what I've found in our logs are messages similar to the following:
/usr/openv/netbackup/logs/nbazd/vxazd.log
07/28/2012 05:11:48 AM VxSS-vxazd ERROR V-18-3004 Error encountered during ACI repository operation.
07/28/2012 05:11:48 AM VxSS-vxazd ERROR V-18-3078 Fatal error encountered. (txn.c:964)
07/28/2012 05:11:48 AM VxSS-vxazd LOG V-18-4204 Server is stopped.
07/30/2012 01:13:31 PM VxSS-vxazd LOG V-18-4201 Server is starting.
/usr/openv/netbackup/logs/nbazd/debug/vxazd_debug.log
07/28/2012 05:11:48 AM Unable to set transaction mode. error = (-1)
07/28/2012 05:11:48 AM SQL error S1000 -- [Sybase][ODBC Driver][SQL Anywhere] Connection was terminated
07/28/2012 05:11:48 AM Database fatal error in transaction, error (-1)
07/30/2012 01:13:31 PM _authd_config.c(205) Conf file path: /usr/openv/netbackup/sec/az/bin/VRTSaz.conf
07/30/2012 01:22:40 PM _authd_config.c(205) Conf file path: /usr/openv/netbackup/sec/az/bin/VRTSaz.conf
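A sketch of one way to line those crash timestamps up against the sadc data (paths per the logs above; the specific day-file is illustrative):

# pull the fatal-error timestamps out of the authorization-daemon log...
grep 'V-18-3078' /usr/openv/netbackup/logs/nbazd/vxazd.log

# ...then compare against the run-queue/load samples sar collected for that day
sar -q -f /var/log/sa/sa28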
Our NBU master servers are hosted on virtual machines. It's a supported configuration and adds a lot of flexibility and resiliency to the overall enterprise-design. It also means that I have some additional metrics available to me to check. Unfortunately, when I checked those metrics, while I saw utilization spikes on the VM, those spikes corresponded to healthy operations of the VM. There weren't any major spikes (or troughs) during the grind-downs. So, to ESX, the VM appeared to be healthy.

At any rate, I've requested our ESX folks see if there might be anything going on on the physical systems hosting my VM that isn't showing up in my VM's individual statistics. I'd previously had to disable automated DRS actions to keep LikeWise from eating itself - those DRS actions wouldn't have been happening had the hosting ESX system not been experiencing loading issues - so perhaps whatever was causing those DRS actions is still afflicting this VM.

I've also tagged one of our senior NBU operators to start picking through NBU's logs. I've asked him to look to see if there are any jobs (or combinations of jobs) that are always running during the bog-downs. If it's a scheduling issue (i.e., we're to blame for our problems), we can always reschedule jobs to exert less loading or we can scale up the VM's memory and/or CPU reservations to accommodate such problem jobs.

For now, it's a waiting-game. At least there's an investigation path, now. It's all in finding the patterns.

Wednesday, September 7, 2011

NetBackup with Active Directory Authentication on UNIX Systems

While the specific hosts that I used for this exercise were all RedHat-based, it should work for any UNIX platform that both NetBackup 6.0/6.5/7.0/7.1 and Likewise Open are installed onto.

I'm a big fan of leveraging centralized-authentication services wherever possible. It makes life in a multi-host environment - particularly where hosts can number from the dozens to the thousands - a lot easier when you only have to remember one or two passwords. It's even more valuable in modern security environments where policies require frequent password changes (or if you've ever been through the whole "we've had a security incident, all the passwords on all of the systems and applications need to be changed, immediately" exercise). Over the years, I've used things like NIS, NIS+, LDAP, Kerberos and Active Directory to do my centralized authentication. If your primary platforms are UNIX-based, NIS, NIS+, LDAP and Kerberos have traditionally been relatively straightforward to set up and use.

I use the caveat of "relatively" because, particularly in the early implementations of each service, things weren't always dead-simple. Right now, we seem to be mid-way through the "easiness" life-cycle of using Active Directory as a centralized authentication source for UNIX operating systems and UNIX-hosted applications. Linux and OSX seem to be leading the charge in the OS space for ease of integration via native tools. There's also a number of third-party vendors out there who provide commercial and free solutions to do it for you, as well. In our enterprise, we chose LikeWise, because, at the time, it was the only free option that also worked reasonably-well with our large and complex Active Directory implementation. Unfortunately, not all of the makers of software that runs on the UNIX hosts seem to have been keeping up on the whole "AD-integration within UNIX operating environment" front.

My latest pain in the ass, in this arena, is Veritas NetBackup. While Symantec likes to tout the value of NetBackup Access Control (NBAC) in a multi-administrator environment - particularly one where different administrators may have radically different NetBackup skill sets or other differentiating factors - using it in a mixed-platform environment is kind of sucktackular to set up. While modern UNIX systems have the PAM framework to make writing an application's authentication framework relatively trivial, Symantec seems to still be stuck in the pre-PAM era. NBAC's group-lookup components appear to still rely on direct consultation of a server's locally-maintained group files rather than just making a call to the host OS's authentication frameworks.

When I discovered this problem, I opened a support case with Symantec. Unfortunately, their response was "set up a Windows-based authentication broker". My NetBackup environment is almost entirely RedHat-based (actually, unless/until we implement Bare Metal Restore (BMR) or other backup modules that require specific OSes be added into the mix, it is entirely RedHat-based). The idea of having to build a Windows server just to act as an authentication broker struck me as a rather stupid way to go about things. It adds yet another server to my environment and, unless I cluster that server, it introduces a single point of failure into an otherwise fairly resilient NetBackup design. I'd designed my NetBackup environment with a virtualized master server (with DRS and SRM supporting it) and multiple media servers for both throughput and redundancy.

We already use LikeWise Open to provide AD-based user and group management services for our Linux and Solaris hosts. When I was first running NetBackup through my engineering process, using the old Java auth.conf method for login management worked like a champ. Java auth.conf-based systems just assume that any users trying to access the Java UI are users that are managed through /etc/passwd. All you have to do is add the requisite user/rights entries into the auth.conf file, and Java treats AD-provided users the same as it treats locally-managed users. Because of this, I suspected that I could work around Symantec's authorization coding lameness.

After a bit of playing around with NBAC, I discovered that, so long as the UNIX group I wanted to map rights to existed in /etc/group, NBAC would see it as a valid, mappable "UNIX PWD" group. I tested by seeing if it would at least let me map the UNIX "wheel" group to one of the NBAC privilege groups. Whereas, even if I could look up a group via getent, if it didn't exist in /etc/group, NBAC would tell me it was an invalid group. Having already verified that a group's presence in /etc/group allowed NBAC to use it, I proceeded to use getent to copy my NetBackup-related groups out of Active Directory and into my /etc/group file (all you have to do is a quick `getent group [GROUPNAME] >> /etc/group` and you've populated your /etc/group file).

Unfortunately, I didn't quite have the full groups picture. When I logged in using my AD credentials, I didn't have any of the expected mapped privileges. I remembered that I'd explicitly emptied the userids from the group entries I'd added to my /etc/group file (I'd actually sed'ed the getent output to do it ...can't remember why, at this point - probably just anticipating the issue of not including userids in the /etc/group file entries). So, I logged out of the Java UI and reran my getent's - this time leaving the userids in place. I logged back into the Java UI and, this time, I had my mapped privileges. Eureka.

Still, I wasn't quite done. I knew that, if I was going to roll this solution into production, I'd have to cron-out a job to keep the /etc/group file up to date with the changing AD group memberships. I noticed, while nuking my group entry out of /etc/group, that only my userid was on the group line - not every member of the group. Ultimately, I tracked it down to LikeWise not doing full group enumeration by default. So, I was going to have to force LikeWise to enumerate the group's membership before running my getent's.

I proceeded to dig around in /opt/likewise/bin for likely candidates for forcing the enumeration. After trying several lw*group* commands, I found that doing a `lw-find-group-by-name [ADGROUP]` did the trick. Once that was run, my getent's produced fully-populated entries in my /etc/group file. I was then able to map rights to various AD groups and wrote a cron script to take care of keeping my /etc/group file in sync with Active Directory.
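Stitched together, that kind of sync job looks something like the following sketch (the group names and schedule are hypothetical; the LikeWise utility is the one discussed above):

#!/bin/sh
# /etc/cron.hourly/sync-nbu-groups - refresh AD-managed NBU groups in /etc/group
for GROUP in NBU_Admins NBU_Operators ; do
    # force LikeWise to fully enumerate the group's membership
    /opt/likewise/bin/lw-find-group-by-name "${GROUP}" > /dev/null
    # drop any stale copy of the entry, then append the freshly-enumerated one
    sed -i "/^${GROUP}:/d" /etc/group
    getent group "${GROUP}" >> /etc/group
done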

In other words, I was able to get NBAC to work with Active Directory in an all-RedHat environment with no need to set up a Windows server just to be an authentication broker. Overall, I was able to create a much lighter-weight, more portable solution.

Thursday, May 5, 2011

Linux Active Directory Integration and PAM

Previously, I've written about using LikeWise to provide Active Directory integration to Linux and Solaris hosts. One of the down sides of LikeWise (and several other similar integration tools) is that it tends to make it such that, if a user has an account in Active Directory, they can log into the UNIX or Linux boxes you've bound to your domain. In fact, while walking someone through setting up LikeWise with the automated configuration scripts I'd written, that person asked, "you mean anyone with an AD account can log in?"

Now, this had occurred to me when I was testing the package for the engineer who was productizing LikeWise for our enterprise build. But, it hadn't really been a priority, at the time. Unfortunately, when someone who isn't necessarily a "security first" kind of person hits you with that question/observation, you know that the folks for whom security is more of a "Job #1" are eventually going to come for you (even if you weren't the one who was responsible for engineering the solution). Besides, I had other priorities to take care of.

This week was a semi-slack week at work. There was some kind of organizational get-together going on that had most of the IT folks out of town discussing global information-technology strategies. Fortunately, I'd not had to take part in that event. So, I've spent the week revisiting some stuff I'd rolled out (or been part of the rollout of) but wasn't completely happy with. The "AD integration giving everyone access" thing was one of them. So, I began by consulting the almighty Google. When I'd found stuff that seemed promising, I fired up a test VM and started trying it out.

Now, SSH (and several other services) weren't really a problem. Many applications allow you to internally regulate who can use the service. For example, with OpenSSH, you can modify the sshd_config file to explicitly define which users and groups can and cannot access your box through that service (for those of you who hit this page looking for tips, do a `man sshd_config` and grep for AllowUsers and AllowGroups for more in-depth information). Unfortunately, it's predictable enough to figure that the people who are gonna whine about AD integration giving away the farm are gonna bitch if you tell them they have to modify the configuration of each and every service they want to protect. No, most people want to be able to go to one place and take care of things with one action or one set of consistent actions. I can't blame them: I feel the same way. Everyone wants things done easily. Part of "easily" generally implies "consistently" and/or "in one place".
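For the per-service approach, the sshd_config side is a short sketch (the group names are hypothetical; AD-provided groups may need to be spelled however your integration tool presents them):

# /etc/ssh/sshd_config - only members of these groups may log in via SSH
AllowGroups wheel linux_admins
# (reload sshd after editing for the change to take effect)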

Fortunately, any good UNIX or Linux implementation leverages the Pluggable Authentication Modules system (aka. PAM). There are about a bazillion different PAM modules out there that allow you to configure any given service's authentication to do or test a similar variety of attributes. My assumption for solving this particular issue was that, while there might be dozens or hundreds of groups (and thousands of users) in an Active Directory forest, one would only want to grant a very few groups access to an AD-bound UNIX/Linux host. So, I wasn't looking for something that made it easy to grant lots of groups access in one swell-foop. In fact, I was kind of looking for things that made that not an easy thing to do (after all, why lock stuff down if you're just going to blast it back open, again?). I was also looking for something that I could fairly reliably find on generic PAM implementations. The pam_succeed_if module is just about tailor-made for those requirements.

LikeWise (and the other AD integration methods) adds entries to your PAM configuration that allow users permitted by those authentication subsystems to log in, pretty much, unconditionally. Unfortunately, those PAM modules don't often include methods for controlling which users are able to log in once their AD authentication has succeeded. Since the PAM system uses a stackable authentication model, you can insert access controls earlier in the stack to cause a user's access to fail earlier than the AD module would otherwise grant it. If you wanted to allow users in AD_Group1 and AD_Group2 to log in, but not other groups, you'd modify your PAM stack to insert the controls ahead of the AD allow module.

     account    [default=ignore success=2] pam_succeed_if.so user ingroup AD_Group1 quiet_success
     account    [default=ignore success=1] pam_succeed_if.so user ingroup AD_Group2 quiet_success
     account    [default=bad success=ignore] pam_succeed_if.so user ingroup wheel quiet_success
     account    sufficient    /lib/security/pam_lsass.so

The above is processed such that, if a user is a member of the AD-managed group "AD_Group1" or "AD_Group2", the test succeeds and processing skips ahead to the AD allow module. If the user isn't a member of those two groups, testing falls through to the next group check - is the user a member of the group wheel? (If yes, fall through to the next test; if no, there's a failure and the user's access is denied.) The downside of using this particular PAM module is that it's only available to you on *N*X systems with a plethora of PAM modules. This is true for many Linux releases - and I know it to be part of RedHat-related releases - but probably won't be available on less PAM-rich *N*X systems (yet one more reason to cast Solaris on the dung-heap, frankly). If your particular *N*X system doesn't have it, you can probably find the source code for it and build yourself the requisite module for your OS.