Friday, January 3, 2025

SSH Problems When Using `sysadm_u` SELinux Confinement

Decided to be proactive on the security-setup for my one project. Opted to confine my default-user to sysadm_u. However, as soon as I did that, I stopped being able to ssh into the resulting EC2 as the default-user. Turns out there's a bit more required in order to use that confinement with a user that also needs to be able to SSH into the host.

For those reading and who don't have a Red Hat login, if I want to confine a user to sysadm_u, I also need to ensure that my system-configuration automation includes:

setsebool ssh_sysadm_login on
setsebool -P ssh_sysadm_login on
Without the above, doing an ssh -v to the target user will show a spurious:
Authenticated to 0.0.0.0 ([127.0.0.1]:22) using "publickey".
debug1: pkcs11_del_provider: called, provider_id = (null)
debug1: channel 0: new [client-session]
debug1: Requesting no-more-sessions@openssh.com
debug1: Entering interactive session.
debug1: pledge: filesystem full
client_loop: send disconnect: Broken pipe

The `pledge: filesystem full` kinda threw me, at first, since I knew that neither my local nor my remote filesystem was full. So, I assumed that it was just a misleading error message (as seems to so often be the case when SELinux is involved). So, I searched for ssh login problems associated with the selected SELinux-confinement, which led me to the previously-linked Red Hat article.

I guess that's why the hardening guidelines show `staff_u` as the recommended confinement for administrator users?

Ultimately, I opted to use `staff_u`, instead, with a cloud-config block like:

user: {
  gecos: "GitLab Provisioning-Account (LOCAL)",
  name: "${PROVISIONING_USER}",
  selinux_user: 'staff_u',
  sudo: ['ALL=(root) TYPE=sysadm_t ROLE=sysadm_r NOPASSWD:ALL']
}
Ensuring that ROLE and TYPE SELinux transition-mappings are defined for my default-user eliminates the confusion that can result when confining a user to staff_u and not supplying a mapping. Without the mapping, if a confined admin-user executes `sudo -i`, they get all sorts of unexpected `permission denied` errors.

Wednesday, December 18, 2024

Proving a Point: Git-over-SSH Edition

I'm in the process of helping a customer migrate from an on-premises GitLab configuration to an AWS-hosted configuration. The on-premises GitLab is hosted on CentOS 7. The cloud-hosted GitLab will be hosted on Rocky or Alma 9. 

Right now, the customer is doing User Acceptance Testing (UAT). They're running into some issues making legacy projects' repositories and associated automations work with the new GitLab service. One of these problems is that their developers are using keys in excess of five years' age. The OpenSSH server in CentOS 7 is elderly and, as a result, had been ok using these keys. However, because the maintainers of OpenSSH deprecated the use of SHA1-signed RSAv2 keys some time ago (and the OpenSSH versions in EL 8&9 and derivatives updated the shipped OpenSSH server version), the OpenSSH server in RHEL 9-derived distros just is not having it with these elderly keys.

The developer was suspicious of my claim that this was the source of their git-over-ssh problems. I needed to be able to prove things to them, but I haven't had a Linux host capable of creating SHA1-signed RSAv2 SSH keys in quite a while. So, "what to do"?

Turns out, "Docker to the rescue". Not wanting to dick with the overhead of writing a Dockerfile, I simply did it interactively:

  1. Login to Dockerhub …to reduce the likelihood of getting errors around too many anonymous fetch-attempts in a given timespan
  2. Launch an interactive CentOS 6 container with a volume attached so that I could easily save out any generated keys:
    $ docker run -it -v $( pwd ):/save_dir --entrypoint /bin/bash centos:6
  3. Install the openssh-clients RPM:
    # yum install -y --disablerepo=* --enablerepo=C6.9* openssh-clients
  4. Generate a suitable key:
    # ssh-keygen \
        -t rsa \
        -b 2048 \
        -C "SHA1-signed key generated on CentOS 6" \
        -f /save_dir/id_rsa_sha1-signed
  5. Exit from the running container (<CTRL>-D suffices)
  6. Fix any ownership/permission problems on the new files
  7. Register the new key with GitLab
  8. Attempt a git clone using the SSH URL and ensure that I'm using the newly-generated key
    $ git clone \
        -c core.sshCommand="/usr/bin/ssh -i $( pwd )/id_rsa_sha1-signed" \
        git@<GIT_SERVER_FQD>:<REPOSITORY_PATH>
  9. Receive an error like:
    Cloning into '<REPO_NAME>'...
    git@<GIT_SERVER_FQD>: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
    fatal: Could not read from remote repository.
    
    Please make sure you have the correct access rights
    and the repository exists.
    
    
  10. Create a SHA2-signed RSAv2 key like:
    $ ssh-keygen -t rsa-sha2-512
  11. Register the SHA2-signed key to GitLab
  12. Attempt the same `git clone …` operation (minus the `core.sshCommand` stuff) that had previously failed with the SHA1-signed RSAv2 key.
  13. This time, the clone operation succeeds

Wednesday, August 28, 2024

Getting the Most Out of EC2 Transfers With Private S3 Endpoints

Recently, I was given a project to help a customer migrate an on-premises GitLab installation into AWS. The current GitLab was pretty large: a full export of the configuration was nearly 500GiB in size.

It turned out a good chunk of that 500GiB was due to disk-hosted artifacts and LFS objects. Since I was putting it all into AWS, I opted to make use of GitLab's ability to store BLOBs in S3. Ultimately, that turned out to be nearly 8,000 LFS objects and nearly 150,000 artifacts (plus several hundred "uploads").

The first challenge was getting the on-premises data into my EC2. Customer didn't want to give me access to their on-premises network, so I needed to have them generate the export TAR-file and upload it to S3. Once in S3, I needed to get it into an EC2.

Wanting to make sure that the S3→EC2 task was as quick as possible, I selected an instance-type rated to 12.5Gbps of network bandwidth and 10Gbps of EBS bandwidth. However, my first attempt at downloading the TAR-file from S3 took nearly an hour to run: it was barely creeping along at 120MiB/s. Abysmal.

I broke out `iostat` and found that my target EBS was reporting 100% utilization and a bit less than 125MiB/s of average throughput. That seemed "off" to me, so I looked at the EBS. Was then that I noticed that the default volume-throughput was only 125MiB/s. So, I upped the setting to its maximum: 1000MiB/s. I re-ran the transfer only to find that, while the transfer-speed had improved, it had only improved to a shade under 150MiB/s. Still abysmal.

So, I started rifling through the AWS documentation to see what CLI settings I could change to improve things. First mods were:

max_concurrent_requests = 40
multipart_chunksize = 10MB
multipart_threshold = 10MB

This didn't really make much difference. `iostat` was showing really variable utilization-numbers, but mostly that my target-disk was all but idle. Similarly, `netstat` was showing only a handful of simultaneous-streams between my EC2 and S3.

Contacted AWS support. They let me know that S3 multi-part upload and download was limited to 10,000 chunks. So, I did the math (<FILE_SIZE> / <MAX_CHUNKS>) and changed the above to:

max_concurrent_requests = 40
multipart_chunksize = 55MB
multipart_threshold = 64MB

This time, the transfers were running about 220-250MiB/s. While that was a 46% throughput increase, it was still abysmal. While `netstat` was finally showing the expected number of simultaneous connections, my `iostat` was still saying that my EBS was mostly idle.
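
The chunk-size math can be sketched in shell arithmetic. This is only an illustration: `min_chunk_mib` is a made-up helper name, and the 500GiB input is the approximate export size quoted earlier, not the exact file size.

```shell
# Smallest whole-MiB chunk size that keeps a file within S3's
# 10,000-part multipart limit. Helper name is illustrative only.
min_chunk_mib() {
  file_bytes="$1"
  max_parts=10000
  mib=$(( 1024 * 1024 ))
  # Ceiling division, twice: bytes -> bytes-per-part -> MiB-per-part
  bytes_per_part=$(( (file_bytes + max_parts - 1) / max_parts ))
  echo $(( (bytes_per_part + mib - 1) / mib ))
}

# ~500GiB export (assumed size, per the figure quoted earlier)
min_chunk_mib $(( 500 * 1024 * 1024 * 1024 ))
```

For a 500GiB file this prints 52, which is in the same neighborhood as the 55MB chunk size used above. Anything smaller would need more than 10,000 parts.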

Reached back out to AWS support. They had the further suggestion of adding:

preferred_transfer_client = crt
target_bandwidth = 10GB/s

To my S3 configuration. Re-ran my test and was getting ≈990MiB/s of continuous throughput for the transfer! This knocked the transfer time down from fifty-five minutes to a shade over eight minutes. In other words, I was going to be able to knock nearly an hour off the upcoming migration-task.

In digging back through the documentation, it seems that, when one doesn't specify a preferred_transfer_client value, the CLI will select the `classic` (`python`) client. And, depending on your Python version, the performance ranges from merely-horrible to ungodly-bad: using RHEL 9 for my EC2, it was pretty freaking bad, but had been less-bad when using AWS for my EC2's OS. Presumably a difference in the two distros' respective Python versions?

Specifying a preferred_transfer_client value of `crt` (the AWS Common Runtime client) unleashed the full might and fury of my EC2's and GP3's capabilities.

Interestingly, this "use 'classic'" behavior isn't a universal auto-selection. If you've selected an EC2 with any of the instance-types:

  • p4d.24xlarge
  • p4de.24xlarge
  • p5.48xlarge
  • trn1n.32xlarge
  • trn1.32xlarge

The auto-selection gets you `crt`. Not sure why `crt` isn't the auto-selected value for Nitro-based instance-types. But, "it's what it's". 

Side note: just selecting `crt` probably wouldn't have completely roided-out the transfer. I assume the further setting of `target_bandwidth` to `10GB/s` probably fully-unleashed things. There definitely wasn't much bandwidth leftover for me to actually monitor the transfer. I assume that the `target_bandwidth` parameter has a default value that's less than "all the bandwidth". However, I didn't actually bother to verify that.

Update: 

After asking support "why isn't `crt` the default for more instance-types", I got back the reply:

Thank you for your response. I see that these particular P5, P4d and Trn1 instances are purpose built for high-performance ML training. Hence I assume the throughput needed for this ML applications needs to high and CRT is auto enabled for these instance types.

Currently, the CRT transfer client does not support all of the functionality available in the classic transfer client.

These are few limitations for CRT configurations:

  • Region redirects - Transfers fail for requests sent to a region that does not match the region of the targeted S3 bucket.
  • max_concurrent_requests, max_queue_size, multipart_threshold, and max_bandwidth configuration values - Ignores these configuration values.
  • S3 to S3 copies - Falls back to using the classic transfer client


All of which is to say that, once I set `preferred_transfer_client = crt` all of my other, prior settings got ignored.
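
Given that, a CRT-only stanza is probably all that's worth keeping. The sketch below writes one to a scratch file for review rather than touching `~/.aws/config` directly. The `[default]` profile name and the scratch-file approach are my assumptions; the two settings are the ones support suggested.

```shell
# Write a minimal CRT-only S3 stanza to a scratch file for review.
# Using a scratch file (and the [default] profile) is an assumption;
# merge the stanza into ~/.aws/config by hand once it looks right.
scratch_cfg="$(mktemp)"
cat > "${scratch_cfg}" <<'EOF'
[default]
s3 =
  preferred_transfer_client = crt
  target_bandwidth = 10GB/s
EOF
cat "${scratch_cfg}"
```

The `max_concurrent_requests` and `multipart_*` values are left out on purpose since, per the support reply, the CRT client ignores them.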

Wednesday, July 3, 2024

Implementing (Pseudo) Profiles in Git (Part 2!)

As noted in my first Implementing (Pseudo) Profiles in Git post:


I'm an automation consultant for an IT contracting company. Using git is a daily part of my work-life. … Then things started shifting, a bit. Some customers wanted me to use my corporate email address as my ID. Annoying, but not an especially big deal, by itself. Then, some wanted me to use their privately-hosted repositories and wanted me to use identities issued by them.

This led me down a path of setting up multiple git "profiles" that I captured into my first article on this topic. To better support such per-project identities, it's also a good habit to use per-project authentication methods. I generally prefer to do git-over-SSH – rather than git-over-http(s) – when interfacing with remote Git repositories. Because I don't like having to keep re-entering my password, I use an SSH-agent to manage my keys. When one only has one or two projects they regularly interface with, this means a key-agent that is only needing to store a couple of authentication-keys.

Unfortunately, if you have more than one key in your SSH agent, when you attempt to connect to a remote SSH service, the agent will iteratively-present keys until the remote accepts one of them. If you've got three or more keys in your agent, the agent could present 3+ keys to the remote SSH server. By itself, this isn't a problem: the remote logs the earlier-presented keys as authentication failures, but otherwise lets you go about your business. However, if the remote SSH server is hardened, it very likely will be configured to lock your account after the third authentication-failure. As such, if you've got four or more keys in your agent and the remote requires a key that your agent doesn't present in the first three authentication-attempts, you'll find your account for that remote SSH service getting locked out.

What to do? Launch multiple ssh-agent instantiations.

Unfortunately, without modifying the default behavior, when you invoke the ssh-agent service, it will create a (semi) randomly-named UNIX domain-socket to listen for requests on. If you've only got a single ssh-agent instance running, this is a non-problem. If you've got multiple, particularly if you're using a tool like direnv, setting up your SSH_AUTH_SOCK in your .envrc files is problematic if you don't have predictably-named socket-paths.

How to solve this conundrum? Well, I finally got tired of, every time I rebooted my dev-console, having to run `eval $( ssh-agent )` in per-project Xterms. So, I started googling and ultimately just dug through the man page for ssh-agent. In doing the latter, I found:

DESCRIPTION
     ssh-agent is a program to hold private keys used for public key authentication.  Through use of environment variables the
     agent can be located and automatically used for authentication when logging in to other machines using ssh(1).

     The options are as follows:

     -a bind_address
             Bind the agent to the UNIX-domain socket bind_address.  The default is $TMPDIR/ssh-XXXXXXXXXX/agent.<ppid>.

So, now I can add appropriate command-aliases to my bash profile(s) (which I've already moved to ~/.bash_profile.d/<PROJECT>) that can be referenced based on where in my dev-console's filesystem hierarchy I am and can set up my .envrcs, too. Result: if I'm in <CUSTOMER_1>/<PROJECT>/<TREE>, I get attached to an ssh-agent set up for that customer's project(s); if I'm in <CUSTOMER_2>/<PROJECT>/<TREE>, I get attached to an ssh-agent set up for that customer's project(s); etc. For example:

$ cd ~/GIT/work/Customer_1/
direnv: loading ~/GIT/work/Customer_1/.envrc
direnv: export +AWS_CONFIG_FILE +AWS_DEFAULT_PROFILE +AWS_DEFAULT_REGION ~SSH_AUTH_SOCK
ferric@fountain:~/GIT/work/Customer_1 $ ssh-add -l
3072 SHA256:oMI+47EiStyAGIPnMfTRNliWrftKBIxMfzwYuxspy2E SH512-signed RSAv2 key for Customer 1's Project (RSA)
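
A minimal sketch of the per-project agent setup, for anyone wanting a starting point. The function names and the `~/.ssh/agents/` directory are hypothetical (they're not what's in my actual profile files); the only parts taken from the man pages are `ssh-agent -a` and `ssh-add`'s exit status of 2 when no agent can be contacted.

```shell
# Predictable per-project socket path (directory layout is an assumption).
project_agent_sock() {
  printf '%s/.ssh/agents/%s.sock\n' "${HOME}" "$1"
}

# Start an agent on that socket only if nothing is answering there yet.
ensure_project_agent() {
  agent_sock="$(project_agent_sock "$1")"
  mkdir -p "$(dirname "${agent_sock}")"
  chmod 700 "$(dirname "${agent_sock}")"
  # ssh-add exits 2 when it cannot contact an agent on SSH_AUTH_SOCK
  agent_rc=0
  SSH_AUTH_SOCK="${agent_sock}" ssh-add -l >/dev/null 2>&1 || agent_rc=$?
  if [ "${agent_rc}" -eq 2 ]; then
    rm -f "${agent_sock}"   # clear any stale socket left from a prior boot
    eval "$(ssh-agent -a "${agent_sock}")" >/dev/null
  fi
  export SSH_AUTH_SOCK="${agent_sock}"
}
```

With something like this sourced from a profile file, a project's .envrc only needs a line like `export SSH_AUTH_SOCK="${HOME}/.ssh/agents/Customer_1.sock"`, and running `ensure_project_agent Customer_1` after a reboot brings that project's agent back on the same path.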

Thursday, June 20, 2024

Keeping It Clean: EKS and `kubectl` Configuration

Previously, I was worried about, "how do I make it so that kubectl can talk to my EKS clusters". However, after several days of standing up and tearing down EKS clusters across several accounts, I discovered that my ~/.kube/config file had absolutely exploded in size and its manageability reduced to all but zero. And, while aws eks update-kubeconfig --name <CLUSTER_NAME> is great, its lack of a `--delete` suboption is kind of horrible when you want or need to clean out long-since-deleted clusters from your environment. So, onto "next best thing", I guess…

Ultimately, that "next best thing" was setting a KUBECONFIG environment-variable as part of my configuration/setup tasks (e.g., something like `export KUBECONFIG=${HOME}/.kube/config.d/MyAccount.conf`). While not as good as I'd like to think an `aws eks update-kubeconfig --name <CLUSTER_NAME> --delete` would be, it at least means that:

  1. Each AWS account's EKS's configuration-stanzas are kept wholly separate from each other
  2. Reduces cleanup to simply overwriting – or straight up nuking – per-account ${HOME}/.kube/config.d/MyAccount.conf files

…I tend to like to keep my stuff "tidy". This kind of configuration-separation facilitates scratching that (OCDish) itch. 
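
As a rough sketch of how that separation might be wrapped up: the helper names below are made up for illustration, and the account/cluster arguments are placeholders. The `--kubeconfig` option to `aws eks update-kubeconfig` is the only AWS-specific piece relied on.

```shell
# Per-account kubeconfig path, following the config.d layout above.
account_kubeconfig() {
  printf '%s/.kube/config.d/%s.conf\n' "${HOME}" "$1"
}

# Point kubectl and the AWS CLI at one account's file.
# $1 = account label (arbitrary), $2 = EKS cluster name
use_account_cluster() {
  KUBECONFIG="$(account_kubeconfig "$1")"
  export KUBECONFIG
  mkdir -p "$(dirname "${KUBECONFIG}")"
  aws eks update-kubeconfig --name "$2" --kubeconfig "${KUBECONFIG}"
}

# Cleanup is just removing the per-account file.
forget_account() {
  rm -f "$(account_kubeconfig "$1")"
}
```

Something like `use_account_cluster MyAccount <CLUSTER_NAME>` then takes the place of the bare `update-kubeconfig` call, and `forget_account MyAccount` does the "nuking".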

The above is derived, in part, from the Organizing Cluster Access Using kubeconfig Files document

Monday, June 17, 2024

Crib Notes: Accessing EKS Cluster with `kubectl`

While AWS does provide a CLI tool – eksctl – for talking to EKS resources, it's not suitable for all Kubernetes actions one might wish to engage in. Instead, one must use the more-generic access provided through the more-broadly used tool, kubectl. Both tools will generally be needed, however.

If, like me, your AWS resources are only reachable through IAM roles – rather than IAM user credentials – it will be necessary to use the AWS CLI tool's eks update-kubeconfig subcommand. The general setup workflow will look like:

  1. Set up your profile definition(s)
  2. Use the AWS CLI's sso login to authenticate your CLI into AWS (e.g., `aws sso login --no-browser`)
  3. Verify that you've successfully logged in to your target IAM role (e.g., `aws sts get-caller-identity` …or any AWS command, really)
  4. Use the AWS CLI to update your ~/.kube/config file with the `eks update-kubeconfig` subcommand (e.g., `aws eks update-kubeconfig --name thjones2-test-01`)
  5. Validate that you're able to execute kubectl commands and get back the kind of data that you expect to get (e.g., `kubectl get pods --all-namespaces` to get a list of all running pods in all namespaces within the target EKS cluster)

Thursday, May 16, 2024

So You Work in Private VPCs and Want CLI Access to Your Linux EC2s?

Most of the AWS projects I work on, both currently and historically, have deployed most, if not all, of their EC2s into private VPC subnets. This means that, if one wants to be able to directly log in to their Linux EC2s' interactive shells, they're out of luck. Historically, to get something akin to direct access one had to set up bastion-hosts in a public VPC subnet, and then jump through to the EC2s one actually wanted to log in to. How well one secured those bastion-hosts could make-or-break how well-isolated their private VPC subnets – and associated resources – were.

If you were the sole administrator or part of a small team, or were part of an arbitrary-sized administration-group that all worked from a common network (i.e., from behind a corporate firewall or through a corporate VPN), keeping a bastion-host secure was fairly easy. All you had to do was set up a security-group that allowed only SSH connections and only allowed them from one or a few source IP addresses (e.g. your corporate firewall's outbound NAT IP address). For a bit of extra security, one could even prohibit password-based logins on the Linux bastions (instead, using SSH key-based login, SmartCards, etc. for authenticating logins). However, if you were a member of a team of non-trivial size and your team members were geographically-distributed, maintaining whitelists to protect bastion-hosts could become painful. That painfulness would be magnified if that distributed team's members were either frequently changing-up their work locations or were coming from locations where their outbound IP address would change with any degree of frequency (e.g., work-from-home staff whose ISPs would frequently change their routers' outbound IPs).

A few years ago, AWS introduced SSM and the ability to tunnel SSH connections through SSM (see the re:Post article for more). With appropriate account-level security-controls, the need for dedicated bastion-hosts and maintenance of whitelists effectively vanished. Instead, all one had to do was:

  • Register an SSH key to the target EC2s' account
  • Set up their local SSH client to allow SSH-over-SSM
  • Then SSH "directly" to their target EC2s

SSM would, effectively, "take care of the rest" …including logging of connections. If one were feeling really enterprising, one could enable key-logging for those SSM-tunneled SSH connections (a good search-engine query should turn up configuration guides; one such guide is toptal's). This would, undoubtedly make your organization's IA team really happy (and may even be required depending on security-requirements your organization is legally-required to adhere to) – especially if they don't yet have an enterprise session-logging tool purchased.
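
For the "set up their local SSH client" step, the client-side stanza is, as best I recall, along the lines of what AWS's Session Manager documentation suggests. The sketch below writes it to a scratch file for review rather than editing `~/.ssh/config` in place; the scratch-file approach is my assumption, and it presumes the AWS CLI and the Session Manager plugin are already installed locally.

```shell
# Write the SSH-over-SSM client stanza to a scratch file for review.
# Using a scratch file is an assumption; append the stanza to
# ~/.ssh/config once it has been checked.
scratch_ssh="$(mktemp)"
cat > "${scratch_ssh}" <<'EOF'
# Route SSH to EC2 instance IDs (i-*) through an SSM session
Host i-*
    ProxyCommand sh -c "aws ssm start-session --target %h --document-name AWS-StartSSHSession --parameters 'portNumber=%p'"
EOF
cat "${scratch_ssh}"
```

With that in place, `ssh <USERID>@<EC2_INSTANCE_ID>` gets proxied through SSM instead of needing a routable address.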

But what if your EC2s are hosting applications that require GUI-based access to set up and/or administer? Generally, you have two choices:

  • X11 display-redirection
  • SSH port-forwarding

Unfortunately, SSM is a fairly low-throughput solution. So, while doing X11 display-redirection from an EC2 in a public VPC subnet may be more than adequately performant, the same cannot be said when done through an SSH-over-SSM tunnel. Doing X11 display-redirection of a remote browser session – or, worse, an entire graphical desktop session (e.g., KDE or Gnome desktops) – is paaaaaainfully slow. For my own tastes, it's uselessly slow. 

Alternately, one can use SSH port-forwarding as part of that SSH-over-SSM session. Then, instead of trying to send rendered graphics over the tunnel, one only sends the pre-rendered data. It's a much lighter traffic load with the result being a much quicker/livelier response. It's also pretty easy to set up. Something like:

ssh -L localhost:8080:$(
  aws ec2 describe-instances \
    --query 'Reservations[].Instances[].PrivateIpAddress' \
    --output text \
    --instance-ids <EC2_INSTANCE_ID>
):80 <USERID>@<EC2_INSTANCE_ID>

Is all you need. In the above, the argument to the -L flag is saying, "set up a tcp/8080 listener on my local machine and have it forward connections to the remote machine's tcp/80". The local and remote ports can be varied for your specific needs. You can even set up dynamic-forwarding by creating a SOCKS proxy (but this document is meant to be a starting point, not dive into the weeds).

Note that, while the above is using a subshell (via the $( … ) shell-syntax) to snarf the remote EC2's private IP address, one should be able to simply substitute "localhost". I simply prefer to try to speak to the remote's ethernet, rather than loopback, interface, since doing so can help identify firewall-type issues that might interfere with others' use of the target service.
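
To make the shape of that `-L` argument a little more concrete, here's a tiny sketch that just assembles it. `forward_spec` is a made-up helper name, and the ports are the same example values used above.

```shell
# Build the argument for ssh's -L flag: local listener -> remote target.
forward_spec() {
  # $1 = local port, $2 = remote host as seen from the EC2, $3 = remote port
  printf 'localhost:%s:%s:%s\n' "$1" "$2" "$3"
}

# Loopback variant: no describe-instances lookup needed.
forward_spec 8080 localhost 80
```

That prints `localhost:8080:localhost:80`, so the loopback form of the earlier command would be `ssh -L "$(forward_spec 8080 localhost 80)" <USERID>@<EC2_INSTANCE_ID>`. For the SOCKS-proxy route, ssh's `-D` flag with a local port of your choosing is the usual starting point.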

Thursday, March 28, 2024

Mixed Data-Types and Keeping Things Clean

This year, one of the projects I've been assigned to has me assisting a customer in implementing a cloud-monitoring solution for their multi-cloud deployment. The tool uses the various CSPs APIs to monitor the creation/modification/deletion of resources and how those resources are configured.

The tool, itself, is primarily oriented for use and configuration via web UI. However, one can configure it via Terraform. This makes it easier to functionally-clone the monitoring tool's configuration as well as reconstitute it if someone blows it up.

That said, the tool uses Nunjucks and GraphQL to implement some of its rule-elements. Further, most of the data it handles comes in the form of JSON streams. The Nunjucks content, in particular, can be used to parse those JSON streams and static JSON content can be stored within the  monitoring-application. Because Terraform is used for CLI-based configuration, the Terraform resources can consist of pure Terraform code as well as a mix of encapsulated Nunjucks, GraphQL and JSON.

Most of the vendor's demonstration configuration-code has the Nunjucks, GraphQL and JSON contents wholly encapsulated in Terraform resource-definitions. If one wants to lint their configuration-code prior to pushing it into the application, the vendor-offered method for formatting the code can work counter to that. That said, with careful coding, one can separate the content-types from each other and use reference-directives to allow Terraform to do the work of merging it all together. While this may seem more complex, separating the content-types means that each chunk of content is more-easily validated and checked for errors. Rather than blindly hitting "terraform apply" and hoping for the best, you can lint your JSON, Nunjucks and GraphQL separately. This means that, once you've authored all of your initial code and wish to turn it over to someone else to do lifecycle tasks, you can horse it all to a CI workflow that ensures that humans that have edited any given file haven't introduced content-type violations that can lead to ugly surprises.
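
As an illustration of the JSON half of that linting, here's a minimal sketch of a check a CI job could run once the JSON lives in its own files. The function name and directory layout are hypothetical, and Python's bundled `json.tool` is just one convenient validator (jq or a dedicated linter would do as well). Nunjucks and GraphQL would need their own tools.

```shell
# Validate every *.json file under a directory; return non-zero if any fail.
# Uses Python's bundled json.tool; the function name is illustrative only.
lint_json_dir() {
  lint_rc=0
  for json_file in "$1"/*.json; do
    [ -e "${json_file}" ] || continue   # directory had no JSON files
    if ! python3 -m json.tool "${json_file}" >/dev/null 2>&1; then
      echo "invalid JSON: ${json_file}" >&2
      lint_rc=1
    fi
  done
  return "${lint_rc}"
}
```

Calling `lint_json_dir <DIRECTORY>` before any `terraform apply` step fails the job on the first malformed file, with the offending path reported on stderr.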

Honestly, I have more confidence that the people I turn things over to will know how to massage single content-type files than mixed content-type files. This means I feel like I'm less likely to get confused help requests after I'm moved to another assignment.

Wednesday, October 18, 2023

Crib Notes: Has My EC2's AMI-Publisher Updated Their AMIs

Discovered, late yesterday afternoon, that the automation I'd used to deploy my development-EC2 had had some updates and that these updates (rotating some baked-in query-credentials) were breaking my development-EC2's ability to use some resources (since they could no longer access those resources). So, today, I'll need to re-deploy my EC2. I figured that, since it'd been since September 14th since I'd launched my EC2, perhaps a new AMI was available (our team publishes new AMIs each month). So, what's a quick way to tell if a new AMI is available? Some nested AWS CLI commands:

aws \
  --output text ec2 describe-images \
  --owners $(
  aws ec2 describe-images \
    --output text \
    --image-ids $(
    aws ec2 describe-instances \
      --output text \
      --query 'Reservations[].Instances[].ImageId' \
      --instance-ids <CURRENT_EC2_ID>  ) \
  --query 'Images[].OwnerId'
) \
  --query 'Images[].{ImageId:ImageId,CreationDate:CreationDate,Name:Name}' | \
sort -nk 1

Sadly, the answer, today, is, "no". Apparently, this month's release-date is going to be later today or tomorrow. So, I'll just be re-deploying from September's AMI.
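
If only the newest image matters, the sorting can be pushed into the `--query` expression instead of piping to `sort`. A minimal sketch, assuming the publisher's account ID is already known (the function name is made up for illustration):

```shell
# Newest image for an owner: sort by CreationDate inside the query itself.
# $1 = AMI publisher's AWS account ID
latest_ami_for_owner() {
  aws ec2 describe-images \
    --owners "$1" \
    --query 'sort_by(Images, &CreationDate)[-1].[CreationDate,ImageId,Name]' \
    --output text
}
```

Comparing the returned ImageId against the one the current EC2 was launched from answers the "is there a newer AMI" question in one line of output.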

Friday, October 6, 2023

ACTUALLY Deleting Emails in gSuite/gMail

Each month, I archive all the contents of my main gSuite account to a third-party repository. I do this via an IMAP-based transfer.

Unfortunately, when you use an IMAP-based transfer to move files, Google doesn't actually delete the emails from your gMail/gSuite account. No, it simply removes all labels from them. This means that instead of getting space back – space that Google charges for in gSuite – the messages simply become not-easily-visible within gMail. None of your space is freed up and, thus, space-charges for those unlabeled emails continue to accrue.

Discovered this annoyance a couple years ago when my mail-client was telling me I was getting near the end of my quota. When I first got the quota-warning, I was like, "how??? I've offloaded all my old emails. There's only a month's worth of email in my Inbox, Sent folder and my per-project folders!" That prompted me to dig around to discover the de-labeled/not-deleted fuckery. So, I dug around further to find a method for viewing those de-labeled/not-deleted files. Turns out, putting:

-has:userlabels -in:sent -in:chat -in:draft -in:inbox

In your webmail search-bar will show you them …and allow you to delete them.

My gSuite account was a couple years old when I discovered all this. So, when I selected all the unlabeled emails for deletion, it took a while for Google to actually delete them. However, once the deletion completed, I recovered nearly 2GiB worth of space in my gSuite account.