Tuesday, March 24, 2026

Dunno Who To Blame This On

I'm currently working on extending some SaltStack-based automation. I'd initially written it to work on (RHEL) Linux-based hosts. Worked like a champ. I was recently asked to extend the automation to also work on Windows. My most recent "Why I hate PowerShell" posts are related to other tasks around this recent request.

At any rate, the approach I've been taking to extending my SaltStack logic is to refactor the Linux logic so as to insert an execution-branch based on the detected "kernel" (what's returned when one does a `salt-call --local grains.get kernel`). Basically, I took my prior "install.sls" file and moved it to "lin_install.sls" and then made a new "install.sls" that looked like:

include:
{%- if grains.kernel == "Linux" %}
  - .lin_install
{%- elif grains.kernel == "Windows" %}
  - .win_install
{%- endif %}

Worked like a champ on Linux; on Windows, "not so much". So, I started debugging the logic.

I primarily write my code on Linux-based development-VMs. I'd chosen the "kernel" SaltStack-grain as my branching-basis because I was hoping to head off having to do compound if-blocks (in case anyone ever asked, "can you make this support Ubuntu instead of just Enterprise Linux and Windows?"). Otherwise, I'd have chosen the "os_family" SaltStack-grain. Prior to running the extended logic on a Windows-based host (Server 2022 for this exercise), I'd simply assumed that the code would work. Imagine my surprise when I executed it on a Server 2022-based EC2 and got error messages that I should only have seen on a misconfigured Linux-based host.
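For comparison's sake, here's a sketch (untested on my part) of what that same include-branching would look like keyed on the "os_family" grain; note the compound test needed to also pick up Debian-family distros like Ubuntu:

```sls
include:
{%- if grains.os_family in ["RedHat", "Debian"] %}
  - .lin_install
{%- elif grains.os_family == "Windows" %}
  - .win_install
{%- endif %}
```

Branching on "kernel" collapses all of that to a single test per OS-type, which is why I went that route.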

First thing I did was test how SaltStack was rendering my new "install.sls" file. I executed:

& 'C:\Program Files\Salt Project\Salt\salt-call.exe' -c C:\<config_path> `
  slsutil.renderer C:\<state_file_path>\install.sls

Interestingly, it returned:

local:
    ----------
    include:
        - .win_install

This meant that the Jinja was returning the expected list-element for the include statement. So, why the hell was I getting errors as though the ".lin_install" logic were what was being executed??

Did some digging around. One of the search results I got back indicated that the SaltStack minion for Windows can be flaky when using relative pathing to invoke other SaltStack files. So, I updated my "install.sls" to look like:

include:
{%- if grains.kernel == "Linux" %}
  - <formula_root>.package.lin_install
{%- elif grains.kernel == "Windows" %}
  - <formula_root>.package.win_install
{%- endif %}

This time, when I executed `<formula_root>.package.install` directly (on my Server 2022-based EC2), it worked like it always should have. The reason I don't know who to blame is that, while the problem manifests in SaltStack, I can't help but think it's the result of Windows weirdness.

Oh well, at least I have a path forward. 

Wednesday, March 18, 2026

Why, Palo Alto, WHY??

Yesterday, I was doing up some SaltStack-based automation to help a customer automate the installation of the Cortex XDR Agent on RHEL-based Linux hosts. The vendor delivers the agent in the form of a ZIP-archived RPM. Yeah, I was a bit unimpressed by them deciding an RPM needed to be encapsulated in a ZIP-archive.

When you read the installation documentation, there's a link on the page that they tell you to download the agent from. Yesterday, the embedded link was:

https://docs-cortex.paloaltonetworks.com/v/u/cortex-xdr-agent.zip

This URL was actually set up as an HTTP 302 (redirect) to:

https://docs-cortex.paloaltonetworks.com/api/khub/documents/Im1wc74y4HN15mXxBu3nYQ/content?Ft-Calling-App=ft%2Fturnkey-portal&Ft-Calling-App-Version=5.2.49&download=true&locationValue=viewer

SaltStack's file.managed didn't care for trying to download from a redirect. I had to whip up some logic to chase the redirect and stuff the result into a (Jinja) variable. Worked well once I got it in place.
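The redirect-chasing logic looked roughly like the following sketch; the state ID, target path and the cmd.run-plus-curl approach are illustrative stand-ins rather than my exact code:

```sls
{#- Hypothetical sketch: resolve the redirect's target at render-time,
    then hand the effective URL to file.managed #}
{%- set agent_url = "https://docs-cortex.paloaltonetworks.com/v/u/cortex-xdr-agent.zip" %}
{%- set effective_url = salt['cmd.run']("curl -Ls -o /dev/null -w '%{url_effective}' " ~ agent_url) %}

download_cortex_agent:
  file.managed:
    - name: /tmp/cortex-xdr-agent.zip
    - source: {{ effective_url }}
    - skip_verify: True
```

The `skip_verify` is there because, without it, file.managed insists on a source_hash for HTTP(S) sources.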

Today, I was attempting to continue with the refactoring that I'd started yesterday. This meant launching an EC2 with the new automation. I was surprised to find that the automation — unaltered since yesterday's day-ending push — was failing. When I checked the logs, SaltStack was complaining that I was trying to pass a null value to file.managed's source parameter. Perplexed, I started troubleshooting.

Ultimately, what I found was that the URL I was doing redirection-chasing on was no longer redirecting. Using curl like:

curl -Ls -o /dev/null \
  -w '%{url_effective}' "https://docs-cortex.paloaltonetworks.com/v/u/cortex-xdr-agent.zip"

was returning null. So, I opted to make it provide more-definitive output, like:

curl -Ls -o /dev/null \
  -w "Status: %{http_code}\nEffective URL: %{url_effective}\n" \
  "https://docs-cortex.paloaltonetworks.com/v/u/cortex-xdr-agent.zip"

This returned:

Status: 200
Effective URL: https://docs-cortex.paloaltonetworks.com/v/u/cortex-xdr-agent.zip

Which is to say, the reference URL is no longer returning an HTTP 302. Thus, my redirect-chasing logic was coming up with a null value when looking for the "url_effective" value.

So, I revisited the documentation. I scrolled down to where the document called out the signature-file's link. When I hovered over the link, its value had changed since yesterday. I guess they pushed out a documentation-portal change and, along with that, nuked the previous URL's redirect action. This meant that the URL I was expecting was now returning a data-stream. Looking at the data-stream's first and last five lines:

curl -Ls https://docs-cortex.paloaltonetworks.com/v/u/cortex-xdr-agent.zip | \
sed -n '1,5p; :a;$p;N;11,$D;ba'

It was obvious from the output that the HTTP data-stream was now sending me JavaScript, presumably substituting the prior HTTP redirect with JS-based navigation-aids. Unfortunately, those kinds of navigation-aids work well enough for graphical browsers but not at all well for curl-type methods. 

None of this would have been necessary if they just maintained a file-repository of the RPMs and their signing-keys. Or, since they were ZIPing up the RPMs, include the damned signing-key in the archive-file. But, no, that would be too fucking easy and way too sensible.

I wish I could say this was surprising. However, I've had to integrate enough tooling — particularly security tooling — that I've gotten sort of used to vendors (especially security vendors) doing things in absolutely baffling ways. At least this vendor wasn't doing things in ways that required reducing system-security to allow installation of their tool: that is a hallmark of the brain-damage I frequently witness with security vendors' tooling.

Friday, March 13, 2026

I Hate PowerShell (Installment 3783)

So, I know I've skipped a few numbers since my I Hate PowerShell (Installment 3769) post. That post still applies and, yes, there've been several more annoyances encountered since authoring that post.

For the project I'm currently working on, I can only access my remote Windows (Server 2022) hosts using AWS's Fleet Manager service. The VPCs are otherwise configured to block direct RDP access.

At any rate, the first time I login, I fire up PowerShell. I go to type into the resulting "terminal" window and… nothing. The window doesn't accept my keyboard input. Perplexed, I Slack the teammate who asked me to write the automation for the Windows provisioning-tasks that he'd been doing manually. I note my problem to him. He tells me, "install PowerShell 7".

I update the userData I pass as part of launching my EC2 so as to implement that advice. I wait for the userData's execution to complete and then login. Once the remote desktop session becomes fully responsive, I fire up PowerShell …and find that I am still unable to type. Perplexed, I do some digging around and find that "upgrade to PowerShell 7" doesn't _actually_ upgrade PowerShell, it just installs a second, newer version of PowerShell alongside the system default. If I want to access the newer version, I need to call `pwsh`, instead. So, I do so. The new window opens and, as advertised, I'm able to type into that window.

Now, I'm a person possessed of significant curiosity. As such, my curious brain thinks, "what happens if I call `powershell` from the PS7 window I'm able to type into?" To sate my curiosity, I do so. Suddenly, the window I was previously able to type into no longer accepts keyboard input.

So, yeah, if I want to be able to type into windows opened from the default PowerShell version, I need to add some further plumbing to my userData. Specifically, I need to include some logic to update the default PowerShell installation's `PSReadLine` module.

Notionally, I could skip the installation of PS7. However, its installation seems to be the default work-around for this project (still don't know if I'd call that an "anti-pattern" or not). As such, I fear that not having it installed will result in the eventual users of my automation screaming about it not being installed.

At any rate, if I want to be able to type into the default PowerShell version and I want PowerShell 7 available for others' use, my userData payload needs to look like:

# Ensure the default PowerShell (v5.x) windows accept typed-in content
Install-PackageProvider -Name NuGet -Force
Install-Module `
  -Name PSReadLine `
  -Repository PSGallery `
  -Force

# Install PowerShell 7
iex "& { $(irm https://aka.ms/install-powershell.ps1) } -UseMSI"

So. Yeah. But, at least my deployment target works. 

Interestingly, I don't seem to run into the "can't type into the window" problem if I use the AWS SSM CLI to login. Or maybe I didn't start trying that route until I was already including the PSReadLine update-juju into my userData payload?

Also interestingly, because the PS7 installation is a parallel installation, if I want to use the AWS SSM CLI to login directly to a PS7 session, I have to update my CLI-access invocation from:

aws ssm start-session --target <INSTANCE_ID>

To:

aws ssm start-session --target <INSTANCE_ID> \
  --document-name AWS-StartInteractiveCommand \
  --parameters command="pwsh"

I guess the "don't try to change the system-wide PS version" guidance that search-results are showing me is a lot like the "don't try to replace the system-default Python version" cautions for Red Hat distros?

Thursday, February 12, 2026

I Hate PowerShell (Installment 3769)

One of my customers has a group of developers. They're also tasked with SRE-style responsibilities as things move from development to production. Since the production environment is significantly locked down, these developers cannot access it from their laptops. As such, they don't have the kind of service-management tooling available to them in case shit goes sideways. To work around this issue, they requested the creation of a "jump box" in the production-environment that would have a reliable set of tools installed. Even though everything in production is either Linux or Kubernetes based, they wanted the "jump box" to be Windows-based. So, one of my peers hand-created a suitable "jump box" for them — installing all of the desired tools and creating the desired, RDP-enabled users.

In general, we don't like to have such hand-made boxes. They tend to be poorly maintained (such that their security auditors become sadder and sadder with the passing of each patching cycle) and, if they fall victim to system-breakage, there's no quick way to rebuild them. Further, automating builds makes it much easier to do cost-control, since you can better implement an "instantiate as you need it" deployment (or repair/replace) model. As such, they needed someone to automate what had previously been hand-jammed.

Since I'd just come off of a project where I'd delivered a bunch of "hardening" automation, I was the stuckee. I, uh, don't generally work with Windows. I especially don't really do automation for Windows-based systems. So, I accepted the task knowing that "fun times be ahead" for me.

Part of the task involved hardening the "jump box" using a framework we'd been pushing our various customers to use for the past decade or so. Since there was already a "bootstrap" (PowerShell) script written for this purpose, I opted to use that script as my starting-point — why (wholly) reinvent the wheel, eh?

That script worked by downloading and executing the hardening framework. The easiest path forward, for me, was to update that script to similarly download and execute my script.

At any rate, the first pass at authoring my script was just to install a set of developer-oriented (really, more like "SRE-oriented") tools. I wrote it in such a way that the automation-user could specify specific versions of the desired tooling (by giving the URLs of the tools they wanted installed). It worked well enough. However, it didn't include the creation of RDP-enabled users. As such, once the "jump box" was built, the developers/SREs couldn't login without someone hand-jamming the creation of local users.

Today, I extended my script to allow the use of an external, JSON-formatted, user-specification file.

I wanted the various users' attributes to be flexible (e.g., the "is an admin user" attribute should allow a value that was any of `true`, `false` or undefined). The version of PowerShell on the automation-targets apparently has some kind of "strict" mode set as the default behavior. So, when three of my JSON-file's testing user-definitions didn't have the "is an admin user" attribute defined, PowerShell noped-out on that "problem".
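For illustration, a user-specification file matching the structure my parsing-logic expects would look something like the following; the usernames and values are made up, and note that only the first user defines the optional "is an admin user" (localAdmin) attribute:

```json
{
  "Users": [
    {
      "jdoe": [
        {
          "givenName": "John",
          "surname": "Doe",
          "initialPassword": "Sup3rS3cr3t!",
          "localAdmin": "true"
        }
      ],
      "asmith": [
        {
          "givenName": "Alice",
          "surname": "Smith",
          "initialPassword": "An0therS3cr3t!"
        }
      ]
    }
  ]
}
```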

So, I did some digging around (Microsoft documentation, StackExchange, etc.). Figured out how to write the block in a manner that operates properly when the "strict" mode is set.

It was, to me, quite ugly compared to automation I've written for Linux-based hosts. To implement in a way that left the "strict" mode interpreter happy, I implemented my function like:

function Parse-JsonFile {
  # Where to write downloaded user-creation spec-file to
  $UserCreationFile = "${SaveDir}\$(${UserCreationUrl}.split("/")[-1])"

  # Download user-creation spec-file
  Download-File -Url ${UserCreationUrl} -SavePath ${UserCreationFile}

  # Abort if given file-path is not valid
  if ( -not ( Test-Path $UserCreationFile ) ) {
      Write-Error "File not found: $UserCreationFile"
      return
  }

  # Load JSON-payload from file and convert to PS object
  $JsonStream = Get-Content -Raw -Path "${UserCreationFile}" | ConvertFrom-Json

  # The structure has a 'Users' array containing a single object with dynamic keys
  foreach ($userContainer in $JsonStream.Users) {
    # Iterate through each dynamic key (the usernames)
    foreach ($username in $userContainer.psobject.Properties.Name) {
      # Get the array associated with that username
      $userDetails = $userContainer.$username

      foreach ($detail in $userDetails) {
        # Assemble "full name" attribute to pass to the user-creation function
        $FullName = ${detail}.givenName + " " + ${detail}.surname

        # Safely set WantsAdmin in a Strict-mode safe way
        if ( $detail.psobject.Properties.Name -contains "localAdmin" ) {
          $WantsAdmin = ${detail}.localAdmin
        } else {
          $WantsAdmin = $null
        }

        if ( ${WantsAdmin} -eq "true" ) {
          Create-User -UserUidName "$username" `
            -UserFullName ${FullName} `
            -UserPasswd ${detail}.initialPassword `
            -UserIsAdmin
        } else {
          Create-User -UserUidName "$username" `
            -UserFullName ${FullName} `
            -UserPasswd ${detail}.initialPassword
        }
      }
    }
  }
}

I understand that this is far from the optimal way to do things. I hate how non-terse PowerShell makes me write things. However, I am not a PowerShell guy. It works. This is the first pass at it. I'll probably try to improve it in future iterations.

I still hold out hope that we can convince them to use a Linux-based "jump box" (since, as mentioned earlier, their production environment is wholly Linux- and Kubernetes-based) …At which point, I'll probably be tasked with figuring out how to RDP-enable a box with similar tooling and user-access.

Tuesday, January 6, 2026

Crib Notes: Removing Un-Tracked "Junk" From Git Repositories

I have a couple of git-managed projects where the projects' CI-configuration takes documentation-inputs — usually Markdown files — and renders those inputs into other formats (usually HTML for hosting on platforms like Read The Docs). While the documentation-inputs are tracked in git, the rendered outputs are not tracked.

Indeed, they're normally not even generated in contributors' local copies of the GitHub- or GitLab-hosted repositories. At best, the projects are adequately Docker-enabled so as to make it easy to generate "preview" renderings (to avoid uploading documentation-updates that have errors and to save the time and resources lost to server-side rendering of the contents).

If one does avail themselves of the "preview" capability, it can leave grumph in the local repository copies. This grumph can lead to non-representative (i.e., "stale") content being previewed. To avoid this, one generally wants to ensure that such "preview" content is cleaned up before the next (local) generation of "preview" content is performed.

The git client provides a nifty little method for performing cleanups of such content: `git clean`. Unfortunately, running a bare `git clean` typically won't produce the desired results. One needs to add further flags. I've found that, for my use-cases, the most-appropriate/thorough invocation is `git clean -fdx`.
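Since `git clean -fdx` is unrecoverable for anything untracked, it's worth previewing with `-n` (dry-run) first. A quick demonstration in a throwaway repository (the file and directory names are, obviously, made up):

```shell
# Build a scratch repo with one tracked file...
tmpdir="$(mktemp -d)"
cd "$tmpdir"
git init -q
echo "# docs" > README.md
git add README.md
git -c user.name=demo -c user.email=demo@example.com commit -qm "initial"

# ...plus simulated "preview" output (untracked) and an ignored artifact
mkdir -p site && echo "<html></html>" > site/index.html
echo "*.log" > .gitignore && touch render.log

git clean -ndx   # dry-run: lists what WOULD be removed (.gitignore, render.log, site/)
git clean -fdx   # -f force, -d recurse into directories, -x include ignored files

# Only tracked content survives
remaining="$(ls -A | grep -v '^\.git$')"
echo "remaining: ${remaining}"
```

Note that `-x` means even ignored artifacts, and an untracked `.gitignore` itself, get wiped, which is exactly what you want before regenerating "preview" content.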

This is also useful if, in the course of updating a repository — say, as part of a significant refactor — you find you've done a number of `mv <DIR>{,-OLD}` types of operations (not exactly the textbook method to underpin refactors, but it provides an easy path for "before/after" comparisons). Such directories and similar content will also get wiped away by `git clean -fdx`.