Friday, July 19, 2019

Why I Default to The Old Ways

I work with a growing team of automation engineers. Most are purely dev types. Those who have lived in the Operations world at all skew heavily towards Windows, or have only had to deal very lightly with UNIX or Linux.

I, on the other hand, have been using UNIX flavors since 1989. My first Linux system was the result of downloading a distribution from the MIT mirrors in 1992. As a result, I have a lot of old habits (seriously: some of my habits are older than some of my teammates). And, because I've had to get deep into the weeds with all of those operating systems many, many, many times over the years, those habits are pretty entrenched ("learned with blood" and all that rot).

A year or so ago, I submitted a PR that included some regex-heavy shell scripts. The person who reviewed the PR asked, "why are you using '[<space><TAB>]*' in your regexes rather than just '\s'?". At the time, I think my response was a semi-glib, "A) old habits die hard; and, B) I know that the former method always works".

That said, I am a lazy typist. Typing "\s" is a lot fewer keystrokes than "[<space><TAB>]*". Similarly, "\s" takes up a lot less column-width than "[<space><TAB>]*" does (and I/we generally like to code to fairly standard page-widths). So, for both laziness reasons and column-conservation reasons, I started to move more towards using "\s" and away from using "[<space><TAB>]*". I think in the last 12 months, I've moved almost exclusively to "\s".

Today, that move bit me in the ass. Well, yesterday, actually, because that's when I started receiving reports that the tool I'd authored on EL7 wasn't working when installed/used on EL6. Ultimately, I traced the problem to an `awk` invocation. Specifically, I had a chunk of code (filtering DNS output) that looked like:

awk '/\sIN SRV\s/{ printf("%s;%s\n",$7,$8)}'

Which worked a treat on EL7 but, on EL6, "not so much." When I altered it to the older-style invocation (each bracketed pair below is meant to hold a literal space and a literal TAB):

awk '/[  ]*IN[  ]*SRV[  ]*/{ printf("%s;%s\n",$7,$8)}'

It worked fine on both EL7 and EL6. Turns out the ancient version of `awk` (3.1.7) on EL6 didn't know how to properly interpret the "\s" token. Oddly, my recollection (from writing other tooling) is that EL6's version of `grep` understands the "\s" token just fine.
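For anyone wanting a middle ground between the two spellings, here is a minimal sketch that avoids both the "\s" escape and the need to type a literal TAB. It relies on `awk` itself translating "\t" inside a regex constant, which is part of the POSIX `awk` escape-sequence set. The function name `srv_targets` and the sample DNS record are made up for illustration, and I have not re-run this on an actual EL6 host.

```shell
# Hypothetical helper (name is illustrative, not from the original tool):
# print fields 7 and 8 of any line containing a whitespace-delimited
# "IN SRV" pair. "[ \t]" is a bracket expression holding a space and an
# awk-interpreted TAB escape, so no "\s" support is required.
srv_targets() {
  awk '/[ \t]IN[ \t]+SRV[ \t]/ { printf("%s;%s\n", $7, $8) }'
}

# Fabricated, dig-style sample record (TAB-separated up to the SRV data)
printf '_ldap._tcp.example.com.\t300\tIN\tSRV\t0 100 389 dc1.example.com.\n' |
  srv_targets
```

Note that this pattern is slightly more permissive than the original: it accepts any run of spaces or TABs between "IN" and "SRV" rather than exactly one space. Where the `awk` in question supports POSIX character classes, "[[:space:]]" would be another option.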

When I Slacked the person I'd had the original conversation with, sending a link to the PR and a "see: this is why" note, he replied, "oh: I never really used awk, so never ran into it".

Wednesday, July 17, 2019

Crib-Notes: Manifest Deltas

Each month, the group I work for publishes new CentOS and Red Hat AMIs (and Azure templates and Vagrant boxes). When we complete the publication-event, we post a news announcement to our user-portal so that subscribers can receive an alert of the new publication. Included in that news announcement is a "what's changed" section.

In prior months, figuring out "what changed" was left as a manual step for the team-member charged with running the automation for a given month's publication event. This month, no one generated that news article, and there were several updated and new RPMs included in the new image. So, I set about figuring out "how to extract this information programmatically so as to more-easily suss-out what to include in the announcement posting." The following does so (though, presumably, in a not-particularly-optimized fashion):
git diff $(
      git log --pretty='%H' --follow -- <PATH_TO_MANIFEST_FILE> | \
      head -2 | \
      tac | \
      sed 'N;s/\n/../'
   ) -- <PATH_TO_MANIFEST_FILE> | \
grep -E '(amazon|aws|ec2)-' | \
sed 's/^./& /' | \
sort -k 2
To explain:
  1. Use `git log` to output the commit-hashes for all the commits for the target file (in this case, the project's manifest-file)
  2. Use `head -2` to grab only the two most-recent commit hashes from the output-stream
  3. Use the `tac` command to invert the order of the two lines returned from the `head` command
  4. Use the `sed` command to join the two lines, replacing the first line's line-ending newline character with ".."
  5. Use `git diff` against the output created in steps 1-4, and constrain the diff-activity to just the manifest-file.
  6. Pipe that output through `grep` to suppress all information other than the bits containing the `amazon-`, `aws-` and `ec2-` substrings.
  7. Pipe that through `sed` to insert a space after each line's first character, so that the +/- that `git diff` uses to show added and removed lines, respectively, becomes an easily-tokenized substring.
  8. Sort the remaining output-stream (with `sort`) so that the lines are grouped by manifest-element (the second key/token in the sorted output)
Taking that output and converting to a news article is still manual, but it at least makes it a lot easier to do than either hand-diffing two files or having to "just know" what's changed.
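
To see what the individual stages do without needing a repository, the following sketch runs steps 2-4 and steps 6-8 against fabricated input. The function names (`join_range`, `tidy_diff`), the stand-in "hashes" and the package names are all invented for illustration; only the pipeline stages themselves come from the command above. It assumes GNU `tac` is available.

```shell
# Steps 2-4: take the two newest "hashes" (git log prints newest first),
# flip them to oldest-first, then join them into an OLD..NEW range.
join_range() {
  head -2 | tac | sed 'N;s/\n/../'
}

# Stand-in hashes, newest first
printf 'bbb222\naaa111\nold000\n' | join_range
# prints: aaa111..bbb222

# Steps 6-8: keep only the amazon-/aws-/ec2- lines, split the leading
# +/- marker into its own token, then sort on the package-name token.
tidy_diff() {
  grep -E '(amazon|aws|ec2)-' | sed 's/^./& /' | sort -k 2
}

# Fabricated diff lines
printf '%s\n' '-aws-cli-1.0' '+kernel-3.10' '+aws-cli-1.1' '+ec2-utils-0.5' |
  tidy_diff
# prints:
# - aws-cli-1.0
# + aws-cli-1.1
# + ec2-utils-0.5
```

Sorting on the second token is what puts the removed and added versions of the same package next to each other, which makes the "what's changed" write-up easier to eyeball.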

Notes

Because Red Hat has placed EL6 in its final stage of de-support, we've stopped publishing CentOS6 and RHEL6 images. We did this to discourage our subscribers from doing new deployments on EL6 (since the underlying platform will go into final de-support come November of this year).

Similarly, due to the current lack of a CentOS offering for EL8, the lack of security-related build- or hardening-guidance for EL8 and the associated lack of subscriber-demand for an EL8 build, we don't yet include builds for CentOS8 or RHEL8 in our process. Thus, for the time being, we only need to provide a "what's changed" for EL7 builds. Given this, we currently only need to do change-queries against the "manifests/spel-minimal-centos-7-hvm.manifest.txt" file.