
Friday, September 23, 2022

Code Explainer: Regex and Backrefs in Ansible Code

Recently, I'd submitted a code-update to a customer-project I was working on. I tend to write very dense code, even when using simplification frameworks like Ansible. As a result, I had to answer some questions asked by the person who did the code review. Ultimately, I figured it was worth writing up an explainer of what I'd asked them to review…

The Ansible-based code in question was actually just one play:

---
- name: Remove sha512 from the LOG option-list
  ansible.builtin.lineinfile:
    backrefs: true
    line: '\g<log>\g<equals>\g<starttoks>\g<endtoks>'
    path: /etc/aide.conf
    regexp: '^#?(?P<log>LOG)(?P<equals>(\s?)(=)(\s?))(?P<starttoks>.*\w)(?P<rmtok>\+?sha512)(?P<endtoks>\+?.*)'
    state: present
...

The above is meant to ensure that the RHEL 7 config file, "/etc/aide.conf", sets the proper options for the defined scan-definition, "LOG". The original contents of the line were:

LOG = p+u+g+n+acl+selinux+ftype+sha512+xattrs

The STIGs were updated to indicate that the contents of that line should actually be:

LOG = p+u+g+n+acl+selinux+ftype+xattrs

The values of the Ansible play's regexp and backrefs attributes are designed to use the advanced line-editing afforded by Ansible's lineinfile module. Ansible is a Python-based tool, and this module's advanced line-editing capabilities are implemented using Python's re module. The regexp attribute's value is written to make use of the re module's ability to do referenceable search-groupings. Search-groupings are specified using parenthesis-delimited search-rules (i.e., "(SEARCH_SYNTAX)").

By default, a given search-grouping is referenced by a left-to-right index-number, starting at "1". These references – also referred to as "backrefs" – can then be used in the replacement-string (the value of the play's line attribute) to help construct the replacement string's value. Using the index-number method, the replacement-string would be "\1\2\6\8" …which isn't exactly self-explanatory.

To help with readability, each group can be explicitly-named. To assign a name to a search-group, one uses the syntax ?P<LABEL_NAME> at the beginning of the search-group. Once the group is assigned a name, it can subsequently be referenced by that name using the syntax "\g<LABEL_NAME>".
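
To make the contrast concrete, here's a sketch of the same play written with index-number backrefs instead of named groups – functionally equivalent (the group-numbering is the same as in the named version above), just far less self-documenting:

---
- name: Remove sha512 from the LOG option-list
  ansible.builtin.lineinfile:
    backrefs: true
    line: '\1\2\6\8'
    path: /etc/aide.conf
    regexp: '^#?(LOG)((\s?)(=)(\s?))(.*\w)(\+?sha512)(\+?.*)'
    state: present
...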

If one visits the Regex101 web-site and selects the "Python" regex-type from the left menu, one can get a visual representation of how the above regexp gets interpreted. Enter the string to be evaluated in the "TEST STRING" section and then enter the value of the regexp parameter in the REGULAR EXPRESSION box. The site will then show you how the regex chops up the test string and tell you why it chopped it up that way:

Regex101 Screen-cap




Wednesday, November 4, 2020

Increasing Verbosity of Ansible Jobs

Sometimes, Ansible doesn't have truly native methods for installing and/or configuring some types of content. As a result, you may find yourself resorting to Ansible's shell: or command: modules – basically shell-out ("escape") methods – to execute such tasks. While these methods are OK for smaller tasks, for larger tasks they can expose you to a number of problems:
  • The shell-out can take quite a long time to return
  • The shell-out can leave you guessing what it's actually doing – leaving you wondering:
    • "is it actually still working"
    • "is it hung"
    • "is this going to leave me waiting forever or will it ultimately time out"
    • etc.
  • The shell-out can be uninformative.
  • The shell-out can return an incorrect status:
    • It may report a change where none actually occurred
    • It may report a success where there was partial- or even total-failure
    • It may report a failure where there was partial-success
While there's often not a lot that can be done about execution time – things take as long as they take – the other problems are addressable.

Trying to fix the "what's it actually doing" problem from within the shell-out, itself, often isn't meaningful: Ansible gathers up all the shell-output and only returns it once the shell exits. That said, improving your shell-out's output isn't a wholly-wasted effort: making your shell-out more verbose can help you if it does error-out; it can also provide greater assurance if you want to pore through the change: or ok: (success) output. This can help you with some of the "can be uninformative" problem, even if only after-the-fact.

Similarly, trying to fix the "return an incorrect status" problem strictly from within the shell-out doesn't necessarily provide a full solution. It can improve the overall reliability of the shell-out. However, it doesn't necessarily fix the status that Ansible uses when it tries to decide, "should I abort this run or should I continue" or any other contingent- or branching-logic you might want to implement.

Recently, I ran into these kinds of problems at one of my customer sites. They're a shop where a significant percentage of the user-base is data-science oriented. As such, they use Ansible to install the R language binaries along with a few hundred modules that their various users like to use. While they're an Ansible shop, they'd implemented the installation as a call to an external shell script. Ansible first pushes the script out and then executes it on the targets.

The script, itself, wasn't especially robustly-written: next to no error-handling or -reporting. It's basically a launch-and-pray kind of tool. On the plus side, they had thought ahead well enough to provide a mechanism for determining where in the shell-managed installation-process things had died. Basically, as the script runs, it creates a simple file containing a list of yet-to-be-installed modules from which you could infer that one or more of them had failed. That said, to see that file, you have to manually SSH to the managed-system and go view it.

Unfortunately, because the script is doing so many module-installs, it takes a really long time to execute (a few hours!). Because Ansible only reports script output when an invoked-script exits, Ansible pauses for a loooong time with no nerve-soothing output. And, as previously mentioned, if it does fail, you're stuck having to visit managed-systems to try to figure out why.

I'm not real tolerant of waiting for things to run with no output to tell me, "at least I know that it's doing something." So, I set out to refactor. While I'd hoped there was a native Ansible module for the task, my Google-fu wasn't able to turn anything up (my particular customer's environment doesn't lend itself well to using extensions to Ansible functionality such as one might find on Galaxy or GitHub). So, I too resorted to the shell-escape method.

That said, upon looking at the external shell script they wrote, I realized it was an extremely simple script. As such, I opted to replace it with an equivalent shell: | block in the associated Ansible plays.

---
- name: Iteratively install R-modules
  args:
    executable: /bin/bash
  changed_when: "'Added' in modInstall_result.stdout"
  environment:
    http_proxy: 'http://{{proxy_user}}:{{proxy_password}}@{{proxy_host}}:80/'
    https_proxy: 'http://{{proxy_user}}:{{proxy_password}}@{{proxy_host}}:80/'
  failed_when: "modInstall_result.rc != 0 or 'had non-zero exit status' in modInstall_result.stderr"
  register: modInstall_result
  shell: |
    if [[ {{ item }} =~ ":" ]]
    then
       PACKAGE="$( echo {{ item }} | cut -d : -f 1 )"
       VERSION="$( echo {{ item }} | cut -d : -f 2 )"
       VERSTRING="version = '${VERSION}',"
    else
       PACKAGE="{{ item }}"
       VERSTRING=""
    fi

    Rscript --slave --no-save --no-restore-history -e "
    if (! ('${PACKAGE}' %in% installed.packages()[,'Package'])) {
      require(devtools);
      install_version(package = '${PACKAGE}', ${VERSTRING} upgrade = 'never', repos=c('http://cran.us.r-project.org'));
      print('Added');
    } else {
      print('Already installed');
    }"
  with_items: "{{ Rmodules.split('\n') }}"
...
The value of the refactored approach is that, instead of waiting hours for output, there's output associated with each installation-attempt. Further, the output is all captured on the host running the Ansible play: no need to visit managed systems to find something resembling a logfile.

Explaining the Play... 

When I construct plays, I like to order the YAML alphabetically (with the exception of the "name" parameter – that always goes first). Which is to say, anything at a given directive-level will be ordered from A-Z. Some people prefer to put things in something more-resembling a functional order. I choose alphabetical because it makes it easier for me to cross-reference with the documentation.
  • "name" is fairly self-explanatory. It just provides a human-friendly indication of what Ansible is doing. In this case, iteratively installing R-modules (duh!).
  • "args":  This can have a number of sub-parameters (I have yet to dig through the documentation or source to find all of them). I've only ever had use for the "executable" sub-parameter.  
  • "changed_when": This parameter allows you to tell Ansible how to know that the shell-escape changed something. In this instance, I'm having it evaluate data contained in a variable named "modInstall_result" (set later via the "register" action).
  • "executable": this allows you to explicitly tell Ansible which interpreter to use. I'm pedantic, so, I like to tell it, "use /bin/bash".
  • "environment": this allows you to set execution-specific environment-variables for the shell to use. In this case, I'm setting the "http_proxy" and "https_proxy" environmentals. This is necessary because the build environment is isolated and I'm trying to let R's in-built URL-fetcher pull content from public, internet-hosted repositories (see this vendor documentation for explanation). The customer doesn't have a full CRAN mirror, so, leveraging this installation method minimizes having to account for dependencies.
  • "failed_when": This parameter allows you to tell Ansible how to know that the shell-escape failed. In this instance, I'm having it evaluate data contained in a variable named "modInstall_result" (set later via the "register" action).
  • "register": Used in this manner, it collects all of the inputs to and outputs produced by the shell and its exit code and store it in a JSON-formatted variable. In this case, the variable is named "modInstall_result". Data can be extracted via regular JSON-extraction methods
  • "shell": This is the actual code that Ansible will execute (via the previously-requested /bin/bash interpreter). I'll explain the block's contents, shortly
  • "with_items": This is one of the ways that Ansible allows you to run a given play iteratively. External to this play, I had defined the "Rmodules" variable to read in a text file – via Ansible's lookup() function – that contained one R module-name per line (and, optionally, an associated version-number). The "with_items" parameter-value is in the form of a list. The lookup() function originally created the "Rmodules" value as a single string with embedded newlines. Using the .split() function converts that string into a list. As Ansible iterates this list, each list-element is popped off and assigned to the temporary-variable "item".

Explaining the script-stanza:
The "shell:" module can be invoked as either a single line or as a block of text. Using the "|" as the value for the module tells the module that the following, indented block of text is to be treated as a single input-block. For readability of anything but very short script-content, I prefer the readability of the block of text.

The BASH content, itself, is basically two parts.
  • The first part takes the value of the iterated "item" temporary-value and parses it. If the string contains a colon, the string is split into defined PACKAGE and VERSION BASH-variables (with the VERSION BASH-variable being further expanded to a version-string argument suitable for use with the Rscript command). If the string does not contain a colon, the PACKAGE BASH-variable is set to the R module-name and the version-string BASH-variable is set to a null/empty value.
  • The second part is the Rscript logic. Using R's installed.packages() function, the existing R installation is checked for the presence of the R module-name contained in the PACKAGE BASH-variable. If not present, R's install_version() function is used, along with the PACKAGE BASH-variable and the version-string variable, to install an (optionally-versioned) R-module. The if check helps with idempotency – preventing attempts to reinstall the module and avoiding the not-inconsiderable time that reinstallation can take.
Note that, in order for the Rscript logic to work, the devtools (link goes to a specific, older version; other versions should work) R module must have been previously installed. In my case, this installation has been taken care of in a prior Ansible play (not presented here: the primary focus of this article was to illustrate how to use iteration to make for more-verbose configuration-management) 

Tuesday, October 27, 2020

Smashing Walls of Text

On a few projects I work on, we make use of Ansible to automate system-configuration tasks. For better or worse, the automation relies on some upstream content that is outside the control of the customer. Translation: every few months, automation that has worked for months will suddenly no longer work.

By itself, this is more an inconvenience than a real problem. However, because the automation is handed off to other, more-junior people to execute, when errors are encountered, those others are frequently at a loss as to what to do.

Frequently, they don't bother to include logs of any sort ...or, if they do, they include log snippets (or, worse: screen-caps of text-based log snippets!), and those snippets frequently don't contain the information critical to solving the problem. So, the first response to the support request is a "send me all the logs" type of reply.

Now, depending on the size of the Ansible job, the associated log-file might be HUGE. Generally, I'm only interested in where Ansible has failed. Parsing through an entire Ansible log-file can be like trying to find a specific brick in the Great Wall of China. So, to help preserve my sanity, I cooked up a quick BASH script to both help "cut to the chase" and provide more-easily readable log-output. That script looks like:

#!/bin/bash
#
# Script to filter the output of Ansible log files to something more-readable
#
###########################################################################
OUTFILE="${1:-/tmp/Ansible.log}"

mapfile -t ERROUT < <(
   grep ^failed: "${OUTFILE}"
)

# Make sure we actually found failure-messages
if [[ ${#ERROUT[@]} -lt 1 ]]
then
   echo "No failure-messages found in ${OUTFILE}"
   exit 0
fi

# Iterate over error-string
ITER=0
while [[ ${ITER} -lt ${#ERROUT[@]} ]]
do
   printf '##########\n## %s\n##########\n' "$(
      echo "${ERROUT[${ITER}]}" | sed 's/^failed:.* => //' | 
      python3 -c "import sys, json; print(json.load(sys.stdin)['item'])"
   )"
   
   echo "${ERROUT[${ITER}]}" | sed 's/^failed:.* => //' | \
     python3 -c "import sys, json; print(json.load(sys.stdin)['stderr'])" | sed 's/^/    /'
     
   printf '####################\n\n\n'
   ITER=$(( ITER + 1 ))
done
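
Using it is simple: capture the playbook run to a logfile, then point the script at that file. The playbook- and script-names below are just placeholders; the script falls back to /tmp/Ansible.log if no argument is given:

# Run the playbook, teeing its output to a logfile
ansible-playbook deploy.yml | tee /tmp/Ansible.log
# Boil the captured log down to just the failure-messages
./ansible-log-filter.sh /tmp/Ansible.log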

Monday, July 6, 2020

Taming the CUDA (Pt. II)

So, today, finally had a chance to implement in Ansible what I'd learned in Taming the CUDA.

Given that it takes a significant time to run the uninstall/new-install/reboot operation, I didn't want to just blindly execute the logic. So, I wanted to implement logic that checked to see what version, if any, of the CUDA drivers were already installed on the Ansible target. First step to this was as follows:
- name: Gather the rpm package facts
  package_facts:
    manager: auto
This tells Ansible to interrogate the managed-host and gather package-information into the `ansible_facts.packages` fact – a JSON structure that's then referenceable by subsequent Ansible actions. Since I'm only interested in the installed version of the base cuda RPM, I'm able to grab that information from the `ansible_facts.packages['cuda'][0].version` value and use it in a `when` conditional.

Because I had multiple actions that I wanted to make conditional on a common condition, I didn't want to have a bunch of configuration-blocks with the same conditional statement. Did some quick Googling and found that, yes, Ansible does support executing multiple steps within a shared-condition block. You just have to use (wait for it...)  the `block` statement in concert with the shared condition-statement. When you use that statement, you then nest actions that you might otherwise have put in their own, individual action-blocks. In my case, the block ended up looking like:
- name: Update CUDA drivers as necessary
  block:
    - name: Copy CUDA RPM-repository definition
      copy:
        src: files/cuda-rhel7-11-0-local.repo-DSW
        dest: /etc/yum.repos.d/cuda-rhel7-11-0-local.repo
        group: 'root'
        mode: '000644'
        owner: 'root'
        selevel: 's0'
        serole: 'object_r'
        setype: 'etc_t'
        seuser: 'system_u'
    - name: Uninstall previous CUDA packages
      shell: |
          UNDOID=$( yum history info cuda | sed -n '/Transaction ID/p' | \
                    cut -d: -f 2 | sed 's/^[     ]*//g' | sed -n 1p )
          yum -y history undo "${UNDOID}"
    - name: Install new CUDA packages (main)
      yum:
        name:
          - cuda
          - nvidia-driver-latest-dkms
        state: latest
    - name: Install new CUDA packages (drivers)
      yum:
        name: cuda-drivers
        state: latest
  when:
    ansible_facts.packages['cuda'][0].version.split('.')[0]|int < 11
I'd considered doing the shell-out a bit more tersely – something like:
yum -y history undo $( yum history info cuda | \
sed -n '/Transaction ID/p' | cut -d: -f 2 | sed -n 1p)
But figured what I ended up using was marginally more readable for the very junior staff that will have to own this code after I'm gone.

Any way you slice it, though, I'm not super chuffed that I had to resort to a shell-out for the targeted/limited removal of packages. So, if you know a more Ansible-y way of doing this, please let me know.

I'd have also finished-out with one yum install-statement rather than the two, but the nVidia documentation for EL7 explicitly states to install the two groups separately. 🤷

Oh... And because I didn't want my `when` statement to be tied to the full X.Y.Z versioning of the drivers, I added the `split()` method so I could match against just the major number. Might have to revisit this if they ever reach a point where they care about the major and minor or the major, minor and release number. But, for now, the above suffices and is easy enough to extend via a compound `when` statement. Similarly, because the version fact comes back as a string, I needed to forcibly cast it to an integer so that numeric comparison would work properly.
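
Purely as an illustration of that kind of extension, a compound `when` that also cared about the minor number might look something like the following (the "11.1" threshold is made up for the example):

  when: >-
    ansible_facts.packages['cuda'][0].version.split('.')[0]|int < 11 or
    ( ansible_facts.packages['cuda'][0].version.split('.')[0]|int == 11 and
      ansible_facts.packages['cuda'][0].version.split('.')[1]|int < 1 )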

Final note: I ended up line-breaking where I did because yamllint had popped "too wide" alerts when I ran my playbook through it.

Friday, June 5, 2020

Ansible Journey: Adding /etc/fstab Entries

As noted in yesterday's post, I'm working on a new customer-project. One of the automation-tools this customer uses is Ansible. This is a new-to-me automation-technology. Previously — and aside from just writing bare BASH and Python code — I've used frameworks like Puppet, SaltStack and a couple others. So, picking up a new automation-technology — especially one that uses a DSL not terribly unlike ones I was already familiar with — hasn't been much of a stretch.

After sorting out yesterday's problem and how I wanted my /etc/fstab to look, I set about implementing it via Ansible. Ultimately, I ended up settling on a list-of-maps variable to drive a lineinfile role-task. I chose a list-of-maps variable mostly because the YAML that Ansible relies on doesn't really do tuples. My var ended up looking like:

s3fs_fstab_nest:
  - mountpoint: /provisioning/repo
    bucket: s3fs-build-bukkit
    folder: RPMs
  - mountpoint: /provisioning/installers
    bucket: s3fs-build-bukkit
    folder: EXEs
  - mountpoint: /Data/personal
    bucket: s3fs-users-bukkit
    folder: build

And my play ended up looking like:

---
- name:  "Add mount to /etc/fstab"
  lineinfile:
    path: '/etc/fstab'
    line: "s3fs#{{ item.bucket }}:/{{ item.folder }}\t{{ item.mountpoint }}fuse\t_netdev,allow_other,umask=0000,nonempty 0 0"
  loop: "{{ s3fs_fstab_nest }}"
...
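
For reference, with the s3fs_fstab_nest variable above, the play renders /etc/fstab entries along these lines (shown here with spaces; the play emits literal tabs between fields):

s3fs#s3fs-build-bukkit:/RPMs    /provisioning/repo    fuse    _netdev,allow_other,umask=0000,nonempty 0 0
s3fs#s3fs-build-bukkit:/EXEs    /provisioning/installers    fuse    _netdev,allow_other,umask=0000,nonempty 0 0
s3fs#s3fs-users-bukkit:/build    /Data/personal    fuse    _netdev,allow_other,umask=0000,nonempty 0 0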

Was actually a lot simpler than I was expecting it to be.