Titular Discrepancy: Increasing Verbosity of Ansible Jobs

Sometimes, Ansible doesn't have really native methods for installing and/or configuring some types of content. As a result, you may find yourself resorting to using Ansible's shell: or command: modules. – basically a shell-out ("escape"_ methods to execute such tasks tasks. While these methods are ok for smaller tasks, for larger tasks they can expose you to a number of problems:

The shell-out can take quite a long time to return
The shell-out can leave you guessing what it's actually doing – leaving you wondering:
- "is it actually still working"
- "is it hung"
- "is this going to leave me waiting forever or will it ultimately time out"
- etc.
The shell-out can be uninformative.
The shell-out can return an incorrect status:
- It may report a change where none actually occurred

It may report a success where there was partial- or even total-failure
It may report a failure where there was partial-success

While there's often not a lot that can be done about execution time – things take as long as they take – the other problems are addressable.

Trying to fix the "what's it actually doing" problem from within the shell-out, itself, often isn't meaningful: Ansible gathers up all the shell-output and only returns it once the shell exits. That said, improving your shell-out's output isn't a wholly-wasted effort: making your shell-out more verbose can help you if it does error-out; it can also provide greater assurance if you want to pore through the change: or ok: (success) output. This can help you with some of the "can be uninformative" problem, even if only after-the-fact.

Similarly, trying to fix the "return an incorrect status" problem strictly from within the shell-out doesn't necessarily provide a full solution. It can improve the overall reliability of the shell-out. However, it doesn't necessarily fix the status that Ansible uses when it tries to decide, "should I abort this run or should I continue" or any other contingent- or branching-logic you might want to implement.

Recently, I ran into these kinds of problems at one of my customer sites. They're a shop that has a significant percentage of their user-base that are data-science oriented. As such, they use Ansible to install the R language binaries along with a few hundred modules that their various users like to use. While they're an Ansible shop, they'd implemented the installation as a call to an external shell script. Ansible first pushes the script out and then executes it on the targets.

The script, itself, wasn't especially robustly-written: next to no error-handling or -reporting. It's basically a launch-and-pray kind of tool. On the plus side, they had thought ahead well enough to provide a mechanism for determining where in the shell-managed installation-process things had died. Basically, as the script runs, it creates a simple file containing a list of yet-to-be-installed modules from which you could infer that one or more of them had failed. That said, to see that file, you have to manually SSH to the managed-system and go view it.

Unfortunately, because the script is doing so many module-installs, it takes a really long time to execute (a few hours!). Because Ansible only reports script output when an invoked-script exits, Ansible pauses for a loooong time with no nerve-soothing output. And, as previously mentioned, if it does fail, you're stuck having to visit managed-systems to try to figure out why.

I'm not real tolerant of waiting for things to run with no output to tell me, "at least I know that it's doing something." So, I set out to refactor. While I'd hoped there was a native Ansible module for the task, my Google-fu wasn't able to turn anything up (my particular customer's environment doesn't lend itself well to using extensions to Ansible functionality such as one might find on Galaxy or GitHub). So, I too resorted to the shell-escape method.

That said, upon looking at the external shell script they wrote, I realized it was an extremely simple script. As such, I opted to replace it with an equivalent shell: | block in the associated Ansible plays.

---
- name: Iteratively install R-modules
  args:
    executable: /bin/bash
  changed_when: "'Added' in modInstall_result.stdout"
  environment:
    http_proxy: 'http://{{proxy_user}}:{{proxy_password}}@{{proxy_host}}:80/'
    https_proxy: $http_proxy
  failed_when: "modInstall_result.rc != 0 or 'had non-zero exit status' in modInstall_result.stderr"
  register: modInstall_result
  shell: |
    if [[ {{ item }} =~ ":" ]]
    then
       PACKAGE="$( echo {{ item }} | cut -d : -f 1 )"
       VERSION="$( echo {{ item }} | cut -d : -f 2 )"
       VERSTRING="version = '${VERSION}',"
    else
       PACKAGE="{{ item }}"
       VERSTRING=""
    fi

    Rscript --slave --no-save --no-restore-history -e "
    if (! ('${PACKAGE}' %in% installed.packages()[,'Package'])) {
      require(devtools);
      install_version(package = '${PACKAGE}', ${VERSTRING} upgrade = 'never', repos=c('http://cran.us.r-project.org'));
      print('Added');
    } else {
      print('Already installed');
    }"
  with_items: "{{ Rmodules.split('\n') }}"
...

The value of the refactored approach is that, instead of waiting hours for output, there's output associated with each installation-attempt. Further, the output is all captured on the host running the Ansible play: no having to visit managed systems to find something resembling a logfile.

Explaining the Play...

When I construct plays, I like to order the YAML alphabetically (with the exception of the "name" parameter – that always goes first). Which is to say, anything at a given directive-level will be ordered from A-Z. Some people prefer to put things in something more-resembling a functional order. I choose alphabetical because it makes it easier for me to cross-reference with the documentation.

"name" is fairly self-explanatory. It just provides a human-friendly indication of what Ansible is doing. In this case, iteratively installing R-modules (duh!).
"args": This can have a number of sub-parameters (I have yet to dig through the documentation or source to find all of them). I've only ever had use for the "executable" sub-parameter.
"changed_when": This parameter allows you to tell Ansible how to know that the shell-escape changed something. In this instance, I'm having it evaluate data contained in a variable named "modInstall_result" (set later via the "register" action).
"executable": this allows you to explicitly tell Ansible which interpreter to use. I'm pedantic, so, I like to tell it, "use /bin/bash".
"environment": this allows you to set execution-specific environment-variables for the shell to use. In this case, I'm setting the "http_proxy" and "https_proxy" environmentals. This is necessary because the build environment is isolated and I'm trying to let R's in-built URL-fetcher pull content from public, internet-hosted repositories (see this vendor documentation for explanation). The customer doesn't have a full CRAN mirror, so, leveraging this installation method minimizes having to account for dependencies.
"failed_when": This parameter allows you to tell Ansible how to know that the shell-escape failed. In this instance, I'm having it evaluate data contained in a variable named "modInstall_result" (set later via the "register" action).
"register": Used in this manner, it collects all of the inputs to and outputs produced by the shell and its exit code and store it in a JSON-formatted variable. In this case, the variable is named "modInstall_result". Data can be extracted via regular JSON-extraction methods
"shell": This is the actual code that Ansible will execute (via the previously-requested /bin/bash interpreter). I'll explain the block's contents, shortly
"with_items": This is one of the ways that Ansible allows you to run a given play iteratively. External to this play, I had defined the "Rmodules" variable to read in a text file – via Ansible's lookup() function – that contained one R module-name per line (and, optionally, an associated version-number). The "with_items" parameter-value is in the form of a list. The lookup() function originally created the "Rmodules" value as a single string with embedded newlines. Using the .split() function converts that string into a list. As Ansible iterates this list, each list-element is popped off and assigned to the temporary-variable "item".

Explaining the script-stanza:

The "shell:" module can be invoked as either a single line or as a block of text. Using the "|" as the value for the module tells the module that the following, indented block of text is to be treated as a single input-block. For readability of anything but very short script-content, I prefer the readability of the block of text.

The BASH content, itself, is basically two parts.

The first part takes the value of the iterated "item" temporary-value and parses it. If the string contains a colon, the string is split into defined PACKAGE and VERSION BASH-variables (with the VERSION BASH-variable being further expanded to an version-string statement suitable for use with the Rscript command). If the string does not contain a colon, the PACKAGE BASH-variable is set to the R module-name and the version-string BASH-variable is set to a null/empty value.
The second part is the Rscript logic. Using R's installed.packages() function, the existing R installation is checked for the presence of the R module-name contained in the PACKAGE BASH-variable. If not present, R's install_version() function is used, along with the PACKAGE BASH-variable and the version-string variable to install (an optionally versioned) R-module. The if check helps with idempotency – preventing attempts to reinstall the module and the not-inconsiderable time that reinstallation can take.

Note that, in order for the Rscript logic to work, the devtools (link goes to a specific, older version; other versions should work) R module must have been previously installed. In my case, this installation has been taken care of in a prior Ansible play (not presented here: the primary focus of this article was to illustrate how to use iteration to make for more-verbose configuration-management)

Titular Discrepancy

Wednesday, November 4, 2020

Increasing Verbosity of Ansible Jobs

No comments:

Post a Comment