Friday, September 23, 2022

Code Explainer: Regex and Backrefs in Ansible Code

Recently, I'd submitted a code-update to a customer-project I was working on. I tend to write very dense code, even when using simplification frameworks like Ansible. As a result, I had to answer some questions asked by the person who did the code review. Ultimately, I figured it was worth writing up an explainer of what I'd asked them to review…

The Ansible-based code in question was actually just one play:

---
- name: Remove sha512 from the LOG option-list
  ansible.builtin.lineinfile:
    backrefs: true
    line: '\g<log>\g<equals>\g<starttoks>\g<endtoks>'
    path: /etc/aide.conf
    regexp: '^#?(?P<log>LOG)(?P<equals>(\s?)(=)(\s?))(?P<starttoks>.*\w)(?P<rmtok>\+?sha512)(?P<endtoks>\+?.*)'
    state:present
...

The above is meant to ensure that the contents of the RHEL 7 config file, "/etc/aide.conf" sets the propper options on for the defined scan-definition, "LOG". The original contents of the line were:

LOG = p+u+g+n+acl+selinux+ftype+sha512+xattrsfor

The STIGs were updated to indicate that the contents of that line should actually be:

LOG = p+u+g+n+acl+selinux+ftype+xattrsfor

The values of the Ansible play's regexp and backrefs attributes are designed to use the advanced line-editing afforded through the Ansible lineinfile module's capability. Ansible is a Python-based service. This module's advanced line-editing capabilities are implemented using Python's re() function. The regexp attribute's value is written to make use of the re() function's ability to do referenceable search-groupings. Search-groupings are specified using parenthesis-delimited search-rules (i.e., "(SEARCH_SYNTAX)").

By default, a given search-grouping is referenced by a left-to-right index-number. These number starting at "1". The reference-IDs can then be referenced – also refered to as "backrefs" – in the replacement-string (through the value of the play's line attribute) to help construct the replacement string's value. Using the index-number method, the replacement-string would be "\1\2\6\8" …which isn't exactly self-explanatory.

To help with readability, each group can be explicitly-named. To assign a name to a search-group, one uses the syntax ?P<LABEL_NAME> at the beginning of the search-group. Once the group is assigned a name, it can subsequently be referenced by that name using the syntax "\g<LABEL_NAME>".

If one visits the Regex101 web-site and selects the "Python" regex-type from the left menu, one can get a visual representation of how the above regexp gets interpreted. Enter the string to be evaluated in the "TEST STRING" section and then enter the value of the regexp parameter in the REGULAR EXPRESSION box. The site will then show you how the regex chops up the test string and tell you why it chopped it up that way:

Regex101 Screen-cap




No comments:

Post a Comment