Given that it takes a significant amount of time to run the uninstall/new-install/reboot operation, I didn't want to just blindly execute the logic. So, I wanted to implement logic that checked to see what version, if any, of the CUDA drivers was already installed on the Ansible target. The first step to this was as follows:
- name: Gather the rpm package facts
  package_facts:
    manager: auto
This tells Ansible to check the managed host and gather information about the installed packages, including the base cuda RPM. The `package_facts` module stuffs what it finds into `ansible_facts.packages`, a structure keyed by package name that's then referencable by subsequent Ansible actions. Since I'm only interested in the installed version, I'm able to grab that information by reading the `ansible_facts.packages['cuda'][0].version` value from the structure and using it in a `when` conditional.

Because I had multiple actions that I wanted to make conditional on a common condition, I didn't want to have a bunch of configuration-blocks with the same conditional statement. Did some quick Googling and found that, yes, Ansible does support executing multiple steps within a shared-condition block. You just have to use (wait for it...) the `block` statement in concert with the shared condition-statement. When you use that statement, you then nest the actions that you might otherwise have put in their own, individual action-blocks.

In my case, the block ended up looking like:
- name: Update CUDA drivers as necessary
  block:
    - name: Copy CUDA RPM-repository definition
      copy:
        src: files/cuda-rhel7-11-0-local.repo-DSW
        dest: /etc/yum.repos.d/cuda-rhel7-11-0-local.repo
        group: 'root'
        mode: '000644'
        owner: 'root'
        selevel: 's0'
        serole: 'object_r'
        setype: 'etc_t'
        seuser: 'system_u'
    - name: Uninstall previous CUDA packages
      shell: |
          UNDOID=$( yum history info cuda | sed -n '/Transaction ID/p' | \
                    cut -d: -f 2 | sed 's/^[     ]*//g' | sed -n 1p )
          yum -y history undo "${UNDOID}"
    - name: Install new CUDA packages (main)
      yum:
        name:
          - cuda
          - nvidia-driver-latest-dkms
        state: latest
    - name: Install new CUDA packages (drivers)
      yum:
        name: cuda-drivers
        state: latest
  when:
    ansible_facts.packages['cuda'][0].version.split('.')[0]|int < 11
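One caveat with that `when`: `ansible_facts.packages` only has a key for packages that are actually installed, so on a host with no cuda RPM at all the lookup of `['cuda']` fails rather than evaluating to false. A minimal sketch of a guard, assuming the `package_facts` task above has already run (the task name and message here are illustrative, not part of my playbook):

```yaml
# Illustrative only: print the installed cuda version, skipping hosts
# where the package is absent. List items under `when` are ANDed
# together, so the second test is never reached if the first is false.
- name: Show the installed CUDA version, if any
  debug:
    msg: "Installed cuda version: {{ ansible_facts.packages['cuda'][0].version }}"
  when:
    - "'cuda' in ansible_facts.packages"
    - ansible_facts.packages['cuda'][0].version.split('.')[0]|int < 11
```

The same two-item `when` list could be hung off the `block` itself if hosts without any cuda RPM should simply be skipped.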
I'd considered doing the shell-out a bit more tersely, something like:
yum -y history undo $( yum history info cuda | \
    sed -n '/Transaction ID/p' | cut -d: -f 2 | sed -n 1p)
But figured what I ended up using was marginally more readable for the very junior staff that will have to own this code after I'm gone.
Any way you slice it, though, I'm not super chuffed that I had to resort to a shell-out for the targeted/limited removal of packages. So, if you know a more Ansible-y way of doing this, please let me know.
I'd have also finished-out with one yum install-statement rather than the two, but the nVidia documentation for EL7 explicitly states to install the two groups separately. 🤷
Oh... And because I didn't want my `when` statement to be tied to the full X.Y.Z versioning of the drivers, I added the `split()` method so I could match against just the major number. Might have to revisit this if they ever reach a point where they care about the major and minor or the major, minor and release number. But, for now, the above suffices and is easy enough to extend via a compound `when` statement. Similarly, because Ansible defaults to string-output, I needed to forcibly cast the string-output to an integer so that numeric comparison would work properly. Without the cast, a string like '9' would sort after '11'.
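If the minor number ever does start to matter, Ansible's built-in `version` test is one possible alternative to the `split()`-and-cast approach, since it compares dotted version strings component by component. A rough sketch, assuming a hypothetical threshold of 11.2 (that number, the task name and the message are all made up for illustration):

```yaml
# Illustrative only: compare the whole dotted version string instead
# of just the major number. '11.2' is a hypothetical threshold.
- name: Flag hosts whose CUDA is older than 11.2
  debug:
    msg: "CUDA drivers need updating"
  when:
    - "'cuda' in ansible_facts.packages"
    - ansible_facts.packages['cuda'][0].version is version('11.2', '<')
```

I haven't swapped this into the playbook above; the major-number check is all I need for now.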
Final note: I ended up line-breaking where I did because yamllint had popped "too wide" alerts when I ran my playbook through it.
 
 