Showing posts with label rpm. Show all posts

Wednesday, March 27, 2019

Travis, EL7 and Docker-Based Testing

As noted in a prior post, a lot of my customer-oriented activities support deployment within networks that are either partially or wholly isolated from the public Internet. Yesterday, as part of supporting one such customer, I stood up a new project to help automate the creation of yum repository-configuration RPMs for private networks. I've had to hand-jam such files twice, now, and there are unwanted deltas between the two jam-sessions (in my defense, they were separated by nearly a three-year span). So, I figured it was time to standardize and automate things.

Usually, when I stand up a project, I like to include tests of the content I wish to deliver. Since most of my projects are done in public GitHub repositories, I typically use TravisCI to automate my testing. Prior to this project, however, I hadn't tried to automate the validity-testing of RPM recipes via Travis. Typically, when automating the creation of RPMs I wish to retain or deliver, I set up a Jenkins job that takes the resultant RPMs and stores them in Artifactory (both privately-hosted services). Most of my prior Travis jobs were simple syntax-checkers (using tools like shellcheck, JSON validators, CFn validators, etc.) rather than functionality-checkers.

This time, however, I was trying to deliver functionality (RPM spec files that would be used to generate source files from templates and package the results). So, I needed to be able to test that a set of spec files and source-templates could be reliably used to generate RPMs. This meant I needed my TravisCI job to generate "throwaway" RPMs from the project-files.

The TravisCI system's test-hosts are Ubuntu-based rather than RHEL- or CentOS-based. While there are some tools that will allow you to generate RPMs on Ubuntu, there have been some historical caveats about their reliability and/or representativeness. So, my preference was to use a RHEL- or CentOS-based context for my test-packagings. Fortunately, TravisCI offers the ability to use Docker on their test-hosts.

In general, setting up a Docker-oriented job is relatively straightforward. Where things get "fun" is that the version of `rpmbuild` that ships with Enterprise Linux 7 gets kind of annoyed if it's not able to resolve the UIDs and GIDs of the files it's trying to build from (never mind that the build-user inside the running Docker-container is "root" ...and has unlimited access within that container). If it can't resolve them, the rpmbuild tasks fail with a multitude of not-terribly-helpful "error: Bad owner/group: /path/to/repository/file" messages.
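In skeletal form, the Docker-enabled portion of such a .travis.yml looks something like the following. This is a sketch, not the actual project's file: the container name, image tag, mount path and spec-file name are all illustrative.

```yaml
services:
  - docker

env:
  - OS_VERSION=7

before_install:
  # Stand up a long-running CentOS build-container with the checkout mounted in
  - sudo docker run --detach --name "centos-${OS_VERSION}" --volume "${TRAVIS_BUILD_DIR}:/workspace" "centos:${OS_VERSION}" sleep infinity

script:
  # Without resolvable file-owners inside the container, this is the step
  # that dies with the "error: Bad owner/group" messages
  - sudo docker exec "centos-${OS_VERSION}" bash -c 'cd /workspace && rpmbuild -ba SPECS/example.spec'
```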

After googling about, I ultimately found that the UIDs and GIDs of the project-content need to exist within the Docker-container's /etc/passwd and /etc/group files, respectively. Note: most of the "top" search results Google returned indicated that the project files needed to be `chown`ed; however, simply being mappable proved sufficient.

Rounding the home stretch...

To resolve the problem, I needed to determine what UIDs and GIDs the project-content had inside my Docker-container. That meant pushing a Travis job that included a (temporary) diagnostic-block to stat the relevant files and return their UIDs and GIDs. Once those were determined, I needed to update my Travis job to add relevant groupadd and useradd statements to my container-preparation steps. What I ended up with was:

    sudo docker exec centos-${OS_VERSION} groupadd -g 2000 rpmbuilder
    sudo docker exec centos-${OS_VERSION} adduser -g 2000 -u 2000 rpmbuilder

It was late in the day by that point, so I simply assumed the IDs were stable. I ran about a dozen iterations of my test and they stayed stable, but that may just have been luck. If I run into future "owner/group" errors, I'll update my Travis job-definition to scan the repository-contents for their current UIDs and GIDs and set them accordingly. But, for now, my test-harness works: I'm able to know that updates to existing specs/templates, or additional specs/templates, will create working RPMs when they're taken to where they need to be used.
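Were I to make the job robust against shifting IDs, the discovery-half of that scan would distill to something like the following sketch (the printf is just there to surface the values for the subsequent groupadd/adduser container-prep steps):

```shell
# Discover the numeric owner and group of the checked-out project files;
# these values would then replace the hard-coded "2000"s in the
# groupadd/adduser container-preparation statements shown above.
BUILD_UID="$(stat -c '%u' .)"
BUILD_GID="$(stat -c '%g' .)"
printf 'UID=%s GID=%s\n' "${BUILD_UID}" "${BUILD_GID}"
```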

Wednesday, May 2, 2018

"How Big Is It," You Ask?

For one of the projects I'm supporting, they want to deploy into an isolated network. However, they want to be able to flexibly support systems in that network being able to fetch feature and patch RPMs (and other stuff) as needed. Being on an isolated network, they don't want to overwhelm the security gateways by having a bunch of systems fetching directly from the internet. Result: a desire to host a controlled mirror of the relevant upstream software repositories.

This begged the question, "what all do we need to mirror?" I stress the word "need" because this customer is used to thinking in terms of manual rather than automated efforts (something I'm working on with them, too). As a result of this manual mind-set, they want to keep the data-sets around such tasks small.

Unfortunately, their idea of small is also a bit out of date. To them, 100GiB — at least for software repositories — falls into the "not small" category. To me, if I can fit something on my phone, it's a small amount of data (we'll ignore that I'm biased by being able to jam a 512GiB micro SD card into said phone). At any rate, they wanted to minimize the number of repositories and channels they were going to mirror so that they could keep the copy-jobs small. I pointed out, "you want these systems to have a similar degree of fetching-functionality to what they'd have if they weren't being blocked from downloading directly from the Internet: that means you need <insert exhaustive list of repositories and channels>". Naturally, they balked. The questions "how do you know they'll actually need all of that" and "how much space is all of that going to take" were asked (with a certain degree of overwroughtness).

I pointed out that I couldn't meaningfully answer the former question because I wasn't part of the design-groups for the systems that were to be deployed in the isolated network. I also pointed out that they'd probably be better off asking the owners of the prospective systems what they anticipated needing (knowing full well that, usually, such questions' answers sit somewhere in the "I'm not sure, yet" neighborhood). As such, my exhaustive list was a hedge: better to have and not need than to need and not have. Given the stupid-cheapness of storage, and that it can actually be easier to sync all of the channels in a given repository vice syncing a subset, I didn't see a benefit to not throwing storage at the problem.

To the second question, I pointed out, "I'm sitting at this meeting, not in front of my computer. I can get you a full, channel-by-channel breakdown once I get back to my desk."

One of the nice things about yum repositories (where the feature and patch RPMs would come from) is that they're easily queryable for both item-counts and aggregate sizes. The only downside is that the OS-included tools for doing so are geared toward human-centric, ad hoc queries rather than output that can be jammed into an Excel spreadsheet and run through =SUM formulae. In other words, sizes are put into "friendly" units: if you have 100KiB of data, the number is reported in KiB; if you have 1055KiB of data, the number is reported in MiB; and so on. So, I needed to "wrap" the native tools' output to put everything into consistent units (which Excel prefers for =SUMing and other mathematical actions). Because it was a "quick" task, I did it in BASH. In retrospect, using another language likely would have been far less ugly. However, what I came up with worked for creating a CSV:

#!/bin/bash

for YUM in $(
   yum repolist all | awk '/abled/{ print $1 }' | \
      sed -e '{
         /-test/d
         /-debuginfo/d
         /-source/d
         /-media/d
      }' | sed 's/\/.*$//'
   )
do
   IFS=$'\n'
   REPOSTRUCT=($(
      yum --disablerepo='*' --enablerepo="${YUM}" repolist -v | \
      grep ^Repo- | grep -E "(id|-name|pkgs|size) *:" | \
      sed 's/ *: /:/'
   ))
   unset IFS

   REPSZ=($(echo "${REPOSTRUCT[3]}" | sed 's/^Repo-size://'))

   if [[ ${REPSZ[1]} = M ]]
   then
      SIZE=$(echo "${REPSZ[0]} * 1024" | bc)
   elif [[ ${REPSZ[1]} = G ]]
   then
      SIZE=$(echo "${REPSZ[0]} * 1024 * 1024" | bc)
   else
      SIZE="${REPSZ[0]}"
   fi

   for CT in 0 1 2
   do
      printf "%s;" "${REPOSTRUCT[${CT}]}"
   done
   echo "${SIZE}"
done | sed 's/Repo-[a-z]*://'

Yeah... hideous and likely far from optimal ...but not worth my time (even if I had it) to revisit. It gets the job done and there's a certain joy to writing hideous code to solve a problem you didn't want to be asked to solve in the first place.
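For what it's worth, the one genuinely load-bearing piece of the script — the unit-normalization — distills to something like this (KiB as the base unit, mirroring the if/elif/else above; the function name is mine, not the script's):

```shell
# Convert yum's "friendly" size-reports (e.g. "120 M", "3.4 G") into a
# consistent KiB figure that Excel can =SUM. Anything without an M or G
# unit-suffix is passed through as already being in KiB.
to_kib() {
   local value="$1" unit="$2"
   case "${unit}" in
      G) echo "${value} * 1024 * 1024" | bc ;;
      M) echo "${value} * 1024" | bc ;;
      *) echo "${value}" ;;
   esac
}
```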

At any rate, checking against all of the repos we'd want to mirror for the project, the initial-sync data-set would fit on my phone (without having to upgrade to one of the top-end beasties). I pointed out that only the initial sync would be "large" and that only a couple of the channels update with anything resembling regularity (the rest being essentially static), so the monthly delta-syncs would be vastly smaller and trivial to babysit. We'll see whether that assuages their anxieties or not.

Monday, June 13, 2016

Seriously, CentOS?

One of my (many) duties in our shop is doing cross-platform maintenance of RPMs. Previously, when I was maintaining them for EL5 and EL6, things were fairly straightforward: you got or made a SPEC file to package your SOURCES into RPMS and you were pretty much good to go. Imagine my surprise when I went to start porting things to EL7 and all my freaking packages had el7.centos in their damned names. WTF?? These RPMs are for use on all of our EL7 systems, not just CentOS: why the hell are you dropping the implementation into my %{dist}-string?? So, now I have to make sure that my ${HOME}/.rpmmacros file has a line in it that looks like:
%dist .el7
if I don't want my %{dist}-string to get crapped up.
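If you want to script the override rather than hand-edit the file, something like this does the job (a sketch: it only appends if no %dist line is already present, and the rpm call just confirms what the macro now expands to):

```shell
# Ensure the build-user's macro-file overrides %dist (append only if no
# %dist line already exists):
grep -q '^%dist' "${HOME}/.rpmmacros" 2>/dev/null || \
   echo '%dist .el7' >> "${HOME}/.rpmmacros"

# Where rpm is installed, confirm what the macro now expands to:
command -v rpm > /dev/null 2>&1 && rpm --eval '%{dist}' || true
```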

Screw you, CentOS. Stuff was just fine on CentOS 5 and CentOS 6. This change is not an improvement.