
Wednesday, May 2, 2018

"How Big Is It," You Ask?

For one of the projects I'm supporting, they want to deploy into an isolated network. However, they want to be able to flexibly support systems in that network being able to fetch feature and patch RPMs (and other stuff) as needed. Being on an isolated network, they don't want to overwhelm the security gateways by having a bunch of systems fetching directly from the internet. Result: a desire to host a controlled mirror of the relevant upstream software repositories.

This raised the question, "what all do we need to mirror?" I stress the word "need" because this customer is used to thinking in terms of manual rather than automated efforts (working on that with them, too). As a result of this manual mindset, they want to keep the data-sets associated with such tasks small.

Unfortunately, their idea of small is also a bit out of date. To them, 100GiB — at least for software repositories — falls into the "not small" category. To me, if I can fit something on my phone, it's a small amount of data. We'll ignore that I'm biased by being able to jam a 512GiB microSD card into my phone. At any rate, they wanted to minimize the number of repositories and channels they would have to mirror so that they could keep the copy-jobs small. I pointed out, "you want these systems to have a similar degree of fetching-functionality to what they'd have if they weren't being blocked from downloading directly from the Internet: that means you need <insert exhaustive list of repositories and channels>". Naturally, they balked. The questions "how do you know they'll actually need all of that" and "how much space is all of that going to take" were asked (with a certain degree of overwroughtness).

I pointed out that I couldn't meaningfully answer the former question because I wasn't part of the design-groups for the systems that were to be deployed in the isolated network. I also pointed out that they'd probably be better off asking the owners of the prospective systems what they anticipated needing (knowing full well that, usually, such questions' answers are somewhere in the "I'm not sure, yet" neighborhood). As such, my exhaustive list was a hedge: better to have and not need than to need and not have. Given the stupid-cheapness of storage, and that it can actually be easier to sync all of the channels in a given repository than to sync a subset, I didn't see a benefit to not throwing storage at the problem.

To the second question, I pointed out, "I'm sitting at this meeting, not in front of my computer. I can get you a full, channel-by-channel breakdown once I get back to my desk."

One of the nice things about yum repositories (where the feature and patch RPMs would come from) is that they're easily queryable for both item-counts and aggregate sizes. The only downside is that the OS-included tools for doing so are geared toward human-centric, ad hoc queries rather than output that can be jammed into an Excel spreadsheet and run through =SUM formulae. In other words, sizes are put into "friendly" units: if you have 100KiB of data, the number is reported in KiB; if you have 1055KiB of data, the number is reported in MiB; and so on. So, I needed to "wrap" the native tools' output to put everything into consistent units (which Excel prefers for =SUMing and other mathematical actions). Because it was a "quick" task, I did it in BASH. In retrospect, using another language likely would have been far less ugly. However, what I came up with worked for creating a CSV:

#!/bin/bash
#
# Emit a semicolon-delimited row (repo-id, repo-name, package-count, size)
# for each configured yum repository, with sizes normalized to KiB so that
# Excel can =SUM them directly.

for YUM in $(
   yum repolist all | awk '/abled/{ print $1 }' | \
      sed -e '{
         /-test/d
         /-debuginfo/d
         /-source/d
         /-media/d
      }' | sed 's/\/.*$//'
   )
do
   # Split on newlines so that each "Repo-<attribute>: <value>" line
   # lands in its own array element
   IFS=$'\n'
   REPOSTRUCT=($(
      yum --disablerepo='*' --enablerepo="${YUM}" repolist -v | \
      grep ^Repo- | grep -E "(id|-name|pkgs|size) *:" | \
      sed 's/ *: /:/'
   ))
   unset IFS

   # Fourth element is "Repo-size:<value> <unit>"; split into value and unit
   REPSZ=($(echo "${REPOSTRUCT[3]}" | sed 's/^Repo-size://'))

   # Normalize the reported size to KiB
   if [[ ${REPSZ[1]} = M ]]
   then
      SIZE=$(echo "${REPSZ[0]} * 1024" | bc)
   elif [[ ${REPSZ[1]} = G ]]
   then
      SIZE=$(echo "${REPSZ[0]} * 1024 * 1024" | bc)
   else
      SIZE="${REPSZ[0]}"
   fi

   # Print "id;name;pkgs;size" as one row
   for CT in 0 1 2
   do
      printf "%s;" "${REPOSTRUCT[${CT}]}"
   done
   echo "${SIZE}"
done | sed 's/Repo-[a-z]*://'
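
Run against a subscribed host, the script emits one row per channel, ready for Excel's =SUM. The rows below are purely illustrative (channel names and numbers invented for the example):

rhel-x86_64-server-7;Red Hat Enterprise Linux Server (v. 7 for 64-bit x86_64);17399;49283072
rhel-x86_64-server-optional-7;RHEL Server Optional (v. 7 for 64-bit x86_64);13225;9867264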

Yeah... hideous and likely far from optimal ...but not worth my time (even if I had it) to revisit. It gets the job done and there's a certain joy to writing hideous code to solve a problem you didn't want to be asked to solve in the first place.

At any rate, checking against all of the repos that we'd want to mirror for the project, the initial-sync data-set would fit on my phone (without having to upgrade to one of the top-end beasties). I pointed out that only the initial sync would be "large" and that, since only a couple of the channels update with anything resembling regularity (the rest being essentially static), the monthly delta-syncs would be vastly smaller and trivial to babysit. So, we'll see whether that assuages their anxieties or not.

Wednesday, May 24, 2017

Barely There

So, this morning I get an IM from one of my teammates asking, "you don't happen to have your own copy of <GitHub_Hosted_Project>, do you? I had kind of an 'oops' moment a few minutes ago." Unfortunately, because my A/V had had its own "oops" moment two days prior, all of my local project copies had gone *poof!*, as well.

Fortunately, I'd been making a habit of configuring our RedMine to do bare-syncs of all of our projects. So, I replied back, "hold on, I'll see what I can recover from RedMine." I update the security rules for our RedMine VM and then SSH in. I escalate my privileges and navigate to where RedMine's configured to create repository copies. I look at the repository copy and remember, "shit, this only does a bare copy. None of the actual files I need is here."

So, I consult the Googles to see if there are any articles on "how to recreate a full git repository from a bare copy" (and permutations thereof). Pretty much all the results are about "how to create a bare repository" ...not really helpful.

I go to my RedMine project's repository page and notice the "download" link when I click on one of the constituent files. I click on it to see whether I actually get the file's contents or whether the download link is just a (now broken) referral to the file on GitHub. Lo and behold, the file downloads. It's a small project, so, absolute worst case, I can download all of the individual files and manually recreate the GitHub project, losing only my project-history.

That said, the fact that I'm able to download the files tells me, "RedMine has a copy of those files somewhere." So I think to myself, "mebbe I need to try another search: clearly RedMine is allowing me to check files out of this 'bare' repository, so perhaps there's something similar I can do more directly." I return to Google and punch in "git bare repository checkout". Voilà. A couple useful-looking hits. I scan through a couple of them and find that I can create a full clone from a bare repo. All I have to do is go into my (RedMine's) filesystem, copy my bare repository to a safe location (just in case) and then clone from it:

# find <BARE_REPOSITORY> -print | cpio -vpmd /tmp
# cd /tmp
# git clone <BARE_REPOSITORY_COPY> <FULL_REPOSITORY_TARGET>
# find <FULL_REPOSITORY_TARGET>

That final find shows me that I now have a full repository (there's now a fully populated .git subdirectory in it). I chown the directory to my SSH user-account, then exit my sudo session (I'd ssh'ed in with key-forwarding enabled).

I go to GitHub and (re)create the nuked project, then configure the on-disk copy of my files and git metadata to be able to push everything back to GitHub. I execute my git push and all looks good from the ssh session. I hop back to GitHub and there is all my code and all of my commit-history and other metadata. Huzzah!
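
For the curious, the re-pointing and push boil down to something like the following (the remote URL is a stand-in; substitute your own project's):

$ cd /tmp/<FULL_REPOSITORY_TARGET>
$ git remote set-url origin git@github.com:<USER>/<GitHub_Hosted_Project>.git
$ git push --all origin
$ git push --tags origin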

I finish out by going back and setting up branch-protection and my push-rules and CI-test integrations. Life is good again.

Wednesday, March 25, 2015

So You Don't Want to BYOL

The Amazon Web Services MarketPlace is pretty awesome. There are oodles of pre-made machine templates to choose from. Even in the face of all that choice, it's not unusual to find that, of all the choices you have, none quite fit your needs. That's the scenario I found myself in.

Right now, I'm supporting a customer that's a heavy user of Linux for their business support systems. They're in the process of migrating from our legacy hosting environment to hosting things on AWS. During their development phase, use of CentOS was sufficient for their needs. As they move to production, however, they want "real" Red Hat Enterprise Linux.

Go up on the MarketPlace and there are plenty of options to choose from. However, my customer doesn't want to deal with buying a stand-alone entitlement for patch-support for their AWS-hosted systems. This requirement considerably cuts down on the useful choices in the MarketPlace. Still, there are "license included" Red Hat options to choose from.

Unfortunately, my customer also has fairly specific partitioning requirements that are not met by the "license included" AMIs. When using CentOS, this wasn't a problem - CentOS's patch repos are open-access. Creating an AMI with suitable partitioning and access to those public repos is about a 20-minute process. While some of that process is leverageable for creating a Red Hat AMI, making the resultant AMI be "license included" is a bit more challenging.

When I tried to simply re-use my CentOS process, supplemented by the Amazon repo RPMs, I ended up with a system that got 401 errors whenever I ran a yum query. I was missing something.

Google searches weren't terribly helpful in solving my problem. I found a lot of "how do I do this" posts, but damned few that actually included the answer. Ultimately, it turns out that if you generate your AMI from an EBS snapshot, instances launched from that AMI don't have an entitlement key to access the Amazon yum repos. You can see this by looking at your launched instance's metadata:
# curl http://169.254.169.254/latest/dynamic/instance-identity/document
{
  "accountId" : "717243568699",
  "architecture" : "x86_64",
  "availabilityZone" : "us-west-2b",
  "billingProducts" : null,
  "devpayProductCodes" : null,
  "imageId" : "ami-9df0ec7a",
  "instanceId" : "i-51825ba7",
  "instanceType" : "t1.micro",
  "kernelId" : "aki-fc8f11cc",
  "pendingTime" : "2015-03-25T19:04:51Z",
  "privateIp" : "172.31.19.148",
  "ramdiskId" : null,
  "region" : "us-east-1",
  "version" : "2010-08-31"
}

Specifically, what you want to look at is the value for "billingProducts". If it's "null", your yum isn't going to be able to access the Amazon RPM repositories. Where I came up close to empty on my Google searches was "how to make this attribute persist across images".
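
If all you want to eyeball is that one attribute, you can yank it straight out of the identity document (a quick sketch, assuming python is present on the instance):

# curl -s http://169.254.169.254/latest/dynamic/instance-identity/document | \
     python -c 'import json,sys; print(json.load(sys.stdin)["billingProducts"])'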

I found a small note in a community forum post indicating that AMIs generated from an EBS snapshot will always have "billingProducts" set to "null". This is due to a limitation in the tool used to register an image from a snapshot.

To get around this limitation, one has to create an AMI from an instance of an entitled AMI. Basically, after you've readied the EBS volume for your custom AMI, you do a disk-swap with a properly-entitled instance. You then use the "create image" option from that instance. Once you launch an AMI created via the EBS-swap, your instance's metadata will now look something like:
# curl http://169.254.169.254/latest/dynamic/instance-identity/document
{
  "accountId" : "717243568699",
  "architecture" : "x86_64",
  "availabilityZone" : "us-west-2b",
  "billingProducts" : [ "bp-6fa54006" ],
  "devpayProductCodes" : null,
  "imageId" : "ami-9df0ec7a",
  "instanceId" : "i-51825ba7",
  "instanceType" : "t1.micro",
  "kernelId" : "aki-fc8f11cc",
  "pendingTime" : "2015-03-25T19:04:51Z",
  "privateIp" : "172.31.19.148",
  "ramdiskId" : null,
  "region" : "us-east-1",
  "version" : "2010-08-31"
}

Once that "billingProducts" is set, the cloud-init related first-boot scripts will take that "billingProducts" and use it to register the system with the Amazon yum repos. Voilà: you now have a fully custom AMI that uses Amazon-billed access to Red Hat updates.
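
If you'd rather script the disk-swap than click through the console, the dance looks something like this with the AWS CLI (all of the volume IDs below are hypothetical placeholders):

# Stop the properly-entitled donor instance
aws ec2 stop-instances --instance-ids i-51825ba7
# Detach its original root volume
aws ec2 detach-volume --volume-id vol-1a2b3c4d
# Attach the EBS volume you readied for the custom AMI as the root device
aws ec2 attach-volume --volume-id vol-4d3c2b1a --instance-id i-51825ba7 --device /dev/sda1
# Create the AMI from the instance (not from a snapshot), which is what
# lets billingProducts persist
aws ec2 create-image --instance-id i-51825ba7 --name "custom-rhel-ami"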

Note on Compatibility: the Red Hat provided PVM AMIs do not yield well to this method. The Red Hat provided PVM AMIs are all designed with their boot/root device set to /dev/sda1. To date, attempts to leverage the above techniques for PVM AMIs that require their boot/root device set to /dev/sda (used when using a single, partitioned EBS to host a bare /boot partition and LVM-managed root partitions) have not met with success.

Wednesday, January 18, 2012

Quick-n-Dirty Yum Repo via HTTP

Recently, we had a bit of a SNAFU in the year-end renewal of our RedHat support. As a result, all of the RHN accounts tied to the previous contract lost access to RHN's software download repositories. This meant that things like being able to yum-down RPMs on rhn_register'ed systems no longer worked, and we couldn't log into RHN to do a manual download, either.

Fortunately, because we're on moderately decent terms with RedHat and they know that the contract eventually will get sorted out, they were willing to help us get through our current access issues. Moving RHN accounts from one permanent contract to another, after first associating them with some kind of temporary entitlement is a paperwork-hassle for all parties involved and is apt to get your account(s) mis-associated down the road. Since all parties knew that this was a temporary problem but needed an immediate fix, our RedHat representative furnished us with the requisite physical media necessary to get us through till the contracts could get sorted out.

Knowing that I wasn't the only one that might need the software and that I might need to execute some burndown-rebuilds on a particular project I was working on, I wanted to make it easy to pull packages to my test systems. We're in an ESX environment, so, I spun up a small VM (only 1GB of virtual RAM, 1GHz of virtual CPU, a couple Gigabytes of virtual disk for a root volume and about 20GB of virtual disk to stage software onto and build an RPM repository on) to act as a yum repository server.

After spinning up this basic VM, I had to sort out what to do as far as turning that physical media into a repo. I'm not a big fan of copying CDs as a stream of discrete files (been burned, too many times, by over-the-wire corruption, permission issues and the like). So, I took the DVD and made an ISO from it. I then took that ISO and scp'ed it up to the VM.
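
The ISO-pull itself is a one-liner; this assumes the optical drive presents as /dev/dvd (device names vary from system to system):

dd if=/dev/dvd of=/tmp/RHEL5.7_x86_64.iso bs=2048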

Once I had the ISO file copied up to the VM, I did a quick mount of it (`mount -t iso9660 -o loop,ro /tmp/RHEL5.7_x86_64.iso /mnt/DVD` for those of you playing at home). Once I mounted it, I did a quick copy of its contents to the filesystem I'd set aside for it. I'm kind of a fan of cpio for things like this, so I cd'ed into the root of the mounted ISO and did a `find . -print | cpio -pmd /RepoDir` to create a good copy of my ISO data into a "real" filesystem (note: you'll want to make sure you do a `umask 022` first to ensure that the permission structures from the mounted ISO get copied, intact, along with the files themselves).

With all of the DVD's files copied to the repo-server and into a writeable filesystem, it was necessary to create all the repo structures and references needed by yum. Our standard build doesn't include the createrepo tool, so first I had to locate its RPM in the repo filesystem and install it onto my repo-server. Doing a quick `find . -name "*createrepo*rpm"` while cd'ed into the repo filesystem turned up the path to the requisite RPM. I then did an `rpm -Uh [PathFromFind]` to install the createrepo tool.

The createrepo tool is a nifty little tool. You just cd into the root of the directory where you copied your media to, do a `createrepo .`, and it scans the directory structures to find all the RPMs and XMLs and other files and creates the requisite data structures and pointers that allow yum to know how to pull the appropriate RPMs from the filesystem.

Once that's done, if all you care about is local access to the RPMs, you can create a basic .repo file in /etc/yum.repos.d that uses a "baseurl=file:///Path/To/Media" directive in it.
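
A minimal sketch of such a file, assuming the media was copied to /RepoDir (the [local-media] label is arbitrary):

[local-media]
name=RHEL 5.7 (local media)
baseurl=file:///RepoDir
enabled=1
gpgcheck=0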

In my case, I wanted to make my new repo available to other hosts at the lab. The easiest way to make the repo available over the network is to do so via HTTP. Our standard build doesn't include the standard RedHat HTTP server, by default, so I manually installed the requisite RPMs from the repo's filesystem. I modified the stock /etc/httpd/conf/httpd.conf and added the following stanzas to it:

Alias /Repo/ "/RepoDir/"
<Directory "/RepoDir">
   Options Indexes MultiViews
   AllowOverride None
   Order allow,deny
   Allow from all
</Directory>

[Note: this is probably a looser configuration than I'd have in place if I was making this a permanent solution, but this was just meant as a quick-n-dirty workaround for a temporary problem.]

I made sure to do a `chkconfig httpd on` and then did a `service httpd start` to activate the web server. I then took my web browser and made sure that the repo filesystem's contents were visible via web client. They weren't: I forgot that our standard build has port 80 blocked by default. I did the requisite juju to add an exception to iptables for port 80 and all was good to go.
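
For reference, the "juju" amounted to something along these lines:

iptables -I INPUT -p tcp --dport 80 -j ACCEPT
service iptables save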

With my RPMs (etc.) now visible via HTTP, I logged into the VM to which I actually needed to install RPMs via yum. I escalated privileges to root and created an /etc/yum.repos.d/LAB.repo file that looked similar to the following:

[lab-repo]
name=RHEL 5.7
baseurl=http://repovm.domain.name/Repo
enabled=1
gpgcheck=0

I did a quick cleanup of the consuming VM's yum repo information with a `yum clean all` and then verified that my consuming VM was able to properly see the repo's data by doing a `yum list`. All was good to go. Depending on how temporary this actually ends up being, I'll go back and make my consuming VM's .repo file a bit more "complete" and more properly lay out the repo-server's filesystem and HTTP config.