Geeking Out

Running GitHub Enterprise in Amazon EC2

Update (2015-03-29): GitHub now supports an EC2 appliance and this information is no longer accurate. It is useful only for historic reasons or general background when confronting similar challenges from other vendors.

GitHub Enterprise (GHE), GitHub’s self-hosted offering, allows companies to run their own private GitHub appliance behind their firewall. It is distributed as an OVF container that runs under VMware or VirtualBox. But what if you want to run it, along with the rest of your infrastructure, on AWS? Here is the (completely unsupported) way to do it!

The goal is to get the base GHE virtual appliance running on AWS so that we can install the latest GHE software package on top of it.  This package takes care of updating and configuring everything.  Once the software package is installed, the appliance behaves just like its on-prem cousins.

Break into the virtual appliance

First we need the virtual appliance in a form that can be moved into AWS. Download the current virtual appliance from the GHE dashboard and find a way to get at its contents. You may be able to just launch it locally in VMware or VirtualBox, if you can get root, but I did not do this. Instead, I extracted the archive (it is just a tar file) to get at the VMDK disk image inside and attempted to import it into EC2 using the AWS VM Import/Export tool.
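The extraction itself is a plain tar invocation. The archive name below is a guess based on the disk image name used later; yours will vary by GHE release:

tar -xvf github-enterprise-11-10-320-x86-64.ova
ls *.vmdk   # should show github-enterprise-11-10-320-x86-64-disk1.vmdk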

This requires some fiddling, because you have to install the old EC2 command line tools and get all the options right, with some plausible guesses about what is inside.  Here is the command I ended up running:

ec2-import-volume /var/tmp/github-enterprise-11-10-320-x86-64-disk1.vmdk \
 -f vmdk -z us-east-1a -b agperson-ghe -o $AWS_ACCESS_KEY -w $AWS_SECRET_KEY
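While the import runs you can poll its status with ec2-describe-conversion-tasks. Here it is assuming your credentials are in the usual AWS_ACCESS_KEY and AWS_SECRET_KEY environment variables:

ec2-describe-conversion-tasks --region us-east-1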

Once the conversion task reports as completed, I attempted to launch the result, which failed due to an unsupported kernel. But never fear!

Figure out what’s under the hood

If you don’t want to do this yourself, skip to the end of this section, where I tell you the secrets.

The VM import creates an EBS volume.  It may not be runnable, but it is mountable!  So start up a one-off Linux instance and attach the volume to it.  The data is stored in LVM, so you may need to install the lvm2 package and then run lvmdiskscan to see the volume group.
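On the one-off instance, that amounts to something like the following as root (the device name depends on what you chose when attaching the volume):

apt-get install lvm2
lvmdiskscan   # the imported volume (e.g. /dev/xvdf) should show up as an LVM physical volume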

Run vgdisplay to get the name of the volume group (“enterprise”) and activate it by running vgchange -a y enterprise. Now you can mount the root volume:

mkdir /ghe
mount /dev/mapper/enterprise-root /ghe

Poke around in this volume a bit and you will establish that the virtual appliance comes with Ubuntu 11.10 Oneiric (wow!) and is 64-bit. With this information, we can launch an equivalent instance in EC2.

Set up an Amazon-happy instance

Launch a new EC2 instance using the publicly available community AMI from Ubuntu for 64-bit Oneiric (make sure you are using the released version; in us-east-1 I used ami-13ba2d7a). I chose an m3.large, which is a good baseline given GHE’s requirements. Make sure to attach a second volume for data or make the root volume large enough to hold all your repositories, and use SSD storage because it makes life better. Put your new instance in a security group that allows traffic on ports 22, 80, 443, and, if necessary, 9418 (the unauthenticated git:// port, which is often left closed on GHE installs).
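If you prefer the command line to the console for this part, the security group and launch look roughly like this using the newer AWS CLI. The group name, key pair name, and 100 GB gp2 root volume are all placeholders, and you will want to tighten 0.0.0.0/0 to your own address ranges:

aws ec2 create-security-group --group-name ghe --description "GitHub Enterprise"
for port in 22 80 443 9418; do
  aws ec2 authorize-security-group-ingress --group-name ghe \
    --protocol tcp --port $port --cidr 0.0.0.0/0
done
aws ec2 run-instances --image-id ami-13ba2d7a --instance-type m3.large \
  --key-name my-keypair --security-groups ghe \
  --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":100,"VolumeType":"gp2"}}]'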

When the instance launches, log in as the “ubuntu” user and become root. Modify /etc/apt/sources.list to point all archive stanzas at old-releases.ubuntu.com (including the security ones). Then run apt-get update && apt-get upgrade and wait a few minutes.
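A one-liner with sed gets the sources.list edit done; the exact mirror hostnames vary by region and AMI, so double-check the file before upgrading:

sed -ri 's#http://[^ ]*(archive|security)\.ubuntu\.com#http://old-releases.ubuntu.com#g' /etc/apt/sources.list
apt-get update && apt-get upgrade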

Now you need to copy over all of the files from the virtual appliance. You can either do this via SSH from the one-off instance you launched earlier, or detach the volume from that instance, attach it to the new instance, and repeat the steps to get LVM running. Either way, use rsync to get everything important onto your new VM. Rackspace offers a helpful tutorial on doing this, including a good set of directory paths to exclude. I used their list and everything worked fine. The command I ran with the volume mounted locally was:

rsync --dry-run -azPx --exclude-from="exclude.txt" /ghe/ /

(and once I was satisfied, I ran it again without the “--dry-run” flag).
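Your exclude.txt will be something along these lines; treat this as illustrative and consult the Rackspace article for the full list I actually used:

/boot
/dev
/proc
/sys
/run
/tmp
/mnt
/media
/lost+found
/etc/fstab
/etc/hostname
/etc/hosts
/etc/network/interfaces
/etc/resolv.conf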

Bombs away!

Before rebooting, copy your SSH key into /root/.ssh/authorized_keys in case anything goes wrong (and take a moment to ponder who Ben is and why his “HacBook-Air.local” key is on our server!). Then restart the instance and, when it is done booting, visit it via HTTPS to see the beautiful GHE setup screen! Upload the latest software package and your license key and give it half an hour or so, and if everything goes well, you will have a fully-functional GitHub Enterprise instance in the cloud.
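For that root-key step, the simplest approach is to reuse the key the Ubuntu AMI already installed for the “ubuntu” user (adjust the path if your key lives elsewhere):

mkdir -p /root/.ssh && chmod 700 /root/.ssh
cat /home/ubuntu/.ssh/authorized_keys >> /root/.ssh/authorized_keys
chmod 600 /root/.ssh/authorized_keys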

Note that after the software package installs, you will no longer have root access to the server. A pity.

A few other important steps are left as an exercise to the reader: lock down access, set up SES or some other email-sending capability, stay in compliance with your license, and take frequent backup snapshots! Good luck!