The Amazon Cloud

There are two things you can rent from Amazon: 1) CPUs (processors) and RAM (memory), and 2) hard drive space. I'll first explain how to rent CPUs and RAM to do your analysis in the cloud and then how to rent hard drive space. CPUs and RAM are provided in what Amazon calls instances. The rent for an instance depends on how much processing power and memory you want; the more of each you want, the more money you have to shell out (at present the biggest, most powerful instance you can rent runs you about $2.30/hour).
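Since the rent is charged by the hour, it is worth estimating the bill before starting a long analysis. The following is only a rough sketch: instance_cost is a made-up helper name, and the $2.30/hour rate is the figure quoted above, which will change over time.

```shell
# Estimate the rent for one instance: hourly rate (USD) times hours.
# instance_cost is a hypothetical helper, not an Amazon tool.
instance_cost() {
    awk -v rate="$1" -v hours="$2" 'BEGIN { printf "%.2f\n", rate * hours }'
}

# One full day and one 30-day month on the $2.30/hour instance.
instance_cost 2.30 24
instance_cost 2.30 720
```

Plug in the current hourly rate of the instance type you actually pick.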

Prerequisites and Starting your Server in the Cloud

Somehow I have to interact with the cloud. I will log in to our server at Amazon using OpenSSH, which should be installed by default on any Mac, Linux, or other Unix-like machine. I could also log in to the Amazon instance using PuTTY on Windows (I won't explain this here, but there is a short tutorial regarding this here).

The first thing I need to do is create a key-file that I will use to log in to the instance I will create below. You only have to create a key-file for your computer once; after you set up the key you can keep using it in the future. First, sign in to Amazon's Web Services here using the email address and password that either Mariya or I can give you (if you're not part of our lab you'll have to sign up using a credit card). Once signed in, select the EC2 tab at the top and then select key-pairs in the lower right-hand corner. You should be on a webpage that looks like the following.


Select "Create Key Pair" at the top. Give the key-pair a name and hit enter. A file with the extension .pem should start downloading onto your computer. Locate the downloaded file (it probably went to your downloads folder) and move it to some place where you will be able to find it again in the future. Do not share this file with anyone! If you lose the file, go to Amazon EC2, delete your key-pair and create a new key-pair to be safe. By default ssh keys are stored in a hidden directory called .ssh in your home folder.
cp ~/downloads/*.pem ~/.ssh/
## downloads is my downloads folder
Go to your .ssh folder and type ls -lh. You need to check the permissions of the key you just created.
-rw-r--r-- 1 user group 1.7K Sep 25 14:15 MyCloudKey.pem
As you can see everyone is allowed to read your key (if you can't follow me on this one read about file attributes here). Such lax permissions are a security risk since everyone can read your key; ssh won't let you use this key. Type the following to fix this problem.
chmod 600 MyCloudKey.pem
Now the permissions should look like this:
-rw------- 1 user group 1.7K Sep 25 14:15 MyCloudKey.pem
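If you prefer to check the permissions in a script rather than by eye, something along these lines should work. This is a sketch under assumptions: key_mode is a made-up helper name, and a scratch file stands in for the real key so that nothing in ~/.ssh is touched.

```shell
# Print the octal permission bits of a file.
# GNU stat (Linux) understands -c, BSD stat (Mac) understands -f.
key_mode() {
    stat -c '%a' "$1" 2>/dev/null || stat -f '%Lp' "$1"
}

# A scratch file stands in for MyCloudKey.pem in this demo.
demo_key=$(mktemp)
chmod 644 "$demo_key"            # the lax permissions of a fresh download
before=$(key_mode "$demo_key")
chmod 600 "$demo_key"            # owner may read and write, nobody else
after=$(key_mode "$demo_key")
echo "before: $before, after: $after"
rm -f "$demo_key"
```

For the real key you would call key_mode ~/.ssh/MyCloudKey.pem and expect 600.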
With the key-file created and set up we can start an instance and log in. Select Instances in the left-hand panel.


Select "Launch Instance". We're working on setting up our own operating system for the cloud and writing a wiki on it; for now search for Amazon Machine Image ID ami-ea837b83 under Community AMIs. This AMI is a Debian Linux OS that will work just fine for our purposes.


Select this image.


The image I selected to boot up runs a 64-bit Linux system, and at Amazon you have the option of running 64-bit Linux on one of 6 different server architectures. Select the one most suited to your needs (ECUs correspond to CPU power; faster and more CPU cores = more ECUs). Click continue and don't change the default settings if you don't know what you're doing. Since we are all using the same account, choose a descriptive name for the instance you want to run; this will make it easier for everyone if we run multiple instances at the same time.


Choose the key-pair that corresponds to the key installed on your local machine.


Choose the security group "Use_me"; more on security groups here.


Continue until you see the following. Hit "Launch" and then the "Close" tab on the following page.


Go back to the instances tab and select your instance. It will take a while for the instance to be up and running; if after a few minutes you still see a message that indicates the instance is starting up, refresh the browser tab. Here's what you should see (we're mainly concerned with two pieces of information: 1) the public DNS and 2) the zone in which our instance is running).


With the public DNS we can log in to our instance. Here, we will log in as the root user. The user you want to log in as depends on the set-up of the OS you're using. The public DNS will be different for every instance you start! Just copy and paste it.
ssh -i ~/.ssh/MyCloudKey.pem root@PUBLIC_DNS
## replace PUBLIC_DNS with the public DNS of your instance
You will be asked to accept an RSA authentication key; type "yes". If all went well your prompt should now look somewhat like this:
Congratulations, you are now working in the cloud! Read through the information provided on working on remote machines and start transferring your data back and forth and install the software you need for your analyses.
Do not forget to power off the machine when you're done! Either go to the Amazon EC2 website and select terminate from the "Instance Actions" menu or issue the command poweroff in the shell while logged in to the remote server. Amazon will charge money for a running instance regardless of whether or not you use it. Also remember to transfer your data before shutting down the instance; every file left on an instance will be deleted when terminating the instance.
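One common way to pull results off the instance before terminating it is scp, which ships with OpenSSH. The sketch below only prints the command so you can check it first. The key name, the host name, the paths, and the helper name fetch_cmd are all placeholders of mine, not values from a real instance.

```shell
# Placeholders: use your own key file and the public DNS of your instance.
KEY="$HOME/.ssh/MyCloudKey.pem"
HOST="ec2-00-00-00-00.compute-1.amazonaws.com"

# Print the scp command that would copy a remote directory to this machine.
# Remove the leading "echo" to actually run the transfer.
fetch_cmd() {
    echo scp -i "$KEY" -r "root@$HOST:$1" "$2"
}

# Copy /root/results on the instance to ./results on the local machine.
fetch_cmd /root/results ./results
```

The -r flag copies directories recursively; for a single file it can be left out.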


Creating a custom log-in command

We can make the log-in a bit neater and streamline the process by creating our own custom log-in command with a simple shell script. First, create the file that will contain the ssh script. Here I'll create a file called cloud (this will also be the name of the command you can call from the shell to connect to the remote server).
touch ~/.ssh/cloud
Now edit the file and insert the following.
#!/bin/sh
ssh -i ~/.ssh/BastiCloud.pem -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null "root@$1"
Save the file and make it executable. Note that the two options called with -o suppress Host Key checking which is convenient but may be a security risk. I don't think it's an issue for the cloud, but judge for yourself.
chmod 700 ~/.ssh/cloud
Now you have an executable which takes the public DNS as an argument. With superuser privileges (i.e., as root; if you haven't yet, throw off the shackles and enable the root account on your Mac) you can simply create a symbolic link to your executable from /usr/bin, where most of the other executables of your OS live. This way you can call your newly created executable/command from anywhere.
ln -s ~/.ssh/cloud /usr/bin/
The executable is now permanently added and connecting to an instance in the cloud is a piece of cake. Just issue the command cloud and pass the public DNS on to it.
Log out of the session using the command exit or Ctrl-d.
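The whole log-in script can also be written in one go with a here-document. The variant below is a sketch, not the script used above: it adds a usage check, leaves host key checking switched on, and assumes the key is called MyCloudKey.pem. It is built in a scratch directory so nothing in ~/.ssh is touched.

```shell
# Build the script in a scratch directory for this demo.
demo_dir=$(mktemp -d)

# The quoted 'EOF' keeps $1 and $# literal so they end up in the script.
cat > "$demo_dir/cloud" <<'EOF'
#!/bin/sh
# Usage: cloud <public-DNS>
if [ $# -ne 1 ]; then
    echo "usage: cloud <public-DNS>" >&2
    exit 1
fi
# MyCloudKey.pem is a placeholder; use the name of your own key file.
exec ssh -i "$HOME/.ssh/MyCloudKey.pem" "root@$1"
EOF
chmod 700 "$demo_dir/cloud"

# Called without an argument it prints the usage message and fails.
"$demo_dir/cloud" || echo "exit status: $?"
```

To use it for real, write the file to a directory that is on your PATH instead of the scratch directory.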

Storing data persistently with Amazon

In order to get more storage space and store your data persistently with Amazon you need to create a volume (i.e., a hard drive). Select "Volumes" in the "Elastic Block Store".


Select "Create Volume".


Enter the desired size of the volume in GB or TB and, most importantly, select the "Availability Zone" that corresponds to the zone your instance is running in. You will not be able to use a volume that was created in a zone other than the zone of the instance you want to connect it to (see above). You can also load a snapshot into your volume. Snapshots are created from volumes (i.e., they are backups containing all the data on the volume, stored with Amazon; more on this later). The instance I started earlier is running in zone "us-east-1d", so somewhere in Virginia there's a server under my control! I don't need much space for this demo and will create a small volume of 10GB. Obviously, the size you need will depend on how much data you want to work with (to uncompress and work with actual data I'd suggest 500GB to 1TB; read Titus' blog post on computational needs for next-gen analysis to get a better feel for what you'll need).
Next, select "Attach Volume" and select the instance to which you want to attach your volume. You'll also need to select the device path. The default, /dev/sdf, is fine. If you want to attach multiple volumes just keep going up the alphabet (/dev/sdg, /dev/sdh...). Remember the device path!


Select "Yes, Attach" and the volume will get attached to your instance.


Now that the volume is attached I have to go back to the shell and log back in to the instance if I logged out. Before being able to use the volume I need to create a file system on it. Do this only once for a new volume, since creating a file system wipes whatever is already on the device. I chose the ext2 file system, but you can create another file system if you want to. The two I would recommend are ext2 and ext3. ext2 has been around forever and there are tons of recovery tools around should something go wrong. ext3 is ext2 with journaling added, which makes the file system more robust if the machine crashes in the middle of a write. A lot of tools that you can use for recovering data from ext2 file systems can also be used to recover data from ext3 file systems.
The volume you attached as /dev/sdf shows up inside the instance as /dev/xvdf; the Linux kernel on this image names its virtual disks xvd* instead of sd*.
mkfs -t ext2 /dev/xvdf
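The renaming from /dev/sdf to /dev/xvdf described above is a plain prefix substitution, which helps when several volumes are attached. The helper name instance_device is made up for this sketch, and the mapping is only what I observed on this image; other images may name their devices differently, so check with ls /dev before running mkfs.

```shell
# Translate the device path chosen in the web console (/dev/sdX) into
# the name seen inside the instance on this image (/dev/xvdX).
instance_device() {
    echo "$1" | sed 's|^/dev/sd|/dev/xvd|'
}

instance_device /dev/sdf
instance_device /dev/sdg
```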
Now it's time to mount the device somewhere to actually use it. If you don't already have a directory on which you want to mount the device create it.
mkdir /some/dir
By default I usually use the directory /mnt to mount devices.
mount /dev/xvdf /mnt
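Before copying data it is worth confirming that the mount actually worked, because files written to an unmounted /mnt end up on the instance's own disk and are lost on termination. The following sketch assumes a Linux instance where /proc/mounts is available; is_mounted is a made-up helper name.

```shell
# Succeed if the given directory is listed as a mount point in
# /proc/mounts (Linux only), fail otherwise.
is_mounted() {
    awk -v dir="$1" '$2 == dir { found = 1 } END { exit !found }' /proc/mounts
}

if is_mounted /mnt; then
    echo "/mnt is mounted"
else
    echo "/mnt is NOT mounted"
fi
```

The same check, reversed, tells you whether umount succeeded before you detach the volume.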
Now you can move data to /mnt and install software there. For a cost of about $0.10 per GB per month Amazon will store these files until the volume is deleted. You can also create snapshots (backups) of your volume and share these snapshots with others. Check out working on remote machines to figure out how to get your files to and from the volume. Before you terminate an instance, unmount the volume.
umount /path/to/where/device/was/mounted
## I use /mnt so I will unmount as follows
umount /mnt
Then detach the volume by selecting the volume and the tab "Detach Volume".


If the volume isn't needed anymore delete it by selecting the volume and hitting the "Delete Volume" tab. The volume and the data it contains will be deleted.


Obviously, you don't have to delete your volumes. Using "Volumes" you can store your data and programs at Amazon while you're working on and with them or archive them long-term. Crunch the numbers first though; this can get expensive rather quickly. On the upside, availability and security are excellent if you store your data with Amazon or any similar service.
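To crunch the numbers, multiply the volume size by the monthly rate and the number of months. This is a rough sketch only: volume_cost is a made-up helper, and the rate of $0.10 per GB per month is an assumption, so substitute Amazon's current price.

```shell
# Storage cost: size in GB times rate (USD per GB per month) times months.
# volume_cost is a hypothetical helper, not an Amazon tool.
volume_cost() {
    awk -v gb="$1" -v rate="$2" -v months="$3" \
        'BEGIN { printf "%.2f\n", gb * rate * months }'
}

RATE=0.10   # assumed rate in USD per GB per month

# The demo volume and the two sizes suggested for real data, for one year.
for size in 10 500 1000; do
    echo "$size GB for 12 months: \$$(volume_cost "$size" "$RATE" 12)"
done
```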

Creating Snapshots
By creating snapshots from volumes (see above) data can be backed up on Amazon and shared with collaborators or made public to everyone. I will describe this process later. If you need help with this right away, bug me and I will get to it quickly(ish).