Last modified: 21 Jul 2009

Using a Virtual Computing Laboratory

I have been working fervently to create a virtual computing lab (VCL), primarily for the ITS program but also for the CSS program. CES use is possible but not as likely.

Vision

Traditional university lab computers are placebound, forcing faculty and students to be physically present to use them. In today's highly mobile, widely dispersed society, it is difficult and often considered unnecessary to be physically present to do something on a computer. Why can't things be done remotely, using whatever network device is available -- and capable enough -- to interface with the remote computer?

Remote access to a computer saves commuting costs and time, allows access to powerful, capable computers from ones much less powerful or capable, provides access to software which can only be installed on university computers, and reduces the cost of upgrading and maintaining classroom computers since existing computers can be used to access the remote ones. Faculty and students can use their own personal notebooks (for example) to access the remote ones.

However, there is a catch -- we would need a lot of powerful remote computers... or we could essentially time-share the ones we have by allowing users to schedule time on them and expand the number of available "machines" by exploiting multiple virtual machines per remote real computer.

Definition of a VCL

A virtual computing lab (VCL) is a facility that lets authorized people use computing machines from a networked device such as a laptop or advanced smartphone (as opposed to entering a real lab room and sitting down at a real computer). Such use could be anything from installing an operating system to running an application. Although typically there is one computing machine per person, one person could be using several computing machines at once, perhaps to simulate an Information Technology department's servers or to run a complex web indexing and searching algorithm.

Each "computing machine" is either a real physical machine or a virtual machine running on a real machine. Many virtual machines can run as "guests" on a real "host" machine under some kind of hypervisor software.

A VCL can provide predefined machines or allow creation of user-defined machines for faculty and students, who might need them for in-class assignments, out-of-class assignments or projects, or research projects. Lab staff or faculty can create predefined virtual machines with specific software installed and configured on them, or a user can create a virtual machine for whatever purpose. Given the proper hardware support, a real machine can be used to satisfy the needs of the user.

Machines are allocated using a reservation system. Users see the display of their reserved machine(s) and can control them by mouse and keyboard, attach virtual media (e.g., CDs and floppy diskettes) to them, and, if requested, can power down or power up their machine and watch it as it boots up.
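The core rule of any such reservation system is that two reservations for the same machine may not overlap in time. The following is a minimal sketch of that rule only, not the lab's actual scheduler; the class names, the machine name "vm-01", and the user names are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Reservation:
    user: str
    machine: str
    start: datetime
    end: datetime


@dataclass
class ReservationBook:
    """Hypothetical in-memory reservation list (illustration only)."""
    reservations: list = field(default_factory=list)

    def is_free(self, machine, start, end):
        # Two half-open intervals [start, end) overlap when each one
        # starts before the other ends.
        return not any(
            r.machine == machine and r.start < end and start < r.end
            for r in self.reservations
        )

    def reserve(self, user, machine, start, end):
        if start >= end:
            raise ValueError("reservation must end after it starts")
        if not self.is_free(machine, start, end):
            return None  # the slot is already taken
        reservation = Reservation(user, machine, start, end)
        self.reservations.append(reservation)
        return reservation


if __name__ == "__main__":
    book = ReservationBook()
    nine = datetime(2009, 7, 21, 9, 0)
    ten = datetime(2009, 7, 21, 10, 0)
    eleven = datetime(2009, 7, 21, 11, 0)
    print(book.reserve("alice", "vm-01", nine, eleven))  # accepted
    print(book.reserve("bob", "vm-01", ten, eleven))     # rejected: overlaps
```

Treating each reservation as a half-open interval lets one reservation begin at the exact moment the previous one ends.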

Using a VCL

To use a machine, the user must first reserve it through the reservation system.

Prior to the reservation time, one or more IP addresses and login accounts will be provided to the user. The user then connects to the remote machine(s), getting a remote display of the machine as well as a keyboard and mouse to provide input. The user logs in using the login accounts provided, and starts using or modifying an existing machine, or creates a new virtual machine or bare metal image from virtual media (e.g., bootable DVD images).

The user often has to use media to install operating systems or applications. However, there is no physical access to the machines, and there may be no real machine anyway. Consequently, images of media are stored on disk and can be made to appear to the machines as if the original media were inserted into a real physical device.

Media images are not the only things that need to be written to and read from disk. VM images must also use disk storage, if one wants to start a predefined VM or save a modified or new one.

Disk storage is implemented as a cluster of storage nodes with local disk drives combined into one aggregated file system.

VMs are copied from the storage cluster before the reservation starts, and copied back after it ends, preserving any changes. It may also be possible to run the VMs directly from the cluster, if that mechanism performs adequately.
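The copy-in, copy-back cycle can be sketched as two small functions. This is only an illustration, assuming the storage cluster is mounted as an ordinary directory on the machine that runs the VM; the function names and the ".partial" suffix are hypothetical.

```python
import shutil
from pathlib import Path


def stage_in(cluster_dir, local_dir, image_name):
    """Copy a VM image from the cluster mount to local disk before a reservation."""
    src = Path(cluster_dir) / image_name
    dst = Path(local_dir) / image_name
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)
    return dst


def stage_out(cluster_dir, local_dir, image_name):
    """Copy the (possibly modified) image back to the cluster after a reservation."""
    src = Path(local_dir) / image_name
    dst = Path(cluster_dir) / image_name
    tmp = dst.with_name(dst.name + ".partial")
    shutil.copy2(src, tmp)
    # Rename into place only after the copy finishes, so a half-written
    # image never carries the final file name.
    tmp.replace(dst)
    return dst
```

Writing to a temporary name first means an interrupted copy-back leaves the previous image intact on the cluster.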

Definition of Storage Cluster

A storage cluster is a collection of networked computers that are dedicated to storing and loading huge files (from 600MB to about 10GB apiece) that are either CD/DVD ISO images or complete VM images. There are multiple computers to eliminate a single storage bottleneck, and to provide some reliability through redundancy of data. Storage clustering software (e.g., GlusterFS or Lustre) glues the local filesystems together to make them appear as one filesystem.

Storage performance is limited by the local (i.e., same room) networking fabric. The fabric will be 1Gb Ethernet using TCP/IP until we can afford 10+Gb Ethernet, InfiniBand, or other high-performance equipment. Initial measurements indicate about 80MB/sec read and 50MB/sec write between one client and one server, with large block sizes but without tuning or striping (or exploiting redundancy for high availability). The intent is to have no more than three clients per storage computer/server to maintain reasonable performance.
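To put the measured rates in perspective, the short calculation below estimates how long one client would take to read or write files at both ends of the expected size range. It is a rough sketch that assumes decimal units (1GB = 1000MB) and that the single-client rates hold for the whole transfer.

```python
READ_MB_PER_SEC = 80   # measured read rate quoted above
WRITE_MB_PER_SEC = 50  # measured write rate quoted above


def transfer_seconds(size_mb, rate_mb_per_sec):
    """Time to move size_mb megabytes at a steady rate."""
    return size_mb / rate_mb_per_sec


for label, size_mb in [("600MB CD image", 600), ("10GB VM image", 10_000)]:
    read_s = transfer_seconds(size_mb, READ_MB_PER_SEC)
    write_s = transfer_seconds(size_mb, WRITE_MB_PER_SEC)
    print(f"{label}: read {read_s:.1f} s, write {write_s:.1f} s")

# Prints:
# 600MB CD image: read 7.5 s, write 12.0 s
# 10GB VM image: read 125.0 s, write 200.0 s
```

So copying a large VM to or from the cluster takes on the order of two to three minutes per machine at these rates, and longer if several clients share one server at the same time.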

VCL Constituents

There will be two VCLs in two separate locations, each with its own storage cluster. No communication is allowed between the clusters, since traffic between them could easily consume all network capacity. One VCL's real machine inventory consists of 10 machines with 64-bit dual quad-core CPUs and 8 GB of RAM, packaged as "blades". The other VCL consists of 20 machines with 64-bit quad-core CPUs and 5 GB of RAM, each housed in a workstation case.
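Combining these machine counts with the target of no more than three clients per storage server gives a rough size for each storage cluster. The sketch below assumes that every real machine counts as one storage client, which the plan does not state explicitly; the function name and lab labels are hypothetical.

```python
import math

MAX_CLIENTS_PER_STORAGE_SERVER = 3  # target from the storage cluster section


def storage_servers_needed(client_count, per_server=MAX_CLIENTS_PER_STORAGE_SERVER):
    """Smallest number of servers that keeps every server at or under the limit."""
    return math.ceil(client_count / per_server)


# Assumption: each real machine is one storage client.
labs = {"blade lab": 10, "workstation lab": 20}
for name, machines in labs.items():
    print(f"{name}: {machines} machines -> {storage_servers_needed(machines)} storage servers")

# Prints:
# blade lab: 10 machines -> 4 storage servers
# workstation lab: 20 machines -> 7 storage servers
```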