Last modified: 11 Dec 2023
The VCL with blades (VCL 1) was used for this phase. After experiencing several limitations and annoyances with VMware Server and VMware ESXi, other virtual machine managers were considered.
Key desirable criteria were:
We will probably only use Linux and Windows.
If the host OS supports the storage device, it could be used.
We needed something to keep down the amount of storage consumed and transmitted. Copy-on-write seemed to be ideal for that: just load/save/transmit the changes.
It must be fairly easy to script whatever needed to be done, including defining the VM, modifying it after definition, and controlling the VM.
We didn't know what snags we were going to encounter, so it was imperative that there was a rich set of things that one could do, so we could read the documented features, experiment with them, then use them as needed.
It is important to be able to support 64-bit guests on 64-bit hosts, especially for the near future, when everything will be 64 bit.
This may become important in later phases, when it is imperative to keep a VM running when its host needs to be upgraded. One drawback is that the virtual disk can no longer be on the same host, as a local disk is down when the host goes down. One way to allow this is to use iSCSI. Another is to use a network file system for the virtual disk files and snapshots.
We have only used NAT, to allow outward connections and inward mapping of host ports to guest ports. We will want to use other networking in the future.
We would like to be able to watch a VM boot and shut down, even though the OS's remote desktop client must eventually shut itself down. It is a bonus to use remote desktop protocol (RDP) to do so, since RDP clients are easy to find.
Snapshots enable the copy-on-write feature to work properly. In the future, it may allow better control of a reserved VM (e.g., snapshot before poweroff to preserve state).
Oracle's VirtualBox open-source VM manager or type II hypervisor has all of those characteristics. We started out by using VirtualBox 3.1.4, and as of October 2022, are now using 6.1.38 and 7.0.2.
Now that a powerful and versatile VM manager was chosen, its critical features were tested:
This was done in a fairly incremental fashion, learning new features only as needed. Once we felt comfortable with a feature, we would normally try to script it somehow. Scripting was done in Perl, with the VM work done by the provided VBoxManage subcommand.
This is a complex topic, perhaps better covered elsewhere. We shall focus only on the things that needed to happen to make the scripts work.
There are two major sources for scripts: /root/scripts and /usr/local/bin. "/root/scripts" is used for nearly all VM creation, management, monitoring and orchestration, while "/usr/local/bin" is used for user interaction with the VMs, primarily through ssh_vbm. Because ssh_vbm uses some scripts in "/root/scripts", the permissions for /root must be changed to:
chmod o+x /root
Over the years, configuring a new compute node has been made easier via the /root/scripts/make_prep_vcl_tar script, which creates a .tgz file from the current compute node and places it in /prep_vcl.tgz. That file is copied to the new compute node and untarred from the root directory "/":
cd /
tar -zxvf prep_vcl.tgz
which copies most of the files to the right locations. After that, one uses these commands on the new compute node (restarting between them):
cd /root/scripts
./prep_vcl 1
./prep_vcl 2
which install additional packages and can propagate the files to other new compute nodes within a new VCL. But often, this is only done for one of the nodes, and then a Clonezilla image is made of that node and restored on all of the others, using a small script (cn, modified on the first node prior to imaging it so it will work for all nodes) to change the node name and set up IP addresses for each other node.
This is where all virtual disks which will be considered "immutable", or "multi-attach" reside. This directory is often a symbolic link to /home/vms. There are usually subdirectories relating to a course, research project or other purpose, containing the set of VMs needed for that entity.
This is where all of the user home directories and hence the VM definitions, log files and user-specific difference images reside. This directory is often a symbolic link to /home. There are usually subdirectories relating to a course, research project or other purpose, containing the VM user accounts or team accounts related to that entity.
.iso files for guest operating systems to use, usually mapped to a shared folder.
Application packages or scripts for the guest operating system to use, usually mapped to a shared folder.
Users can take a lot of snapshots per VM. It is good to either spread out these snapshots across multiple, mounted disks (or use LVM to extend a filesystem across disks) or on a TB-sized single disk. The placement of the snapshots folder is determined by the specification of each VM definition.
When preconfiguring a VM, we often use host ports in the 33890-33999 range, so they have to be opened as well. It is from the port range 50000-55000 that we reserve host ports for use by the VMs or user/team accounts. Such uses typically include VRDP ports, and NAT port mappings for ssh and/or guest RDP.
We need to allow the VRDP server to authenticate users (VBoxManage modifyvm vm_name --vrdpauthtype external). This restricts the display of a powered on VM to only the users who know the user name and password. To support teams, we allow multiple users to simultaneously view the display and type or use the mouse (--vrdpmulticon on).
A file called /etc/pam.d/vrpdauth must exist and have these contents:
#%PAM-1.0
auth sufficient pam_unix.so nullok try_first_pass debug
auth requisite pam_succeed_if.so uid >= 500 quiet
auth required pam_deny.so
account required pam_unix.so broken_shadow
account sufficient pam_localuser.so
account sufficient pam_succeed_if.so uid < 500 quiet
account required pam_permit.so
password requisite pam_cracklib.so try_first_pass retry=3
password sufficient pam_unix.so sha512 shadow nullok try_first_pass use_authtok
password required pam_deny.so
session optional pam_keyinit.so revoke
session required pam_limits.so
session [success=1 default=ignore] pam_succeed_if.so service in crond quiet use_uid
session required pam_unix.so
Using the command VirtualBox from labadmin (usually via VNC or xrdp to the host), a VM and an empty virtual disk are manually created or configured. An OS is installed in the VM and configured as desired, though the configuration is often minimal. Here are the usual steps:
Security apps such as anti-virus and firewalls are common if the network is exposed to the internet (e.g., "bridged"), but not otherwise. Specialized needs may require other apps, such as SQL Server Express.
If licensing allows, install the VirtualBox Extension Pack on the host (only needs to be done once per host, per VirtualBox release). The default extension pack provides an RDP service (VRDP) that, when configured and enabled, can show the display of the VM from boot up (including the BIOS screen) to shutdown.
VRDP provides many benefits to students, who can then:
When creating a VM, the task is to configure the VirtualBox Remote Desktop Extension (VRDE): enable it, set up a unique host port, enable multiple simultaneous connections if desired (useful for teams or when the instructor wants access to student VMs), and provide a way to authenticate (usually, the "external" method is used since it is a simple addition of a file to Linux PAM).
If you can't install the VirtualBox Extension Pack, then it becomes very important to configure a guest remote desktop service inside the guest operating system to make the user experience much more palatable. This is usually the TermService service on Windows or the xrdp service on Linux. Another possibility, more prevalent on Linux, is to configure, enable and start the sshd service. Make sure the firewall is open for the service's port(s).
If this is a new VM definition, selected parts of the definition are saved in the base VM information file, together with a unique name for later use.
Once the virtual disk holds an OS, that is the essence of what people think the VM is: the guest operating system. The virtual disk is then detached from the VM and the VirtualBox Media Manager, and copied (or cloned using VBoxManage clonehd if on same host/compute node) to storage and distributed to other compute nodes generally to their /classroom/vms/xxxx folder where "xxxx" is the course name, the research project name, or other name.
Note that a virtual disk is marked as "immutable" when it gets re-added to the media manager (VBoxManage openmedium disk xxx.vdi --type immutable). This does not mark the virtual disk itself as immutable -- that's really a characteristic of how a particular VM records how the disk is to be handled; i.e., as if it were unchangeable. Nothing prevents the disk or a clone of it (VBoxManage clonehd old.vdi new.vdi) from being changed, unless it is attached to a VM as immutable.
Later VirtualBox releases added "multiattach" as a means of allowing the same virtual disk to be attached to multiple VMs at once, using copy-on-write for VM-unique changes (or "differencing"). This is much easier to manage, as VirtualBox handles the automatic differencing vs. taking snapshots manually. However, "immutable" will continue to be used in this documentation.
We use immutable disks to save space and provide a consistent, common class or template from which to create user-specific instances of a VM. Such a VM instance is unique to the user. Uniqueness is desired mainly for Windows, but making a unique guest OS causes much larger "difference images" than just using the guest OS would, since all the differences between this VM's virtual disk and the immutable virtual disk are recorded and unique security IDs are generated.
Actions such as licensing the OS, providing a unique OS security identifier (SID), generating unique workgroup names, assigning unique account passwords for the privileged itadmin and unprivileged ituser accounts, and setting account password restrictions cause the VM instance to be unique. A unique workgroup is used to provide isolation for a security domain, and unique account passwords prevent users of a guest OS from using other users' guest OS. Unique workgroups are not really an issue with NAT, but would be for other networking types.
Scripts are used to create a VM from a template file of desired VM characteristics. A given base VM name is used to look up the corresponding base VM information. Specifying the course (class), research project name or other name provides a grouping mechanism used later on for control of the VMs, using specified user or team accounts as members. User/team accounts are created as needed and associated with a special OS group called vbvmuser. Each user/team account can own multiple VMs, as recorded in their "VirtualBox VMs" subdirectory.
The task of the scripts is to create a unique instance of a VM from preconfigured information — namely, the base VM information and the virtual disk with a guest OS installed on it (its name is specified in the base VM information file).
The first deficiency noted with VirtualBox was that there was no way to create a VM from a template. Thus, the make_vb_vm script was born (resides in /usr/local/bin). It is a rather simple script that hard-codes what are thought to be common VM characteristics, and it takes a few arguments to override some defaults. Those arguments are: the VM name, the OS type, the audio state (on or none), the disk controller type, the IO APIC state, the PAE state and whether or not page fusion is desired.
Using a centralized script to create VMs allows changing of the core VM specs for all subsequent VM creations, but won't change any of the existing ones. The script needs to be modified to change the core specs; this was before we started to use external files to provide configuration information.
But what are the "core" specs?
Both the USB 2.x (usbehci) and USB 3.x (usbxhci) options should also be enabled.
This is allowed only if the VirtualBox Extension Pack is licensed and installed on the host.
Later on, the port number must be made to be unique for each VM running on the same host.
The VM is created and registered using the createvm subcommand of VBoxManage, and then the shell VM is modified using the modifyvm subcommand with all of the settings above. Once that is done, a storage controller of the requested type is added using the storagectl subcommand, and an empty DVD drive and floppy drive are attached to the machine using storageattach.
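The sequence above can be sketched as a list of VBoxManage invocations. This is a minimal Python illustration, not the make_vb_vm script itself (which is Perl): the function name, the controller naming ("SATA Controller", taken from the examples later on this page) and the DVD port number are assumptions, and the audio and floppy steps are left out.

```python
def build_vm_commands(name, ostype, controller="SATA",
                      ioapic="on", pae="off", pagefusion="off"):
    """Return VBoxManage argument lists that define an empty "shell" VM.

    Hypothetical helper: only the subcommands named in the text are used
    (createvm, modifyvm, storagectl, storageattach).
    """
    ctl_name = f"{controller} Controller"   # assumed naming convention
    bus = controller.lower()                # "ide", "sata" or "scsi"
    return [
        # 1. Create and register the VM.
        ["VBoxManage", "createvm", "--name", name,
         "--ostype", ostype, "--register"],
        # 2. Apply the core settings.
        ["VBoxManage", "modifyvm", name, "--ioapic", ioapic,
         "--pae", pae, "--pagefusion", pagefusion],
        # 3. Add the storage controller.
        ["VBoxManage", "storagectl", name, "--name", ctl_name, "--add", bus],
        # 4. Attach an empty DVD drive (port 1 is an arbitrary choice here).
        ["VBoxManage", "storageattach", name, "--storagectl", ctl_name,
         "--port", "1", "--device", "0",
         "--type", "dvddrive", "--medium", "emptydrive"],
    ]


if __name__ == "__main__":
    for cmd in build_vm_commands("win10", "Windows10_64"):
        print(" ".join(cmd))
```

Each list could be handed to subprocess.run(cmd, check=True) on a host with VirtualBox installed; printing them first makes it easy to review what would be executed.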
That's all this script does. Virtual disks are attached and any other modifications are done later, using another script.
The VM instance creation process consists of these major steps:
Using sysprep was common in the first few years of the VCL, but became less important as we migrated to using trial versions of Windows, where licensing issues were not as complicated and we didn't consume a lot of licensing keys per quarter.
The VM instance creation process is implemented in the create_vb_vm_for_user script and the vb_vm.common common subroutine file. Both reside in each compute node's /root/scripts directory. The script calls make_vb_vm.
create_vb_vm_for_user creates a VirtualBox VM for a given course number (class)/research project/other use, user/team account, and base VM name. It relies heavily on the vb_vm.common common subroutine file. The subroutines were originally part of the script, but were moved out and included ("required") so that newer scripts could reuse them.
Some information is hard-coded at the top of the create_vb_vm_for_user script. It should be placed in a configuration file in the future. That information includes the paths to:
/root/create_vb_vm_for_user.dat is the source of the template definitions of the VMs. It defines prototypical VMs, configured a certain way for the needs of various courses, thereby providing a common definition file. The file must be secured such that only root can read it, since it contains passwords. Once the template is defined (and probably debugged as well), it is often copied to all other cns for consistency and to make backup copies.
The items in the template are in tab-delimited columns, denoting:
This is the arbitrary name for this VM's definition. Usually, the OS type is somehow encoded (present in another field), as well as the intended use of the VM and the quarter (or other relevant info). For example, win2016_tinfo452_spr2020 means that the Windows2016 OS type is used, the course is "tinfo452", and the quarter is "spr2020".
This naming helps a VM definer to copy and modify previous quarters' template lines and to retain what was done in the past by not redefining the same template line every quarter as things change.
The VirtualBox OS type, from VBoxManage list ostypes. Use the closest name that matches the guest OS you will install, as this typing information helps VirtualBox select defaults and virtual hardware features that the OS will support (see this web page for more).
The number of CPUs (host cores) assigned to the VM.
The amount of RAM in megabytes that this VM will be allocated when it is powered on.
The state of the audio device, either on or none.
The name of the administrator or root account on the OS, followed by a slash, then its password.
IDE, SATA or SCSI.
The file names of the host disk files representing virtual disks to attach to the VM. For IDE controllers, virtual disks are assigned as IDE 1 master, IDE 1 slave, or IDE 2 slave, in that order. For SATA controllers, we assign a new number for each disk. Each disk file name is separated by a comma and has an absolute path.
If a pound sign is in the disk_file specification, it indicates that there is further information to indicate how to interpret, create or define that disk. This is called the disk file's "type" information. Here is a definition of the types:
The disk file will never change. A snapshot "difference file" will be created to hold the changes to the disk file made during the running of the VM's guest operating system.
We want to share the same disk file amongst many VMs to save space, and use user-unique copy-on-write or "differencing" files to hold any user-specific changes to the virtual disk. Originally, only "immutable" was allowed by VirtualBox to provide copy-on-write, but it would delete the automatically-generated differencing file (a snapshot file) after every VM termination, which would delete any post-VM-creation actions we might want to persist (such as changing root passwords per user). Consequently, we started taking snapshots of our own, which allowed our changes to persist, and eventually found that an "autoreset" snapshot characteristic was what was deleting their differencing file, and turned it off. Still, it wasn't a good mechanism. Eventually, VirtualBox supported "multiattach", which fits what we want to do.
"multiattach" started failing when VirtualBox 6.0.0 was released:
VBoxManage: error: Cannot change type for medium '/classroom/vms/tinfo442/centos73.vdi': the media type 'MultiAttach' can only be used on media registered with a machine that was created with VirtualBox 4.0 or later
VBoxManage: error: Details: code VBOX_E_INVALID_OBJECT_STATE (0x80bb0007), component MediumWrap, interface IMedium, callee nsISupports
VBoxManage: error: Context: "COMSETTER(Type)(enmMediumType)" at line 708 of file VBoxManageStorageController.cpp
VBoxManage: error: Failed to set the medium type
This was reported to VirtualBox as ticket 18296, but never marked as fixed. A workaround was found and implemented:
VBoxManage -q storageattach "centos1" --storagectl "SATA Controller" \
  --port 0 --device 0 --type hdd --medium "/classroom/vms/tinfo442/centos73.vdi" \
  --mtype normal
VBoxManage -q storageattach "centos1" --storagectl "SATA Controller" \
  --port 0 --device 0 --type hdd --medium none
VBoxManage -q storageattach "centos1" --storagectl "SATA Controller" \
  --port 0 --device 0 --type hdd --medium "/classroom/vms/tinfo442/centos73.vdi" \
  --mtype multiattach
This is a single-use disk file, which can only be used by the user it is defined for. All changes that are not explicitly snapshotted by the user will change the disk file directly.
Put the disk file in this user's directory instead of sharing it with others. In addition, if a string called "%uwnetid" is found in the disk file specification, it will be replaced with the user's UW Net Id prior to creating the file.
Note that each virtual disk file needs to have a universally unique identifier (UUID) internally or VirtualBox will not allow assigning it to more than one user on the same host. A UUID is created every time a virtual disk is created or cloned, but manually copying (e.g., via the cp command) the virtual disk file won't create a new UUID. Sharing a virtual disk file avoids the UUID issue because snapshots based on the shared file have unique UUIDs themselves.
Clone the disk file specified for this user, providing a unique instance of the VM's disk. All non-snapshotted changes made by the user update the cloned disk file.
A forward slash following "clone" means that some cloning variation is desired. The default clone variant is "Standard". If "create" was specified instead of clone, the size of the disk in MBs follows the required clone variant specification; e.g., "create/Standard/5000" would create a new virtual disk with a dynamically-expanding size ("Standard") and a 5GB cap on its size.
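To make the "path#type/variant/size" layout concrete, here is a minimal Python sketch of how one disk_file entry could be split. It is an illustration only, not the logic in vb_vm.common: the DiskSpec and parse_disk_spec names are hypothetical, the type name is passed through without validation, and the "%uwnetid" substitution is applied to any entry for simplicity.

```python
from typing import NamedTuple, Optional


class DiskSpec(NamedTuple):
    path: str                 # absolute path of the disk file
    kind: Optional[str]       # text after "#", e.g. "multiattach" or "clone"
    variant: Optional[str]    # e.g. "Standard"
    size_mb: Optional[int]    # only given for "create"


def parse_disk_spec(spec: str, uwnetid: Optional[str] = None) -> DiskSpec:
    """Split one disk_file entry into path, type, variant and size."""
    path, _, type_info = spec.partition("#")
    if uwnetid is not None:
        path = path.replace("%uwnetid", uwnetid)
    if not type_info:
        return DiskSpec(path, None, None, None)
    parts = type_info.split("/")
    kind = parts[0]
    variant = parts[1] if len(parts) > 1 else None
    size_mb = int(parts[2]) if len(parts) > 2 else None
    if kind == "clone" and variant is None:
        variant = "Standard"          # documented default clone variant
    return DiskSpec(path, kind, variant, size_mb)


def parse_disk_files(field: str, uwnetid: Optional[str] = None):
    """Parse the whole comma-separated disk_file column."""
    return [parse_disk_spec(s, uwnetid) for s in field.split(",") if s]


if __name__ == "__main__":
    # The second path is a made-up example.
    field = ("/classroom/vms/tinfo457/win10_ent_installed_20200603.vdi#multiattach,"
             "/classroom/home/tinfo457/%uwnetid/data.vdi#create/Standard/5000")
    for disk in parse_disk_files(field, "srondeau"):
        print(disk)
```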
The full path to the .iso file for the CD/DVD to attach. This is usually used when the user needs to install the OS from the CD/DVD or refer to an OS source CD/DVD.
A list of comma-delimited port specifications, which are each in this form: NIC number/port/protocol. For example, to open NIC 2's port 80 on the guest through the NAT, one would specify:
2/80/tcp
The script would generate a unique host port number for this guest port number, and record it for the user.
Another common specification is for remote desktop service via NAT port mapping:
2/3389/tcp
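As an illustration of the port specification format, the following Python sketch parses the column and turns one entry into a NAT port-forwarding rule. It assumes the --natpf option of VBoxManage modifyvm is used for the mapping, which may differ from what the scripts actually do; the function names and the rule-naming scheme are hypothetical, and the host port is supplied by the caller.

```python
def parse_port_specs(field):
    """Split "2/22/tcp,2/3389/tcp" into (nic, guest_port, protocol) tuples."""
    specs = []
    for item in field.split(","):
        item = item.strip()
        if not item:
            continue
        nic, port, proto = item.split("/")
        specs.append((int(nic), int(port), proto.lower()))
    return specs


def natpf_args(vm, nic, guest_port, proto, host_port):
    """Build a modifyvm call mapping host_port on the host to guest_port."""
    # Rule layout: name,protocol,host ip,host port,guest ip,guest port.
    # Empty host/guest IPs mean "any"; the rule name is an arbitrary label.
    rule = f"{proto}{guest_port},{proto},,{host_port},,{guest_port}"
    return ["VBoxManage", "modifyvm", vm, f"--natpf{nic}", rule]


if __name__ == "__main__":
    for nic, port, proto in parse_port_specs("2/22/tcp,2/3389/tcp"):
        # 53092 is just a sample host port from the reserved range.
        print(" ".join(natpf_args("win10", nic, port, proto, 53092)))
```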
A list of NIC settings in VBoxManage modifyvm syntax; the settings for each NIC are separated from the next NIC's settings by a comma.
Here are values that can be substituted for, if they are surrounded by percent "%" signs, in the nic_settings:
An IPv4 address from a host networking interface will be substituted.
The user name will be substituted.
The vde_port number from /root/scripts/manage_vde.
This is not a direct substitution, but specifies how the part after "/" is used to create a unique DHCP server and dynamic IP lease subnet for each user when an internal network is used.
Over time, we have determined that it is very useful to have at least two NICs. One is connected to an internal network that only all of the VMs for one user can use, and the other is connected to NAT for internet access from within the guest operating system. For example:
--nic1 intnet --intnet1 %user%_intnet %dhcp/192.168.u/100% --nictype1 82540EM,--nic2 nat --nictype2 82540EM on
For the user "srondeau", this says that NIC 1 is attached to an internal network named "srondeau_intnet", which uses a DHCP service in a user-unique subnet starting with "192.168.u", where "u" is a number unique to each user, starting at 100 (the meaning of "u/100"). NIC 1 is the standard "Intel PRO/1000 MT Desktop" adapter, type "82540EM". The second NIC uses NAT and the same "Intel PRO/1000 MT Desktop" virtual NIC.
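The substitution step can be sketched in Python as follows. This is a guess at the mechanics rather than the real implementation: the function name is hypothetical, only %user% and %dhcp/...% are handled, and the per-user number is assumed to be added to the starting value (100 here) to form the third octet of the subnet.

```python
import re


def expand_nic_settings(field, user, user_number):
    """Return (list of modifyvm argument lists per NIC, DHCP subnet prefix).

    user_number is assumed to be a small per-user integer (0, 1, 2, ...).
    """
    subnet = None
    nic_args = []
    for nic in field.split(","):
        match = re.search(r"%dhcp/([^%]+)/(\d+)%", nic)
        if match:
            pattern, start = match.group(1), int(match.group(2))
            # Replace the trailing "u" with the user's unique octet.
            subnet = re.sub(r"u$", str(start + user_number), pattern)
            # The directive is consumed, not passed on to modifyvm.
            nic = nic.replace(match.group(0), "")
        nic = nic.replace("%user%", user)
        nic_args.append(nic.split())
    return nic_args, subnet


if __name__ == "__main__":
    settings = ("--nic1 intnet --intnet1 %user%_intnet %dhcp/192.168.u/100% "
                "--nictype1 82540EM,--nic2 nat --nictype2 82540EM")
    args, subnet = expand_nic_settings(settings, "srondeau", 3)
    print(args)
    print(subnet)
```

The returned subnet prefix would then be used to define the per-user DHCP server for the internal network; that step is omitted here.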
"on" or "off" (normally "on").
Semicolon-separated list of "snapshot_folder#weight" specifications. The snapshot_folder is the absolute path to the directory to store snapshots, and the weight is a count of how many times to use that folder. The folder is assigned randomly, and used up to that count number of times. This attempts to distribute the use of the VM's snapshots across those folders, in proportion to how much each folder/disk is expected to be able to handle.
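One way to realize this weighted assignment is sketched below in Python. It is an interpretation of the description, not the script's code: the function names are hypothetical, a missing weight is assumed to mean 1, and the random choice is biased by each folder's remaining capacity.

```python
import random


def parse_snapshot_folders(field):
    """Parse "folder#weight;folder#weight" into (folder, weight) pairs."""
    folders = []
    for item in field.split(";"):
        item = item.strip()
        if not item:
            continue
        folder, _, weight = item.partition("#")
        folders.append((folder, int(weight) if weight else 1))
    return folders


def pick_snapshot_folder(folders, used, rng=random):
    """Randomly pick a folder that has been used fewer times than its weight.

    `used` maps folder -> number of VMs already assigned; it is updated.
    """
    remaining = [(f, w - used.get(f, 0)) for f, w in folders]
    remaining = [(f, r) for f, r in remaining if r > 0]
    if not remaining:
        raise RuntimeError("all snapshot folders have reached their weight")
    names = [f for f, _ in remaining]
    weights = [r for _, r in remaining]
    choice = rng.choices(names, weights=weights, k=1)[0]
    used[choice] = used.get(choice, 0) + 1
    return choice


if __name__ == "__main__":
    # The second folder is a made-up example.
    folders = parse_snapshot_folders("/ssdm3.5t/snapshots#100;/data2/snapshots#50")
    used = {}
    for _ in range(5):
        print(pick_snapshot_folder(folders, used))
```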
"on" or "off" (normally "off").
"on" or "off" (normally "off").
Here is a sample file, with passwords appearing as 'xxxx':
#base_name	os_type	CPUs	RAM_in_MB	audio_state	admin_acct/pw	disk_controller	disk_file	cd_file	nicnum/guestport/protocol	nic_settings	ioapic_state	snapshot_folder_list	pae_state	page_fusion
win10_tinfo457_sum2020	Windows10_64	2	4096	none	administrator/xxxx	SATA	/classroom/vms/tinfo457/win10_ent_installed_20200603.vdi#multiattach		2/3389/tcp	--nic1 intnet --intnet1 %user%_intnet %dhcp/192.168.u/100% --nictype1 82540EM,--nic2 nat --nictype2 82540EM	on		off	off
fedora31_rdptest_sum2020	Fedora_64	1	2048	none	root/xxxx	SATA	/classroom/vms/rdptest/fedora31_rds_updated_20200824.vdi#multiattach		2/22/tcp,2/3389/tcp	--nic1 intnet --intnet1 %user%_intnet %dhcp/192.168.u/100% --nictype1 82540EM,--nic2 nat --nictype2 82540EM	on	/ssdm3.5t/snapshots#100	off	off
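Reading such a file is straightforward once the columns are named. The Python sketch below is a hedged illustration of that lookup, not the Perl code in create_vb_vm_for_user: the dictionary keys are invented labels for the 15 columns of the header line, and empty columns (such as a missing cd_file) are assumed to appear as empty strings between tabs.

```python
COLUMNS = [
    "base_name", "os_type", "cpus", "ram_mb", "audio_state", "admin_acct_pw",
    "disk_controller", "disk_file", "cd_file", "ports", "nic_settings",
    "ioapic_state", "snapshot_folder_list", "pae_state", "page_fusion",
]


def parse_template_line(line):
    """Turn one tab-delimited template line into a dict, or None for comments."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    fields += [""] * (len(COLUMNS) - len(fields))   # pad short lines
    rec = dict(zip(COLUMNS, fields))
    rec["cpus"] = int(rec["cpus"])
    rec["ram_mb"] = int(rec["ram_mb"])
    # "account/password" is split at the first slash.
    account, _, password = rec.pop("admin_acct_pw").partition("/")
    rec["admin_account"], rec["admin_password"] = account, password
    return rec


def find_template(lines, base_name):
    """Return the first template whose base name matches, else None."""
    for line in lines:
        rec = parse_template_line(line)
        if rec and rec["base_name"] == base_name:
            return rec
    return None


if __name__ == "__main__":
    sample = "\t".join([
        "win10_tinfo457_sum2020", "Windows10_64", "2", "4096", "none",
        "administrator/xxxx", "SATA",
        "/classroom/vms/tinfo457/win10_ent_installed_20200603.vdi#multiattach",
        "", "2/3389/tcp", "--nic1 nat", "on", "", "off", "off",
    ])
    print(find_template(["#base_name\tos_type", sample], "win10_tinfo457_sum2020"))
```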
/root/reserved_ports records information about network ports. There is a range of network ports on the host that are open and reserved for use by the VMs. They are used to prevent re-allocation by other VMs; otherwise, there would be no centralized record of them (VirtualBox does record the information with each VM). This tab-delimited information is recorded:
Here is a sample file:
54455 20100405 16:27:53 _431team3 w2k8_web http
53838 20100405 16:27:54 _431team3 w2k8_web https
53092 20100405 16:36:10 _431team3 w2k8_web ms-wbt-server
53075 20100405 16:36:16 _431team3 w2k8_sql ms-sql-s
54630 20100405 16:44:03 _431team3 w2k8_sql ms-wbt-server
Most of the time, ports are reserved during the VM creation process. However, users can also reserve ports. This happens indirectly when they create new web servers to periodically update their screenshots, via the screenshot command sent via ssh and processed by ssh_vbm.
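The reservation idea can be sketched in Python as below. This assumes the first field of each record is the port number and that a new port is chosen at random from the unreserved part of the 50000-55000 range; the actual selection strategy and record layout used by the scripts may differ, and the function names are hypothetical.

```python
import random

PORT_RANGE = range(50000, 55001)   # host ports reserved for VM use


def reserved_port_numbers(lines):
    """Collect the port numbers (first field) already recorded."""
    ports = set()
    for line in lines:
        fields = line.split()
        if fields and fields[0].isdigit():
            ports.add(int(fields[0]))
    return ports


def reserve_port(lines, user, vm, service, timestamp, rng=random):
    """Pick an unreserved port and return it with its new record line."""
    taken = reserved_port_numbers(lines)
    free = [p for p in PORT_RANGE if p not in taken]
    if not free:
        raise RuntimeError("no free ports left in the reserved range")
    port = rng.choice(free)
    record = "\t".join([str(port), timestamp, user, vm, service])
    return port, record


if __name__ == "__main__":
    existing = [
        "54455 20100405 16:27:53 _431team3 w2k8_web http",
        "53838 20100405 16:27:54 _431team3 w2k8_web https",
    ]
    port, record = reserve_port(existing, "_431team3", "w2k8_sql",
                                "ms-sql-s", "20100405 16:36:16")
    print(record)   # the caller would append this line to /root/reserved_ports
```

In a real script the read, choose and append steps would need to happen under a file lock so that two concurrent VM creations cannot reserve the same port.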
/root/create_vb_vm_for_user.userid.log is a per-user log (i.e., "userid" varies) that records with a date-time stamp what happened during all VM creations for that user.
This is where new user directories will be created. This is under the "course" directory which groups users in the same course (and section). This is all under "/classroom/home", so a user called "user123" in course "tcss431" would have a home directory in "/classroom/home/tcss431/user123".
The .vcl file resides here. It records user name/password, host IP, and VM instance and optional port information for each VM. This file is sent to the user's or team's account on "cssgate.insttech.washington.edu", in the corresponding "/home/INSTTECH/acctname" home directory. This is done after "/root/scripts/harvest_vb_vm_info" is done over the entire course for all hosts, and then actually distributed to cssgate via "/root/scripts/gather_vb_vm_info".
A log of ssh commands sent to the VMs, called ssh_vbm.log is stored in the user/team account's home directory.
The VMs are stored in the home directory, under the VirtualBox VMs folder. Often, due to immutable/multi-attach disk files, the only things saved in the VM's subdirectory are the snapshots, whose size may be significant. Snapshots for Windows disk images brought out of sysprep can easily consume 4GB just for the unique security IDs and other actions Windows takes to make the disk unique.
Other files placed in the user/team account home directory are:
A file containing a unique user number for all users on this host.
The presence of this zero-length file means that SET lab staff do not want the user to be able to manage the VMs they "own", usually because the VMs are being modified by or deleted by SET lab staff. This is set or cleared (file is removed) by "/root/scripts/block".
Files used to record information relating to the user-specific web server that handles HTTP requests for the screenshot command. There are also .html files that are named per VM screenshotted, with corresponding .jpg files that are the last screenshot image.
Files used by the screenshot command to save the process id (.pid) of a periodic per-VM ("vmname" varies per VM) process to take a screenshot every x seconds, and the STDOUT and STDERR files for that process.
The "orch" folder is used by the "orchestrator" of a set of VMs, and the "orch.log" file records orchestration ("/root/scripts/orch") activity. Subfolders of the "orch" folder are named after the courses that the orchestrator manages. Each course subfolder contains information about the students (mappings of UW Net Ids to real names) and ".yaml" versions of each user/team's ".vcl" file (created via "/root/scripts/dump_vcl_file") so the orchestrator can impersonate the user/team to manage its VMs.
Information for sysprepping a Windows image can be found here.
However, in recent years sysprepping inside a VM has been abandoned in favor of sysprepping the VM once and optionally starting the VM manually and making any necessary final adjustments from the virtual display. Trial versions have also been used instead of any sysprepping.
Automating sysprep (i.e., getting the VM out of sysprep via the out-of-box experience) takes far too long to get right, but doing it manually means doing the same thing n times for n sysprepped Windows VMs. Some of that is alleviated by using VBoxManage controlvm keyboardputscancode or /root/scripts/send_sc to send keystrokes to each VM within a host/cn (or serially across hosts via ssh), and by scripts placed in C:\temp prior to sysprep.
Here is an example of how to send keystrokes in parallel, which is much faster than serially:
rshall '/root/scripts/send_sc -u ituser -v win10 -k enter `hostname`'
Paths to various shared folders are listed to allow permanent or temporary attachment to VMs.
Information about the VM name, user accounts, passwords, IP addresses, and ports are placed in the user's home directory in a file called .vcl.
Presume that a preconfigured VM for Windows 10 (see win10_tinfo457_sum2020) already exists, and that you now want to use that VM for a class on a compute node. Currently, you'll have to manually copy the virtual disk (e.g., win10_ent_installed_20200603.vdi) that contains the OS and any additional virtual disks into its base VM directory (for now, /classroom/vms/tinfo457). Then you'll be ready to create one VM for one user (see create_batch for times when you need to create more).
Here is what one would enter as root for class "tinfo457", user "css_test" using "win10_tinfo457_sum2020", to create the VM instance "win10" owned by "css_test":
cd ~/scripts
./create_vb_vm_for_user -v tinfo457 css_test win10_tinfo457_sum2020 win10
A class requires the same set of VMs for each user or team account. After defining a template for each VM and creating the base virtual disk that the template uses, one can list the templates to use for each VM, including a placeholder for the user name and a specification for what the VM's name will be. For example, create_class_batch.dat might contain:
#tcss431	%u	win7_tcss431_sum2014	win7
#tcss431	%u	kali_tcss431_sum2014	kali
tcss431	%u	metasploitable_tcss431_win2017	metasp
tcss431	%u	kali2016_64_tcss431_win2017	kali2016
tcss431	%u	winxp_tcss431_win2017	winxp
Lines starting with "#" are comments and are ignored. They serve as a historical log, as a reminder of the file format, or as a way to keep other classes defined so that a commented-out class can be restored later.
The tab-delimited values are the class name, a placeholder for the user name ("%u"), the template's "base name", and the name for the VM as it will appear to the user. Note that each VM name must be unique for the user, which is especially important when multiple VMs (e.g., multiple win10 VMs) are required. Generally, letters from a..z are appended to the name (e.g., win10a, win10b), and each VM needed has its own line.
create_class_batch.dat is never used on its own. It is coupled with create_batch_users.dat, which lists each user on a separate line of the file. These user ids (UW Net Ids) should be members of the class currently uncommented in create_class_batch.dat.
Once those two .dat files are defined, one can run the create_batch script to create VMs for all users, as follows:
cd /root/scripts
./create_batch -nosp -noap
The arguments passed tell create_vb_vm_for_user to ignore sysprep and after-sysprep commands, in case Windows VMs are being defined (sometimes there is a mixture of Windows and Linux VMs in the class batch file).
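Conceptually, the batch run is a cross product of the users file and the uncommented class lines. The Python sketch below shows that expansion under stated assumptions: it is not the create_batch script, the function name is hypothetical, the argument order follows the single-VM example on this page (class, user, base name, VM name), and the way -nosp and -noap are forwarded is not modeled.

```python
def expand_batch(class_lines, user_lines):
    """Return one create_vb_vm_for_user argument list per (user, VM) pair."""
    templates = []
    for line in class_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # commented-out classes are skipped
        course, user_field, base_name, vm_name = line.split()
        templates.append((course, user_field, base_name, vm_name))

    jobs = []
    for user in user_lines:
        user = user.strip()
        if not user or user.startswith("#"):
            continue
        for course, user_field, base_name, vm_name in templates:
            jobs.append(["./create_vb_vm_for_user", course,
                         user_field.replace("%u", user), base_name, vm_name])
    return jobs


if __name__ == "__main__":
    class_batch = [
        "#tcss431\t%u\twin7_tcss431_sum2014\twin7",
        "tcss431\t%u\tmetasploitable_tcss431_win2017\tmetasp",
        "tcss431\t%u\tkali2016_64_tcss431_win2017\tkali2016",
    ]
    users = ["user123", "css_test"]
    for job in expand_batch(class_batch, users):
        print(" ".join(job))
```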
This is an attempt at describing the entire workflow for getting VMs ready for a course.
You need to get a feel for the resources needed for the VMs, and whether the request is even feasible given the current or anticipated usage of the VCLs. The instructor may not know, or may rely on you to use reasonable defaults. Some important aspects:
Cognito Forms provides a web service to create forms with conditional logic in them, and it is free for a low volume of form submissions.
Such a form was created for requesting VMs. Its URL is https://www.cognitoforms.com/UniversityOfWashingtonTacoma1/RequestVMs. The form is linked from cssgate:/var/html/www/secure/reqvmv2.php, which passes in the UW Net ID to prefill the email address, and https://cssgate.insttech.washington.edu/secure/reqvmv2.php was shortened by bit.ly to http://bit.ly/setreqvms .
Unfortunately, the VM request form story does not end there. The email that Cognito Forms sends is in HTML format, almost a compressed one-liner with a lot of extra formatting inside. It looks okay in a web browser, but email clients don't handle it well. Pegasus Mail strips out all of the form information, so forwarding the form entry details email to Request Tracker ends up as an empty ticket. Outlook extracts the text from the form entry HTML when forwarding, but the resulting text is not well-formatted, making it difficult to read the relevant information.
Consequently, to make the VM request information to be useful, the extract_text_from_html.pl Perl script was written and is located in \\home.tacoma.uw.edu\srondeau$\scripts\perl. It provides much better extraction and control of output than free HTML to text converters that can run from the command line, and has been customized for handling the Cognitos RequestVMs form entry details email.
For example, if from an email client such as Pegasus email, we save the body of the message (the HTML) to C:\reqvm as rvm.html and have Perl installed, we can run this command:
perl \\home.tacoma.uw.edu\srondeau$\scripts\perl\extract_text_from_html.pl C:\reqvm\rvm.html
which will result in a .out file with a name generated from the quarter, year, and course number from the form entry. The content of the file is all of the relevant VM request information, nicely formatted and readable.
OWA allows the downloading of an email message, which has a .eml file type. The extract_text_from_html.pl script mentioned above was modified in October 2022 to be able to handle files of that type as well as .html files. However, where downloaded files are saved is important, and we don't want the file to be put in the default download location. Instead, we want to specify where to save the downloaded file.
For Chrome, you should change the settings to have Chrome prompt you for where to save any downloaded file. Similar settings are possible for Mozilla Firefox, but the instructions for Chrome are listed here:
There are still steps to take to download an email message and use the converted file. These steps presume you already have WatchDirectory set up as described below to detect the downloaded file and convert it:
By saving the file, WatchDirectory detects the new or changed file and extracts the information in it to create a file ending in .out. Now you have to put that file's contents on the clipboard because OWA doesn't have a way to import a file's contents into the email body. Once the file's contents are placed on the clipboard (e.g., via toclip.exe), you can compose a new email as described below and paste the contents into the email body, sending it as a new request to Request Tracker.
Needing to manually run a script on a particular file complicates the process and forces the user to move from handling email to running an external script then going back to handling email again to use the better-formatted output.
The process can be and was made easier by automatically detecting when a file is added to a directory. One way to do that is by installing WatchDirectory Pro, and then creating a task to run a Windows batch file when something happens in a predefined directory (e.g., C:\reqvm).
The event to watch for is any file (except those with a .out extension) being created, modified or renamed in that directory. The task should be configured to run as a service. After that, one would need to change the service login to a user such as uwtacoma\srondeau to access the network share (\\home.tacoma.uw.edu\srondeau$), because the default LocalSystem service logon account cannot.
Sometimes it is far easier to put complex commands in a batch or .cmd file instead of running them directly from WatchDirectory. That is what was done, and the file named C:\watchdirectory\watch_reqvm.cmd was created, containing:
@echo off
C:\perl\bin\perl \\home.tacoma.uw.edu\srondeau$\scripts\perl\extract_text_from_html.pl "%WD_FILE%"
That WatchDirectory task needs to be set up once, and tested to make sure it works. Once it does, the workflow to process Request VMs form entry information and submit it as a ticket becomes the following, for each set of VMs requested per course:
The VM definition goes in /root/create_vb_vm_for_user.dat. Often, this file is replicated across all compute nodes in a given VCL, which is useful if the demands of the course require that multiple hosts be used.
Use the number of students, the number of VMs per student, and the cores needed per VM to come up with a rough estimate of the load on a host. If it exceeds the maximum physical cores (NOT hyperthread) available on the host (minus two for the host OS to use), then you should split the list of students across more than one host.
You can do that manually, or use /root/scripts/split_batch.
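The rough estimate described above can be sketched as a small shell function. This is only an illustration: the function name hosts_needed and its argument order are hypothetical (it is not one of the scripts in /root/scripts), and it assumes every VM is given the same number of cores and that two physical cores per host are reserved for the host OS.

```shell
#!/bin/bash
# hosts_needed STUDENTS VMS_PER_STUDENT CORES_PER_VM PHYSICAL_CORES_PER_HOST
# Prints the estimated number of hosts needed for a course.
# Hypothetical helper for illustration only.
hosts_needed() {
    local students="$1" vms="$2" cores="$3" physical="$4"
    local demand usable
    demand=$(( students * vms * cores ))   # total virtual cores requested
    usable=$(( physical - 2 ))             # keep two physical cores for the host OS
    if [ "$usable" -le 0 ]; then
        echo "error: host has too few physical cores" >&2
        return 1
    fi
    # Ceiling division: a partially used host still counts as a whole host.
    echo $(( (demand + usable - 1) / usable ))
}

# Example: 30 students, 2 VMs each, 2 cores per VM, 64 physical cores per host.
hosts_needed 30 2 2 64    # prints 2
```

If the result is greater than 1, the class list needs to be split across that many hosts.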
A good rule of thumb is that if more than three powerful (64-core) hosts are needed, then the instructor should be told to re-think the VM needs or the way they have students use them. The reason is that there are often multiple courses that require the VCL per quarter, and overconsuming the hosts will limit the number of courses we can satisfy. Possibilities for reducing the demand are: use of multi-student teams instead of individual students (multiple VMs assigned to each team vs. to each student), lowering number of VMs needed per student or team, lowering higher-core requirements, and/or lowering RAM requirements.
Using a host that has SSDs will make the entire OS installation and updating process much more pleasant, as it can be very I/O-intensive.
It is the virtual disk that is ultimately the most important thing to create, since it holds the operating system and its configuration. As a file, it is portable to other hosts, including those with different host operating systems or CPU vendors (within the x86/x86-64 family).
Often, you will need to download and copy to a host an .iso image of the operating system(s) desired, and put it in a folder on the host named /iso. Trial versions of Windows are often used to avoid licensing issues; Windows 10 lasts 90 days, while server versions often last 180 days. However, other avenues for Windows such as Azure Dev Tools for Education may provide .iso files as well, with other licensing terms.
Generally, we avoid using UW-negotiated Windows licenses because we don't want to deplete the pool needed for lab workstations, and because the terms usually require that one version of Windows be installed prior to upgrading to the desired Windows version. That's not feasible to do for multiple VMs on the same host computer, which may not have ever had Windows installed on it natively.
In 2020, we use 64-bit versions of the operating systems unless only a 32-bit version is available. A 64-bit OS will require somewhat more RAM to run, but it is the most capable and recent instruction architecture available.
It's best to define it as requested, to determine if there are unexpected resource demands and so that the future guest OS will properly detect the hardware and use the right number and type of drivers.
Choose a good name for the VM, utilizing what OS will be installed, version information, and when it was installed or updated, so you know when a trial license may expire or if the resulting OS image can be reused later and updated. This name should be the name of the virtual disk defined in /root/create_vb_vm_for_user.dat for this VM.
Select the OS and architecture size (32 or 64-bit) corresponding to the guest OS to install, as that will be the best performing one.
In 2020, we tend to define virtual hard disks as VDI files which dynamically expand to the maximum amount we think is useful, which is 150GB. Most operating systems and their updates will use far less, but since the dynamic choice doesn't consume any more disk than it needs (mostly), it uses the least disk resources.
This is the point at which you should attach any .iso file to the disk controller, to be used to install the guest OS.
Make sure the network is defined as desired as well, with the correct NIC networking methods per NIC, in the order desired. It is NOT necessary to define any NAT port mapping.
Install to the virtual hard disk defined above, partitioning the disk as desired. The default settings are often used for Windows, while Logical Volume Management (LVM) is often used for Linux.
Define an administrator/root account and use the password defined in /root/create_vb_vm_for_user.dat for the VM. Make sure you update the /root/scripts/mod_vcl_file.patterns file with the named VM's correct guest user and password — the one you want the students to see or use — in their .vcl file.
Install any applications possible to install at OS installation time.
It is good practice to always do so unless the needs of the course demand otherwise, as specified by the instructor. You may want to ask the instructor about that explicitly if the VMs will be used for vulnerability testing, as unpatched operating systems are desirable.
This provides a better user experience, synchronization with clocks and many useful features.
A guest remote desktop service is only needed if the VirtualBox Extension Pack can't be installed and hence VRDP cannot be used. In that case, the guest remote desktop service would provide a familiar desktop experience for people who are accustomed to using GUI desktops. Windows, Ubuntu, and Kali are all desktop-driven, as are other Linux systems that install desktop environments.
The guest service may need to be installed, configured and set to start when the OS starts. In addition, the guest operating system's firewall must be opened to allow for the port that the remote desktop service uses, which is usually 3389.
You must also ensure that the network method attached to one of the virtual NICs is either NAT or Bridged. If NAT, then a host port needs to be reserved when the VM is created via the VM definition information, mapping to the guest port (3389 in this case) and ms-wbt-server service. If bridged, the port 63890 should be used to avoid being blocked at the UW network perimeter, and at run time the guest's IP address (if dynamic) will need to be determined.
Alternatively, or perhaps in addition, install, enable, configure and start the ssh daemon or service. Make sure that at least one user can login via ssh from outside, since root logins are by default not permitted using passwords, and the root account could be the only login account on a Linux system. The same restrictions that apply to the remote desktop service apply to ssh ports and to the use of NAT or bridged networks by the NICs.
These would be the ones that help manage the VCL. For 2020, that may mean software that can detect user input activity and record the last time there was some. That information can be used to determine whether or not to save the state of the VM to decrease the utilization of host resources, allowing more VMs to be assigned to the host.
From ticket 6938 -- Keyboard and Mouse Input Activity Detection:
A custom AutoIT script called getidleinfo.au3 was created to see if keyboard/mouse I/O has occurred. If there is a change from one sampling of an I/O counter to the next, there is input activity.
Getting this to work properly and consistently has proven to be a challenge. The current (February 2021) method involves installing the AutoIT Script package on each Windows machine, installing the nssm service manager, and configuring a service called GetIdleInfo to run AutoIt3_x64.exe with getidleinfo.au3 as a parameter. The service seems to stay running longest when this is done; otherwise, it stops running within an hour.
To make the installation easier and more consistent, each host has the necessary files in /fake/ro/tools, which is mapped to the Z:\ drive. One copies those files to the C:\tools directory. Then the script C:\tools\gii_install.cmd is run to set up everything.
The current (February 2021) method of exporting the information is via the VirtualBox Guest Additions VBoxControl.exe program, which allows the guest OS to send data to the host OS. In this case, the guest property /setlabs/gii/datetime is set (via guestproperty set) to the last time input activity was detected. A corresponding host command via VBoxManage reads that guest property for a Windows VM when free_vm_resources is run via a cron job.
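As a rough sketch of the host side of that exchange: VBoxManage guestproperty get prints either a line beginning with "Value: " or "No value set!". The helper below extracts the value from that output. The function name guestproperty_value, the VM name in the comments, and the timestamp format in the example are assumptions for illustration; how free_vm_resources actually reads and interprets the property is not shown here.

```shell
#!/bin/bash
# Guest side (Windows), for reference only; the timestamp format is a placeholder:
#   VBoxControl.exe guestproperty set /setlabs/gii/datetime "<timestamp>"
# Host side, for reference only ("example_vm" is a placeholder VM name):
#   VBoxManage guestproperty get example_vm /setlabs/gii/datetime | guestproperty_value

# guestproperty_value
# Reads "VBoxManage guestproperty get" output on standard input.
# Prints the property value, or returns non-zero if no value is set.
guestproperty_value() {
    # Keep only the text after "Value: "; "grep ." fails when nothing was printed.
    sed -n 's/^Value: //p' | grep .
}

# Simulated output, so the sketch runs without VirtualBox installed.
printf 'Value: 2021-02-03 10:15:00\n' | guestproperty_value
```

A non-zero return status can be treated as "no activity recorded yet" by the caller.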
The script command records both terminal input and output.
We don't care about what the actual user input is (which is why such input is written to /dev/null) but we do care about the timing information. Ignoring actual input avoids recording sensitive information such as passwords in clear text. We use the timing information to calculate the time of the last keyboard activity by a logged-on user, and may have to do something special to allow non-root user background processes to be detected so we don't kill active work no longer attached to a terminal.
The xinput command can be used to detect mouse and keyboard activity.
The getidleinfo script handles everything; see a detailed explanation of what is happening and how to install it.
In the examples above, the activity information is either recorded in a guest property of the VM or on the virtual disk. Putting it on the virtual disk makes it cumbersome to see from the host. Using shared folders and making sure that Guest Additions are installed, the activity information could be made to write to the user's home directory on the host. However, shared folders have some drawbacks that make them somewhat unreliable.
These would be the ones required by the course instructor.
Sometimes there are multiple admin or unprivileged accounts that should be created; often, these are given the same password as the default admin account for ease of use. If we are trying to impress upon the students the principle of least privilege, we may provide a login account which can be elevated on demand to executing admin commands (e.g., via sudo) but not be able to execute them by default.
The class list of UW Net Ids (one per student) is placed in a file. For teams, put the team account names in a file. There is one UW Net Id or team account name per line. The class list file is usually named "qqqyyyy.course", where "qqq" is a quarter name abbreviation (e.g., "win", "spr", "sum", "aut"), "yyyy" is the year, and "course" is the course number and possibly section (all one word and in lower case). It can be copied into /root/create_batch_users.dat for later use by /root/scripts/create_batch.
For example, let's say file "/root/aut2020.tinfo431" has these lines:
janedoe
johnsm
mhubbard
jackspr
However, when you have to split that class list across multiple hosts, you can edit the original class list file and place the host name followed by a colon as a line before the list of students that will be placed on that host. For example, using hosts "cn2" and "cn3" for the above file, it now looks like:
cn2:
janedoe
johnsm
cn3:
mhubbard
jackspr
Then you run /root/scripts/split_batch to create create_batch_users.dat files representing the desired students on each host. For example:
/root/scripts/split_batch /root/aut2020.tinfo431
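The internals of split_batch are not shown here, but the host-header format above is simple to parse. The following is only a sketch of that parsing, assuming that a line ending in a colon names the host for the accounts that follow it. The function name students_for_host is hypothetical, and the real script may work differently.

```shell
#!/bin/bash
# students_for_host HOST CLASS_LIST_FILE
# Prints the account names that appear under the "HOST:" line.
# Hypothetical helper for illustration; not the real split_batch.
students_for_host() {
    local host="$1" file="$2" current="" line
    # "read" without IFS= trims leading and trailing whitespace from each line.
    while read -r line || [ -n "$line" ]; do
        case "$line" in
            '')  continue ;;                       # skip blank lines
            *:)  current="${line%:}"; continue ;;  # a "hostname:" header line
        esac
        if [ "$current" = "$host" ]; then
            printf '%s\n' "$line"
        fi
    done < "$file"
}

# Example using the class list shown above, written to a temporary file.
tmp=$(mktemp)
printf '%s\n' 'cn2:' janedoe johnsm 'cn3:' mhubbard jackspr > "$tmp"
students_for_host cn3 "$tmp"    # prints mhubbard and jackspr, one per line
rm -f "$tmp"
```

Redirecting the output for each host into a file gives a per-host list in the same one-account-per-line format that create_batch_users.dat uses.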
This is the /root/create_class_batch.dat described elsewhere. The course number, VM definition's base name and other relevant information for each VM used in the course must be present. If more than one host is needed for the course, that file must be replicated to each host used.
Use /root/scripts/create_batch as described above.
It is easier to run the /root/scripts/apply_mod_vcl script for a course than to do the above manually, as it can change everyone's .vcl file for a given course number. For example:
cd /root/scripts
./apply_mod_vcl tinfo431
Here's an example, which needs to run on all of the cns involved (e.g., from cn2 first):
cd /root/scripts
./harvest_vb_vm_info tinfo431
ssh cn3 /root/scripts/harvest_vb_vm_info tinfo431
Here's an example, presuming /root/scripts/cns can generate all of the cns involved so gather_vb_vm can collect all of the information harvested previously:
cd /root/scripts
./gather_vb_vm -d f tinfo431
Note that special handling is required for students enrolled in more than one course that uses the VCL.
This could be starting all of the VMs (such as a sysprepped Windows 10 VM) and bringing it to the point where the login screen is shown. This can be fairly involved, as all of the course's VMs need to be touched, and the same commands issued to each VM. See /root/scripts/send_sc.
Often, if you change something in the VM, you will want to delete the User_Changes snapshot and re-take it. For example, for the entire course "tcss555" on the current host:
/root/scripts/snapshot_vb_vm "delete User_Changes" tcss555
/root/scripts/snapshot_vb_vm "take User_Changes" tcss555
That makes it possible for the users to easily revert to "User_Changes" with all of your modifications intact, if they really mess up the guest OS configuration or filesystem. If you don't do the above, all of the modifications you made, including things like getting Windows out of sysprep, will be deleted after they revert, which is usually not desirable.
Often it is useful to test one set per host used. Check for connectivity amongst VMs, to the internet (and for domain names), remote desktop and/or ssh usage from a remote client, etc.
If something is wrong, you might need to delete all of the VMs on all of the hosts, redefine the base VMs, and redo most of the other steps above, to get it all working correctly.
The primary ones are:
VCL usage information is recorded per quarter via modifying this web page. This information is used for tracking which hosts have been used for what courses or research, as well as justifying the purchase of replacement host computers when one or more start failing and are out of warranty (if any).
One can start providing networking to a VM by defining a virtual NIC that is supported by the guest operating system, assigning it a networking method provided by VirtualBox, and configuring that networking method.
If all that is needed is a connection to the internet, then the NAT networking method is used. NAT provides a private IP address and "translates" it to the host's public one to send information to and receive information from the internet.
Most network traffic initiated via the guest OS will be captured by the virtual NAT device, its source VM information will be remembered and then the public IP address will be substituted in the packet and sent out to the internet. When a response returns, the IP address of the originating VM will be looked up and the response packet will be sent back to it.
More complicated NAT networking involves using the VM as a server providing some service. That service could be ssh, remote desktop, or web. The only way some entity on the internet can access a VM is if a port mapping exists between the host and the guest OS. The mapping of host port to guest port is part of the VM definition, and is communicated to the user.
For example, if one knows that on host IP 140.142.71.32, the host port 53466 maps to port 22 inside the guest VM, all one has to do is use the host IP and port to connect to the guest OS:
ssh -p 53466 itadmin@140.142.71.32
There are more complicated configurations and uses of NAT, but those are the most common.
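For reference, VirtualBox expresses such a host-to-guest mapping as a NAT port-forwarding rule on a NIC, set with the --natpf option of VBoxManage modifyvm. The sketch below only builds and prints such a rule; the function name natpf_rule and the VM name example_vm are placeholders, and the VCL scripts may define the mapping differently in the VM definition.

```shell
#!/bin/bash
# natpf_rule NAME HOST_PORT GUEST_PORT
# Prints a VirtualBox NAT port-forwarding rule in the form
# "name,protocol,host-ip,host-port,guest-ip,guest-port".
# The two empty IP fields mean "any host address" and "the guest's address".
natpf_rule() {
    printf '%s,tcp,,%s,,%s\n' "$1" "$2" "$3"
}

# Example: forward host port 53466 to the guest's ssh port 22.
# "example_vm" is a placeholder; the command is only printed, not run.
echo VBoxManage modifyvm example_vm --natpf1 "\"$(natpf_rule ssh 53466 22)\""
```

Running the sketch prints the VBoxManage command that would add the rule to the VM's first NIC while the VM is powered off.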
Networking VMs is often requested by an instructor for a course.
We re-iterate here the common case, where one student's set of VMs is networked so that the VMs can communicate with each other, but they can also access the internet as needed. In this case, we might define the VM with one NIC that does internal networking (which is isolated to the user/team, and uses its own DHCP pool), and another NIC that uses NAT as described above.
This setup provides network isolation for the internal network, which is good for attack or vulnerability testing scenarios where the student is using one VM to attack/test another VM. The network traffic will never exit the internal network except when a public IP address or domain name is used.
One can keep one NIC for NAT as in the common case, but change the other NIC to use "host-only networking". This involves creating the host-only network:
VBoxManage hostonlyif create
(done once per host) as well as defining the NIC to use that host-only network name, usually in the VM definition. For example:
--nic1 hostonly --hostonlyadapter1 vboxnet0, --nic2 nat
However, that only works when all of the students/teams in the course are on the same host. Sometimes the instructor likes to keep it simple and can accept that some students/teams will use one host and the rest will use another, as long as she knows which students are on which hosts. Then separate host-only networks are used.
To network VMs across hosts so that all VMs are on the same network, one must either use the Virtual Distributed Ethernet (VDE) networking method, or the User Datagram Protocol or UDP tunnel networking method. VDE is more familiar, as we used it back in the 2014 time frame but very little since. We have never used UDP tunneling.
While this is technically possible, it might not work because you would need to either have enough reserved public static IP addresses or dynamic IP addresses in a real shared DHCP pool, or be able to rely on the private subnet of static IP addresses you use.
Because SET "owns" a couple of public subnets (as of 2020) in CP (140.142.71.0/24 and 172.22.71.0/24), it might be possible on select hosts to use a subnet by pre-requesting enough static IP addresses from UW Net Ops, or by using the UW IT DHCP server servicing that subnet. Public IP addresses are probably not desired, so the 172.22.71.0/24 subnet would likely be used.
The VCL cns and sns all use public IP addresses from the 140.142.71.0/24 subnet, with private subnets in the 10.64.0.0/16 subnet for internal communication. There are some current and unused IP addresses in the 172.22.71.0/24 subnet, statically allocated for SET Lab staff use and some for VMs. There also may be a DHCP pool in the higher (than 172.22.71.100?) IP addresses there, or it could be enabled to work that way.
When a VM instance is created, it is left in a powered off state. At minimum, we need to be able to start a VM (power it on), but there are other actions we might like to take as well. The script in /root/scripts that one uses is control_vb_vm. It operates for all VMs on the current host. Starting a VM requires the use of the VBoxManage startvm subcommand. The other control commands supported by the script are VBoxManage controlvm control commands and VBoxManage getextradata commands, both of which require a running VM.
The script can operate on an entire course, all the VMs for a user within the course, a particular VM that a user owns, or all course VMs named the same, regardless of the user.
Here is the current complete list of actions:
If powered off, power it on. Once running, assign it to the same CPU core as was previously assigned.
If powered on, attempt to perform a graceful shutdown by simulating pressing an ACPI power button. The VM may not power off, in which case you must issue the "poweroff" action.
If powered on, essentially pull the plug on the VM. The OS won't be informed of the "loss of power" event, and the VM will stop executing and be left in the powered off state.
Show the VM information (uses VBoxManage showvminfo). One may have to use the "-v" option and redirect the script's standard error to standard output to be able to use a command like "less". For example:
cd ~/scripts
./control_vb_vm -v show tinfo431 css_test w7c 2>&1 | less
Displays the "extra data" associated with the VM, which is predominantly NAT's ingress host port to guest port mapping information.
If a VM runs out of real disk space, it will be put in a "paused" state. One could also do this manually. This action will resume a VM that is in a "paused" state.
Since the other actions always need to check to see if the VM to take action upon exists, and to determine its state, this is a quick way of doing nothing other than verify the state of VM.
Set this VM to use the core number specified. This was designed when only one CPU was common. We are not sure how VirtualBox assigns cores, but hope that it assigns them sequentially, and ignores assigning hyperthreads. It is an attempt to enhance performance by reducing context-switches.
Show the core number that this VM is using.
Save the state of the VM. Very useful if we need to take down the host and don't want to interfere with the state of each VM that the user is controlling. However, if the host is being updated and a new release of VirtualBox is installed, the saved state may not restore correctly. Also, beware of the extra snapshot storage needed to save the state.
cd ~/scripts
./control_vb_vm -v start tinfo431 _431team3

cd ~/scripts
./control_vb_vm -v start tinfo431 . win10

cd ~/scripts
./control_vb_vm -v poweroff tinfo431

cd ~/scripts
./control_vb_vm -v savestate mlresearch uwcheng ubuntu
When a VM is powered off, we may want to modify a VM's configuration. This is difficult to do with just the VBoxManage scripts for multiple users. The modify_vb_vm script makes it easier to do for entire classes, individual users within a class, or a particular VM for a user. It operates on all VMs on the current host.
There are many aspects of a VM to modify, so the action(s) that modify the VM via the VBoxManage modifyvm command are simply passed as a quoted argument.
In some cases, the quoted argument would ideally contain something user-specific. For example, a private "internal network" might be named after the user (e.g., "user123_intnet"). You can use "%u%" in the action where you want the user id to be substituted.
Turn off the audio for class tinfo431:
cd ~/scripts
./modify_vb_vm -v "--audio none" tinfo431
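The "%u%" substitution described above can be pictured with a short sketch. How modify_vb_vm performs the substitution internally is not shown in this document, so the function below (the name expand_user is hypothetical) only illustrates the effect. It assumes account names contain no "/" or "&" characters, which sed would treat specially.

```shell
#!/bin/bash
# expand_user ACTION USER
# Prints ACTION with every "%u%" replaced by USER.
# Hypothetical helper for illustration only.
expand_user() {
    printf '%s\n' "$1" | sed "s/%u%/$2/g"
}

# Example: a per-user internal network name for the first NIC.
expand_user '--nic1 intnet --intnet1 %u%_intnet' user123
# prints: --nic1 intnet --intnet1 user123_intnet
```

An action such as "--nic1 intnet --intnet1 %u%_intnet" would then give each user's VMs their own internal network name.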
Like everything else, a VM or set of VMs has a lifecycle. Once its usefulness is over (e.g., at the end of a quarter or when a project or research is complete), it can be either very easy or fairly complex to delete a VM or set of VMs.
Note that for the methods below to be effective (i.e., to clean up everything), there should be no processes running for the user(s) to delete. Therefore, prior to deleting the VMs, you should make sure that at least the VMs are powered off. For example, to power off all VMs for the students or teams in course tinfo431:
cd ~/scripts
./control_vb_vm -v poweroff tinfo431
Note that powering off VM(s) is not necessary for the delusers method exemplified below, since delusers includes killing all processes for each user or team account.
While deleting a user/team account via /root/scripts/delete_vb_vm will delete the VM, it won't delete any snapshots placed in snapshot folders located outside of the user/team's home directory.
Instead, use the script /root/scripts/delusers. It can be modified to delete all users in a specific directory (such as a course home directory), but normally defaults to the list of users in /root/create_batch_users.dat.
That file may not reflect all of the users you want to delete on this host, so make sure you check it first. If the file of users is what is desired, the "delusers" script will attempt to kill all processes for each user, then delete the user account and all snapshot folders it uses, plus delete any "/tmp/.vbox-uuu-ipc" (where "uuu" is the user/team account name) folders that VirtualBox is holding onto (and would prevent re-creating VMs for the same user). Consequently, "delusers" is the cleanest and fastest way to delete all information for all users.
cd /root/scripts
vi delusers
./delusers
If there is only one VM or all VMs should be destroyed, the simplest way is to delete the entire user account and home directory; e.g.:
userdel --remove _test
or you could remove just the VirtualBox VMs directory if there was other information in the home directory you wanted to keep; e.g.:
rm -rf /classroom/home/tinfo431/_test/VirtualBox\ VMs
Another way to delete VMs is this, but it also suffers from not cleaning up snapshots located outside of the home directories, and does not clean up VirtualBox /tmp folders:
cd ~/scripts
./delete_vb_vm -v tinfo431
After deleting VMs for a course where the cn host was overcommitted (or even if you aren't sure whether it was), remove any resource-freeing activity for the course. For example, for course tinfo452:
cd ~/scripts
./free_vm_resources -r tinfo452
That will remove the cron job to periodically check for any VMs for tinfo452 that may be running and need to have their resources freed, whether or not those VMs still exist.
Each user or team account has a VCL information file that records unique information about created VMs. That information needs to get back to the users somehow, and since we don't allow them to login to a terminal session on the compute nodes, we must put that info elsewhere.
Initially, each user would get a unique password per user account, but that proved difficult for the students, albeit secure. However, most VMs are inaccessible to people on the internet, including other students, and the need for unique passwords was further mitigated by unique and random port mappings. In more recent times, the admin account password is the same for each student in a course, but not necessarily for each VM, and it can be different from the one created when the base VM was created. Consequently, we change the .vcl file to reflect the desired admin account and password per VM, and that is done by the /root/scripts/mod_vcl_file script.
mod_vcl_file requires a smaller file called /root/scripts/mod_vcl_file.patterns, which contains the pattern to match in the .vcl file and what to replace the pattern with, per named VM. That .patterns file will vary per course, and is usually changed whenever a new VM or set of VMs is created on the host.
One must send the name of a user or team account to the standard input of mod_vcl_file so it knows which user. But this can only be done from the home directory of the course. If done manually, this will change each user's .vcl file in the "/classroom/home/tinfo457" directory:
cd /classroom/home/tinfo457
for u in *; do echo "$u" | /root/scripts/mod_vcl_file; done
That command is rather difficult to remember and type, so /root/scripts/apply_mod_vcl was created. For example:
cd /root/scripts
./apply_mod_vcl tinfo431
Now that all of the .vcl information is correct, we should harvest that information, which basically collects in one place the VCL information files for all users in a course on a compute node. We can ask each compute node to do that, thereby collecting all of the information available, perhaps for an entire class whose VMs are spread out over several compute nodes.
The harvest_vb_vm_info script collects all of the desired VCL information files for the compute node on which it is running. Here is an example which gathers all of the VCL info files for the tinfo431 class:
cd ~/scripts
./harvest_vb_vm_info -v tinfo431
If we want to do that over all compute nodes (verify that /root/scripts/cns is correctly set up first):
cd ~/scripts
./do_all "cd scripts && ./harvest_vb_vm_info -v tinfo431"
Now we want to gather all of the harvested information in one place, so we can put it on a system (viz., cssgate.insttech.washington.edu) that the students can login to. Here is what that would look like:
cd ~/scripts
./gather_vb_vm_info tinfo431
This archives the information in a tar file and transmits it to cssgate (root password is required), where it is unarchived and placed in each user's home directory, creating the home directory beforehand if it doesn't already exist.
Exceptions to normal distribution:
The VMs for students are usually created every quarter, and the previous quarter's information can be overwritten vs. appended to. Here is what that would look like:
cd ~/scripts
./gather_vb_vm_info -d f tinfo431
Note that there is no "-" before the "f", as it indicates an option to pass ("-d") to cssgate, not to gather_vb_vm_info.
Students that are enrolled in more than one course that uses the VCL need to get the VMs used by all courses. There is currently (December 2020) no information to indicate which course uses what VMs, but there is a way to append VCL information for each course.
The first thing is to identify which students are taking both courses (usually, it is only two courses). On a Linux system, we create a sorted list of UW Net Ids for each course, e.g., win2021.tinfo442.sorted and win2021.tinfo457.sorted, and find the lines in common:
comm -12 win2021.tinfo442.sorted win2021.tinfo457.sorted
and you should see the list of student UW Net Ids that are in both courses. Save that list via redirection to a file (e.g., /root/students_to_append) so it can be used on the host(s) for the second course — the first course (e.g., tinfo442) can be handled as indicated above.
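The sorting and comparison steps can be combined in one small helper. This is a minimal sketch and not part of the existing scripts: the function name common_ids is hypothetical, and it assumes each roster file holds one UW Net Id per line. Because comm only works on input sorted in the same collation order, the sketch sorts both rosters itself.

```shell
# Hypothetical helper: print the UW Net Ids that appear in both roster files.
# $1 and $2 are (possibly unsorted) rosters with one UW Net Id per line.
common_ids() {
    tmp1=$(mktemp) || return 1
    tmp2=$(mktemp) || { rm -f "$tmp1"; return 1; }
    # comm needs sorted input; -u also drops duplicate ids within a roster
    LC_ALL=C sort -u "$1" > "$tmp1"
    LC_ALL=C sort -u "$2" > "$tmp2"
    # -12 suppresses lines unique to either file, leaving only common lines
    LC_ALL=C comm -12 "$tmp1" "$tmp2"
    rm -f "$tmp1" "$tmp2"
}

# Possible use (file names are examples only):
#   common_ids win2021.tinfo442 win2021.tinfo457 > /root/students_to_append
```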
To append to existing VCL information from the host for the second course (e.g., tinfo457) as well as overwrite the VCL information for those students only taking one course (e.g., tinfo457):
cd ~/scripts
./gather_vb_vm_info -d f -a -f /root/students_to_append tinfo457
Note that there is no hyphen (-) before the f in -d f as it indicates an option to pass to cssgate, not to gather_vb_vm_info. The -f option indicates the file of users whose .vcl files will be appended.
When there are hundreds of VMs created for use, it can be very difficult to find out what was done (e.g., which compute nodes host which VMs) and what is going on (e.g., current state of the VM). In addition, it would be good to collect that information so that one could change the VM definitions or virtual network at will, perhaps even delegating that authority to the instructor (a process, under development in 2020, called "orchestration").
The first step, however, is to gather the information about current VMs. This could be integrated into create_vb_vm_for_user, but it doesn't handle what existed prior to the integration. As such, it may be best to collect information for all courses or for the "ituser" account.
We can look at /classroom/home for a list of subdirectories, which are usually course names (e.g., "tcss431") or special names like "faculty" or "sbrtest". Underneath the course subdirectories are usually the users' home directories, but other information can be found in them as well. Underneath a non-course directory like "sbrtest" might be only the "VirtualBox VMs" directory, but these special directories might not be that interesting; i.e., we would like to know what is going on for larger numbers of VMs, such as those used for a course.
Hence, monitor_vms was created. It can simply list all VMs on the current host by traversing the information found in /classroom/home or /home/ituser, or it can save that "basic" information (via VBoxManage list vms) to a FirebirdSQL database on cssgate. In addition, it can extract VM-specific information from each VM (via VBoxManage showvminfo vmname --machinereadable) and either print it or save it to another table in the database.
The reason to optionally put the information in a database is to avoid re-running monitor_vms every time, which can be fairly slow. That script does not record any historical information, either. In addition, it is easier to pull data out of a database than out of the output of monitor_vms.
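To illustrate why the machine-readable output is convenient to process, here is a minimal sketch of extracting one field from it. The helper name vminfo_get and the sample text are assumptions for illustration; this is not how monitor_vms itself is written. It relies only on the key="value" line format that VBoxManage showvminfo --machinereadable produces.

```shell
# Hypothetical helper: read "showvminfo --machinereadable" output on stdin
# and print the value of one key with the surrounding quotes removed.
vminfo_get() {
    # Lines look like key="value" (or key=value for unquoted values)
    sed -n "s/^$1=//p" | sed 's/^"\(.*\)"$/\1/' | head -n 1
}

# Sample input for illustration only; real output has many more keys.
sample='name="fedora"
VMState="running"
VMStateChangeTime="2020-12-22T13:18:42.000000000"'

printf '%s\n' "$sample" | vminfo_get VMState    # prints: running

# Against a real VM it could be used as:
#   VBoxManage showvminfo fedora --machinereadable | vminfo_get VMState
```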
A future use of this VCL information is in a web-based VM orchestration system (ticket 6089, VCL Orchestration). In 2017 a student intern started work on this, and another student intern in 2020 moved it further along, but as of October 2022, it isn't yet ready to use.
Here are some common use cases (relative to /root/scripts):
./monitor_vms -?
./monitor_vms
rshall /root/scripts/monitor_vms
./shvcl /root/scripts/monitor_vms
./monitor_vms -d
./monitor_vms -c tcss431
./monitor_vms -u ztanko
./monitor_vms -d -r -i VMStateChangeTime
./monitor_vms -s
Updating means that the "date_run" field will be updated with the latest date-time value, for this VCL, host, course, user and set of VMs. The date_run value is used to determine the latest information in the database, in case VMs, users or classes are deleted since the last time the information was collected.
The VM allocation practice in 2020 and in years preceding it was to pre-define a VM with a certain number of cores and RAM, and provide one or more of these VMs per student or team, depending on the needs of a course. The capacity of the host was taken into account when determining how many VMs could be placed on the host; capacity is defined as available cores and RAM. A host's available cores is the total number of cores minus 2 (to support the host OS), and the available RAM is the amount of host RAM.
If there is one VM per student/team, the maximum number of students/teams that a host can support is:
(host cores minus 2)/(cores needed by that VM)
If there are a set of VMs needed per student/team, the number of students/teams calculation is:
(host cores minus 2)/(total cores needed by the set of VMs)
One takes the number of students/teams in the course and divides it by the number of students/teams that one host can support, rounding up, to determine how many similar hosts to allocate to the course.
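The core-based calculation can be written as a short sketch. The function names and the example numbers are hypothetical. Only cores are considered, as in the formulas above; a RAM check would follow the same pattern. The host count is read here as the course size divided by the per-host capacity, rounded up.

```shell
# max_teams HOST_CORES CORES_PER_TEAM
# Students/teams one host can support: (host cores minus 2) / cores needed,
# rounded down because a partial set of VMs is of no use.
max_teams() {
    echo $(( ($1 - 2) / $2 ))
}

# hosts_needed TEAMS TEAMS_PER_HOST
# Number of similar hosts for a course, rounded up so nobody is left out.
hosts_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Hypothetical example: 16-core hosts, a set of VMs needing 4 cores in total,
# and 40 students in the course.
per_host=$(max_teams 16 4)      # (16 - 2) / 4 = 3 students per host
hosts_needed 40 "$per_host"     # prints 14
```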
These calculations presume that students could not or would not manage their usage by shutting down their VMs when not in use, so we must always allow for all VMs running and consuming host resources at the same time. It has long been presumed that most VMs used by students lie idle most of the time, although as of December 2020 there are no statistics for this presumption. If that is the case, then host resources are being wasted on VMs that are doing essentially nothing. One can guesstimate that any given student interacts with the guest OS via the keyboard or mouse ("input activity") for an average of two hours per day, and maybe only on the day of or the day before class is held or an assignment is due. Furthermore, most VMs are not used for performing long-running computations, providing a service to other VMs, or serving as targets for penetration testing.
Let's say that a student has two VMs assigned to her, but only uses (interacts with) each for an average of two hours per weekday, for a total of 20 hours/wk. A quarter is about 10 weeks long, comprising 10 wks * 7 days/wk * 24 hrs/day, or 1680 hours. The student's use is 20 hrs/wk * 10 wks = 200 hrs, or 200/1680 = 12% of the time. If we could recover the host resources consumed by her two VMs, the host would be available for other use 88% of the time.
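The same estimate can be recomputed for other usage assumptions with a few lines of integer arithmetic. This is only a sketch; the function name percent_used is hypothetical.

```shell
# percent_used HOURS_PER_WEEK WEEKS
# Share of the quarter's wall-clock hours during which the VMs are in use,
# rounded to the nearest whole percent using integer arithmetic.
percent_used() {
    used=$(( $1 * $2 ))
    total=$(( $2 * 7 * 24 ))
    echo $(( (used * 100 + total / 2) / total ))
}

percent_used 20 10      # 200 of 1680 hours, prints 12
```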
Of course, it isn't as simple as that. Students probably don't use the VMs much for any course in the first 2-3 weeks, and tend to use them more in the week or so prior to an assignment deadline. Also, there may be times when the instructor wants all students in his course to use their VMs at the same time, which could be a problem if at that time, the host resources are committed to occasional use by students in other courses. Consequently, being able to recover host resources used by a VM isn't necessarily a panacea, but it can work a high percentage of the time.
The problem with recovering host resources for a VM is knowing when the VM's host resources can be recovered. The hardest part is knowing if the student is actively interacting with the guest OS, by entering keystrokes on a console, terminal or desktop session, and/or moving the mouse, clicking mouse buttons or moving the mouse scroll wheel. This is user "input activity" which gets detected by the guest operating system as a user interacts with a terminal session or desktop.
Of course, there are other uses for computing that do not require interaction ("non-interactional uses"), such as long-running computations, providing services to client computers, and penetration testing. One may not want to free the VM's host resources in these cases despite the lack of interaction. There needs to be a mechanism to "opt out" of freeing up the VM's host resources.
In 2020, after much investigation, googling and trial and error, it was determined that input activity for Windows and Linux distributions could be reliably detected. For Windows, it involves looking at a change in the IO reads of the csrss executable; "IO reads" covers both keystroke and mouse activity. For Linux distributions, keyboard activity from a console or ssh terminal session can be detected by the script command, while desktops that rely on X Windows or Xorg can utilize the xinput command to detect keyboard and mouse activity.
We want to place information about input activity, specifically when it happened, in a file that is accessible to the host, not just inside the file system of the guest operating system inside the VM. Doing so makes it easier to write a script to process all VMs for a course. This can be accomplished if each VM has a shared folder called guestlog mapped to the host user's home directory. A subdirectory called guestlog/vm_name, where "vm_name" is the name of the VM, is created at VM creation time. That shared folder is defined to be automounted in read/write mode by the guest OS. The definition of the shared folder and its mapped directory is created when the VM is created.
Here is more detail on how this was designed and implemented on each major OS:
A scheduled task is started at boot time. Its function is to get a relevant session id of the csrss.exe process that relates to an RDP-Tcp session or a Console session. That session id is used to query process information (viz., IO reads) to set the baseline for the current number of IO reads done, then periodically check the current IO reads to see if it has changed. If it has, there is input activity; if not, periodically rewrite an output file with the last date and time that input activity has been detected.
This functionality is in the getidleinfo.au3 AutoIT Script file.
For more details, see Keyboard and Mouse Input Activity Detection.
Three login cases must be handled for Linux distributions:
Console and ssh sessions are terminal sessions that receive keystrokes. The script command can record those keystrokes and their timings. However, we don't care what the keystrokes are, so we can discard them to avoid recording plaintext passwords. The timing information is useful, as it co-occurs with the keystrokes and is normally intended for replay of the script (together with the keystrokes). If there is any timing info after a user logs in to a console or ssh session, there is terminal keystroke activity.
The only issue is starting this universally after login. That can be handled by scripts placed in /etc/profile.d — one for ssh and one for console, specifically coded (by testing environment variables such as TERM and SSH_CONNECTION) to only allow those types of logins and no other.
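A minimal sketch of the kind of environment test such a profile script might perform is shown below. The function name login_kind is hypothetical, and the rule that a Linux virtual console reports TERM=linux is an assumption of this sketch rather than something taken from the actual record_ssh or record_console scripts.

```shell
# Hypothetical classifier for use from a script in /etc/profile.d.
# Prints "ssh", "console" or "other" for the current login.
login_kind() {
    if [ -n "${SSH_CONNECTION:-}" ]; then
        # sshd sets SSH_CONNECTION for the sessions it starts
        echo ssh
    elif [ "${TERM:-}" = "linux" ]; then
        # Assumption: the Linux virtual console reports TERM=linux
        echo console
    else
        echo other
    fi
}

case $(login_kind) in
    ssh|console) : ;;   # here a real script would start recording timings
    *)           : ;;   # desktop terminals and anything else are left alone
esac
```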
If the script command is running in a console session, startx can't detect the console and fails to start. To bypass this, in the /etc/bash.bashrc file, set the environment variable CONSOLE_OPT_OUT to the name of the user to be able to use startx. For example, for user "itadmin":
export CONSOLE_OPT_OUT=itadmin
Another means of bypassing the recording of timing information in a console is to use virtual terminal 6. That way, if something goes wrong with getidleinfo, there is a way to avoid having the script command run at login, because if something fails in /etc/profile.d/record_console_timing.sh, the login session is immediately exited.
The getidleinfo script (in labadmin@cssgate:scripts) will check for requirements, install everything, create logon scripts called /etc/profile.d/record_ssh_timings.sh and /etc/profile.d/record_console_timings.sh, and set up a cron job to capture ssh or console timing information.
It shouldn't always record console activity; it should ignore it when:
Desktop input activity is handled by xinput. xinput needs the device number of the devices that relate to keyboard and mouse activity, and specifically for VirtualBox keyboard and mouse pointers, as well as the PS/2 mouse device. The device name to number mapping can be found by parsing the output of:
xinput --list
Then, by using the device number after the --test argument of xinput (one invocation per device), one can see keyboard and mouse activity.
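The name-to-number mapping can be extracted with standard text tools. The following is a sketch only: the helper name device_ids is hypothetical, the sample listing is made up and simplified, and getidleinfo itself is a Perl script that may do this differently. It relies on the id=N field that xinput --list prints for each device.

```shell
# Hypothetical helper: read "xinput --list" output on stdin and print the
# numeric id of every device whose line matches the given pattern.
device_ids() {
    grep -i -- "$1" | sed -n 's/.*id=\([0-9][0-9]*\).*/\1/p'
}

# Simplified sample listing for illustration only.
sample_list='Virtual core pointer            id=2  [master pointer  (3)]
VirtualBox mouse integration    id=9  [slave  pointer  (2)]
AT Translated Set 2 keyboard    id=8  [slave  keyboard (3)]'

printf '%s\n' "$sample_list" | device_ids virtualbox    # prints: 9

# On a real desktop session one could then watch each device, e.g.:
#   for id in $(xinput --list | device_ids virtualbox); do
#       xinput --test "$id" &
#   done
```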
A Perl script called getidleinfo is placed in a Linux guest OS after installation, in the /usr/local/bin directory. getidleinfo has several options that can be used to install (-i), start (-s), or detect desktop events (-x).
The getidleinfo installation option is run to prepare the VM for Linux input activity detection:
getidleinfo -i
This is all that would be needed for console and ssh terminal keyboard input activity detection, but not for desktop input activity detection.
With the exception of the event collection forked process, the other backgrounded processes are controlled by the timeout command. That is, those processes will die when the timeout period expires. The reason to do that is to allow the script to limit the size of the files so that the events and the idleinfo.out file of event timestamps don't grow indefinitely.
For more details, see Keyboard and Mouse Input Activity Detection.
If all we cared about were input activity detection on a VM, then one could write a script that checks for a lack of input activity over a certain period of time and, if there was none, takes some action that reduces the VM resources used on that host.
What action on a VM would reduce its resource use?
The safest action is to save the state of the VM, also known as hibernating the VM, like a laptop computer does to its OS when the lid is closed, to conserve battery power. A laptop computer basically saves the state of the RAM and the current CPU state to disk, so that it can be restored when the power is restored. The VirtualBox hypervisor provides a command (controlvm vm_name savestate) to save the state of a running VM. This would mean that all the VM user would need to do is start the VM again if something saved its state, and nothing would be lost, for the vast majority of cases. That's what makes it "safe".
The next best thing to saving the state of a VM is gracefully shutting down the guest OS. This allows buffers to be written to disk before the OS shuts down, preserving the integrity of the file system. In some cases, applications can detect the impending shutdown and do their own cleanup prior to shutdown. However, we really want this to happen as soon as possible, so we would likely force all applications to close and then shutdown and poweroff. If we wait until the user responds and handles a proper shutdown or saves her files, then it is not useful for freeing resources without user intervention.
For VirtualBox, one can shutdown the guest OS via the controlvm vm_name acpipowerbutton command, which requires the guest OS to be one that responds to ACPI commands; most modern OSes do. However, it is not an immediate command and doesn't always appear to work (although it could be waiting for a grace period before shutting down), so one can also issue a shutdown command via guestcontrol vm_name run ..., which goes directly to the OS. The problem with using guestcontrol is that it requires a valid privileged user and its password, which the student could have changed from the defaults provided, so it may fail.
Based on experience, it appears that an application failing to close combined with a lack of valid privileged credentials means that the VM resource will never be freed. In the ssh_vbm.log file for that user, there will be multiple indications that the acpipowerbutton action was taken, but the VM remains up. Consequently, the logic was changed to look at the last two consecutive entries in ssh_vbm.log for freeing the VM, for the VM of interest. If they are present and increasing in input idle activity, we attempt controlvm vm_name poweroff.
Put another way, on a shutdown action, we avoid a poweroff for as long as possible, after exhausting all other possibilities, but we do want to reclaim the VM resource. Hopefully, this strategy minimizes loss of data or data corruption.
Finally, one can simply "pull the plug" on the computer, which is the VirtualBox controlvm vm_name poweroff command. This could lead to file system corruption, but it is fairly rare given normal student OS use.
Each of the above "actions" will free up the VM's host cores and host RAM usage. That doesn't mean that additional resources aren't consumed. Saving the state will consume more disk space because RAM needs to be saved as well; that extra space will be released when the VM is started again. Shutting down can consume more disk space as well, but not much more. Powering off doesn't normally change the disk space.
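The three actions map onto VBoxManage controlvm subcommands as sketched below. The helper name free_vm_command and the action keywords it accepts are assumptions for illustration. The helper only prints the command it would run, so it can be tried without touching a VM.

```shell
# Hypothetical helper: print (not run) the VBoxManage command for an action.
# free_vm_command VM_NAME ACTION
free_vm_command() {
    vm=$1
    case $2 in
        savestate) echo "VBoxManage controlvm $vm savestate" ;;        # hibernate
        shutdown)  echo "VBoxManage controlvm $vm acpipowerbutton" ;;  # graceful
        poweroff)  echo "VBoxManage controlvm $vm poweroff" ;;         # pull the plug
        ignore)    : ;;                                                # leave running
        *)         echo "unknown action: $2" >&2; return 1 ;;
    esac
}

free_vm_command fedora savestate    # prints: VBoxManage controlvm fedora savestate
```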
We want to free up the host resources used by a VM. We have a way of detecting the last time input activity was done. We know what to do to the VM to recover the host resources. All that remains is implementing the script itself.
A script called /root/scripts/free_vm_resources was created that does just that, but also allows for students or SET lab staff to opt-out of freeing VM resources for VMs or change the action taken to free the VM resources.
/root/scripts/free_vm_resources accepts arguments to free resources for either all VMs in a course, all VMs in that course for a particular user, all VMs of the same name for the course, or a specific VM for a specific user in that course.
The way this is used is, on a quarterly basis, a course-specific script would be placed in a cron job that executes every 15 minutes (or other time amount). For example, /etc/cron.d/free_vm_resources_tinfo457:
# Check every 15 minutes
*/15 * * * * root /root/scripts/free_vm_resources tinfo457
This can be done easily by this command (which doesn't check or free anything, just creates the crontab):
/root/scripts/free_vm_resources -s tinfo457
and to remove it:
/root/scripts/free_vm_resources -r tinfo457
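For reference, the content of such a cron file can be generated with a couple of printf calls. This is a hedged sketch of the file format only; the function name cron_entry is hypothetical and this is not necessarily how the -s option is implemented.

```shell
# Hypothetical generator for a course-specific /etc/cron.d entry.
# cron_entry COURSE [MINUTES]   (MINUTES defaults to 15)
cron_entry() {
    course=$1
    minutes=${2:-15}
    printf '# Check every %s minutes\n' "$minutes"
    # /etc/cron.d entries carry a user field (root) before the command
    printf '*/%s * * * * root /root/scripts/free_vm_resources %s\n' "$minutes" "$course"
}

cron_entry tinfo457
```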
Except for the -s or -r options, the script will check to see if that course's VM resources can be freed. If so, it could free them, depending on what the opt-out information says to do.
free_vm_resources accepts another option, -c. That means that we only want to check the conditions, to see if they would apply and cause the action to be done; a "dry-run", but one that provides details in the log file (which is /root/free_vm_resources.course.log, where "course" is the course number).
When free_vm_resources does perform an action, besides logging information to its log file, it also adds an entry to $HOME/ssh_vbm.log, with the user @_free_vm_resources and some statistical info. The hope is that some time in the future, that ssh_vbm.log file can be processed for that information and when the VMs were started, to form the basis for mapping when the host is generally available. If that information was presented to the users, they would know when they have a good chance of getting good performance for their VMs vs. during a busy time.
Barring any provided opt-out information, the default condition to check is for VM guest OS's input activity to be idle for more than 60 minutes. If it is, the default action is to save the state of that VM.
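The default check reduces to a comparison of elapsed minutes against the 60-minute limit. The sketch below assumes the last input activity time is available as seconds since the epoch, which is an assumption about the data format; the function names are hypothetical.

```shell
# idle_minutes LAST_ACTIVITY_EPOCH NOW_EPOCH
# Whole minutes elapsed since the last detected input activity.
idle_minutes() {
    echo $(( ($2 - $1) / 60 ))
}

# default_action IDLE_MINUTES
# Prints "savestate" when idle for more than 60 minutes, otherwise "none".
default_action() {
    if [ "$1" -gt 60 ]; then
        echo savestate
    else
        echo none
    fi
}

# Example: last activity 61 minutes (3660 seconds) before "now"
default_action "$(idle_minutes 1000000 1003660)"    # prints: savestate
```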
We know that for non-interactive VMs, we should not always save their state to free up resources, as that could affect their functionality. In addition, if an instructor needs the students to use their VMs during class time (which is often two hours, much longer than the default input idle activity limit), we may want to not free up the VM resources during that time. Or there may be other unanticipated scenarios where freeing up VM resources should not be done.
We don't want the default to be "do nothing" or "ignore", because then we are back to the current state of affairs. The default should be to save the state of the VM based on some input idle activity criteria, and the user or SET Lab staff can override that by opting out.
This could be implemented by a simple file existence test -- if an opt_out file exists in a VM-specific directory ($HOME/guestlog/vm_name), then that means the VM should be allowed to continue running. However, that isn't very flexible -- it doesn't allow for other conditions or actions that would free VM resources.
For the user documentation (vs. this design and implementation document), see Opting Out of Freeing a VM's Host Resources. That document provides the syntax that the user uses to create an opt_out specification file for a VM. The actual specification is a slightly-encoded version of the nicer user syntax, in the form:
condition:action
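Once a specification is in this encoded form, with the operator reduced to > or < and the time expressed in minutes, it is simple to evaluate. The following is a sketch under stated assumptions: the function name spec_action is hypothetical, only the input_idle term is handled, and the conditions null and true are both treated as unconditional. The real free_vm_resources script may support more.

```shell
# spec_action SPEC IDLE_MINUTES
# Prints the action of SPEC when its condition holds for IDLE_MINUTES,
# and prints nothing when the condition does not hold.
spec_action() {
    cond=${1%%:*}       # text before the first colon
    action=${1#*:}      # text after the first colon
    case $cond in
        null|true)
            # Assumption: these conditions always apply
            echo "$action" ;;
        'input_idle>'*)
            limit=${cond#input_idle?}
            [ "$2" -gt "$limit" ] && echo "$action" ;;
        'input_idle<'*)
            limit=${cond#input_idle?}
            [ "$2" -lt "$limit" ] && echo "$action" ;;
        *)
            echo "unsupported condition: $cond" >&2
            return 1 ;;
    esac
    return 0
}

spec_action 'input_idle>120:shutdown' 150    # prints: shutdown
```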
"Slightly-encoded" means that the original user syntax is converted to the above form, where spaces between terms, operators or actions are removed, any "op" value is converted to either ">" or "<", any "term2" value is converted to minutes, and the "then" keyword is replaced by a colon.
SET lab staff can create or remove an opt-out specification, either "globally" (for all VMs of a user/team account) or per-VM. Global specifications are placed in the $HOME/guestlog/opt_out file. The specification format is the same as listed above, but if the global file is present, its specification will override any VM-specific opt_out file that the student/team may have requested.
The /root/scripts/opt_out_fvr script provides or removes either the global or the per-VM specification. For example, if we don't want the student's/team's VMs' resources to be freed, this specification would do that:
null:ignore
and the way to do it is via this script (for example course "tinfo457" and user "srondeau"):
/root/scripts/opt_out_fvr -o -s 'null:ignore' tinfo457 srondeau
"srondeau" could be omitted from the above command to disallow freeing any VM resources for any student/team in the "tinfo457" course.
If we want to remove the global specification (using the above example):
/root/scripts/opt_out_fvr -o -r tinfo457 srondeau
and that will basically undo the global override specification, allowing any user-created or default per-VM specifications to be in effect again.
If we want to overwrite or create a per-VM specification, the -o option is NOT used, so the specification gets applied to each VM in the course, each VM owned by a student or team in the course, a specific VM for that student/team, or, by using the period to represent "any", each VM matching the provided VM name in the course. For example, let's gracefully shutdown all tinfo457 fedora VMs that are idle for more than 2 hours. The "2 hours" must be manually converted to minutes, since the specification is always provided in minutes:
/root/scripts/opt_out_fvr -s 'input_idle>120:shutdown' tinfo457 . fedora
Note that this is a per-VM specification, NOT a global one, so it will wipe out any existing opt-out specification that a student/team may have made for the "fedora" VM. This is the only way to set a different opt-out specification for each VM when there are multiple VMs per student/team. That could be useful in a course that has multiple VMs, one of which is the target of a penetration test; that VM should not be freed. For example, course "tinfo442" needs a "kali" VM and a "centos" VM, where the centos VM is the target (needs an unconditional "ignore" action) and the kali VM is the penetration tester VM (which is subject to the default opt-out condition/action):
/root/scripts/opt_out_fvr -s 'true:ignore' tinfo442 . centos
or, if sending that command from one host to a remote host (e.g., cn8), one needs the proper quoting to be used so the resulting command works on the remote host, as in:
ssh cn8 '/root/scripts/opt_out_fvr -s '\''true:ignore'\'' tinfo442 . centos'
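The '\'' sequence closes the single-quoted string, adds an escaped single quote, and reopens the string. Rather than writing that by hand, the quoting can be produced by a small helper. This is a sketch; the function name sq is hypothetical.

```shell
# Hypothetical helper: wrap one argument in single quotes for a remote
# shell, turning each embedded single quote into '\''
sq() {
    printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# Build the remote command first, then quote it as a whole for ssh.
remote_cmd="/root/scripts/opt_out_fvr -s $(sq 'true:ignore') tinfo442 . centos"
printf '%s\n' "ssh cn8 $(sq "$remote_cmd")"
```

The printed line is the same ssh command shown above, ready to be pasted or passed to eval.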
A student/team cannot remove an opt-out setting they made, but they can revert it back to the default using the following (for the win10 VM example):
opt_out win10 default then default
Of course, given any choice for a default input idle time (e.g., 60 minutes), someone will want to change it. Before we even started making input activity detection and freeing VM resources the standard, one instructor asked for a longer input idle time. Consequently, a per-course default was created; if an opt_out.default file is found in the course directory (e.g., for tinfo457, it's /classroom/home/tinfo457), then its specification is what is used for the default, not one that is hard-coded in the free_vm_resources script.
Note that a default class opt-out specification is NOT the same as an override specification. Students/teams can still override the class default via a VM-specific opt-out specification, and SET Lab staff can override the class default or any VM specification, at a user level, with an opt-out specification in the user's home directory in the guestlog folder.
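The resulting order of precedence (user-level override, then per-VM specification, then course default, then the hard-coded default) can be sketched as a first-match file lookup. The function name effective_spec and the "built-in default" placeholder are hypothetical; the file locations are the ones described in this section.

```shell
# effective_spec USER_HOME VM_NAME COURSE_DIR
# Prints the specification that applies to one VM, highest priority first.
effective_spec() {
    home=$1
    vm=$2
    course_dir=$3
    for f in "$home/guestlog/opt_out" \
             "$home/guestlog/$vm/opt_out" \
             "$course_dir/opt_out.default"; do
        if [ -f "$f" ]; then
            cat "$f"
            return 0
        fi
    done
    # Assumption: placeholder for the default hard-coded in free_vm_resources
    echo "built-in default"
}

# Possible use (paths are examples only):
#   effective_spec /classroom/home/tinfo457/srondeau fedora /classroom/home/tinfo457
```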
To support a class or course default, opt_out_fvr was enhanced to create (via the -d option) or remove (via the -r option) the default specification in the proper location.
Given that we are attempting to free VM resources, if the estimate of 12% of the time the host is utilized is correct, then we could commit nearly four times more VMs per host than we currently do! If students would use any of the available hours of the day to run their VMs, and perhaps could be informed of when their host is likely to have a low load on it, then that degree of overcommitment could work.
Realistically, and without any statistics, putting four times more than the current VM load on the host seems to be far too much; perhaps two times more than current loading, or double the load, would work okay.
Consequently, overcommitting in the presence of actively freeing VM resources might mean that one doubles the number of cores that the host has when determining how many VMs to allocate on it.
One creates a VM such that someone can remotely access it. This is the initial concern of a user of the VCL, and it is documented in the SET Lab web pages, under How to Use the VCL: Quickstart Guide.
The description of how VM access is set up is scattered across the Compute Node Phase I implementation documentation. Here, we attempt to pull together and document all that is necessary to do so, including some topics that may not have had a home elsewhere and were therefore undocumented.
The main concept of the VCL and any cloud provider is that one can access a VM from anywhere on the internet. A VM is defined with its virtual hardware capabilities, and the VM creator could leave it at that and let the user install or run a guest operating system inside the VM, given a Live CD/DVD image or an installation image. Or the VM creator could pre-install the operating system and configure it for the needs of the end user, team, project, course or research, as has been done for over a decade with the VCL.
The essence of a VM is what is captured in the .vcl file. That file is created every time the compute node or "host" script create_vb_vm_for_user is run. It is also done when the create_batch script is run, since it calls create_vb_vm_for_user for each course/class/project and user defined. That .vcl file is created in the host home directory of the user or team account.
Multiple VMs per user are common in many courses. Often, the VMs will be created such that a virtual network can be easily established. Virtual networks are either predefined for course needs or defined by the user once they have access to the VMs. This need for simple networking is facilitated by placing all of the VMs used in a particular course/class/project for a specific user or team on the same host. Consequently, the VM information of each VM belonging to a user/team on a host is appended to that user's or team's .vcl file when the VM is created.
Normally, only one .vcl file is needed per user, but sometimes a user takes more than one course that uses the VCL. The VM creation process does not take this into account — there is no current (February 2021) catalog of courses that require the VCL. Instead, each user's set of VMs for a course are created independently of that user's VM needs for other courses. There is no current (February 2021) information in the .vcl file that indicates which course uses what VMs, and there probably should be in the future. For now, that means that the VM creator must be aware that a student might be involved in more than one course using the VCL, and then append the host-dependent .vcl file information at gather_vb_vm_info time or manually.
Here is an example of a complex VCL information file:
20201222 13:08:59 vcl user testuser password 'ppppwwwww'
20201222 13:08:59 --------- Host and Guest Login Information ----------
20201222 13:08:59 vcl host cn8-vcl12 '140.142.71.32' (host ip)
20201222 13:08:59 --------- Information for ubuntu ----------
20201222 13:08:59 +-------- VRDP and LOGIN Information ----------
20201222 13:08:59 | vcl vm ubuntu (vrdp) '140.142.71.32:50838' -- use testuser password above
20201222 13:08:59 | vcl vm ubuntu (login) itadmin password 'xxxxxxxx'
20201222 13:08:59 +-------- PORT Information ----------
20201222 13:08:59 | vcl ubuntu cn8-vcl12 host port '140.142.71.32:52633' (ssh) for guest NIC 2 port 22/TCP
20201222 13:08:59 +------- End of PORT Information ----------
20201222 13:08:59 --------- End of Information for ubuntu ----------
20201222 13:18:42 --------- Information for fedora ----------
20201222 13:18:42 +-------- VRDP and LOGIN Information ----------
20201222 13:18:42 | vcl vm fedora (vrdp) '140.142.71.32:50159' -- use testuser password above
20201222 13:18:42 | vcl vm fedora (login) root password 'xxxxxxxx'
20201222 13:18:42 +-------- PORT Information ----------
20201222 13:18:42 | vcl fedora cn8-vcl12 host port '140.142.71.32:52973' (ssh) for guest NIC 2 port 22/TCP
20201222 13:18:42 +------- End of PORT Information ----------
20201222 13:18:42 --------- End of Information for fedora ----------
20201222 13:28:58 --------- Information for win10 ----------
20201222 13:28:58 +-------- VRDP and LOGIN Information ----------
20201222 13:28:58 | vcl vm win10 (vrdp) '140.142.71.32:50583' -- use testuser password above
20201222 13:28:58 | vcl vm win10 (login) administrator password 'xxxxxxxx'
20201222 13:28:58 +-------- PORT Information ----------
20201222 13:28:58 | vcl win10 cn8-vcl12 host port '140.142.71.32:54864' (ms-wbt-server) for guest NIC 2 port 3389/TCP
20201222 13:28:58 +------- End of PORT Information ----------
20201222 13:28:58 --------- End of Information for win10 ----------
20201222 16:06:49 vcl user testuser password 'wwwwpppp'
20201222 16:06:49 --------- Host and Guest Login Information ----------
20201222 16:06:49 vcl host cn5-vcl12 '140.142.71.64' (host ip)
20201222 16:06:49 --------- Information for centos8 ----------
20201222 16:06:49 +-------- VRDP and LOGIN Information ----------
20201222 16:06:49 | vcl vm centos8 (vrdp) '140.142.71.64:52902' -- use testuser password above
20201222 16:06:49 | vcl vm centos8 (login) itadmin password 'xxxxxxxx'
20201222 16:06:49 +-------- PORT Information ----------
20201222 16:06:49 | vcl centos8 cn5-vcl12 host port '140.142.71.64:53472' (ssh) for guest NIC 1 port 22/TCP
20201222 16:06:49 +------- End of PORT Information ----------
20201222 16:06:49 --------- End of Information for centos8 ----------
It contains VMs on two hosts: cn8-vcl12, on which testuser has three VMs (ubuntu, fedora and win10), and cn5-vcl12, on which testuser has one VM (centos8).
Before we descend into the quagmire of complex VCL information files, let's step back to what the information is in the .vcl file. The order of VCL information is as follows:
The first line of the VCL information file is the vcl user line, as identified by that string. It is the host user name and password, which are the credentials that allow access to the host, whose IP address is provided later in the file.
The credentials together with the host IP address provide ssh access to the host, and commands issued via ssh provide the means of managing and accessing the VM. This is done for a Windows or macOS user by manage_vc, but must be done manually by a Linux user.
Host-based ssh commands use the information immediately following the Host and Guest Login Information line. The next line contains the host name and the host IP address, but only the host IP address is needed. The host name is not fully-qualified and is for reference only, so instructors and SET Lab staff can easily determine what host is being used vs. looking up the IP address.
Following the host information is a collection of information for each virtual machine. The name of the VM is found after the Information for string, and the end of that VM's information is denoted by the line containing End of Information for that VM name. In other words, all of the information between those two lines pertains to the named VM.
The VM-specific information collection varies per VM definition, but can include:
The .vcl file is placed in the user or team account's home directory on cssgate, since cssgate is the gateway to SET Lab information and everyone has a login account there. When there is more than one set of VMs for a user, there will be more than one vcl user line in the user's cssgate .vcl file. That line terminates the information associated with the previous host and starts information for the next host. In essence, the contents of the second (or n-th) .vcl file are appended to the previous contents, which is what an option of gather_vb_vm_info can do from a VCL host.
While the How to Use the VCL: Quickstart Guide is the user's guide to the VCL, it is written for a different audience than this document, which provides more details and describes the inner workings.
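The layout described above can be read with a small line-oriented parser. The following is a minimal sketch, not the parser manage_vc actually uses. The regular expressions are inferred from the sample file shown in this document, and the exact shape of the vcl user line (user name followed by password '...') is an assumption.

```python
import re

# Line shapes inferred from the sample .vcl file; the real format may differ.
USER_RE = re.compile(r"vcl user (\S+) password '([^']*)'")   # assumed shape
HOST_RE = re.compile(r"vcl host (\S+) '([^']+)' \(host ip\)")
END_RE = re.compile(r"-+ End of Information for (\S+) -+")
BEGIN_RE = re.compile(r"-+ Information for (\S+) -+")
VRDP_RE = re.compile(r"\(vrdp\) '([^']+)'")
PORT_RE = re.compile(
    r"host port '([^']+)' \((\w+)\) for guest NIC (\d+) port (\d+)/(\w+)"
)


def parse_vcl(text):
    """Return a list of host records, one per 'vcl user' line."""
    hosts = []
    host = None  # record for the host currently being read
    vm = None    # record for the VM block currently open, if any
    for line in text.splitlines():
        m = USER_RE.search(line)
        if m:
            # A 'vcl user' line ends the previous host and starts a new one.
            host = {"user": m.group(1), "password": m.group(2),
                    "name": None, "ip": None, "vms": {}}
            hosts.append(host)
            vm = None
            continue
        if host is None:
            continue  # ignore anything before the first 'vcl user' line
        m = HOST_RE.search(line)
        if m:
            host["name"], host["ip"] = m.group(1), m.group(2)
            continue
        if END_RE.search(line):
            vm = None  # close the current VM block
            continue
        m = BEGIN_RE.search(line)
        if m:
            vm = {"vrdp": None, "ports": []}
            host["vms"][m.group(1)] = vm
            continue
        if vm is None:
            continue
        m = VRDP_RE.search(line)
        if m:
            vm["vrdp"] = m.group(1)
            continue
        m = PORT_RE.search(line)
        if m:
            vm["ports"].append({
                "host": m.group(1), "service": m.group(2),
                "nic": int(m.group(3)), "guest_port": int(m.group(4)),
                "protocol": m.group(5),
            })
    return hosts
```

Because every vcl user line opens a new host record, a file holding several appended .vcl files yields one record per host without any special handling.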
The .vcl file is the only source of VM information from the user's perspective, and hence is the only thing needed to inform the user as to how to access his/her VMs. In the first year of the VCL's existence, it was the only way to gain access, and it was painful for predominantly-Windows users to use, in part due to the cryptic but hard-to-guess password of the "vcl user" information, but also for the configuration of the Microsoft remote desktop client (the mstsc.exe command).
If the VCL were to survive, the students would need something easier to use. In addition, if the VCL became popular, it would be very difficult and time-consuming for SET Lab staff to manage hundreds of student VMs across several courses.
For example, if the hosts experienced a power failure outside of normal office hours, any running VMs would also fail, and it would take manual intervention to start them again. To handle power outages, the hosts are set up to restart and thus boot the host operating system after power is restored, but that doesn't start the VMs after the host OS is up. At the time, VirtualBox didn't have the capability of starting a VM, and neither did SET Lab staff via scripts and cron jobs.
We also were aware of courses that may require the students to be in full control of their VMs, much like removable hard drives provided for coursework in previous years. Full control is important for installing an OS or configuring it, installing and configuring apps, and also extends to changing VM definitions (e.g., virtual RAM size and networking methods). If one can control the machine, the operating system, and the network, one can do essentially anything related to computing.
Hence, a decision was made to let the users control or "manage" their own set of VMs, and manage_vc was born.
A user wishing to directly manage their VM sends ssh commands and any arguments to the host using the "vcl user" credentials and the host IP address. If the command requires a VM name, that name is pulled from the VCL information file.
For example, using the sample VCL information file, if a user wants to start the "fedora" VM associated with the "testuser" account:
ssh testuser@140.142.71.31 start fedora
and manually enter the password in the VCL information file, or
plink -pw ppppwwww testuser@140.142.71.31 start fedora
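The two forms above differ only in the client program and in how the password is supplied. As a rough illustration, the command line could be assembled as follows. The function name host_command is hypothetical and is not taken from manage_vc.

```python
def host_command(user, host_ip, command, vm_name=None, password=None):
    """Build the argument list for one host-based command.

    Without a password the OpenSSH client is used, which prompts
    interactively. With a password, PuTTY's plink is used, since it
    accepts the password on the command line via -pw.
    """
    target = f"{user}@{host_ip}"
    remote = [command] + ([vm_name] if vm_name else [])
    if password is None:
        return ["ssh", target] + remote
    return ["plink", "-pw", password, target] + remote
```

The returned list can be passed to subprocess.run, which avoids shell quoting problems when a VM name contains unusual characters.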
Issuing ssh commands to the host is the only means a user has to manage his/her VMs (it underlies what manage_vc does), but "connecting to" or interacting with a VM is the most common activity that a user does. Connecting to/interacting with a VM using VCL information is described later.
All host-based ssh commands are run through the /usr/local/bin/ssh_vbm script, which allows only certain VirtualBox subcommands or custom commands — those written by SET Lab staff — to run. Any commands issued by the user are "sanitized" to prevent users from trying to subvert the system.
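The general idea of an allowlist plus argument sanitizing can be sketched as below. This is only an illustration of the technique and is not the logic of ssh_vbm itself. The allowlist contents and the set of characters treated as safe are assumptions.

```python
import re
import shlex

# Hypothetical allowlist; the real ssh_vbm accepts a different, larger set.
ALLOWED = {"start", "screenshot"}

# Characters assumed to be harmless in a VM name or simple argument.
SAFE_ARG = re.compile(r"[A-Za-z0-9_.:-]+")


def sanitize(command_line, allowed=ALLOWED):
    """Return the command as a list of words, or None if it is rejected."""
    try:
        words = shlex.split(command_line)
    except ValueError:
        return None  # unbalanced quotes
    if not words or words[0] not in allowed:
        return None  # unknown command
    if not all(SAFE_ARG.fullmatch(word) for word in words[1:]):
        return None  # shell metacharacters or other unexpected input
    return words
```

Rejecting anything outside a small safe character set is simpler to reason about than trying to escape dangerous characters one by one.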
Depending on the connection/interaction method chosen, the user will see either a virtual desktop, a command console or a terminal session of the VM. The user can also send keystrokes and mouse movements/clicks to the VM.
Connections can be done from a user's client computer via:
This host service provides access to the VM's virtual hardware display, much like sitting in front of a real computer's monitor and watching the OS boot up, but the display is a window on the client computer. The VM must be started (similar to a computer being powered on), but the guest OS does NOT need to be running.
The VM must be defined to use Virtual Remote Desktop Environment (VRDE), which is part of the VirtualBox Extension Pack. All VirtualBox VMs used in coursework and research are configured to use the VRDE.
If the guest OS is a modern Windows OS, it has built-in support for the Remote Desktop Service. In contrast to VRDE, the guest OS must be running and the Remote Desktop Service must be started. The credentials used are specific to the users on the guest OS, and are NOT the same as the VRDE host credentials. Another important issue is network access to the guest OS, which for truly remote access must be either "NAT", with a port on the host mapping to the port on the guest OS, or "bridged", using the supplied IP address. Due to UW IT Intrusion Detection Settings, the port cannot be the default port unless Husky OnNet VPN is used.
For example:
Either service requires that the guest OS is fully operational: the network is running, the service is started and the service's port is open to external connections. Whether the service can be accessed from an off-campus client depends on how the VM's network is set up or the use of Husky OnNet VPN.
A primitive and slow way to interact with a VirtualBox VM is to take periodic screenshots of its display and interact with it via sending the VM keystrokes and keyboard shortcuts. This is available via manage_vc using the "Do cmd:" dropdown box, or via manual ssh commands.
The screenshot command in ssh_vbm provides a means of taking either a single screenshot or periodic screenshots every n seconds, and provides a web URL which can be used to see the screenshot. To do so, it starts a web service at a user-unique port so that the user can use the VM-specific URL provided to see the VM's display. This is done via the VBoxManage controlvm vmname screenshotpng ... command, together with Perl logic, a Perl package supporting a simple web service, ImageMagick to convert from .png to .jpg, and a simple Perl daemon package for taking periodic screenshots. One can also stop the processes started to display the web page and/or periodic screenshotting.
As for sending keystrokes, the VBoxManage controlvm vmname keyboardputscancode ... command is used, and a syntax for expressing common named keycodes and key sequences was developed. One can send many named keycodes/sequences in one command, or send an entire string.
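To show what keyboardputscancode expects, here is a minimal sketch that turns a short lowercase string into scancode set 1 values: a press ("make") code followed by a release code, which is the make code with the high bit set. It covers only digits, lowercase letters, space and Enter, and it is not the named-keycode syntax developed for ssh_vbm.

```python
def _row(keys, first_code):
    """Map a row of adjacent keys to consecutive make codes."""
    return {ch: first_code + i for i, ch in enumerate(keys)}


# Scancode set 1 make codes for a US keyboard layout (partial table).
MAKE_CODES = {}
MAKE_CODES.update(_row("1234567890", 0x02))
MAKE_CODES.update(_row("qwertyuiop", 0x10))
MAKE_CODES.update(_row("asdfghjkl", 0x1E))
MAKE_CODES.update(_row("zxcvbnm", 0x2C))
MAKE_CODES[" "] = 0x39
MAKE_CODES["\n"] = 0x1C  # Enter


def scancodes_for(text):
    """Return hex strings: press then release for each character."""
    codes = []
    for ch in text:
        make = MAKE_CODES[ch]  # raises KeyError for unsupported characters
        codes.append(f"{make:02x}")
        codes.append(f"{make | 0x80:02x}")  # release = make code + 0x80
    return codes


def keyboard_command(vm_name, text):
    """Build the VBoxManage argument list that types text into a VM."""
    return (["VBoxManage", "controlvm", vm_name, "keyboardputscancode"]
            + scancodes_for(text))
```

Uppercase letters and punctuation would additionally need Shift press and release codes around the key, which this sketch leaves out.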
The user's client program must support the protocol of the service desired.
As of October 2022, the manage_vc facility resides in srondeau's UWTACOMA home directory, in H:\scripts\python\manage_vc, as manage_vc.py. This replaces the older AutoIT Script file called H:\scripts\autoit\manage_vc\anage_vc.au3, which only ran under Windows. Since the VCL was created in 2009, the student population shifted from using primarily Windows personal laptops or home computers to using a mixture of Windows and Mac OS computers, with rare cases of Linux. The long-desired cross-platform version of manage_vc with one source code file was finally implemented in Summer 2022, once it was realized that one could use Python's tkinter package to allow cross-platform windowing.
manage_vc has to get the VCL information file from cssgate.insttech.washington.edu so it can read its contents locally. In other words, it has to remotely copy it from cssgate to the local computer.
For Windows, the pscp command from the PuTTY package is used, since Windows versions prior to Windows 10 in 2019 did not have any natively-supported scp/ssh commands. More recent versions of pscp have required an additional option, -P 22, to specify the destination port.
Unfortunately, pscp does not have an option to ignore host key checking such as ssh's option -o StrictHostKeyChecking=no, so some special code has to answer a normally-interactive question about accepting the host key.
Gaining remote access to cssgate requires login credentials, and one must use the UW-ubiquitous "UW Net IDs" (or "uwnetid") because that is what cssgate requires. Login accounts are created when we first detect that there is a new student enrolled in one of the SET courses; first they are created as an INSTTECH domain user account on a Windows server, and then on cssgate. Both Windows and cssgate use Kerberos credentials (UW Net ID with the user's MyUW/Canvas password) to login.
A home/laptop user most likely will not have defined a local login account using the UW Net ID, so the user will need to specify his/her UW Net ID to manage_vc. manage_vc prompts for the user name, which is normally just the UW Net ID. As team accounts were allowed to be used for shared VMs, support for team accounts was added, allowing a "user name" of this form: uwnetid teamacct to be specified. That indicates to manage_vc to use the uwnetid as the credential source, but instead of the VCL information on the cssgate uwnetid account, use the team account's .vcl file.
To support SET Lab staff, if uwnetid is "root", then cssgate's root account will be used to access any user/team account's home directory. That allows the transfer of any .vcl file to whoever knows the cssgate root account password. For example, root srondeau would get UW Net ID srondeau's .vcl file.
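The three accepted forms of the "user name" (plain UW Net ID, uwnetid teamacct, and root account) reduce to one rule: the first word supplies the login credentials and the last word names the account whose .vcl file is fetched. A minimal sketch, with the hypothetical function name split_user_name:

```python
def split_user_name(entry):
    """Split the manage_vc user name into (login account, .vcl owner).

    "srondeau"          -> log in as srondeau, fetch srondeau's .vcl file
    "uwnetid teamacct"  -> log in as uwnetid, fetch teamacct's .vcl file
    "root srondeau"     -> log in as root, fetch srondeau's .vcl file
    """
    parts = entry.split()
    if len(parts) == 1:
        return parts[0], parts[0]
    if len(parts) == 2:
        return parts[0], parts[1]
    raise ValueError("expected 'uwnetid' or 'uwnetid account'")
```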
Eventually, support was included for an INI-style file (manage_vc.ini) to define some user-specific information. manage_vc reads and uses the information in that file to override some information, such as the paths to pscp/plink executables, the server for the VCL information file, the user name, and the name of the VCL information file to retrieve. This allows users to assert more control over what manage_vc does, and is especially useful for defining a UW Net ID for the "user" line.
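Overriding built-in defaults from an INI-style file can be done with Python's configparser module. The sketch below shows the pattern only. The section name, the key names and the default values other than the cssgate server name are assumptions, not the actual contents of manage_vc.ini.

```python
import configparser

# Hypothetical defaults; key names are illustrative only.
DEFAULTS = {
    "server": "cssgate.insttech.washington.edu",
    "user": "",
    "vcl_file": ".vcl",
    "pscp": "pscp",
    "plink": "plink",
}


def load_settings(ini_text, defaults=DEFAULTS):
    """Return the defaults, overridden by any matching keys in ini_text."""
    # interpolation=None keeps '%' characters in Windows paths literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    settings = dict(defaults)
    if parser.has_section("manage_vc"):  # assumed section name
        for key, value in parser.items("manage_vc"):
            if key in settings:  # ignore keys we do not know about
                settings[key] = value
    return settings
```

Unknown keys are ignored rather than rejected, so an older manage_vc.ini keeps working if settings are added or removed later.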
manage_vc also needs to issue commands to the host, so the Windows PuTTY plink command is used. plink also evolved over the years to require an additional option, -no-antispoof, to remove additional prompts for anti-spoofing mechanisms PuTTY's author deemed necessary.
Unlike scp/ssh, pscp and plink allow passing in the password to the remote login account, which removes the need to generate public keys. For copying the .vcl file, we prompt for the password, after which the file is stored on the local computer, usually in the user's profile.
For scp/ssh, one must use the sshpass command to send passwords to ssh or scp. This is commonly used in the Mac version. The Windows version includes a default plink/pscp, so sshpass is not strictly necessary, although it would be a problem (bug) if the user directed us to use ssh/scp via the manage_vc.ini file.
Once the .vcl file is local, the first task of manage_vc is to parse it from its somewhat human-readable format into something manage_vc can use.
Perhaps the .vcl file should have been converted to YAML, XML, or some other structured format, but it wasn't (maybe YAML wasn't yet written, popular, or easily parseable by AutoIT). There is a host-based Perl script called dump_vcl_file which should do that (it might not be up to date), but it was written long after manage_vc was first released.
Limitations on the button size were circumvented by using tool tips to show the full VM name.
Its purpose is to connect to the VM's display via RDP. Its functionality was changed in 2020 to allow for better handling of guest OS RDP information, as "guest RDP" differs from "virtual RDP" information. In addition, there is often a need to edit guest RDP information, since the user may add guest OS user accounts, change passwords, etc. As much as possible is automated for the user, including parsing NAT port mapping information, if present, and using it to define the RDP client connection.
Virtual RDP uses only information which is present in the VCL information file, and therefore doesn't need to be and should not be editable, thereby lessening the chance that other users can easily connect to a particular user's VM.
This includes the do-anything "Do cmd:" facility, which uses default commands present in the drop-down box to the right of the "Do cmd:" button, or commands that a user types in, to execute on the host. This is the manage_vc equivalent of being able to manually issue ssh commands.
manage_vc manages the VMs via sending ssh/plink commands to the host, using the VCL user information for login credentials and the host IP address. On the host end, ssh_vbm receives those commands and attempts to execute them.
On Windows, to perform the "Connect" button functionality, a Microsoft RDP file is created using some information from the VCL information file, the VCL user password is encrypted, and then that RDP file defines the connection information that Microsoft's mstsc command uses to gain access to the VM's virtual display, thus allowing the user to interact with the VM. The Microsoft RDP client works for connecting to the display of any VM, as the default virtual RDP service is independent of the guest OS. Other RDP clients that support more recent versions of the RDP protocol may also work.
Because there is no known way for a Mac client to pass Windows credentials like cryptRDP5 does, there is no way to set up the credentials in advance such that a simple click of the manage_vc "Connect" button will open the remote desktop connection. Instead, an information dialog box gives the user a complicated set of instructions on how to import most of the RDP info (except credentials) into Microsoft Remote Desktop Client. Entering credentials is an additional, one-time-per-VM step, ultimately via the Mac's KeyChain Access. This is the only place where the Mac version is not user friendly and differs significantly from the Windows version.
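A Microsoft RDP file is a plain-text list of name:type:value settings. The sketch below writes the three settings most relevant here. It assumes the encrypted password is already available as a hex string, as produced by a tool such as cryptRDP5, and it does not reproduce the full set of settings manage_vc writes.

```python
def rdp_file_text(address, username, encrypted_password_hex=None):
    """Return the text of a minimal .rdp file for mstsc.

    address is "host:port", e.g. the (vrdp) value from the VCL
    information file. encrypted_password_hex is the already-encrypted
    password blob; it is omitted when not supplied, in which case
    mstsc prompts for the password.
    """
    lines = [
        f"full address:s:{address}",
        f"username:s:{username}",
    ]
    if encrypted_password_hex:
        lines.append(f"password 51:b:{encrypted_password_hex}")
    return "\r\n".join(lines) + "\r\n"
```

The resulting file can then be opened with mstsc by passing its path as the first argument.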
manage_vc is packaged via the make_zip.cmd Windows batch file, which is in the same folder where the master copy of manage_vc.py resides.
A zip file is created for distribution — one for Windows called manage_vc_win.zip and one for Mac OS called manage_vc_mac.zip. These files are placed on the csslab web site that hosts the UsingVCLQS.html file, which references the .zip files and explains how to install and use VMs in the VCL.
The .zip files are created by running (in the folder above):
make_zip win
make_zip mac
If you want to also update the web site and have it mapped as the L: drive, as srondeau does:
make_zip win update
make_zip mac update
Note that the Windows part of make_zip uses pyInstaller to create an executable called manage_vc.exe with a python interpreter and all required files, resulting in a 10MB+ .zip file. Once unzipped, the executable is ready to use.
The Mac part of make_zip simply bundles the essential files and an install script into a small .zip file. The Mac version's mvc_install script helps the user install Python and its packages, gets Microsoft Remote Desktop Client from the Apple Store, installs developer tools to allow compilation of the sshpass command (critical to supplying passwords to ssh), and creates a script called manage_vc and puts it in /usr/local/bin.
To package it for Windows, run make_zip win, which will:
That subfolder (called manage_vc) includes the executable manage_vc.exe, the default manage_vc.ini file, the cryptRDP5.exe for encrypting Windows passwords for mstsc's use, and the latest versions of the plink and pscp executables (currently, in October 2022, version 0.74).
The resulting manage_vc_win.zip file is placed on the SET Lab web pages, as follows:
net use L: \\itfiles4.d.insttech.washington.edu\lab$ /user:insttech\css_setup *
copy /y manage_vc.zip L:\www\Support\HowtoUse
which replaces the current copy there. The quickstart web page is set up to not be cached, so the latest version of the .zip file is always available. Alternatively, you can use:
make_zip win update
For the Mac version of the .zip file, see above.
There may be more than one .zip file whose name starts with "manage_vc_win" or "manage_vc_mac" in that directory. If there is a major change to manage_vc, especially with respect to the VCL information file format, we may keep the older version around in case someone still needs to access their old VCL information. These additional .zip files contain a version number in the file name, while the most recent one does not.
If a VM's host is on a UW-private (viz., 10.0.0.0/8) or semi-private (viz., 172.28.0.0/16 or 172.22.71.0/24) IP subnet, users who are not on a UW campus will not be able to directly access the VMs. Instead, the user must have installed, started and authenticated via Husky OnNet VPN, after which they should be able to access the VMs.
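Whether a given host address falls into one of the subnets listed above can be checked with Python's ipaddress module. This is a small sketch for illustration. The function name is hypothetical and the subnet list is copied from the paragraph above.

```python
import ipaddress

# UW-private and semi-private subnets named in this document.
UW_RESTRICTED = [
    ipaddress.ip_network(net)
    for net in ("10.0.0.0/8", "172.28.0.0/16", "172.22.71.0/24")
]


def needs_vpn_off_campus(host_ip):
    """Return True if an off-campus user needs Husky OnNet VPN for host_ip."""
    addr = ipaddress.ip_address(host_ip)
    return any(addr in net for net in UW_RESTRICTED)
```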
If not using the Husky OnNet VPN, a user's device may be blocked from accessing either cssgate or the VM's host. That can occur due to host-based IP firewalls which have an extensive list of blocked subnets, each blocked because at least one IP address in that subnet attacked cssgate in the past. The user must then navigate to https://whatsmyip.org and report the IPv4 address shown there to SET Lab staff, who will attempt to unblock the IP address and/or subnet.
Note that the IP address reported by the operating system that the user is running is not adequate, since their device is often behind a home router that issues dynamic IP addresses on its own private IP subnet. That private address is NOT what should be investigated for blocking. Rather, what is desired is the IP address provided by the internet service provider (ISP), which is exposed by browsing to the web page mentioned above.
End users can use Windows, Mac OS or Linux to access the VCL. As noted, Windows and Mac OS users can use manage_vc, which either bundles Python and required packages with it (for the Windows version) or, on Mac OS, uses an install script to direct the user to install required software.
Mac and Linux users may require a Microsoft Virtual Desktop License (VDA) if they were assigned a Windows VM. It is the responsibility of SET Lab staff to have enough licenses available for all non-Windows users.
The number of non-Windows users could be determined prior to use by a survey, but getting anyone to take a survey is difficult and typically provides only a 10% response rate. There is code in ssh_vbm that can be enabled to disallow starting a Windows VM unless the user has taken the survey, but periodic distribution of the survey results to all hosts has not been finalized to date (February 2021), so more work needs to be done to enforce this.
Alternatively, one can guess at how many non-Windows users there are that will need access to a Windows VM. If a course doesn't use any Windows VMs, then this is a moot point. For those courses/projects that do require a Windows VM, we can use statistics from previous quarters to guesstimate the percentage of non-Windows users there are, or we can simply choose a number well over the best guess, and then validate as the quarter progresses.
The Microsoft Virtual Desktop License (VDA) ticket describes the general technique, and a script on each host called /root/scripts/count_access_to_windows_vms will detect any occurrence of Windows and non-Windows accesses to a Windows VM. If it finds any, it counts it (but only one if many occurrences are found for the same VM). If the script is run on all hosts during the quarter, one can see if there are enough VDA licenses to cover the non-Windows users.
Such VDA licenses last only for the number of months purchased, and are tied to the UW-Microsoft Enrollment for Education Solutions (EES) agreement, which expires on 1 June. The need for licenses will vary per quarter, so an effective strategy for purchasing the right number of licenses will save money.
As mentioned, a VDA license is the responsibility of SET Lab staff, not of the end user. A Mac/Linux user must install a remote desktop connection client, know how to configure it, and become familiar with it to interact with the VMs. The user must also become familiar with the scp/ssh commands used to manage the VMs. There are detailed instructions in the quickstart web page about this.
When a user wants to see the display of the remote VM and interact with it, these are the methods available for a VM which is running (or started), in order from easiest to hardest to use:
The choice can be made by the SET Lab staff member who creates the VM, or the user can configure the OS to provide services such as RDP or ssh. The user is responsible for client software that can access either the host or the guest OS services desired.
The speed and ease of interaction are based on how the VM is defined, the guest operating system installed in the VM, the resources available on the host, the speed and reliability of the network connection, and many other factors. SET Lab staff configure VMs based on course needs, and try to make using the VCL and interacting with the VMs a pleasant experience. Factors improving the experience involve:
As of 17 Oct 2022, one can force the values of the .cpuinfo file to be set by the list of all VMs for all current courses or unique ids, as determined within the script by:
/root/scripts/monitor_vms -i cpus | grep "cpus="
This forcing of core assignments per course/unique id, user and VM is done by running /root/scripts/force_cpuinfo. By "force", we mean that it overwrites the old .cpuinfo file.
The reason this was written is to lower the complexity of keeping up with differences in the number of cores per VM as more VMs are added or the user changes the number of cores. This script must be run manually, and is best done after creating VMs. A SET VCL administrator will only be able to adjust for user changes if notified, and then must remember to manually run it. It was deemed not enough of an issue to perform a periodic force_cpuinfo so that the cores will be set periodically, as noted below.
In addition, the .cpuinfo information is only respected when this command is run as root:
/root/scripts/control_vb_vm setcore ...
Users cannot do so themselves, because taskset is a privileged command. Consequently, a new script called /root/scripts/periodic_set_core was written to run the above command periodically, via the /etc/cron.d/periodic_set_core cron job.
The intent is to respect the core settings in .cpuinfo even though the users or the system shuts down or powers off the VM(s), and the users start them again. If there is a performance problem due to multiple VMs per core, it should resolve after the cron job kicks in and assigns the VMs to their assigned cores.