PO Box 1214 • La Pine • Oregon • 97739 • 541-410-2760

 


    KVM made simple

    *WARNING* This is a work in progress: it is neither complete nor necessarily accurate. But the information is provided as I've discovered it, in the hope it will help others.

    *NOTE* This "How To" deals with the "full system" VM mode of QEMU/KVM. I don't get into the "user" mode.

    What is KVM?

    You probably already have the idea that this is a program to run a virtual machine (VM), like VMware or VirtualBox. You are partially right, and it's that partial answer, which I had in common with you, that was one of the biggest obstacles to figuring out how to use it. KVM is only one piece of a larger whole. It's a module in the Linux kernel and a plug-in for the QEMU emulator. As a consequence most of what you do to manage VMs, using KVM, is done with the QEMU tools. And it's the QEMU docs that are the most helpful. So for most of us KVM is QEMU.

    For the rest of this document I'm going to use the term "QEMU" to mean the combination of QEMU with KVM support. The vast majority of things that are going to be mentioned here are QEMU related, with KVM as a silent partner super-charging our VMs. Usually the only difference between running standalone QEMU and using KVM is the "-enable-kvm" switch on the emulator's command line (CLI).

    KVM is not "libvirt", "virsh" or any of the related tools. Most discussions I've found on the Internet or in books always bring "libvirt" into the picture. IMO this is just unnecessary complication and, in typical Red Hat fashion, it only kind of works. Perhaps newer versions work better. In all fairness I'm sure those tools are helpful to someone. But for my uses it tends to make KVM more cryptic. If all you're looking to do is learn/use KVM, which you need to do to make those tools work anyway, I'd stay away from them.

    A quick note before we begin: If you're not familiar with VM terminology you need to know two words. A "host" is the physical computer you're running on, which hosts the VMs. A "guest" is the virtual machine you're running on the "host".

    Why KVM?

    In short, it's freakin' fast and incredibly stable. KVM caught my interest, a couple of years back, when my VPS provider said they were migrating to it because it was faster. When I went through the hassle of migrating a webserver to the new KVM platform I was not disappointed! I later migrated my local WinXP / QuickBooks setup from its physical hardware (which had died) to KVM and performance was great. Since then I've migrated from VirtualBox, both in headless server and desktop roles. KVM makes running headless VMs simpler. I could brag more but I won't waste any more time. ;-)

    What do you need?

    1. A general understanding of Linux.
    2. A working Linux install. KVM is Linux technology.
    3. Linux kernel v2.6.20+ (most modern distros of the last few years use v3.x+)
    4. The related KVM & QEMU packages installed. Don't forget the kernel module! Or you can build from source, but that is a topic for some other book. :-) On my Devuan/Debian systems I needed the "qemu-kvm" & "qemu-utils" packages.
    5. A CPU with hardware virtualization and support enabled in the BIOS. Look for either "vmx" (Intel) or "svm" (AMD) in the "flags" line(s) of /proc/cpuinfo (egrep "^flags.*(svm|vmx)" /proc/cpuinfo).
    6. Rights to the "kvm" device (probably "/dev/kvm"). On Debian that means belonging to the "kvm" group.
    7. Operating systems to install: CDs, DVDs or thumb drive image files.
    8. If you have a graphical desktop on your machine the "aqemu" tool can provide quite a jumpstart. In a typical desktop role it does a fine job of creating and managing running instances. And even if desiring to run VMs in background/headless scenarios its a simple way to build command lines and create disk image files. Aqemu has a handy feature to write its VM definition to a shell script. You can open that and see the command line.
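Items 5 and 6 above can be checked with a quick pre-flight script. This is just a sketch: the device path "/dev/kvm" and the flag names come straight from the list above, and the script only reports what it finds.

```shell
#!/bin/sh
# Pre-flight check for KVM readiness. Reports findings; always exits 0.

# 1. CPU hardware virtualization flags (vmx = Intel, svm = AMD).
if grep -Eq '^flags.*(vmx|svm)' /proc/cpuinfo 2>/dev/null; then
    CPU_OK=yes
else
    CPU_OK=no
fi
echo "CPU virtualization flags present: $CPU_OK"

# 2. The kvm device node exists (i.e. the kernel module is loaded).
if [ -c /dev/kvm ]; then
    DEV_OK=yes
else
    DEV_OK=no
fi
echo "/dev/kvm exists: $DEV_OK"

# 3. We can actually open it (group membership / permissions).
if [ -r /dev/kvm ] && [ -w /dev/kvm ]; then
    ACCESS_OK=yes
else
    ACCESS_OK=no
fi
echo "/dev/kvm read/write access: $ACCESS_OK"
```

If any of the three lines says "no", revisit the corresponding item in the list above before going further.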

    Quick Start

    From a terminal you can do something like the following to launch a brand new VM:

    qemu-img create -f qcow2 disk1.qcow2 10G
    qemu-system-i386 -enable-kvm -m 1G -cdrom cd_image.iso -hda disk1.qcow2 -boot order=cd,once=dc

    The file names are the parts you'll want to change. "disk1.qcow2" can be anything you want to name the file that will contain your virtual hard disk. It needs to be the same in both commands. "cd_image.iso" is a somewhat complicated topic, if you're not familiar with it. Change it to match the name of the file you want to install the OS from. More on this below.

    What the first command does is create a file to contain a 10 gigabyte disk image for our VM, using the "qcow2" file format. Obviously you can use any size you prefer. Suffix the size with "G" for gigabytes and "M" for megabytes. "Qcow2" is currently the most sophisticated format, providing us with snapshot abilities and lazy disk allocation. Space on your real disk isn't used until it's used in the VM. For fiddling with and learning QEMU, "qcow2" is ideal. On most systems "man qemu-img", in a terminal, will provide more information and more will be said later.

    The second command launches a 32bit x86 virtual machine with 1GB of RAM. The file "cd_image.iso" is connected as a CD in the CD drive. "disk1.qcow2" is connected as the first IDE drive. Then it tells QEMU to attempt booting off of the "D:" drive (CD) and, if that fails, the "C:" drive (disk image) the first time. Subsequent reboots will reverse that order, assuming the reboot happens within the virtual machine, not stopping and then restarting it. This is ideal for launching an OS install from a CD or DVD image file: when the installer reboots it will boot straight off the installed OS. Subsequent VM launches can omit the "-boot ..." switch entirely, as booting off of "C:" is the default.

    What makes this a KVM is the "-enable-kvm" switch. If you want a 64bit KVM then you can change the "qemu-system-i386" to "qemu-system-x86_64". This requires that your host OS is 64bit because the CPU is incapable of upgrading to 64bit operation for the VM. It can downgrade but not upgrade. The "-i386" portion of the qemu-system-* command tells QEMU what processor to emulate. KVM acceleration only works for like-kind CPUs in the guest. If you wanted to run a 64bit guest on a 32bit host QEMU will do it but KVM can't. You'll have to drop the "-enable-kvm" switch but then the CPU will be emulated with software, and will be much slower. QEMU can emulate many different kinds of CPUs, which is why the Android SDK uses it to provide its virtual devices.

    The "-m" switch sets the amount of RAM available to the VM. The "G" and "M" suffixes function as expected. The defualt is 256M, which is probably not very usable. And you can add more than one CPU to the VM with the "-smp" switch. Just follow it with the number of cores you want it to think it has. These will take advantage of real cores on your system. The man page details a number of different ways those virtual CPUs can be made to appear to the guest.

    The "monitor" of our virtual machine will default to creating a window on our desktop, using the SDL lib. Assuming you are running this on a graphics equipped desktop machine this is a good way to get started. Otherwise you can end up staring at a blinking cursor, wondering what to do next. I believe SDL can take over the physical monitor of a text only Linux machine. But I haven't tried this. I just know that my initial attempts to install an OS on a remote computer via SSH left me staring at a blinking cursor. :-)

    Be aware that this is what QEMU calls a "display". They use the term "monitor" to refer to a virtual-terminal-like back door that is used to command changes to the virtual hardware of the running VM and read back info about it. From now on I'm going to use the terms as QEMU has defined them. When I first started reading the docs this terminology mixed me up until I realized that's how they were labeling things.

    This VM does not have any viable networking support unless your Linux distribution has done something special. Networking is a complex topic, with many ways to set it up, which will be briefly dealt with later.

    CD/DVD & thumbdrive images to install from

    There are many ways to procure an image file to install from. At least with the free OSes the simplest way to get them is to download them from their official website. An image you would download to burn to disk is usable with QEMU, as is, via the "-cdrom" and related switches.

    If you have a physical optical disk there are a couple of ways to handle things:

    1. If you plan on reusing the image many times you could copy the raw media to a disk file so you can attach it at will to one or more VMs.
       
    2. You can use the disk right out of the physical optical drive on your system, assuming you have one. In this case just provide the device name on the CLI, right after the "-cdrom" switch.

    In either case you will need to know the device name of your physical CD/DVD drive. This is entirely distribution dependent but typically is "/dev/sr0" (for the first one) and sometimes it's "/dev/cdrom", "/dev/cdrom0", "/dev/dvd", "/dev/dvd0" or something along those lines. If your device doesn't register in the SCSI subsystem then you will have to consult other resources to figure out its name. You will also need "rights" to access the device, which is also a distribution dependent issue. On my system I need to be a member of the "cdrom" group.

    Making an image file from a physical disk is pretty straightforward:

    cp /dev/sr0 cd_image.iso

    Replace "/dev/sr0" with your device name and "cd_image.iso" with the name you want to use for this image file. Then you can use whatever you named it in the command "quick start" section.

    Usually bootable thumb-drive images are the same as CD/DVD images. You should be able to use them as downloaded. But exceptions abound.

    What's this "monitor" anyways?

    As mentioned briefly above this is how you control your virtual hardware. My three most common use cases for it are:

    1. Determine the status of my drives
    2. Insert or eject CD/DVD media
    3. Turn off or reset my VM

    The first question to answer is: how do I get access to the monitor? Unless you specify otherwise on the CLI the monitor defaults to piggy-backing off of the current "display" device. This means you have to use special key combinations to be able to get to it. In a typical graphical mode (like the default used in the quick start) you use [CTRL]+[ALT]+2 to get to the monitor and [CTRL]+[ALT]+1 to get back to the VM's display. In a text / terminal mode the hot key switches to [CTRL]+A followed by "c". Near the bottom of the qemu-system man page they describe the key strokes. If the defaults aren't convenient there are other CLI switches that affect them.

    Once there you will see the QEMU version information (assuming it hasn't scrolled off the screen) followed by a "(qemu)" prompt. The monitor has a whole mess of commands. You complete each command by hitting [ENTER]. The "help" command will dump a brief explanation of them all, which will be too large to fit on the screen. But here's a brief set of commands to match the tasks I listed above:

    1. Drive status: info block
      Typically you want to pay attention to the name and status of your CD device.
       
    2. Insert: change {drive} {file}
      Eject: eject {drive}
      Replace {drive} with the name of the drive shown by "info block". The default would be "ide1-cd0" (1st CD on second IDE controller). Replace {file} with the name of the image file to use. Typically you'll want to provide the full path name. You can change disks without ejecting first. These commands may error out if the guest OS has the drive locked, like when they prevent the physical eject button from working.
       
    3. Power Off (2 options): "system_powerdown" or "q"
      Reset: system_reset
      "system_powerdown" attempts a graceful shutdown of the guest. It does this by sending an "ACPI" event much like the power button on most modern computers. Some OSes will require the "guest agrent" be installed in order to hear the event and start the shutdown procedure. The "q" command (meaning quit) immeadiately terminates the emulator which has an affect similar to tripping over a power cord. It could leave things broken. The "system_reset" is similar to "q" in that there is no orderly shutdown of the guest OS. Its different in that the QEMU process keeps running and simply boots up again. Just like pressing the reset button used to do. I haven't seen too many of those on recent cases.

    The monitor is powerful, allowing you to pretty much hot-add and remove many of the components of the virtual hardware. It also has support for managing snapshots and other extreme-developer related goodies. It's well worth browsing through the help. But you may not be able to scroll back to read it all.

    The monitor output can be redirected to a variety of destinations. The main switch to use is "-monitor". Everybody likes to do things differently. The QEMU guys really bent over backwards providing a large assortment of end points to attach the monitor to. For most of my use cases "-monitor" does what I need.

    Usually I'll use one of three options with it: "vc", "stdio" or "unix:{path},server,nowait". The first is the default, so technically I don't use it. It multiplexes the monitor onto the graphical display. If there isn't one then "stdio" is the default. It attaches the monitor to the Linux "stdin" and "stdout". In simple terms this means that the monitor is attached to the terminal you started QEMU with. This is really convenient if you have a lot of virtual CDs to swap while loading software on a new VM install. Example:

    qemu-system-i386 -enable-kvm -hda disk1.qcow2 -monitor stdio

    This launches my virtual machine with the disk image I created but leaves the monitor attached to the terminal I used to launch the instance. This way I can swap virtual CDs as needed while installing software on the graphical window used for the display.

    The "unix:{path},server,nowait" is an advanced option I typically use for headless VMs. This creates a "unix domain socket" that you can interact with. Replace {path} with the path name of a the socket you want. Typically "socat" or "nc -U" (needs to be the BSD version) is then used to connect to the socket to manipulate the monitor.

    The Display

    The "display" is what the graphics card of the virtual machine connects to so that you can see what the VM has to show. With Linux web or other kinds of servers that's not so important. "ssh" is the rule for seeing and commanding. There are several switches to redirect the video:

    • -display none - The guest thinks it has a video card but the output goes nowhere.
    • -nographic - Is similar to the above but also redirects the serial port to "stdio" (ie. the terminal if one was used)
    • -curses - This is a text only output that displays on your terminal. When a piece of software switches to a graphics mode this display goes blank, stating the graphics resolution in use. Unfortunately many OS installers want to show you pretty pictures, which makes it difficult, if not impossible, to run the install with this option. Although I did find that being patient and hitting [ESC] a bunch of times with the Debian / Devuan installer allowed switching to a text only mode so that I could perform a remote install over ssh. But I also needed to know the secret command line.
    • -vnc - Currently this is probably the most practical way to gain remote access to a VM using graphics.
    • -spice - This is supposed to be a better way to provide remote graphics and sound for a VM. But I've found client availability real limited. Perhaps I don't know where to look.

    According to the man page most, if not all, of those command line switches are deprecated in favor of using "-display something", but it fails to document how exactly this is supposed to be done and I haven't experimented with it. It also didn't mention the availability of all of the display output options.

    The last two switches require an additional argument on the CLI, which is actually a comma separated list of arguments. Check the man page for details. Since I think VNC is the more practical choice for remote access here's a sample:

    qemu-system-i386 -enable-kvm -hda disk1.qcow2 -usbdevice tablet -vnc localhost:1

    This creates a password-less VNC server that will show your running VM. The "localhost" part is the IP address the VNC server attaches to. Using "localhost" will make it only accessible on the local machine unless tunneling is employed, like over ssh. The ":1" tells it which VNC display port to use; they are numbered 0-99. 5900 is added to the display number to specify the TCP port. Depending on your VNC client you might have to specify port 5901 instead of just "1". To tunnel over SSH you will definitely need to use the full port number in the ssh command.

    You can append ",password" to this switch's argument to require a password for the VNC connection. However there doesn't appear to be a way to specify the password on the CLI. You have to use the monitor's "set_password vnc {password}" command. Replace "{password}" with your desired password.
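A sketch of that flow, keeping the monitor on the launch terminal so the password can be set right away ("mysecret" is of course a placeholder):

```shell
# VNC with a required password; monitor stays on this terminal.
qemu-system-i386 -enable-kvm -hda disk1.qcow2 -usbdevice tablet \
    -vnc localhost:1,password -monitor stdio

# Then, at the "(qemu)" prompt that appears:
#   set_password vnc mysecret
```

Until the password is set, VNC clients will be refused.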

    The "-usbdevice tablet" switch is a handy fix for many VNC clients that makes the mouse usable. Often times the mouse on the local machine and the mouse on the VNC display are in different places, making it difficult to reach some items to click them. This switch fixes that and makes "grabbing" unnecessary, which is an alternate fix for the same issue but with other side affects.

    Networking

    You are probably going to want to access the Internet or your local network from within your VM. And as I said this is a complex topic, made even more complex by the fact you are dealing with your host and guest network stacks and QEMU's networking support. The two most common topologies I would use are: all networked VMs appear on my physical network, like any other machine, and access the internet through it; or a set of VMs on their own isolated little network. I suppose you might also want to set up private access between host and guest.

    To accomplish these tasks you need to know two or three Linux tools for the host side of things: ip tuntap, brctl & iptables. Technically you should be familiar with how to set up networking in your distro. There are other tools to interconnect QEMU VMs without involving the Linux network stack, but I've read that performance is lacking and I prefer to learn the fewest possible tools with the greatest breadth of functionality, as opposed to learning a larger number of limited purpose tools.

    Since all of the setups mentioned previously use the same set of tools I'm only going to illustrate connecting the VM directly to the local net. The first thing we need to do is set up "tap" devices to function as the network devices in our guests. These things function similarly to point-to-point devices. One end is the host, the other is the guest. Creating a tap device requires root privileges, and your user may require permissions to the "tun" control device (usually "/dev/net/tun") in order to attach the VM to the tap device. Creating the device is pretty straightforward (do these as root: su, sudo, ...):

    ip tuntap add dev vnet0 mode tap user {user}
    ip link set vnet0 up

    Replace {user} with the user name running the VM, typically you. The first line creates a tap network device with {user} granted access to attach to it. "vnet0" is a name you give the tap device. Call it whatever you want as long as it's not the same as an existing network device. We use this name to tell QEMU and our Linux host about the network device. The second line brings it "up", making it ready for use. You can repeat these commands with multiple names to create as many adapters as you want.

    The next step is to create a bridge (as root):

    brctl addbr br0
    brctl addif br0 eth0
    brctl addif br0 vnet0

    "br0" is a name we use to reference the bridge. It too can be named whatever you want as long as its not the same as any other networking device on the host. The first line creates the bridge. The second line adds our host's network adapter into the bridge. If your network device has a different name you will have to use it in place of "eth0" ("ifconfig" or "ip addr" will list your adapters). If you don't want your VM network connected to your physical network just skip the second command.

    The last line adds our tap device to the bridge. Make sure the tap device name (vnet0) matches what you named it in the "ip tuntap..." command. Repeat this line for all tap devices you created.

    Hopefully you haven't built the bridge yet. If you are adding your host's network adapter to the bridge you should use your distro's network tool to shut down the existing connection first. Then build the bridge and connect your network adapter. Then you need to configure your bridge as if it were your physical network card. Consult your distro's documentation. "br0" will be your new network adapter from now on. Any VM connecting to it via a tap device will have full access to the physical network, as if it were attached, including DHCP support.
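Pulling the host-side steps together, a one-time setup sketch (run as root) looks like this. "eth0", "br0", "vnet0" and "myuser" are the same placeholders used above, and your distro's tool should have taken the physical adapter down first:

```shell
#!/bin/sh
# One-time host-side network setup (run as root). Adjust names to taste.

ip tuntap add dev vnet0 mode tap user myuser   # tap device for the guest
ip link set vnet0 up

brctl addbr br0          # create the bridge
brctl addif br0 eth0     # join the physical adapter (skip for isolated net)
brctl addif br0 vnet0    # join the guest's tap device
ip link set br0 up

# Now configure br0 (DHCP or static) as if it were the physical adapter.
```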

    If your primary network device is a wireless adapter they usually don't support bridging and can't be used in this way. You'll probably need to setup a route between the bridge and the wireless adapter, configure NAT, maybe setup DHCP, ... It gets complicated.

    Quick note on firewalls: Usually I leave it up to the guest to configure their firewall inside the VM. However it's possible to use the Linux firewall on the host to protect the guest, and there are any number of reasons one might want to do this. "Windows" comes to mind. It gets a little weird, at least to my mind, as all firewall rules are written addressing "br0".

    The FORWARD chain is used for things going between bridged devices (guest to local net/Internet). While traffic coming into the host or going out from the host still uses the INPUT and OUTPUT chains. Rules for a specific tap device or the physical network device seem to be ignored. I have not tested anything beyond the base table (ie. I don't know about nat). To be selective in applying rules in the FORWARD chain you will need to base filters on the IP address of the guest VM, otherwise it will impact all cross-bridge traffic... which might be desirable. Firewall rules are a topic of some other tome.

    Back to our regularly scheduled program ...

    To connect the VM to the tap device and by extension to the bridge, or any other networking you might have setup, you do something like this:

    qemu-system-i386 -enable-kvm -hda disk1.qcow2 -net nic,vlan=0,macaddr=DE:AD:BE:EF:00:00 \
        -net tap,vlan=0,ifname=vnet0,script=no

    The "\" is there just to support splitting the command on to two lines. You can put it altogether on one. I did this to make it easier to read on smaller monitors.

    The first "-net nic,..." sets up the virtual network adapter's front-end. The piece seen by the guest. You can also append a ",model=..." option to tell it what kind of network adapter it should look like. My version of QEMU defaults to a Realtek chip. Depending on the guest OS you may have to change this to something it has a driver for. "e1000" is a commonly supported option, which is a typical Intel gigabit network adapter.

    The "DE:AD:BE:EF:00:00" is the "MAC address" for the network adapter, which needs to be unique on your network (anything reachable by the bridge). I use this particular series of mac addresses because its easy to remember. Just change the "00" parts to some random collection of 0-9 and A-F (hexadecimal). There needs to be two digits for each grouping between colons. Don't add any more or take away any sets of digits.

    The second part, "-net tap,..." is the part that defines the back-end. The connection being made to/through the host. In this case we're telling it that we're using a "tap" device. The "vnet0" must match one of the tap devices you built. The "vlan" number must match the "-net nic..." vlan setting.

    The "script=no" part tells it not to run any scripts to setup the tap device. QEMU can use scripts to auto-configure, at least partly, the network adapters. The idea is that the tap device creation/destruction and adding/removing from the bridge could be scripted so that every new VM you setup would just-connect to your network. In my experience with Debian7 - 8 and Devuan they don't work, out of the box, for my purposes. Yours probably won't either. So this is my simplified universal solution for this document. We manually setup the tap devices and add them to the bridge.

    And of course you now need to configure the OS in your guest to use its freshly provided network adapter (not covered here).
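For completeness, undoing the manual host-side setup is the same tools in reverse (again as root; these are the standard "ip"/"brctl" removal forms, with the same placeholder names):

```shell
#!/bin/sh
# Tear down the manual network setup (run as root).
ip link set br0 down
brctl delif br0 vnet0    # remove the tap from the bridge
brctl delif br0 eth0     # remove the physical adapter (if you added it)
brctl delbr br0          # delete the bridge itself
ip tuntap del dev vnet0 mode tap   # destroy the tap device
```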

    (? host-guest only networking ?)

    Going beyond...

    Reading the man page:

    There is much, much more to QEMU. The "man qemu-system" is probably the best reference. But it seems to be a bit cryptic. One of the things that I noticed is that there are usually several different ways to accomplish the same thing. Take setting up the destination for the "monitor". There are three different switches that can accomplish this: "-monitor", "-mon" & "-serial".

    I think in many situations one switch may be a shortcut for a more verbose one or even set of them. Like "-hda" which creates a disk device. The more complex "-drive" does the same thing but allows control over every aspect, the most useful of which is the kind of controller it appears to be. Other duplicate switches may be the old-way and the alternate the new-way. But that is not always documented.

    Another pattern that may seem odd is the use of front & back ends. The more specific switches may need two parts to actually make the connection between the host and guest. As an example the "-chardev" switch sets up a "character device" back end connection to the host. But that doesn't do you any good until it's connected to some front end device, like the monitor ("-mon" not "-monitor").

    And some switches, like the "-net" switch, are an umbrella over a whole mess of potential switches. Like the "-net nic,..." and "-net tap,..." perform two separate functions in defining the front and back ends for the networking ("-net") system. So rather than have a "-tun" switch and a "-nic" switch we have the "-net ..." switch and its many sub-functions.

    Lastly, the man page is a reference and it's assumed you know how KVM works and its terminology. You may have to experiment. This is fairly safe since these machines are virtual and quickly changed. You can try one set of command line switches with a particular VM. Shut it down and try another. Much easier than ordering a new bit of hardware and waiting for it to arrive. Plus it doesn't cost anything. ;-) Some OSes don't react well to their hardware getting changed, so it's helpful to use the snapshot capabilities of qcow2 (qemu-img snapshot ...) to roll back if something doesn't work right.
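The qemu-img snapshot commands mentioned above look like this ("before-experiment" is just an example tag name; run these while the VM is shut down):

```shell
qemu-img snapshot -c before-experiment disk1.qcow2   # create a snapshot
qemu-img snapshot -l disk1.qcow2                     # list snapshots
qemu-img snapshot -a before-experiment disk1.qcow2   # roll back to it
qemu-img snapshot -d before-experiment disk1.qcow2   # delete it
```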

    Emulated Hardware:

    I haven't delved into the hardware that QEMU provides to the guest. Their default devices have worked with everything I've tried so far... well mostly... For amusement I tried installing Novell DOS and Geoworks in a VM. The Geoworks setup died when it came to defining the hardware. But other than that it worked. :-) And I wasn't bored enough to try to set up the NDOS network stack.

    Almost every device has options for changing what it looks like to the guest: hard drive controller, video adapter, network adapter, sound card, ... I know... I said nothing about sound support. I just don't have any need for it and so will probably not play with it. There is also USB support... which I probably won't use either.

    The kind of hardware being emulated becomes important in two situations: performance and guest software support. If the guest software doesn't support the hardware QEMU is providing (won't boot, undetected NIC, missing drivers, ...) you will need to change it. The man page is the reference for changing the hardware types. Some device types also require running QEMU with specific switches to get it to tell you what hardware it can emulate. Platforms other than the x86 derivatives haven't been covered here, and their hardware is likely to be radically different from what is supplied for a "PC". As a consequence you have to ask the emulator what options it has. This is documented in the man page.

    Almost all of the x86 hardware has a specialized version that works with KVM faster than the other hardware types. They are usually all named "virtio". One exception is the video card, where I found the "vmware" device seems to work better for graphics. But these will require installing specialized drivers and probably the "guest agent". You'll have to search for them and/or the drivers for a particular OS. Typically they come as one package. The "virtio" stuff has "just worked" for me in my Linux guests.

    I've needed the "guest agent" on some windows versions to support automated shutdown of the guest. It would seem like the ACPI emulation ought to be enough. But if you want better disk and network performance you will need the drivers supplied in the ISO image for the guest agent. I think I found them here last.

    Disk performance is usually the biggie for me. In order to get the virtio disk driver installed and working on C:, in windows, I had to add an additional drive using virtio, install the windows drivers for the new drive and then shutdown and restart the VM using virtio for C:. With the drivers installed for the other drive they worked to boot off C:. Your mileage may vary due to different windows versions behaving differently.

    There are floppy disk images for the windows virtio disk driver. Supposedly you can load them during the OS install. This has not worked for me. The floppy images are available for download separately and are also on the CD image you download for everything else. The "-fda" or "-fdb" switches allow you to attach the floppy-disk image as either A: or B: respectively.
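So an install attempt with the driver floppy attached might look like this (the file names are placeholders):

```shell
# Boot the Windows installer CD with the virtio driver floppy as A:.
qemu-system-i386 -enable-kvm -m 1G -hda disk1.qcow2 \
    -cdrom windows_install.iso -fda virtio-drivers.vfd \
    -boot order=cd,once=d
```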

    ( more to come - check back )

    (? recipes ?)

    (? monitor on sockets / screen ?)

    (? auto-start/stop ?)

    (? -localtime & -name ?)

    (? disk image formats ?)

    (? libvirt ?)