[libvirt-users] Backup a VM (using live external snapshot and blockcommit)
Jérôme
2015-09-11 12:45:53 UTC
Hi.

I'm following up here on a conversation that started on Kashyap's
website [1].

We have a server that we use as a KVM host for virtual machines
(created with virt-manager), and we would like to set up VM backups.
Basically, we're thinking of a backup schedule like "keep 7 daily and 4
weekly backups". We'd rather not shut down the VMs every day, so live
backups would be nice.
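
The retention rule ("keep 7 daily and 4 weekly backups") could be enforced by a small pruning helper run after each backup. This is only a minimal sketch and is not part of the script discussed in this thread: the function name prune_backups, the directory layout, and date-stamped file names such as vm1-2015-09-11.qcow2 are all assumptions. It relies on the shell expanding globs in sorted order, so ISO dates list the oldest backup first.

```shell
# Hypothetical helper: keep only the newest KEEP backups in DIR.
# Assumes file names end in an ISO date plus ".qcow2", so the sorted
# glob expansion lists the oldest backup first.
prune_backups() {
    dir=$1
    keep=$2
    # Reuse the function's positional parameters as the list of backups.
    set -- "$dir"/*.qcow2
    # An unmatched glob stays literal; nothing to prune in that case.
    [ -e "$1" ] || return 0
    while [ "$#" -gt "$keep" ]; do
        rm -f -- "$1"   # drop the oldest remaining backup
        shift
    done
}
```

With hypothetical directories, the schedule would then be "prune_backups /backup/daily 7" after each daily run and "prune_backups /backup/weekly 4" after each weekly one.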

I've been doing my best with documentation found on the Internet. It is
likely that the path I chose was not the best, so feel free to tell me
if I'm asking the wrong questions and I should be proceeding totally
differently.

AFAIU, backups can be done at filesystem level (using LVM) and at
virtualization level (using libvirt). We chose the libvirt way.

AFAIU, live backups using libvirt may be done thanks to blockcommit as
explained here on the wiki [2].

-> Considering our use case, is this the recommended way?

Assuming yes, here is the plan.

I wrote a script that does

# Create snapshot
virsh snapshot-create-as --domain $VM_NAME snap --diskspec \
vda,file=$VM_DIR/"$VM_NAME"-snap.qcow2 --disk-only --atomic \
--no-metadata --quiesce

# Copy frozen backing file
cp $VM_DIR/"$VM_NAME".qcow2 $SNAP_FILEPATH

# Blockcommit snapshot back into backing file
virsh blockcommit $VM_NAME vda --active --pivot

# Remove snapshot file
rm $VM_DIR/"$VM_NAME"-snap.qcow2

Variables should be self-explanatory:
- VM_DIR is the directory where the VMs are stored
- VM_NAME is the name of the VM, and its qcow2 file is called
VM_NAME.qcow2
- SNAP_FILEPATH is the full path (including name) where the backup
should be created
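
The four steps above can be wrapped in a function that stops when a virsh step fails and never deletes the overlay before it has been merged. This is a minimal sketch under assumptions, not a definitive implementation: the function name backup_vm, the return codes, and the VIRSH override variable (there only so the function can be dry-run with a stub) are hypothetical, while the virsh options are the ones used in the script.

```shell
# VIRSH is a hypothetical override hook for dry runs; defaults to virsh.
VIRSH=${VIRSH:-virsh}

backup_vm() {
    vm_name=$1          # libvirt domain name
    vm_dir=$2           # directory holding "$vm_name.qcow2"
    snap_filepath=$3    # full path of the backup copy to create
    overlay="$vm_dir/$vm_name-snap.qcow2"

    # 1. Create the external disk-only snapshot; guest writes now go
    #    to the overlay and the base image stops changing.
    $VIRSH snapshot-create-as --domain "$vm_name" snap \
        --diskspec "vda,file=$overlay" \
        --disk-only --atomic --no-metadata --quiesce || return 1

    # 2. Copy the frozen base image. Remember the result but keep going:
    #    the overlay must be merged back even if the copy failed.
    cp "$vm_dir/$vm_name.qcow2" "$snap_filepath"
    copy_status=$?

    # 3. Merge the overlay into the base image and pivot the guest back.
    $VIRSH blockcommit "$vm_name" vda --active --pivot || return 2

    # 4. Only now is the overlay unused and safe to remove.
    rm -f "$overlay"
    return "$copy_status"
}
```

A call would look like: backup_vm "$VM_NAME" "$VM_DIR" "$SNAP_FILEPATH". Returning before the rm when blockcommit fails is deliberate: in that case the guest is still writing to the overlay, and removing it would lose data.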

Using this scheme, we only keep snapshots for the time of the VM file
copy, which is less than a minute. The backing chain is at most 'back <-
snap', and most of the time just 'back'.
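
One way to confirm that the chain has collapsed back to just 'back' after the pivot is to ask libvirt which file the disk currently uses. The following is a minimal sketch, assuming the usual tabular output of virsh domblklist (a header, a separator line, then one "target source" row per disk) and image paths without spaces. The function name active_image and the VIRSH override variable are hypothetical.

```shell
# VIRSH is a hypothetical override hook for dry runs; defaults to virsh.
VIRSH=${VIRSH:-virsh}

# Print the source file currently backing disk TARGET of domain DOM.
active_image() {
    dom=$1
    target=$2
    $VIRSH domblklist "$dom" | awk -v t="$target" '$1 == t { print $2 }'
}
```

After a backup run, comparing the output of active_image "$VM_NAME" vda with $VM_DIR/"$VM_NAME".qcow2 shows whether the guest is still running on the overlay.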

If something ever happens to the VM (human error while logged in as
root, an attack from the Internet, ...), we'll turn off the VM, replace
its qcow2 file and turn it back on.

I understand that this method only saves disk states, so the VM will be
started as if it had been powered off suddenly while running (not quite:
thanks to the '--quiesce' option, at least the disks are in a sane
state). Not perfect, but better than nothing. Those backups are meant to
be used only when all else fails anyway; they are not for daily use.

-> Does this make sense? Am I missing a feature or even a different
approach that would make things simpler or more secure? Am I using
libvirt snapshots for something they're not meant for?

-> Is anything wrong with my snapshot-create-as and blockcommit command
lines? May I remove the snapshot with only a rm command?

Now, a few side questions, as I might have messed up the VM I was
experimenting with.

I used the same command lines as described above, except I didn't pass
the '--no-metadata' option. Once the backing file was copied, I deleted
the snapshot qcow2 file and thought I was done with it, until I realized
the snapshot was still listed by virsh snapshot-list. And I couldn't
find a way to delete it. (For the record, I asked on serverfault about
that [3].)

Ultimately, I found the snapshot's .xml descriptor and deleted it (in
fact, moved it) while libvirtd was down. Now, the snapshot is not listed
anymore.

-> Am I getting away with it? Are there still some traces of that
snapshot? Is my VM in an unsafe state? Anything I should do about it?

-> What would be the proper way of dropping an external snapshot that
was created without the '--no-metadata' option, then blockcommitted? I
understand libvirt doesn't do it yet.

Thanks for any hint. I naively thought our use case was pretty usual,
and I must admit I didn't think I'd have to dive into this complexity,
which is why I'm thinking there might be a more "common" way...

[1]
http://kashyapc.com/2014/10/07/libvirt-blockcommit-shorten-disk-image-chain-by-live-merging-the-current-active-disk-content
[2]
http://wiki.libvirt.org/page/Live-disk-backup-with-active-blockcommit
[3]
http://serverfault.com/questions/721216/delete-orphan-libvirt-snapshot
--
Jérôme
Eric Blake
2015-09-11 15:05:14 UTC
Post by Jérôme
AFAIU, live backups using libvirt may be done thanks to blockcommit as
explained here on the wiki [2].
-> Considering our use case, is this the recommended way?
Yes, using active block-commit is the ideal way to perform a live backup.
Post by Jérôme
Assuming yes, here is the plan.
I wrote a script that does
# Create snapshot
virsh snapshot-create-as --domain $VM_NAME snap --diskspec \
vda,file=$VM_DIR/"$VM_NAME"-snap.qcow2 --disk-only --atomic \
--no-metadata --quiesce
# Copy frozen backing file
cp $VM_DIR/"$VM_NAME".qcow2 $SNAP_FILEPATH
# Blockcommit snapshot back into backing file
virsh blockcommit $VM_NAME vda --active --pivot
# Remove snapshot file
rm $VM_DIR/"$VM_NAME"-snap.qcow2
Yep, that about covers it. Note that the --quiesce step in snapshot
creation requires qemu-guest-agent running in the guest, and that you
trust interaction with your guest.
Post by Jérôme
I understand that this method only saves disk states, so the VM will be
started as if it had been powered-off suddenly while running (not quite:
thanks to the '--quiesce' option, at least the disks are in a sane
state). Not perfect but better than nothing. Those backups are meant to
be used only when all else failed, anyway, it's not daily use.
Yep.
Post by Jérôme
-> Does this make sense? Am I missing a feature or even a different
approach that would make things simpler or more secure? Am I using
libvirt snapshots for what they're not meant to?
No, you're spot on for one of the useful use cases of snapshots.
Post by Jérôme
-> Anything wrong about my snapshot-create-as and blockcommit command
lines? May I remove the snapshot with only a rm command?
Looks correct to me, and matches my recent KVM Forum slides:
http://events.linuxfoundation.org/sites/events/files/slides/2015-qcow2-expanded.pdf
Post by Jérôme
Now, a few side questions, as I might have messed up with the VM I was
experimenting with.
I used the same command lines as described above, except I didn't pass
the '--no-metadata' option. Once the backing file was copied, I deleted
the snapshot qcow2 file and thought I was done with it, until I realized
the snapshot was still listed by virsh snapshot-list. And I couldn't
find a way to delete it. (For the record, I asked on serverfault about
that [3].)
You can use

virsh snapshot-delete --metadata $dom $badname

to remove the $badname snapshot, which no longer exists because you
changed things behind the scenes.
Post by Jérôme
Ultimately, I found the snapshot's .xml descriptor and deleted it (in
fact, moved it) while libvirtd was down. Now, the snapshot is not listed
anymore.
-> Am I getting away with it? Are there still some traces about that
snapshot? Is my VM in an unsafe state? Anything I should do about it?
Directly manipulating .xml files behind libvirt's back is not ideal; it
is better to use libvirt APIs (the way snapshot-delete --metadata does).
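
The metadata-only cleanup can be scripted so that it goes through libvirt and first checks that libvirt still lists the snapshot. This is a minimal sketch under assumptions: the function name forget_snapshot and the VIRSH override variable are hypothetical, and it only uses virsh snapshot-list --name and the snapshot-delete --metadata form shown above.

```shell
# VIRSH is a hypothetical override hook for dry runs; defaults to virsh.
VIRSH=${VIRSH:-virsh}

# Drop libvirt's record of snapshot SNAP for domain DOM without touching
# any image file. Returns 1 if libvirt does not list that snapshot.
forget_snapshot() {
    dom=$1
    snap=$2
    # -F: fixed string, -x: whole-line match, -q: quiet
    if $VIRSH snapshot-list --name "$dom" | grep -Fxq -- "$snap"; then
        $VIRSH snapshot-delete --metadata "$dom" "$snap"
    else
        echo "no snapshot named '$snap' recorded for $dom" >&2
        return 1
    fi
}
```

For the case in this thread, the call would be: forget_snapshot "$VM_NAME" snap.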
Post by Jérôme
-> What would be the proper way of dropping an external snapshot that
was created without the '--no-metadata' option, then blockcommitted? I
understand libvirt doesn't do it yet.
Thanks for any hint. I naively thought our use case was pretty usual,
and I must admit I didn't think I'd have to dive into this complexity,
which is why I'm thinking there might be a more "common" way...
Nope, right now, there is still some user burden rather than a
one-command-does-it-all virsh wrapper. But fortunately it is not too
bad (you proved it is scriptable), and you discovered the correct
sequencing.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
Jérôme
2015-09-11 16:18:35 UTC
Hi Eric.

Thank you so much for your quick and relieving answer.
Post by Eric Blake
Yes, using active block-commit is the ideal way to perform a live backup.
Great.
Post by Eric Blake
Yep, that about covers it. Note that the --quiesce step in snapshot
creation requires qemu-guest-agent running in the guest, and that you
trust interaction with your guest.
Yes, I think I get this, although I can't really figure out what these
cases could be. We're using Debian Jessie and I installed
qemu-guest-agent. Other VMs could use other systems, but most likely
Linux-based ones.

Do you mean that, in cases where you shouldn't trust the guest, using
'--quiesce' might end up being worse than nothing? Or just useless?
Post by Eric Blake
Post by Jérôme
-> Anything wrong about my snapshot-create-as and blockcommit command
lines? May I remove the snapshot with only a rm command?
http://events.linuxfoundation.org/sites/events/files/slides/2015-qcow2-expanded.pdf
I'll have a look at these, thanks.
Post by Eric Blake
Post by Jérôme
Now, a few side questions, as I might have messed up with the VM I was
experimenting with.
I used the same command lines as described above, except I didn't pass
the '--no-metadata' option. Once the backing file was copied, I deleted
the snapshot qcow2 file and thought I was done with it, until I realized
the snapshot was still listed by virsh snapshot-list. And I couldn't
find a way to delete it. (For the record, I asked on serverfault about
that [3].)
virsh snapshot-delete --metadata $dom $badname
to remove $badname snapshot that no longer exists because you changed
things behind the scenes.
Before removing the .xml file, I tried the command indicated in the wiki
[1] with no success.

"NOTE-2: Optionally, you can also supply '--no-metadata' option to tell
libvirt to not track the snapshot metadata -- this is useful currently
as at a later point when you merge snapshot files, then you have to
explicitly clean the libvirt metadata (by invoking: virsh
snapshot-delete vm1 --delete --current -- repeat this as needed.)"

Shouldn't the

virsh snapshot-delete vm1 --delete --current

be rephrased as

virsh snapshot-delete vm1 --metadata --current

?

I see '--delete' is not listed in the man page.

Or even

virsh snapshot-delete vm1 --metadata $badname

since after the blockcommit, the snapshot is unused, I'm not sure it is
considered current.

Anyway, I'm glad you confirmed that I now have the correct sequence.

Thanks again.

Enjoy the weekend.

[1]
http://wiki.libvirt.org/page/Live-disk-backup-with-active-blockcommit
--
Jérôme
Eric Blake
2015-09-11 16:45:34 UTC
Post by Jérôme
Post by Eric Blake
Yep, that about covers it. Note that the --quiesce step in snapshot
creation requires qemu-guest-agent running in the guest, and that you
trust interaction with your guest.
Yes, I think I get this. I don't really figure out what these cases
could be. We're using Debian Jessie and I installed qemu-guest-agent.
Other VM could use other systems, but most likely Linux based.
qga with support for quiesce has also been ported to Windows guests.
Post by Jérôme
Do you mean that, in cases where you shouldn't trust the guest, using
'--quiesce' might end up being worse than nothing? Or just useless?
If the agent is not running, using --quiesce will fail the entire
command; you'd learn pretty quickly to retry without --quiesce for
guests that don't know how to handle it. But if the guest is
malicious, it can pretend to be a guest agent, but intentionally refuse
to reply to the --quiesce request, and leave libvirt hung waiting for a
reply. So it boils down to whether you trust your guests to be
reasonable with their guest agent connection (fine if it is your own
guests, not so much if you are hosting a cloud for other people's guests).
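
The "retry without --quiesce" behaviour can be sketched as a small wrapper. This is a minimal sketch under assumptions: the function name snapshot_with_fallback, its printed labels, and the VIRSH override variable are hypothetical, and the virsh options are the ones from the script earlier in the thread. It covers only the case where the quiesced attempt fails outright; it does nothing about a guest that leaves libvirt waiting for a reply.

```shell
# VIRSH is a hypothetical override hook for dry runs; defaults to virsh.
VIRSH=${VIRSH:-virsh}

# Try a quiesced snapshot first; fall back to a plain disk-only snapshot
# when that fails (for example, no guest agent). Prints which kind was
# taken; virsh's own stdout messages are discarded.
snapshot_with_fallback() {
    dom=$1
    overlay=$2
    if $VIRSH snapshot-create-as --domain "$dom" snap \
            --diskspec "vda,file=$overlay" \
            --disk-only --atomic --no-metadata --quiesce >/dev/null; then
        echo "quiesced"
    elif $VIRSH snapshot-create-as --domain "$dom" snap \
            --diskspec "vda,file=$overlay" \
            --disk-only --atomic --no-metadata >/dev/null; then
        echo "crash-consistent"
    else
        return 1
    fi
}
```

Recording the printed label next to each backup makes it clear later whether a given image was taken with frozen filesystems or not.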
Post by Jérôme
Post by Eric Blake
Post by Jérôme
-> Anything wrong about my snapshot-create-as and blockcommit command
lines? May I remove the snapshot with only a rm command?
http://events.linuxfoundation.org/sites/events/files/slides/2015-qcow2-expanded.pdf
I'll have a look at these, thanks.
The libvirt commands were towards the end, in part 3; but the first two
parts might give a better understanding of the overall operations of
what is happening.
Post by Jérôme
Post by Eric Blake
virsh snapshot-delete --metadata $dom $badname
to remove $badname snapshot that no longer exists because you changed
things behind the scenes.
Before removing the .xml file, I tried the command indicated in the wiki
[1] with no success.
"NOTE-2: Optionally, you can also supply '--no-metadata' option to tell
libvirt to not track the snapshot metadata -- this is useful currently
as at a later point when you merge snapshot files, then you have to
explicitly clean the libvirt metadata (by invoking: virsh
snapshot-delete vm1 --delete --current -- repeat this as needed.)"
Shouldn't the
virsh snapshot-delete vm1 --delete --current
be rephrased as
virsh snapshot-delete vm1 --metadata --current
Yep, sounds like a bug in the wiki, so I fixed it.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
Kashyap Chamarthy
2015-09-11 17:52:57 UTC
[. . .]
Post by Eric Blake
Post by Jérôme
Post by Eric Blake
Yep, that about covers it. Note that the --quiesce step in snapshot
creation requires qemu-guest-agent running in the guest, and that you
trust interaction with your guest.
Yes, I think I get this. I don't really figure out what these cases
could be. We're using Debian Jessie and I installed qemu-guest-agent.
Other VM could use other systems, but most likely Linux based.
qga with support for quiesce has also been ported to Windows guests.
Post by Jérôme
Do you mean that, in cases where you shouldn't trust the guest, using
'--quiesce' might end up being worse than nothing? Or just useless?
If the agent is not running, using --quiesce will fail the entire
command; you'd learn pretty quickly to retry without --quiesce for
guests that don't know how to handle it. But if the guest is
malicious, it can pretend to be a guest agent, but intentionally refuse
to reply to the --quiesce request, and leave libvirt hung waiting for a
reply. So it boils down to whether you trust your guests to be
reasonable with their guest agent connection (fine if it is your own
guests, not so much if you are hosting a cloud for other people's guests).
Post by Jérôme
Post by Eric Blake
Post by Jérôme
-> Anything wrong about my snapshot-create-as and blockcommit command
lines? May I remove the snapshot with only a rm command?
http://events.linuxfoundation.org/sites/events/files/slides/2015-qcow2-expanded.pdf
I'll have a look at these, thanks.
Yes, I highly recommend it. This talk gives excellent under-the-hood
details of virtual machine disk image backing chain management.
Associated video:


Post by Eric Blake
The libvirt commands were towards the end, in part 3; but the first two
parts might give a better understanding of the overall operations of
what is happening.
Post by Jérôme
Post by Eric Blake
virsh snapshot-delete --metadata $dom $badname
to remove $badname snapshot that no longer exists because you changed
things behind the scenes.
Before removing the .xml file, I tried the command indicated in the wiki
[1] with no success.
"NOTE-2: Optionally, you can also supply '--no-metadata' option to tell
libvirt to not track the snapshot metadata -- this is useful currently
as at a later point when you merge snapshot files, then you have to
explicitly clean the libvirt metadata (by invoking: virsh
snapshot-delete vm1 --delete --current -- repeat this as needed.)"
Shouldn't the
virsh snapshot-delete vm1 --delete --current
be rephrased as
virsh snapshot-delete vm1 --metadata --current
Yep, sounds like a bug in the wiki, so I fixed it.
Indeed, it was a typo. I didn't even notice it until now as I just type
these commands from muscle memory.

Thanks, Eric, for fixing it (and for all the detailed responses).
--
/kashyap
Jérôme
2015-09-11 21:23:38 UTC
On Fri, 11 Sep 2015 10:45:34 -0600,
Post by Eric Blake
But if the guest is
malicious, it can pretend to be a guest agent, but intentionally
refuse to reply to the --quiesce request, and leave libvirt hung
waiting for a reply. So it boils down to whether you trust your
guests to be reasonable with their guest agent connection (fine if it
is your own guests, not so much if you are hosting a cloud for other
people's guests).
Of course. I didn't think of this use case.
Post by Eric Blake
Post by Jérôme
virsh snapshot-delete vm1 --metadata --current
Yep, sounds like a bug in the wiki, so I fixed it.
I thought so but didn't dare to be too assertive about it.

Glad it is fixed. Hopefully, it will save someone some time and
trouble. Let this be my micro-contribution...

Thanks again.
--
Jérôme