Hello,
First of all let me thank you for your amazing work! I would like to inquire whether it would be difficult to add Trim/Discard support to the Mars block device?
I will describe my use case; maybe you can offer some suggestions. We are in the middle of some infrastructural changes at the technical faculty of the local university. Currently only one major is affected, but we would like to involve the whole faculty someday. I found your work on Football and sharding just perfect for our needs.

Let me describe the current setup: at the moment we use Mars on a pair of servers to provide hot failover for VMs in case one of the hosts fails. There are a lot of VMs, some belonging to the infrastructure (learning management system, teleconferencing server, etc.) and some to remotely accessible virtual labs for the students to practice on. This is basically a tiny (two-node) cluster managed with OpenNebula; it stores all LXC containers and full VMs in qcow2 images. I have the images on top of a filesystem on top of a single Mars device. If the primary server failed, I would promote the Mars device on the secondary and continue to run (almost) from where the primary failed.
What we would like to do is scale out to more than two servers; in case of a failure we would simply increase the density of VMs on the remaining hosts. Your Football solution with sharding would be great for this. I plan to write an OpenNebula storage driver which would combine Mars Football and LVM thin volumes. The current OpenNebula storage drivers only support thick LVM (which is not good for snapshotting), so that needs some adjustment anyway. While at it, I would also like to use this new storage driver as shared block storage, which makes it necessary to implement a version of the Football concept with sharding in OpenNebula.

The idea is the following: if my proposed driver created a Mars resource for every VM instance, and then one thin LVM pool on top of Mars, I would have the option of creating efficient snapshots of the VM. Once the cluster manager schedules the VM to another host, the Mars device is promoted, and all the volumes relevant to that specific VM become available on the new host, along with the snapshots, giving the chance to revert to previous snapshots if needed.

The problem is that I would have to allocate a large space for the backing device of the Mars device in advance, to accommodate the potential growth of the image and the space requirement of the snapshots. In order to avoid committing much more space than needed in advance, and in order to reclaim the space that is freed up when deleting snapshots, it would be great for the Mars device to pass down discard requests to the underlying storage. Currently my only idea for saving storage space on raw disks without discard support is to create the Mars device on top of kVDO (on top of HW-RAID), but that still requires that empty space be overwritten with zeroes at regular intervals, which would also produce a large amount of unnecessary replication traffic.
I would like to ask how difficult it would be to implement discard, and whether you plan to implement it in the near future. So far (without any hope of understanding the whole Mars code) I found in mars_if.c that the BIO_RW_DISCARD property eventually gets copied from orig_biow, which I hoped to mean that if the underlying device supports discard, then Mars would also support it. Obviously I am not getting it right, because no matter what I do, the Mars device itself always has a discard_granularity of 0, regardless of the underlying device's discard capability.
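To make my (possibly naive) understanding explicit: I suspect that forwarding the discard bios is only half of the story, and that the driver also has to advertise the capability in the request queue limits of the exported device. Below is a rough sketch of what I mean, written against the generic block-layer API of roughly the 4.x kernels; the function names are not taken from the MARS sources, and the exact API differs between kernel versions, so please treat it only as an illustration of my question.

```c
/*
 * Sketch only, not actual MARS code: what a bio-based block driver
 * typically does so that the exported device reports a non-zero
 * discard_granularity.  Names follow roughly the 4.x block layer and
 * differ between kernel versions.
 */
#include <linux/blkdev.h>

static void example_inherit_discard(struct request_queue *q,
				    struct block_device *backing_bdev)
{
	struct request_queue *bq = bdev_get_queue(backing_bdev);

	/* Only advertise discard if the backing device can actually do it. */
	if (!blk_queue_discard(bq))
		return;

	/* Inherit the discard limits of the backing device. */
	q->limits.discard_granularity = bq->limits.discard_granularity;
	blk_queue_max_discard_sectors(q, bq->limits.max_discard_sectors);

	/* Filesystems and blkdev_issue_discard() check this flag before
	 * issuing any discards at all. */
	blk_queue_flag_set(QUEUE_FLAG_DISCARD, q);
}
```

Inheriting the limits from the backing device (instead of inventing a granularity of our own) would seem like the natural choice here, but that is just my guess.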
Thanks a lot,
schefi