2.07 Duplicating VM Guests
I tried 2.07 for a day. I found that the UI was showing multiple copies of the same VM, some with valid backups and some without. When I deleted the duplicates, they would come back shortly afterwards. It was essentially unusable.
I reverted to 2.06 today and so far have not seen the issue occur (yet).
Incidentally, it's not down to VMs moving around between hosts (or at least not directly), as a friend of mine has been running 2.07 on a single host since this morning, and I spotted that he already had two "copies" of each VM in the Pure UI.
Hopefully it's not too serious a bug with the UI, but it does leave my confidence level in 2.07 very, very low.
No pictures or logs, I am afraid, as I have deleted the appliance and rebuilt from scratch with 2.06.
Without taking a look at the logs it is hard to say precisely what has occurred, but going by your description this does not seem like a GUI bug but rather an artefact of VM discovery. When you connect Pure to your VMware system, it gathers a list of VMs and creates its own representation for each of them. This list is updated as new VMs appear. Keep in mind that the unique identifier of a VM, as far as Pure is concerned, is the VM path (e.g. "[Datastore 01] myvm/myvm.vmx"); if this path ever changes (e.g. the VM is moved to a different datastore), Pure will consider it a new VM.
When adding a new VM to its list, Pure assigns it a random unique name (GUID). Deleting that VM (if it is still present on the VMware side) will cause it to be rediscovered later and added to Pure's list again - this time with another GUID. So although it might appear as though you re-added the same VM that you just deleted, from Pure's standpoint this is a new entity, which will start a new backup chain.
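As a rough illustration of that identity rule - this is a toy sketch of my own, not Pure's actual internals; all names and structures here are hypothetical:

```python
import uuid

def discover(known_vms, vmware_paths):
    """Merge freshly reported VMware paths into the known-VM map.

    known_vms: dict mapping VM path -> record; the path is the identity key.
    vmware_paths: paths currently reported by vCenter/ESXi.
    """
    for path in vmware_paths:
        if path not in known_vms:
            # A never-seen path gets a brand-new GUID, even if the "same"
            # VM was known earlier and then deleted from the list.
            known_vms[path] = {"guid": str(uuid.uuid4()), "backups": []}
    return known_vms

path = "[Datastore 01] myvm/myvm.vmx"
vms = discover({}, [path])
old_guid = vms[path]["guid"]
del vms[path]                   # user deletes the VM entry in Pure
vms = discover(vms, [path])     # VM still exists in VMware -> rediscovered
assert vms[path]["guid"] != old_guid   # new GUID, hence a new backup chain
```

The point of the sketch is simply that identity lives in the path, not in any history Pure keeps, so delete-and-rediscover always yields a fresh entity.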
I am explaining all of this because, without any logs, only you can know what really happened. For example, if you remove your backup volume (e.g. /my/pure/backups1) and add a new, empty one (/my/pure/backups2), all of your VMs will be rediscovered and added to that new volume. Each volume will then have its own list of the same VMs. Adding the previous volume back to Pure will cause the two lists to be merged: the VM with the most recent backup will be active, while the other copy will be shown as 'offline' (usable for restore, but no further backups are possible to that backup chain). If one of the duplicates never had any backups, it will simply be deleted, leaving only one version in Pure.
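The merge decision can be sketched like this (again a hypothetical model of my own, assuming each backup chain is represented by a list of backup timestamps):

```python
def merge_duplicates(copy_a, copy_b):
    """Choose the active copy for one VM path found on two backup volumes.

    Each copy is a dict with a 'backups' list of timestamps. Returns
    (active, offline): the copy with the most recent backup stays active;
    the other is kept 'offline' if it has backups, or None (deleted) if not.
    """
    latest_a = max(copy_a["backups"], default=float("-inf"))
    latest_b = max(copy_b["backups"], default=float("-inf"))
    active, other = (copy_a, copy_b) if latest_a >= latest_b else (copy_b, copy_a)
    offline = other if other["backups"] else None   # empty chain -> deleted
    return active, offline

# Volume 2 has the newer backup, so its copy wins; volume 1's copy goes offline.
active, offline = merge_duplicates({"backups": [100.0]}, {"backups": [200.0]})
# A duplicate that never had a backup is simply dropped.
active2, offline2 = merge_duplicates({"backups": [100.0]}, {"backups": []})
```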
The above is what should, and normally does, happen. Removing and adding backup volumes should not cause any VM duplication in Pure. However, if you did not make any changes to your backup repository configuration, it is possible that there was some kind of race condition or other delay that left Pure unable to read the known-VM list from the repository, allowing the same VMs to be "discovered" in the connected vSphere in the meantime. The old list then became available again, causing the duplication.
I will forward this to our testers and ask them to pay additional attention to this issue but if this ever occurs again, I would greatly appreciate the logs. Or if you could forward the logs from that friend of yours, that would be great.
Another thing you could check is making sure that your backup volume is properly mounted before Pure is started. If your repository is located at '/my/pure/backups', which is actually a mount point for some external storage, it might be possible that Pure starts without the external system being mounted. This would cause the folder to appear empty and might, in theory, cause Pure to re-populate the folder with the same VMs. Mind you, if you only use one backup volume, you will notice something is wrong, since an empty volume implies missing configuration and you would have to set up the vCenter/ESXi connection again (did something like this happen to you?). With more than one volume, it would be possible to lose the VMs from one volume but continue working, since the configuration could be restored from the second volume.
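One way to guard against that scenario is to verify that the repository path is an actual mount point before starting Pure. A minimal check (the path below is just the hypothetical one from the example above; substitute your own):

```python
import os

def repo_is_mounted(repo_path):
    """Return True if repo_path is an active mount point.

    An unmounted external volume shows up as a plain (usually empty)
    directory, which is exactly the failure mode described above.
    """
    return os.path.ismount(repo_path)

# Example check before starting the service:
if not repo_is_mounted("/my/pure/backups"):   # hypothetical repository path
    print("backup repository not mounted - do not start Pure yet")
```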
We will test this further and if you find a way to provoke this behaviour do let me know.
I may have been moving a few VMs between stores - not all of them, but it is something to consider. I was doing a lot of work on the system yesterday.
However - surely re-discovering a VM as a new VM when it is moved to a new store is a fairly serious bug / design flaw? A VM is the same VM even if moved to a new store. I, and I think many customers, use the concept of "swing storage", where a secondary store is configured that working VMs can be swung to so that the original store can be maintained, rebooted, reconfigured, fixed, replaced, upgraded etc. and the machines moved back again afterwards. To have these detected as new guests is a fairly fundamental design flaw, is it not? Then, when the store is moved back, it's yet another backup chain - ouch. It makes the GUI confusing and uses up quite a lot of space within the backup volume, holding full backups of the same VM multiple times.
Just to be clear: if a new backup volume is used, then I have no issue with a re-discovery generating a new backup chain, as it sort of implies that the old chain is potentially lost (although what happens if you copy the backups from one store to the second?). Note that this is NOT what was happening - I have used one store from start to finish.
Note that in the above post I use a single backup store at all times - not a single VM store.
Further testing shows that moving VMs between storage pools may well be the issue here. I am using FreeNAS as an iSCSI datastore and needed to reboot the NAS following some maintenance (and to test the changes).
1. Shut down the Pure appliance.
2. Moved my working set of VMs to swing storage.
3. Made changes to FreeNAS and rebooted a few times to test - working OK.
4. Moved the VMs back to the iSCSI pool.
5. Restarted Archiware Pure.
I now notice that I have duplicates of the working set of VMs marked as offline, and as a result I am generating another set of full VM backups rather than incrementals.
How long do these orphan chains remain in existence? Are they pruned on a regular basis, or do they just keep going as zombies until I get around to removing them? Actually, that's a good name for them: zombies.
The Zombies will never be updated and so will never be deleted by the snapshot (backup) retention process. They will just exist and use up a "lot" of disk space.
I would describe this whole concept as sub-optimal behaviour.
Do people agree / disagree with me?
The reason why Pure uses the full VM path as the unique identifier for each VM is that this is the only piece of information guaranteed to be unique for that one VM. In the VMware world, each VM has a unique 'managed object id' (moid), but it is guaranteed to be unique ONLY within one management entity (vCenter server or ESXi host). Two separate ESXi hosts may use the same moid for two different VMs (in practice they often do, since moids are just incremental names such as vm-13, vm-14...). If the hosts are connected to the same vCenter, each VM will be assigned a new, unique moid (note that each host will still keep tracking them using its own moid system in parallel), but should a VM ever need to be restored to a new vCenter/ESXi, or should you have to reinstall the vCenter independently, all that information would be lost and each VM would, yet again, be assigned a new id. This is especially problematic for users who do not use a vCenter at all, having instead several separate ESXi hosts where VM moids often do overlap. With Pure, we specifically wanted to support this type of setup, as many users do not own a vCenter license.
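To make the moid overlap concrete (the values below are made up, but follow the typical 'vm-N' pattern):

```python
# Two standalone ESXi hosts can each report a VM with the same moid,
# so the moid alone cannot identify a VM across hosts:
host_a = {"vm-13": "[DS-A] web/web.vmx"}
host_b = {"vm-13": "[DS-B] db/db.vmx"}

# The datastore paths, however, stay distinct - which is why Pure keys on them:
paths = set(host_a.values()) | set(host_b.values())
moids = set(host_a) | set(host_b)
```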
I understand that migrating VM to a different storage will cause a backup chain to be started again but hopefully this won't happen on a regular basis (i.e. same VMs will not be flip-flopping between two datastores very often).
Now back to your concrete example. I am not entirely sure I correctly understood all the steps that you performed but this is how Pure is supposed to behave:
1. If you move your VMs to a different datastore, the next time Pure connects to VMware it will add the migrated VMs as new ones.
2. Since the 'old' VMs are no longer present (there are no VMs with the old paths in the VMware system anymore), they will either be A) shown as 'offline' (if they already have some backups) or B) removed from Pure.
3. 'Offline' VMs from step 2 will be kept indefinitely, as you might want to use them for restore (VMs would appear as offline in other circumstances as well, such as disaster recovery, where the original VMs are truly unavailable). The 'new' duplicates can start new backup chains.
4. If you move the VMs back to the old storage in VMware, the above process happens in reverse. Any 'old' VMs that were shown as 'offline' will become active again. Any 'new' VMs (those added in step 1) will either become offline (if Pure performed any backups on them in the meantime) or will be removed if no backups were done.
5. Adding a previously used backup volume back into Pure (normally you do this at the volume level, but the same applies even if you manually copied over a single VM backup folder) forces Pure to choose between the two copies of the same VM (if VMs with the same VMware path are found). The most recently backed-up copy will be active while the other one will be 'offline' (or deleted if its backup chain is empty). Please note that all of this occurs either on Pure start or when adding a volume through the GUI; manually changing the Pure storage content while the software is running is bound to cause undefined behaviour.
6. If you do need to migrate your VMs to different storage and you end up with 'offline' copies, deleting those offline VMs will free up Pure storage (on the next restart). You need to make this decision yourself, as Pure cannot guess whether you still need those VMs.
7. In your case, if I correctly understand that you a) turned off Pure, b) moved your VMs to swing storage, c) did your maintenance work, d) moved the VMs back to the original location (same VM paths) and only then e) turned Pure back on - in that case, Pure should be completely oblivious to what you did while it was turned off, and you should not have any duplicate VMs.
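The migrate-away / migrate-back cycle described above can be simulated with a toy model (all names and structures are mine, purely to illustrate the stated rules):

```python
def refresh(known, vmware_paths):
    """Toy model of one Pure<->VMware sync, keyed by VM path.

    known: dict path -> {'backups': int, 'offline': bool}.
    Paths no longer reported by VMware go offline (if they have backups)
    or are dropped; reappearing paths become active again; new paths
    start empty backup chains.
    """
    for path, rec in list(known.items()):
        if path in vmware_paths:
            rec["offline"] = False       # path present again -> active
        elif rec["backups"]:
            rec["offline"] = True        # has backups -> keep for restore
        else:
            del known[path]              # empty chain -> removed
    for path in vmware_paths:
        known.setdefault(path, {"backups": 0, "offline": False})
    return known

vms = {"[old] vm1/vm1.vmx": {"backups": 3, "offline": False}}
vms = refresh(vms, ["[swing] vm1/vm1.vmx"])   # moved to swing storage
vms = refresh(vms, ["[old] vm1/vm1.vmx"])     # moved back, no backups in between
# The old copy is active again and the swing duplicate (no backups) is gone.
```

Run with Pure powered off for the whole cycle, of course, nothing at all would happen - which is why your observation is puzzling.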
I hope I covered all possibilities here. Please compare your steps against the description above and let me know if you observe Pure behaving differently.
And please, try to collect logs as the problem might very well be caused by something entirely different (some delay causing Pure to rediscover 'new' VMs even before it reads the existing ones off its own storage?).
OK - that makes sense. I don't like it - in fact I think it's wrong - but I can understand it.
However, taking point 7:
I did exactly what you say. Pure was turned off while the VMs were on swing storage - turned off beforehand. But I have duplicates of every single VM that I swung across. I am on 2.06, but I still have the backup machine. These zombies contain the previous backups, and a new chain was started with the "new" guests.
There is, however, nothing in the logs that I can see - just 'backup completed successfully' when I manually initiated a new backup.