[Solved] Duplicate VM entries after restore  


YannB
(@yannb)
Active Member
Joined: 5 months ago
Posts: 10
03/07/2019 11:23 am  

We have been running Pure for weeks now and are very happy that we can simply forget about it, because it just runs fine 🙂

Last night we had to use Pure to restore a very important macOS VM that we had accidentally deleted from vCenter. The restore worked fine (the GUI lost its connection to the server twice, so it was a bit suspense-packed, and there is no speed/progress info within Pure, so we had to guess how long it would take) and we were able to start the VM. In a second step we restored the most recent files from a Time Machine backup -- we also make Time Machine backups of the macOS VMs, as they run hourly and, in our case, go to a separate server setup.

We realised that the new VM ("mac-office", I gave it the same name as the old one that was deleted from vCenter) got a new MAC address and is now listed twice in Pure: the old VM "mac-office" as "Offline" and the new one on the dell-2 ESXi host. A fresh backup will start with the next schedule (tonight).

But how can we delete old VM backups within Pure? I would like to delete our offline VMs: "nix-server" (I can't explain why it was set to offline; a VM with the same name, "nix-server", is also located on dell-2) and "mac-office".

Is there a simple way or do we have to dig into the CLI?

Thanks a lot and regards,

Yann

PS: We are running low on space on our backup volume. Would it be better to expand it or to add a new one?


marijan
(@marijan)
Member Admin
Joined: 5 months ago
Posts: 18
03/07/2019 11:30 am  

Hi Yann,

I am very happy to hear that Pure is working as expected.
Let me try and clarify some of your questions...

MAC address
When restoring a VM, Pure obeys the MAC address preference configured on the original VM. In the majority of cases, this is set to 'generated' for ESXi-generated MACs or to 'assigned' for vCenter-assigned MACs. Only if the original VM had its MAC address manually configured will Pure restore it with the exact same address. This is to avoid conflicts in case the original VM is still present.
Does this make sense to you?
Do you strongly believe that the opposite approach would be better (forcing the very same MAC address even if it was auto-assigned originally)?
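
To illustrate the rule described above, here is a rough Python sketch (not Pure's actual code; 'addressType' follows vSphere's naming for virtual NICs, the rest is made up for the example):

def mac_for_restore(nic):
    """Return a MAC address to keep on the restored NIC, or None to let
    ESXi/vCenter assign a fresh one."""
    if nic.get("addressType") == "manual":
        return nic["macAddress"]   # manually configured MACs are preserved exactly
    return None                    # 'generated'/'assigned' MACs are regenerated

print(mac_for_restore({"addressType": "manual", "macAddress": "00:50:56:aa:bb:cc"}))
print(mac_for_restore({"addressType": "generated", "macAddress": "00:0c:29:11:22:33"}))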

Duplicate and/or 'offline' VMs in Pure
Pure tracks VMs separately from vSphere. A Pure VM will normally be linked with its vSphere counterpart, but it is perfectly normal to have a VM in Pure which no longer has a vSphere instance (e.g. when a vSphere VM gets deleted, like in your case). It is also possible to have a vSphere VM without a corresponding Pure VM (at least temporarily). This occurs when a new vSphere VM is discovered, before Pure creates its own representation. Btw., if a vSphere VM was never backed up by Pure and is later deleted from vSphere as well, Pure will remove this VM from its own inventory.

When a Pure VM has no vSphere counterpart (it could not be found on any of the configured vCenter/ESXi servers), it is shown as 'offline' in the Pure GUI. Offline VMs cannot be backed up (since there is no live vSphere VM to back up, at least at the moment) but they can still be used for restore.

Additionally, if there are multiple Pure VMs pointing to the same vSphere VM, only one of them (the one with the most recent backup snapshot) will be active and all the others will be declared offline. This situation is possible if you get creative with adding, removing and re-adding storage volumes, creating some backups first on one storage volume, then on another, and then adding them all back together. Btw., you can see the storage volume where a Pure VM is located by expanding the VM list table to include more details (an icon in the upper right part of the GUI).
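
If it helps, the duplicate handling can be pictured roughly like this (a Python sketch, not Pure's actual implementation; the field names are made up for the example):

from datetime import datetime

def resolve_duplicates(pure_vms):
    """pure_vms: Pure records that all point to the same vSphere VM; each has
    a 'last_snapshot' datetime, or None if it was never backed up."""
    active = max(pure_vms, key=lambda vm: vm["last_snapshot"] or datetime.min)
    for vm in pure_vms:
        vm["status"] = "active" if vm is active else "offline"
    return pure_vms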

VM Restore
During restore, Pure essentially creates a completely new VM in vSphere (of course). The fact that you gave this new VM the same name as the old one is coincidental. As far as Pure is concerned, they should be different VMs: the restore operation is performed on the old Pure VM, and a new vSphere VM is created in the process.

However, in your case both VMs have the exact same name and storage path (Pure uses the storage path of the .vmx file as a unique VM identifier). This resulted in the restore creating a new VM that points to the same vSphere VM. On the next VM list sync, one of those VMs was declared offline. In your case, it should be the newly restored VM, as it does not have any backups yet.
If I am not mistaken, because both VMs point to the same vSphere VM and one of them is essentially empty, one of them will get deleted when you restart Pure. This means that backups will continue with the old VM. But because on the vSphere side the VM is different (new), changed block tracking is reset and the very next backup will have to be a full backup. You would have gotten the same result if the old VM had been deleted instead of the new one, except that this way you still keep the old backups for some time.
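
Roughly sketched (again just an illustration with made-up field names, not Pure's code):

def next_backup_type(pure_vm, live_vm):
    """pure_vm: Pure's record for the VM; live_vm: what the VM list sync found
    at the same .vmx path in vSphere (or None if nothing was found)."""
    if live_vm is None:
        return "offline"       # no live VM, nothing to back up
    if live_vm["instance_uuid"] != pure_vm.get("instance_uuid"):
        return "full"          # a new vSphere VM reuses the old path: the CBT baseline is gone
    return "incremental"       # changed block tracking from the last backup can be reused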

Deleting a VM in Pure
There is a hidden icon at the left side of each VM entry in the Pure GUI (a '...' icon, visible when you hover over a table row). Clicking on this icon gives you a popup menu where you can manually start a backup for this VM, enable autobackups, or remove the VM from the inventory.
Clicking on 'remove' will give you more explanation, but essentially it immediately removes this VM from the Pure inventory (think delete) while leaving the actual backup data in storage until the next Pure restart. On the next restart, the backup data is physically deleted and the space reclaimed.

Adding more storage space
Pure organises storage into 'backup volumes' - essentially mount points that you provide.
One obvious method of increasing storage would be to externally increase the capacity of that mount point. Pure will detect that there is now more space and will continue as is.
Another option is to add more backup volumes (mount points). Anything acceptable to Linux (Ubuntu) is ok. However, depending on your backup situation, you may want to do some fiddling, as explained below 😀
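
To illustrate the first option: from Pure's point of view a backup volume is just a mount point, so 'detecting more space' is nothing more than a free-space check on that path. A minimal Python illustration (the path is only an example):

import shutil

usage = shutil.disk_usage("/mnt/pure-volume-1")   # mount point path is just an example
print(f"total {usage.total / 2**30:.0f} GiB, free {usage.free / 2**30:.0f} GiB")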

Fiddling and workarounds
When Pure discovers a new vSphere VM, it immediately assigns it to the primary backup volume. From that point onwards, this VM will be backed up only to that volume. When you add a new volume and change it to become the primary volume, all newly discovered VMs will be assigned there. However, already known VMs will remain where they are.
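
Sketched as code, the assignment rule looks roughly like this (illustration only, not Pure's implementation):

def assign_volume(vm_id, volumes, assignments):
    """Pin a newly discovered VM to the current primary volume; already known
    VMs keep their assignment, no matter which volume is primary now."""
    if vm_id not in assignments:
        assignments[vm_id] = next(v["name"] for v in volumes if v.get("primary"))
    return assignments[vm_id]

volumes = [{"name": "vol-old"}, {"name": "vol-new", "primary": True}]
assignments = {"vm-1": "vol-old"}                      # vm-1 was discovered earlier
print(assign_volume("vm-1", volumes, assignments))     # stays on vol-old
print(assign_volume("vm-2", volumes, assignments))     # new VM goes to vol-new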

Unfortunately, the 'VM migration' feature did not make it into Pure 2.0.0 and was left for one of the future updates. The idea behind this feature was to allow a Pure VM to be migrated from one backup volume to another, either by copying all backup data or by discarding the existing data and starting from scratch on the new volume. Since volumes can have different chunk size settings, the operation is not trivial and is therefore still unavailable.

However, knowing what you know about VM assignment and deletion, you can move existing VMs to the new volume with these steps:
1. add a new volume
2. make it the primary volume (and remove the 'primary' attribute from the old one)
3. delete (remove from the Pure repository) those VMs that you wish to have assigned to the new volume
4. on the next VM list refresh, Pure will re-discover those VMs in the vSphere inventory and add them to the Pure inventory again. Only now will they be assigned to the new volume (because it is the primary one)
5. restart Pure to delete the old backup data from the old volume.
PLEASE DO NOTE that while this has the desired effect of re-assigning the VM to a different volume, in reality you are deleting a VM and creating it again. You will lose all existing backups for that VM, as well as any backup scheduling and tagging information. But if your current volume is getting full and you must make some space, or if the VMs you want to move have no existing backups yet, this workaround could work for you.

Best regards,
Marijan


YannB
(@yannb)
Active Member
Joined: 5 months ago
Posts: 10
03/07/2019 11:52 am  

Hi Marijan,

Many thanks for the thorough explanations!

MAC address
When restoring a VM, Pure obeys the MAC address preference configured on the original VM. In the majority of cases, this is set to 'generated' for ESXi-generated MACs or to 'assigned' for vCenter-assigned MACs. Only if the original VM had its MAC address manually configured will Pure restore it with the exact same address. This is to avoid conflicts in case the original VM is still present.
Does this make sense to you?
Do you strongly believe that the opposite approach would be better (forcing the very same MAC address even if it was auto-assigned originally)?

No, I am not convinced that it would be better to force the same MAC address. This may lead to duplicates, which is worse than re-generated random MAC addresses.

Is the UUID of the restored VM also regenerated?

Duplicate and/or 'offline' VMs in Pure

[...]

I get it.

Btw., you can see the storage volume where a Pure VM is located by expanding the VM list table to include more details (an icon in the upper right part of the GUI).

That's a good point; I prefer this view to the reduced one.

VM Restore
[...]
If I am not mistaken, because both VMs point to the same vSphere VM and one of them is essentially empty, one of them will get deleted when you restart Pure. This means that backups will continue with the old VM. But because on the vSphere side the VM is different (new), changed block tracking is reset and the very next backup will have to be a full backup. You would have gotten the same result if the old VM had been deleted instead of the new one, except that this way you still keep the old backups for some time.

In our case the old VM still exists and the new one has been added. The old one can no longer be backed up (because it no longer exists in vSphere); the new one is backed up regularly. I will wait a few days before deleting the backups of the old VM.

Deleting a VM in Pure
There is a hidden icon at the left side of each VM entry in the Pure GUI (a '...' icon, visible when you hover over a table row). Clicking on this icon gives you a popup menu where you can manually start a backup for this VM, enable autobackups, or remove the VM from the inventory.

Thank you, I'll take care of it. I had forgotten about the three dots in the GUI...

Adding more storage space
Pure organises storage into 'backup volumes' - essentially mount points that you provide.
One obvious method of increasing storage would be to externally increase the capacity of that mount point. Pure will detect that there is now more space and will continue as is.

Since the storage is an NFS share that itself lives on ZFS, it was sufficient to increase the quota of the share. The Pure VM was switched off, and after switching it on again the storage had been extended. That's great!
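
For reference, on the ZFS side this boiled down to a single property change, roughly like the following (dataset name and quota are placeholders for our setup; shown via Python's subprocess just to have a runnable snippet):

import subprocess

# Raise the quota on the ZFS dataset that backs the NFS share.
subprocess.run(["zfs", "set", "quota=2T", "tank/pure-backups"], check=True)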

Fiddling and workarounds

I think the easiest way to handle Pure is to use only one data repository. In our setup, where the data is on a ZFS/NFS share, using another data repository doesn't make sense.

Unfortunately, the 'VM migration' feature did not make it into Pure 2.0.0 and was left for one of the future updates. The idea behind this feature was to allow a Pure VM to be migrated from one backup volume to another, either by copying all backup data or by discarding the existing data and starting from scratch on the new volume. Since volumes can have different chunk size settings, the operation is not trivial and is therefore still unavailable.

The VM migration feature would be good, but is not a must-have. An interesting option would be to run the same backup on two repositories for redundancy. Veeam uses a storage replication task. In case one storage breaks down, the backup would still be available on the second storage. 

But even this is just an idea; if the feature were there, I would most likely not use it (although I used it with Veeam). I want to simplify our VM backup infrastructure, not complicate it. 

However, knowing what you know about VM assignment and deletion, you can move existing VMs to the new volume with these steps:

But it would be a waste to lose the old backup data. If we can't increase our data storage, that would be an option to free up some space.

I still have two questions: 
- Which parallelism setting do you recommend as a good compromise between performance and resources? Currently we have set a value of 2 for a Pure VM with 2 CPUs and 2 GB RAM. Will the backup time be significantly reduced if we increase this value? By how much should we increase it?
- The verify process takes a relatively long time: does it need to be activated? Can we determine if the verification actually found and repaired corrupted data? In your opinion, is it needed in our case where we write data to an NFS/ZFS volume?

Thanks Marijan for your help. I think you convinced me to use Pure instead of Veeam. 🙂


marijan
(@marijan)
Member Admin
Joined: 5 months ago
Posts: 18
03/07/2019 12:15 pm  

Hi Yann.

I'll just quickly answer your questions.

Is the UUID of the restored VM also regenerated?

The uuid (along with instanceUuid and some other parameters) is not restored; it is skipped so that vSphere can generate values that will not conflict with the current state. Since none of these are guaranteed to be unique except within a single vSphere environment, restoring the exact values into a different vSphere could cause problems.
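
As a rough illustration of the idea (field names follow vSphere's VirtualMachineConfigSpec; this is not Pure's actual code):

SKIPPED_FIELDS = {"uuid", "instanceUuid"}   # plus a few other parameters, omitted here

def config_for_restore(saved_config):
    """Drop the identity fields from the saved VM config so that vSphere
    assigns fresh, non-conflicting values during restore."""
    return {key: value for key, value in saved_config.items() if key not in SKIPPED_FIELDS}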

In our case the old VM still exists and the new one has been added.
The old one can no longer be backed up (because it no longer exists in vSphere);
the new one is backed up regularly. I will wait a few days before deleting the backups of the old VM.

Ah, then I misunderstood the situation and Pure did not, in fact, merge the two VMs into one.
Was the restored VM placed in the exact same location as the one that was deleted earlier (e.g. [Same datastore]/)?

> I think the easiest way to handle Pure is to use only one data repository.

I agree. Since a backup volume is just a mount point, this allows you to increase capacity in many ways, depending on the actual hardware backing.

An interesting option would be to run the same backup on two repositories for redundancy. Veeam uses a storage replication task. In case one storage breaks down, the backup would still be available on the second storage. 

What we are trying to achieve with Pure 2.0 is to have backups stored in such a way that it is easy (trivial) to create mirror copies using P5 or other external tools. This also opens up the possibility of backing up to tape and to the cloud using time-tested and reliable methods.
Backup replication directly from Pure is certainly something we will consider for future updates, as is some smart interconnection with P5, so that you could e.g. select the required VM backup snapshot from within the P5 GUI, pull it from tape and send it back to Pure. But these are just my wishes for now.

which parallelism setting do you recommend as a good compromise between performance and resources? 

This is something I encourage you to play with and report back real-world measurements. As far as I could see from testing, the bottleneck is always disk IO. Throwing large amounts of RAM at Pure did not significantly increase performance - Pure will happily use up all the available memory, but it will just act as an oversized copy buffer. Neither backup nor verification is very CPU intensive - most of the time, Pure simply moves data from A to B.
If you have the time and resources, it would be great if you could try different settings and give us some feedback.

 The verify process takes a relatively long time: does it need to be activated?

The verification process was introduced as a way to make sure that what ends up in the backup really does correspond to the actual VM state. This would detect read, write and network transport errors - all of which are highly unlikely. Additionally, using ZFS should eliminate read/write errors.
However, the main reason to introduce this feature was the nasty CBT bug that was active when we started developing and which went without a bugfix for a very long time (months? years?). The result was backup corruption in certain cases, so we wanted to provide some means of checking and correcting this. So essentially, verify protects you from VMware bugs.

 Can we determine if the verification actually found and repaired corrupted data?

You can check the details of a verification (and other jobs) in the Event Log. There is again a hidden '...' icon to the left of each entry. If there was nothing to correct during verification, you will find entries such as 'verifying : all blocks are consistent' and, near the end, 'total backup 0/xxxxx (0/xMB): done.'. A successful verify won't write anything to the existing snapshot. The other number, 'xxxxx', is the amount of data that was changed on the vSphere VM between the backup snapshot being verified and now. This is the amount of data we cannot verify, since it has changed - we have nothing to verify it against.
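
Purely as an illustration, if you had those log entries available as plain text, a 'clean verify' boils down to checking for those two messages (the helper below is hypothetical, not part of Pure):

def verify_was_clean(log_lines):
    """log_lines: the job's event log entries as plain text, one per line."""
    consistent = any("all blocks are consistent" in line for line in log_lines)
    nothing_corrected = any(line.strip().startswith("total backup 0/") for line in log_lines)
    return consistent and nothing_corrected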

In your opinion, is it needed in our case where we write data to an NFS/ZFS volume?

Verification was designed to be an optional feature, and new backups always have greater priority than verification. The idea is for you to configure backup windows that you can live with; Pure will make sure to do backups first and, if there is enough time left, it will start verifying some of the VMs. Those that do not make it will have a higher priority next time. Additionally, you can enable or disable verification for each backup window separately, so you could e.g. configure daily backups without verification and then have a separate backup window with verification over the weekend.

I am glad you like Pure and am grateful for all the feedback.

Cheers

