How does Pure transfer data during Backup and Restore operations?
Archiware Pure can access Virtual Machine data using two different methods, usually referred to as transport modes. This article explains how they work and how they differ.
HotAdd transport mode
When Archiware Pure is running as a Virtual Appliance within the same vSphere environment that it is protecting, it is able to leverage the vSphere HotAdd feature to attach and detach SCSI virtual disks without shutting down. During backup, Pure instructs the vSphere Infrastructure server (either a vCenter or a standalone ESXi) to temporarily attach the virtual disks of the target VMs to the Pure Virtual Appliance. These disks then appear as local SCSI drives and Pure is able to access their data directly, using the ESXi I/O stack.
Because the data transfer speed is higher than in network transport mode and because it does not impact LAN bandwidth, the HotAdd mode is preferred and automatically chosen if the requirements are met.
The requirements for the HotAdd transport mode are:
- Pure runs as a Virtual Appliance (VA) within the protected vSphere environment
- The ESXi host running the Pure VA has access to the datastores where the target VMs reside
Because HotAdd only works for SCSI disks and there is a limit of 15 disks that can be attached for each SCSI controller, Pure is limited to a maximum of 45 simultaneously attached virtual disks, meaning that no more than 45 Backup/Verify/Restore/Restore File jobs can be running at a time.
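The limit above is simple arithmetic, which the following minimal sketch spells out. The figure of 15 disks per controller and the total of 45 come from this article; the count of three controllers is only inferred from 45 / 15 and is an assumption, as are all function and constant names.

```python
# Slots per SCSI controller, as stated in the article.
DISKS_PER_SCSI_CONTROLLER = 15
# Assumption: inferred from 45 / 15; the article does not state this number.
CONTROLLERS_FOR_HOTADD = 3


def max_hotadd_disks(controllers=CONTROLLERS_FOR_HOTADD,
                     per_controller=DISKS_PER_SCSI_CONTROLLER):
    """Return how many virtual disks can be hot-added at the same time."""
    return controllers * per_controller


def can_attach(currently_attached, requested=1):
    """Return True if `requested` more disks fit under the limit."""
    return currently_attached + requested <= max_hotadd_disks()


print(max_hotadd_disks())
```

With 44 disks already attached one more disk still fits, while a request arriving at 45 attached disks would have to wait until a slot is released.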
Network transport mode
If the HotAdd requirements are not met, Pure will automatically fall back to the Network transport mode. This includes two transport modes that VMware refers to as NBD and NBDSSL, with Pure choosing the SSL-protected method (NBDSSL) by default.
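The selection rule described so far can be sketched as a small decision function. This is an illustration of the logic in this article, not Pure's actual implementation: the `Environment` class, the function name and the `prefer_ssl` switch are hypothetical, and the returned strings simply follow VMware's names for the transport modes.

```python
from dataclasses import dataclass


@dataclass
class Environment:
    # Pure runs as a Virtual Appliance inside the protected vSphere environment.
    runs_as_virtual_appliance: bool
    # The ESXi host running the appliance can access the target VMs' datastores.
    host_sees_target_datastores: bool


def choose_transport_mode(env, prefer_ssl=True):
    """Pick HotAdd when both requirements hold, otherwise a network mode."""
    if env.runs_as_virtual_appliance and env.host_sees_target_datastores:
        return "hotadd"
    # Fallback: network transport, SSL-protected by default.
    # prefer_ssl is a hypothetical switch, not a documented Pure setting.
    return "nbdssl" if prefer_ssl else "nbd"


print(choose_transport_mode(Environment(True, True)))
print(choose_transport_mode(Environment(False, True)))
```

A Pure installation on a physical server or NAS never satisfies the first requirement, so under this sketch it always ends up in a network mode.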
The Network transport mode uses VMware's closed-source implementation of the Network File Copy (NFC) protocol to transfer backup data using the ESXi networking stack. The only requirement is that Archiware Pure has network connectivity to the target ESXi hosts on TCP port 902.
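The connectivity requirement can be checked from the machine running Pure with a plain TCP connection attempt. The sketch below uses only the Python standard library; the function name and the example host name are hypothetical. A successful connection only shows that the port is reachable, not that an NFC transfer will succeed.

```python
import socket


def can_reach_nfc_port(host, port=902, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Example usage (the host name is a placeholder):
# print(can_reach_nfc_port("esxi01.example.com"))
```

If the check fails, a firewall between Pure and the ESXi host blocking TCP port 902 is a likely first thing to look at.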
Network transport mode is typically much slower than HotAdd, especially on links slower than 10 Gb Ethernet, and it adds load to the LAN. It is nevertheless a valid choice, because having almost no requirements gives it considerable flexibility.
This allows installing Pure on any of the supported platforms, such as physical Linux-based servers, NAS devices and remote servers, as long as there is network connectivity between Pure and the target vSphere. Because direct storage access is not needed, ESXi hosts can use local storage to host VMs and no longer require shared datastores, which is a great advantage for smaller installations with fewer hosts.
Thank you for this information, Marijan Kozic.
Is there a way to tell from the logs which method was chosen by Pure to back up a specific VM?
As far as I remember, the Quick Verify option applies to the whole job, so when you create a backup window, you have the option to check the Quick Verify box. If you need it applied only to some of the VMs, you can simply schedule them in separate windows, even if they overlap. When the second window starts, its VMs will be added to the queue and processed as soon as possible, while respecting their Quick Verify setting.
Thank you. I have only 1 backup window: start at 20:00, run for 10 hours
It is applied to all of my VMs, and so "quick" is checked for all of them. I have my parallelism set to only 1 (I tried higher but it had a negative effect on speed). So they run in sequence. But, for example, right now there is a VM that is still "verifying" even though it is well past the backup window stop time. And, Last Backup Disk usage was 4.2GB which completed in just 0min 59sec, yet Verify runs for 12+ hours? Something seems off. Total disk size of that VM is 370GB.
The fact that a backup or a verify runs even past the backup window closing time is normal. Pure will not start a new job once the window closes, but VMs already running will be allowed to finish. However, running for 12+ hours when the backup runs in under a minute does indicate a problem. It could be something resolvable by a simple restart or it could be something deeper, but without looking at the logs, it is hard to tell.
Is the total disk size of 370GB the total disk capacity, or the actually used size (with the capacity being even higher than that)?
I believe the logs should indicate whether Quick Verify was successfully used (it only applies to thin disks), so you might try looking there. Actually, the Event log should also indicate this in the job details.
@marijan Thank you again for the reply. I will try a restart of the host. Total disk size is 370GB (disk is thick provisioned) so I guess it has to check each block. As for logs, I looked but it does not indicate "quick" anywhere, even though I do have the checkbox enabled in the window setting.