Hello,
We are currently stuck with a bad design: a Windows Server 2016 storage server (a virtual machine hosted on VMware ESXi) accesses its storage through virtual disks.
These virtual disks are located on an iSCSI datastore mounted on the ESXi host. The VM has 4 vdisks using this storage.
The iSCSI storage array is in a different datacenter than the ESXi host (and thus the Windows server).
This is very bad by design but the situation is "like this" and I have to deal with it.
We have had 2 network outages between the two datacenters, and each time, all the previous versions on some of the storage volumes were lost.
In the first network outage we lost the previous versions on disks 1 and 2, and in the second one we lost the previous versions on disk 3 but not on the other disks.
I think it depends on disk activity at the moment the network outage happens.
Also take note that the data volumes keep their storage space for VSS snapshots on a different volume, V: (also hosted on the iSCSI datastore).
In the event log, it looks like this (this time the G: volume lost its snapshots):
First, some warnings about the storage not being available:
![]()
Then some errors like this:
![]()
Also some of these:
![]()
And then, the terrible, the horrific, the end of all:
![]()
"The shadow copies of volume G: were aborted because of an IO failure on volume V:"
I know this is wrong by design, but I would like to know if I could tweak some values to avoid or at least mitigate this.
Like disk timeouts or whatever.
Because network outages will happen again, but they always last between 1 and 15 minutes, and if I could tell Windows to wait a bit longer before deciding to erase all snapshots, it would be great!
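To make the "disk timeouts" idea concrete: one candidate is the Windows disk class timeout, the `TimeoutValue` entry (a REG_DWORD in seconds) under `HKLM\SYSTEM\CurrentControlSet\Services\Disk`. Below is a minimal sketch of the arithmetic for a 15-minute worst case. The 60-second margin and the helper names are my own assumptions, and I have not verified that raising this value keeps the shadow copies alive; the ESXi host also has its own handling of lost iSCSI paths, so the guest setting alone may not be enough. The script only builds the command, it does not run it.

```python
import math

# Registry location of the Windows disk class timeout (REG_DWORD, in seconds).
DISK_KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\Disk"
VALUE_NAME = "TimeoutValue"


def timeout_seconds(worst_outage_minutes: float, margin_seconds: int = 60) -> int:
    """Seconds needed to outlast the worst outage, plus a margin.

    The default margin is an arbitrary assumption, not a recommended value.
    """
    return math.ceil(worst_outage_minutes * 60) + margin_seconds


def reg_add_command(seconds: int) -> str:
    """Build (but do not run) the reg.exe command that would set the timeout."""
    return f'reg add "{DISK_KEY}" /v {VALUE_NAME} /t REG_DWORD /d {seconds} /f'


if __name__ == "__main__":
    seconds = timeout_seconds(15)  # longest outage seen so far: 15 minutes
    print(seconds)
    print(reg_add_command(seconds))
```

Run on any machine, this just prints the value and the command to review; the command itself would have to be run from an elevated prompt on the Windows server.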
Thanks for any help :)