A few simple notes on Win2003/Win2008 cluster repair and commands.

Version : 1.1
Date : 24/04/2012
By : Albert van der Sel
For who : For anyone who needs to maintain clusters, and work with cluster commands


This simple note might be used as a humble source of hints on how to solve problems.
It also contains some general information about Windows Failover Clusters.
But it is certainly not advanced: it only seemed handy to have some common info gathered into one document.

Still, the following is very important:

- You should always crosscheck the hints and methods with other sources, or your support services.
- In case of any doubt, do NOT use the suggestions listed in this note.


There are quite a few differences between Win2K3 and Win2K8 clusters, and that's also true for many repair options.
So be careful to note to which platform a certain method applies.


Main Contents:

1. Win2K3: Repair the Cluster Database and Quorum files.
2. Win2K3/Win2K8: How Cluster disks are registered.
3. Some simple "diskpart" sessions to show differences in Disk Signatures.
4. Win2K3: Repair Disk ID's.
5. Win2K8: Repair DiskID's, Cluster Database, and Quorum files.
6. Example Directory listings "%systemroot%\cluster"
7. Simple flow of a Planned shutdown of a 2 node cluster.
8. Some "cluster.exe" command examples.
9. Some notes on the Cluster log
10. Some notes on Storage
11. Some notes on Storage problems
12. Some general guidelines on Replacing a failed Disk


1. Win2K3: Repair the Cluster Database and Quorum files.

Essentials:

Use only on a Windows 2003 Cluster:

This section deals with:
- A Cluster on Win2K3
- The Quorum files: Chkxxx.tmp and Quolog.log
- The Cluster database: %systemroot%\cluster\CLUSDB

Error: How to handle a situation where the Cluster Quorum might be missing or corrupt.


The Quorum files are present on the Quorum drive (say Q:) in the "Q:\MSCS" directory.
The Cluster Database is present in the "%systemroot%\cluster" directory.

- For the Quorum, the following files should exist:

⇒ Chkxxx.tmp

It is a sort of copy of the cluster configuration database "%Systemroot%\Cluster\CLUSDB".
Suppose you have a 2 node cluster. Both nodes then will have a "%Systemroot%\Cluster\CLUSDB" database.
Just consider this quorum file (on a shared disk) as the master compared to those local databases.
To illustrate that: suppose you add a node, then it will create a local CLUSDB copy from the Chkxxx.tmp file.

Here we will consider the situation where the file(s) in Q:\MSCS have gone bad.

If you have the original release of Windows Server 2003, the Cluster service will not start if
the Chkxxx.tmp file is missing or corrupt. However, if you have Windows Server 2003 as of SP1,
in many situations the Cluster service can automatically re-create this file if it is missing or corrupt.

⇒ Quolog.log

The quorum log, which records changes to the cluster configuration database, but only those changes
that occur while one or more nodes are down. The file exists even when all nodes are functioning,
but information is added to it only when a node is offline. Information in the log is carefully
marked according to sequence and timing so that it can be used correctly when nodes go down and come back up.

If you have the original release of Windows Server 2003, the Cluster service will not start if Quolog.log
is missing or corrupt. However, if you have Windows Server 2003 as of SP1, in many situations the Cluster service
can automatically re-create this file if it is missing or corrupt.

⇒ Cluster database CLUSDB:

The cluster database is a hive under HKLM\Cluster that is physically stored as the file "%Systemroot%\Cluster\Clusdb".
When a node joins a cluster, it obtains the cluster configuration from the quorum and downloads it to this
local cluster database.

Repairs:

1. To replace a missing or corrupt "Chkxxx.tmp" file, or "Quolog.log", or both on the quorum resource:

If the Cluster service is running (with the /fixquorum option), stop it by typing:

net stop clussvc

On a node that was functioning correctly when problems with the quorum resource appeared,
restart the Cluster service with the "/resetquorumlog" (/rq) option by typing:

net start clussvc /resetquorumlog

The resetquorumlog command creates a new quorum log file, if missing or corrupted, using information stored
in the local node's cluster database and creates a new registry checkpoint file. The new quorum file
is created using information in the cluster database located in %systemroot%\cluster\CLUSDB. If the quorum log file
is not missing or corrupt, this command has no effect.

So the /resetquorumlog option will recreate, if necessary:

-Quolog.log
-Chkxxx.tmp (registry checkpoint file)

Conclusion:
If you have a missing or corrupt Quorum, and the clussvc service won't start, just use
net start clussvc /resetquorumlog
which will clean up the corrupt Quorum files, and create new Quorum files (from the local CLUSDB copy).


2. To replace a missing or corrupt "CLUSDB" file:

Like any file, the cluster configuration file (CLUSDB) on a node can become corrupted.

If you try to start the Cluster service with the /fixquorum option on one node at a time
and discover that this fails on one node although it succeeds on another, the cluster configuration file
on the node from which you cannot start the Cluster service might be corrupt.

In Cluster Administrator, view the functioning nodes in the cluster, and find the node that owns the quorum resource.

From the node that owns the quorum resource, view the files on the quorum resource, and locate the Chkxxx.tmp file.

On the problem node, which is not joined to the cluster at the moment, in the %systemroot%\cluster folder,
locate the CLUSDB file (which you have determined is corrupt) and then rename it.

Copy the Chkxxx.tmp file to the %systemroot%\cluster folder on the problem node, and then rename that file CLUSDB.

If the problem has been corrected on the node, you will be able to start the Cluster service with no start parameters.
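
The file steps above (rename the suspect CLUSDB, then copy the checkpoint file in its place) can be sketched in Python as below.
This is only a minimal sketch under my own assumptions: the function name "replace_clusdb" and the backup name "CLUSDB.corrupt"
are made up for illustration, the Cluster service on the problem node is assumed to be stopped, and the quorum folder is assumed
to be reachable from where the script runs. Crosscheck with your support services before touching a real cluster.

```python
import shutil
from pathlib import Path


def replace_clusdb(quorum_dir: Path, cluster_dir: Path) -> Path:
    """Rename the local CLUSDB and copy the quorum checkpoint file in its place.

    quorum_dir  : folder holding the Chkxxx.tmp file (for example Q:\\MSCS)
    cluster_dir : %systemroot%\\cluster folder of the problem node
    Returns the path of the renamed (suspect) CLUSDB file.
    """
    # The real checkpoint name varies (Chkxxx.tmp), so match it case-insensitively.
    checkpoints = [
        p for p in quorum_dir.iterdir()
        if p.is_file()
        and p.name.lower().startswith("chk")
        and p.name.lower().endswith(".tmp")
    ]
    if len(checkpoints) != 1:
        raise RuntimeError(
            f"expected exactly one Chkxxx.tmp file, found {len(checkpoints)}"
        )

    clusdb = cluster_dir / "CLUSDB"
    backup = cluster_dir / "CLUSDB.corrupt"  # hypothetical backup name
    if backup.exists():
        raise FileExistsError(f"{backup} already exists, refusing to overwrite it")

    # Keep the suspect file instead of deleting it.
    if clusdb.exists():
        clusdb.rename(backup)

    # Copy the checkpoint file and give it the name CLUSDB.
    shutil.copy2(checkpoints[0], clusdb)
    return backup
```

On a real node you would call it with something like replace_clusdb(Path(r"Q:\MSCS"), Path(r"C:\WINDOWS\Cluster")),
and afterwards try to start the Cluster service without start parameters.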


3. Force a cluster without quorum, to start with the /forcequorum switch :

If you do not have a "majority" of voting entities (nodes and the Quorum/Witness disk), you can still try to start
the Cluster using the "/forcequorum" switch.

Warning: this could be a dangerous option, so, only use it for very specific and controlled situations.
You must crosscheck this option with other sources of information.

Suppose your Cluster uses some majority node set model. Suppose you have 4 nodes, and two have presumably crashed.
Suppose you cannot be fully sure of the latter, because those two nodes are remote, and there might be a severe communication
failure.
In this case, you have to make sure that those nodes are really shut down. Maybe you can simply disconnect
that remote site, if possible.

You are then still in a situation where the "quorum is lost", that is, the number of voting devices is below the
minimum required.

Suppose that at your site you must start the cluster. Say that at your site, the (good) nodes
"node3" and "node4" are present. This site can be forced to continue even though the Cluster service thinks
it does not have quorum.

Then:

.Keep the other nodes down or disconnected.

.Start the cluster service on node3 and node4 with the special switch "/forcequorum node3,node4"
For example, on node 3 use: net start clussvc /forcequorum node3,node4

.Do not modify this partial cluster, for example by adding nodes, moving groups, or making any other Cluster modification.

If the normal situation is restored again (node1 and node2 are good again), shut down all nodes,
and start up all machines in the usual manner (without the /forcequorum switch).
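
To see why the two surviving nodes in the example above do not have quorum by themselves, here is a small Python sketch
of a plain majority rule. It is an assumption-laden simplification: the function name "has_quorum" is made up, every voting
entity is assumed to have exactly one vote, and the real Cluster service applies more rules than this.

```python
def has_quorum(total_votes: int, online_votes: int) -> bool:
    """Return True when strictly more than half of all configured votes are present."""
    if total_votes <= 0 or not 0 <= online_votes <= total_votes:
        raise ValueError("vote counts are inconsistent")
    return online_votes > total_votes // 2


# The 4 node example: only node3 and node4 are up, so there is no majority.
print(has_quorum(total_votes=4, online_votes=2))
# With three of the four nodes up, the majority is reached.
print(has_quorum(total_votes=4, online_votes=3))
```

With 2 out of 4 votes the rule is not satisfied, which is exactly the situation where /forcequorum comes into play.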



2. Cluster disk registrations.

Essentials:
- Win2K3 and Win2K8
- How cluster disks are "found" and recognized by the cluster service and associated drivers

Win2K3 and Win2K8 identify the cluster disks in different ways.


⇒ In Win2K3 the "Disk Signature" is found by inspecting

HKEY_LOCAL_MACHINE\Cluster\Resources\ResourceGUID\Parameters

The corresponding Device name is then found by inspecting:

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\ClusDisk\Parameters\Signatures\ (Win2K3)


⇒ In Win2K8, the Cluster Service uses two attributes to identify a cluster-controlled disk,
namely the Disk Signature and the LUN ID.

The following figures will illustrate this:

Fig 1. Windows Server 2003: find the Disk Signature directly from HKLM\Cluster\Resources\ResourceGUID\Parameters



Fig 2. Windows Server 2003: find the Disk Device from HKLM\System\CurrentControlSet\Services\ClusDisk\Parameters\Signatures



Fig 3. Windows Server 2008: find the Disk GUID from HKLM\Cluster\Resources\ResourceGUID\Parameters\DiskIDGuid



Fig 4.


These differences between Win2K3 and Win2K8 have important consequences for repairs when something
is "broken".

  • In Win2K3, if for some reason the Disk Signature has gone bad, you need to use DUMPCFG (or another tool),
    to set matters straight again.

  • In Win2K8, using tools like DUMPCFG is almost unnecessary, because Win2K8 in most cases can repair a broken
    situation. In Failover Cluster Manager, you have a "repair" option that will set things straight.


See also section 3 to see the differences in DiskID (Win2K3 with MBR disk) and Disk GUID (Win2K8 with GPT disk)
as shown by the DISKPART utility.



3. Example DISKPART sessions.

Essentials:
- Win2K3 and Win2K8
- How cluster disks are shown by the DISKPART utility on those Operating Systems.

Here, we will display two DISKPART sessions, one on a Win2K8 cluster node,
and one on a Win2K3 cluster node.

DISKPART is a command-line utility for low-level disk actions. It can be used for locally attached disks, as well
as for LUNs exposed from a SAN. Here, we will only use DISKPART for displaying information.

Win2K8 node:

C:\TEMP> diskpart

DISKPART>

DISKPART> list disk

Disk ###  Status         Size     Free     Dyn  Gpt
--------  -------------  -------  -------  ---  ---
Disk 0    Online           32 GB      0 B
Disk 1    Online           15 GB  1024 KB
Disk 2    Online           20 GB  1024 KB
Disk 3    Online           16 GB  1024 KB
Disk 4    Reserved         19 GB      0 B        *
Disk 5    Reserved        140 GB      0 B        *
Disk 6    Reserved        100 GB      0 B        *
Disk 7    Reserved         80 GB      0 B        *
Disk 8    Reserved         25 GB      0 B        *
Disk 9    Reserved        517 MB      0 B        *
Disk 10   Reserved        502 MB      0 B        *

DISKPART> select disk 6

Disk 6 is now the selected disk.

DISKPART> detail disk

NETAPP LUN SCSI Disk Device
Disk ID: {ED209434-0A84-497F-86DB-961ED43C65AC}
Type : iSCSI
Status : Reserved
Path : 0
Target : 0
LUN ID : 1
Location Path : UNAVAILABLE
Current Read-only State : No
Read-only : No
Boot Disk : No
Pagefile Disk : No
Hibernation File Disk : No
Crashdump Disk : No
Clustered Disk : Yes

Volume ###  Ltr  Label        Fs     Type        Size     Status     Info
----------  ---  -----------  -----  ----------  -------  ---------  --------
Volume 9     L   SAN LOG1     NTFS   Partition     99 GB  Healthy

DISKPART> help

Will show you all possible commands.


Win2K3 node:

C:\TEMP>diskpart

DISKPART>

DISKPART> list disk

Disk ###  Status      Size     Free     Dyn  Gpt
--------  ----------  -------  -------  ---  ---
Disk 0    Online        68 GB      0 B
Disk 1    Online       502 MB      0 B
Disk 2    Online        96 GB      0 B
Disk 3    Online        76 GB      0 B
Disk 4    Online        36 GB      0 B
Disk 5    Online        16 GB      0 B
Disk 6    Online        86 GB      0 B
Disk 7    Online        75 GB      0 B
Disk 8    Online        41 GB      0 B
Disk 9    Online        15 GB      0 B
Disk 10   Online        76 GB      0 B
Disk 11   Online        16 GB      0 B

DISKPART> select disk 6

Disk 6 is now the selected disk.

DISKPART> detail disk

NETAPP LUN SCSI Disk Device
Disk ID: 7690CC96
Type : iSCSI
Bus : 0
Target : 0
LUN ID : 5

Volume ###  Ltr  Label        Fs     Type        Size     Status     Info
----------  ---  -----------  -----  ----------  -------  ---------  --------
Volume 5     J   SAN_LOG1     NTFS   Partition     86 GB  Healthy

DISKPART>


Did you notice the difference in "Disk ID" between the Win2K8 node (using GPT) and the Win2K3 node (using MBR)?
In these examples, the Win2K3 node uses a short hexadecimal "Disk ID" (here shown as 7690CC96) with a Master Boot Record (MBR) disk,
while the Win2K8 node uses a Disk ID in the form of a GUID with a GUID Partition Table (GPT) disk (here shown as ED209434-0A84-497F-86DB-961ED43C65AC).
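
If you collect "Disk ID" values from many nodes, a few lines of Python can tell the two styles apart. This is just a sketch:
the function name "disk_id_style" is my own, and it only looks at the shape of the string (8 hex digits versus a GUID)
as printed by "detail disk" in the sessions above.

```python
import re

# 8 hexadecimal digits, as in the Win2K3 (MBR) session above.
MBR_SIGNATURE = re.compile(r"^[0-9A-Fa-f]{8}$")
# A GUID, optionally wrapped in braces, as in the Win2K8 (GPT) session above.
GPT_GUID = re.compile(
    r"^\{?[0-9A-Fa-f]{8}(-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}\}?$"
)


def disk_id_style(disk_id: str) -> str:
    """Classify a DISKPART 'Disk ID' value as 'MBR', 'GPT' or 'unknown'."""
    value = disk_id.strip()
    if MBR_SIGNATURE.match(value):
        return "MBR"
    if GPT_GUID.match(value):
        return "GPT"
    return "unknown"


print(disk_id_style("7690CC96"))
print(disk_id_style("{ED209434-0A84-497F-86DB-961ED43C65AC}"))
```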



4. Win2K3: Repair Disk ID's.

Essentials:

Use only on a Windows 2003 Cluster:

A physical disk resource may fail to come online, and/or the Cluster service may fail to start.
Sometimes, the following entry can be found in the system log:
Event ID: 1034
Source: ClusDisk
Description: The disk associated with cluster disk resource DriveLetter could not be found.
The expected signature of the disk was "DiskSignature".

The DiskID of a disk is not what the cluster expected it to be.
For some reason it's missing, or it's a signature different from the one that is registered in the CLUSDB.

On Win2K3, Disk signatures are stored on the physical disk in the master boot record (MBR).

4.1 Using "dumpcfg.exe"

On Win2K3, the "dumpcfg" utility has actually been replaced by the "ClusterRecovery utility".
We still mention "dumpcfg", because on Win2K3 it still works, and many sysadmins still use it.
Furthermore, the "ClusterRecovery utility" is for 32-bit systems only, which severely limits its applicability.

If you are sure the problem really is due to a changed Disk ID (and the system log points to that fact),
then on Win2K3, you might use "Dumpcfg.exe" to write the expected signature back to that disk.

The signatures of the disks are also registered in the following registry subkey:

HKLM\System\CurrentControlSet\Services\ClusDisk\Parameters

For example:

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ClusDisk\Parameters\Signatures\547770D6]
"DiskName"="\\Device\\Harddisk8"

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ClusDisk\Parameters\Signatures\7690CC8C]
"DiskName"="\\Device\\Harddisk9"
etc..

The "Clusdisk.sys" driver needs this information to bring the disk resources online.

From the active node, that is, the owner of the disk resource, do the following:

-Make a careful note of the DiskID signatures and the Disk numbers.
-From Computer Management, Disk Management, make a list of labels and Disk numbers.
-From the system log, write down the missing Disk ID (signature) and the disk number or drive letter.

Compare the lists carefully.
You should be able to deduce which Disk number has the wrong DiskID or Signature.
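
Comparing the lists by hand is error prone, so here is a hedged Python sketch of the same comparison. It parses an exported
".reg" text of the Signatures subkey (in the format of the example above) and compares it with the signatures you noted
per disk number. The function names are my own invention, and the sketch assumes that the trailing number in "Harddisk8"
corresponds to the disk number shown in Disk Management. Verify that assumption on your own system.

```python
import re

KEY_RE = re.compile(r"^\[.*\\Signatures\\([0-9A-Fa-f]{8})\]$")
NAME_RE = re.compile(r'^"DiskName"="(.*)"$')


def parse_signatures(reg_text):
    """Return {signature: device name} from an exported Signatures subkey."""
    result = {}
    current = None
    for raw in reg_text.splitlines():
        line = raw.strip()
        key = KEY_RE.match(line)
        if key:
            current = key.group(1).upper()
            continue
        name = NAME_RE.match(line)
        if name and current:
            # .reg files escape each backslash as a double backslash.
            result[current] = name.group(1).replace("\\\\", "\\")
            current = None
    return result


def disk_number(device_name):
    """Extract 8 from a device name such as \\Device\\Harddisk8."""
    match = re.search(r"Harddisk(\d+)$", device_name)
    if not match:
        raise ValueError(f"unexpected device name: {device_name!r}")
    return int(match.group(1))


def find_mismatches(expected, observed):
    """Compare registered signatures with the ones noted per disk number.

    expected : {signature: device name}, for example from parse_signatures()
    observed : {disk number: signature}, for example noted from DISKPART
    Returns a sorted list of (disk number, expected signature, observed signature).
    """
    problems = []
    for signature, device in expected.items():
        number = disk_number(device)
        seen = observed.get(number)
        if seen is None or seen.upper() != signature:
            problems.append((number, signature, seen))
    return sorted(problems)
```

Every tuple in the result is a candidate for the repair described below, but only after you have confirmed the finding
against the system log.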

Suppose you have found that for Disknumber 7, the DiskID should be "7690CC8C"

Shut down the other nodes gracefully.
Only the owner of the disk resource should be running.

From the prompt, use "dumpcfg" as follows:

dumpcfg.exe -s 7690CC8C 7


4.2 Using "ClusterRecovery utility":

Again, originally, the tool is for 32-bit systems.

One way to find out what type of system you have is to use the "systeminfo" command.
Read the output carefully. It should say whether you have an x86 or x64 system.

The "ClusterRecovery utility" is a graphical tool, and it comes with the Win2K3 Resource Kit.
You can use it for:

-Help after replacing a failed cluster disk
-Recovering from disk signature changes
-Migrating data to a different disk on the shared bus

If you have had a disk failure and you must replace or migrate a clustered disk to a new disk,
the "ClusterRecovery utility" can be of help.
Of course, you yourself have to create a new LUN, partition and format the disk, and restore all data to your new disk.
After that, you can use the Cluster Administrator to create a new Physical Disk resource to manage the new disk.
Here is where Cluster Recovery comes in handy. It can analyze the cluster resources to find any resource that is dependent
on the original disk resource and move that dependency to the new disk resource. It will rename the original disk resource to
“Original Name (lost)” and rename the new disk resource to “Original Name”. All in all, it helps in replacing a failed disk.



5. Win2K8: Repair DiskID's, Cluster Database, and Quorum files.

⇒ Handling disk problems like DiskIDs:

To repair DiskIDs in Win2K8, or other SAN disk related problems, you can use the Failover Cluster Manager
in the following way. Select the problem disk, and choose "Repair".

Fig 5.



6. Example Directory listings "%systemroot%\cluster".

⇒ Example Win2K3:


Directory of C:\WINDOWS\Cluster

04/12/2010 05:27 PM 448,512 ClAdmWiz.dll
02/03/2007 05:40 AM 1,331,712 clcfgsrv.dll
04/12/2010 05:27 PM 16,437 ClCfgSrv.inf
04/12/2010 05:27 PM 78,848 ClNetRes.dll
04/12/2010 05:27 PM 87,040 ClNetREx.dll
04/12/2010 05:27 PM 269,824 CluAdmEx.dll
04/12/2010 05:27 PM 770,560 CluAdmin.exe
04/12/2010 05:27 PM 79,872 CluAdMMC.dll
09/03/2011 03:35 PM 110,592 CLUSDB
02/03/2007 05:40 AM 647,168 clusres.dll
02/03/2007 05:40 AM 1,236,992 clussvc.exe
10/26/2011 07:22 AM 7,803,270 cluster.log
09/03/2011 03:35 PM 209,901 cluster.oml
04/12/2010 05:27 PM 65,536 DebugEx.dll
04/12/2010 05:28 PM 133,120 MQClus.dll
04/12/2010 05:29 PM 97,280 ResrcMon.exe
04/12/2010 05:29 PM 42,496 VSSTask.dll
04/12/2010 05:29 PM 79,360 VSSTskEx.dll
04/12/2010 05:30 PM 11,264 WSHClus.dll


Directory of Q:\MSCS

19-02-2009 18:19 "DIR" 080b433e-bdca-4e58-b817-6e4e1ad18af3
17-02-2009 17:09 "DIR" 4f0ded55-3427-4432-b9e9-3b3388306859
17-02-2009 19:02 "DIR" f2cfc595-ea29-4774-bb9c-17b141079e6f
26-10-2011 05:23 77.824 chk963C.tmp
26-10-2011 06:31 32.768 quolog.log



⇒ Example Win2K8:

Directory of C:\Windows\Cluster

14-07-2009 02:40 668.160 ClNetCfg.dll
14-07-2009 02:40 69.120 clnetres.dll
01-11-2011 11:20 262.144 CLUSDB
14-10-2011 07:26 5.242.880 CLUSDB.1.container
14-10-2011 07:26 5.242.880 CLUSDB.2.container
14-10-2011 07:26 65.536 CLUSDB.blf
11-08-2010 06:16 1.240.064 clusres.dll
11-08-2010 06:15 4.577.280 clussvc.exe
14-07-2009 02:40 213.504 DfsrClus.dll
28-04-2010 13:30 "DIR" en
28-04-2010 13:30 "DIR" en-US
14-07-2009 02:49 11.776 FailoverClusters.Agent.Interop.dll
14-07-2009 02:51 3.887.104 FailoverClusters.Common.dll
14-07-2009 02:46 823.296 FailoverClusters.ObjectModel.dll
14-07-2009 02:48 2.011.136 FailoverClusters.SnapIn.dll
14-07-2009 02:46 40.960 FailoverClusters.SnapInHelper.dll
14-07-2009 02:40 239.616 FailoverClusters.SnapInSupport.dll
14-07-2009 02:46 126.976 FailoverClusters.Validation.BestPracticeTests.dll
14-07-2009 02:47 15.872 FailoverClusters.Validation.Common.dll
14-07-2009 02:46 274.432 FailoverClusters.Validation.GeneralTests.dll
14-07-2009 02:46 162.816 FailoverClusters.Validation.StorageTests.dll
14-07-2009 02:47 163.840 FailoverClusters.Validation.Wizard.dll
14-07-2009 02:47 655.360 FailoverClusters.Wizards.dll
14-07-2009 02:41 27.136 iSNSClusRes.dll
14-07-2009 02:41 87.552 mqclus.dll
14-07-2009 02:41 37.376 mqtgclus.dll
14-07-2009 02:41 51.200 nfssh.dll
14-07-2009 02:41 103.936 nfsshEx.dll
27-05-2011 00:21 "DIR" Reports
14-07-2009 02:39 663.552 rhs.exe
14-07-2009 02:41 27.136 vsstask.dll


Directory of Q:\cluster

01-11-2011 11:31 262.144 0.hive
14-10-2011 07:30 5.242.880 0.hive.1.container
14-10-2011 07:30 5.242.880 0.hive.2.container
14-10-2011 07:30 65.536 0.hive.blf



7. Steps for a planned shutdown of a 2 node cluster.

1) Move all Resource Groups to one server (for example, move all resource groups to Node1, so that Node2 is now empty).
2) Shut down the "empty" server Node2.
3) Once Node2 is shut down, shut down Node1.
4) Start up Node1.
5) Wait until Node1 has fully started up (all resources come online).
6) Start up Node2.
7) Move resource groups to the desired server, if necessary.



8. Some CLUSTER.EXE command examples.

The cluster command can be used for a good deal of cluster administration from the command prompt.
Although you can create and configure clusters using the cluster command, here we will mainly focus
on how to obtain information from a cluster.


Listing information:

- Get the names of all clusters in the domain:

C:\TEMP> cluster /list

Cluster Name
---------------
SQLCLUS1
SQLCLUS2
SQLCLUS3


- Get the properties for SQLCLUS3:

C:\TEMP> cluster SQLCLUS3 /prop

Produces a listing of properties for 'SQLCLUS3'
..
..

- Get version info of a Cluster:

C:\TEMP> cluster SQLCLUS3 /ver

Cluster Name: SQLCLUS3
Cluster Version: 5.2 (Build 3790: Service Pack 2)
Cluster Vendor: Microsoft(R) Cluster service


- List the nodes of a Cluster:

C:\TEMP> cluster node


- Get Quorum information of a Cluster:

C:\TEMP> cluster /quorum


- Get a list of all resources (including the disks) of a Cluster:

C:\TEMP> cluster SQLCLUS3 res


Modifying a Cluster:

- Example on how to move a "resource group" from the current node to another node:

C:\TEMP> cluster group "print1" /moveto:prod2

Note: if you have only a two-node cluster, you can just use /move without a node name.

As another example:

C:\TEMP> cluster . group "Cluster Group" /move:prod2

Here, the period means that we’re modifying the local cluster.
Or, the command in general form:

C:\TEMP> cluster SQLCLUS3 group "Cluster Group" /move:prod2


- Example of creating a cluster using the cluster command:

You can even create a cluster using cluster.exe. The following extremely simple example will only create the basic cluster,
and resources still need to be added.

C:\TEMP> cluster MYCLUSTER /create /ipaddress:10.10.10.1/255.255.255.0 /Nodes:"srv1","srv2"



9. Some notes on the Cluster log.

Win2K3 Cluster:

With Win2K3 clusters, you will find the "cluster.log" file (which is just a plain text file) in the
"%Systemroot%\cluster" directory. For example, "cluster.log" could be found in "C:\Windows\Cluster".


Win2K8 Cluster:

By default, you will not have a flat ASCII file with all log events. It has been replaced by a more sophisticated
event-based tracing system, as part of the "Event Tracing for Windows" (ETW) infrastructure.
The cluster events can be viewed from the graphical Win2K8 Failover Cluster Manager.
From that graphical snap-in, you can define queries with all sorts of criteria, like the start and end time
of the events that you want to see.

However, in some cases, you still want to view all the events in a plain flat file.
In that case, you need to create one.
For that, you can use the "cluster.exe" command.

Examples:

In its simplest form, a report will be generated in the "%systemroot%\cluster\reports" directory, using:

C:\TEMP> cluster log /gen

Some nice switches can be used like:

/Copy:directory (for example: /Copy:logs, where logs should be a directory below your current path).

/Span:minutes (for example /Span:30, so that your log will only contain entries from the last 30 minutes).
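
If you generate such logs from a script, you could assemble the command line with a small helper like the Python sketch below.
It only builds the argument list from the switches mentioned above (/gen, /Copy, /Span) and does not run anything. The helper
name "cluster_log_command" is hypothetical.

```python
def cluster_log_command(copy_dir=None, span_minutes=None):
    """Build the argument list for 'cluster log /gen' with optional switches."""
    args = ["cluster", "log", "/gen"]
    if copy_dir is not None:
        # Copy the generated log to this directory (relative to the current path).
        args.append(f"/copy:{copy_dir}")
    if span_minutes is not None:
        if span_minutes <= 0:
            raise ValueError("span_minutes must be a positive number of minutes")
        # Only keep entries from the last span_minutes minutes.
        args.append(f"/span:{span_minutes}")
    return args


# Shows the command line for the two example switches above.
print(" ".join(cluster_log_command(copy_dir="logs", span_minutes=30)))
```

On a cluster node, the resulting list could be handed to subprocess.run(), but try the command by hand first.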



10. Some notes on Storage.

In some Operating systems, when talking about storage, it's really easy to understand which disk
and which partition you are working with. For example, take a look at this command:

starboss#:/etc> mount -F ufs -o logging /dev/dsk/c0t0d0s3 /mnt

Here, I use the "partition3" (s3), located on "disk0" (d0), located at "target0" (t0), located at "controller0" (c0),
and make it available, as if it's really a local directory "/mnt".

Although the representation c0t0d0s3 is very symbolic (actually a "logical device name"), thanks to this hierarchical
representation, it's quite easy to understand. At least, that is what I think.
(Note: the true "physical device name" might not be so easy to grasp.)
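
To make the hierarchy in such a logical device name explicit, the following Python sketch splits a name like c0t0d0s3
into its four numbers. The class and function names are made up for this note, and the sketch only handles the simple
cNtNdNsN form used in the mount example above.

```python
import re
from typing import NamedTuple


class LogicalDevice(NamedTuple):
    controller: int
    target: int
    disk: int
    partition: int


DEVICE_RE = re.compile(r"c(\d+)t(\d+)d(\d+)s(\d+)$")


def parse_logical_device(path: str) -> LogicalDevice:
    """Split a name like /dev/dsk/c0t0d0s3 into controller, target, disk and partition."""
    match = DEVICE_RE.search(path)
    if not match:
        raise ValueError(f"not a cNtNdNsN device name: {path!r}")
    return LogicalDevice(*(int(part) for part in match.groups()))


print(parse_logical_device("/dev/dsk/c0t0d0s3"))
```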

Indeed, in a simple Windows setup, we have a similar situation. We all understand, for example, the C:, D: and G: drives.
And if these are associated with partitions on local disks, it's all not too difficult to understand.

But when dealing with a Cluster, almost always a sort of "SAN" (Storage Area Network) is involved, to host
the disks which are under the control of the cluster.
In such a case, often the C: and (optionally) D: drives are local disks, while usually drives as F:, G:, etc..,
are associated with "luns" on a SAN.
Since having a good notion on "storage" is important for maintaining Clusters, we will spend
quite some time on this subject.


10.1 Shared disks: In Windows it's "Shared Nothing" - One node (or owner) at a time.

In a Cluster, two (or more) nodes need to be able to access the cluster-controlled disks.
A Windows Cluster is a "Fail-Over" cluster, which means that in normal operation, only one node may access them,
while the other node(s) may not.

Some people speak about active-active, or active-passive clusters. We can leave that out of the discussion
right now. We really do! Trust me.

We can say that the currently active node is the owner of the cluster-controlled disks.
The other node(s) must stay away from the disks.

This also means that in Windows clusters, the storage must be able to understand and implement a sort
of "reservation bit/flag", which will make access exclusive for the node which has set that flag.
In many storage implementations, the so-called "SCSI-3 Persistent Reservation" needs to be supported.

When the active node crashes, of course the other node then gets exclusive access to the shared disks.

The qualifier "shared" must then be understood as the property that in principle both nodes can access them,
but ONLY one node at a time.

Notes:
1. A Failover cluster is unlike, for example, an IBM GPFS cluster, where multiple nodes can access the disks simultaneously.
But that cluster is specifically designed for that purpose, and under the hood, many management processes are
(for example) watching for, and managing, locks etc..
For a Fail-Over cluster, that's all unnecessary.

2. In fact, the implementation of the SCSI-3 Persistent Reservation (PR) could be a cause of Storage problems
under clustering, because you may explicitly need to "set" (or switch on) that specific option on your storage.


10.2 A note on types of (non-local) Storage.

Truly local storage (at a Server) is often called DAS or Directly Attached Storage.
Maybe the C: (and D:) drives at a Node, are local, but cluster controlled disks usually are not local.
So, "where are they then"?

When you are connected to (what we now see as) a traditional SAN, your Server might have one or more
HBA Fiber cards, which connect to one or more switch(es), which are then further connected to the Storage arrays.
Typically, the elements in transfer are "block address spaces" and "datablocks", and that's why
people talk about "block I/O services" when discussing (traditional) SANs.
So, here SCSI block-based protocols are in use, over Fibre Channel (FC).

This has become much more "relaxed" nowadays, because you can also connect to a SAN using (more or less ordinary)
network cards. In this case, the elements of "conversation" between the Server and disk arrays
are just enveloped in more or less regular network packets, like in a normal TCP/IP network.
Some time ago, people would say that you were then connected to Network Attached Storage or NAS.
In such a case, typical "filing" functions are used, using CIFS, SMB, NFS or another Server/Redirector type
of protocol. Below SMB or NFS (or other), the "regular" network protocols are used, like TCP/IP.

Nowadays, the enclosures housing the disk arrays are very intelligent, and can handle block I/O, often using
FC ports, as well as network SMB filing protocols. These SAN/NAS devices are often simply called "Filers".

The storage in the arrays, of course, needs to be accessed by your Node. The actual disks inside the array
are often organized in one or more RAID volumes. A number of partitions are then defined from those volumes,
which act as separately addressable Units. In fact, they behave like single disk drives, although we (as a consumer)
do not know about the true physical implementation.
In another common implementation, a number of physical disks are grouped in a Volume Group, where Logical Volumes
are defined from. Again, such a Logical Volume behaves (as seen from the standpoint of a Server) as just one "disk"
while in fact in reality that "disk" might be a certain number of physical disks in some RAID configuration.

From a technical standpoint, the physical disks are addressable on a "bus", but that is then organized in the usage
of multiple "target numbers", where each target supports one or more Logical Unit Numbers (LUNs).
The hardware and software of the SAN/NAS/FILER makes it possible for the storage Admin to define those LUNs
and let them represent the "disks" from the previous paragraph.
Each LUN will then be identified by a corresponding LUN-id.

Of course, LUNs exposed to the nodes of a certain cluster must not be exposed to other servers.
The storage Admin has ways to let a number of LUNs be "viewable" or not, by using "Zoning" or "LUN masking".

The driver software at your node should be able to send a sort of "report LUN" command, so that in principle,
the SAN/NAS/FILER LUNs can be enumerated (and stored in the Registry). Now, the Windows Admin can see the "disks"
and format them, and usually assign a drive letter (like F:, G:) so that they are ready for use.

10.3 Stack used at your Windows Server Node.

Fig 6. Global model HBA (Fiber) or iSCSI driver stack in Windows


The Clusdisk and PartMgr drivers are closest to the high-level cluster components.

If you use (traditional) Fiber HBA cards to connect to a (FC) SAN, then the miniport and storport drivers should be present.
If you use iSCSI, then you have a network card connecting to the storage using the iSCSI initiator (using TCP/IP).

The miniport driver is from the Manufacturer, while storport is from Microsoft.
The miniport driver could have a filename determined by the Manufacturer, and it should be linked to storport.

Using iSCSI with a NAS/SAN/FILER, you have a regular network card, controlled by the Microsoft iSCSI initiator software.


10.4 Documenting some storage parameters from your nodes.


1. The "wmic diskdrive" prompt command:

From your currently active node (which thus owns the disk resources), use the following command:

C:\TEMP> wmic diskdrive > disks.txt

Just take a look at the resulting "disks.txt" textfile. A lot of interesting facts should be present.


2. Some interesting Registry keys:

- Just browse around in "HKLM\cluster". I am sure you will see some keys you will find useful to export to a file,
which you can use for documentation purposes.

- Go to "HKLM\system\currentcontrolset\services\Partmgr"

Maybe you are interested in exporting the ENUM key.


3. Other Storage related Documentation:

- Document the storport-, miniport- and HBA drivers and versions, and document the latest fixes you might have applied.
- Document the firmware versions of all your HBA interfaces.
- Document the Datastore (SAN/NAS/FILER) names that all of your Virtual and/or Hardware nodes use.
- Document Initiator-, and Target names, IP's, ports, and MPIO settings
- Document the Drive names (like F:) with the associated LUN's, and associated Disk ID's.


10.5 Simple example of how to connect a small SAN to a 2 Node cluster.

There are many ways to connect Cluster nodes to a SAN, which of course all depends on the number of nodes,
the size and complexity of the SAN, and the manner in which High Availability (HA) is implemented.

The figure below illustrates a simple setup, with a certain degree of HA implemented. Of course, a cluster
by itself is a HA implementation, but there can still be lots of "single points of failure".
In the figure below, we see that each node has two FC HBAs (FC SAN), or two network cards (iSCSI SAN),
which gives much more redundancy in case one of those cards fails.
Furthermore, we can see two switches (SW). In this case, typically, one card is connected to SW1, and
the other to SW2.

Fig 7. Just a simple model of how a 2 node cluster could use a SAN.




11. Some notes on Storage problems.

There are many storage systems, and they are all quite complex, hence the need for dedicated Storage Admins.
In particular, when implementing a cluster, you will get into the "Validate Cluster" phase, which consists of a number
of rigorous tests, including tests of your shared storage system. Here is where it sometimes goes wrong.

Some of the common Storage related problems, in cluster systems, are the following.

Win2K8 and SCSI 3 Persistent Reservation:

In most cases, with modern Storage and Win2K8 clustering, you do not need to do anything special with respect to Storage.
However, it must be SCSI-3 (SPC-3) compliant, otherwise you will experience problems in the "Validate Cluster" phase.
And even if the storage is fully compliant, in some cases, you might need to "enable" the "SCSI-3 reservation"
option of your storage.

Microsoft Clusters use the "Shared Nothing" principle, which simply means that there can be only one owner (node) of the
shared storage at any one time. See also note 1 below, for some exceptions or "relaxations" of that statement.

- Win2K3 Cluster:

In Windows 2003 Clustering, SCSI-2 commands were used for managing and manipulating storage,
where the most important commands were the "reserve", "reset" and "release" commands.
These commands are used to lock the shared storage for the active node, and move it over to the other node,
if the active node crashes for some reason.

- Win2K8 Cluster:

Under Win2K8 Clustering, the game switched to the SCSI SPC-3 command set. Now, the "Persistent Reservation"
commands are used to lock the storage for use by one node.

However, some older storage devices, might not support the SPC-3 commands.

In some cases, this would mean quite a large obstacle in migrating a Win2K3 cluster to a Win2K8 cluster.

So, if you are planning or implementing a Win2K8 cluster:

1. Make sure the storage supports the SCSI 3 Persistent Reservation command.

2. If the storage supports (1), do you perhaps need to activate it? For example, on some storage systems you need
to issue a command like the example below:

set device attribute=SCSI3_persist_reserv commit;

before it becomes usable for Win2K8 clustering.

Notes:

Note 1:
With Windows Clustering, if VMware is used, you probably use VMFS (Virtual Machine File System) to store
virtual machine disk images.
In this case, multiple ESX servers can access the same clustered filesystem simultaneously,
while only the individual virtual machine files are locked.

Note 2:
There are many other cluster filesystems, for example used with Linux machines (like GFS) or IBM unix machines (like GPFS).



12. General guidelines for Replacing a failed Disk.

⇒ Win2K3

If a shared disk has failed and was replaced, then on many SANs you won't have noticed much, if some RAID implementation
was in effect.

In some less fortunate cases, where it was not the quorum disk, you need to create a new LUN,
and use a backup to restore the contents again.

To renew the DiskID, use the "dumpcfg.exe" utility or the "ClusterRecovery utility" (32 bits).
Most of this was already explained in section 4.
The "ClusterRecovery utility" will analyze all dependencies of resources on the lost LUN, and transfer
the dependencies to the new LUN.