In the middle of the night I checked my cluster lab and saw that my apache resource group was not running… I re-checked and found that the cluster node had not mounted the webds filesystem on /global/web. My webds diskset was gone, and I don't know the root cause of this problem… 😀 Maybe, while doing another lab on the same node, I changed its configuration without realizing it.
If I check the status of the webds metaset, it shows that no node owns the diskset:
bash-3.00# metaset -s webds

Set name = webds, Set number = 2

Host                Owner
  clnode-01
  clnode-02

Mediator Host(s)    Aliases
  clnode-01
  clnode-02

Driv Dbase
d6   Yes
And if I run metastat for webds, it comes up with an error:
bash-3.00# metastat -s webds
metastat: clnode-01: webds: must be owner of the set for this command
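For context, that error means no node currently holds ownership of the set. With a healthy diskset you would normally just take ownership on the node that should own it (standard SVM commands, shown here as a sketch; in my case the set's state was lost, so a simple take was not enough and the purge-and-recreate procedure below was needed):

```shell
# Take ownership of the webds diskset on the current node:
#   metaset -s webds -t
# If the previous owner node is down, ownership can be forced:
#   metaset -s webds -t -f
```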
The resolution is simple; below are my troubleshooting steps:
1. Boot your node-1 & node-2 in non-cluster mode
2. Comment out the shared device entry in /etc/vfstab
3. Boot your node-1 & node-2 back in cluster mode
4. On node-2:
force-purge the lost diskset:
metaset -s <setname> -P -f
5. On node-1:
force-purge the lost diskset:
metaset -s <setname> -P -f
re-create your metaset diskset:
metaset -s <setname> -a -h NodeA NodeB
metaset -s <setname> -a <diskpath0> <diskpath1> ... <diskpathN>
metaset -s <setname> -a -m NodeA NodeB
metaset
(the last metaset command should show the new set and its ownership)
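Steps 1–3 above don't show commands, so here is a sketch of how I'd do them (SPARC OBP shown; the vfstab device names match my lab and assume a ufs global mount, so adjust for your setup):

```shell
# 1. Boot each node in non-cluster mode, either from the OBP:
#      ok boot -x
#    or from a running node:
#      reboot -- -x
# 2. On both nodes, comment out the /global/web entry in /etc/vfstab, e.g.:
#      #/dev/md/webds/dsk/d200  /dev/md/webds/rdsk/d200  /global/web  ufs  2  yes  global,logging
# 3. Boot both nodes back into cluster mode (the default boot):
#      reboot
```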
Note: because my webds diskset was built from SVM soft partitions, I re-created the soft partition on it:
bash-3.00# metainit -s webds d1 1 1 /dev/did/rdsk/d6s0
webds/d1: Concat/Stripe is setup
bash-3.00# metainit -s webds d200 -p d1 3g
d200: Soft Partition is setup
bash-3.00# metastat -s webds
webds/d200: Soft Partition
    Device: webds/d1
    State: Okay
    Size: 6291456 blocks (3.0 GB)
        Extent          Start Block          Block count
             0                   32              6291456

webds/d1: Concat/Stripe
    Size: 10457088 blocks (5.0 GB)
    Stripe 0:
        Device   Start Block   Dbase   State   Reloc   Hot Spare
        d6s0               0   No      Okay    No

Device Relocation Information:
Device   Reloc   Device ID
d6       No      -
Testing the mount, and listing the directory:
bash-3.00# mount /dev/md/webds/dsk/d200 /global/web
bash-3.00# ls -l /global/web
total 24
drwxr-xr-x   2 root     root         512 May 31 21:55 bin
drwxr-xr-x   2 root     bin          512 May 25 06:22 cgi-bin
drwxr-xr-x   2 root     root         512 May 31 21:59 conf
drwxr-xr-x   2 root     bin         1024 May 25 06:22 htdocs
drwx------   2 root     root        8192 May 31 20:56 lost+found
And now you should be happy: your cluster resource group is running again.
bash-3.00# clrg status

=== Cluster Resource Groups ===

Group Name   Node Name             Suspended   Status
----------   ---------             ---------   ------
nfs-rg       clnode-01             No          Online
             clnode-02             No          Offline

apache-rg    clnode-01:webapp-01   No          Online
             clnode-01:webapp-02   No          Offline

bash-3.00# clrs status

=== Cluster Resources ===

Resource Name      Node Name             State     Status Message
-------------      ---------             -----     --------------
nfs-res            clnode-01             Online    Online - Service is online.
                   clnode-02             Offline   Offline

nfs-stor           clnode-01             Online    Online
                   clnode-02             Offline   Offline

mycluster-nfs      clnode-01             Online    Online - LogicalHostname online.
                   clnode-02             Offline   Offline

apache-res         clnode-01:webapp-01   Online    Online - Service is online.
                   clnode-01:webapp-02   Offline   Offline

apache-stor        clnode-01:webapp-01   Online    Online
                   clnode-01:webapp-02   Offline   Offline

mycluster-webapp   clnode-01:webapp-01   Online    Online - LogicalHostname online.
                   clnode-01:webapp-02   Offline   Offline
Reboot your node if needed. 🙂