Sunday, February 27, 2011

Replication Manager

RM and latest SE revs
================

- See Primus emc212416 for up-to-date info

RM                     SE

5.0.3 thru 5.1.1       6.4.2.20

5.1.2                  6.5.0.12
                       6.5.0.13
                       6.5.0.16
                       6.5.0.19
                       6.5.0.25

5.1.3 thru 5.1.6       6.5.2.20
                       6.5.3.0
                       6.5.3.15
                       7.0.0.4
                       7.0.1.26 (needed for FLARE 29)
                       7.1.0.3 (non-Windows hosts only)
                       7.1.0.17 (Windows only)

5.2.x                  6.5.2.20
                       6.5.3.0
                       6.5.3.15
                       7.0.0.4
                       7.0.1.26 (needed for FLARE 29)
                       7.1.0.3 (non-Windows hosts only)
                       7.1.0.17 (Windows only)
                       7.1.1.18 (replaced 7.1.1.0)
                       7.1.2.7 (needed for FLARE 30; note that RM 5.2.x does not support FLARE 30)
                       7.1.2.11 (Linux only)
                       7.2.0.0

5.3                    6.5.3.0
                       7.0.0.4
                       7.0.1.26 (needed for FLARE 29)
                       7.1.0.3 (non-Windows hosts only)
                       7.1.0.17 (Windows only)
                       7.1.1.18 (replaced 7.1.1.0)
                       7.1.2.7 (needed for FLARE 30)
                       7.1.2.11 (Linux only)
                       7.2.0.0

5.3.1                  7.2.0.0 (REQUIRED for RM 5.3.1!)



RM releases, latest release available, and End of Life info


RM 3.0      Service Pack 6      EOL 12/2007

RMSE 3.1    Service Pack 5      EOL 04/2008

RM 4.0      Service Pack 7      EOL 12/2008

RM 5.0      Service Pack 6      EOL 12/2009

RM 5.1      Service Pack 6      EOL 12/2010

RM 5.2      Service Pack 5      EOL 12/2011

RM 5.3      Service Pack 1      EOL 08/2013


Microsoft Hot Fixes


903650 Ext Maint Mode: for W2K3 SP1 ONLY! See emc139668

919117 Ext Maint Mode: for W2K3 SP2 ONLY! See emc139668

934396 VDI for SQL 2000 and 2005 (see Product Guide)

936008 VDS, Windows 2003 only

978897 VDS, Windows 2008 & 2008 SP2

949391 VSS, Windows 2003 SP1 (see 975928 below for Windows 2003 SP2, because 975928 appears to replace all corrected files for SP2 only)

951568 Only needed IF 940349 is installed AND you need to turn on Microsoft VSS debugging.

975928 VSS for 2003 SP2, 2008 and 2008 SP2 (no clue why MS didn't include 2008 SP1)

975832 / 975921 First VSS hot fixes for 2008 that I know of…

975832 for W2K8 R1

975921 for W2K8 R2

NOTE that these two hot fixes fix different .exe's than the ones 975928 fixes, so they can be applied as well.

957910 Storport, W2K3 (943545 and 950903 are older revs)

968675 W2K8 SP2

951246 W2K8 SP1 only! For RM Servers (MS scheduling bug)


The IRIndicator file is located in:

Windows: C:\windows\IRIndicator

UNIX: /var/sadm/PKG/IR/IRIndicator

RM's symapi_db file is located in:

C:\Program Files\emc\rm\client\bin\symapi_db_emcrm_client.bin

The SE symapi_db file is located in:

Windows: C:\Program Files\emc\symapi\db\symapi_db.bin

UNIX: /var/symapi/db/symapi_db.bin

Troubleshooting

- Devices not "visible to host" in RM?

For UNIX:


1. Shut down the RM Client service.

/opt/emc/rm/client/bin/rc.irclient stop

2. Shut down all SE daemons.

/opt/emc/SYMCLI/V6.5.3/bin/stordaemon shutdown all -immediate

3. Rename existing symapi_db.bin file.

mv /var/symapi/db/symapi_db.bin /var/symapi/db/symapi_db.bin.old

4. Rename existing symapi_db_emcrm_client.bin file.

mv /opt/emc/rm/client/bin/symapi_db_emcrm_client.bin /opt/emc/rm/client/bin/symapi_db_emcrm_client.bin.old

5. Run symcfg discover -clar and confirm a new symapi_db.bin is created; this task starts the SE daemons automatically.

6. Start RM client service.

/opt/emc/rm/client/bin/rc.irclient start
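The UNIX steps above can be sketched as a single script. This is a dry-run sketch, not an official procedure: by default it only echoes the commands (set DRYRUN=0 to execute), and the paths assume a default RM/SE install with SE 6.5.3, as in step 2; adjust them for your host.

```shell
#!/bin/sh
# Dry-run sketch of the "devices not visible" reset for UNIX.
DRYRUN=${DRYRUN:-1}
CMDS=""
run() { CMDS="$CMDS $*;"; echo "+ $*"; [ "$DRYRUN" = "1" ] || "$@"; }

run /opt/emc/rm/client/bin/rc.irclient stop                        # 1. stop RM client
run /opt/emc/SYMCLI/V6.5.3/bin/stordaemon shutdown all -immediate  # 2. stop SE daemons
run mv /var/symapi/db/symapi_db.bin /var/symapi/db/symapi_db.bin.old
run mv /opt/emc/rm/client/bin/symapi_db_emcrm_client.bin \
    /opt/emc/rm/client/bin/symapi_db_emcrm_client.bin.old
run symcfg discover -clar            # 5. rebuilds symapi_db.bin, restarts SE daemons
run /opt/emc/rm/client/bin/rc.irclient start                       # 6. start RM client
```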

For Microsoft Windows:

1. Right-click the RM Client service > select Stop.

2. Shut down all SE daemons: right-click EMC storapid > select Stop.

3. Rename c:\program files\emc\symapi\db\symapi_db.bin to symapi_db.bin.old

4. Rename c:\program files\emc\rm\client\bin\symapi_db_emcrm_client.bin to symapi_db_emcrm_client.bin.old

5. Start up EMC storapid service.

6. Run symcfg discover -clar and confirm new symapi_db.bin created.

7. Start RM client service.




Top 10 issues for RM



1. Job failures due to the error “cannot locate device x on supported arrays”.


- Steps to fix: stop the RM and EMC storapid services/daemons, and delete the RM and symapi DBs.


- If the job still fails, try a "symrslv" on the filesystem and see if that works. If this fails or does not return the correct output, note that symrslv pd on the device should show a "c" beside the device if it is a CLARiiON, and an "s" to indicate a Symmetrix.


2. VMware concepts.



- Support for VMware can be broken down into 3 components:


VMFS

- Requires a Proxy host. In the application set, you select the VMFS datastore you are replicating, and RM will instruct VirtualCenter to take a VSS snapshot of all the VMs on the VMFS datastore (this can be disabled using vmsnapdeny.cfg). The Proxy host must have a placeholder RDM LUN.


- Sometimes with VMFS, you get an error that a snapshot could not be taken of a particular VM on that VMFS volume, but the VM Host ID does not refer to any VM. The easiest way to figure out which VM is causing the problem is the following:


If you want to find out the name associated with that ID run the following command from the RM Server:

C:\Program Files\EMC\rm\client\bin>rm_vimclient -h -u -p -ListVMInESX

Look at the "Disk File Path" line of the output. It will tell you the name of the .vmdk that corresponds to the VM having the issue.

- RM_VIMCLIENT (\emc\rm\client\bin) is used by RM to query the Proxy Host for the VMFS information. The syntax of this command is as follows:

rm_vimclient -h -U -P -ListVMFS


Other switches are as follows:

- liststorageadapters


- mapvmfs


- listesx
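As a hypothetical example of assembling an rm_vimclient call with the switches above: the host name and credentials below are placeholders (they are not from these notes), and the wrapper only echoes the command, so the sketch is safe to run anywhere.

```shell
#!/bin/sh
# Echo-only sketch; vcenter.example.com, admin and secret are placeholders.
show() { LAST="$*"; echo "+ $*"; }

show rm_vimclient -h vcenter.example.com -u admin -p secret -ListVMFS
show rm_vimclient -h vcenter.example.com -u admin -p secret -ListVMInESX
```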

Virtual Disks


- Requires a proxy host. Not supported for Exchange and SharePoint, because Microsoft VSS is not supported on virtual disks due to an issue with the SCSI page information restriction in VMware. When virtual disks are being used, they show up to the VM in syminq.txt as "Virtual Disks".

RDMs



- Treat the same as normal fibre disks; no proxy host is required. When mounting snaps, you need to create "placeholder" snap sessions from the source LUNs and present them as RDM devices to the mount host VM. See Primus emc184439.


3. Oracle ASM.


- The most common configuration is Oracle ASM on Linux. Linux environments can use ASMLib (Oracle's library for ASM) to address the ASM disks.
- Check the Oracle ASM config file (/etc/sysconfig/oracleasm)
- Check the Oracle ASMLib log (/var/log/oracleasm)

- Common ASMLib troubleshooting commands:


- On one of the production ASM instances: select path from v$asm_disk;
- As root: /etc/init.d/oracleasm listdisks
- As root: /etc/init.d/oracleasm querydisk
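The ASMLib checks above can be scripted. This is a dry-run sketch (it echoes each command rather than executing it, since oracleasm only exists on the database host; run for real as root on a RAC node), and VOL1 is a hypothetical disk label:

```shell
#!/bin/sh
# Dry-run sketch of the ASMLib troubleshooting checks.
DRYRUN=${DRYRUN:-1}
CMDS=""
run() { CMDS="$CMDS $*;"; echo "+ $*"; [ "$DRYRUN" = "1" ] || "$@"; }

run /etc/init.d/oracleasm listdisks        # enumerate ASMLib disk labels
run /etc/init.d/oracleasm querydisk VOL1   # VOL1 is a hypothetical label
# and on one production ASM instance, via sqlplus:
#   select path from v$asm_disk;
```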

- As with all Oracle jobs, the database user is very important: for Oracle ASM it needs SYSASM privileges, and for the Oracle database it needs SYSDBA.


- ASM disks on Linux have to be configured on partitioned disks.
- The OS user is also important, especially with regard to mounts and restores.

- Adequate space in /tmp is absolutely vital for saving the archive logs needed to mount or restore the Oracle replica. You need about 1 to 2 GB free.

- If the RM logs do not contain sufficient information, always look for either the Oracle alert log or the Oracle ASM log.

- The following processes take place during ASM RAC-to-RAC mounts:

A. Creation of ASM init files for all mount RAC nodes


B. Creation of database init file for node chosen as mount host


C. Node reachability check


D. Rescan of ASM disks on remote nodes


E. Propagation of the ASM init files created in step A to the respective nodes.


F. Creation of ASM dump directories.


G. If recovery type is read-only or read-write, the ASM instances are started on all nodes.


H. Rename and propagation of the database init file created in step B to the other nodes. This is because the content of the database init files for all the RAC instances is the same; only the names differ.


I. Creation of database dump directories.


J. If recovery type is read-only or read-write, the RAC database instances are started.


See Primus emc234052 for information on an ETA concerning RM and Oracle ASMLib.

 
4. Oracle issues in general.



- Logging is much better from RM 5.2.1 onwards, to make debugging easier. To get extra information in the RM logs, you can add a variable to rc.irclient to create RM ORA trace files:

RM_ORA_TRACE=1
export RM_ORA_TRACE
This will produce trace files in /tmp

In RM 5.3 onwards, these variables can be added via the RM GUI.

As with ASM troubleshooting, the main pain points are as follows:
- Permissions of the Oracle directories
- Permissions of the Oracle DBA account (needs SYSDBA privileges)
- Sufficient space in /tmp



5. Celerra issues


99% of Celerra issues are not RM bugs; they are Celerra/network issues. The most common issue is remote replication failing because of poor network bandwidth between the local site and the remote site.

- From RM 5.2.4 upwards, you can pretty much discount any issues with RM as long as the DART code is 5.6.47 or higher.


Celerra Replicator


- The most common issue is with "Celerra Replicator", which essentially is remote replication. The following describes exactly what takes place during a Celerra Replicator job:


- RM creates a local snap and uses that snap as a baseline to copy to a target LUN. Then it snaps the target LUN; that is your replica. Then it "deletes" the local snap, but since it is the source baseline of the replicator session, the Celerra won't let it be deleted. It will be deleted later.


- See Primus emc216471 on why the Production host needs to have IP connectivity to the remote Data Mover.


- Port 5033 needs to be open for SNAPAPI to work, so if you are getting SNAPAPI errors on a new install/implementation, make sure this port is open.


IP issues


- IP issues are also a concern for RM, especially for remote replications using Celerra Replicator. Make sure the production host can ping the remote Data Mover.


Celerra NFS issues


- RM 5.3.2 will have all the Celerra NFS hotfixes.

- If you are logged into the Celerra via SSH, the following commands are useful.

To list the checkpoints associated with an RM application set:

fs_ckpt appsetname -list -all

The job name will be part of the checkpoint name, along with a date/timestamp and an ID.

To delete a checkpoint:

fs_ckpt -delete id=xxx
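Since the checkpoint name embeds the job name, timestamp and ID, the ID for fs_ckpt -delete can be pulled off the end of the name. The name format below is a made-up example for illustration; verify the real format against your own fs_ckpt -list output.

```shell
#!/bin/sh
# Hypothetical checkpoint name: <jobname>_<date>_<id>.
ckpt_name="myappset_27-Feb-2011_101"
ckpt_id=${ckpt_name##*_}      # strip everything up to the last underscore
echo "$ckpt_id"               # -> 101
```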


6. RM issues on UNIX



Important points to note for UNIX hosts:

- RM does not create an RM symapi DB on Linux, so we use the symapi_db.bin instead
- The storapid process on UNIX is under the control of stordaemon, so this is the daemon that needs to be stopped when troubleshooting any Solutions Enabler issues.


7. Restores on Windows Clusters


- VSS hotfix for Windows 2008 was required to resolve an issue whereby a previous VSS backup failure had left VSS flags on the disks. Microsoft advised installing that hotfix and rebooting all nodes.


- On the odd occasion, you might come across an issue whereby a restore to a cluster fails due to "files in use". RM cannot force an unmount or force a restore over a file or disk that has an open handle, so the customer needs to use handle.exe or Process Explorer to find out which process has the open handle.

 
8. Mirrorview



- At least one Replication Manager Client must be connected and zoned to the REMOTE CLARiiON array. The MirrorView image state must be Synchronized or Consistent, and the secondary condition must NOT be in a fractured state.
- There is a long and exhaustive checklist of MirrorView requirements in the Admin Guide.
- The MirrorView enhancement made it into RM 5.3, but hotfixes for all affected versions are now listed in Primus emc229210.

9. VSS Imports



- By default, RM uses a process called a "VSS import" to mount the clone disks to a Windows mount host. If the VSS import fails, RM will wait 45 minutes and then switch to using VDS instead.


- The main advantage of using VSS imports is speed: a VSS import is much faster than using VDS. VDS mounts take longer because each disk is scanned individually, which takes approximately 2 to 3 minutes per disk.


- The main disadvantage of using VSS imports is that they are liable to be affected by many third-party software products. Products like the QLogic Java Management Agent, various disk-monitoring tools, and antivirus software can cause the VSS import to fail.


- These third-party products put a "lock" or "VETO" on the disks being imported, and these VETOs can be seen in the C:\Windows\setupapi log, which is collected by EMCreports.


- Primus emc221301 describes how to run "Handle.exe" to debug the VETOs. The output can then be sent to Microsoft for analysis to determine what is causing the VETO. It must be stressed that this is not an RM issue, and only Microsoft can determine what is causing the VETO.


- You can, of course, disable the VSS import without adversely affecting replica integrity. See Primus emc185640 for the registry key details.


- It is important to get the customer to check their antivirus software and make sure it is not scanning the mount point that RM is mounting to.

 
10. Hints and Tips
 

A - Support page for RM.
http://supportwip.emc.com/products/ReplicationManager.aspx

B - Steps for uninstalling RM completely from a host if an upgrade fails.



This can be done safely on an RM client, but when upgrading an RM Server at a site, remember to back up the SolidDB and the license directory first, because if the upgrade fails, the SolidDB will be removed.


1. Delete the RM services.


sc delete "service_name". For this you will need the "service name" of each RM service on the box; to find it, right-click each service and select Properties. The service name is at the top of the page.
2. Remove the rm directory from the RM installation location.
3. Delete the ..\windows\IRIndicator file.
4. Delete the Replication Manager entries from the registry.
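The service deletion in step 1 can be looped over the RM service names. The names below are made-up examples; read the real ones from each service's Properties page as described above. The sketch is dry-run (echo only) so it can be tried safely:

```shell
#!/bin/sh
# Dry-run sketch of step 1; "ERM Client" / "ERM Server" are hypothetical names.
DRYRUN=${DRYRUN:-1}
CMDS=""
run() { CMDS="$CMDS $*;"; echo "+ $*"; [ "$DRYRUN" = "1" ] || "$@"; }

for svc in "ERM Client" "ERM Server"; do   # substitute the real service names
    run sc delete "$svc"
done
```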


C - Cross mounting between Windows 2003 and Windows 2008.



Not supported by Microsoft and, as a consequence, not supported by RM. See the following KB article:


Note A shadow copy that was created on Windows Server 2003 R2 or Windows Server 2003 cannot be used on a computer that is running Windows Server 2008 R2 or Windows Server 2008. A shadow copy that was created on Windows Server 2008 R2 or Windows Server 2008 cannot be used on a computer that is running Windows Server 2003. However, a shadow copy that was created on Windows Server 2008 can be used on a computer that is running Windows Server 2008 R2, and vice versa.


http://msdn.microsoft.com/en-us/library/bb968832(VS.85).aspx


 
D - Scripts in RM



- Manually created customer scripts used for pre/post replication or pre/post mount need to have "exit 0" added to the end of the script, so that if the script is successful, it returns a successful exit code to RM. Otherwise, the RM client will wait indefinitely for a response back from the script.


 
E - Windows 2008 and Symmetrix arrays


- Primus emc200609 goes into detail about what Symmetrix flags need to be set on the Symmetrix Directors for Windows 2008 hosts. This is really important for mount hosts.


F - Instructions to collect IRCCD crash dump

1- Install the Microsoft Debugging Tools for Windows on the host. If there are any concerns about installing them on a production host, install them on another system and copy the "C:\Program Files\Debugging Tools for Windows" directory over to the server. Use the following links depending on the OS:


Windows 2003: http://www.microsoft.com/whdc/devtools/debugging/installx86.mspx

Windows 2008: http://www.microsoft.com/whdc/devtools/debugging/install64bit.mspx


2- Go to the install folder (C:\Program Files\Debugging Tools for Windows (x86) or C:\Program Files\Debugging Tools for Windows (x64)) and run the following command:


cscript adplus.vbs -crash -pn irccd.exe -o c:\irccdcrashdump -quiet
where c:\irccdcrashdump is the destination folder where the dump files will be created.


More information is available on the following Microsoft sites:


http://support.microsoft.com/kb/931673/en-us

http://blogs.technet.com/b/askperf/archive/2007/06/15/capturing-application-crash-dumps.aspx


G - Collecting a VSS trace on Windows 2008 R2


The Primus solution emc91744 and the vssreports method do not work on Windows 2008 R2 boxes. To get VSS debug logs on these boxes, we have to use the "vsstrace" utility, which is part of the Windows 7 SDK. To enable VSS logging, use the following command:

vsstrace +f 0xffff -o C:\vss_trace.log

Sometimes the -o switch will not work; to work around this, we can use the redirection operator ">" to specify the output file. The command then looks like the following:


vsstrace +f 0xffff >C:\vss_trace.log

More details about this process are in the following Microsoft article:
http://msdn.microsoft.com/en-us/library/dd765233(v=VS.85).aspx
