Oracle® Database Backup and Recovery Advanced User's Guide 10g Release 2 (10.2) Part Number B14191-02
This section contains these topics:
After Installation of Media Manager, RMAN Channel Allocation Fails: Scenario
Backup Fails Because RMAN Cannot Locate an Archived Log: Scenario
In this scenario, you install and test the media manager as explained in "Configuring RMAN to Make Backups to a Media Manager", but you still cannot make RMAN back up to tape. For example, after allocating the sbt channel, you receive an error stack similar to the following:
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of allocate command on c1 channel at 08/29/2001 17:16:54
ORA-19554: error allocating device, device type: SBT_TAPE, device name:
ORA-27211: Failed to load Media Management Library
Additional information: 25
The most important line of the error output is the ORA-27211 error. It indicates the basic problem, that the media management library could not be loaded. Typically, there is no need to refer to the trace file or sbtio.log in such a case.
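As an illustration of reading such a stack, the following Python sketch strips the banner lines and reports the last ORA- entry. The function names are hypothetical, and the "last ORA- line" rule is only a heuristic that happens to fit this example; it is not an RMAN feature.

```python
import re

# Matches message codes such as RMAN-03009 or ORA-27211 at the start of a line.
CODE_RE = re.compile(r"^(RMAN|ORA)-(\d+):\s*(.*)$")

# Banner lines that frame the error stack and carry no diagnosis.
BANNER_CODES = {"RMAN-00571", "RMAN-00569"}


def parse_error_stack(text):
    """Return (code, message) pairs from an error stack, minus banner lines."""
    entries = []
    for line in text.splitlines():
        match = CODE_RE.match(line.strip())
        if not match:
            continue  # for example "Additional information: 25"
        code = f"{match.group(1)}-{match.group(2)}"
        if code not in BANNER_CODES:
            entries.append((code, match.group(3)))
    return entries


def root_cause(entries):
    """Heuristic (an assumption): the last ORA- entry names the basic problem."""
    ora_entries = [entry for entry in entries if entry[0].startswith("ORA-")]
    return ora_entries[-1] if ora_entries else None


stack = """\
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of allocate command on c1 channel at 08/29/2001 17:16:54
ORA-19554: error allocating device, device type: SBT_TAPE, device name:
ORA-27211: Failed to load Media Management Library
Additional information: 25
"""

entries = parse_error_stack(stack)
print(root_cause(entries))
```

For the stack shown above, the sketch reports the ORA-27211 entry.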
The ORA-27211 error indicates that the channel allocation is failing because the database is not loading the media management library. If the channel allocation fails, then the database generates a trace file in the USER_DUMP_DEST location that contains the error that caused the channel allocation to fail. The trace file should have the complete path name of the media management library loaded by the database as well as any other media manager errors or operating system errors. For example, the trace file on UNIX may be called something like /oracle/rdbms/log/prod1_ora_16226.trc, and may contain information such as the following:
*** 2001-08-29 17:16:54.385
SKGFQ OSD: Error in function sbtinit on line 2396
SKGFQ OSD: Look for SBT Trace messages in file /oracle/rdbms/log/sbtio.log
SBT Initialize failed for oracle.static
The last line of this output indicates that Oracle is loading the default static library instead of the media management library that you installed.
You may find more detailed information in the file sbtio.log, as described in the error message. Note, however, that writing SBT trace messages is the responsibility of the media management software, not the Oracle database or RMAN. The media management vendor may not have implemented the writing of trace messages in a particular situation. Contact the media management vendor for details about the trace messages written to sbtio.log.
To test the loading of the media management library, try allocating a channel by using the PARMS parameter SBT_LIBRARY to force the loading of the media management library. For example, if your library is called /vendor/lib/some_mm_lib.so, then run a command such as the following, making sure to specify whatever PARMS settings are required by your media manager:
RUN
{
  ALLOCATE CHANNEL c1 DEVICE TYPE sbt
    PARMS='SBT_LIBRARY=/vendor/lib/some_mm_lib.so',
    'ENV=(NSR_SERVER=tape_svr,NSR_CLIENT=oracleclnt,NSR_GROUP=oracle_tapes)';
}
If the channel allocation fails, then check the trace file again to see whether you can learn anything new. If the channel allocation with SBT_LIBRARY succeeds, but an ordinary sbt channel allocation fails, then the database is probably trying to load a library other than the one you installed. By default, the database expects to find the media management library at $ORACLE_HOME/lib/libobk.so on UNIX, or %ORACLE_HOME%/bin/orasbt.dll on NT. You may have more than one library in the operating system path, and the database is loading the wrong one.
If the problem is that the database is not loading the correct library, then make sure that the library is named correctly in the SBT_LIBRARY parameter.
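The lookup rule described above can be sketched in Python as follows. This is a minimal illustration only: the function names are hypothetical, the PARMS parsing is deliberately simplistic, and the code models the documented defaults rather than the database's actual loader.

```python
import ntpath
import posixpath
import re


def default_sbt_library(oracle_home, platform="unix"):
    """Build the default library location described in the text."""
    if platform == "unix":
        return posixpath.join(oracle_home, "lib", "libobk.so")
    if platform == "nt":
        return ntpath.join(oracle_home, "bin", "orasbt.dll")
    raise ValueError(f"unsupported platform: {platform}")


def library_to_load(parms, oracle_home, platform="unix"):
    """Return the SBT_LIBRARY named in a PARMS string, else the default path."""
    match = re.search(r"SBT_LIBRARY=([^,')\s]+)", parms)
    if match:
        return match.group(1)
    return default_sbt_library(oracle_home, platform)


# The ORACLE_HOME value here is an assumed example path.
print(library_to_load("SBT_LIBRARY=/vendor/lib/some_mm_lib.so", "/u01/app/oracle"))
print(library_to_load("", "/u01/app/oracle"))
```

Comparing the two results against the library path recorded in the trace file shows whether the database picked up the library you installed or a different one.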
See Also:
Oracle Database Backup and Recovery Reference for descriptions of the legal PARMS parameters
In this scenario, an RMAN backup job starts as normal and then pauses inexplicably:
Recovery Manager: Release 10.1.0.2.0 - Production
Copyright (c) 1995, 2003, Oracle. All rights reserved.
connected to target database: TRGT
connected to recovery catalog database
RMAN> BACKUP TABLESPACE SYSTEM, tools;
allocated channel: t1
channel t1: sid=16 devtype=SBT_TAPE
channel t1: starting datafile backupset
set_count=15 set_stamp=338309600
channel t1: including datafile 2 in backupset
channel t1: including datafile 1 in backupset
channel t1: including current control file in backupset
# Hanging here for 30 minutes now
If a backup job is hanging, that is, not proceeding, then several scenarios are possible:
A server-side or media management error occurred.
RMAN is waiting for an event such as the insertion of a new cassette into the tape device.
Query sbt wait events to gain more information. For example, run the following query on the target instance:
COLUMN EVENT FORMAT a10
COLUMN SECONDS_IN_WAIT FORMAT 999
COLUMN STATE FORMAT a20
COLUMN CLIENT_INFO FORMAT a30
SELECT p.SPID, sw.EVENT, sw.SECONDS_IN_WAIT AS SEC_WAIT,
       sw.STATE, s.CLIENT_INFO
FROM V$SESSION_WAIT sw, V$SESSION s, V$PROCESS p
WHERE sw.EVENT LIKE 'sbt%'
AND s.SID=sw.SID
AND s.PADDR=p.ADDR
;
Examine the SQL output to determine which sbt functions are waiting. For example, the output may be as follows:
SPID EVENT      SEC_WAIT   STATE                CLIENT_INFO
---- ---------- ---------- -------------------- ------------------------------
8642 sbtbackup  1500       WAITING              rman channel=ORA_SBT_TAPE_1
Because the causes of a hung backup job can be varied, so are the solutions. For example, backup jobs often hang simply because the tape device has completely filled the current cassette and is waiting for a new tape to be inserted. Ideally, the query of the sbt wait events should indicate the problem.
In this example, a single sbtbackup has taken 1500 seconds, so RMAN is waiting on the media manager to finish its write operation. Check that the media manager is functioning normally, and contact the media management vendor's technical support for assistance.
If the sbt wait event query is unhelpful, then examine media manager process, log, and trace files for signs of abnormal termination or other errors (refer to the description of message files in "Identifying Types of Message Output").
See Also:
"Terminating an RMAN Session: Basic Steps" to learn how to kill an RMAN session that is hanging
In this scenario, you run a backup job and receive message output similar to the following:
channel c8: including datafile number 47 in backupset
RPC call appears to have failed to start on channel c9
RPC call ok on channel c9
channel c3: including datafile number 18 in backupset
The "RPC call appears to have failed" message does not usually indicate a problem. The message indicates one of the following:
The target database instance is slow.
A timing problem occurred.
Timing problems occur in this way. When RMAN begins an RPC, it checks the V$SESSION performance view. The RPC updates the information in the view to indicate when it starts and finishes. Sometimes RMAN checks V$SESSION before the RPC has indicated it has started, which in turn generates the following message:
RPC call appears to have failed
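In the sample output for this scenario, the message is followed at once by an "RPC call ok" line for the same channel. A saved job log can be scanned for occurrences that lack that confirmation. The following Python sketch shows one way to do so; the function name is hypothetical and the matching is plain substring comparison on the message texts quoted in this section.

```python
FAILED_TEXT = "RPC call appears to have failed"
OK_TEXT = "RPC call ok"


def unconfirmed_rpc_failures(lines):
    """Return indexes of 'failed' messages not followed at once by 'RPC call ok'."""
    suspect = []
    for index, line in enumerate(lines):
        if FAILED_TEXT not in line:
            continue
        next_line = lines[index + 1] if index + 1 < len(lines) else ""
        if OK_TEXT not in next_line:
            suspect.append(index)
    return suspect


log = [
    "channel c8: including datafile number 47 in backupset",
    "RPC call appears to have failed to start on channel c9",
    "RPC call ok on channel c9",
    "channel c3: including datafile number 18 in backupset",
]

print(unconfirmed_rpc_failures(log))
```

An empty result means that every such message was confirmed on the next line; any index returned marks a message that deserves a closer look.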
If a message stating "RPC call ok" does not appear in the output immediately following the message stating "RPC call appears to have failed", then the backup job encountered an internal problem. Contact Oracle Support for further assistance.
In this scenario, you attempt a backup and receive the following error messages:
RMAN-3014: Implicit resync of recovery catalog failed
RMAN-6038: Recovery catalog package detected an error
RMAN-20035: Invalid high RECID error
In one common scenario, you restore a backup control file created through a non-Oracle mechanism, and then open the database without the RESETLOGS option. If you had created the backup control file through the RMAN BACKUP command or the SQL ALTER DATABASE BACKUP CONTROLFILE statement, then the database would have required you to reset the online logs.
The control file and the recovery catalog are now not synchronized. The database control file is older than the recovery catalog, because at one time the recovery catalog resynchronized with the old current control file, and now the database is using a backup control file. RMAN detects that the control file currently in use is older than the control file previously used to resynchronize.
Another common scenario occurs when you attempt to copy the target database to a new machine as follows:
On machine 1, you shut down the database and make a copy of the control file with an operating system utility. You do not use CATALOG to add this control file copy to the repository.
You transfer the control file copy to machine 2.
On machine 2, you create a new initialization parameter file and new database instance.
You mount the control file copy on machine 2. The database does not recognize the control file as a backup control file: to the database it looks like the current control file.
You start RMAN and connect to the new target database and the recovery catalog on machine 2. Because the control file was not created with RMAN and was not cataloged as a control file copy, RMAN sees the database on machine 2 as the database on machine 1.
You restore and recover the new database on machine 2 and then open it. As a consequence, various records are added to the recovery catalog during the restore and recovery. For example, the highest RECID in the recovery catalog moves from 90 to 100.
On machine 1, you start RMAN and connect to the original target database and recovery catalog. The recovery catalog indicates that the highest RECID is 100, but the control file indicates that the highest RECID is 90. The control file RECID should always be greater than or equal to the recovery catalog RECID, so RMAN issues RMAN-20035.
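The comparison behind the error can be expressed as a small Python sketch that reuses the numbers from the steps above. The function name is hypothetical, and the code models only the rule as stated here, not RMAN's internal resynchronization logic.

```python
def check_recid(control_file_recid, catalog_recid):
    """Apply the rule that the control file RECID must not trail the catalog."""
    if control_file_recid >= catalog_recid:
        return "consistent"
    return "RMAN-20035: Invalid high RECID error"


# Values from the machine 1 and machine 2 steps.
machine1_recid = 90   # highest RECID in the control file on machine 1
catalog_recid = 90    # catalog last resynchronized from machine 1
catalog_recid = 100   # restore and recovery on machine 2 add catalog records

print(check_recid(machine1_recid, catalog_recid))
```

With a control file RECID of 90 and a catalog RECID of 100, the check reports the RMAN-20035 condition.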
This solution is safest and is strongly recommended. It preserves the control file, so that the historical information about the database stored in the control file continues to be available after the procedure.
To reset the database with RMAN:
Connect to the target database with SQL*Plus. For example, enter:
% sqlplus '/ AS SYSDBA'
Mount the database if it is not already mounted. For example, enter:
ALTER DATABASE MOUNT;
Start cancel-based recovery by using the backup control file, then cancel it. The reason for canceling is that the USING BACKUP CONTROLFILE clause stamps the control file as a backup, which then permits OPEN RESETLOGS. For example, enter:
ALTER DATABASE RECOVER DATABASE UNTIL CANCEL USING BACKUP CONTROLFILE;
ALTER DATABASE RECOVER CANCEL;
Use RMAN to connect to the target database and recovery catalog. For example, enter:
% rman TARGET SYS/oracle@trgt CATALOG rman/cat@catdb
Open the database with the RESETLOGS option. For example, enter:
RMAN> ALTER DATABASE OPEN RESETLOGS;
Take new backups so that you can recover the database if necessary. For example, enter:
BACKUP DATABASE PLUS ARCHIVELOG;
This solution is similar to the previous one, but does require that you re-create your control file. It is better-suited for the case in which you are copying your database to a second system, where you may not want to keep the history from the control file for the copy of the database on the second system, or where you might drop a few datafiles or change the online logs by editing your control file.
To create the control file with SQL*Plus:
Connect to the target database with SQL*Plus. For example, enter:
% sqlplus 'SYS/oracle@trgt AS SYSDBA'
Mount the database if it is not already mounted:
SQL> ALTER DATABASE MOUNT;
Back up the control file to a trace file:
SQL> ALTER DATABASE BACKUP CONTROLFILE TO TRACE;
Edit the trace file as necessary. The relevant section of the trace file looks something like the following:
# The following commands will create a new control file and use it
# to open the database.
# Data used by the recovery manager will be lost. Additional logs may
# be required for media recovery of offline data files. Use this
# only if the current version of all online logs are available.
STARTUP NOMOUNT
CREATE CONTROLFILE REUSE DATABASE "TRGT" NORESETLOGS ARCHIVELOG
-- STANDBY DATABASE CLUSTER CONSISTENT AND UNPROTECTED
    MAXLOGFILES 32
    MAXLOGMEMBERS 2
    MAXDATAFILES 32
    MAXINSTANCES 1
    MAXLOGHISTORY 226
LOGFILE
  GROUP 1 '/oracle/oradata/trgt/redo01.log' SIZE 25M,
  GROUP 2 '/oracle/oradata/trgt/redo02.log' SIZE 25M,
  GROUP 3 '/oracle/oradata/trgt/redo03.log' SIZE 500K
-- STANDBY LOGFILE
DATAFILE
  '/oracle/oradata/trgt/system01.dbf',
  '/oracle/oradata/trgt/undotbs01.dbf',
  '/oracle/oradata/trgt/cwmlite01.dbf',
  '/oracle/oradata/trgt/drsys01.dbf',
  '/oracle/oradata/trgt/example01.dbf',
  '/oracle/oradata/trgt/indx01.dbf',
  '/oracle/oradata/trgt/tools01.dbf',
  '/oracle/oradata/trgt/users01.dbf'
CHARACTER SET WE8DEC
;
# Take files offline to match current control file.
ALTER DATABASE DATAFILE '/oracle/oradata/trgt/tools01.dbf' OFFLINE;
ALTER DATABASE DATAFILE '/oracle/oradata/trgt/users01.dbf' OFFLINE;
# Configure RMAN configuration record 1
VARIABLE RECNO NUMBER;
EXECUTE :RECNO := SYS.DBMS_BACKUP_RESTORE.SETCONFIG('CHANNEL','DEVICE TYPE DISK DEBUG 255');
# Recovery is required if any of the datafiles are restored backups,
# or if the last shutdown was not normal or immediate.
RECOVER DATABASE
# All logs need archiving and a log switch is needed.
ALTER SYSTEM ARCHIVE LOG ALL;
# Database can now be opened normally.
ALTER DATABASE OPEN;
# Commands to add tempfiles to temporary tablespaces.
# Online tempfiles have complete space information.
# Other tempfiles may require adjustment.
ALTER TABLESPACE TEMP ADD TEMPFILE '/oracle/oradata/trgt/temp01.dbf' REUSE;
# End of tempfile additions.
Shut down the database:
SHUTDOWN IMMEDIATE
Execute the script to create the control file, recover (if necessary), archive the logs, and open the database:
STARTUP NOMOUNT
CREATE CONTROLFILE ...;
EXECUTE ...;
RECOVER DATABASE
ALTER SYSTEM ARCHIVE LOG CURRENT;
ALTER DATABASE OPEN ...;
If you intend to keep and continue using this copy of the database, use the DBNEWID utility to change the name and DBID of the new database as needed.
Caution:
If you do not open with the RESETLOGS option, then two copies of an archived redo log for a given log sequence number may exist, even though these two copies have completely different contents. For example, one log may have been created on the original host and the other on the new host. If you accidentally confuse the logs during a media recovery, then the database will be corrupted, and neither Oracle nor RMAN can detect the problem.
In this scenario, a backup job fails because RMAN cannot make a snapshot control file. The message stack is as follows:
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of backup command at 08/30/2001 22:48:44
ORA-00230: operation disallowed: snapshot control file enqueue unavailable
When RMAN needs to back up or resynchronize from the control file, it first creates a snapshot or consistent image of the control file. If one RMAN job is already backing up the control file while another needs to create a new snapshot control file, then you may see the following message:
waiting for snapshot control file enqueue
Under normal circumstances, a job that must wait for the control file enqueue waits for a brief interval and then successfully obtains the enqueue. RMAN makes up to five attempts to get the enqueue and then fails the job. The conflict is usually caused when two jobs are both backing up the control file, and the job that first starts backing up the control file waits for service from the media manager.
To determine which job is holding the conflicting enqueue:
After you see the first message stating "RMAN-08512: waiting for snapshot control file enqueue", start a new SQL*Plus session on the target database:
% sqlplus 'SYS/oracle@trgt AS SYSDBA'
Execute the following query to determine which job is causing the wait:
SELECT s.SID, USERNAME AS "User", PROGRAM, MODULE,
       ACTION, LOGON_TIME "Logon", l.*
FROM V$SESSION s, V$ENQUEUE_LOCK l
WHERE l.SID = s.SID
AND l.TYPE = 'CF'
AND l.ID1 = 0
AND l.ID2 = 2;
You should see output similar to the following (the output in this example has been truncated):
SID User Program              Module              Action           Logon
--- ---- -------------------- ------------------- ---------------- ---------
  9 SYS  rman@h13 (TNS V1-V3) backup full datafile: c10000210 STARTED 21-JUN-01
Commonly, enqueue situations occur when a job is writing to a tape drive, but the tape drive is waiting for a new tape to be inserted. If you start a new job in this situation, then you will probably receive the enqueue message because the first job cannot complete until the new tape is loaded.
After you have determined which job is creating the enqueue, you can do one of the following:
Wait until the job holding the enqueue completes
Cancel the current job and restart it after the job holding the enqueue completes
Cancel the job creating the enqueue
In this scenario, the database archives automatically to two directories: ORACLE_HOME/oradata/trgt/arch and ORACLE_HOME/oradata/trgt/arch2. You tell RMAN to perform a backup and delete the input archived redo logs afterward in the following script:
BACKUP ARCHIVELOG ALL DELETE INPUT;
You then run a crosscheck to make sure the logs are gone and find the following:
CROSSCHECK ARCHIVELOG ALL;
validation succeeded for archived log
archivelog filename=/oracle/oradata/trgt/arch2/archive1_964.arc recid=19 stamp=368726072
RMAN deleted one set of logs but not the other.
In this scenario, you schedule regular backups of the archived redo logs. The next time you make a backup, you receive this error:
RMAN-6089: archive log NAME not found or out of sync with catalog
This problem occurs when the archived log that RMAN is looking for cannot be accessed by RMAN, or the recovery catalog needs to be resynchronized. Often, this error occurs when you delete archived logs with an operating system command, which means that RMAN is unaware of the deletion. The RMAN-6089 error occurs because RMAN attempts to back up a log that the repository indicates still exists.
Make sure that the archived logs exist in the specified directory and that the RMAN catalog is synchronized. Check the following:
Make sure the archived log file that is specified by the RMAN-6089 error exists in the correct directory.
Check that the operating system permissions are correct for the archived log (owner = oracle, group = DBA) to make sure that RMAN can access the file.
If the file appears to be correct, then try synchronizing the catalog by running the following command from the RMAN prompt:
RESYNC CATALOG;
If you know that the logs are unavailable because you deleted them by using an operating system utility, then run the following command at the RMAN prompt to update RMAN metadata:
CROSSCHECK ARCHIVELOG ALL;
It is always better to use RMAN to delete logs than to use an operating system utility. The easiest method to remove unwanted logs is to specify the DELETE INPUT option when backing up archived logs. For example, enter:
BACKUP DEVICE TYPE sbt ARCHIVELOG ALL DELETE ALL INPUT;
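The file checks described in this scenario (whether each log exists and whether it can be read) can be run over a list of log names taken from the repository. The following Python sketch is one way to do that. The function name and the sample file names are hypothetical, and the readability test reflects the permissions of the operating system user who runs the script, which is assumed here to be the user that owns the Oracle software.

```python
import os
import tempfile


def unavailable_logs(cataloged_paths):
    """Map each problem path to 'missing' or 'unreadable'; omit healthy files."""
    problems = {}
    for path in cataloged_paths:
        if not os.path.isfile(path):
            problems[path] = "missing"
        elif not os.access(path, os.R_OK):
            problems[path] = "unreadable"
    return problems


# Demonstration with a temporary directory standing in for the archive destination.
with tempfile.TemporaryDirectory() as arch_dir:
    present = os.path.join(arch_dir, "archive1_964.arc")
    deleted = os.path.join(arch_dir, "archive1_965.arc")  # never created
    with open(present, "w") as handle:
        handle.write("placeholder")
    report = unavailable_logs([present, deleted])

print(report)
```

Logs reported as missing are candidates for CROSSCHECK ARCHIVELOG ALL, and logs reported as unreadable point to a permissions problem rather than an out-of-date repository.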
In this scenario, you are connected to the target database while it is not open and attempting to perform an RMAN operation. You receive the following error:
PLS-00553: character set name is not recognized
Typically, this message means that the character set in the client environment, that is, the environment in which you are running the RMAN client, is different from the character set in the target database environment.
Query the target database to determine the value of the NLS_CHARACTERSET parameter. For example, run this query:
SQL> SELECT VALUE FROM V$NLS_PARAMETERS WHERE PARAMETER='NLS_CHARACTERSET';
Set the character set environment variable in the client to the same value as the variable in the server. For example, you can set the NLS_LANG environment variable on a UNIX system as follows:
% setenv NLS_LANG american_america.we8dec
% setenv NLS_DATE_FORMAT "MON DD YYYY HH24:MI:SS"
If the connection is made through a listener, then the listener must be started with the correct Globalization Support settings. Otherwise, the spawned connections inherit the incorrect Globalization Support settings from the listener.
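To compare the client setting with the value returned by the NLS_CHARACTERSET query, note that NLS_LANG has the form language_territory.charset. The following Python sketch extracts the character set part and compares it without regard to case. The function names are hypothetical, and the sketch only compares strings; it does not validate that the character set name is one that Oracle recognizes.

```python
def nls_lang_charset(nls_lang):
    """Return the character set part of an NLS_LANG value, or None if absent."""
    _, separator, charset = nls_lang.partition(".")
    return charset if separator and charset else None


def charset_matches(nls_lang, db_charset):
    """True when the client NLS_LANG names the same character set as the database."""
    client_charset = nls_lang_charset(nls_lang)
    return client_charset is not None and client_charset.upper() == db_charset.upper()


# Client value from the setenv example; database value as the query might return it.
print(charset_matches("american_america.we8dec", "WE8DEC"))
print(charset_matches("american_america.us7ascii", "WE8DEC"))
```

A result of False for the client environment (or for the environment in which the listener was started) is consistent with the mismatch described in this scenario.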
RMAN fails with ORA-01031 (insufficient privileges) or ORA-01017 (invalid username/password) errors when trying to connect to the target database:
% rman
Recovery Manager: Release 10.1.0.2.0 - Production
Copyright (c) 1995, 2003, Oracle. All rights reserved.
RMAN> CONNECT TARGET sys/mypass@inst1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
ORA-01031: insufficient privileges
RMAN automatically requests a connection to the target database as SYSDBA. In order to connect to the target as SYSDBA, you must do one of the following:
Be part of the operating system DBA group with respect to the target database (that is, have the ability to connect with SYSDBA privileges to the target database without a password).
Create a password file with the orapwd command and set the initialization parameter REMOTE_LOGIN_PASSWORDFILE.
Make sure you are connecting with the correct username and password.
If the target database does not have a password file, then the user you are logged in as must be validated with operating system authentication.
Either create a password file for the target database or add yourself to the administrator list in the operating system.
See Also:
Oracle Database Administrator's Guide to learn how to create a password file
In this scenario, you attempt to duplicate a database with the DUPLICATE command, but receive the following error stack:
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of Duplicate Db command at 09/04/2001 12:11:29
RMAN-03015: error occurred in stored script Memory Script
RMAN-06053: unable to perform media recovery because of missing log
RMAN-06025: no backup of log thread 1 seq 16 scn 145858 found to restore
The problem is that RMAN is not able to apply all the archived logs needed for complete recovery. For example, if you only backed up logs through sequence 15, but the most recent archived log is sequence 16, then DUPLICATE fails.
When creating the duplication script, use the SET UNTIL command to specify a log sequence number for incomplete recovery. For example, to terminate recovery after applying log sequence 15, enter:
RUN
{
  SET UNTIL SEQUENCE 16 THREAD 1; # recovers up to but not including log 16
  DUPLICATE TARGET DATABASE TO 'dupdb';
}
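Because SET UNTIL SEQUENCE stops before the named log, the value to specify is one greater than the last log sequence that was backed up. The following Python sketch generates the script text from that last sequence number. The function name is hypothetical, and the sketch only builds a string; it does not run RMAN.

```python
def duplicate_script(last_backed_up_sequence, thread, duplicate_name):
    """Build a DUPLICATE script that recovers through the last backed-up log."""
    # SET UNTIL SEQUENCE is exclusive, so name the first log NOT to apply.
    until_sequence = last_backed_up_sequence + 1
    return (
        "RUN\n"
        "{\n"
        f"  SET UNTIL SEQUENCE {until_sequence} THREAD {thread};\n"
        f"  DUPLICATE TARGET DATABASE TO '{duplicate_name}';\n"
        "}\n"
    )


# Logs were backed up through sequence 15 on thread 1, as in the example above.
print(duplicate_script(15, 1, "dupdb"))
```

For a last backed-up sequence of 15, the generated script specifies SET UNTIL SEQUENCE 16, matching the example.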
See Also:
"RMAN DUPLICATE DATABASE at a Past Point in Time: Example" for more information about performing incomplete recovery during the duplication operation
In this scenario, you back up the database, then run the DUPLICATE command. You receive the following error stack:
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of Duplicate Db command at 09/04/2001 13:55:11
RMAN-03015: error occurred in stored script Memory Script
RMAN-06026: some targets not found - aborting restore
RMAN-06023: no backup or copy of datafile 8 found to restore
RMAN-06023: no backup or copy of datafile 7 found to restore
RMAN-06023: no backup or copy of datafile 6 found to restore
RMAN-06023: no backup or copy of datafile 5 found to restore
RMAN-06023: no backup or copy of datafile 4 found to restore
RMAN-06023: no backup or copy of datafile 3 found to restore
RMAN-06023: no backup or copy of datafile 2 found to restore
RMAN-06023: no backup or copy of datafile 1 found to restore
The DUPLICATE command recovers to archived redo logs, but cannot recover into online redo logs. Thus, if the restored backup cannot be made consistent without applying the online redo logs, then duplication fails with RMAN-06023 errors because RMAN is looking for backups created before the most recent archived log.
After backing up the source database, archive and back up the current redo log:
RMAN> SQL 'ALTER SYSTEM ARCHIVE LOG CURRENT';
RMAN> BACKUP ARCHIVELOG ALL;
This archives all records in the online redo logs so that RMAN can now recover the backup by applying the most recent archived redo log.
In this scenario, you list the database incarnations registered in the recovery catalog and see a database with the name UNKNOWN:
LIST INCARNATION OF DATABASE;
RMAN-03022: compiling command: list
List of Database Incarnations
DB Key  Inc Key DB Name DB ID      STATUS  Reset SCN  Reset Time
------- ------- ------- ---------- ------- ---------- ----------
56      57      TRGT    4052472287 CURRENT 1          Sep 03 2001 06:45:51
1       19      UNKNOWN 4141147584 PARENT  1          Jan 08 2001 14:47:28
. . .
One way you get the DB_NAME of UNKNOWN is when you register a database that was once opened with the RESETLOGS option. The DB_NAME can be changed during a RESETLOGS operation, so RMAN does not know what the DB_NAME was for those old incarnations of the database because it was not registered in the recovery catalog at the time. Consequently, RMAN sets the DB_NAME column to UNKNOWN when creating the DBINC record.