How to make netapp Oracle database snapshot copy crash-consistent


Jephe Wu - http://linuxtechres.blogspot.com

Objective: understanding Netapp point-in-time snapshot Oracle backup without putting in the hot backup mode


Crash-consistent snapshot copies should only be considered under special circumstances where requirements restrict the use of standard backup methods (such as rman or hot backup mode)

Oracle backup overview:
----------------------
physical backup and logical backup

physical backup can be classified as consistent backup or inconsistent backup

consistent backup means controlfile and data file are checkpointed with same SCN, only possible when database is cleanly shut down, no matter it's in nonarchivelog or archivelog mode

Besides the standard 3 methods for backup: cold/offline backup, rman backup and online/hot backup(user-managed backup), Oracle recently certify the third party snapshot copy technology as one of options of backup/recovery as long as it's crash consistent

In the past, Oracle did not support or recommend the use of a snapshot copy created of an online active
database without the database or tablespaces being put in backup mode. The risk was thought to be
the danger of mixing old archive logs with current archive logs, which can lead to data corruption or
potentially destroy the production database.

According to MOS note 604683.1, the snapshot of an online database not in backup mode can be
deemed valid and supported if and only if all of the following requirements are strictly satisfied:
•  Oracle’s recommended restore and recovery operations are followed. 
•  Database is crash consistent at the point of the snapshot. 
•  Write ordering is preserved for each file within a snapshot. 


Oracle recovery overview:
------------------------
instance recovery and media recovery
instance recovery is automatic done by Oracle itself, it requires redo log file only.
Media recovery requires archived redo log.
Media recovery has complete recovery and incomplete recovery


An incomplete recovery of the whole database is usually required in the following situations:
•  Data loss caused by user errors
•  Missing archived redo log, which prevents complete recovery
•  Physical loss or corruption of online redo logs
•  No access to current control file

When performing incomplete recovery, the types of media recovery are available.


  • Time-based recovery  Recovers the data up to a specified point in time. 
  • Cancel-based recovery  Recovers until you issue the CANCEL statement (not available when using Recovery Manager). 
  • Change-based recovery  Recovers until the specified SCN. 
  • Log sequence recovery  Recovers until the specified log sequence number (only available when using Recovery Manager). 

What's the crash consistent?
------------------------------
It's point-in-time(PIT) image of Oracle database, looks like it crashed due to power outage, instance crash or shutdown abort etc, it requires instance recovery after restart database, not media recovery. Netapp snapshot generate Point-In-time image for database.


How to make snapshot crash-consistent?
---------------------------------------
1. all databqase files(controlfile, datafile, online redo log) are in single volume, then snapshot will generate crash-consistent image.
Note: Not require archived logs to be in the same volume.

If a database has all of its files (control files, data files, online redo logs, and archived logs) contained
within a single NetApp volume, then the task is straightforward. A Snapshot copy of that single volume
will provide a crash-consistent copy.

2. use crash consistent group by snapmanager/snapdrive etc if database cross different volumes
e.g. data volume and log volume, data captured by snapshot for data volume must exist in log volume first because Oracle always makes sure it writes to redo log first before writing associated data buffer cache to data file.


Starting from SnapDrive for unix 2.2, SnapDrive supports the feature of consistency groups provided
by Data ONTAP (beginning with version 7.2 and higher). This feature is necessary for creating a
consistent Snapshot copy across multiple controller/volumes.

In an environment where all participating controllers support consistency groups, SnapDrive will use a
Data ONTAP consistency group as the preferred (default) method to capture multicontroller/volume
Snapshot copies.

SnapDrive can simplify the creation of a consistency group Snapshot copy when there are
multiple file systems.

snapdrive snap create -fs /u01/oradata/prod /u02/oradata/prod -snapname snap_prod_cg 


a. POINT-IN-TIME COPY OF THE DATABASE 

After the database is opened, no future redo logs beyond this snapshot
can be applied.

Open resetlogs operation is recommended to avoid potential mixing of existing
archive logs and new archive logs. and start a new incarnation and log ID:

1. SHUTDOWN IMMEDIATE 
2.  STARTUP MOUNT 
3.  RECOVER DATABASE UNTIL CANCEL 
4.  ALTER DATABASE OPEN RESETLOGS; 

b. FULL DATABASE RECOVERY WITH ZERO DATA LOSS 

Restore the snapshot of only the data files. Do not overwrite the current control files, current redo
logs, and current archived logs.

run commands below to fully recover database by applying archived and online redo logs
1. recover automatic database;
2. alter database open;

c. point-in-time(PIT) database recovery
PIT requires the presence of current controlfile, current online redo logs and archived logs.

only restore data files and run the following commands:
1. startup mount

Identify the minimum SCN we have to recover to by script @scandatafile.sql

SQL> @scandatafile  
File 1 absolute fuzzy scn = 861391  
File 2 absolute fuzzy scn = 0  
File 3 absolute fuzzy scn = 0  
File 4 absolute fuzzy scn = 0  
Minimum PITR SCN = 861391  

PL/SQL procedure successfully completed. 


scandatafiles.sql 
# scans all files and update file headers with meta information  
# depending on number and sizes of files, the scandatafile procedure can be 
a  
# time consuming operation.  
# create a script, “scandatafile”, with the following content 

spool scandatafile.sql  
set serveroutput on  
declare  
 scn number(12) := 0;  
 scnmax number(12) := 0;  
begin  
 for f in (select * from v$datafile) loop  
 scn := dbms_backup_restore.scandatafile(f.file#);  
 dbms_output.put_line('File ' || f.file# ||' absolute fuzzy scn = ' || 
scn);  
 if scn > scnmax then scnmax := scn; end if;  
 end loop;  

 dbms_output.put_line('Minimum PITR SCN = ' || scnmax);  
end; 


If the minimum PITR SCN is zero, then database is not required for further recovery, it can to opened now.
if it's no zero, database must be recovered to at least that SCN and onwards.

2. RECOVER AUTOMATIC DATABASE UNTIL CHANGE [Minimum PITR SCN or higher]  
or 
ALTER DATABASE RECOVER DATABASE UNTIL CHANGE [Minimum PITR SCN or higher] 

3. ALTER DATABASE OPEN RESETLOGS 

References:

--------------
1. Using Crash-Consistent Snapshot Copies as Valid Oracle Backups - http://media.netapp.com/documents/tr-3858.pdf
2. MOS Supported Backup, Restore and Recovery Operations using Third Party Snapshot Technologies [ID 604683.1]