Oracle database recovery procedures

This post summarizes the article 'Recovering Oracle' at http://examples.oreilly.com/9781565926424/oracle.html. It is a very good article on how Oracle performs recovery internally.

----
How Oracle processes SELECT and UPDATE statements (from the 'Spotlight on Oracle: Getting Started' PDF from Quest Software):





The numbers indicate the order of the flow of information. (The original post includes an Oracle architecture diagram here, not reproduced.)
The numbered labels in the Oracle architecture diagram correspond to the following activities:
1. The client program (for example, SQL*PLUS, Oracle Power Objects, or some other tool)
sends a SELECT statement to the server process.
2. The server process looks in the shared pool for a matching SQL statement. If none is
found, the server process parses the SQL and inserts the SQL statement into the
shared pool.
3. The server process looks in the buffer cache for the required data blocks. If found, the
data block is moved to the most recently used end of the Least Recently Used (LRU) list.
4. If the block cannot be found in the buffer cache, the server process must fetch it from the
disk file. This requires a disk I/O.
5. The server process returns the rows retrieved to the client process. This may involve some
network or communications delay.
6. When the client issues the UPDATE statement, the SQL must again be parsed and the rows
to be updated retrieved. The update statement then changes the relevant blocks in shared
memory and updates entries in the rollback segment buffers.
7. The update statement also makes an entry in the redo log buffer that records the
transaction details.
8. The database-writer background process copies modified blocks from the buffer cache to
the database files. The Oracle session performing the update does not have to wait for
this to occur.
9. When the COMMIT statement is issued the log writer process must copy the contents of
the redo log buffer to the redo log file. The COMMIT statement does not return control to
the Oracle session issuing the commit until this write is complete.
10. If running in ARCHIVELOG mode, the archiver process copies full redo logs to the
archive destination. A redo log is not eligible for re-use until it has been archived.
11. At regular intervals, or when a redo log switch occurs, Oracle performs a checkpoint. A
checkpoint requires all modified blocks in the buffer cache to be written to disk. A redo
log file cannot be re-used until the checkpoint completes.
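For illustration, here is a minimal session that exercises the flow above (the emp table and values are hypothetical):

sql> select ename, sal from emp where empno = 7369;      -- steps 1-5: parse, buffer cache lookup, fetch, return rows
sql> update emp set sal = sal * 1.1 where empno = 7369;  -- steps 6-8: parse/fetch, change buffers, write rollback and redo entries
sql> commit;                                             -- step 9: LGWR flushes the redo log buffer before control returns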





Best Practice:

Controlfile:

  • multiplex control files in different locations
  • enable the rman controlfile autobackup setting
  • take a daily rman 'backup controlfile to trace' backup
  • you cannot use a controlfile that is older than the database datafiles; in that case, you need to restore both the controlfile autobackup and the database files from the rman backup.
Media recovery is needed whenever you restore a backup control file (even if all datafiles are current). Refer to http://docs.oracle.com/cd/B19306_01/backup.102/b14192/intro004.htm

  • only when the datafiles and redo logs are okay can you recreate the controlfile from the 'backup to trace' script, as shown in the example below.
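To verify how many copies you currently have and to relocate them via the spfile (the paths below are illustrative, not from any particular system):

sql> select name from v$controlfile;
sql> alter system set control_files='/u01/oradata/orcl/control01.ctl','/u02/oradata/orcl/control02.ctl' scope=spfile;
-- then shut down, copy the existing controlfile to the new locations, and start up again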
Redo logs:
  • multiplex the members in each redo log group
  • damage to a non-active redo log is not critical and can easily be repaired
  • damage to the active or current redo log group will cause data loss
An active log group is usually the log group that Oracle just finished writing to. However, until a checkpoint occurs, this group is still needed for media recovery.
You can run 'select group#, status from v$log;' and 'select group#, member from v$logfile;' on a mounted, closed database to find out the status of the redo log groups, as well as 'select * from v$log_history;'. An example of multiplexing an existing group follows.
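To add a second member to an existing group on a different disk (the path is illustrative):

sql> alter database add logfile member '/u02/oradata/orcl/redo01b.log' to group 1;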


SPFILE/PFILE
  • take a daily rman pfile backup:
sql "CREATE PFILE=''/u01/app/backup/spfile.backup'' FROM SPFILE";
    COMMON RECOVERY LINKS

    Database Recovery and use previous incarnation for rman recovery:
      Recovering non-critical files 
      Recovering non-critical files covers temporary tablespaces, redo log files, index tablespaces, read-only tablespaces and the password file: http://www.datadisk.co.uk/html_dos/oracle/recover_noncritical_files.htm

      Recovering critical files
      http://www.datadisk.co.uk/html_docs/oracle/recover_critical_files.htm

      RMAN commands summary:
      http://www.datadisk.co.uk/html_docs/oracle/rman.htm

      Recover from user error as well as flashback architecture:
      http://www.datadisk.co.uk/html_docs/oracle/recover_user_errors.htm
      http://www.datadisk.co.uk/html_docs/oracle/flashback.htm

      Part I - controlfile damage - mount will fail
      1. startup mount and controlfile
      Mounting a database (without opening it) reads the control files but does not open the data files. If the control files are mirrored, Oracle attempts to open each control file listed in the initORACLE_SID.ora pfile or in the spfile. If any of them is damaged, the mount fails.

      Identify which controlfile is damaged

      • make a backup copy of the whole database first, including all control files, online redo logs and datafiles, or at least all control files and online redo logs
      • check the messages in $ORACLE_BASE/admin/ORACLE_SID/bdump/alert_ORACLE_SID.log to find out which controlfile is damaged.
      • check the control_files line in the pfile/spfile to find the names of the controlfiles, then list each of them and compare their sizes and modification times.
                     1. if the file that Oracle is complaining about is simply missing, it's easy: shut down the database, copy a good controlfile over the missing one, then startup mount again.

                     2. if the damaged file is present but corrupted, the solution is to try, one at a time, copying every version of each control file to all the other locations and attempting to mount the database again, excluding the one Oracle has already complained about, since it is obviously damaged.

                     3. if all control files are damaged, we can either:

                     3. if all control files are damaged, we can either

      • recreate it from a control file creation script if the online redo logs are okay; there will be no data loss, provided we perform 'backup controlfile to trace' regularly, see below for details.
      The data files can be older versions that were restored from backup, since they will be rolled forward by media recovery.

      However, the online redo logs must be current and intact before running the "create controlfile" script.

      or
      • restore controlfile from rman controlfile autobackup 

      2. rman controlfile autobackup (backup format that contains the DBID)
      You need to configure controlfile autobackup on so that rman backs up the controlfile automatically with the format c-DBID-timestamp-qq (qq is a hex sequence starting at 00 with a maximum of FF; it is the sequence number of the autobackup within the day, so the first autobackup of the day is 01, the second 02, and so on). You should set the autobackup format explicitly, even though the default value looks the same as the one we need; otherwise, when rman takes a backup, the backed-up controlfile may not be in the c-DBID-timestamp-qq format we want.

      rman target /
      rman> show all;
      CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '%F'; # default
      rman> CONFIGURE CONTROLFILE AUTOBACKUP ON;
      rman> CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '%F';
      rman> show all;
      CONFIGURE CONTROLFILE AUTOBACKUP ON;
      CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '%F';   (the '# default' marker is gone once set explicitly)

      When you run rman> backup database plus archivelog; it will back up the controlfile in the format we want, as follows:

      Starting Control File and SPFILE Autobackup at 04-APR-12
      piece handle=/home/oracle/app/oracle/product/11.2.0/dbhome_1/dbs/c-1294051154-20120404-05 comment=NONE
      Finished Control File and SPFILE Autobackup at 04-APR-12


      note: the c-1294051154-20120404-05 file above is a backup piece, not purely a controlfile; use 'strings c-1294051154-20120404-05 | head -10' to verify it.


      You can also use "backup current controlfile [format '/tmp/controlfile']" in rman to back up just the controlfile in binary format.


      3. backup controlfile to trace (script to be used to recreate the control file)
      In order to recreate the controlfile, you have to back it up first using one of the following commands:

            Binary format backup:
           rman> backup current controlfile [format '/tmp/controlfile'];
            
            Text format script backup:
      sql> alter database backup controlfile to trace;
      sql> alter database backup controlfile to trace as '/some/path/filename';
      sql> alter database backup controlfile to trace as '/some/path/filename' reuse;

      Specify REUSE to indicate that existing control files can be reused and overwritten.

      You should perform this regularly, either as part of the daily rman backup or in a separate cronjob as follows:


      export ORACLE_SID=livedb
      week=`date +%w`
      bkp_loc=/tmp
      
      rman target / << END
      sql "alter database backup controlfile to trace as ''${bkp_loc}/controlbkp_${ORACLE_SID}_${week}.sql'' reuse";
      END
      
      
      note: the '' above is two consecutive single quotes (they escape the quotes inside the rman sql command). REUSE means the command is able to overwrite an existing file of the same name.

      If you only used 'alter database backup controlfile to trace', then:
      • for 10g, the trace file *.trc will be placed in the directory given by the user_dump_dest parameter;
      • for 11g, query the v$diag_info view for the 'Diag Trace' path; the file will be put there.

      Refer to http://docs.oracle.com/cd/E14072_01/server.112/e10592/statements_1004.htm (search for the 'backup controlfile to trace' statement).
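On 11g, a quick way to locate the trace directory is the following query (the query is standard; the path shown is just an example):

sql> select value from v$diag_info where name = 'Diag Trace';

VALUE
--------------------------------------------
/u01/app/oracle/diag/rdbms/orcl/orcl/trace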

      
      
      4.  controlfile script location
      First, find out where the script is if you have done an automatic backup.



      sql> alter database backup controlfile to trace [as '/tmp/ctl.sql' reuse];

      $ cd $ORACLE_HOME/dbs; grep user_dump_dest *  
      let's say the udump directory is /db/Oracle/admin/crash/udump, then run
      $ cd /db/Oracle/admin/crash/udump ; grep 'CREATE CONTROLFILE' * |awk -F: '{print $1}'| xargs ls -ltr



      4.1 noresetlogs case (good redo logs)
      Modify the 'backup controlfile to trace' output as follows:

      STARTUP NOMOUNT
      CREATE CONTROLFILE REUSE DATABASE "ORCL" NORESETLOGS  ARCHIVELOG
          MAXLOGFILES 16
          MAXLOGMEMBERS 3
          MAXDATAFILES 100
          MAXINSTANCES 8
          MAXLOGHISTORY 292
      LOGFILE
        GROUP 1 '/home/oracle/app/oracle/oradata/orcl/redo01.log'  SIZE 50M BLOCKSIZE 512,
        GROUP 2 '/home/oracle/app/oracle/oradata/orcl/redo02.log'  SIZE 50M BLOCKSIZE 512,
        GROUP 3 '/home/oracle/app/oracle/oradata/orcl/redo03.log'  SIZE 50M BLOCKSIZE 512
      DATAFILE
        '/home/oracle/app/oracle/oradata/orcl/system01.dbf',
        '/home/oracle/app/oracle/oradata/orcl/sysaux01.dbf',
        '/home/oracle/app/oracle/oradata/orcl/undotbs01.dbf',
        '/home/oracle/app/oracle/oradata/orcl/users01.dbf',
        '/home/oracle/app/oracle/oradata/orcl/example01.dbf'
      CHARACTER SET AL32UTF8
      ;
      VARIABLE RECNO NUMBER;
      EXECUTE :RECNO := SYS.DBMS_BACKUP_RESTORE.SETCONFIG('CONTROLFILE AUTOBACKUP','ON');
      RECOVER DATABASE
      ALTER SYSTEM ARCHIVE LOG ALL;
      ALTER DATABASE OPEN;
      ALTER TABLESPACE TEMP ADD TEMPFILE '/home/oracle/app/oracle/oradata/orcl/temp01.dbf'
           SIZE 30408704  REUSE AUTOEXTEND ON NEXT 655360  MAXSIZE 32767M;


      4.2 resetlogs case, as follows (redo logs damaged):

      STARTUP NOMOUNT
      CREATE CONTROLFILE REUSE DATABASE "ORCL" RESETLOGS  ARCHIVELOG
          MAXLOGFILES 16
          MAXLOGMEMBERS 3
          MAXDATAFILES 100
          MAXINSTANCES 8
          MAXLOGHISTORY 292
      LOGFILE
        GROUP 1 '/home/oracle/app/oracle/oradata/orcl/redo01.log'  SIZE 50M BLOCKSIZE 512,
        GROUP 2 '/home/oracle/app/oracle/oradata/orcl/redo02.log'  SIZE 50M BLOCKSIZE 512,
        GROUP 3 '/home/oracle/app/oracle/oradata/orcl/redo03.log'  SIZE 50M BLOCKSIZE 512
      DATAFILE
        '/home/oracle/app/oracle/oradata/orcl/system01.dbf',
        '/home/oracle/app/oracle/oradata/orcl/sysaux01.dbf',
        '/home/oracle/app/oracle/oradata/orcl/undotbs01.dbf',
        '/home/oracle/app/oracle/oradata/orcl/users01.dbf',
        '/home/oracle/app/oracle/oradata/orcl/example01.dbf'
      CHARACTER SET AL32UTF8
      ;
      VARIABLE RECNO NUMBER;
      EXECUTE :RECNO := SYS.DBMS_BACKUP_RESTORE.SETCONFIG('CONTROLFILE AUTOBACKUP','ON');
      RECOVER DATABASE USING BACKUP CONTROLFILE
      ALTER DATABASE OPEN RESETLOGS;
      ALTER TABLESPACE TEMP ADD TEMPFILE '/home/oracle/app/oracle/oradata/orcl/temp01.dbf'
           SIZE 30408704  REUSE AUTOEXTEND ON NEXT 655360  MAXSIZE 32767M;

      note: Specify RESETLOGS if you want Oracle to ignore the contents of the files listed in the LOGFILE clause

         4.3 When to use the above script to recreate the controlfile?
      Use this create controlfile command to re-create a control file only if:
      • All copies of your existing control files have been lost through media failure.
      • You want to change the name of the database (use 'set database').
      • You want to change the maximum number of redo log file groups, redo log file
        members, archived redo log files, datafiles, or instances that can concurrently
        have the database mounted and open.
        4.4 How to recreate it?

         $ sqlplus / as sysdba @/tmp/ctl.sql


      5.  pfile/spfile parameters : control_files and CONTROL_FILE_RECORD_KEEP_TIME
      CONTROL_FILE_RECORD_KEEP_TIME specifies the minimum number of days before a reusable record in the control file can be reused. In the event a new record needs to be added to a reusable section and the oldest record has not aged enough, the record section expands. If this parameter is set to 0, then reusable sections never expand, and records are reused as needed.
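To check and adjust it (the default is 7 days; 14 here is just an example value):

sql> show parameter control_file_record_keep_time;
sql> alter system set control_file_record_keep_time = 14 scope=both;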


      6. References:
      1. controlfile and datafile header structure:

      controlfile structure:
      select * from v$controlfile_record_section;

      datafile header:
      select * from v$datafile_header;

      2. create controlfile script usage - see section 4 above.

      Part II - redo logs are damaged
       Missing member of any online log group
      If the redo logs are mirrored and one or more of the mirrored copies are lost, but at least one good copy of each online redo log remains, Oracle will open the database without any errors displayed to the terminal. The only error will be a message like the following in the alert log:
      Errors in file /db/Oracle/admin/crash/bdump/crash_lgwr_10302.trc:
      ORA-00313: open failed for members of log group 2 of thread 1


      All members of any online log group are corrupted
      However, if all members of any online log group are corrupted, Oracle will complain and the database will not open. The error might look something like this:
      ORA-00327: log 2 of thread 1, physical size less than needed
      ORA-00312: online log 2 thread 1: '/db/Oracle/b/oradata/crash/redocrash02.log'
      ORA-00312: online log 2 thread 1: '/db/Oracle/a/oradata/crash/redocrash03.log'


      Missing all members of any online log group
      A similar problem is if all members of an online log group are missing. Oracle will complain and the database will not open. The error looks something like this:
      ORA-00313: open failed for members of log group 2 of thread 1
      ORA-00312: online log 2 thread 1: '/db/Oracle/b/oradata/crash/redocrash02.log'
      ORA-00312: online log 2 thread 1: '/db/Oracle/a/oradata/crash/redocrash03.log'







      1. controlfile and redo logs
      The "create controlfile" will be useful only if all the data files and online redo logs are in place. The data files can be older versions that were restored from backup, since they will be rolled forward by the media recovery. However, the online redo logs must be current and intact for the "create controlfile" script to work.

      Note: if a redo log file is lost, consider setting _ALLOW_RESETLOGS_CORRUPTION = TRUE in
      the spfile/pfile so that creating the controlfile can work.
      _ALLOW_RESETLOGS_CORRUPTION=TRUE allows the database to open without consistency checks.
      This may result in a corrupted database.
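As a sketch, the pfile entry would be a single line like the following (an unsupported underscore parameter; use it only under Oracle Support guidance and remove it immediately after the database opens):

_allow_resetlogs_corruption = TRUE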


      2. If both controlfile and redo log are damaged:
      Firstly, assume we have done rman full database backup:
      rman> backup database plus archivelog;


      then, if all controlfiles and all current redo log group members are damaged, we can do this:

      sql> shutdown abort;
      rman target /
      rman> startup nomount;
      rman> restore controlfile from autobackup;
      rman> restore database;
      note: if you recreate the controlfile using the script instead, Oracle will not locate the rman backups in the flash recovery area by itself; you might need to run "catalog start with '<latest backupset path>'"

      sql> recover database until cancel using backup controlfile;   (note: 'until cancel' is a SQL*Plus command, not rman)
      or
      rman> recover database until scn SCNnumber;   (SCN as prompted by the step above)
      rman> alter database open resetlogs;

      2.  identify damaged datafile or redo log file
      sql> select name from v$datafile;
      sql> select group#, member from v$logfile;

      The main thing to look for is a missing file or a zero-length file. Also, each redo log file within a log group should have the same modification time.


      Something else to look for is one or more files with a modification time newer than that of the newest online redo log file. If a data file meets any one of these conditions, it must be restored from backup.
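Another way to spot a stale or damaged file from inside the mounted database: a datafile whose CHECKPOINT_CHANGE# lags behind the others needs recovery, and a non-empty ERROR column flags a damaged file.

sql> select file#, status, error, checkpoint_change# from v$datafile_header;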

      The "create controlfile" script requires that all online redo logs be present. If even only one log group is completely damaged, it will not be useful after recreating controlfile.


      3. run controlfile creation script with readonly tablespace

      modify the 'backup controlfile to trace' output as follows:
      ---------------
      # The following commands will create a new controlfile and use it
      # to open the database.
      # Data used by the recovery manager will be lost. Additional logs may
      # be required for media recovery of offline data files. Use this
      # only if the current version of all online logs are available.


      STARTUP NOMOUNT
      CREATE CONTROLFILE REUSE DATABASE "CRASH" NORESETLOGS ARCHIVELOG
      MAXLOGFILES 32 # maximum logfile  groups
      MAXLOGMEMBERS 2
      MAXDATAFILES 30
      MAXINSTANCES 8
      MAXLOGHISTORY 843
      LOGFILE
      GROUP 1 '/db/a/oradata/crash/redocrash01.log' SIZE 500K,
      GROUP 2 '/db/b/oradata/crash/redocrash02.log' SIZE 500K,
      GROUP 3 '/db/c/oradata/crash/redocrash03.log' SIZE 500K
      DATAFILE
      '/db/a/oradata/crash/system01.dbf',
      '/db/a/oradata/crash/rbs01.dbf',
      '/db/a/oradata/crash/temp01.dbf',
      '/db/a/oradata/crash/tools01.dbf',
      '/db/a/oradata/crash/users01.dbf'
      ;
      # Recovery is required if any of the data files are restored backups,
      # or if the last shutdown was not normal or immediate.
      RECOVER DATABASE
      # All logs need archiving and a log switch is needed.
      ALTER SYSTEM ARCHIVE LOG ALL;
      # Database can now be opened normally.
      ALTER DATABASE OPEN;
      # Files in read only tablespaces are now named.
      ALTER DATABASE RENAME FILE 'MISSING00006' TO '/dba/oradata/crash/test01.dbf';
      # Online the files in read only tablespaces.
      ALTER TABLESPACE "TEST" ONLINE;


      and the temp tablespace creation script;
      -----------------
      sqlplus / as sysdba
      sql> startup nomount;   (or: alter database close;)
      sql> @ctl.sql


      4 read-only data files
      To find out if the control file is valid and has been copied to all of the correct locations, attempt to start up the database with the mount option.

      Oracle does not allow read-only data files to be online during a 'recover database using backup controlfile' action. Therefore, if there are any read-only data files, take them offline by running:
      sql> alter database datafile 'filename' offline;

      To find out if there are any read-only data files, issue the following command on the mounted, closed database:
      sql> select enabled, name from v$datafile;


      5. recover the database using the backup control file.
      Try to recover the database normally first, as it won't hurt anyway. If that doesn't work, then run:
      sql> recover database using backup controlfile

      Oracle will request all archived redo logs generated since the time of the oldest restored data file.
      The most efficient way to roll through the archived redo logs is to have all of them sitting uncompressed in the directory that Oracle suggests as the location of the first file.

      Once Oracle rolls through all the archived redo logs, it may prompt for the online redo log. It does this by prompting for an archived redo log with a number that is higher than the most recent archived redo log available.

      You can use 'strings redo1.log | head -3' to find the sequence number of a redo log. When recovering the database using a backup control file, the database must be opened with the resetlogs option:
      sql> alter database open resetlogs;


      6. recover a non-critical tablespace (i.e. not the SYSTEM, SYSAUX or UNDO tablespaces)

      rman> alter tablespace tablespace_name1 offline;

      rman> recover tablespace tablespace_name1 ;

      rman> alter tablespace tablespace_name1 online;

      7. alter database open
      Mounting the database only checks the presence and consistency of the control files. If that works, opening the database is the next step. Doing so will check the presence and consistency of all data files, online redo log files, and any rollback segments.


      There is only one copy of each data file, unlike online redo logs and control files that can be mirrored. So, statistically speaking, it's easier to lose one data file than to lose all mirrored copies of a log group or all mirrored copies of the control file.

      8. damaged redo log groups:
      If all members of a log group are damaged, there is a great potential for data loss. The entire database may have to be restored, depending on the status of the log group that was damaged, and the results of some attempts at fixing it.


      Part III - data files damaged
      1. error messages when data files are damaged

      Damaged rollback segment
      If a rollback segment is damaged, the error will be like the following:
      ORA-01545: rollback segment 'USERS_RS' specified not available
      Cannot open database if all rollback segments are not available.

      Missing data file
      ORA-01157: cannot identify data file 1 - file not found
      ORA-01110: data file 1: '/db/Oracle/a/oradata/crash/system01.dbf'
      Corrupted data file
      A corrupted data file can generate a number of different errors. For instance, it may mimic a missing data file:
      ORA-01157: cannot identify data file 1 - file not found
      ORA-01110: data file 1: '/db/Oracle/a/oradata/crash/system01.dbf'
      It may also completely confuse Oracle:
      ORA-00600: internal error code, arguments: [kfhcfh1_01], [0], [], [], [], [], [], []
      A corrupted data file may also cause a "failed verification check" error:
      ORA-01122: database file 1 failed verification check
      ORA-01110: data file 1: '/db/Oracle/a/oradata/crash/system01.dbf'
      ORA-01200: actual file size of 1279 is smaller than correct size of 40960 blocks


      9. damaged undo datafile:
      Since Oracle has to open the data files that contain this rollback segment before it can verify that the rollback segment is available, this error will not occur unless a data file has been taken offline. If Oracle encounters a damaged data file (whether or not it contains a rollback segment), it will complain about that data file and abort the attempt to open the database.

      Remember that a rollback segment is a special part of a tablespace that stores rollback information. Rollback information is needed in order to undo (or rollback) an uncommitted transaction. Since a crashed database will almost always contain uncommitted transactions, recovering a database with a damaged rollback segment is a little tricky.

      As previously mentioned, a damaged data file may be taken offline, but Oracle will not open the database without the rollback segment.

      The strategy for dealing with this is to make Oracle believe that the rollback segment doesn't exist. That will allow the database to be brought online. However, there will be transactions that need to be rolled back that require this rollback segment. Since Oracle believes this rollback segment is no longer available, these rollbacks cannot occur. This means that the database may be online, but portions of it will not be available.

      The above method is a temporary workaround; the proper solution is to restore and then recover the undo tablespace, then 'startup mount' and open the database again, as sketched below.
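A minimal rman sketch of the proper fix (the tablespace name undotbs1 is an assumption; check v$tablespace for yours):

rman target /
rman> startup mount;
rman> restore tablespace undotbs1;
rman> recover tablespace undotbs1;
rman> alter database open;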

      11. media recovery
      If any data files were restored from backup, the recover command will be needed. This command uses the archived and online redo logs to "redo" any transactions that have occurred since the time the backup of the data file was taken. You can recover a complete database, a tablespace, or a data file by issuing recover database, recover tablespace tablespace_name, or recover datafile 'datafile_name', respectively.

      sql> startup mount;
      sql> recover datafile '/db/Oracle/a/oradata/crash/datafile01.dbf'
      i.e. recover [database | tablespace tablespace_name | datafile 'datafile_name']

      After receiving the recover command, Oracle prompts for the name and location of the first archived redo log that it needs. If that log, and all logs made since that log, are online, uncompressed, and in their original locations, enter the word AUTO. This tells Oracle to assume that all the files it needs are online, so it can automatically roll through each log it needs.

      At some point, it may ask for an archived redo log that is not available. This could mean some of the archived redo logs or online redo logs are damaged. If the file cannot be located or restored, enter CANCEL.
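A typical exchange looks like the following (the SCN, sequence number and paths are illustrative only):

sql> recover database;
ORA-00279: change 405620 generated at 04/04/2012 10:15:23 needed for thread 1
ORA-00289: suggestion : /db/Oracle/admin/crash/arch/arch_1_202.arc
ORA-00280: change 405620 for thread 1 is in sequence #202
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
AUTO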

      12. system tablespace is damaged
      If the damaged file is part of the SYSTEM tablespace, an offline recovery is required.

      All other missing data files can be recovered with the database online. Unfortunately, Oracle only complains that the data file is missing -- without saying what kind of data file it is.

      You can find out by running this on the mounted, closed database:
      sql> select name from v$datafile where status = 'SYSTEM';

      Unlike other tablespaces, the SYSTEM tablespace must be available in order to open the database. Therefore, if any members of the system tablespace are damaged, they must be restored now. Before doing this, make sure that the database is not open. (It is okay if it is mounted.) To make sure, run the following command on the mounted, closed database:

      sql> select status from v$instance;
      mounted

      Only when the database is in the mounted state, not open, restore the damaged files from the most recent backup available; once they are restored, recover the system tablespace:
      sql> recover tablespace system;


      13. Damaged Non-System Data File?
      To open a database with a damaged, non-system data file, take the data file offline.
      sql> alter database datafile 'filename' offline;

      If the database is not in ARCHIVELOG mode, the only thing Oracle does allow is to drop the data file entirely. This means, of course, that the tablespace containing this file will have to be rebuilt from scratch. This is but one of the many reasons why a production instance should not run in NOARCHIVELOG mode.

      sql> alter database datafile 'filename' offline drop;

      14. bring datafile back on line
      First find out which data files were taken offline.
      sql> select name from v$datafile where status = 'OFFLINE';

      Once the names of the data files that need to be restored are determined, restore them from the latest available backup. Once they are restored, recovery within Oracle can be accomplished in three different ways. These ways vary greatly in complexity and flexibility. Examine the following three media recovery methods and choose whichever one is best for you.

      Datafile recovery:
      sql> recover datafile 'datafile_name';
      sql> alter database datafile 'datafile_name' online;


      tablespace recovery:
      sql> select file_name, tablespace_name from dba_data_files; 
      sql> alter tablespace tablespace_name1 offline;
      sql> recover tablespace tablespace_name1 ;
      sql> alter tablespace tablespace_name2 offline;
      sql> recover tablespace tablespace_name2 ;

      database recovery:
      once the database files are restored, then:
      sql> alter database close;
      sql> recover database
      auto
      sql> alter database open;


      To make sure that all tablespaces and data files have been returned to their proper status:
      sql> select name, status from v$datafile;
      sql> select member, status from v$logfile;


      15. Is There a Damaged Log Group?
      When we refer to a damaged log group, we mean that all members of the log group are damaged. If at least one member of a mirrored log group is intact, Oracle opens the database and simply puts an error message in the alert log. However, if all members of a log group are damaged, the database will not open, and the error will look like the ORA-00313 example shown below.

      The first thing that must be determined is the status of the damaged log group. The
      three possibilities are CURRENT, ACTIVE, and INACTIVE:

      sql> select group#, status from v$log;
      CURRENT means Oracle is currently writing to the log group.
      ACTIVE means Oracle has just finished writing to it, but it is still needed for media recovery; you will only see this status for a short period of time.
      INACTIVE means Oracle no longer needs the log group in any way.

      Is the current online log damaged or not?
      If the current online log group is damaged, there would be a message like the following when attempting to open the database:
      ORA-00313: open failed for members of log group 2 of thread 1
      ORA-00312: online log 2 thread 1: '/db/Oracle/b/oradata/crash/redocrash02.log'
      ORA-00312: online log 2 thread 1: '/db/Oracle/a/oradata/crash/redocrash03.log'

      This is the worst kind of failure to have because there will definitely be data loss. That is because the current online log is required to restart even a fully functioning database. The current control file knows about the current online log and will attempt to use it. The only way around that is to restore an older version of the control file. Unfortunately, you can't restore only the control file because the data files would then be more recent than the control file. The only remaining option is to restore the entire database.

      Is an Active online redo log damaged?
      Remember that an ACTIVE log is one that is still needed for recovery. The reason that it is still needed is because a checkpoint has not flushed all changes from shared memory to disk. Once that happens, this log will no longer be needed.

      The way to attempt to recover from active redo log files damage is to perform a checkpoint. If it is successful, the database should open successfully. To perform a checkpoint, issue the following command on the mounted, closed database:
      sql> alter system checkpoint local;

      Whether or not the above command succeeds, try to open the database; if it will not open, you have to recover the entire database.

      Is an inactive online redo log damaged?
      In comparison, this one should be a breeze. An INACTIVE log is not needed by Oracle. If it is not needed, simply drop it and add another in its place.

      sql> select member from v$logfile where group# = 2;
      sql> alter database drop logfile group 2;
      sql> alter database add logfile group 2 ('/logs1/redolog01.dbf', '/logs2/redolog01.dbf', '/logs4/redolog01.dbf') size 500K;
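Alternatively, instead of dropping and re-adding the group, you can clear it in place; if the group was never archived, add the UNARCHIVED keyword (and take a fresh full backup afterwards, since older backups can no longer be rolled forward through this log):

sql> alter database clear logfile group 2;
sql> alter database clear unarchived logfile group 2;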

      16. damaged undo segments (i.e. the datafile holding them is damaged)

      Because of the unique nature of damaged rollback segments, there are two choices for recovery. The first is to get the database open sooner, but it may be only partially functional for a longer period of time. The second choice takes a little longer to open the database, but once it is open, no data files or rollback segments will be unavailable. Which is more important: getting even a partially functional database open as soon as possible, or not opening the database until all rollback segments are available? The latter is more prudent, but the former may be more appropriate to the environment.


      a. comment out the undo/rollback tablespace entry in the spfile/pfile, then open the database temporarily
      Simply comment out (or delete) the rollback segment / undo tablespace line in the spfile/pfile. First, shut down Oracle completely (this includes dismounting it as well), then copy and comment out the rollback segment line in the initORACLE_SID.ora file.

      b. recover undo tablespace first then 'startup mount'

      The first thing that must be determined is which tablespace the damaged rollback segment is in. Unfortunately, there is no fixed view that contains this information. That means that it will have to be discovered through common sense and deduction. First, remember that this error is not displayed unless a data file has been taken offline. To get a complete list of files that were taken offline, run the following command on a mounted, closed database:


      sql> select TS#, name from v$datafile where status = 'OFFLINE' ;
      Then find out the name of the tablespace that contains this data file:
      sql> select name from v$tablespace where TS# = '5' ;

       How do we know which one contains the rollback segment? Unfortunately, there is no way to be sure while the database is closed. That is why it is very helpful to put the rollback segments in dedicated tablespaces that have names that easily identify them as such. It's even more helpful if the data files are named something helpful as well. For example, create a separate tablespace called ROLLBACK_DATA, and call its data files rollback01.dbf, rollback02.dbf, etc. That way, anyone that finds himself in this scenario will know exactly which data files contain rollback data.


      17. alter database open resetlogs
      This command causes Oracle to open the database after clearing all contents of the online redo log files. Since there is no way to undo this step, it is a good idea to make copies of the online redo log files now.

      sql> select member from v$logfile;  # to find out name of redo logs
      sql> alter database open resetlogs;


      18. References:
      http://www.orafaq.com/wiki/Oracle_database_Backup_and_Recovery_FAQ

      19. RMAN commands:
      http://docs.oracle.com/cd/B28359_01/backup.111/b28270/rcmconfb.htm#i1015217
      configure backup optimization on;


      When you turn on backup optimization, all backup commands skip the backup of any file that has not changed and has already been backed up to the allocated device type. This includes read-only tablespaces.
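If you need to override optimization for a particular run, the FORCE option backs up all files regardless:

rman> backup database force;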

      select property_name, property_value from database_properties where property_name like '%DEFAULT%';


      DEFAULT_TEMP_TABLESPACE        TEMP
      DEFAULT_PERMANENT_TABLESPACE   USERS
      DEFAULT_EDITION       ORA$BASE
      DEFAULT_TBS_TYPE       SMALLFILE

      How to kickstart from both directly and non-directly connected network

      Jephe Wu - http://linuxtechres.blogspot.com

      Objective: kickstart a CentOS 6 installation from both directly and non-directly connected networks
      Environment: CentOS 6, LAN
      keywords: dhcp, dhcrelay, kickstart.



      Network diagram:

      ------PC0(DHCP and to be kickstarted)------[eth1:192.168.1.1]DHCP Server[eth0:192.168.0.1]----PC1(non-DHCP network, no need kickstart)---[eth0:192.168.0.2] Linux DHCP Relay agent server [eth1:192.168.2.1] ----PC2(need DHCP and to be kickstarted)

      Steps:

      1. configure DHCP server as follows:

      # this global config is mainly for 192.168.1.1 dhcp network pcs
      option time-offset    -18000; # Eastern Standard Time
      option domain-name    "jephe";
      option routers      192.168.1.1;
      option subnet-mask    255.255.255.0;
      option domain-name-servers  192.168.1.1;
      default-lease-time    21600;
      max-lease-time      43200;
      allow booting;
      allow bootp;


      ddns-update-style none;
      ignore client-updates;


      subnet 192.168.1.0 netmask 255.255.255.0 {
      range dynamic-bootp 192.168.1.33 192.168.1.54;
      next-server 192.168.1.1;
      filename "linux/pxelinux.0";
      }


      # the dhcp server must also listen on eth0, even though PC1 needs no dhcp,
      # in order to serve the relayed network 192.168.2.x; if it listened on eth1 only
      # (set in /etc/default/dhcp3 for ubuntu), relayed requests would never be answered.
      # Hence the empty subnet declaration:

      subnet 192.168.0.0 netmask 255.255.255.0 {
      }


      subnet 192.168.2.0 netmask 255.255.255.0 {
       range 192.168.2.33 192.168.2.54;
       option domain-name-servers 192.168.0.1;
       option routers 192.168.2.1;      # dhcrelay ip
       next-server 192.168.0.1;         # dhcp/tftp server
       filename "linux/pxelinux.0";
       }


      2. dhcrelay server configuration


      [root]# vi /etc/sysconfig/dhcrelay 
      # Command line options here
      DHCRELAYARGS=""
      # DHCPv4 only
      INTERFACES="eth0 eth1"  #must listen on both interfaces, otherwise, dhcp relay package is not able to forward back to client in non-directly connected network 192.168.2.X
      # DHCPv4 only
      DHCPSERVERS="192.168.0.1"
                           
      3. start up services and monitoring
      chkconfig dhcrelay on
      service dhcrelay start

      tcpdump -n port 67 or port 68   # on the dhcp server
      dhcrelay -d -i eth0 -i eth1 192.168.0.1   # run dhcrelay in the foreground to monitor relayed dhcp packets

      Configure ASMM and AMM in Oracle

      Jephe Wu - http://linuxtechres.blogspot.com

      Objective: understanding ASMM and AMM in Oracle
      Environment: Oracle 10g and 11g

      References:  Oracle support - Automatic Memory Management (AMM) on 11g [ID 443746.1]


      Concepts:

      The SGA is a group of shared memory structures, known as SGA components, that contain data and control information for one Oracle Database instance.
      A PGA is a memory region that contains data and control information for a server process. It is nonshared memory created by Oracle Database when a server process is started

      There is one PGA for each server process. Background processes also allocate their own PGAs.

      ASMM: Automatic Shared Memory Management (ASMM) was introduced in 10g. You enable the automatic shared memory management feature by setting the SGA_TARGET parameter to a non-zero value.

      AMM: Automatic Memory Management was introduced in 11g.

      SGA: contains database buffer cache, redo log buffer, shared pool, large pool, Java pool and streams pool

      The shared pool portion of the SGA contains the library cache, the dictionary cache, the result cache, buffers for parallel execution messages, and control structures.


      Configuration:

      1. AMM in 11g:
      Two new parameters have been introduced, named MEMORY_MAX_TARGET and MEMORY_TARGET. To use AMM (on most platforms), you set only a target memory size initialization parameter (MEMORY_TARGET) and optionally a maximum memory size initialization parameter (MEMORY_MAX_TARGET).

      SQL>ALTER SYSTEM SET MEMORY_MAX_TARGET = 800M SCOPE = SPFILE; (static parameter; you must bounce the database to change it)
      SQL>ALTER SYSTEM SET MEMORY_TARGET = 800M SCOPE = SPFILE; (can actually be changed dynamically)
      SQL>ALTER SYSTEM SET SGA_TARGET =0 SCOPE = SPFILE;
      SQL>ALTER SYSTEM SET PGA_AGGREGATE_TARGET = 0 SCOPE = SPFILE;


      if using a pfile, modify the parameters in the pfile as follows:

      MEMORY_MAX_TARGET = 808M
      MEMORY_TARGET = 808M
      SGA_TARGET =0
      PGA_AGGREGATE_TARGET = 0


      a. if you don't specify memory_max_target, it defaults to the same value as memory_target.
      b. if you specify only memory_max_target and not memory_target, memory_target defaults to 0; you can then change it dynamically after starting the instance.
      c. if you also set sga_target or pga_aggregate_target, those values act as minimum sizes for the SGA and instance PGA respectively.
      The queries below verify the resulting settings.
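To verify the settings and see Oracle's own advice on total memory sizing, you can query the 11g advice view:

sql> show parameter memory_target;
sql> select memory_size, memory_size_factor, estd_db_time from v$memory_target_advice order by memory_size;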

      2. ASMM in 10g

      If you'd like to disable AMM and enable only ASMM, then:
      sql> alter system set memory_max_target=0 scope=spfile;

      SQL>Alter system set MEMORY_TARGET=0 scope=both;
      SQL>Alter system set SGA_TARGET=500M scope=both;


      3. manual SGA management

      set SGA_TARGET and MEMORY_TARGET to 0 and set values for the other SGA components, up to the value of SGA_MAX_SIZE.

      sga_target vs sga_max_size is like memory_target vs memory_max_target in 11g.

      4. pga_aggregate_target

      With automatic PGA memory management, you set a target size for the instance PGA by defining a value for the PGA_AGGREGATE_TARGET parameter; sizing of SQL work areas is then automatic, and all *_AREA_SIZE initialization parameters are ignored for those sessions. A sketch follows.
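A minimal sketch of enabling automatic PGA management on 10g (the 500M value is only an example; see the separate post on sizing PGA_AGGREGATE_TARGET below):

sql> alter system set workarea_size_policy = auto scope=both;
sql> alter system set pga_aggregate_target = 500M scope=both;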

      5. best practice:


      in 11g, use AMM (set memory_target and memory_max_target, unset the others)
      in 10g, use ASMM (set sga_target, sga_max_size and pga_aggregate_target)

      6. References and Commands:
      a. show sga
      b. show parameter sga
      c. show parameter pga
      d. select * from v$sgainfo
      e. ipcs -ma (os command)
      f. select name, value from v$parameter
               where name in ('sga_max_size', 'shared_pool_size', 'db_cache_size',
               'large_pool_size','java_pool_size');
      g. show parameter shared_pool

      How to troubleshoot mysql database server high cpu usage/slowness

      Jephe Wu - http://linuxtechres.blogspot.com

      Objective: find out what is causing mysql database slowness
      Environment: CentOS 6.1 64bit, Mysql 5.1


      Steps:

      1. First, find out what's causing the high server CPU usage

      Normally we first notice that the server CPU load is high. Run 'top' to confirm which process is contributing to the high CPU usage, whether it's mysql or some other process. Also run uptime, vmstat 2 and iostat -x to check for anything abnormal.

      Here we need a baseline to compare against: while the server is running fine, before the problem happens, we should have recorded baseline information such as the output of:

      vmstat 2 20
      top -b -n 5
      iostat -x 5 | grep sdb


      2. check the mysql error log, slow query log etc. from /etc/my.cnf


      log_error                = /srv/mysql/log/error.log
      log_slow_queries        = /srv/mysql/log/mysql-slow.log

      innodb_buffer_pool_size=20000M


      3. mysql> show engine innodb status\G

      look at the end of the output as follows:
      ----------------------
      BUFFER POOL AND MEMORY
      ----------------------
      Total memory allocated 22981924466; in additional pool allocated 1048576
      Dictionary memory allocated 2323288
      Buffer pool size   1280000
      Free buffers       0
      Database pages     1115883
      Modified db pages  12101
      Pending reads 0
      Pending writes: LRU 0, flush list 32, single page 0
      Pages read 2075693, created 11255097, written 1339038405
      0.00 reads/s, 2.98 creates/s, 427.44 writes/s
      Buffer pool hit rate 1000 / 1000
      ----------------

      The buffer pool hit rate should be nearly 1000/1000. Page writes/s is 427.44 in this case.

      The innodb_buffer_pool_size system variable specifies the size of the buffer pool. If your buffer pool is small and you have sufficient memory, making the pool larger can improve performance by reducing the amount of disk I/O needed as queries access InnoDB tables.
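You can cross-check the configured pool size against the raw read counters; a high Innodb_buffer_pool_reads (reads from disk) relative to Innodb_buffer_pool_read_requests indicates the pool is too small:

mysql> show variables like 'innodb_buffer_pool_size';
mysql> show global status like 'Innodb_buffer_pool_read%';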


      4. show processlist; then 'kill query [number]' to kill the offending query. For example:
      # mysql -uroot -ppassword -e "show processlist" | grep client1_hostname | awk '{print "kill query "$1";"}' | mysql -uroot -ppassword

      How to configure Oracle database PGA_aggregate_target size

      Jephe Wu - http://linuxtechres.blogspot.com

      Objective: understanding the concept of PGA (program global area)
      Environment: Oracle database


      Concepts:

      The PGA contains the private SQL area and the SQL work area; for details, please refer to
      http://docs.oracle.com/cd/B19306_01/server.102/b14220/memory.htm#i14490

      1. what's pga_aggregate_target?
      The total/aggregated PGA memory allocated by all server processes attached to an Oracle instance.

      It is a global target for the Oracle instance, and Oracle tries to ensure that the total amount of PGA memory allocated across all database server processes never exceeds this target. The key word in this statement is "tries".

      It is possible for PGA memory to grow beyond the "target".
      With PGA_AGGREGATE_TARGET set, sizing of work areas for all dedicated sessions is automatic and all *_AREA_SIZE parameters are ignored for these sessions. Beginning with version 10g, PGA_AGGREGATE_TARGET also applies to shared server sessions.

      For overview of PGA, refer to http://docs.oracle.com/cd/B19306_01/server.102/b14220/memory.htm#sthref1416

      An important point to understand is that PGA_AGGREGATE_TARGET does not set a hard limit on PGA size. It is only a target value used to dynamically size the process work areas.
      It also does not affect other areas of the PGA that are allowed to grow beyond this limit. For more detail, refer to FAQ: ORA-4030 [Video] [ID 399497.1] - Why do I see processes growing larger than the PGA_AGGREGATE_TARGET/MEMORY_TARGET/MEMORY_MAX_TARGET?

      Can you limit the size of a process?
      You can take steps to control the size of a process as discussed above.
      However, from within the database framework you cannot place a hard limit on the size of a process by setting any initialization parameters or database configuration.
      You can limit the size of a process from the OS side by setting kernel limits or user shell limits. However, this can lead to ORA-4030 errors and will cause transaction rollback.

      2. set initial size of pga_aggregate_target

      According to Oracle support - Automatic PGA Memory Management [ID 223730.1]

      To determine an appropriate setting for the PGA_AGGREGATE_TARGET parameter, the
      following rules of thumb are recommended:

      - For OLTP (online transaction processing) systems: PGA_AGGREGATE_TARGET = (total physical RAM * 80%) * 20%
      e.g. with 16G of physical RAM, (16G * 80%) * 20% ~= 2.5G

      - For DSS (decision support systems): PGA_AGGREGATE_TARGET = (total physical RAM * 80%) * 50%

      3. check if a running database has suitable pga_aggregate_target size

      V$PGA_TARGET_ADVICE view predicts how the statistics cache hit percentage and
      over allocation count in V$PGASTAT will be impacted if you change the value of
      the initialization parameter PGA_AGGREGATE_TARGET.

      - STATISTICS_LEVEL. Set this to TYPICAL (the default) or ALL; setting this
        parameter to BASIC turns off generation of PGA performance advice views.

      sql> show parameter statistics_level;

      The following select statement can be used to find this information

      SELECT round(PGA_TARGET_FOR_ESTIMATE/1024/1024) target_mb,
             ESTD_PGA_CACHE_HIT_PERCENTAGE cache_hit_perc,
             ESTD_OVERALLOC_COUNT
      FROM   v$pga_target_advice;


      The output of this query might look like the following:

      TARGET_MB  CACHE_HIT_PERC ESTD_OVERALLOC_COUNT
      ---------- -------------- --------------------
      63         23             367
      125        24             30
      250        30             3
      375        39             0
      500        58             0
      600        59             0
      700        59             0
      800        60             0
      900        60             0
      1000       61             0
      1500       67             0
      2000       76             0
      3000       83             0
      4000       85             0


      or a simpler one:
      select * from v$pga_target_advice order by pga_target_for_estimate;


      From the above results we should set the PGA_AGGREGATE_TARGET parameter to a value that avoids any over-allocation, so the lowest PGA_AGGREGATE_TARGET we can set here is 375MB (the first row where ESTD_OVERALLOC_COUNT is 0).

      The amount of PGA memory available to an instance can be changed dynamically by altering the value of the PGA_AGGREGATE_TARGET parameter, making it possible to add or remove PGA memory from an active instance online.

      4. database views

      a. v$pgastat - this view provides instance-level statistics on PGA memory usage and the automatic PGA memory manager. For example:

      SELECT * FROM V$PGASTAT;

      NAME                                               VALUE
      --------------------------------------------------------
      aggregate PGA target parameter                     524288000 bytes
      aggregate PGA auto target                          463435776 bytes
      global memory bound                                25600 bytes
      total PGA inuse                                    9353216 bytes
      total PGA allocated                                73516032 bytes
      maximum PGA allocated                              698371072 bytes
      total PGA used for auto workareas                  0 bytes
      maximum PGA used for auto workareas                560744448 bytes
      total PGA used for manual workareas                0 bytes
      maximum PGA used for manual workareas              0 bytes
      over allocation count                              0 bytes
      total bytes processed                              4.0072E+10 bytes
      total extra bytes read/written                     3.1517E+10 bytes
      cache hit percentage   

      For the meanings of the individual items above, please refer to Oracle support - Automatic PGA Memory Management [ID 223730.1].

      b. v$sysstat 
      sql> set line 32767;
      sql> select * from v$sysstat where name like '%work%';

      'workarea memory allocated' - the total amount of PGA memory dedicated to work
      areas allocated in Kb.

      'workarea executions - optimal' - the cumulative count of work areas which had
      an optimal size, i.e. the sort did not need to spill to disk.

      'workarea executions - onepass' - the cumulative count of work areas using the
      one pass size. One pass is generally used for big work areas where spilling to
      disk cannot be avoided.

      'workarea executions - multipass' - the cumulative count of work areas running
      in more than one pass. This should be avoided and is a symptom of a poorly
      tuned system.

      The following query returns a percentage of work areas used with optimal
      memory size.

      select
        trunc(
              (sum(case when name like 'workarea executions - optimal'
                        then value else 0 end) * 100) /
              (
                 sum(case when name like 'workarea executions - optimal'
                          then value else 0 end) +
                 sum(case when name like 'workarea executions - onepass'
                          then value else 0 end) +
                 sum(case when name like 'workarea executions - multipass'
                          then value else 0 end)
              )
             ) optimal_percent
      from v$sysstat
      where name like 'workarea executions - %'
      /


      c. v$process

      The following three columns in the V$PROCESS view report the PGA memory allocated and used by an Oracle process:

          PGA_USED_MEM
          PGA_ALLOCATED_MEM
          PGA_MAX_MEM

      sql> set line 32767;
      sql> select * from v$process;
      sql> select max(pga_used_mem), max(pga_alloc_mem), max(pga_max_mem) from v$process;


      5. init.ora parameters

      When we set PGA_AGGREGATE_TARGET and set WORKAREA_SIZE_POLICY to AUTO, the *_AREA_SIZE parameters are automatically ignored and Oracle automatically uses computed values for them.

      Using automatic PGA memory management also helps reduce the possibility of ORA-4030 errors (unless we hit an OS limit), because work area sizes are controlled and adjusted automatically, based first on the PGA_AGGREGATE_TARGET parameter and then on the current workload.

      6. Commands and references:


      show parameter WORKAREA_SIZE_POLICY;
      show parameter pga_aggregate_target;
      show parameter statistics_level;
      SELECT * FROM V$PGASTAT;
      select * from v$sysstat where name like '%work%';
      select * from v$process;
      select * from v$pga_target_advice order by pga_target_for_estimate;


      Oracle process diagram: http://advait.files.wordpress.com/2008/04/oracle_processes1.gif