How to Limit data warehouse DB query to run more than one hour


Jephe Wu - http://linuxtechres.blogspot.com

Objective: to limit a particular schema/user to run not more than one hour
Environment: Oracle 11.2.3.0 enterprise edition 64bit on AIX 7.1


Method 1: Require Oracle Enterprise Edition  - use OEM DB console

1. create a Oracle database user/schema
sqlplus / as sysdba
sql> create user user1 identified by oracle;
sql> grant connect,resource to user1

2.  use Enterprise Manager DB Console, server tab, Consumer Groups

create a special group called 'test'. Then click user button to add 'user1' to this group.

3. go to 'Consumer Group Mappings', this is critical, select 'Oracle user', click on button 'Add Rule for Selected Type', then choose user 'user1'.

4. click on Plan, create a plan called 'myplan', besides 'Other_groups' group, add user 'test' into this plan. Make 'Execution Time Limit(Sec) as 3600, and choose 'Cancel SQL' as action.

5. activate plan 'myplan', only one of the plans can be activated anytime.

Note: if you need to modify 'myplan', you need to deactivate plan first.

6. double check settings from command line

sqlplus test1
SQL> select username,resource_CONSUMER_GROUP,count(*) from v$session group by username,resource_CONSUMER_GROUP;

USERNAME       RESOURCE_CONSUMER_GROUP  COUNT(*)
------------------------------ -------------------------------- ----------
1
DBSNMP       OTHER_GROUPS 3
SYSMAN       OTHER_GROUPS 7
SYS       OTHER_GROUPS 1
      _ORACLE_BACKGROUND_GROUP_ 21
USER1       TEST 1

6 rows selected.

Note: you should see user1 is tied to 'test' resource group, not 'other_groups'.

7. testing (you can use 1 sec for testing purpose , instead of 3600s)

sqlplus test1
SQL> select count(*) from dba_segments, dba_extents, dba_tables;
select count(*) from dba_segments, dba_extents
                     *
ERROR at line 1:
ORA-00040: active time limit exceeded - call aborted

Method 2: Require Oracle Enterprise Edition - use resource manager SQL script CLI, refer to http://www.pythian.com/news/2740/oracle-limiting-query-runtime-without-killing-the-session/


[oracle@oratest bin]$ more resman.sh
#!/bin/sh

sqlplus / as sysdba @resman.sql

sleep 2

sqlplus jephe/jephe @resman2.sql

[oracle@oratest bin]$ more resman.sql
set serverout on size 5555
--
-- first remove an existing active plan
ALTER SYSTEM SET RESOURCE_MANAGER_PLAN ='';
--
-- delete any existing plan or group
-- we have to create a pending area first
exec dbms_resource_manager.clear_pending_area();
exec dbms_resource_manager.create_pending_area();
exec dbms_resource_manager.DELETE_PLAN ('LIMIT_EXEC_TIME');
exec dbms_resource_manager.DELETE_CONSUMER_GROUP ('GROUP_WITH_LIMITED_EXEC_TIME');
exec DBMS_RESOURCE_MANAGER.VALIDATE_PENDING_AREA;

exec DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
begin
  dbms_resource_manager.create_pending_area();
  --
  -- we need a consumer group that maps to the desired oracle user:
  dbms_resource_manager.create_consumer_group(
    CONSUMER_GROUP=>'GROUP_WITH_LIMITED_EXEC_TIME',
    COMMENT=>'This is the consumer group that has limited execution time per statement'
    );
  dbms_resource_manager.set_consumer_group_mapping(
    attribute => DBMS_RESOURCE_MANAGER.ORACLE_USER,
    value => 'JEPHE',
    consumer_group =>'GROUP_WITH_LIMITED_EXEC_TIME'
  );

  -- and we need a resource plan:
  dbms_resource_manager.create_plan(
    PLAN=> 'LIMIT_EXEC_TIME',
    COMMENT=>'Kill statement after exceeding total execution time'
  );

  -- now let's create a plan directive for that special user group
  -- the plan will cancel the current SQL if it runs for more than 120 sec
  dbms_resource_manager.create_plan_directive(
    PLAN=> 'LIMIT_EXEC_TIME',
    GROUP_OR_SUBPLAN=>'GROUP_WITH_LIMITED_EXEC_TIME',
    COMMENT=>'Kill statement after exceeding total execution time',
    SWITCH_GROUP=>'CANCEL_SQL',
    SWITCH_TIME=>3600,  # adjust this to your case
    SWITCH_ESTIMATE=>false
  );

  dbms_resource_manager.create_plan_directive(
    PLAN=> 'LIMIT_EXEC_TIME',
    GROUP_OR_SUBPLAN=>'OTHER_GROUPS',
COMMENT=>''
  );

  DBMS_RESOURCE_MANAGER.VALIDATE_PENDING_AREA;
  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();

end;
/

exec dbms_resource_manager_privs.grant_switch_consumer_group('JEPHE','GROUP_WITH_LIMITED_EXEC_TIME',false);

exec dbms_resource_manager.set_initial_consumer_group('JEPHE','GROUP_WITH_LIMITED_EXEC_TIME');

-- to enable it:
ALTER SYSTEM SET RESOURCE_MANAGER_PLAN ='LIMIT_EXEC_TIME';

select username,resource_CONSUMER_GROUP,count(*) from v$session group by username,resource_CONSUMER_GROUP;
-------------------------
exit;

[oracle@oratest bin]$ more resman2.sql
select username,resource_CONSUMER_GROUP,count(*) from v$session group by username,resource_CONSUMER_GROUP;
select count(*) from dba_segments, dba_extents,dba_tables;
exit;

[oracle@oratest bin]$ more remove_all.sql
set serverout on size 5555
--
-- first remove an existing active plan
ALTER SYSTEM SET RESOURCE_MANAGER_PLAN ='';
--
-- delete any existing plan or group
-- we have to create a pending area first
exec dbms_resource_manager.clear_pending_area();
exec dbms_resource_manager.create_pending_area();
exec dbms_resource_manager.DELETE_PLAN ('LIMIT_EXEC_TIME');
exec dbms_resource_manager.DELETE_CONSUMER_GROUP ('GROUP_WITH_LIMITED_EXEC_TIME');

Method 3:  use 'alter system kill session' to kill the entire session, not query itself, doesn't require Oracle Enterprise Edition


# cat  /home/oracle/scripts/limit_exec_time.sh
#!/usr/bin/bash

# Variables
export PATH=/usr/bin:/usr/local/bin:/usr/ccs/bin:/u01/app/oracle/product/11.2.0/dbhome_1/bin/
export NLS_DATE_FORMAT='DD-MON-YYYY HH24:MI:SS' # So time is shown in logs as well as date
export ORACLE_SID=oradb

ORAENV_ASK=NO
. oraenv
unset ORAENV_ASK

export PREPATH=/home/oracle/scripts

sqlplus / as sysdba @/home/oracle/scripts/query_exec_time.sql
sleep 1
grep alter $PREPATH/limit_exec_time.sql 
if [ $? -eq 0 ];then
sqlplus / as sysdba @$PREPATH/limit_exec_time.sql > $PREPATH/result.txt 2>&1
mail -s "session killed" jwu@domain.com < $PREPATH/result.txt 
fi



oracle@oradev1 (oradb)> cat query_exec_time.sql 
set head off feedback off echo off term off pagesize 0  linesize 32767 trimsp on tab off;
spool /home/oracle/scripts/limit_exec_time.sql
select 'select sid,sql_text from v$sqltext , v$session where v$sqltext.address = v$session.sql_address and sid in ' || sid || ';' from v$session where status='ACTIVE' AND username ='JEPHE' and last_call_et/120 > 1;
select 'alter system kill session ''' || sid ||','|| serial# || ''';' from v$session where status='ACTIVE' AND username ='JEPHE' and last_call_et/1800 > 1;
select 'exit;' from dual;
spool off;
exit;


* * * * *  /home/oracle/scripts/limit_exec_time.sh 

Note: above scripts will detect user 'JEPHE' , kill the session which run statement more than 1800s (half hour).


References:

1.  v$session explanation

LAST_CALL_ET NUMBER If the session STATUS is currently ACTIVE, then the value represents the elapsed time in seconds since the session has become active.
If the session STATUS is currently INACTIVE, then the value represents the elapsed time in seconds since the session has become inactive.

2. long running sessions




4. Should Sessions be Killed in OS or Using Alter System Kill Session? [ID 161794.1]
Do not kill the sessions at the OS level.
Use ALTER SYSTEM KILL SESSION 'sid, serial#';   

This acts in the following manner:

1. It terminates a session, rolls back ongoing transactions, releases all 
session locks, frees all session resources.

2. If the session is performing some activity that must be completed (eg 
waiting for a reply from a remote database or rolling back a transaction), 
Oracle waits for this activity to complete, kills the session then returns 
control.

3).If the wait lasts for 60 seconds then Oracle marks the session to be killed, 
and returns control with a message that the session is marked to be killed. It 
then gets killed when the activity is complete.

and 

IMMEDIATE Specify IMMEDIATE to instruct Oracle Database to roll back ongoing transactions, release all session locks, recover the entire session state, and return control to you immediately.'

6. search 'cancel long run query' in oracle support

Method 4: use event to simulate ^C operation to cancel SQL only, not entire session. It will cancel the whole session of PL/SQL, not inside individual statement. For normal scipts, it will cancel individual statement.

*/5 * * * * /home/oracle/bin/cancel_query/limit_exec_time.sh > /dev/null 2>&1


oracle@ora1 > cat /home/oracle/bin/cancel_query/limit_exec_time.sh
#!/usr/bin/bash
# purpose: detect if there's any statement running more than 1 hours under user FINANCE, cancel those SQL statementes if any.
# Variables
export PATH=/usr/bin:/usr/local/bin:/u01/app/oracle/product/11.2.0/bin
export NLS_DATE_FORMAT='DD-MON-YYYY HH24:MI:SS' # So time is shown in logs as well as date
export ORACLE_SID=PROD
ORAENV_ASK=NO
. oraenv
unset ORAENV_ASK

export PREPATH=/home/oracle/bin/cancel_query
rm -f $PREPATH/limit_exec_time.sql
sqlplus / as sysdba @$PREPATH/query_exec_time.sql
sleep 1
fgrep -q ',' $PREPATH/limit_exec_time.sql 
if [ $? -eq 0 ];then

cd $PREPATH

while read line
do 
SID=`echo $line | awk -F',' '{print $1}'`
SERIAL=`echo $line | awk -F',' '{print $2}'`
sed -e "s#SID#$SID#g" -e "s#SERIAL#$SERIAL#g" template.sql > 1.sql
sqlplus / as sysdba @$PREPATH/1.sql >/dev/null 2>&1
mail -s "query cancelled" jephe@domain.com < $PREPATH/limit_exec_time.sql
done  < limit_exec_time.sql

fi


oracle@ora1 > cat query_exec_time.sql 
set head off feedback off echo off term off pagesize 0  linesize 32767 trimsp on tab off;
spool /home/oracle/bin/cancel_query/limit_exec_time.sql
select sid ||','|| serial# from v$session where status='ACTIVE' AND username ='FINANCE' and last_call_et/3600 > 1;
spool off;
exit;


$ cat /home/oracle/bin/cancel_query/template.sql


DECLARE
  l_status  v$session.status%TYPE;
BEGIN

  dbms_system.set_ev( SID, SERIAL, 10237, 1, '');

  LOOP
    SELECT status INTO l_status FROM v$session
      WHERE sid = SID and serial# = SERIAL;
    EXIT WHEN l_status='INACTIVE';
  END LOOP;

  dbms_system.set_ev( SID, SERIAL, 10237, 0, '');
END;
/
exit;


8. References:

http://docs.oracle.com/cd/E11882_01/server.112/e25494/dbrm001.htm
http://www.pythian.com/news/2740/oracle-limiting-query-runtime-without-killing-the-session/
http://www.oracle.com/technetwork/database/performance/resource-manager-twp-133705.pdf
Oracle University course materials

How to quickly use vnc to install Oracle database

Jephe Wu - http://linuxtechres.blogspot.com

Objective: use simplest/fastest way to install Oracle database


Steps:

1. read http://www.oracle.com/technetwork/articles/servers-storage-admin/ginnydbinstallonlinux6-1845247.html  for how to install Oracle on OL6

For OL 5, the package is oracle-validated rpm. and please refer to
http://www.oracle.com/technetwork/articles/servers-storage-admin/ginnydbinstallonlinux-488779.html

2. install vnc-server for OL5 and tigervnc-server for OL6, and install xterm package so that we can have a xterminal window after vnc

Note: xterm rpm package doesn't depend on X11 server, it requires a few tools and fonts package.

If you didn't choose 'X Window System' during installing Oracle Linux 6 or RHEL 6, you have to install xorg-x11-twm package also for having window manager in vnc for dbca to work.

3. run 'vncserver' after login as 'oracle' user

4.  run 'vncpasswd' to set vnc password for 'oracle' user

5. run installer from xterm terminal windows after vnc into the server

6.  run 'dbca', 'emca' etc to create database and Enterprise Manager database control if necessary.

7. make sure /etc/hosts has correct ip address and hostname entry, then run the following to create EM.

emca -config dbcontrol db -repos recreate


How to setup efficient rman backup in Oracle database 11g


Jephe Wu - http://linuxtechres.blogspot.com

Environment:  Oracle database 11.2.0.3 Enterprise edition 64bit.
Objective: create rman backup script to backup database plus archivelog and keep 3 days.


Steps/scripts:

1. Environment variables
[oracle@db01 fast_recovery_area]$ more /usr/local/bin/oraset 
export PATH=$PATH:/u01/app/oracle/product/11.2.0/db_1/bin:/home/oracle/bin
export NLS_DATE_FORMAT='DD-MON-YYYY HH24:MI:SS' 
export ORACLE_SID=orcl

ORAENV_ASK=NO
. /usr/local/bin/oraenv
unset ORAENV_ASK

2. cronjobs 
[root@db01 bin]# more /etc/cron.d/rman 
#!/bin/sh
MAILTO=jephe.wu@domain.com
0 23 * * * oracle /home/oracle/bin/rman.sh

3. scripts
[root@db01 bin]# more /home/oracle/bin/rman.sh

#!/bin/bash
. /usr/local/bin/oraset
MAIL_LIST='jwu@domain.com'

export TIMESTAMP=`date +%Y%m%d%H%M`
export BACKUP_DIR=/u01/orarman
export LOG=$BACKUP_DIR/rman_full_${TIMESTAMP}.log
export week=`date +%w`

find $BACKUP_DIR/rman_full*.log -type f -mtime +7 -exec rm -f {} \;

export ORACLE_HOME=/u01/app/oracle/product/11.2.0/db_1 # this is important, otherwise the next line $ORACLE_HOME/bin/rman will be invalid.

$ORACLE_HOME/bin/rman log=${LOG} <<EOF

connect target /
set echo on;
show all;
report unrecoverable;

# refer to http://docs.oracle.com/cd/B14117_01/server.101/b10734/rcmconc3.htm for explanation
# Run CROSSCHECK to change the status of these files to EXPIRED and then run DELETE EXPIRED to delete the records from the RMAN repository
# To ensure that data about backups in the recovery catalog or control file is synchronized with actual files on disk or in the media management catalog, perform a crosscheck. 
# The CROSSCHECK command operates only on files that are recorded in the RMAN repository.
# Update outdated information about backups that disappeared from disk or tape or became corrupted, maybe due to umount a partition or something.
# Update the repository if you delete archived redo logs or other files with operating system commands
# crosscheck command itself doesn't delete record, you must use delete command to remove the record for the expired file from the repository
crosscheck backup;
delete force noprompt expired backup;

crosscheck copy;
delete force noprompt expired copy;

crosscheck archivelog all;
delete noprompt expired archivelog all;

use CONTROLFILE_RECORD_KEEP_TIME=7 by default to record those backup history 

# You can specify that DELETE should remove backups that are EXPIRED or OBSOLETE. 
# if you run DELETE EXPIRED on a backup that exists, RMAN issues a warning and does not delete the backup. You can override this behavior and delete the backup by running DELETE FORCE.

# Configure environment, redundant to run each time, but ensures set correctly.
CONFIGURE DEFAULT DEVICE TYPE TO DISK;
CONFIGURE BACKUP OPTIMIZATION CLEAR;
CONFIGURE DEVICE TYPE DISK PARALLELISM 2 BACKUP TYPE TO BACKUPSET;
CONFIGURE CHANNEL DEVICE TYPE DISK FORMAT='/u01/orarman/%n_%U.bu' MAXPIECESIZE 20G;
CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 5 DAYS;
Note: The Retention Policy applies to Backups. It does not apply to ArchiveLogs.
The ArchiveLog Deletion Policy applies to ArchiveLogs. It does not apply to Backups.
CONFIGURE ARCHIVELOG DELETION POLICY TO BACKED UP 1 TIMES TO DISK;
CONFIGURE CONTROLFILE AUTOBACKUP ON;
CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '/u01/orarman/%n_%F';

# Backup datafiles, archive redo logs, and control file
backup as compressed backupset incremental level=0 database plus archivelog;
or
backup database plus archivelog;

# for backup ... delete [all] input if any
# If you specify DELETE INPUT (without ALL), then RMAN deletes only the specific files that it backs up. 
# If you specify ALL INPUT, then RMAN deletes all copies of the files recorded in the RMAN repository.


# Always use DELETE command within RMAN to remove RMAN backups, rather than an operating system or media manager utility or command
# Otherwise, the RMAN repository can contain records of backups that are no longer available for use in restore operations.

delete force noprompt obsolete;   # only delete obsolete backups according to retention policy, not archivelogs.
delete noprompt archivelog all backed up 1 times to device type disk;   # only for archivelog, not for backups.

# backup controlfile to trace
sql 'ALTER DATABASE BACKUP CONTROLFILE TO TRACE';
or
sql "ALTER DATABASE BACKUP CONTROLFILE TO TRACE AS ''/u01/orarman/controlfile.backup.${week}.sql'' reuse";
sql "CREATE PFILE=''/u01/orarman/spfile.backup'' FROM SPFILE";


# now do a trial restore to see if all is ok
restore database validate check logical;
restore controlfile to '/tmp/' validate check logical;
restore spfile validate check logical;
restore archivelog from time 'sysdate-5' validate check logical;

Note: since we used the recovery windows as 5 days in the beginninig, so we use use 'sysdate-5' here.

# now do a checkup for running database
validate database include current controlfile plus archivelog;
----------
For 10g, use below to validate, 10g doesn't support 'validate database' command to check running database.

restore database validate check logical;
restore archivelog from time 'sysdate-2' validate;
restore controlfile validate;
restore spfile validate;
-----------------------------------


# report 
report unrecoverable;

Note: for datawarehouse, it's critical there's no nolog operation during rman backup period.
#report schema;
#list backup summary;
#list backupset;
#list backup of controlfile;
exit;
EOF

if [ $? -ne "0" ]; then
   mailx -s "rman full backup for `hostname` failed" $MAIL_LIST < $LOG
else
   mailx -s "rman full backup for `hostname` successful" $MAIL_LIST < $LOG
fi

2. archivelog retention policy and auto deletion

Based on page http://docs.oracle.com/cd/B28359_01/backup.111/b28270/rcmconfb.htm  for archivelog retention.

Archived redo logs can be deleted automatically by the database or as a result of user-initiated RMAN commands. Only logs in the flash recovery area can be deleted automatically by the database. For archived redo log files in the flash recovery area, the database retains them as long as possible and automatically deletes eligible logs when additional disk space is required. You can manually delete eligible logs from any location, whether inside or outside the flash recovery area, when you issue BACKUP ... DELETE INPUT or DELETE ARCHIVELOG.

The archived redo log deletion policy is configured to NONE by default. In this case, RMAN considers archived redo log files in the recovery area as eligible for deletion if they meet both of the following conditions:

You can use the CONFIGURE ARCHIVELOG DELETION POLICY BACKED UP integer TIMES TO DEVICE TYPE command to enable an archived log deletion policy. This configuration specifies that archived logs are eligible for deletion only when the specified number of archived log backups exist on the specified device type.

Refer to http://www.fatihacar.com/blog/delete-archivelog-using-rman-in-oracle/ as follows:

Archivelog List Commands
RMAN>list archivelog all;
RMAN>list copy of archivelog until time ‘SYSDATE-10′;
RMAN>list copy of archivelog from time ‘SYSDATE-10′
RMAN>list copy of archivelog from time ‘SYSDATE-10′ until time ‘SYSDATE-2′;
RMAN>list copy of archivelog from sequence 1000;
RMAN>list copy of archivelog until sequence 1500;
RMAN>list copy of archivelog from sequence 1000 until sequence 1500;
Archivelog Delete Commands
RMAN>delete archivelog all;
RMAN>delete archivelog until time ‘SYSDATE-10′;
RMAN>delete archivelog from time ‘SYSDATE-10′
RMAN>delete archivelog from time ‘SYSDATE-10′ until time ‘SYSDATE-2′;
RMAN>delete archivelog from sequence 1000;
RMAN>delete archivelog until sequence 1500;
RMAN>delete archivelog from sequence 1000 until sequence 1500;

Note : Also, you can use noprompt statement for do not yes-no question.
RMAN>delete noprompt archivelog until time ‘SYSDATE-10′;

RMAN>delete noprompt archivelog all backed up 1 times to device type disk;



3. how to use rman to restore archive log

RMAN> run {
set archivelog destination to ‘/tmp’;
restore archivelog from logseq=6 until logseq=7;
}

#######
How to check datafile block corruption?

rman> backup validate check logical datafile 3;
rman> backup validate check logical database;

The CHECK LOGICAL option checks for both PHYSICAL and LOGICAL Block corruptions

sql> select * from v$database_block_corruption;

References:
1. How to identify all the Corrupted Objects in the Database with RMAN [ID 472231.1]
2. Master Note for Handling Oracle Database Corruption Issues [ID 1088018.1]
3. How to Check Archivelogs for Corruption using RMAN [ID 377146.1]

4. 



If you simply want to check for corruption but not actually create a backuppiece then use the 'validate' option:
RMAN>backup validate archivelog all;
4. How To Check (Validate) If RMAN Backup(s) Are Good [ID 338607.1]

Example: Refer to Oracle support:

11.2 RMAN Block recovery syntax
=========================

In 11.2 you can use RMAN to:

1) validate all database files and archived redo log files for physical and logical corruption:

BACKUP VALIDATE CHECK LOGICAL
  DATABASE ARCHIVELOG ALL;

2) to check individual data blocks, as shown in the following example:

VALIDATE DATAFILE 4 BLOCK 10 TO 13;

3) validate backup sets:

VALIDATE BACKUPSET 3;

You specify backup sets by primary key, which is shown in the output of the LIST BACKUP command.


The following RMAN command recovers the corrupted blocks:


1) recover all corrupted blocks reported in v$database_block_corruption

RMAN> RECOVER CORRUPTION LIST;

2) recover individual blocks, see eg:

RMAN> RECOVER DATAFILE 1 BLOCK 233, 235 DATAFILE 2 BLOCK 100 TO 200;