Archive

Archive for December, 2008

Simulating ASM with 10g Database by faking the hardware

December 27th, 2008 sam No comments

Simulating Asm by faking hardware

1. Faking Hardware

2. Instaling ASM Lib

3. Configuring the disks

4. Install DB & ASM instance

Faking Hardware:

Step 1) Here we create 4 zero-filed files using the DD comand

As Root run the following commands, this will create 4 files of 2Gb each

mkdir /asmdisk

dd if=/dev/zero of=/asmdisk/disk1 bs=1024k count=2000

dd if=/dev/zero of=/asmdisk/disk2 bs=1024k count=2000

dd if=/dev/zero of=/asmdisk/disk3 bs=1024k count=2000

dd if=/dev/zero of=/asmdisk/disk4 bs=1024k count=2000

Step 2) Use the loopback device to assign these disks

/sbin/losetup /dev/loop1 /asmdisk/disk1

/sbin/losetup /dev/loop2 /asmdisk/disk2

/sbin/losetup /dev/loop3 /asmdisk/disk3

/sbin/losetup /dev/loop4 /asmdisk/disk4

Now we need to configure entries to the file “/etc/rc.local” so that these divices are

reinitialised on reboot also.

Add the following entries to the file “/etc/rc.local”

/sbin/losetup /dev/loop1 /asmdisk/disk1

/sbin/losetup /dev/loop2 /asmdisk/disk2

/sbin/losetup /dev/loop3 /asmdisk/disk3

/sbin/losetup /dev/loop4 /asmdisk/disk4

/etc/init.d/oracleasm createdisk ASM1 /dev/loop1

/etc/init.d/oracleasm createdisk ASM2 /dev/loop2

/etc/init.d/oracleasm createdisk ASM3 /dev/loop3

/etc/init.d/oracleasm createdisk ASM4 /dev/loop4

Instaling ASM Lib

ASMLib is a support library for the Automatic Storage Management feature of Oracle

Database 10

g. ASMLib allows an Oracle Database using ASM more efficient and

capable access to the disk groups it is using. ASMlib are provided by Oracle from below

link : http://www.oracle.com/technology/tech/linux/asmlib/index.html

Pls download the Lib files accoring to your OS & kernel version.

[root@csstage root]# uname -r

2.4.21-27.EL

It gives 3 rpms for download as per the version , pls install then using below command

[root@csstage asm]# rpm -ivh *.rpm

After this completes we go to next step of configuring the disks.

Configuring the ASM Lib & disks

Now we need to use the ASM Lib , configure it & configure the disks accordingly

[root@csstage root]# /etc/init.d/oracleasm configure

Configuring the Oracle ASM library driver.

This will configure the on-boot properties of the Oracle ASM library

driver. The following questions will determine whether the driver is

loaded on boot and what permissions it will have. The current values

will be shown in brackets (‘[]‘). Hitting <ENTER> without typing an

answer will keep that current value. Ctrl-C will abort.

Default user to own the driver interface []: oracle

Default group to own the driver interface []: oinstall

Start Oracle ASM library driver on boot (y/n) [n]: y

Fix permissions of Oracle ASM disks on boot (y/n) [y]: y

Writing Oracle ASM library driver configuration: [ OK ]

Creating /dev/oracleasm mount point: [ OK ]

Loading module “oracleasm”: [ OK ]

Mounting ASMlib driver filesystem: [ OK ]

Scanning system for ASM disks: [ OK ]

Now as the ASM Lib are configured, we will configure the disks

[root@csstage root]# /etc/init.d/oracleasm createdisk ASM1 /dev/loop1

Marking disk “/dev/loop1″ as an ASM disk: [ OK ]

[root@csstage root]# /etc/init.d/oracleasm createdisk ASM2 /dev/loop2

Marking disk “/dev/loop2″ as an ASM disk: [ OK ]

[root@csstage root]# /etc/init.d/oracleasm createdisk ASM3 /dev/loop3

Marking disk “/dev/loop3″ as an ASM disk: [ OK ]

[root@csstage root]# /etc/init.d/oracleasm createdisk ASM4 /dev/loop4

Marking disk “/dev/loop4″ as an ASM disk: [ OK ]

So now our hardware is all set to go & we need to install 10g database on the server using

the above disks.

Install DB & ASM instance

Now we install the Database with ASM option

Create the ASM Instance

ASM runs as a separate Oracle instance, which can be created and configured using OUI.

Now that ASMLib is installed and the disks are marked for use, you can create an ASM

instance.

Log in as oracle and start runInstaller:

$ ./runInstaller

1. Select Installation Method

o Select Advanced Installation

o Click on Next

2. Specify Inventory Directory and Credentials

o Inventory Directory: /u01/app/oracle/oraInventory

o Operating System group name: oinstall

o Click on Next

3. Select Installation Type

o Select Enterprise Edition

o Click on Next

4. Specify Home Details

o Name: Ora10gAsm

o Path: /u01/app/oracle/product/10.2.0/asm

Note:Oracle recommends using a different ORACLE_HOME for ASM

than the ORACLE_HOME used for the database for ease of

administration.

o Click on Next

5. Product-specific Prerequisite Checks

o If you’ve been following the steps in this guide, all the checks should pass

without difficulty. If one or more checks fail, correct the problem before

proceeding.

o Click on Next

6. Select Configuration Option

o Select Configure Automatic Storage Management (ASM)

o Enter the ASM SYS password and confirm

o Click on Next

7. Configure Automatic Storage Management

o Disk Group Name: DATA

o Redundancy

- High mirrors data twice.

- Normal mirrors data once. This is the default.

- External does not mirror data within ASM. This is typically used if an

external RAID array is providing redundancy.

o Add Disks

The disks you configured for use with ASMLib are listed as Candidate

Disks. Select each disk you wish to include in the disk group.

o Click on Next

8. Summary

o A summary of the products being installed is presented.

o Click on Install.

9. Execute Configuration Scripts

o At the end of the installation, a pop up window will appear indicating

scripts that need to be run as root. Login as root and run the indicated

scripts.

o Click on OK when finished.

10. Configuration Assistants

o The Oracle Net, Oracle Database, and iSQL*Plus configuration assistants

will run automatically

11. End of Installation

o Make note of the URLs presented in the summary, and click on Exit when

ready.

12. Congratulations! Your new Oracle ASM Instance is up and ready for use.

Create the Database

Once the ASM instance has been created, create a database that uses ASM for storage:

Log in as oracle and start runInstaller:

$ ./runInstaller

1. Select Installation Method

o Select Advanced Installation

o Click on Next

2. Select Installation Type

o Select Enterprise Edition

o Click on Next

3. Specify Home Details

o Name: OraDb10g

o Path: /u01/app/oracle/product/10.2.0/db

Note:Oracle recommends using a different ORACLE_HOME for the

database than the ORACLE_HOME used for ASM.

o Click on Next

4. Product-specific Prerequisite Checks

o If you’ve been following the steps in this guide, all the checks should pass

without difficulty. If one or more checks fail, correct the problem before

proceeding.

o Click on Next

5. Select Configuration Option

o Select Create a Database

o Click on Next

6. Select Database Configuration

o Select General Purpose

o Click on Next

7. Specify Database Configuration Options

o Database Naming: Enter the Global Database Name and SID

o Database Character Set: Accept the default

o Database Examples: Select Create database with sample schemas

o Click on Next

8. Select Database Management Option

o Select Use Database Control for Database Management

o Click on Next

9. Specify Database Storage Option

o Select Automatic Storage Management (ASM)

o Click on Next

10. Specify Backup and Recovery Options

o Select Do not enable Automated backups

o Click on Next

11. Select ASM Disk Group

o Select the DATA disk group created in the previous section

o Click on Next

12. Specify Database Schema Passwords

o Select Use the same password for all the accounts

o Enter the password and confirm

o Click on Next

13. Summary

o A summary of the products being installed is presented.

o Click on Install.

14. Configuration Assistants

o The Oracle Net, Oracle Database, and iSQL*Plus configuration assistants

will run automatically

15. Execute Configuration Scripts

o At the end of the installation, a pop up window will appear indicating

scripts that need to be run as root. Login as root and run the indicated

scripts.

o Click on OK when finished.

16. End of Installation

o Make note of the URLs presented in the summary, and click on Exit when

ready.

17. Congratulations! Your new Oracle Database is up and ready for use.

Note : In between the installation , You might be asked to start CSS Deamon , if you

havent already done

To start the CSS daemon and configure the host to always start the daemon upon reboot,

do the following:

1. Log in to the host as root.

2. Ensure that $ORACLE_HOME/bin is in your PATH environment variable.

3. Enter the following command: localconfig add

Start the ASM instance:

$ export ORACLE_SID=+ASM

$ sqlplus “/ as sysdba”

SQL*Plus: Release 10.2.0.1.0 – Production on Sun Sep 3 00:28:09 2006

Copyright (c) 1982, 2005, Oracle. All rights reserved.

Connected to an idle instance.

SQL> startup

ASM instance started

Total System Global Area 83886080 bytes

Fixed Size 1217836 bytes

Variable Size 57502420 bytes

ASM Cache 25165824 bytes

ASM diskgroups mounted

SQL> select group_number,disk_number,name,state,mode_status,mount_status,total_mb,

from v$asm_disk;

G# D# NAME STATE MODE_ST MOUNT_S TOTAL_MB

—- ———- ———-

1 0 ASM1 NORMAL ONLINE CACHED 2000

1 1 ASM2 NORMAL ONLINE CACHED 2000

1 2 ASM3 NORMAL ONLINE CACHED 2000

1 3 ASM4 NORMAL ONLINE CACHED 2000

1 4 ASM5 NORMAL ONLINE CACHED 2000

2 0 ASM6 NORMAL ONLINE CACHED 2000

2 1 ASM7 NORMAL ONLINE CACHED 1000

Categories: DBA Tags:

Hide network and local drives on windows machine

December 24th, 2008 ather No comments

To disable the display of local or networked drives when you click My Computer.

1.Go to start->run.Type regedit

2) Go to HKEY_CURRENT_USER\Software\M*cro$oft\Windows\CurrentVersion\Policies\Explorer

In the right pane create a new DWORD item and name it NoDrives (it is case sensitive). Now modify it’s value and set it to 3FFFFFF (Hexadecimal) . Restart your computer. My Computer will display no drives.

To enable display of drives in My Computer, delete the DWORD item that you created.

Restart your computer.You can now see all the drives again. :)

Categories: Uncategorized Tags:

Operations on Redo Log’s Files

December 24th, 2008 sam No comments

Steps for Renaming Redo Log Members

Shut down the database.

1.             SHUTDOWN

2. Copy the redo log files to the new location.Operating system files, such as redo log members, must be copied using the appropriate operating system commands. See your operating system specific documentation for more information about copying files.

The following example uses operating system commands (UNIX) to move the redo log members to a new location:

mv /diska/logs/log1a.rdo /diskc/logs/log1c.rdo
mv /diska/logs/log2a.rdo /diskc/logs/log2c.rdo

3. Startup the database, mount, but do not open it.

4.             CONNECT / as SYSDBA
5.             STARTUP MOUNT

6. Rename the redo log members.

Use the ALTER DATABASE statement with the RENAME FILE clause to rename the database redo log files.

ALTER DATABASE
  RENAME FILE '/diska/logs/log1a.rdo', '/diska/logs/log2a.rdo'
           TO '/diskc/logs/log1c.rdo', '/diskc/logs/log2c.rdo';

7. Open the database for normal operation.

The redo log alterations take effect when the database is opened.

ALTER DATABASE OPEN;

Dropping Log Groups

To drop a redo log group, you must have the ALTER DATABASE system privilege. Before dropping a redo log group, consider the following restrictions and precautions:

· An instance requires at least two groups of redo log files, regardless of the number of members in the groups. (A group comprises one or more members.)

· You can drop a redo log group only if it is inactive. If you need to drop the current group, first force a log switch to occur.

· Make sure a redo log group is archived (if archiving is enabled) before dropping it. To see whether this has happened, use the V$LOG view.

·                SELECT GROUP#, ARCHIVED, STATUS FROM V$LOG;

·                   GROUP# ARC STATUS
·                --------- --- ----------------
·                        1 YES ACTIVE
·                        2 NO  CURRENT
·                        3 YES INACTIVE
·                        4 YES INACTIVE

Drop a redo log group with the SQL statement ALTER DATABASE with the DROP LOGFILE clause.

The following statement drops redo log group number 3:

ALTER DATABASE DROP LOGFILE GROUP 3;

When a redo log group is dropped from the database, and you are not using the Oracle-managed files feature, the operating system files are not deleted from disk. Rather, the control files of the associated database are updated to drop the members of the group from the database structure. After dropping a redo log group, make sure that the drop completed successfully, and then use the appropriate operating system command to delete the dropped redo log files.

When using Oracle-managed files, the cleanup of operating systems files is done automatically for you.

Dropping Redo Log Members

To drop a redo log member, you must have the ALTER DATABASE system privilege. Consider the following restrictions and precautions before dropping individual redo log members:

· It is permissible to drop redo log files so that a multiplexed redo log becomes temporarily asymmetric. For example, if you use duplexed groups of redo log files, you can drop one member of one group, even though all other groups have two members each. However, you should rectify this situation immediately so that all groups have at least two members, and thereby eliminate the single point of failure possible for the redo log.

· An instance always requires at least two valid groups of redo log files, regardless of the number of members in the groups. (A group comprises one or more members.) If the member you want to drop is the last valid member of the group, you cannot drop the member until the other members become valid. To see a redo log file status, use the V$LOGFILE view. A redo log file becomes INVALID if the database cannot access it. It becomes STALE if the database suspects that it is not complete or correct. A stale log file becomes valid again the next time its group is made the active group.

· You can drop a redo log member only if it is not part of an active or current group. If you want to drop a member of an active group, first force a log switch to occur.

· Make sure the group to which a redo log member belongs is archived (if archiving is enabled) before dropping the member. To see whether this has happened, use the V$LOG view.

To drop specific inactive redo log members, use the ALTER DATABASE statement with the DROP LOGFILE MEMBER clause.

The following statement drops the redo log /oracle/dbs/log3c.rdo:

ALTER DATABASE DROP LOGFILE MEMBER '/oracle/dbs/log3c.rdo';

When a redo log member is dropped from the database, the operating system file is not deleted from disk. Rather, the control files of the associated database are updated to drop the member from the database structure. After dropping a redo log file, make sure that the drop completed successfully, and then use the appropriate operating system command to delete the dropped redo log file.

To drop a member of an active group, you must first force a log switch.

Forcing Log Switches

A log switch occurs when LGWR stops writing to one redo log group and starts writing to another. By default, a log switch occurs automatically when the current redo log file group fills.

You can force a log switch to make the currently active group inactive and available for redo log maintenance operations. For example, you want to drop the currently active group, but are not able to do so until the group is inactive. You may also wish to force a log switch if the currently active group needs to be archived at a specific time before the members of the group are completely filled. This option is useful in configurations with large redo log files that take a long time to fill.

To force a log switch, you must have the ALTER SYSTEM privilege. Use the ALTER SYSTEM statement with the SWITCH LOGFILE clause.

The following statement forces a log switch:

ALTER SYSTEM SWITCH LOGFILE;

Redo Log File Status

To see the status of the redo log file there are two dynamic views are there.
A)V$LOG
B)V$LOGFILE

A)V$LOG:
———-
V$LOG displays redo log file information from the control file.

Status:
——–
1)UNUSED – Online redo log has never been written to. This is the state of a redo log that was just added, or just after a RESETLOGS, when it is not the current redo log.

2)CURRENT – Current redo log. This implies that the redo log is active. The redo log could be open or closed.

3)ACTIVE – Log is active but is not the current log. It is needed for crash recovery. It may be in use for block recovery. It may or may not be archived.

4)CLEARING – Log is being re-created as an empty log after an ALTER DATABASE CLEAR LOGFILE statement. After the log is cleared, the status changes to UNUSED.

5)CLEARING_CURRENT – Current log is being cleared of a closed thread. The log can stay in this status if there is some failure in the switch such as an I/O error writing the new log header.

6)INACTIVE – Log is no longer needed for instance recovery. It may be in use for media recovery. It might or might not be archived.

B)V$LOGFILE:
——————
V$LOGFILE view contains information about redo log files.
1)INVALID – The file is corrupted or missing.
2)STALE -This redo log file member is new and has never been used.
3)DELETED -The file is no longer being used.
4) -The redo log file is in use and is not corrupted.

Categories: DBA Tags:

Make Firefox 10 times faster — No Software needed

December 23rd, 2008 ather 1 comment


1. Type “about:config” into the address bar and hit enter. Scroll down and look for the following entries:

2. Alter the entries as follows:

Set “network.http.pipelining” to “true
Set “network.http.proxy.pipelining” to “true

set “network.http.pipelining.maxrequests” to some number like 50. This means it will make 50 requests at once.

3. Lastly right-click anywhere and select New-> Integer. Name it “nglayout.initialpaint.delay” and set its value to “0“. This value is the amount of time the browser waits before it acts on information it recieves.

Categories: Uncategorized Tags:

Statspack in Brief

December 18th, 2008 sam No comments

Database Statistics

Statspack is different from BSTAT/ESTAT

  1. Statspack collects more data, including high-resource SQL
  2. It precalculates many ratios such as cache hit ratios, rates and transaction statistics
  3. It uses permanent tables owned by PERFSTAT user to store performance stats
  4. It separates data collection from report generation
  5. It makes data collection easy to automate using either DBMS_JOB or an OS utility to schedule collection
  6. tasks

Different levels of Statspack for monitoring:

level 0 General Performance Stats : Wait stats, system events, system stats, rollback seg
data, row cache, SGA, bg events, lock stats, buffer pool stats and parent latch stats

level 5 SQL statements with high level resource usage. Larger the shared pool more time it takes to

take a snapshot.

  1. No of executions of the SQL statement. Default: 100
  2. No of disk reads performed by the SQL statement. Default: 1,000
  3. No of parse calls performed by the SQL statement. Default: 1,000
  4. No of buffer gets performed by the SQL statement. Default: 10,000
  5. Size of the sharable memory used by the SQL statement. Default: 1 MB
  6. Version count for the SQL statement. Default: 20

level 6 All the statistics + SQL Plans & SQL Plan usage. Pre-requisite is, the plan of the statement

which is to be gathered has to be in the shared pool at the time that snapshot is taken and it

must exceed one of the SQL thresholds & specify the executions thresholds to 0 for those

snapshots.

level 7 Lists all the segs that are heavily accessed and contended, thus, modifying the physical layout

of some seg or of the tablespaces they reside in.

  1. Logical Reads
  2. DB block changes
  3. Physical Reads. Default: 10,000
  4. Physical Writes. Default: 1,000
  5. Physical Reads direct
  6. Physical Writes direct
  7. Global cache consistent read blocks served (RAC specific). Default: 1,000
  8. Global cache current blocks served (RAC specific). Default: 1,000
  9. Buffer busy waits. Default: 100
  10. ITL waits. Default: 100
  11. Row lock waits. Default: 100

level 10 Parent & child latch information, session specific stats can be calculated by specifying the

session id in the call to the Statspack.

To install Statspack run the following script: SPCREATE.SQL

To remove Statspack run the following script : SPDROP.SQL

Statspack reporting script: SPREPORT.SQL

Statspack reporting script for the specific SQL hash value specified: SPREPSQL.SQL

Statspack reporting script for the database and instance specified: SPREPINS.SQL

Statspack script for automating Statspack stats collection: SPAUTO.SQL

Statspack script for converting data from 9.0 to 9.2. Before running this backup the schema: SPUP90.SQL

Statspack script for upgrading data from 8.1.7: SPUP817.SQL

Statspack script for upgrading data from 8.1.6: SPUP816.SQL

Statspack script for purging a limited range of snapshot IDs for a given database instance: SPPURGE.SQL

Statspack script for truncating all performance data in Statspack tables: SPTRUNC.SQL

Statspack documentation: SPDOC.TXT

How to install and take a snapshot:

Set timed_statistics to TRUE to get the time at which the data was collected.

SQL>  CONNECT / AS SYSDBA
SQL>  define default_tablespace='TOOLS'
SQL>  define temporary_tablespace='TEMP'
SQL>  define perfstat_password='my_perfstat_password'
SQL>  @?/rdbms/admin/spcreate
SQL>  CONNECT perfstat/my_perfstat_password
SQL>  EXECUTE statspack.snap;

How to create a Statspack Report Without Prompts

SQL>  connect perfstat/my_perfstat_password
SQL>  define begin_snap=1
SQL>  define end_snap=2
SQL>  define report_name=batch_run
SQL>  @?/rdbms/admin/spreport

How to gather optimizer stats on the PERFSTAT Schema:

EXECUTE DBMS_STATS.GATHER_SCHEMA_STATS(OWNAME=>’PERFSTAT’, CASCADE=.TRUE):

or

EXECUTE DBMS_UTILITY.ANALYZE_SHCEMA(‘PERFSTAT’,'COMPUTE’);

Where to find the snapshot level and threshold information used by the package: STATS$STATSPACK_PARAMETER

Temporarily using new values at snapshot level and take a snapshot:

EXECUTE STATSPACK.SNAP(i_snap_level=>6);

Saving the values of the snapshot level permanently and take a snapshot:

EXECUTE STATSPACK.SNAP(i_snap_level=>6, i_modify_parameter=>’true’);

Saving the values of the snapshot level permanently without taking a snapshot:

EXECUTE STATSPACK.MODIFY_STATSPACK_PARAMETER

(i_snap_level=>6, i_buffer_gets_th=>10000, i_disk_reads_th=>1000);

Categories: DBA Tags:

Rename Solaris Zone

December 16th, 2008 ather No comments

The process of renaming a zone is essentially a task of renaming, editing and replacing strings in a series of (mostly XML) configuration files. All of the tasks below were carried out from the global zone on the system in question.

1. Shut down the zone to be renamed

# zoneadm -z <oldname> halt

2. Modify the configuration files that store the relevant zone configuration

# vi /etc/zones/index
Change all references of <oldname> to <newname> as appropriate
# cd /etc/zones
# mv <oldname>.xml <newname>.xml
# vi <newname>.xml

Change all references of <oldname> to <newname> as appropriate

3. Rename the main zone path for the zone

# cd /export/zones
# mv <oldname> <newname>

Your zone path may be different than the one shown above

4. Modify (network) configuration files of new zone

Depending on the applications installed in your zone, there may be several files you need to update. The essential networking files are:

# cd /export/zones/<newname>/root
# vi etc/hosts
# vi etc/nodename

But others containing your old host/zone name can also be found using this command:

# cd /export/zones/<newname>/root/etc
# find . -type f | xargs grep <oldname>

5. Boot the new zone again
# zoneadm -z <newname> boot

Categories: Solaris Tags:

Analysis for IO bottleneck in Unix

December 16th, 2008 sam No comments

# sar -d

Linux 2.4.21-27.ELsmp (pw101) 12/04/2005

12:00:00 AM DEV tps rd_sec/s wr_sec/s

….

Average: dev8-128 7.16 5.37 75.07

Average: dev8-129 7.16 5.37 75.07

Average: dev8-130 0.00 0.00 0.00

….

The above command finds the busiest device. To determine what that device is, do:

# more /proc/devices

Character devices:

1 mem

2 pty

3 ttyp

4 ttyS

5 cua

7 vcs

Block devices:

1 ramdisk

2 fd

3 ide0

7 loop

8 sd

71 sd

129 sd

The above example indicates the ‘Block device’ 8 is ‘sd’ or scsi disk, and #129 is one of

scsi disks in the system. So ‘sar -d’ tells you that one of the scsi disks, or device

‘dev8-129′ is busy.

Now the following command is a bit busy, but you can always strip away piped commands to see

what output is like for each command. But the shown example give you which program (along with

its pid) has the most open files on a given busy device. In our example, the busy device we

are interested is ‘dev8-129′, but is represented as ’8,129′ in the ‘lsof’ output.

# lsof | grep “8,129″ | awk ‘{print $1″ “$2}’ | uniq -c | sort -n -r

[count p pid]

307 java 31916

307 java 31915

307 java 31914

307 java 31913

307 java 31912

307 java 31911

307 java 31910

307 java 31909

307 java 31908

307 java 31907

307 java 31906

307 java 31905

307 java 31904

307 java 31903

307 java 27645

55 db2sysc 32011

51 db2sysc 32012

46 httpd 32009

46 httpd 32008

46 httpd 32006

46 httpd 31901

46 httpd 31900

46 httpd 31899

46 httpd 31898

46 httpd 31897

46 httpd 31874

45 db2sysc 32019

40 db2fmp 31996

31 sshd 26713

29 db2sysc 32025

29 db2sysc 32024

The above output gives a clue as to which process(es) to look into for disk I/O problems

(i.e., ‘java’). The ‘java’ processes are having the largest number of open files, and

chances are doing the most disk I/O. Of course, some files maybe memory-cached, so this

method may not always work. I’d like to hear from you if you have a better solution. Please

contact me (see footer for contact info).

Categories: Linux Tags:

Simple Solaris IP Multipathing

December 15th, 2008 sam No comments

Simple Solaris IP Multipathing

IP multipathing consists of grouping two identical network cards together and having a live IP address be able to automatically fail over from one card to the other with no loss or degredation of service.


The steps to accomplish this are as follows:

1. Ensure that both cards are seen by the system and have different MAC addresses
2. Group the cards together
3. Add a test ip address to the first card
4. Add a test ip address to the second card
5. Change the hostname.* files to keep these settings after a reboot

Before going into each of these steps in detail, a few details should be kept in mind, in regards to assigning IP addresses. When activating IP multipathing on a server, a total of three IP addresses will be used. The private address space that you have typically uses a Class C address space, with only 254 usable addresses, so it is advised that only production servers get multipathed. Also, to keep the IP address space ‘clean’, the standard that has been adopted for IP assignments is that the main failover IP for production servers should be in the range of 192.168.2.1 to 54, the first test address should be 100 higher than the main address, and the second test address should be 200 higher than the main address. This will put the final octet of the first test address in the range 101 to 154, and the final octed of the second test address in the range of 20 1 to 254. DNS entries for the test addresses should also be created, even though the addresses are not used. The dns name for the first test address should be hostname of the server with ‘-test1′ concatenated to the end, and the dns name for the done likewise with ‘-test2′ appended.

For this exercise, we will use a hostname of server1, a main IP address of 192.168.2.9, and gigaswift ethernet cards ce0 and ce1

Ensure that both cards are seen by the system and have different MAC addresses
By default, Sun servers have the PROM setting ‘local-mac-address?’ set to false, which causes all ethernet cards to assume the MAC address of the primary (built in) ethernet card. This can be checked by using the following command:

eeprom local-mac-address?

If the result is false, then issue this command.

eeprom local-mac-address?=true

Next, run ‘ifconfig -a’ to ensure that both cards are seen by the system. Most likely, only ce0 will be seen and be in use. If this is the case, then run ‘ifconfig ce1 plumb’ to plumb the second card. If the local-mac-address? eeprom variable had to be set to true, then the results of ‘ifconfig -a’ will reveal that both cards have identical MAC addresses. If this is the case, then manually set the MAC address of the second card to some unique address with the following command:

ifconfig ce1 ether de:ad:be:ef:f0:0d

The next time the server reboots, both ethernet cards will take on new mac addresses.

Group the cards together
Both ethernet cards need to be assigned to the same group, to allow multipathing to know which cards an IP address can fail over to. This is accomplished with the following commands:

ifconfig ce1 server-int
ifconfig ce0 server-int

In this case, ‘server-int’ is the groupname. The groupname can be any artitrary string.

Add a test ip address to the first card
Each ethernet card will have a ‘test’ IP address assigned to it that will only be used by the system to verify that the card is functioning correctly. To set this address for the first card, issue the following command (this is all one command to be typed on one line, regardless of how word-wrapping handles it in this document):

ifconfig ce0 addif 192.168.2.109 netmask + broadcast + deprecated -failover up

Add a test ip address to the second card
To add the second ethernet card’s test IP, issue the following (this is all one command to be typed on one line, regardless of how word-wrapping handles it in this document):

ifconfig ce1 192.168.2.209 netmask + broadcast + deprecated -failover standby up

Change the hostname.* files to keep these settings after a reboot
The current /etc/hostname.ce0 contains just the hostname of the server. This will need to be replaced with the following:

group server-int
set 192.168.2.9/24 broadcast + up
addif 192.168.2.109/24 broadcast + deprecated -failover up

Create the /etc/hostname.ce1 file with the following text in it:

group server-int
set 192.168.2.209/24 broadcast + -failover deprecated standby up

Categories: Solaris Tags:

Backup and Recovery

December 15th, 2008 sam No comments

Media Recovery – Oracle

Executive Overview:

The media failure is the biggest threat for enterprise data management. A media failure is a physical problem that occurs when a computer unsuccessfully attempts to read from or write to a file necessary to operate the database. The technique you use to recover from media failure of a database file depends heavily on the type of media failure that occurred. The strategy you use to recover from a corrupted data file is different from the strategy for recovering from the loss of the control file. And also the important point needs to be remembered that the database is operating in archive log mode or no archive log mode. If the DB is operating in ARCHIVELOG mode, there are options for recovery depending on the type of failure. Broadly there are two types of media recovery:

· Complete media recovery

· Incomplete media recovery

Complete Media Recovery:

A complete media recovery will recover all lost changes to the database. You need to recover the entire database or a single data file which becomes corrupted. The dynamic view V$RECOVER_FILE determines what files requires recovery. The complete media recovery can be accomplished only if all online and archived redo log files are available. Complete media recovery can be performed on offline data files while the database is open and need to be opened with the RESETLOGS option if a backup control file or new control file was created for the recovery.

Types of complete media recovery are:

1. Closed Database Recovery

2. Open Database Recovery with Offline Tablespace

3. Open Database Recovery with Offline Datafile

1) Closed Database Recovery

If your system suffers a media failure and the database stays down, you need to perform a close database recovery.

Follow these steps to execute a closed database recovery:

a. Make sure the database is shutdown.

b. Correct the media problem if possible.

c. Restore the most recent backup of only those data files that were damaged by the media failure.

(There is no need to restore any undamaged data files or any online redo log files. If the hardware problem has been repaired, and damaged data files can be restored to their original locations, do so. If the hardware problem still exists, restore the data files to another location. This location will need to be recorded in the control file later in this procedure)

d. Start SQL*Plus and connect to Oracle as SYS.

e. Start the instance and mount it, but do not open the database.

f. If you restored lost files to alternate locations, the new location of these files must be recorded in the control file.

Follow these three steps to relocate the data file(s):

ü Make sure the tablespace that contain the data files are offline.

ü Make sure that the new, fully specified file names are different from the old file names.

ü Use the SQL command ALTER TABLESPACE with the RENAME DATAFILE option to change the file names within the database.

g. Query the V$DATAFILE view and make sure all data files you want to recover are online. If a data file is offline, issue the ALTER DATABASE command with the DATAFILE ONLINE option.

For example: ALTER DATABASE DATAFILE ‘userdata01.dbf’ ONLINE;

h. To start closed database recovery, use the RECOVER command.

Oracle will now start the roll forward by applying the archived redo log files and the online redo log file. If AUTORECOVERY is set to ON, the applying of the log files is automatic. If it is not set to ON, you will be prompted for each log file.

When recovery is complete, open the database with the ALTER DATABASE OPEN; command.

2) Open Database Recovery with Offline Tablespace

If your system suffers a media failure but the database stays up, you can perform an open database recovery and recover only the tablespaces that are damaged. Users can continue to access the online tablespaces and data files that were not damaged by the failure. Oracle automatically takes damaged data files offline.

Follow these steps to execute an open database recovery with offline tablespaces:

a. The database should be started and open.

b. Take all tablespaces containing damaged data files offline using the command: ALTER TABLESPACE tablespace_name OFFLINE

You can query the V$DATAFILE view to see which data files are offline.

c. Correct the problem that caused the media failure.

(If the problem cannot be corrected in a reasonable amount of time, your other option is to restore the damaged files to another location.)

Follow these three steps to relocate the data file(s):

ü Make sure the tablespace that contains the data files is offline.

ü Make sure that the new, fully specified file names are different from the old file names.

ü Use the ALTER DATABASE command with the RENAME FILE option to change the file names within the database.

(If the media problem can be corrected, restore the most recent backup files of only the data files damaged by the media failure. Remember that the database is open. Do NOT try to restore undamaged data files, log files, or control files.)

d. Use the RECOVER TABLESPACE command to start offline tablespace recovery of all damaged data files in one or more offline tablespaces.

Oracle will now start the roll forward by applying the archived redo log files and the online redo log file. If AUTORECOVERY is set to ON, the applying of the log files is automatic. If it is not set to ON, you will be prompted for each log file.

e. The damaged tablespaces of the open database are now recovered up to the point of failure. You can bring the offline tablespaces online using ALTER TABLESPACE command with the ONLINE option.

3) Open Database Recovery with Offline Datafile

If there is a media failure but the database stays up, you can perform an open database recovery with only the data files that are damaged. Users can continue to access the online tablespaces and data files that were not damaged by the failure. Oracle automatically takes damaged data files offline.

Follow these steps to execute an open database recovery with offline data files.

a. The database should be started and open.

b. Take all tablespaces containing damaged data files offline using the ALTER DATABASE DATAFILE ‘stuff01.dbf’ OFFLINE;

You can query the V$DATAFILE view to see which data files are offline.

c. Correct the problem that caused the media failure.

(If the problem cannot be corrected in a reasonable amount of time, your other option is to restore the damaged files to another location.)

Follow these three steps to relocate the datafile(s):

ü Make sure the tablespace that contains the data files is offline.

ü Make sure that the new, fully specified file names are different from the old file names.

ü Use the ALTER DATABASE command with the RENAME FILE option to change the file names within the database.

(If the media problem can be corrected, restore the most recent backup files of only the data files damaged by the media failure. Remember, the database is open. Do NOT try to restore undamaged data files, log files, or control files.)

d. Use the RECOVER DATAFILE command to start offline datafile recovery of all damaged data files in one or more offline data files tablespaces.

Oracle will now start the roll forward by applying the archived redo log files and the online redo log file. If AUTORECOVERY is set to ON, the applying of the log files is automatic. If it is not set to ON, you will be prompted for each log file.

e. The damaged data files of the open database are now recovered up to the point of failure. You can bring the offline data files tablespaces online using the ALTER DATABASE DATAFILE TABLESPACE command with the ONLINE option.

Incomplete Media Recovery:

In some situations complete media recovery may not be possible or may not be desired. This can happen because all the files needed for a complete recovery are not available (for example, all online redo log files are lost and you were not duplexing your log groups). You might want to perform an incomplete recovery if a user drops a table and you want to recover to the point before the table was dropped.

Incomplete media recovery is also called point-in-time recovery. Point in time recovery that is not continued to a complete recovery must be terminated by the OPEN RESETLOGS option. The database must be closed during incomplete recovery operations.

There are three types of incomplete media recovery:

1. Cancel-Based Recovery

2. Time-Based Recovery

3. Change-Based Recovery

1) Cancel-Based Recovery

Cancel-Based recovery allows canceling recovery at a desired point. This situation is most likely occur if archive log files or redo log files needed for recovery are lost or damaged and cannot be restored. In this situation, you would apply all logs until you reached the missing files and then cancel the recovery.

Follow these steps to execute a cancel-based recovery:

ü If the database is still open, shut down the database using the SHUTDOWN command with the ABORT option.

ü Make a full backup of the database, including all datafiles, a control file, and the parameter files in case an error is made during the recovery.

ü Correct the problem that caused the media failure.

(If the problem cannot be corrected, the datafiles must be restored to an alternate location. If this is the case, the ALTER TABLESPACE RENAME DATAFILE command must be used to change the location of the datafile in the control file.)

ü If the current control files do not match the physical structure of the database at the time you want to recover to, restore a backup of the control file that matches the database’s physical file structure at the point in time you want to recover to. Replace all current control files of the database with the one you want to use for recovery. If you do not have a backup copy of the control file, you can create a new one.

ü Restore backups of all datafiles. Make sure the backups were taken before the point in time you are going to recover to. Any datafiles added after the point in time you are recovering to should not be restored. They will not be used in the recovery and will have to be recreated after recovery is complete. Any data in the datafiles created after the point of recovery will be lost.

Note: Make sure read-only tablespace are offline before you start recovery so recovery does not try to update the datafile headers.

a. Start SQL*Plus and connect to Oracle as SYS.

b. Start the instance and mount the database using the STARTUP command with the MOUNT option.

c. If you restored files to an alternative location, change the location now in the control file by using the ALTER TABLESPACE RENAME DATAFILE command.

d. Use the RECOVER DATABASE UNTIL CANCEL command to begin cancel-based recovery. If a backup of the control file is being used, make sure to specify the USING BACKUP parameter.

Oracle will now start the roll forward by applying the archived redo logfiles and the online redo logfile. Oracle will prompt you for each logfile. If you used a backup control file, you must enter the names of the online redo logfiles.

e. Continue applying redo logfiles until the most recent, undamaged logfile has been applied.

Enter “CANCEL” instead of the logfile name to cancel the recovery. Oracle will respond with a recovery successful message.

f. Use the ALTER DATABASE OPEN command with the RESETLOGS or NORESETLOGS option.

You should use the RESETLOGS option if you used a backup of the control file in recovery, or the recovery was incomplete. Use the NORESETLOGS option if the recovery was complete. If you are using a standby database and must reset the logs, the standby database will have to be re-created.

You can check the ALERT file to see if your incomplete recovery was actually a complete recovery.

If the recovery was a complete recovery, the message in the ALERT file is as follows:

RESETLOGS after complete recovery through change scn

If the recovery was incomplete, the following message is recorded:

RESETLOGS after incomplete recovery UNTIL CHANGE scn

g. After opening the database using the RESETLOGS option, perform a normal shutdown and a full database backup. If you do not do this, any changes made after the recovery and before the next full backup are unrecoverable. If you did not reset the logs, the database is still recoverable.

2) Time based Recovery

To recover the database to the time specified by the date. The date must be a character literal in the format ‘YYYY-MM-DD:HH24:MI:SS’.

ü If the database is still open, shut down the database using the SHUTDOWN command with the ABORT option.

ü Back up the database as a precaution and correct any media failures.

ü Restore backup control files (if necessary) and backup data files and bring them online.

Question: You do not have a backup of a data file

Answer: Create an empty replacement file, which can be recovered.

ü Perform media recovery on the restored backup using the RECOVER DATABASE statement with the UNTIL TIME option.

Note: Files in read-only tablespaces should be offline if you are using a control file backup. Otherwise, the recovery will try to update the headers of the read-only files.

a. Start SQL*Plus and connect to Oracle as SYS.

b. Start the instance and mount the database using the STARTUP command with the MOUNT option.

c. If one or more damaged data files were restored to alternative locations, indicate the new locations of these files to the control file of the associated database. For example, enter:

ALTER DATABASE RENAME FILE ‘/oracle/dbs/df2.f’ TO ‘/oracle/newloc/df2.f’;

d. Obtain the names of all data files requiring recovery by:

ü Checking the list of data files that normally accompanies the control file being used.

ü Querying the V$DATAFILE view.

e. Make sure that all data files of the database are online. All data files of the database must be online unless an offline tablespace was taken offline normally. For example:

ALTER DATABASE DATAFILE ‘users1′ ONLINE;



f. Issue the RECOVER DATABASE UNTIL TIME statement to begin time-based recovery. The time is always specified using the following format, delimited by single quotation marks: ‘YYYY-MM-DD:HH24:MI:SS’.

The following statement recovers the database up to a specified time using a control file backup:

RECOVER DATABASE UNTIL TIME ’2007-12-31:12:47:30′ USING BACKUP CONTROLFILE;

g. Apply the necessary redo log files to reconstruct the restored data files. Unless the application of files is automated, Oracle supplies the name it expects to find from LOG_ARCHIVE_DEST and requests you to stop or proceed with applying the log file. If the control file is a backup, you must supply names of online logs.

h. Apply redo log files until the last required redo log file has been applied to the restored data files. Oracle automatically terminates the recovery when it reaches the correct time, and returns a message indicating whether recovery is successful.

3) Changed Based Recovery

This recovers the database to a transaction consistent state immediately prior to the system change number (SCN) specified by integer.

This describes how to perform recovery to a specified SCN in these stages:

ü If the database is still open, shut down the database using the SHUTDOWN command with the ABORT option.

ü Back up the database as a precaution and correct any media failures.

ü Restore backup data files.

a. Start SQL*Plus and connect to Oracle as SYS.

b. Start the instance and mount the database using the STARTUP command with the MOUNT option.

c. If one or more damaged data files were restored to alternative locations, indicate the new locations of these files to the control file of the associated database.

Note: To determine the SCN needed for recovery, you can query the V$LOG_HISTORY view.

For example: Select * from v$log_history where rownum < 3;

THREAD# SEQUENCE# TIME LOW_CHANGE# HIGH_CHANGE# ARCHIVE_NAME

1 47213 06/26/98 12:47:55 116098950 116098954 /u01/arch/log_47213.arc

1 47212 06/26/98 12:47:33 116098947 116098949 /u01/arch/log_47212.arc

The LOW_CHANGE# column represents the SCN at the beginning of the log file and the HIGH_CHANGE# represents the SCN at the end of the log file.

d. Begin change-based recovery, specifying the SCN for recovery termination. The SCN is specified as a decimal number without quotation marks. For example, to recover until SCN 116098954 issue:

RECOVER DATABASE UNTIL CHANGE 116098954;

e. Oracle begins the roll forward phase of media recovery by applying the necessary redo log files (archived and online) to reconstruct the restored data files. Unless the application of files is automated, Oracle supplies the name it expects to find from LOG_ARCHIVE_DEST and requests you to stop or proceed with applying the log file. If the control file is a backup file, you must supply names of online logs.

f. Oracle apply the redo log files until the last required redo log file has been applied to the restored data files. Oracle automatically terminates the recovery when it reaches the correct system change number, and returns a message indicating whether recovery is successful.

CONCLUSION

The practices described above are a checklist to ensure that media recovery is being organized in a right direction. Correct media recovery leads to maximum availability and minimize recovery time. This equates to more uptime and higher availability in the case of an unplanned or planned outage and helps enterprises meet the SLAs associated with recovery time objectives.

Reference: http://www.stanford.edu

Categories: DBA Tags:

Recovering Database with Missing Archived Logs

December 15th, 2008 sam No comments

Introduction

In this paper we will present a method for retrieving data from an Oracle database that cannot be opened normally because one or more datafiles is inconsistent with the others. An example of a scenario where you would find yourself in this situation is as follows:

A disk failed on our server and we lost a datafile. We restored the datafile from a hot backup taken a week ago, but it turns out we are missing a few redo logs archived since then. Oracle is complaining because we can’t produce the required archived redo logs, and we can’t open the database. The datafile in question contains our most important tables. Is there any way we can salvage our data?

Every DBA should know that there is a problem here. The missing archived redo logs contain transactions which affect the data in the database. So it’s a given that you’re going to lose some data, but the question is, “How much?” Oracle takes a hard-line position and will not let you open the database normally because a data integrity issue exists. However, you may be able to retrieve much of your data if you use non-traditional means to get Oracle to drop its hard-line attitude. Retrieving the data that can be salvaged with the understanding that some data will be lost could be a whole lot better than losing all of the data because a subset of it has been corrupted.

In the next section of this paper, we will look at an overview of how you might go about salvaging data from an inconsistent datafile and get the database back up and running properly again. In the succeeding sections of this paper, we’ll dig into each step of the process in greater detail.

Overview

If you’ve lost a datafile that contained only indexes for heap-organized tables, or other data that is easily recreated, then your best bet might be to drop the tablespace and recreate and repopulate it from scratch. But if you’ve lost a datafile that contained important data not easily recreated, and all you have is an old backup of the file without all of the intervening archived redo logs, then you will want to extract what data you can from the problem tablespace, drop the tablespace, and then recreate and repopulate the tablespace.

Although the exact steps will vary depending on the particular situation, the general steps involved are:

  1. Taking a cold backup of what you have now.
  2. Restoring the lost datafile from a backup and applying the archived redo logs that you do have.
  3. Setting an undocumented instance parameter which will allow you to open the database in its current state.
  4. Doing exports and selects to retrieve what data you can from the problem tablespace.
  5. Restoring the entire database from the cold backup taken earlier.
  6. Taking the damaged datafile offline.
  7. Doing exports and selects to retrieve additional data not salvaged in step 4.
  8. Restoring again from the cold backup.
  9. Dropping the problem tablespace.
  10. Recreating the problem tablespace.
  11. Rebuilding the data in the problem tablespace with the data extracted in steps 4 and 7.

Some of these steps can be quite tedious and time-consuming. You may choose to skip a few or even several of the steps depending on how much disk space, tedium, and database down time you are willing to pay in exchange for potentially salvaging more of the lost data.

As we discuss each of the steps in greater detail, we will walk through an example case where a datafile called ordtab03.dbf in tablespace ORDTAB was lost due to a disk crash. This datafile contained many extents of the ORDERS table. The datafile was restored from a hot backup taken July 4, 2004, but some of the archived redo logs between July 4 and the present day have been lost.

Step 1: Backup the Database

The first thing you should do is take a cold backup of whatever datafiles, online redo logs, and control files you currently have. If rebuilding everything from existing backups, then make sure you have a place to keep the backup files handy, as you’ll probably want them more than once. It’s best if you have cold backups. If you’ve just lost one or a few datafiles and the database is still open, make a hot backup of each of the remaining datafiles and save this somewhere (and make sure you keep the archived redo logs generated during and after the hot backup).

In a later step we will be using an undocumented and unsupported Oracle feature in order to make the best of a bad situation. Taking a backup of the database now, before we “cross the line” into unsupported territory, allows us to return to this side of the line in a later step.

After you’ve created the backup, it’s time to work on the database. Before you shut the database down, create a backup control file script:

        ALTER DATABASE BACKUP CONTROLFILE TO TRACE RESETLOGS;

This will create a script in your user_dump_dest directory to create a new control file. The name of the file will be similar to that of other trace files-something like [instance_name]_ora_[PID].trc. Rename the file to newcontrol.sql.

Now we want to edit the file. Open it with a text editor such as vi or emacs. There will be about 15 lines at the top of the file with version and connection information. We don’t want these messing up our script, so delete all the lines above the first line beginning with a pound sign (#). (Instead of deleting these lines, you could make them comments by putting a pound sign at the beginning of each.)

Then delete the line that begins “RECOVER DATABASE…” toward the end of the file and save the file.

Step 2: Restore the Lost Datafile and Apply Archived Redo Logs

At this point you should restore the lost datafile from a backup and apply what archived redo logs you have in order to roll the datafile contents forward as far as possible. You will have to stop at the first missing archived redo log. In our example, we restored the ordtab03.dbf datafile from the July 4, 2004 hot backup and applied the archived redo logs that we had available.

If you were to try to open the database normally at this point, you would get an ORA-01589 error:

        ORA-01589: must use RESETLOGS or NORESETLOGS option for database open

If you then tried an ALTER DATABASE OPEN RESETLOGS command, you would get an ORA-01195 error:

        ORA-01195: online backup of file %s needs more recovery to be consistent

This is where Oracle is taking its hard-line approach. The datafile restored from the backup was not recovered to a point in time consistent with the other datafiles in the database. Therefore data corruption may exist and Oracle will not let you open the database normally.

Step 3: Set an Undocumented Instance Parameter and Open the Database

At this point we leave the world of standard practice and cross the line into unsupported activity. It’s time to edit the database’s init.ora file or spfile. First, you’ll want to set job_queue_processes to 0 if it’s not already, as you don’t need jobs running while you’re extracting data. Then you need to set the parameter:

        _allow_resetlogs_corruption=TRUE

This parameter is a “hidden” or undocumented parameter-one of those which you’re never supposed to use unless told to do so by Oracle Support. We probably should have pointed out earlier that you shouldn’t be doing this entire exercise unless you’ve failed with everything Oracle Support has told you to do. The description of this parameter reads “allow resetlogs even if it will cause corruption”. Some of the caveats to be found on Metalink regarding this parameter include:

Steps as mentioned here beneath are only applicable in a situation that no restore/recovery can be performed of the database in question. Applying beneath steps means that data will be exported from a database being in an inconsistent state (no instance recovery can be performed), this is true for user data as well for the data dictionary.

You should NOT use unsupported parameters without the advice from support.

Since you are using these undocumented parameters without specific instruction from Oracle Support, we can’t support this database. If someone did instruct you to set these parameters, then you may want to continue discussion on this issue with that individual.

If you are able to startup this database, it would be only to do a full export and recreate it. If you want to pursue this possibility, please log an itar. This issue cannot be handled through this forum.

That said, if you want to continue, after changing the parameter, then move to the directory where you saved your newcontrol.sql script in the first step. (If you didn’t move the script, it will still be in your user_dump_dest directory.) Connect to the database as SYSDBA and run the newcontrol.sql script.

Your database is now open (though NOT consistent). You can run queries and sometimes everything will appear perfectly normal:

        SQL> SELECT COUNT(*) FROM OE.orders;

          COUNT(*)
        ----------
            403439

The database is in an inconsistent state and you are skating on thin ice. The above query worked fine because Oracle didn’t need to access any data blocks that were corrupt or referenced inconsistent undo entries. The query could have just as easily failed with an ORA-00600 error such as:

        SQL> SELECT COUNT(*) FROM OE.orders;

        SELECT COUNT(*) FROM OE.orders
                                *
        ERROR at line 1:
        ORA-00600: internal error code, arguments: [kcfrbd_2], [14],
        [19081], [8], [1280], [1280], [], []

Step 4: Do Exports and Selects to Retrieve Data

As we saw in the previous step, some queries will work fine and some will fail right away. Still others will return partial results, quitting when an inconsistency is found:

        SQL> SELECT order_id FROM OE.orders
          2  WHERE entry_date > TO_DATE ('04-JUL-2004');

          ORDER_ID
        ----------
            496103
            496104
            496105
               ...
            511325
            511326
            511327
            511328
        ERROR:
        ORA-00600: internal error code, arguments: [kcfrbd_2], [14],
        [19081], [8], [1280], [1280], [], []

        15225 rows selected.

We can use this ability to “pick around” problem data blocks:

        SQL> SELECT order_id FROM OE.orders
          2  WHERE order_id > 511400;

          ORDER_ID
        ----------
            511401
            511402
            511403
              ...
            513398
            513399
            513400
            513401

        2001 rows selected.

Database exports are now possible too, though some errors will occur:

        About to export specified users ...
        . exporting object type definitions for user OE
        EXP-00090: cannot pin type "OE"."ORDER_ELEMENT_TYPE"
        EXP-00056: ORACLE error 22303 encountered
        OCI-22303: type "OE"."ORDER_ELEMENT_TYPE" not found
          ...
        . about to export OE's tables via Conventional Path ...
        . . exporting table     BATCH_JOBSS       4382 rows exported
          ...
        . . exporting table  CUSTOMER_TYPES
        EXP-00056: ORACLE error 600 encountered
        ORA-00600: internal error code, arguments: [4146], [45144], [45124], [], [], [], [], []
        . . exporting table  DEFAULT_VALUES        391 rows exported

Even for the tables with errors, some data will likely be extracted and written to the export file. Moreover you can easily determine which tables you’re able to export all rows from, so you won’t have to make further extraction efforts with them.

Step 5: Restore the Database from Backup

This step, along with the next two, is optional. Together these three steps present another approach that may allow you to retrieve more of your data. Restoring the database from a backup at this point effectively undoes any damage caused by the use of the undocumented _allow_resetlogs_corruption instance parameter. This time through, we will not make any attempt to recover the lost datafile.

Step 6: Take the Damaged Datafile Offline

In this step you take the damaged datafile offline. The purpose here is to get the database to a point where everything is completely consistent, and the data that would be inconsistent is simply deemed unavailable.

This is fairly straightforward:

        ALTER DATABASE DATAFILE '/u07/oradata/PRD/ordtab03.dbf' OFFLINE;

Step 7: Do Exports and Selects to Retrieve Additional Data

At this point you may be able to retrieve additional data for salvage that you were not able to get at earlier. For example, you may be able to fetch useful data from indexes belonging to tables that are damaged. If you inadvertently try to access the damaged datafile, you’ll get an ORA-00376 error:

        ORA-00376: file 39 cannot be read at this time
        ORA-01110: data file 39: '/u07/oradata/PRD/ordtab03.dbf'

Step 8: Restore the Database from Backup

Now you restore the database from backup for the last time. This step officially rolls the database back to a point in time before the use of the undocumented instance parameter, and therefore returns the database to a supported state. Note that if you restored the database from backup in step 5 and have not updated any data in the database since then, you may be able to skip this step.

Step 9: Drop the Problem Tablespace

First you’ll need to determine whether there are any referential integrity constraints from tables outside the problem tablespace which refer to primary or unique keys of tables inside the problem tablespace. You can use a query such as the following:

        SELECT CR.constraint_name
        FROM   dba_constraints CR, dba_constraints CP,
               dba_tables TP, dba_tables TR
        WHERE  CR.r_owner = CP.owner
        AND    CR.r_constraint_name = CP.constraint_name
        AND    CR.constraint_type = 'R'
        AND    CP.constraint_type IN ('P', 'U')
        AND    CP.table_name = TP.table_name
        AND    CP.owner = TP.owner
        AND    CR.table_name = TR.table_name
        AND    CR.owner = TR.owner
        AND    TR.tablespace_name <> 'ORDTAB'
        AND    TP.tablespace_name = 'ORDTAB';

If there were any such constraints, you would need to create scripts to recreate them (if you don’t already have them). If you are using export dumps to rebuild the data (in step 11), the constraints can possibly be restored from the export files.

Drop the tablespace containing the damaged datafile with a statement like:

        DROP TABLESPACE ordtab INCLUDING CONTENTS CASCADE CONSTRAINTS;

Step 10: Recreate the Problem Tablespace

Here you simply want to recreate the tablespace so that in the next step you can repopulate it with the data you extracted earlier. You could consult an old export file in order to recreate the tablespace exactly as it was, or you could take this opportunity to switch to a locally managed tablespace, enable the autoallocate feature, adjust storage parameters, and so on.

Step 11: Rebuild the Data in the Problem Tablespace

At this time you are ready to reload the tablespace with the data that you salvaged earlier. If you used the export utility to extract data from some or all tables, you can use import to restore the schema objects and data. If export encountered an error while reading a table, import should still be able to recreate the rows that were successfully extracted before the error occurred. If you used queries to extract data into flat files, then you can use SQL*Loader or any number of other tools to put the data back into the database.

Now you will have a functional database with as much data as you were able to extract from the damaged database. If you are lucky you were able to get almost all of your data. If not, at least you were probably able to get some data and get your database working again.

The first thing you should do immediately after repopulating the tablespace is take a backup of this repaired database. The second thing you should do is implement proper procedures for backing up the database, storing the backups and archived redo logs for appropriate retention periods, and testing the backup/recovery process on a regular basis. Your data

Categories: DBA Tags: