Here is my presentation about Logminer from IOUG 2003 conference
LOGMINER CONCEPTS
Oracle Server utilizes redo log files for database recovery. It records all the necessary information about various database activities in redo log files and uses this information for database recovery in the case of various user, instance or media failures. But this information, stored in redo log files, represents history of database activities and can potentially be used for various other purposes like:
• Track and correct changes made to information stored in a database. Changes can be tracked based on transaction, user, table, time, etc. After finding selected changes in redo log files we can undo them in the database.
• Perform fine-grained recovery at the transaction level. If we take into account existing dependencies, we can perform a table-based undo operation to roll back a set of changes.
• Detect incorrect modifications in the database and use this information to perform logical recovery at the application level instead of the database level.
• Collect information for tuning and capacity planning. Information from redo log files can be used for various historical analyses to determine trends and data access patterns.
• Retrieve critical information for debugging complex applications.
• Performing post-auditing. Using information from redo log files we can track any DML and DDL statement executed on the database, with order of their execution and database user who executed them.
• Replicate transactions from one database into another database.
But to be able to use information from redo log files we need a tool which will enable us to select information from those files based on various criteria. Oracle Corporation recognized a need for such a tool and introduced LogMiner, which enables SQL-based access to the contents of redo log files from command line interface. LogMiner is first introduced with 8i release of Oracle Server. It is a fully relational tool that allows us to use SQL to read, analyze and interpret contents of redo log files. LogMiner provides extensible auditing capabilities without introducing any overhead. With LogMiner we can view any valid redo log file, online or archived, from release 8.0 of Oracle server onwards. LogMiner is a feature of both standard and enterprise editions of Oracle Server. It consists of two PL/SQL database packages and a few dynamic views. With each new release of Oracle server new functionality for LogMiner is introduced and support for additional data types and operations in the database.
LogMiner in release 9.2 of Oracle Server is also used to support capture process for Oracle Streams. Oracle Streams is a new feature that enables sharing of data and events in a data stream, either within a database or from one database to another.
To use LogMiner user must specify, see Figure 1, the list of online and archived log files that he wants to analyze. LogMiner can run in release 8i database or later but it can analyze redo logs generated in release 8.0 database or later. With LogMiner options user can limit search for information from redo logs based on SCN or time range of values. LogMiner must use database dictionary to translate object ids from redo logs to object names. When user starts LogMiner session he can select rows from V$LOGMNR_CONTENTS view and get DML SQL statements that were executed in the database in the past. Besides original DML SQL statement we can also get undo SQL statement that we can use to undo DML operation that was executed in the database.
To further simplify usage of LogMiner, with release 9i Oracle introduced LogMiner Viewer that is a part of Oracle Enterprise Manager and provides ability to use LogMiner functionality in GUI environment. You can run it from Oracle Enterprise Manager standalone or through Management Server. In both cases you can use LogMiner Viewer to analyze redo logs and store your queries in Enterprise Manager repository.
Figure 1 – Diagram of LogMiner functionality
CONFIGURING DATABASE FOR LOGMINER
When you want to run LogMiner session in an Oracle database you have to prepare it in a following way:
SET DATABASE PARAMETERS
• Specify value for UTL_FILE_DIR parameter if LogMiner will use dictionary from a flat file. LogMiner requires access to a database dictionary to fully translate the contents of redo log files.
• Specify value for LOGMNR_MAX_PERSISTENT_SESSION parameter. With a value of this parameter we specify maximum number of concurrent LogMiner sessions that we can run on this database. Default value is 1.
ADD PL/SQL SUPPORT IN THE DATABASE
To add PL/SQL support you have to run script catproc.sql. When you add PL/SQL support all the necessary objects for LogMiner are also created.
LogMiner uses PL/SQL packages DBMS_LOGMNR and DBMS_LOGMNR_D and dynamic views V$LOGMNR_CONTENTS, V$LOGMNR_DICTIONARY, V$LOGMNR_LOGS AND V$LOGMNR_PARAMETERS.
ENABLE SUPPLEMENTAL LOGGING
If you enable supplemental logging Oracle server automatically stores additional information in redo log files that can be useful for LogMiner. With release 9.2 of Oracle server we can use database or table supplemental logging.
When we enable database supplemental logging we can choose between minimal and identification logging. With minimal supplemental logging Oracle server stores in redo log files minimal amount of information that is needed for LogMiner to identify, group and merge the REDO operations associated with DML changes. With this type of logging LogMiner has all information that is necessary to support chained rows and various storage arrangements (clustered tables). To enable or disable database supplemental logging you have to connect to the database as user with DBA privileges. To enable minimal database supplemental logging you have to execute the following statement:
SQL> ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;
With database identification logging enabled, Oracle server stores in redo log files additional information that enables database-wide before-image logging of primary keys or unique indexes (if primary key is not defined) for all updates. With this type of logging updated rows can be identified logically instead by ROWID. This enables us to transfer selected update statements from our database and execute updates in some other database. Oracle Corporation recommends that all or most of the tables should have primary key or unique index defined because if none is defined when you use supplemental logging all the columns except LONG and LOB are logged. To enable identification key logging you have to execute the following statement:
SQL> ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY, UNIQUE INDEX) COLUMNS;
Oracle Corporation recommends using database supplemental logging because it does not produce significant overhead and does not affect database performance.
To disable database supplemental logging when you do not need it any more you have to execute the following statement:
SQL> ALTER DATABASE DROP SUPPLEMENTAL LOG DATA;
Thereafter Oracle server will stop storing supplemental data into redo log files.
To see status of database supplemental logging in your database you must query V$DATABASE view.
Second type of supplemental logging is table supplemental logging where you can decide to store in redo log files additional information for selected tables. Table supplemental logging uses log groups to log supplemental information. There are two types of log groups:
• Unconditional log groups – The before images of specified columns are logged whenever the table is updated
• Conditional log groups – The before images of specified columns are logged only if at least one of the columns in the group is updated.
To use table supplemental logging with unconditional log groups you have to execute ALTER TABLE statement like in the following example:
SQL> ALTER TABLE test ADD SUPPLEMENTAL LOG GROUP group1 (col1, col2) ALWAYS;
With this statement we have specified that for each update of table test columns col1 and col2 will be logged in redo log files.
If we want to use table supplemental logging with conditional log groups we would define such a group with a statement like in a following example:
SQL> ALTER TABLE test ADD SUPPLEMENTAL LOG GROUP group1 (col1, col2);
If we do not specify keyword ALWAYS that means that we want to create conditional log group. In this case for each update of table test only if update modifies any of the columns in the group group1, columns col1 and col2 will be logged in redo log files. You should be very careful when you use table supplemental logging because if you use it on heavily accessed tables for a longer period of time it can produce a significant overhead and negatively affect database performance. You should use table supplemental logging sparingly, only on the tables that you want to analyze with LogMiner and for a limited period of time.
If you do not need table supplemental logging for a particular table you should disable it by dropping supplemental log group with the statement like in the following example:
SQL> ALTER TABLE test DROP SUPPLEMENTAL LOG GROUP group1;
Thereafter Oracle server will not generate and store supplemental information in redo log files for group group1.
To see the status of your table supplemental log groups you should query data dictionary views DBA_LOG_GROUPS and DBA_LOG_GROUP_COLUMNS or equivalent views with ALL or USER prefixes.
USE SEPARATE TABLESPACE FOR LOGMINER OBJECTS
By default all LogMiner tables are created in SYSTEM tablespace. If you use it to select information from large number of online and archived redo log files, these tables can occupy a lot of database space. Therefore Oracle Corporation recommends using a separate tablespace for LogMiner tables. To specify that you want to use another tablespace for LogMiner tables you have to create this tablespace and execute the following procedure:
SQL> execute DBMS_LOGMNR_D.SET_TABLESPACE (‘LOGMNR_TS’);
This statement will recreate all LogMiner tables in LOGMNR_TS tablespace.
CHOOSING LOGMINER DICTIONARY TYPE
LogMiner requires access to a database dictionary to fully translate contents of redo log files. Without a dictionary LogMiner returns internal object IDs instead of object names and presents data as hex data bytes. It is therefore recommended to always use LogMiner with database dictionary. Because LogMiner can analyze redo log files from different databases, not only from database in which it is started, database dictionary must be created in the same database in which redo log files to be analyzed are generated. LogMiner validates the dictionary against selected redo logs and reports any inconsistencies.
If LogMiner is started in a target database and analyzes redo logs from a source database then before you start a LogMiner session you have to:
• Create database dictionary in a source database
• Copy selected redo log files and a dictionary file, if it is stored in a flat file, from a source to a target database server
Thereafter LogMiner can analyze redo logs from source database in a target database.
LogMiner can use one of the following database dictionary types:
• Dictionary stored in a flat file.
• Dictionary stored in the redo logs
• Online catalog of a database in which LogMiner is started.
To extract database dictionary to a flat file or to the redo logs you have to connect to the database as user with SYSDBA privileges.
EXTRACTING THE DICTIONARY TO A FLAT FILE
To extract database dictionary to a flat file you have to use procedure DBMS_LOGMNR_D.BUILD with STORE_IN_FLAT_FILE option. Because LogMiner stores dictionary on a file system you have to specify database parameter UTL_FILE_DIR to point to the existing directory on a file system where dictionary file will be created.
If you want to extract database dictionary for a release 8.0 Oracle database you have to perform this specific steps because LogMiner can not be directly started in release 8.0 database:
• Copy dbmslmd.sql script from $ORACLE_HOME/rdbms/admin directory on a database server where 8i release of Oracle Server is installed to the same directory on a database server where 8.0 release of database Server is installed.
• Connect as sys user to opened release 8.0 database and execute:
SQL> @dbmslmd
To extract database dictionary to a flat file you have to execute the following procedure:
SQL> execute DBMS_LOGMNR_D.BUILD (‘dict.ora’, -
2 ‘/oracle/dbs’, -
3 OPTIONS => DBMS_LOGMNR_D.STORE_IN_FLAT_FILE);
To successfully extract dictionary to a flat file you have to specify a database parameter UTL_FILE_DIR = /oracle/dbs.
Keep in mind that dictionary file is an ASCII file when you copy it to a target database server.
EXTRACTING THE DICTIONARY TO THE REDO LOGS
To extract dictionary to the redo log files database must be open and in ARCHIVELOG mode. While the dictionary is being extracted to the redo stream, no DDL statements can be executed. This guarantees that extracted directory is consistent. In contrast to that, when you extract database dictionary to a flat file there is no guarantee that extracted dictionary is consistent because DDL statements can be executed.
To export the LogMiner dictionary to the redo log files, the following restrictions exist:
• DBMS_LOGMNR_D.BUILD must be executed on an Oracle9i database
• The COMPATIBLE parameter value must be set to 9.0.X
• The dictionary must belong to the same database as the redo logs to be analyzed
To extract database dictionary to the redo logs you have to use procedure DBMS_LOGMNR_D.BUILD with STORE_IN_REDO_LOGS option. For example:
SQL> execute DBMS_LOGMNR_D.BUILD ( -
2 OPTIONS => DBMS_LOGMNR_D.STORE_IN_REDO_LOGS );
Extracting database dictionary to the redo logs consume database resources so you should schedule it at off-peak hours. It is faster than extracting dictionary to a flat file.
Extracted dictionary can be contained in multiple redo log files so we have to find out which redo log files contain it. By querying view V$ARCHIVED_LOG you can find which redo log files contain the start and the end of an extracted dictionary. To find out this information you have to execute the following statements:
SQL> SELECT NAME FROM V$ARCHIVED_LOG WHERE DICTIONARY_BEGIN=’YES’;
SQL> SELECT NAME FROM V$ARCHIVED_LOG WHERE DICTIONARY_END=’YES’;
You will get names of two redo log files that contain begin or end of extracted dictionary. For your LogMiner session you have to use at least all redo log files between begin and end of extracted dictionary because this guarantees that your dictionary is complete.
USING THE ONLINE CATALOG
You can choose to use online catalog of database on which LogMiner is started as database dictionary. To specify that you want to use online catalog you have to execute procedure DBMS_LOGMNR.START_LOGMNR with DICT_FROM_ONLINE_CATALOG option:
SQL> execute DBMS_LOGMNR.START_LOGMNR ( -
2 OPTION => DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG);
With this option you start LogMiner session that will use online catalog as database dictionary.
If you use this option you can analyze only redo log files, online and archived, that are generated on the database on which LogMiner is started. This is also the fastest way to start analyzing your redo log files.
SELECTING REDO LOGS AND ARCHIVE LOGS
When you run LogMiner you must specify the names of redo log files that you want to analyze. Based on options that you specify when you start LogMiner session, it retrieves information from those redo log files and returns it through V$LOGMNR_CONTENTS view.
You can run LogMiner on release 8i or later of Oracle Server but you can analyze redo log files from release 8.0 or later. To run LogMiner you must connect to the database as user with SYSDBA privileges. When you start LogMiner session in a database you can analyze redo log files, online and archived, from this database or some other database using release 8.0 or later of Oracle Server.
You should keep in mind the following when selecting redo logs for your LogMiner session:
• The redo logs must be from a release 8.0 or later Oracle database.
• LogMiner features that are available depend on Oracle database release in which redo log files are created.
• Support for LONG and LOB datatypes is available only if you run LogMiner session on release 9.2 Oracle database and analyze redo logs generated on release 9.2.
• Database on which redo logs are generated must use a database character set that is compatible with database character set of the database on which LogMiner is running.
• LogMiner can analyze only redo logs that are generated on the same hardware platform on which LogMiner is running. For example LogMiner running on Windows platform can only analyze redo logs generated on Windows platforms and not on UNIX platforms.
• To analyze redo logs LogMiner uses dictionary that is generated in the same database in which analyzed redo logs are generated.
• If you use dictionary from a flat file or stored in redo logs, then you can analyze redo logs either from the database on which LogMiner is running or from some other database.
• If you use online catalog as LogMiner dictionary then you can analyze only redo logs from the database on which LogMiner is running.
Database in which you want to start LogMiner session must be opened and you must connect to it as user with SYSDBA privileges. To specify which redo log files you want to analyze in your LogMiner session you have to use procedure DBMS_LOGMNR.ADD_LOGFILE as follows:
• To specify first redo log file to be analyzed you have to use this procedure with NEW option. For example:
SQL> execute DBMS_LOGMNR.ADD_LOGFILE( -
2 LOGFILENAME => ‘/oracle/oradata/log1Prod.dbf ’, -
3 OPTIONS => DBMS_LOGMNR.NEW );
• Each subsequent redo log file you add using the same procedure with option ADDFILE. For example:
SQL> execute DBMS_LOGMNR.ADD_LOGFILE( -
2 LOGFILENAME => ‘/oracle/oradata/log2Prod.dbf ‘, -
3 OPTIONS => DBMS_LOGMNR.ADDFILE );
In this case you can even omit option DBMS_LOGMNR.ADDFILE because it’s the default.
• If you want to remove redo log file from your LogMiner session you can do it with REMOVE option. For example:
SQL> execute DBMS_LOGMNR.ADD_LOGFILE( -
2 LOGFILENAME => ‘/oracle/oradata/log2Prod.dbf ‘, -
3 OPTIONS => DBMS_LOGMNR.REMOVEFILE );
Of course you can only remove redo log files which you have previously added, otherwise LogMiner will return an error.
Once you finish adding redo log files to your LogMiner session you can see a list of files which you have specified by querying view V$LOGMNR_LOGS.
START A LOGMINER SESSION
When you specify redo log files that you want to analyze and decide about database dictionary type you want to use you are ready to start LogMiner session. To start your LogMiner session you have to connect to the database as user with SYSDBA privileges. Specifying redo log files that you want to analyze must be done in the same database session in which you start LogMiner session. To start LogMiner session you have to use DBMS_LOGMNR.START_LOGMNR procedure. In this procedure you can specify type of database dictionary that you want to use, select information from redo log files based on a range between start and end SCN or between start and end time. If you use this range options you can limit information that LogMiner will select from redo log files into V$LOGMNR_CONTENTS view to a selected range of SCN or time. Besides that with this procedure you can specify various options for LogMiner session. Syntax of DBMS_LOGMNR.START_LOGMNR procedure is as follows:
DBMS_LOGMNR.START_LOGMNR(
startScn IN NUMBER default 0,
endScn IN NUMBER default 0,
startTime IN DATE default ’01-jan-1988’,
endTime IN DATE default ’01-jan-2988’,
DictFileName IN VARCHAR2 default ’’,
Options IN BINARY_INTEGER default 0 );
With parameters startScn and endScn you can specify a range of SCN to limit search for information from redo log files that satisfies a range of transactions. With parameters startTime and endTime you can specify a time range to limit search for information from redo log files that are created in given time range. If you specify both startScn-endEcn and startTime-endTime ranges then LogMiner will use startScn-endScn range to select rows from selected redo logs. If you omit these range parameters then LogMiner will select all the information contained in selected redo log files.
For example we can specify a range of SCN values:
SQL> execute DBMS_LOGMNR.START_LOGMNR ( -
2 STARTSCN => 12368, -
3 ENDSCN => 32862, -
4 OPTIONS => DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG);
In this session LogMiner will select from redo logs only rows that are part of transactions between 12368 and 32862 and those rows will be available in V$LOGMNR_CONTENTS view.
With parameter DictFileName you must specify the name of your dictionary file if you use dictionary stored in a flat file.
Besides that you can specify one or more of the following Options:
• COMMITED_DATA_ONLY – If set LogMiner will return only rows from redo logs that represent committed DML operations. If not set LogMiner will return rows for committed, rolled back and in-progress transactions. If this option is set LogMiner groups together all DML operations that belong to the same transaction. Transactions are returned in the order in which they were committed.
• SKIP_CORRUPTION – With this option we define that LogMiner should skip any corruptions in a redo log being analyzed and continue processing. This option works only if redo log block is corrupted and not the redo log file header. To check which redo blocks LogMiner has skipped we can check INFO column in V$LOGMNR_CONTENTS view.
• DDL_DICT_TRACKING – If LogMiner uses dictionary stored in a flat file or in redo logs, this option ensures that LogMiner will update its internal dictionary if any DDL statement is executed during a session. This option cannot be used together with DICT_FROM_ONLINE_CATALOG option.
• DICT_FROM_ONLINE_CATALOG – With this option we specify that LogMiner will use online database catalog as its dictionary. This option cannot be used together with DDL_DICT_TRACKING option.
• DICT_FROM_REDO_LOGS – With this option we specify that LogMiner will use dictionary which is stored in redo log files being analyzed.
• NO_SQL_DELIMITER – If set, the SQL delimiter (semicolon) is not placed at the end of reconstructed SQL statements.
• PRINT_PRETTY_SQL - If set, LogMiner formats reconstructed SQL statements for easier reading but these statements cannot be executed.
• CONTINUOUS_MINE – If set you only need to register one archived redo log. LogMiner automatically adds and analyzes any subsequent archived redo logs and also online catalog. This option is useful if you start LogMiner in a same instance that is generating redo logs.
To specify multiple LogMiner options when we start a session we must use syntax like in the following example:
SQL> execute DBMS_LOGMNR.START_LOGMNR ( -
2 OPTIONS => DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG + -
3 DBMS_LOGMNR.COMMITTED_DATA_ONLY );
With ‘+’ sign we can add multiple LogMiner options for one LogMiner session.
You can query V$LOGMNR_PARAMETERS views to see which LogMiner options you have set.
QUERYING REDO LOG DATA
When we start LogMiner session, it returns information from selected redo logs based on options that we have specified. Thereafter we can select returned information from V$LOGMNR_CONTENTS view. We can select information from V$LOGMNR_CONTENTS only in the same database session in which LogMiner session was started. That is because LogMiner is session based. V$LOGMNR_CONTENTS view contains historical information about changes made to the database:
• The type of change made to the database (INSERT, UPDATE, DELETE or DDL).
• The SCN at which a change was made (SCN column).
• The SCN at which a change was committed (COMMIT_SCN column).
• The transaction to which a change belongs (XIDUSN, XIDSLT, and XIDSQN columns).
• The table and schema name of the modified object (SEG_NAME and SEG_OWNER columns).
• The name of the user who issued the DDL or DML statement to make the change (USERNAME column).
• Reconstructed SQL statements showing SQL that is equivalent to the SQL used to generate the redo records (SQL_REDO column).
• Reconstructed SQL statements showing the SQL statements needed to undo the change (SQL_UNDO column).
When we execute SQL select statement against V$LOGMNR_CONTENTS view, LogMiner reads redo logs sequentially. LogMiner returns all rows in SCN order unless we specify COMMITED_DATA_ONLY option.
When you query V$LOGMNR_CONTENTS view you can restrict you query to select rows based on different segment owners, segment names, usernames and types of operations performed in the database.
For example you can issue the following select statement:
SQL> SELECT OPERATION, SQL_REDO, SQL_UNDO
2 FROM V$LOGMNR_CONTENTS
3 WHERE SEG_OWNER = ’APPL1’ AND SEG_NAME = ’CUSTOMERS’ AND
4 OPERATION = ’DELETE’ AND USERNAME = ’ACCOUNT1’;
With this statement you can see which rows from table APPL1.CUSTOMERS has user ACCOUNT1 deleted.
In SQL_REDO and SQL_UNDO columns you will get SQL statements that you can use to redo or undo statement that was originally executed. This statement will contain ROWID information about row that was changed by the operation unless database identification supplemental logging was set at the time when analyzed redo logs were generated. If this logging is set Oracle inserts additional information into redo logs to identify each change in the database with primary key or unique index columns of a row that was changed. In that case SQL_REDO and SQL_UNDO columns will contain SQL statements that identify changed row with primary key or unique index columns instead of ROWID. Only such statements can be executed in another databases and not those that contain ROWID information. Statements that contain only ROWID information can be executed only in the database in which redo logs were generated.
EXTRACTING DATA VALUES FROM REDO LOGS
With LogMiner you can make queries based on actual data values. This can be achieved with DBMS_LOGMNR.MINE_VALUE function. Function has two arguments, with first you specify whether to mine redo (REDO_VALUE) or undo (UNDO_VALUE) portion of the data. With second argument you must specify a fully-qualified name of the column to be mined. Fully-qualified means that you must specify table owner, table name and column name. For example is you want to find out all update statements that have increased value in a specified column for amount higher than 100 you can specify a following query:
SQL> SELECT SQL_REDO FROM V$LOGMNR_CONTENTS
2 WHERE
3 SEG_NAME=’TAB1’ AND
4 SEG_OWNER=’TEST’ AND
5 OPERATION=’UPDATE’ AND
6 DBMS_LOGMNR.MINE_VALUE(REDO_VALUE,’TEST.TAB1.COL1’) >
7 100 + DBMS_LOGMNR.MINE_VALUE(UNDO_VALUE,’TEST.TAB1.COL1’);
With this query we will select only those rows where new value in column COL1 (REDO_VALUE) is for more than 100 greater than old value in column COL1 (UNDO_VALUE).
If the MINE_VALUE function returns a NULL value, it can mean either:
• The specified column is not present in the redo or undo portion of the data.
• The specified column is present and has a null value.
To distinguish between these two cases, use the DBMS_LOGMNR.COLUMN_PRESENT function that returns a 1 if the column is present in the redo or undo portion of the data. Otherwise, it returns a 0.
The following usage rules apply to the MINE_VALUE and COLUMN_PRESENT functions:
• They can only be used within a LogMiner session.
• They must be invoked in the context of a select operation from the V$LOGMNR_CONTENTS view.
• They do not support LONG, LOB, ADT, or COLLECTION datatypes.
• When the column argument is of type DATE, the string that is returned is formatted in canonical form (DD-MON-YYYY HH24:MI:SS.SS) regardless of the date format of the current session.
ENDING LOGMINER SESSION
To end LogMiner session you have to execute DBMS_LOGMNR.END_LOGMNR procedure:
SQL> execute DBMS_LOGMNR.END_LOGMNR;
This procedure closes all analyzed redo logs and releases all database and system resource that LogMiner was using. If this procedure is not executed LogMiner retains all allocated resource until database session, in which LogMiner session was started, is disconnected. It is particularly important to use this procedure if DDL_DICT_TRACKING or DICT_FROM_REDO_LOGS option was used.
EXAMPLES OF LOGMINER USE
One of the possible areas where LogMiner can be used is when you want to check how many times each of your tables were modified in a specified time range. To get this information you can use the following query:
SQL> SELECT SEG_OWNER, SEG_NAME, COUNT(*) AS Hits FROM
2 V$LOGMNR_CONTENTS WHERE SEG_NAME NOT LIKE '%$' GROUP BY
3 SEG_OWNER, SEG_NAME;
You will get number of hits for each of your tables and can use this information for performance analysis or capacity planning.
For example, with LogMiner, you can get information about delete operations performed by a particular user on a particular table. If you want to find out all delete operations performed by user CLERK on a table ORDERS you must issue the following query:
SQL> SELECT OPERATION, SQL_REDO, SQL_UNDO
2 FROM V$LOGMNR_CONTENTS
3 WHERE SEG_OWNER = ’SALES’ AND SEG_NAME = ’ORDERS’ AND
4 OPERATION = ’DELETE’ AND USERNAME = ’CLERK’;
Using information retrieved in SQL_UNDO column you can undo any delete operations if necessary.
NEW FEATURES IN ORACLE9I DATABASE RELEASE2
With release 9.2 of Oracle server some new features of LogMiner are introduced and some default behaviors are changed:
• Added support for LONG and LOB datatypes for redo log files generated on release 9.2 of Oracle server
• Database supplemental logging is turned off by default. In release 9.0.1 logging was turned on by default.
• Database and table supplemental logging are both available
• Two new options have been added for the formatting of returned data: DBMS_LOGMNR.NO_SQL_DELIMITER and DBMS_LOGMNR.PRINT_PRETTY_SQL.
• New option DBMS_LOGMNR.CONTINUOUS_MINE was added that enables automatic adding and mining of any newly archived redo log file after LogMiner session was started.
• Option DBMS_LOGMNR.NO_DICT_RESET_ONSELECT is no longer necessary
• New procedure DBMS_LOGMNR_D.SET_TABLESPACE that recreates all LogMiner tables in a specified tablespace and not in SYSTEM tablespace that is a default.
All LogMiner features described in this paper are based on functionality available with release 9.2 of Oracle server.
RESTRICTIONS
The following restrictions apply when you use LogMiner:
• The following are not supported:
– Simple and nested abstract datatypes (ADTs)
– Collections (nested tables and VARRAYs)
– Object Refs
– Index organized tables (IOTs)
– CREATE TABLE AS SELECT of a table with a clustered key
• LogMiner runs only on databases of release 8.1 or higher, but you can use it to analyze redo logs from release 8.0 databases. However, the information that LogMiner is able to retrieve from a redo log depends on the version of the database where redo log is generated, not the version of the database where LogMiner is started.
• The following features require that supplemental logging be turned on:
– Support for index clusters, chained rows, and migrated rows
– Support for direct-path inserts (database must use ARCHIVELOG mode).
– Extracting the data dictionary into the redo logs.
– DDL tracking.
– Generating SQL_REDO and SQL_UNDO with primary key information for updates.
– LONG and LOB datatypes are supported only if supplemental logging is enabled.
USING LOGMINER VIEWER
LogMiner Viewer is first introduced with release 9i of Oracle Server. It is part of Oracle Enterprise Manager and enables mining information from redo logs from a GUI environment. This Viewer uses the same PL/SQL procedures and dynamic views as command line version of LogMiner does. It can be started from Oracle Enterprise Manager console in either standalone or management server mode. Viewer uses Enterprise Manager repository to store queries that user creates so that this queries can be used again in the future. Basic LogMiner Viewer screen is Create Query screen as you can see it in Figure 2.
Figure 2 LogMiner Viewer Create Query screen
With LogMiner Viewer you can:
• Create Query with graphical or textual filter, execute and store it in a repository
• Specify the number of rows to be retrieved (maximum is 5000)
• Specify time or SCN ranges for mining session
• Specify redo log files to be analyzed
• Generate and set database dictionary
• Choose columns, from v$logmnr_contents view that you want to select in your query
• Specify various LogMiner options (show committed transactions only, skip redo log file corruptions, include SQL delimiter, use LogMiner to format statements, update internal dictionary with DDL changes)
• Store values in SQL_REDO or SQL_UNDO columns in all or selected retrieved rows to a flat file.
• View values in SQL_REDO and SQL_UNDO columns in each retrieved row
• Store all rows to a flat file in HTML, comma separated values or text format
To sum up LogMiner Viewer provides all LogMiner functionality in a user-friendlier environment. It adds some more features like the ability to save queries in a repository and to store query results to a flat file.
BIBLIOGRAPHY
1. Oracle9i Database Administrator’s Guide Release 2 (9.2), Part No. A96521-01
2. Oracle9i Supplied PL/SQL Packages and Types Reference Release 2 (9.2), Part No. A96612-01
3. Oracle9i SQL Reference Release 2 (9.2), Part No. A96540-01
4. Oracle9i Database Reference Release 2 (9.2), Part No. A96536-01
5. Oracle9i LogMiner Database Analysis Tool, Features Overview
6. LogMiner Viewer, Online Help
Saturday, April 10, 2010
LOADING AND UNLOADING DATA USING EXTERNAL TABLES
In this post I want to publish my presentation from IOUG 2005 conference about external tables in Oracle database
EXTERNAL TABLES CONCEPTS
INTRODUCTION
External tables feature was first introduced with Oracle9i Server. This feature enables accessing data sources outside Oracle database in the same way as regular database tables. External tables are defined as tables with data source outside of Oracle database. Data in external source can be in any format supported by access drivers available in Oracle Server. Oracle 10G Server currently provides two access drivers: ORACLE_LOADER that is suitable for loading data from text files into regular tables in Oracle database and ORACLE_DATAPUMP that is suitable for both unloading data to and loading data from binary files. To create external table in Oracle database we must specify column definitions and access parameters that are used to describe mapping between external data and Oracle column definitions. External table definition neither describes data stored in the database nor how are the data stored in external source. Part of metadata information for external table is access driver that is responsible to transform data from external source and map them to external table column types. Metadata information about external table is stored in Oracle data dictionary and is used to enable accessing data from this table with standard SELECT statements. Writing data to external tables is only possible if they are created with ORACLE_DATAPUMP access driver and only with CREATE TABLE statement. Once the external table is created and initially populated with data no further modification of data is possible.
Loading data from external table refers to reading data from external source (external datafile) and loading them into a table in the database. Unloading data into external tables refers to reading data from database table and inserting them into external table.
Data in external tables can be queried directly with SELECT statements. We can perform select, join and sort operations with data from external tables and this operations can be parallelized to improve performance. Data can be initially inserted into external table only in CREATE TABLE .. AS SELECT statement with ORACLE_DATAPUMP access driver. Thereafter no further DML operations are allowed on external tables. Synonyms and views can be created for external tables but no indexes.
PRIVILEGES REQUIRED TO CREATE EXTERNAL TABLE
Access driver runs inside database server and must have access to the operating system files:
• Server must have read access to all the dump files with external table data that we want to select or load into database tables
• Server must have write access to operating system directory to create the logfile, bad file, discard file (with ORACLE_LOADER access driver) and dump file (with ORACLE_DATAPUMP access driver)
Prerequisite for creating external table is that a database user with CREATE ANY DIRECTORY must create a directory object as an alias for a directory on a server where external table data will be stored. User SYS owns all the directory objects but he can grant other users the privilege to create directory objects. User who creates a directory object in Oracle database is automatically granted READ and WRITE privileges on a directory and can grant these privileges to other database users that need to access or create files in this directory. Operating system directory to which Oracle directory object refers to must exist before directory object can be created. Besides that system administrator must set appropriate read/write privileges on this operating system directory so that Oracle Database processes can access it.
For example to create Oracle directory object EXTTABDIR as an alias for operating system directory /oracle/exttabs we will use the following command:
CREATE DIRECTORY EXTTABDIR AS ‘/oracle/exttabs’;
If a user JONES wants to load data from external files in EXTTABDIR directory we must grant him READ access to this directory with the command:
GRANT READ ON EXTTABDIR TO JONES;
If a user JONES wants to unload data to external files (with ORACLE_DATAPUMP driver) in EXTTABDIR directory we must grant him WRITE access to this directory with the command:
GRANT WRITE ON EXTTABDIR TO JONES;
Now user JONES can create external tables provided that CREATE TABLE system privilege has been granted to him. External tables are created with CREATE TABLE … ORGANIZATION EXTERNAL command and detailed syntax will be described for ORACLE_LOADER and ORACLE_DATAPUMP drivers in the following chapters.
CREATING EXTERNAL TABLES
Create external table statement consists of the following typical parts:
CREATE TABLE schema.name
(column definitions)
ORGANIZATION EXTERNAL
(TYPE access driver – ORACLE_LOADER or ORACLE_DATAPUMP
DEFAULT DIRECTORY directory
ACCESS PARAMETERS (access parameter values)
LOCATION (‘file1.dat’,’file2.dat’…’filen.dat’)
)
PARALLEL
REJECT LIMIT UNLIMITED
AS SELECT … FROM … WHERE … -- this can be specified only with ORACLE_DATAPUMP access driver
In column definitions part we specify column names and datatypes like with any other Oracle relational table.
ORGANIZATION_EXTERNAL clause defines that we want to create external table
With TYPE we must specify which access driver we are going to use to access data in external table.
DEFAULT DIRECTORY clause specifies operating system directory where data for external table will be stored.
Each of access drivers has ACCESS PARAMETERS that define how to access data from external files
In LOCATION clause we must specify names, one or more, of files where data will be stored.
The PARALLEL clause enables parallel query of data in external table.
REJECT LIMIT clause specifies that there is no limit on errors that occur when we query data from external table.
With AS SELECT clause, if we use ORACLE_DATAPUMP access driver, data can be initially inserted into external table.
When we create external table we must observe the following restrictions:
• external table cannot be a temporary table
• you cannot specify a constraint on external table
• an external table cannot have object type, varray or LONG columns
ALTERING EXTERNAL TABLES
With ALTER TABLE command we can add, modify and delete columns in external tables. besides that with alter table we can change: PARALLEL, DEFAULT DIRECTORY, ACCESS PARAMETERS, LOCATION, REJECT LIMIT and PROJECT COLUMN clauses.
PROJECT COLUMN clause defines how access driver validates rows of data selected from external tables. Default value is PROJECT COLUMN REFERENCED and that means that access driver processes only columns explicitly specified in select list. The other possibility is to specify PROJECT COLUMNS ALL when access processes all the columns in a row and rejects row if any column value raises an error during data conversion.
The following restrictions apply when we alter external table:
• we cannot add LONG, LOB or object type column nor can we change existing column datatypes to any of the specified types
• we cannot modify storage parameters
DROPPING EXTERNAL TABLES
When you drop external table this only removes metadata from data dictionary but it does not remove external dump datafiles.
DATA DICTIONARY VIEWS FOR EXTERNAL TABLES
In data dictionary there are some additional dictionary views that contain information that is specific for external tables:
• DBA_ (ALL_ , USER_) EXTERNAL_TABLES – attributes specific to external tables
• DBA_ (ALL_ , USER_) EXTERNAL_LOCATIONS – list of data sources for external tables
COLLECTING STATISTICS FOR EXTERNAL TABLES
Statistics for external tables are not collected with GATHER_SCHEMA_STATS and GATHER_DATABASE_STATS procedures. With a procedure GATHER_TABLE_STATS, statistics for individual external tables can be collected but sampling on external tables is not supported so ESTIMATE_PERCENT must be set to NULL.
DATATYPE CONVERSION
When data is loaded or unloaded with external tables it is possible that the same column will have a different datatype in:
• The database
o source when data is unloaded into an external table
o destination when data is loaded from an external table
• The external table
o during unload data from the database is converted to match the column datatype in the external table
o during load data from an external table is converted to match the column datatype in the database
• The external datafile
o during unload datatypes of fields in the datafile exactly match datatypes of fields in the external table
o during load datatypes of fields in the datafile are converted to datatypes of fields in the external files. If there is a conversion error then row containing this column is not loaded
If conversion error occurs between datafile and external table then row with an error will be ignored. The whole operation will terminate unsuccessfully if conversion error occurs between external table and the column in the database.
DUMP FILE FORMAT
Dump files that contain data for external table are stored in Oracle proprietary binary format. You cannot edit this file with text editor but if you try to select readable strings from file with UNIX command strings you can see there is an XML definition of table name, character set, column names and datatypes followed by rows of data. It looks something like this:
NEW FEATURES IN 10G
INTRODUCTION
External table feature was first introduced in 9i version of Oracle Server. In this version only ORACLE_LOADER access driver was available for external tables that have enabled read-only access to the data in external text files with standard SELECT statements. With this driver it was only possible to select data from such external tables and insert them in regular database tables.
DATA PUMP TECHNOLOGY
In Oracle 10g external table features are extended thanks to the new Data Pump mechanism that was first introduced in this version. Data Pump technology enables very fast movement of data and metadata between different Oracle databases. This technology is a basis for new database utilities expdp and impdp that have similar function as export and import utilities but data and metadata movement with expdp and impdp is orders of magnitude faster enabling movement of huge quantities of data in a very short time. Expdp and impdp are server based database utilities that are using jobs and Data Pump API for data and metadata movement. Expdp creates dump file in Oracle proprietary binary format that can only impdp read. Expdp and impdp have the similar command line parameters as export and import utilities to specify data and metadata filters when we want to move only a subset of data in the database.
Oracle Data Pump can use direct path or external tables path to load or unload data and metadata in Oracle database. Data Pump uses external tables access path in the following cases:
• Loading and unloading very large tables and partitions with parallel SQL
• Loading tables with global or domain indexes defined on them, including partitioned object tables
• Loading tables with active triggers or clustered tables
• Loading and unloading tables with encrypted columns
• Loading tables with fine-grained access control enabled for inserts
• Loading tables that are partitioned differently at load time and unload time
EXTERNAL TABLE ENHANCEMENTS
External table feature benefits also from this new Data Pump technology because new ORACLE_DATAPUMP access driver is introduced that is based on Data Pump API. With this new access driver functionality of external tables is extended with the capability to unload data from database tables into external table. So it is now possible to select data from arbitrary number of database tables using complex query conditions and write them into external table. External table data are stored in a set of operating system files that Oracle creates in a proprietary binary format. This set of files can thereafter be transferred to a different database server, possibly with a different operating system. Based on this transferred files external table can be created in a new database and data can be loaded from this external table into regular database tables. This feature is particularly useful for data warehouse type of applications because you can transform data from source database, store it in external table, move external table files to warehouse database and load them into database tables.
CONFIGURING EXTERNAL TABLES WITH ORACLE_LOADER ACCESS DRIVER
Record parsing of external tables created with ORACLE_LOADER access driver is similar with SQL*Loader but performance reasons can determine which of these methods we are going to use in a particular situation.
For best performance we will use external table in the following situations:
• you want to transform the data during data load operation
• you want to use transparent parallel processing (no need to split external data)
Table 1 Differences between external tables and SQL*Loader
SQL*Loader performance is better in the following situations:
• loading remote data
• data transformation and parallel processing are not required
• additional indexing of staging table is required
If table is created with this access driver we can query data that are stored in external text datafile and load them into database table. When we create external table we must define access parameters to describe how access driver will access external table data. Access parameter definition contains information about record and field formatting of data stored in an external text datafile. Syntax of access parameters for external table is similar to SQL*Loader control file with differences specified in Table 1.
With access parameters we can specify the following characteristics:
• comments
• record_format_info – information about the record, such as its format, the character set of the data, load exclusion rules
• field_definitions – describes fields in the datafile
• column_transforms – define mapping between columns in the external table and columns in the datafile
With ORACLE_LOADER access driver the following record formats can be processed:
• FIXED length – all records are having the same length defined with length parameter
• VARIABLE size – records are variable size and size parameter defines the number of bytes from the beginning of the record that contain record length in bytes
• DELIMITED BY string – string indicate the characters that identify the end of a record
External table with fixed length records can be created like in the following example:
CREATE TABLE actors
…
ACCESS PARAMETERS (RECORDS FIXED 20 FIELDS
(first_name CHAR(7), last_name CHAR(8), year_of_birth CHAR(4)))
…
Dump file:
Clint Eastwood1930
James Stewart 1908
John Wayne 1907
External table with variable length records can be created like in the following example:
CREATE TABLE actors
…
ACCESS PARAMETERS (RECORDS VARIABLE 2 FIELDS TERMINATED BY ','
(first_name CHAR(7), last_name CHAR(8), year_of_birth CHAR(4)))
…
Dump file:
21Clint,Eastwood,1930,
20James,Stewart,1908,
17John,Wayne,1907,
External table with delimited records can be created like in the following example:
CREATE TABLE actors
…
ACCESS PARAMETERS (RECORDS DELIMITED BY '' FIELDS TERMINATED BY ','
(first_name CHAR(7), last_name CHAR(8), year_of_birth CHAR(4)))
…
Dump file:
Clint,Eastwood,1930James,Stewart,1908John,Wayne,1907
If the field_definitions clause is omitted, then:
• The fields are assumed to be delimited by ','
• The fields are assumed to be character type
• The maximum length of the field is assumed to be 255
• The order of the fields in the datafile is the order in which the fields were defined in the external table
• No blanks are trimmed from the field
With field_definitions we can specify the following field characteristics:
• delim_spec clause is used to identify how all fields are terminated in the record.
• The trim_spec clause specifies the type of whitespace trimming to be performed by default on all character fields.
• MISSING FIELD VALUES ARE NULL indicates that if there is not enough data in a record for all fields, then those fields with missing data values are set to NULL.
• REJECT ROWS WITH ALL NULL FIELDS indicates that a row will not be loaded into the external table if all referenced fields in the row are null.
• The field_list clause identifies the fields in the datafile and their datatypes.
CONFIGURING EXTERNAL TABLES WITH ORACLE_DATAPUMP ACCESS DRIVER
For external tables created with ORACLE_DATAPUMP access driver you can specify the following access parameters:
• comments
• LOGFILE NOLOGFILE – specify name for logfile where messages are generated when dump files are accesses. With NOLOGFILE we can specify that no logfile will be generated.
• VERSION – we must specify the version of the database where data from dump file will be read if this version is different than the version of the database where data were unloaded.
Data can be inserted into external table only with CREATE TABLE … AS SELECT statement when we create table. After that no further DML operations on data in external table are allowed. In this case Oracle will raise an error like in the following example:
DELETE FROM sales_results WHERE product_id=586243;
*
ERROR at line 1:
ORA-30657: operation not supported on external organized table
LOADING AND UNLOADING DATA
With ORACLE_DATAPUMP driver data can be unloaded from one database and loaded into another database. This can be achieved with the following steps:
1. Create external table in source database and populate it with data that we want to unload.
SQL> CREATE TABLE sales_results
2 ORGANIZATION EXTERNAL
3 (
4 TYPE ORACLE_DATAPUMP
5 DEFAULT DIRECTORY pump_dir
6 LOCATION ('sales.dmp')
7 )
8 AS SELECT * FROM sales_figures;
Once external table is created and populated with data no further DML operations are allowed.
2. Transfer sales.dmp dump file to target database server.
3. In a target database create external table with sales.dmp as dump file
SQL> CREATE TABLE sales_results
2 (
3 product_id NUMBER(6),
4 price NUMBER(3),
5 quantity_sold NUMBER(8)
6 )
7 ORGANIZATION EXTERNAL
8 (
9 TYPE ORACLE_DATAPUMP
10 DEFAULT DIRECTORY dumpdir
11 LOCATION ('sales.dmp')
12 );
Table created.
4. In target database data from external table can only be selected and inserted in other database tables
INSERT INTO sales_summary SELECT * FROM sales_results;
COMBINING DUMP FILES
Dump files loaded with data for one external table can be used as data source for another external table. We can unload data from different production databases into separate dump files and use all this files to create external table in data warehouse database and load data into database tables. Using this feature we can unload data from multiple data sources and load them into one target database.
To use multiple dump file in the LOCATION clause of external table the following must be true:
• metadata in all dump files must be exactly the same
• character set, time zone, schema name, table name and column names must all match
• columns must be defined in the same order with exactly the same datatypes
For example, we want to unload data from oe.inventories table in two source databases and load them into one target database. Table oe.inventories has the following definition in the first and the second source databases
SQL> DESCRIBE inventories_xt
Name Null? Type
----------------------------------------- -------- ----------------------------
PRODUCT_ID NOT NULL NUMBER(6)
WAREHOUSE_ID NOT NULL NUMBER(3)
QUANTITY_ON_HAND NOT NULL NUMBER(8)
This can be done when we do the following:
1. In first source database we create external table
SQL> CREATE TABLE all_inventories
2 ORGANIZATION EXTERNAL
3 (
4 TYPE ORACLE_DATAPUMP
5 DEFAULT DIRECTORY dir1
6 LOCATION (all_inv1.dmp')
7 )
8 AS SELECT * FROM oe.inventories WHERE warehouse_id =1;
Table created.
2. In second source database we create external table:
SQL> CREATE TABLE all_inventories
2 ORGANIZATION EXTERNAL
3 (
4 TYPE ORACLE_DATAPUMP
5 DEFAULT dir2
6 LOCATION ('all_inv2.dmp')
7 )
8 AS SELECT * FROM oe.inventories WHERE warehouse_id = 2;
Table created.
3. Transfer dump files all_inv1.dmp and all_inv2.dmp to the target database and create external table
SQL> CREATE TABLE ext_inventories
2 (
3 PRODUCT_ID NUMBER(6),
4 WAREHOUSE_ID NUMBER(3),
5 QUANTITY_ON_HAND NUMBER(8)
6 )
7 ORGANIZATION EXTERNAL
8 (
9 TYPE ORACLE_DATAPUMP
10 DEFAULT DIRECTORY wh_dir
11 LOCATION ('all_inv1.dmp','all_inv2.dmp')
12 );
Table created.
4. Now table ext_inventories contains all the data selected from oe.inventories tables in the first and the second source databases. Data can only be selected from this table and inserted into target database tables. This is just a simple example but data can be transformed both during unload, in source databases, and load, in target database, making this feature particularly useful for data warehouse applications.
CONCLUSIONS
External tables are useful feature to load and unload data stored in external datafiles. In version 9i it was only possible to load data from external text datafiles into database tables using ORACLE_LOADER access driver. This feature is substantially enhanced in version 10g with the ability to unload data from database tables and write them to external datafiles. New functionality is enabled with introduction of new Data Pump technology so that for external table creation we can use new ORACLE_DATAPUMP access driver. Data warehouse applications are very suitable for external tables use. Using them we can transform and unload data from multiple tables in multiple production databases and store them in external datafile. Data are stored in an external file in Oracle proprietary binary format that is platform independent. That means that external datafile created on one platform can be used as datasource on a different platform with different endian and character set. When we transfer external datafiles from multiple production databases to a data warehouse target database, all these files can be used as a data source for external table. We can combine multiple external datafiles for one new external table provided that metadata table definition is the same in all the datafiles. When we create external table in a warehouse database we can select data from this table, transform them and insert them into database tables.
BIBLIOGRAPHY
1. Oracle Database Administrator’s Guide, 10g Release 1 (10.1), Part No. B10739-01
2. Oracle Database Concepts, 10g Release 1 (10.1), Part No. B10743-01
3. Oracle Database Data Warehousing Guide, 10g Release 1 (10.1), Part No. B10736-01
4. Oracle Database New Features Guide, 10g Release 1 (10.1), Part No. B10750-01
5. Oracle Database SQL Reference, 10g Release 1 (10.1), Part No. B10759-01
6. Oracle Database Utilities, 10g Release 1 (10.1), Part No. B10825-01
EXTERNAL TABLES CONCEPTS
INTRODUCTION
External tables feature was first introduced with Oracle9i Server. This feature enables accessing data sources outside Oracle database in the same way as regular database tables. External tables are defined as tables with data source outside of Oracle database. Data in external source can be in any format supported by access drivers available in Oracle Server. Oracle 10G Server currently provides two access drivers: ORACLE_LOADER that is suitable for loading data from text files into regular tables in Oracle database and ORACLE_DATAPUMP that is suitable for both unloading data to and loading data from binary files. To create external table in Oracle database we must specify column definitions and access parameters that are used to describe mapping between external data and Oracle column definitions. External table definition neither describes data stored in the database nor how are the data stored in external source. Part of metadata information for external table is access driver that is responsible to transform data from external source and map them to external table column types. Metadata information about external table is stored in Oracle data dictionary and is used to enable accessing data from this table with standard SELECT statements. Writing data to external tables is only possible if they are created with ORACLE_DATAPUMP access driver and only with CREATE TABLE statement. Once the external table is created and initially populated with data no further modification of data is possible.
Loading data from external table refers to reading data from external source (external datafile) and loading them into a table in the database. Unloading data into external tables refers to reading data from database table and inserting them into external table.
Data in external tables can be queried directly with SELECT statements. We can perform select, join and sort operations with data from external tables and this operations can be parallelized to improve performance. Data can be initially inserted into external table only in CREATE TABLE .. AS SELECT statement with ORACLE_DATAPUMP access driver. Thereafter no further DML operations are allowed on external tables. Synonyms and views can be created for external tables but no indexes.
PRIVILEGES REQUIRED TO CREATE EXTERNAL TABLE
Access driver runs inside database server and must have access to the operating system files:
• Server must have read access to all the dump files with external table data that we want to select or load into database tables
• Server must have write access to operating system directory to create the logfile, bad file, discard file (with ORACLE_LOADER access driver) and dump file (with ORACLE_DATAPUMP access driver)
Prerequisite for creating external table is that a database user with CREATE ANY DIRECTORY must create a directory object as an alias for a directory on a server where external table data will be stored. User SYS owns all the directory objects but he can grant other users the privilege to create directory objects. User who creates a directory object in Oracle database is automatically granted READ and WRITE privileges on a directory and can grant these privileges to other database users that need to access or create files in this directory. Operating system directory to which Oracle directory object refers to must exist before directory object can be created. Besides that system administrator must set appropriate read/write privileges on this operating system directory so that Oracle Database processes can access it.
For example to create Oracle directory object EXTTABDIR as an alias for operating system directory /oracle/exttabs we will use the following command:
CREATE DIRECTORY EXTTABDIR AS ‘/oracle/exttabs’;
If a user JONES wants to load data from external files in EXTTABDIR directory we must grant him READ access to this directory with the command:
GRANT READ ON EXTTABDIR TO JONES;
If a user JONES wants to unload data to external files (with ORACLE_DATAPUMP driver) in EXTTABDIR directory we must grant him WRITE access to this directory with the command:
GRANT WRITE ON EXTTABDIR TO JONES;
Now user JONES can create external tables provided that CREATE TABLE system privilege has been granted to him. External tables are created with CREATE TABLE … ORGANIZATION EXTERNAL command and detailed syntax will be described for ORACLE_LOADER and ORACLE_DATAPUMP drivers in the following chapters.
CREATING EXTERNAL TABLES
Create external table statement consists of the following typical parts:
CREATE TABLE schema.name
(column definitions)
ORGANIZATION EXTERNAL
(TYPE access driver – ORACLE_LOADER or ORACLE_DATAPUMP
DEFAULT DIRECTORY directory
ACCESS PARAMETERS (access parameter values)
LOCATION (‘file1.dat’,’file2.dat’…’filen.dat’)
)
PARALLEL
REJECT LIMIT UNLIMITED
AS SELECT … FROM … WHERE … -- this can be specified only with ORACLE_DATAPUMP access driver
In column definitions part we specify column names and datatypes like with any other Oracle relational table.
ORGANIZATION_EXTERNAL clause defines that we want to create external table
With TYPE we must specify which access driver we are going to use to access data in external table.
DEFAULT DIRECTORY clause specifies operating system directory where data for external table will be stored.
Each of access drivers has ACCESS PARAMETERS that define how to access data from external files
In LOCATION clause we must specify names, one or more, of files where data will be stored.
The PARALLEL clause enables parallel query of data in external table.
REJECT LIMIT clause specifies that there is no limit on errors that occur when we query data from external table.
With AS SELECT clause, if we use ORACLE_DATAPUMP access driver, data can be initially inserted into external table.
When we create external table we must observe the following restrictions:
• external table cannot be a temporary table
• you cannot specify a constraint on external table
• an external table cannot have object type, varray or LONG columns
ALTERING EXTERNAL TABLES
With ALTER TABLE command we can add, modify and delete columns in external tables. besides that with alter table we can change: PARALLEL, DEFAULT DIRECTORY, ACCESS PARAMETERS, LOCATION, REJECT LIMIT and PROJECT COLUMN clauses.
PROJECT COLUMN clause defines how access driver validates rows of data selected from external tables. Default value is PROJECT COLUMN REFERENCED and that means that access driver processes only columns explicitly specified in select list. The other possibility is to specify PROJECT COLUMNS ALL when access processes all the columns in a row and rejects row if any column value raises an error during data conversion.
The following restrictions apply when we alter external table:
• we cannot add LONG, LOB or object type column nor can we change existing column datatypes to any of the specified types
• we cannot modify storage parameters
DROPPING EXTERNAL TABLES
When you drop external table this only removes metadata from data dictionary but it does not remove external dump datafiles.
DATA DICTIONARY VIEWS FOR EXTERNAL TABLES
In data dictionary there are some additional dictionary views that contain information that is specific for external tables:
• DBA_ (ALL_ , USER_) EXTERNAL_TABLES – attributes specific to external tables
• DBA_ (ALL_ , USER_) EXTERNAL_LOCATIONS – list of data sources for external tables
COLLECTING STATISTICS FOR EXTERNAL TABLES
Statistics for external tables are not collected with GATHER_SCHEMA_STATS and GATHER_DATABASE_STATS procedures. With a procedure GATHER_TABLE_STATS, statistics for individual external tables can be collected but sampling on external tables is not supported so ESTIMATE_PERCENT must be set to NULL.
DATATYPE CONVERSION
When data is loaded or unloaded with external tables it is possible that the same column will have a different datatype in:
• The database
o source when data is unloaded into an external table
o destination when data is loaded from an external table
• The external table
o during unload data from the database is converted to match the column datatype in the external table
o during load data from an external table is converted to match the column datatype in the database
• The external datafile
o during unload datatypes of fields in the datafile exactly match datatypes of fields in the external table
o during load datatypes of fields in the datafile are converted to datatypes of fields in the external files. If there is a conversion error then row containing this column is not loaded
If conversion error occurs between datafile and external table then row with an error will be ignored. The whole operation will terminate unsuccessfully if conversion error occurs between external table and the column in the database.
DUMP FILE FORMAT
Dump files that contain data for external table are stored in Oracle proprietary binary format. You cannot edit this file with text editor but if you try to select readable strings from file with UNIX command strings you can see there is an XML definition of table name, character set, column names and datatypes followed by rows of data. It looks something like this:
NEW FEATURES IN 10G
INTRODUCTION
External table feature was first introduced in 9i version of Oracle Server. In this version only ORACLE_LOADER access driver was available for external tables that have enabled read-only access to the data in external text files with standard SELECT statements. With this driver it was only possible to select data from such external tables and insert them in regular database tables.
DATA PUMP TECHNOLOGY
In Oracle 10g external table features are extended thanks to the new Data Pump mechanism that was first introduced in this version. Data Pump technology enables very fast movement of data and metadata between different Oracle databases. This technology is a basis for new database utilities expdp and impdp that have similar function as export and import utilities but data and metadata movement with expdp and impdp is orders of magnitude faster enabling movement of huge quantities of data in a very short time. Expdp and impdp are server based database utilities that are using jobs and Data Pump API for data and metadata movement. Expdp creates dump file in Oracle proprietary binary format that can only impdp read. Expdp and impdp have the similar command line parameters as export and import utilities to specify data and metadata filters when we want to move only a subset of data in the database.
Oracle Data Pump can use direct path or external tables path to load or unload data and metadata in Oracle database. Data Pump uses external tables access path in the following cases:
• Loading and unloading very large tables and partitions with parallel SQL
• Loading tables with global or domain indexes defined on them, including partitioned object tables
• Loading tables with active triggers or clustered tables
• Loading and unloading tables with encrypted columns
• Loading tables with fine-grained access control enabled for inserts
• Loading tables that are partitioned differently at load time and unload time
EXTERNAL TABLE ENHANCEMENTS
External table feature benefits also from this new Data Pump technology because new ORACLE_DATAPUMP access driver is introduced that is based on Data Pump API. With this new access driver functionality of external tables is extended with the capability to unload data from database tables into external table. So it is now possible to select data from arbitrary number of database tables using complex query conditions and write them into external table. External table data are stored in a set of operating system files that Oracle creates in a proprietary binary format. This set of files can thereafter be transferred to a different database server, possibly with a different operating system. Based on this transferred files external table can be created in a new database and data can be loaded from this external table into regular database tables. This feature is particularly useful for data warehouse type of applications because you can transform data from source database, store it in external table, move external table files to warehouse database and load them into database tables.
CONFIGURING EXTERNAL TABLES WITH ORACLE_LOADER ACCESS DRIVER
Record parsing of external tables created with ORACLE_LOADER access driver is similar with SQL*Loader but performance reasons can determine which of these methods we are going to use in a particular situation.
For best performance we will use external table in the following situations:
• you want to transform the data during data load operation
• you want to use transparent parallel processing (no need to split external data)
Table 1 Differences between external tables and SQL*Loader
SQL*Loader performance is better in the following situations:
• loading remote data
• data transformation and parallel processing are not required
• additional indexing of staging table is required
If table is created with this access driver we can query data that are stored in external text datafile and load them into database table. When we create external table we must define access parameters to describe how access driver will access external table data. Access parameter definition contains information about record and field formatting of data stored in an external text datafile. Syntax of access parameters for external table is similar to SQL*Loader control file with differences specified in Table 1.
With access parameters we can specify the following characteristics:
• comments
• record_format_info – information about the record, such as its format, the character set of the data, load exclusion rules
• field_definitions – describes fields in the datafile
• column_transforms – define mapping between columns in the external table and columns in the datafile
With ORACLE_LOADER access driver the following record formats can be processed:
• FIXED length – all records are having the same length defined with length parameter
• VARIABLE size – records are variable size and size parameter defines the number of bytes from the beginning of the record that contain record length in bytes
• DELIMITED BY string – string indicate the characters that identify the end of a record
External table with fixed length records can be created like in the following example:
CREATE TABLE actors
…
ACCESS PARAMETERS (RECORDS FIXED 20 FIELDS
(first_name CHAR(7), last_name CHAR(8), year_of_birth CHAR(4)))
…
Dump file:
Clint Eastwood1930
James Stewart 1908
John Wayne 1907
External table with variable length records can be created like in the following example:
CREATE TABLE actors
…
ACCESS PARAMETERS (RECORDS VARIABLE 2 FIELDS TERMINATED BY ','
(first_name CHAR(7), last_name CHAR(8), year_of_birth CHAR(4)))
…
Dump file:
21Clint,Eastwood,1930,
20James,Stewart,1908,
17John,Wayne,1907,
External table with delimited records can be created like in the following example:
CREATE TABLE actors
…
ACCESS PARAMETERS (RECORDS DELIMITED BY '' FIELDS TERMINATED BY ','
(first_name CHAR(7), last_name CHAR(8), year_of_birth CHAR(4)))
…
Dump file:
Clint,Eastwood,1930James,Stewart,1908John,Wayne,1907
If the field_definitions clause is omitted, then:
• The fields are assumed to be delimited by ','
• The fields are assumed to be character type
• The maximum length of the field is assumed to be 255
• The order of the fields in the datafile is the order in which the fields were defined in the external table
• No blanks are trimmed from the field
With field_definitions we can specify the following field characteristics:
• delim_spec clause is used to identify how all fields are terminated in the record.
• The trim_spec clause specifies the type of whitespace trimming to be performed by default on all character fields.
• MISSING FIELD VALUES ARE NULL indicates that if there is not enough data in a record for all fields, then those fields with missing data values are set to NULL.
• REJECT ROWS WITH ALL NULL FIELDS indicates that a row will not be loaded into the external table if all referenced fields in the row are null.
• The field_list clause identifies the fields in the datafile and their datatypes.
CONFIGURING EXTERNAL TABLES WITH ORACLE_DATAPUMP ACCESS DRIVER
For external tables created with ORACLE_DATAPUMP access driver you can specify the following access parameters:
• comments
• LOGFILE NOLOGFILE – specify name for logfile where messages are generated when dump files are accesses. With NOLOGFILE we can specify that no logfile will be generated.
• VERSION – we must specify the version of the database where data from dump file will be read if this version is different than the version of the database where data were unloaded.
Data can be inserted into external table only with CREATE TABLE … AS SELECT statement when we create table. After that no further DML operations on data in external table are allowed. In this case Oracle will raise an error like in the following example:
DELETE FROM sales_results WHERE product_id=586243;
*
ERROR at line 1:
ORA-30657: operation not supported on external organized table
LOADING AND UNLOADING DATA
With ORACLE_DATAPUMP driver data can be unloaded from one database and loaded into another database. This can be achieved with the following steps:
1. Create external table in source database and populate it with data that we want to unload.
SQL> CREATE TABLE sales_results
2 ORGANIZATION EXTERNAL
3 (
4 TYPE ORACLE_DATAPUMP
5 DEFAULT DIRECTORY pump_dir
6 LOCATION ('sales.dmp')
7 )
8 AS SELECT * FROM sales_figures;
Once external table is created and populated with data no further DML operations are allowed.
2. Transfer sales.dmp dump file to target database server.
3. In a target database create external table with sales.dmp as dump file
SQL> CREATE TABLE sales_results
2 (
3 product_id NUMBER(6),
4 price NUMBER(3),
5 quantity_sold NUMBER(8)
6 )
7 ORGANIZATION EXTERNAL
8 (
9 TYPE ORACLE_DATAPUMP
10 DEFAULT DIRECTORY dumpdir
11 LOCATION ('sales.dmp')
12 );
Table created.
4. In target database data from external table can only be selected and inserted in other database tables
INSERT INTO sales_summary SELECT * FROM sales_results;
COMBINING DUMP FILES
Dump files loaded with data for one external table can be used as data source for another external table. We can unload data from different production databases into separate dump files and use all this files to create external table in data warehouse database and load data into database tables. Using this feature we can unload data from multiple data sources and load them into one target database.
To use multiple dump file in the LOCATION clause of external table the following must be true:
• metadata in all dump files must be exactly the same
• character set, time zone, schema name, table name and column names must all match
• columns must be defined in the same order with exactly the same datatypes
For example, we want to unload data from oe.inventories table in two source databases and load them into one target database. Table oe.inventories has the following definition in the first and the second source databases
SQL> DESCRIBE inventories_xt
Name Null? Type
----------------------------------------- -------- ----------------------------
PRODUCT_ID NOT NULL NUMBER(6)
WAREHOUSE_ID NOT NULL NUMBER(3)
QUANTITY_ON_HAND NOT NULL NUMBER(8)
This can be done when we do the following:
1. In first source database we create external table
SQL> CREATE TABLE all_inventories
2 ORGANIZATION EXTERNAL
3 (
4 TYPE ORACLE_DATAPUMP
5 DEFAULT DIRECTORY dir1
6 LOCATION (all_inv1.dmp')
7 )
8 AS SELECT * FROM oe.inventories WHERE warehouse_id =1;
Table created.
2. In second source database we create external table:
SQL> CREATE TABLE all_inventories
2 ORGANIZATION EXTERNAL
3 (
4 TYPE ORACLE_DATAPUMP
5 DEFAULT dir2
6 LOCATION ('all_inv2.dmp')
7 )
8 AS SELECT * FROM oe.inventories WHERE warehouse_id = 2;
Table created.
3. Transfer dump files all_inv1.dmp and all_inv2.dmp to the target database and create external table
SQL> CREATE TABLE ext_inventories
2 (
3 PRODUCT_ID NUMBER(6),
4 WAREHOUSE_ID NUMBER(3),
5 QUANTITY_ON_HAND NUMBER(8)
6 )
7 ORGANIZATION EXTERNAL
8 (
9 TYPE ORACLE_DATAPUMP
10 DEFAULT DIRECTORY wh_dir
11 LOCATION ('all_inv1.dmp','all_inv2.dmp')
12 );
Table created.
4. Now table ext_inventories contains all the data selected from oe.inventories tables in the first and the second source databases. Data can only be selected from this table and inserted into target database tables. This is just a simple example but data can be transformed both during unload, in source databases, and load, in target database, making this feature particularly useful for data warehouse applications.
CONCLUSIONS
External tables are useful feature to load and unload data stored in external datafiles. In version 9i it was only possible to load data from external text datafiles into database tables using ORACLE_LOADER access driver. This feature is substantially enhanced in version 10g with the ability to unload data from database tables and write them to external datafiles. New functionality is enabled with introduction of new Data Pump technology so that for external table creation we can use new ORACLE_DATAPUMP access driver. Data warehouse applications are very suitable for external tables use. Using them we can transform and unload data from multiple tables in multiple production databases and store them in external datafile. Data are stored in an external file in Oracle proprietary binary format that is platform independent. That means that external datafile created on one platform can be used as datasource on a different platform with different endian and character set. When we transfer external datafiles from multiple production databases to a data warehouse target database, all these files can be used as a data source for external table. We can combine multiple external datafiles for one new external table provided that metadata table definition is the same in all the datafiles. When we create external table in a warehouse database we can select data from this table, transform them and insert them into database tables.
BIBLIOGRAPHY
1. Oracle Database Administrator’s Guide, 10g Release 1 (10.1), Part No. B10739-01
2. Oracle Database Concepts, 10g Release 1 (10.1), Part No. B10743-01
3. Oracle Database Data Warehousing Guide, 10g Release 1 (10.1), Part No. B10736-01
4. Oracle Database New Features Guide, 10g Release 1 (10.1), Part No. B10750-01
5. Oracle Database SQL Reference, 10g Release 1 (10.1), Part No. B10759-01
6. Oracle Database Utilities, 10g Release 1 (10.1), Part No. B10825-01
Subscribe to:
Posts (Atom)