Saturday, April 10, 2010

LOADING AND UNLOADING DATA USING EXTERNAL TABLES

In this post I want to publish my presentation from the IOUG 2005 conference about external tables in the Oracle database.

EXTERNAL TABLES CONCEPTS
INTRODUCTION

The external tables feature was first introduced in Oracle9i Server. It enables access to data sources outside the Oracle database in the same way as regular database tables: external tables are defined as tables whose data source lies outside the database. The data in the external source can be in any format supported by the access drivers available in the server. Oracle 10g Server currently provides two access drivers: ORACLE_LOADER, which is suitable for loading data from text files into regular tables in the database, and ORACLE_DATAPUMP, which is suitable for both unloading data to and loading data from binary files.

To create an external table we must specify column definitions and access parameters that describe the mapping between the external data and the Oracle column definitions. The external table definition neither describes data stored in the database nor how the data are stored in the external source. Part of the metadata for an external table is the access driver, which is responsible for transforming the data from the external source and mapping it to the external table column types. The metadata about the external table is stored in the Oracle data dictionary and makes it possible to access the data with standard SELECT statements. Writing data to an external table is only possible if it is created with the ORACLE_DATAPUMP access driver, and only within the CREATE TABLE statement. Once the external table is created and initially populated with data, no further modification of the data is possible.
Loading data from an external table refers to reading data from the external source (the external datafile) and loading them into a table in the database. Unloading data into an external table refers to reading data from a database table and inserting them into the external table.
Data in external tables can be queried directly with SELECT statements. We can perform select, join and sort operations on data from external tables, and these operations can be parallelized to improve performance. Data can be initially inserted into an external table only in a CREATE TABLE .. AS SELECT statement with the ORACLE_DATAPUMP access driver; thereafter no further DML operations are allowed on external tables. Synonyms and views can be created for external tables, but indexes cannot.
PRIVILEGES REQUIRED TO CREATE EXTERNAL TABLE
The access driver runs inside the database server and must have access to the operating system files:
• The server must have read access to all the datafiles with external table data that we want to select or load into database tables
• The server must have write access to the operating system directory in order to create the log file, bad file and discard file (with the ORACLE_LOADER access driver) and the dump file (with the ORACLE_DATAPUMP access driver)
A prerequisite for creating an external table is that a database user with the CREATE ANY DIRECTORY privilege must create a directory object as an alias for the directory on the server where the external table data will be stored. User SYS owns all directory objects, but he can grant other users the privilege to create directory objects. The user who creates a directory object is automatically granted READ and WRITE privileges on the directory and can grant these privileges to other database users that need to access or create files in this directory. The operating system directory to which the Oracle directory object refers must exist before the directory object can be created. In addition, the system administrator must set appropriate read/write privileges on this operating system directory so that Oracle Database processes can access it.
For example, to create an Oracle directory object EXTTABDIR as an alias for the operating system directory /oracle/exttabs, we use the following command:
CREATE DIRECTORY EXTTABDIR AS '/oracle/exttabs';
If a user JONES wants to load data from external files in the EXTTABDIR directory, we must grant him READ access to this directory with the command:
GRANT READ ON DIRECTORY EXTTABDIR TO JONES;
If a user JONES wants to unload data to external files (with the ORACLE_DATAPUMP driver) in the EXTTABDIR directory, we must grant him WRITE access to this directory with the command:
GRANT WRITE ON DIRECTORY EXTTABDIR TO JONES;
Now user JONES can create external tables, provided that the CREATE TABLE system privilege has been granted to him. External tables are created with the CREATE TABLE … ORGANIZATION EXTERNAL command; the detailed syntax is described for the ORACLE_LOADER and ORACLE_DATAPUMP drivers in the following sections.
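A minimal sketch of that grant (assuming it is issued by a suitably privileged administrative user):
GRANT CREATE TABLE TO JONES;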
CREATING EXTERNAL TABLES
A CREATE TABLE statement for an external table typically consists of the following parts:
CREATE TABLE schema.name
(column definitions)
ORGANIZATION EXTERNAL
(TYPE access_driver -- ORACLE_LOADER or ORACLE_DATAPUMP
DEFAULT DIRECTORY directory
ACCESS PARAMETERS (access parameter values)
LOCATION ('file1.dat','file2.dat', ... ,'filen.dat')
)
PARALLEL
REJECT LIMIT UNLIMITED
AS SELECT … FROM … WHERE … -- this can be specified only with ORACLE_DATAPUMP access driver
In the column definitions part we specify column names and datatypes, just as for any other Oracle relational table.
The ORGANIZATION EXTERNAL clause specifies that we are creating an external table.
With TYPE we specify which access driver will be used to access the data in the external table.
The DEFAULT DIRECTORY clause specifies the directory object for the operating system directory where the data for the external table are stored.
Each access driver has its own ACCESS PARAMETERS that define how to access the data in the external files.
In the LOCATION clause we specify the names of one or more files where the data are stored.
The PARALLEL clause enables parallel query of the data in the external table.
The REJECT LIMIT clause specifies how many conversion errors are tolerated when we query the external table; UNLIMITED means there is no limit.
With the AS SELECT clause, available only with the ORACLE_DATAPUMP access driver, data can be initially inserted into the external table.
When we create an external table we must observe the following restrictions:
• an external table cannot be a temporary table
• we cannot specify constraints on an external table
• an external table cannot have object type, varray or LONG columns
ALTERING EXTERNAL TABLES
With the ALTER TABLE command we can add, modify and drop columns in external tables. Besides that, ALTER TABLE can change the PARALLEL, DEFAULT DIRECTORY, ACCESS PARAMETERS, LOCATION, REJECT LIMIT and PROJECT COLUMN clauses.
The PROJECT COLUMN clause defines how the access driver validates rows of data selected from the external table. The default is PROJECT COLUMN REFERENCED, which means that the access driver processes only the columns explicitly specified in the select list. The other possibility is PROJECT COLUMN ALL, in which case the access driver processes all the columns in a row and rejects the row if any column value raises an error during data conversion.
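As an illustration only (the table name actors_ext is hypothetical), these attributes could be changed like this:
ALTER TABLE actors_ext PROJECT COLUMN ALL;          -- validate every column of every row
ALTER TABLE actors_ext REJECT LIMIT 100;            -- tolerate at most 100 rejected rows
ALTER TABLE actors_ext LOCATION ('actors_new.dat'); -- point the table at a new datafile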
The following restrictions apply when we alter an external table:
• we cannot add a LONG, LOB or object type column, nor can we change existing column datatypes to any of these types
• we cannot modify storage parameters
DROPPING EXTERNAL TABLES
Dropping an external table only removes its metadata from the data dictionary; it does not remove the external datafiles.
DATA DICTIONARY VIEWS FOR EXTERNAL TABLES
The data dictionary contains additional views with information specific to external tables (a sample query follows the list):
• DBA_ (ALL_ , USER_) EXTERNAL_TABLES – attributes specific to external tables
• DBA_ (ALL_ , USER_) EXTERNAL_LOCATIONS – list of data sources for external tables
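For example, the owner of external tables could inspect them with queries like these (a sketch against the USER_ views):
SELECT table_name, type_name, default_directory_name, reject_limit
FROM   user_external_tables;

SELECT table_name, directory_name, location
FROM   user_external_locations;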
COLLECTING STATISTICS FOR EXTERNAL TABLES
Statistics for external tables are not collected by the DBMS_STATS.GATHER_SCHEMA_STATS and GATHER_DATABASE_STATS procedures. With the GATHER_TABLE_STATS procedure, statistics for individual external tables can be collected, but sampling on external tables is not supported, so ESTIMATE_PERCENT must be set to NULL.
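A sketch of such a call (the schema and table names are illustrative):
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'JONES',
    tabname          => 'ACTORS',
    estimate_percent => NULL);  -- sampling is not supported for external tables
END;
/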
DATATYPE CONVERSION
When data is loaded or unloaded with external tables it is possible that the same column will have a different datatype in:
• The database
o source when data is unloaded into an external table
o destination when data is loaded from an external table
• The external table
o during unload data from the database is converted to match the column datatype in the external table
o during load data from an external table is converted to match the column datatype in the database
• The external datafile
o during unload datatypes of fields in the datafile exactly match datatypes of fields in the external table
o during load, datatypes of fields in the datafile are converted to the datatypes of fields in the external table; if there is a conversion error, the row containing this column is not loaded
If a conversion error occurs between the datafile and the external table, the row with the error is ignored. If a conversion error occurs between the external table and the column in the database, the whole operation terminates unsuccessfully.
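For example (all names are hypothetical), suppose year_of_birth is a CHAR(4) field in an external table actors but a NUMBER(4) column in a database table actors_db. The conversion between the external table and the database column happens during the INSERT ... SELECT, so a single value that cannot be converted terminates the whole statement:
-- hypothetical load; the external table column is CHAR(4), the database column NUMBER(4)
INSERT INTO actors_db (first_name, last_name, year_of_birth)
SELECT first_name, last_name, TO_NUMBER(year_of_birth)
FROM   actors;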
DUMP FILE FORMAT
Dump files that contain data for an external table are stored in an Oracle proprietary binary format. You cannot edit such a file with a text editor, but if you extract the readable strings from it with the UNIX strings command you can see an XML definition of the table name, character set, column names and datatypes, followed by the rows of data.

NEW FEATURES IN 10G
INTRODUCTION

The external table feature was first introduced in the 9i version of Oracle Server. In that version only the ORACLE_LOADER access driver was available; it enabled read-only access to data in external text files with standard SELECT statements. With this driver it was only possible to select data from such external tables and insert them into regular database tables.
DATA PUMP TECHNOLOGY
In Oracle 10g the external table feature is extended thanks to the new Data Pump mechanism that was first introduced in this version. Data Pump technology enables very fast movement of data and metadata between Oracle databases. It is the basis for the new database utilities expdp and impdp, which serve a similar purpose to the export and import utilities, but data and metadata movement with expdp and impdp is orders of magnitude faster, enabling movement of huge quantities of data in a very short time. Expdp and impdp are server-based database utilities that use jobs and the Data Pump API for data and metadata movement. Expdp creates a dump file in an Oracle proprietary binary format that only impdp can read. Expdp and impdp have similar command-line parameters to the export and import utilities for specifying data and metadata filters when we want to move only a subset of the data in the database.
Oracle Data Pump can use either the direct path or the external tables path to load or unload data and metadata in an Oracle database. Data Pump uses the external tables path in the following cases:
• Loading and unloading very large tables and partitions with parallel SQL
• Loading tables with global or domain indexes defined on them, including partitioned object tables
• Loading tables with active triggers or clustered tables
• Loading and unloading tables with encrypted columns
• Loading tables with fine-grained access control enabled for inserts
• Loading tables that are partitioned differently at load time and unload time
EXTERNAL TABLE ENHANCEMENTS
The external table feature also benefits from this new Data Pump technology, because a new ORACLE_DATAPUMP access driver based on the Data Pump API is introduced. With this new access driver the functionality of external tables is extended with the capability to unload data from database tables into an external table. It is now possible to select data from an arbitrary number of database tables using complex query conditions and write the result into an external table. The external table data are stored in a set of operating system files that Oracle creates in a proprietary binary format. This set of files can then be transferred to a different database server, possibly running a different operating system. Based on these transferred files an external table can be created in the new database, and data can be loaded from this external table into regular database tables. This feature is particularly useful for data warehouse applications because you can transform data in the source database, store it in an external table, move the external table files to the warehouse database and load them into database tables.
CONFIGURING EXTERNAL TABLES WITH ORACLE_LOADER ACCESS DRIVER
Record parsing for external tables created with the ORACLE_LOADER access driver is similar to SQL*Loader, but performance considerations can determine which of the two methods to use in a particular situation.
External tables give the best performance in the following situations:
• you want to transform the data during the load operation
• you want to use transparent parallel processing (no need to split the external data manually)



Table 1: Differences between external tables and SQL*Loader

SQL*Loader performance is better in the following situations:
• loading remote data
• data transformation and parallel processing are not required
• additional indexing of staging table is required
If a table is created with the ORACLE_LOADER access driver, we can query the data stored in the external text datafile and load them into a database table. When we create the external table we must define access parameters that describe how the access driver will access the external data. The access parameter definition contains information about the record and field formatting of the data stored in the external text datafile. The syntax of the access parameters is similar to that of a SQL*Loader control file, with the differences specified in Table 1.
With access parameters we can specify the following characteristics:
• comments
• record_format_info – information about the record, such as its format, the character set of the data, load exclusion rules
• field_definitions – describes fields in the datafile
• column_transforms – defines the mapping between columns in the external table and columns in the datafile
With the ORACLE_LOADER access driver the following record formats can be processed:
• FIXED length – all records have the same length, defined by the length parameter
• VARIABLE size – records have variable size, and the size parameter defines the number of bytes at the beginning of each record that contain the record length in bytes
• DELIMITED BY string – the string indicates the characters that identify the end of a record
An external table with fixed-length records can be created as in the following example (directory and datafile names are illustrative):
CREATE TABLE actors
(first_name VARCHAR2(7), last_name VARCHAR2(8), year_of_birth VARCHAR2(4))
ORGANIZATION EXTERNAL
(TYPE ORACLE_LOADER
 DEFAULT DIRECTORY exttabdir    -- illustrative directory and file names
 ACCESS PARAMETERS (RECORDS FIXED 20 FIELDS
  (first_name CHAR(7), last_name CHAR(8), year_of_birth CHAR(4)))
 LOCATION ('actors_fixed.dat'));

Dump file:
Clint  Eastwood1930
James  Stewart 1908
John   Wayne   1907
An external table with variable-length records can be created as in the following example (again with illustrative directory and datafile names):
CREATE TABLE actors
(first_name VARCHAR2(7), last_name VARCHAR2(8), year_of_birth VARCHAR2(4))
ORGANIZATION EXTERNAL
(TYPE ORACLE_LOADER
 DEFAULT DIRECTORY exttabdir    -- illustrative directory and file names
 ACCESS PARAMETERS (RECORDS VARIABLE 2 FIELDS TERMINATED BY ','
  (first_name CHAR(7), last_name CHAR(8), year_of_birth CHAR(4)))
 LOCATION ('actors_variable.dat'));

Dump file:
21Clint,Eastwood,1930,
20James,Stewart,1908,
17John,Wayne,1907,
An external table with delimited records can be created as in the following example (directory and datafile names are illustrative):
CREATE TABLE actors
(first_name VARCHAR2(7), last_name VARCHAR2(8), year_of_birth VARCHAR2(4))
ORGANIZATION EXTERNAL
(TYPE ORACLE_LOADER
 DEFAULT DIRECTORY exttabdir    -- illustrative directory and file names
 ACCESS PARAMETERS (RECORDS DELIMITED BY '' FIELDS TERMINATED BY ','
  (first_name CHAR(7), last_name CHAR(8), year_of_birth CHAR(4)))
 LOCATION ('actors_delimited.dat'));

Dump file:
Clint,Eastwood,1930James,Stewart,1908John,Wayne,1907
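Whichever record format is used, once the external table exists it can be queried like an ordinary table and used to load a database table, with transformations applied on the way; a sketch (the target table actor_dim is hypothetical):
-- the external table can be queried, sorted and joined like an ordinary table
SELECT last_name, year_of_birth FROM actors ORDER BY year_of_birth;

-- load into a database table, transforming the data during the load
INSERT INTO actor_dim (full_name, year_of_birth)
SELECT TRIM(first_name) || ' ' || TRIM(last_name), TO_NUMBER(year_of_birth)
FROM   actors;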

If the field_definitions clause is omitted, then:
• The fields are assumed to be delimited by ','
• The fields are assumed to be character type
• The maximum length of the field is assumed to be 255
• The order of the fields in the datafile is the order in which the fields were defined in the external table
• No blanks are trimmed from the field
With field_definitions we can specify the following field characteristics (they are combined in the sketch after this list):
• The delim_spec clause is used to identify how all fields are terminated in the record.
• The trim_spec clause specifies the type of whitespace trimming to be performed by default on all character fields.
• MISSING FIELD VALUES ARE NULL indicates that if there is not enough data in a record for all fields, then those fields with missing data values are set to NULL.
• REJECT ROWS WITH ALL NULL FIELDS indicates that a row will not be loaded into the external table if all referenced fields in the row are null.
• The field_list clause identifies the fields in the datafile and their datatypes.
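A sketch that combines several of these clauses, reusing the actors layout from the examples above (table, directory and file names are illustrative):
CREATE TABLE actors_csv
(first_name VARCHAR2(7), last_name VARCHAR2(8), year_of_birth VARCHAR2(4))
ORGANIZATION EXTERNAL
(TYPE ORACLE_LOADER
 DEFAULT DIRECTORY exttabdir
 ACCESS PARAMETERS
 (RECORDS DELIMITED BY NEWLINE
  FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LTRIM
  MISSING FIELD VALUES ARE NULL
  REJECT ROWS WITH ALL NULL FIELDS
  (first_name CHAR(7), last_name CHAR(8), year_of_birth CHAR(4)))
 LOCATION ('actors.csv'))
REJECT LIMIT UNLIMITED;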

CONFIGURING EXTERNAL TABLES WITH ORACLE_DATAPUMP ACCESS DRIVER
For external tables created with the ORACLE_DATAPUMP access driver you can specify the following access parameters (a short example follows the list):
• comments
• LOGFILE | NOLOGFILE – specifies the name of the logfile where messages are written when the dump files are accessed. With NOLOGFILE we specify that no logfile should be generated.
• VERSION – specifies the version of the database that will read the dump file, if it differs from the version of the database where the data were unloaded.
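For example, a logfile can be requested when the external table is created (the table, source query and file names here are illustrative; VERSION is omitted because both databases are assumed to be the same release):
CREATE TABLE sales_audit_xt
ORGANIZATION EXTERNAL
(TYPE ORACLE_DATAPUMP
 DEFAULT DIRECTORY exttabdir
 ACCESS PARAMETERS (LOGFILE 'sales_audit.log')
 LOCATION ('sales_audit.dmp')
)
AS SELECT * FROM sales;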
Data can be inserted into the external table only with the CREATE TABLE … AS SELECT statement, at table creation time. After that, no further DML operations on the data in the external table are allowed; Oracle raises an error, as in the following example:
DELETE FROM sales_results WHERE product_id=586243;
*
ERROR at line 1:
ORA-30657: operation not supported on external organized table

LOADING AND UNLOADING DATA
With the ORACLE_DATAPUMP driver, data can be unloaded from one database and loaded into another database. This can be achieved with the following steps:
1. Create an external table in the source database and populate it with the data that we want to unload.
SQL> CREATE TABLE sales_results
2 ORGANIZATION EXTERNAL
3 (
4 TYPE ORACLE_DATAPUMP
5 DEFAULT DIRECTORY pump_dir
6 LOCATION ('sales.dmp')
7 )
8 AS SELECT * FROM sales_figures;

Once the external table is created and populated with data, no further DML operations are allowed.

2. Transfer the sales.dmp dump file to the target database server.
3. In the target database, create an external table with sales.dmp as its dump file:
SQL> CREATE TABLE sales_results
2 (
3 product_id NUMBER(6),
4 price NUMBER(3),
5 quantity_sold NUMBER(8)
6 )
7 ORGANIZATION EXTERNAL
8 (
9 TYPE ORACLE_DATAPUMP
10 DEFAULT DIRECTORY dumpdir
11 LOCATION ('sales.dmp')
12 );
Table created.
4. In the target database, data from the external table can only be selected and inserted into other database tables:
INSERT INTO sales_summary SELECT * FROM sales_results;
COMBINING DUMP FILES
Dump files written with data for one external table can be used as the data source for another external table. We can unload data from different production databases into separate dump files, use all these files to create an external table in the data warehouse database, and load the data into database tables. Using this feature we can unload data from multiple data sources and load them into one target database.
To use multiple dump files in the LOCATION clause of an external table, the following must be true:
• metadata in all dump files must be exactly the same
• character set, time zone, schema name, table name and column names must all match
• columns must be defined in the same order with exactly the same datatypes
For example, suppose we want to unload data from the oe.inventories table in two source databases and load the data into one target database. The table has the following definition in both the first and the second source database:
SQL> DESCRIBE inventories_xt
Name Null? Type
----------------------------------------- -------- ----------------------------
PRODUCT_ID NOT NULL NUMBER(6)
WAREHOUSE_ID NOT NULL NUMBER(3)
QUANTITY_ON_HAND NOT NULL NUMBER(8)
This can be done as follows:
1. In the first source database we create an external table:
SQL> CREATE TABLE all_inventories
2 ORGANIZATION EXTERNAL
3 (
4 TYPE ORACLE_DATAPUMP
5 DEFAULT DIRECTORY dir1
6 LOCATION ('all_inv1.dmp')
7 )
8 AS SELECT * FROM oe.inventories WHERE warehouse_id =1;
Table created.

2. In the second source database we create an external table:
SQL> CREATE TABLE all_inventories
2 ORGANIZATION EXTERNAL
3 (
4 TYPE ORACLE_DATAPUMP
5 DEFAULT DIRECTORY dir2
6 LOCATION ('all_inv2.dmp')
7 )
8 AS SELECT * FROM oe.inventories WHERE warehouse_id = 2;
Table created.
3. Transfer the dump files all_inv1.dmp and all_inv2.dmp to the target database and create an external table there:
SQL> CREATE TABLE ext_inventories
2 (
3 PRODUCT_ID NUMBER(6),
4 WAREHOUSE_ID NUMBER(3),
5 QUANTITY_ON_HAND NUMBER(8)
6 )
7 ORGANIZATION EXTERNAL
8 (
9 TYPE ORACLE_DATAPUMP
10 DEFAULT DIRECTORY wh_dir
11 LOCATION ('all_inv1.dmp','all_inv2.dmp')
12 );
Table created.
4. Now the table ext_inventories contains all the data selected from the oe.inventories tables in the first and the second source databases. Data can only be selected from this table and inserted into target database tables. This is just a simple example, but data can be transformed both during unload (in the source databases) and during load (in the target database), making this feature particularly useful for data warehouse applications.
CONCLUSIONS
External tables are a useful feature for loading and unloading data stored in external datafiles. In version 9i it was only possible to load data from external text datafiles into database tables using the ORACLE_LOADER access driver. This feature is substantially enhanced in version 10g with the ability to unload data from database tables and write them to external datafiles. The new functionality is enabled by the introduction of the new Data Pump technology, so that for external table creation we can use the new ORACLE_DATAPUMP access driver. External tables are particularly well suited to data warehouse applications. Using them we can transform and unload data from multiple tables in multiple production databases and store them in external datafiles. The data are stored in the external file in an Oracle proprietary binary format that is platform independent, which means that an external datafile created on one platform can be used as a data source on a different platform with a different endianness and character set. When we transfer external datafiles from multiple production databases to a data warehouse target database, all these files can be used as the data source for one external table. We can combine multiple external datafiles for one new external table provided that the table metadata definition is the same in all the datafiles. When we create the external table in the warehouse database we can select data from it, transform them and insert them into database tables.
BIBLIOGRAPHY
1. Oracle Database Administrator’s Guide, 10g Release 1 (10.1), Part No. B10739-01
2. Oracle Database Concepts, 10g Release 1 (10.1), Part No. B10743-01
3. Oracle Database Data Warehousing Guide, 10g Release 1 (10.1), Part No. B10736-01
4. Oracle Database New Features Guide, 10g Release 1 (10.1), Part No. B10750-01
5. Oracle Database SQL Reference, 10g Release 1 (10.1), Part No. B10759-01
6. Oracle Database Utilities, 10g Release 1 (10.1), Part No. B10825-01
