Oracle 12c – Load and Unload Data with Data Pump

The Oracle Data Pump utility provides high-speed movement of data and metadata between databases. It replaces the original export and import utilities (exp and imp) from previous versions.

 

Three parts of Oracle Data Pump:

  1. EXPDP and IMPDP – command-line utilities that use the DBMS_DATAPUMP package to execute export and import operations.
  2. DBMS_DATAPUMP – package used to move all or part of a database from one database to another (a minimal PL/SQL sketch follows this list).
  3. DBMS_METADATA – facility for extracting, manipulating, and re-creating dictionary metadata.
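
As a rough illustration of the DBMS_DATAPUMP API, the anonymous PL/SQL block below defines and runs a schema-mode export job. The directory object DPUMP_DIR, the HR schema, and the job and dump file names are placeholder values for this sketch, not part of the original notes.

    -- Minimal sketch of a schema-mode export through DBMS_DATAPUMP.
    -- DPUMP_DIR, HR, and the job/file names are example values only.
    DECLARE
      h      NUMBER;
      state  VARCHAR2(30);
    BEGIN
      h := DBMS_DATAPUMP.OPEN(operation => 'EXPORT', job_mode => 'SCHEMA', job_name => 'HR_EXP_JOB');
      DBMS_DATAPUMP.ADD_FILE(handle => h, filename => 'hr_exp.dmp', directory => 'DPUMP_DIR');
      DBMS_DATAPUMP.METADATA_FILTER(handle => h, name => 'SCHEMA_EXPR', value => q'[IN ('HR')]');
      DBMS_DATAPUMP.START_JOB(h);
      DBMS_DATAPUMP.WAIT_FOR_JOB(handle => h, job_state => state);  -- blocks until the job completes
      DBMS_OUTPUT.PUT_LINE('Job finished with state: ' || state);
    END;
    /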

 

MASTER TABLE – used by Data Pump to track the job while it is in progress. It is implemented as a table within the database of the user running the job, so that user must have the CREATE TABLE privilege. The master table has the same name as the job that created it and cannot have the same name as an existing table or view. The data in the master table can be used to restart a stopped job.

 

MASTER PROCESS – this process is created for each export/import job. It controls the entire job (communication with clients, creating and running worker processes, and logging).

 

WORKER PROCESSES – created by the master process to perform the actual processing. Multiple worker processes can run in parallel.

 

 

Monitor a Data Pump Job

You can use the export/import utilities to connect to a running job. In logging mode the utility displays real-time progress. In interactive-command mode you can request the job status. Optionally, a log file can be written during the job; the log file will contain progress, errors, and completion status. You can also see status and other information through the Data Pump views:

DBA_DATAPUMP_JOBS – shows all active Data Pump jobs and master tables that are not associated with an active job.

DBA_DATAPUMP_SESSIONS – identifies the user sessions attached to Data Pump jobs. Helpful in determining why a Data Pump job has stopped.

V$SESSION_LONGOPS – Data Pump jobs that transfer data maintain an entry here to show progress. It gives the estimated transfer size and the amount transferred so far.
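
For example, the first two views can be queried directly; the column lists below are kept to commonly documented columns, and the output depends on whatever jobs happen to be running.

    -- Data Pump jobs and their current state
    SELECT owner_name, job_name, operation, job_mode, state
      FROM dba_datapump_jobs;

    -- Sessions currently attached to Data Pump jobs
    SELECT owner_name, job_name, session_type
      FROM dba_datapump_sessions;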

 

The V$SESSION_LONGOPS columns that are important to Data Pump jobs are:

USERNAME – job owner

OPNAME – job name

TARGET_DESC – job operation

SOFAR – amount of data transferred so far

TOTALWORK – estimated total amount of data to transfer

UNITS – unit of measurement (MB, GB, TB, etc.)

MESSAGE – formatted message of the form ‘job_name: operation_name : nnn out of mmm MB done’
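
A simple progress query over these columns might look like the following sketch; restricting OPNAME to names found in DBA_DATAPUMP_JOBS is one common way to pick out the Data Pump entries.

    SELECT username, opname, target_desc,
           sofar, totalwork, units, message
      FROM v$session_longops
     WHERE opname IN (SELECT job_name FROM dba_datapump_jobs)
       AND sofar <> totalwork;  -- transfers still in progress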

 

EXPDP – moves data out of the database and has multiple interfaces. The utility can be run from the command line, with a parameter file, or in interactive-command mode.

Command-line – Enables you to specify most of the export parameters directly on the command line.

Parameter File – Allows you to specify command-line parameters in a parameter file (PARFILE).

Interactive-Command – Displays an export prompt at which you enter various commands.
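
For instance, the same schema export could be requested either directly on the command line or through a parameter file. The user, directory object, and file names below are example values only.

    # Command-line interface
    expdp hr DIRECTORY=dpump_dir DUMPFILE=hr.dmp LOGFILE=hr_exp.log SCHEMAS=hr

    # Parameter-file interface
    expdp hr PARFILE=hr_exp.par

    # Contents of hr_exp.par
    DIRECTORY=dpump_dir
    DUMPFILE=hr.dmp
    LOGFILE=hr_exp.log
    SCHEMAS=hr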

 

Data Pump jobs manage the following types of files:

Dump File – Contains data and metadata being moved.

Log File – Records log of operation.

SQL files – Records output of SQLFILE operation.

Data files – Used with transportable imports; the data files are specified with the TRANSPORT_DATAFILES parameter.
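
Dump, log, and SQL files are written through a directory object that the exporting user can read and write. A typical setup, with an example path and user, looks like this:

    -- Directory object and grant; the path and user are placeholders
    CREATE OR REPLACE DIRECTORY dpump_dir AS '/u01/app/oracle/dpdump';
    GRANT READ, WRITE ON DIRECTORY dpump_dir TO hr;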

 

Different EXPDP modes:

Full – The entire database is unloaded. Requires the DATAPUMP_EXP_FULL_DATABASE role. FULL parameter.

Schema – The default export mode. If you have the DATAPUMP_EXP_FULL_DATABASE role you can specify a list of schemas; otherwise only your own schema is exported. SCHEMAS parameter.

Table – Only specified tables, partitions, and their dependent objects are unloaded. TABLES parameter.

Tablespace – All objects contained in the specified tablespaces are unloaded. If you have the full export privilege, all objects are unloaded; otherwise only your own objects. TABLESPACES parameter.

Transportable Tablespace – Only the metadata for the tables and their associated objects is unloaded; the data itself is moved in a separate operation. TRANSPORT_TABLESPACES parameter.
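
One example invocation per mode, with placeholder users, schemas, tablespaces, and file names:

    # Full database mode
    expdp system FULL=YES DIRECTORY=dpump_dir DUMPFILE=full%U.dmp

    # Schema mode (the default)
    expdp system SCHEMAS=hr,oe DIRECTORY=dpump_dir DUMPFILE=schemas.dmp

    # Table mode
    expdp hr TABLES=employees,departments DIRECTORY=dpump_dir DUMPFILE=tabs.dmp

    # Tablespace mode
    expdp system TABLESPACES=users DIRECTORY=dpump_dir DUMPFILE=ts.dmp

    # Transportable tablespace mode (metadata only; data files are copied separately)
    expdp system TRANSPORT_TABLESPACES=example DIRECTORY=dpump_dir DUMPFILE=tts.dmp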

 

 

EXPDP Parameters

ATTACH – Attaches the client session to an existing export job and automatically places you in the interactive-command interface.

CONTENT – Enables you to filter what Export unloads: data only, metadata only, or both.

DIRECTORY – Specifies the default location to which Export can write the dump file set and the log file.

DUMPFILE – Specifies the names, and optionally, the directory objects of dump files for an export job.

ESTIMATE – Specifies the method that Export will use to estimate how much disk space each table in the export job will consume (in bytes).

ESTIMATE_ONLY – Instructs Export to estimate the space that a job would consume, without actually performing the export operation.

EXCLUDE – Enables you to filter the metadata that is exported by specifying objects and object types to be excluded from the export operation.

FILESIZE – Specifies the maximum size of each dump file.

FULL – Specifies that you want to perform a full database mode export.

INCLUDE – Enables you to filter the metadata that is exported by specifying objects and object types for the current export mode.

JOB_NAME – Used to identify the export job in subsequent actions.

LOGFILE – Specifies the name, and optionally, a directory, for the log file of the export job.

PARFILE – Specifies the name of an export parameter file.

QUERY – Allows you to specify a query clause that is used to filter the data that gets exported.

SCHEMAS – Specifies that you want to perform a schema-mode export.

TABLES – Specifies that you want to perform a table-mode export.

TABLESPACES – Specifies a list of tablespace names to be exported in tablespace mode.

TRANSPORT_TABLESPACES – Specifies that you want to perform an export in transportable-tablespace mode. This parameter is used to specify a list of tablespace names for which object metadata will be exported from the source database into the target database.

TRANSPORTABLE – When set to ALWAYS, it indicates the transportable option should be used during the export. The default is NEVER.
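
A parameter file combining several of these might look like the sketch below; the schema, file names, WHERE clause, and sizes are all illustrative.

    # hr_filtered.par -- example values only
    DIRECTORY=dpump_dir
    DUMPFILE=hr_%U.dmp
    FILESIZE=2G
    LOGFILE=hr_filtered.log
    SCHEMAS=hr
    EXCLUDE=STATISTICS
    QUERY=hr.employees:"WHERE department_id = 10"
    JOB_NAME=hr_filtered_exp

    # Space estimate only, with no dump file written:
    # expdp hr ESTIMATE_ONLY=YES ESTIMATE=STATISTICS SCHEMAS=hr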

 

IMPDP Parameters

REUSE_DATAFILES – Specifies whether the import job should reuse existing data files for tablespace creation.

SQLFILE – Specifies a file into which all of the SQL DDL that Import would have executed, based on other parameters, is written.

STATUS – Specifies the frequency at which the job status display is updated.

TABLE_EXISTS_ACTION – Tells Import what to do if the table it is trying to create already exists.

REMAP_DATAFILE – Changes the name of the source data file to the target data file name in all SQL statements where the source data file is referenced.

REMAP_SCHEMA – Loads all objects from the source schema into a target schema.

REMAP_TABLE – Allows you to rename tables during an import operation.

REMAP_TABLESPACE – Remaps all objects selected for import with persistent data in the source tablespace to be created in the target tablespace.
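
For example, a dump of the HR schema might be loaded into a different schema and tablespace, or simply turned into a DDL script. All names below are placeholders.

    # Remap schema and tablespace, replacing tables that already exist
    impdp system DIRECTORY=dpump_dir DUMPFILE=hr.dmp REMAP_SCHEMA=hr:hr_test REMAP_TABLESPACE=users:hr_data TABLE_EXISTS_ACTION=REPLACE

    # Write the DDL that would be executed to a script instead of running it
    impdp system DIRECTORY=dpump_dir DUMPFILE=hr.dmp SQLFILE=hr_ddl.sql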

 

The following commands are applicable when using Interactive mode:

ADD_FILE – Add additional dump files.

CONTINUE_CLIENT – Exit interactive mode and enter logging mode.

EXIT_CLIENT – Stop the import or export client session, but leave the job running.

KILL_JOB – Detach all currently attached client sessions and terminate the current job.

PARALLEL – Increase or decrease the number of active worker processes for the current job.

START_JOB – Restart a stopped job to which you are attached.

STATUS – Display detailed status for the current job and/or set the status interval.

STOP_JOB – Stop the current job for later restart.
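
An interactive session is usually entered by attaching to a running job; the job name below is a placeholder.

    expdp hr ATTACH=hr_filtered_exp

    Export> STATUS
    Export> PARALLEL=4
    Export> STOP_JOB=IMMEDIATE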

 

Network-based Data Pump operations

The NETWORK_LINK parameter is used with impdp as part of an import operation. Data is moved directly using SQL: a SELECT statement pulls the data from the remote database over the link, and an INSERT inserts it into the local database. No dump files are involved. Current-user database links are not supported; only public, fixed-user, or connected-user database links can be used.
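
A sketch of a network import, assuming a fixed-user database link named SOURCE_DB pointing at the source instance and an HR schema that exists there; the TNS alias, password, and file names are placeholders.

    -- Fixed-user link to the source database (SQL, run on the target)
    CREATE DATABASE LINK source_db
      CONNECT TO hr IDENTIFIED BY hr_password
      USING 'source_tns_alias';

    # Pull the HR schema straight across the link; no dump file is created,
    # but a directory object is still needed for the log file.
    impdp hr DIRECTORY=dpump_dir NETWORK_LINK=source_db SCHEMAS=hr LOGFILE=hr_net_imp.log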

 

Restrictions of Network Link:

  1. Tables with LONG or LONG RAW columns that reside in administrative tablespaces are not supported.
  2. Auditing cannot be enabled on administrative tablespaces when performing a full transportable export.
  3. The source and target databases cannot be more than two versions apart.

 

SQL*Loader

SQL*Loader is very good for loading data from non-Oracle sources; it is very flexible and can read a large range of flat files. SQL*Loader can perform the following functions:

  1. Load data across a network if the files are on a different machine.
  2. Load data from multiple files in the same session.
  3. Load data into multiple tables in the same session.
  4. Specify the character set of the data.
  5. Selectively load data based on field values.
  6. Manipulate the data before loading it.
  7. Generate unique sequential key values.
  8. Use the operating system's file system to access the data files.
  9. Load data from disk, tape, or named pipe.
  10. Load arbitrarily complex object-relational data.
  11. Use secondary data files for loading LOBs and collections.
  12. Use conventional, direct path, or external table loads.

 

SQL*Loader can be used with or without a control file.

SQL*Loader makes use of the following files:

Control – Defines the data format and controls the loader's behavior.

Data – One or more data files containing the data to be loaded.

Log – Logs performance and errors.

Bad – Contains all records not loaded because of errors.

Discard – Contains all records bypassed (not selected for loading) because of criteria in the control file.

 

The SQLLDR executable is used to invoke SQL*Loader. You control its actions with parameters, supplied either in a PARFILE or as command-line options. If a parameter appears both in the PARFILE and on the command line, the command-line value is used.
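
A typical invocation names the control file and, optionally, overrides for the other files; all of the file names below are examples.

    sqlldr hr CONTROL=emp.ctl DATA=emp.dat LOG=emp.log BAD=emp.bad DISCARD=emp.dsc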

 

A control file has three sections:

  1. Session-wide information: global options, the input data file location, and the data to be loaded.
  2. One or more INTO TABLE blocks.
  3. An optional third section that can contain the input data itself.
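
A small control file showing all three sections might look like the following sketch; the EMPLOYEES_STAGE table, its columns, and the inline rows are assumed for illustration.

    -- Section 1: session-wide options and input location (INFILE * means the data is inline)
    LOAD DATA
    INFILE *
    BADFILE 'emp.bad'
    APPEND
    -- Section 2: an INTO TABLE block describing the target table and field format
    INTO TABLE employees_stage
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    (employee_id, first_name, last_name, hire_date DATE "YYYY-MM-DD")
    -- Section 3: optional inline data, introduced by BEGINDATA
    BEGINDATA
    100,Steven,King,2003-06-17
    101,Neena,Kochhar,2005-09-21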

The two methods SQL*Loader can use are conventional path and direct path loads.

Conventional Path – SQL*Loader builds INSERT statements for the data to be loaded and passes them to the SQL parser.

Direct Path – SQL*Loader bypasses the SQL engine and parser and loads data straight into the target table.

 

Direct path is much faster than conventional path, but conventional path is more flexible.
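
Switching between the two paths is a command-line choice; for example, reusing the hypothetical control file from the SQL*Loader sketch above:

    # Conventional path (the default)
    sqlldr hr CONTROL=emp.ctl

    # Direct path load
    sqlldr hr CONTROL=emp.ctl DIRECT=TRUE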

 

Direct Path restrictions:

  1. Cannot run concurrently with other transactions against the target table.
  2. Triggers on the table do not fire.
  3. Data is written above the high-water mark, even if space is available below it.
  4. Clustered tables are not supported.
  5. Foreign key constraints are disabled during the load.

 

Larry Catt

OCP