Tuesday, September 15, 2015

SQL LOADER - SQL Engine to load CSV files to database tables

SQL LOADER

SQL Loader loads data from external files into tables of an Oracle database. It has a powerful data parsing engine that puts little limitation on the format of the data in the data file.


A typical SQL Loader session takes as input a control file, which controls the behavior of SQL Loader, and one or more data files. The output of SQL Loader is an Oracle database (where the data is loaded), a log file, a bad file, and potentially, a discard file. An example of the flow of a SQL Loader session is shown


The above figure shows SQL Loader receiving input data files and a SQL Loader control file as input.

SQL Loader then outputs a log file, bad files, and discards files. Also, the figure shows that the database into which SQL Loader loaded the input data now contains tables and indexes.

Options in SQL Loader while loading the data.


(a) INSERT: Specifies that you are loading into an empty table. SQL Loader will abort the load if the table contains data to start with. This is the default.


(b) APPEND: If we want to load the data into a table which is already containing some rows.


(c) REPLACE: Specifies that, we want to replace the data in the table before loading. Will 'DELETE' all the existing records and replace them with new.


(d) TRUNCATE: This is same as 'REPLACE', but SQL Loader will use the 'TRUNCATE' command instead of 'DELETE' command.

Valid Keywords:

1.             userid --  ORACLE username/password          
2.            control -- control file name                 
3.            log -- log file name                     
4.            bad -- bad file name                     
5.            data -- data file name                    
6.            discard -- discard file name                 
7.            discardmax -- number of discards to allow          (Default all)
8.            skip -- number of logical records to skip    (Default 0)
9.            load -- number of logical records to load    (Default all)
10.          errors -- number of errors to allow            (Default 50)
11.          rows -- number of rows in conventional path bind array or between direct path data saves
               (Default: Conventional path 64, Direct path all)
12.          bindsize -- size of conventional path bind array in bytes  (Default 256000)
13.          silent -- suppress messages during run (header,feedback,errors,discards,partitions)
14.          direct -- use direct path                      (Default FALSE)
15.          parfile -- parameter file: name of file that contains parameter specifications
16.          parallel -- do parallel load                     (Default FALSE)
17.          file -- file to allocate extents from     
18.          skip_unusable_indexes -- disallow/allow unusable indexes or index partitions
(Default FALSE)
19.          skip_index_maintenance -- do not maintain indexes, mark affected indexes as unusable             (Default FALSE)
20.         commit_discontinued -- commit loaded rows when load is discontinued  (Default FALSE)
21.          readsize -- size of read buffer (Default 1048576)
22.         external_table -- use external table for load; NOT_USED, GENERATE_ONLY, EXECUTE (Default NOT_USED)
23.         columnarrayrows -- number of rows for direct path column array  (Default 5000)
24.         streamsize -- size of direct path stream buffer in bytes  (Default 256000)
25.         multithreading -- use multithreading in direct path 
26.         resumable -- enable or disable resumable for current session  (Default FALSE)
27.         resumable_name -- text string to help identify resumable statement
28.         resumable_timeout -- wait time (in seconds) for RESUMABLE  (Default 7200)
29.         date_cache -- size (in entries) of date conversion cache  (Default 1000)


How to use SQL Loader in Windows

1.  Create a table where data needs to be loaded. In the below example, I have created Emp table in HR schema.

create table HR.Emp(emp_id integer,fname varchar(15),lname varchar(15))

2.  Below is the snapshot of the data file(filename – Book1.csv) which I am using in the below example




3.  Create a Control File (.ctl ) as below.  I have saved the file as controlfile.ctl
----------------------------------------------
OPTIONS
   ( skip=1                                               
   , rows=10000                                           
   , errors=10                                            
-- , load=5                                               
   , direct=false                                         
   , parallel=false                                       
   )
----------------------------------------------
LOAD DATA
   INFILE        'Book1.csv'                         
   BADFILE       'LoadResults_BAD.log'                    
   DISCARDFILE   'LoadResults_DISCARDED.log'             
   APPEND  
   INTO TABLE HR.Emp
----------------------------------------------   
Fields Terminated    by ","                               
Optionally Enclosed  by '"'                               
Trailing Nullcols                                        
----------------------------------------------
( emp_id
   ,fname                                                  
   ,lname
  )
----------------------------------------------
4.  Create a bat file so that we can execute SQLLDR command, the below command can also be run in the command prompt but I prefer to have it done through the batch file. Created goload.bat file with the below commands in it.

@echo off
echo.
sqlldr  'system/admin@localhost'  control='ControlFile.ctl'   log='LoadResults.log'
pause


One More Scenario where I need to load the Date field from the file. Below is the process to achieve that. 

1.  Create a table where data needs to be loaded. In the below example, I have created Emp table in HR schema.

create table HR.Emp(emp_id integer,fname varchar(15),lname varchar(15),hire_date Date)

2.  Below is the snapshot of the data file with Hire date (filename – Book1.csv) which I am using in the below example



3.  Create a Control File (.ctl ) as below.  I have saved the file as controlfile.ctl

----------------------------------------------
OPTIONS
   ( skip=1                                               
   , rows=10000                                           
   , errors=10                                            
-- , load=5                                               
   , direct=false                                         
   , parallel=false                                       
   )
----------------------------------------------
LOAD DATA
   INFILE        'Book1.csv'                         
   BADFILE       'LoadResults_BAD.log'                    
   DISCARDFILE   'LoadResults_DISCARDED.log'             
   APPEND  
   INTO TABLE HR.Emp
----------------------------------------------   
Fields Terminated    by ","                               
Optionally Enclosed  by '"'                               
Trailing Nullcols                                        
----------------------------------------------
( emp_id
   ,fname                                                  
   ,lname
  )
----------------------------------------------

4.  STEP 4 will remain the same as above example.

OUTPUT
1.  The value should be in the tables.
2.  Below is the data in the logResults.log file.

SQL*Loader: Release 11.2.0.2.0 - Production on Wed Sep 16 19:11:59 2015

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

Control File:   ControlFile.ctl
Data File:      Book1.csv
  Bad File:     LoadResults_BAD.log
  Discard File: LoadResults_DISCARDED.log
 (Allow all discards)

Number to load: ALL
Number to skip: 1
Errors allowed: 10
Bind array:     10000 rows, maximum of 256000 bytes
Continuation:    none specified
Path used:      Conventional

Table HR.EMP, loaded from every logical record.
Insert option in effect for this table: APPEND
TRAILING NULLCOLS option in effect

   Column Name                  Position   Len  Term Encl Datatype
------------------------------ ---------- ----- ---- ---- ---------------------
EMP_ID                              FIRST     *   ,  O(") CHARACTER           
FNAME                                NEXT     *   ,  O(") CHARACTER           
LNAME                                NEXT     *   ,  O(") CHARACTER            
HIRE_DATE                            NEXT     *   ,  O(") DATE MM/DD/YYYY     

value used for ROWS parameter changed from 10000 to 248

Table HR.EMP:
  35 Rows successfully loaded.
  0 Rows not loaded due to data errors.
  0 Rows not loaded because all WHEN clauses were failed.
  0 Rows not loaded because all fields were null.


Space allocated for bind array:                 255936 bytes(248 rows)
Read   buffer bytes: 1048576

Total logical records skipped:          1
Total logical records read:            35
Total logical records rejected:         0
Total logical records discarded:        0

Run began on Wed Sep 16 19:11:59 2015
Run ended on Wed Sep 16 19:12:02 2015

Elapsed time was:     00:00:02.55
CPU time was:         00:00:00.12