Apache Sqoop(TM) is a tool designed to efficiently transfer bulk data, in both directions, between Apache Hadoop and structured datastores such as relational databases.
Sqoop metastore
Running sqoop-metastore launches a shared HSQLDB database instance on
the current machine. Clients can connect to this metastore and create jobs
which can be shared between users for execution. The location of the metastore's
files on disk is controlled by the sqoop.metastore.server.location property
in conf/sqoop-site.xml.

The sqoop job tool accepts the following arguments:

--create <job-id>          Define a new saved job with the specified job-id (job name).
                           A second Sqoop command line, separated by a --, should be
                           specified; this defines the saved job.
--delete <job-id>          Delete a saved job.
--exec <job-id>            Given a job defined with --create, run the saved job.
--show <job-id>            Show the parameters for a saved job.
--list                     List all saved jobs.
--meta-connect <jdbc-uri>  Specify the JDBC connect string used to connect to the metastore.
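With the metastore running, saved jobs can be managed from any client. A minimal sketch, assuming the shared metastore used in the examples below is listening on localhost:16000 and that a job named imp-ora-hr has already been created:

$ sqoop job --meta-connect jdbc:hsqldb:hsql://localhost:16000/sqoop --list
$ sqoop job --meta-connect jdbc:hsqldb:hsql://localhost:16000/sqoop --show imp-ora-hr
$ sqoop job --meta-connect jdbc:hsqldb:hsql://localhost:16000/sqoop --delete imp-ora-hr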
Example
This is an example of job execution:
$ sqoop job --meta-connect <metastore-url> --exec <jobName>
Import all tables for a specific schema
The import-all-tables tool imports a set of tables from an RDBMS to HDFS. Data from each table is stored in a separate directory in HDFS.
sqoop job --create imp-ora-hr --meta-connect jdbc:hsqldb:hsql://localhost:16000/sqoop -- import-all-tables --connect jdbc:oracle:thin:@<server>/XE --username <username> --password <password> --hive-import -m 1 --verbose
Argument                     Description
--connect <jdbc-uri>         Specify JDBC connect string.
--username <username>        Set authentication username.
--password <password>        Set authentication password.
--hive-import                Import tables into Hive (uses Hive's default delimiters if none are set).
--verbose                    Print more information while working.
--num-mappers <n> or -m <n>  Control the number of map tasks, which is the degree of parallelism used.
Import full table
sqoop job --create imp-ora-hr-jobs --meta-connect jdbc:hsqldb:hsql://localhost:16000/sqoop -- import --connect jdbc:oracle:thin:@10.33.6.58:1521/XE --table HR.JOBS --username SYSTEM --password SYS --hive-import --hive-table HR.JOBS -m 1 --verbose
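Once saved in the metastore, this full-table import can be run (and re-run) from any client, following the execution pattern shown earlier:

$ sqoop job --meta-connect jdbc:hsqldb:hsql://localhost:16000/sqoop --exec imp-ora-hr-jobs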
Import table with incompatible columns
Columns whose types have no direct Hive mapping (for example, Oracle RAW columns such as BSLN_GUID and TARGET_UID below) can be cast explicitly with --map-column-hive:
sqoop job --create imp-ora-DBSNMP-BASELINES --meta-connect jdbc:hsqldb:hsql://localhost:16000/sqoop -- import --connect jdbc:oracle:thin:@10.10.10.10:1521/XE --table DBSNMP.MGMT_BSLN_BASELINES --columns BSLN_GUID,TARGET_UID,NAME,TYPE,STATUS --username SYSTEM --password SYS --hive-import --hive-table DBSNMP.BASELINES -m 1 --map-column-hive TARGET_UID=String,BSLN_GUID=String
Import by query
A free-form query import must include the $CONDITIONS token in its WHERE clause and requires an explicit --target-dir:
sqoop job --create imp-ora-query --meta-connect jdbc:hsqldb:hsql://localhost:16000/sqoop -- import --connect jdbc:oracle:thin:@10.33.6.58:1521/XE --query "SELECT E.EMAIL, D.DEPARTMENT_NAME, L.CITY, L.STREET_ADDRESS FROM HR.DEPARTMENTS D, HR.LOCATIONS L, HR.EMPLOYEES E WHERE D.LOCATION_ID = L.LOCATION_ID AND D.MANAGER_ID = E.MANAGER_ID AND \$CONDITIONS" --username SYSTEM --password SYS --hive-import --hive-table MANAGERS -m 1 --target-dir /bigdata/managers
Export with procedure
sqoop job --create export-by-procedure --meta-connect jdbc:hsqldb:hsql://localhost:16000/sqoop -- export --connect jdbc:oracle:thin:@10.33.6.58:1521/XE --username GOSALESDW --password GOSALESDWPW --call FILL_BURST_TABLE --export-dir /apps/hive/warehouse/burst_table --input-fields-terminated-by '\001' --input-lines-terminated-by '\012' --input-null-string '\\N' --input-null-non-string '\\N'
Procedure:
CREATE OR REPLACE PROCEDURE FILL_BURST_TABLE (
    RECIPIENTS_IN   IN VARCHAR2,
    COUNTRY_CODE_IN IN NUMBER
) AS
BEGIN
    INSERT INTO BURST_TABLE (COUNTRY_CODE, RECIPIENTS)
    VALUES (COUNTRY_CODE_IN, RECIPIENTS_IN);
END FILL_BURST_TABLE;
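The saved export job can then be invoked like any other saved job; with --call, Sqoop invokes FILL_BURST_TABLE once per input record, passing the record's fields as the procedure's arguments:

$ sqoop job --meta-connect jdbc:hsqldb:hsql://localhost:16000/sqoop --exec export-by-procedure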