Tuesday, April 14, 2015

SQOOP common flow examples




Apache Sqoop(TM) is a tool designed for efficiently transferring bulk data, in both directions, between Apache Hadoop and structured datastores such as relational databases.

Sqoop metastore

        Running sqoop-metastore launches a shared HSQLDB database instance on the current machine. Clients can connect to this metastore and create jobs that can be shared between users for execution. The location of the metastore's files on disk is controlled by the sqoop.metastore.server.location property in conf/sqoop-site.xml.
Argument                     Description
--create <job-id>            Define a new saved job with the specified job-id (job name).
                             A second Sqoop command line, separated by --, should be
                             specified; this defines the saved job.
--delete <job-id>            Delete a saved job.
--exec <job-id>              Given a job defined with --create, run the saved job.
--show <job-id>              Show the parameters for a saved job.
--list                       List all saved jobs.
--meta-connect <jdbc-uri>    Specify the JDBC connect string used to connect to the metastore.


Example

This is an example of executing a saved job:

$ sqoop job --meta-connect <metastore-url> --exec <jobName>
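The other saved-job operations from the table above follow the same pattern. A hedged sketch (the metastore URL and the job name imp-ora-hr match the examples later in this post; substitute your own):

```shell
# List every job stored in the shared metastore
sqoop job --meta-connect jdbc:hsqldb:hsql://localhost:16000/sqoop --list

# Inspect the saved parameters of one job before running it
sqoop job --meta-connect jdbc:hsqldb:hsql://localhost:16000/sqoop --show imp-ora-hr

# Remove a saved job that is no longer needed
sqoop job --meta-connect jdbc:hsqldb:hsql://localhost:16000/sqoop --delete imp-ora-hr
```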
 

Import all tables for specific schema

The import-all-tables tool imports a set of tables from an RDBMS to HDFS. Data from each table is stored in a separate directory in HDFS.

sqoop job --create imp-ora-hr --meta-connect jdbc:hsqldb:hsql://localhost:16000/sqoop -- import-all-tables --connect jdbc:oracle:thin:@<server>/XE  --username <username> --password <password> --hive-import -m 1 --verbose
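Once the imp-ora-hr job has been created in the metastore, it can be re-run at any time with --exec (same metastore URL as in the --create command above):

```shell
# Execute the saved import-all-tables job defined above
sqoop job --meta-connect jdbc:hsqldb:hsql://localhost:16000/sqoop --exec imp-ora-hr
```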


Argument                     Description
--connect <jdbc-uri>         Specify the JDBC connect string.
--username <username>        Set the authentication username.
--password <password>        Set the authentication password.
--hive-import                Import tables into Hive (uses Hive's default delimiters if none are set).
--verbose                    Print more information while working.
--num-mappers <n> or -m <n>  Control the number of map tasks, i.e. the degree of parallelism used.



Import a full table


sqoop job --create imp-ora-hr-jobs --meta-connect jdbc:hsqldb:hsql://localhost:16000/sqoop -- import --connect jdbc:oracle:thin:@10.33.6.58:1521/XE --table HR.JOBS --username SYSTEM --password SYS --hive-import --hive-table HR.JOBS -m 1 --verbose

Import table with incompatible columns


sqoop job --create imp-ora-DBSNMP-BASELINES --meta-connect jdbc:hsqldb:hsql://localhost:16000/sqoop -- import --connect jdbc:oracle:thin:@10.10.10.10:1521/XE --table DBSNMP.MGMT_BSLN_BASELINES --columns BSLN_GUID,TARGET_UID,NAME,TYPE,STATUS --username SYSTEM --password SYS --hive-import --hive-table DBSNMP.BASELINES -m 1 --map-column-hive TARGET_UID=String,BSLN_GUID=String
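Here --map-column-hive overrides Sqoop's default type mapping for columns Hive cannot represent directly (the GUID columns are forced to String). One way to confirm the mapping after the import, assuming a local Hive CLI is available:

```shell
# Inspect the resulting Hive table schema; BSLN_GUID and TARGET_UID
# should now appear as string columns
hive -e "DESCRIBE DBSNMP.BASELINES"
```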
   

Import by query


sqoop job --create imp-ora-query --meta-connect jdbc:hsqldb:hsql://localhost:16000/sqoop -- import --connect jdbc:oracle:thin:@10.33.6.58:1521/XE --query "SELECT E.EMAIL, D.DEPARTMENT_NAME, L.CITY, L.STREET_ADDRESS FROM HR.DEPARTMENTS D, HR.LOCATIONS L, HR.EMPLOYEES E WHERE D.LOCATION_ID = L.LOCATION_ID and D.MANAGER_ID = E.MANAGER_ID AND \$CONDITIONS" --username SYSTEM --password SYS --hive-import --hive-table MANAGERS -m 1 --target-dir /bigdata/managers
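The literal \$CONDITIONS token is mandatory in a free-form query: Sqoop substitutes a range predicate into it for each map task. With -m 1 there is only one task, but to use more mappers you must also tell Sqoop which column to split on. A hedged sketch (connection details, table, and column names are illustrative):

```shell
# Parallel free-form query import: --split-by names the column Sqoop
# uses to partition the $CONDITIONS ranges across the 4 mappers
sqoop import \
  --connect jdbc:oracle:thin:@<server>:1521/XE \
  --username SYSTEM --password <password> \
  --query "SELECT EMPLOYEE_ID, EMAIL FROM HR.EMPLOYEES WHERE \$CONDITIONS" \
  --split-by EMPLOYEE_ID \
  --target-dir /bigdata/employees \
  -m 4
```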



Export with procedure


sqoop job --create export-by-procedure --meta-connect jdbc:hsqldb:hsql://localhost:16000/sqoop -- export --connect jdbc:oracle:thin:@10.33.6.58:1521/XE --username GOSALESDW --password GOSALESDWPW --call FILL_BURST_TABLE --export-dir /apps/hive/warehouse/burst_table --input-fields-terminated-by '\001' --input-lines-terminated-by '\012' --input-null-string '\\N' --input-null-non-string '\\N'


Procedure:

create or replace PROCEDURE FILL_BURST_TABLE
(
  RECIPIENTS_IN   IN VARCHAR2,
  COUNTRY_CODE_IN IN NUMBER
) AS
BEGIN
  INSERT INTO BURST_TABLE (COUNTRY_CODE, RECIPIENTS)
  VALUES (COUNTRY_CODE_IN, RECIPIENTS_IN);
END FILL_BURST_TABLE;

