Wednesday, May 14, 2014

Sqoop 2 investigation



I want to share my experience investigating Sqoop 2.

I tried configuring Sqoop 2 on the Hortonworks 2.1 VM. I managed to start the server side, but for some reason I could not connect the Sqoop 2 client application to the server.
I used two sets of instructions:


This one is archived, but I did not notice that at first. I did everything as described in the manual and got a working server. Unfortunately, the manual does not contain any client-side details.


This one is the latest (Jun 09, 2013), but it looks like it is not up to date.
The parts about building, creating binaries, installing, and starting/stopping the server are correct, but the parts about starting the client and the examples are out of date.

I think that if I had spent some more time I would have succeeded. But as this was not the main goal, I decided to work with the Sqoop 2 that is installed in the latest Cloudera VM (Hue 5.0).

I also want to note that the web UI present in the Cloudera VM is not part of the original Sqoop 2 distribution; it is part of Cloudera Hue.
The original Sqoop 2 client is a Java command-line application, but it is more user-friendly than the Sqoop 1 interface.

So I downloaded the latest Cloudera VM from:
Following the instructions, I configured Sqoop 2 there (it is stopped by default) and created a Sqoop import job based on the example they suggest.
 

For some reason Hue did not run the job. I found the Sqoop 2 logs in the Cloudera VM (they are located in the folder ‘/var/log/sqoop2’), but I did not find any information there that explained why the job failed.
I also tried configuring and running the Sqoop 2 job with the original command-line client and got the same issue, again without any details.
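Scanning that log folder by hand is tedious, so here is a minimal sketch that lists suspicious lines. The folder ‘/var/log/sqoop2’ comes from the VM; the file name pattern `*.log*` and the keywords `ERROR` and `Exception` are my assumptions about how the logs are named and formatted, so adjust them if your files look different.

```python
from pathlib import Path


def find_error_lines(log_dir, keywords=("ERROR", "Exception")):
    """Return (file name, line number, text) for every line containing a keyword."""
    hits = []
    # Assumption: log files match *.log* (e.g. rotated files like sqoop.log.1).
    for path in sorted(Path(log_dir).glob("*.log*")):
        if not path.is_file():
            continue
        with path.open(errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if any(word in line for word in keywords):
                    hits.append((path.name, number, line.rstrip()))
    return hits


if __name__ == "__main__":
    for name, number, text in find_error_lines("/var/log/sqoop2"):
        print(f"{name}:{number}: {text}")
```

An empty result does not mean the job succeeded; in my case the real reason was simply not written to these logs.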

So my next step was to investigate the different Hadoop logs and actually find the reason why it failed.
The first place of interest is the Hue server logs:
 
There I found a description of the error that explained why the Sqoop 2 job had not run. In my case it was an incorrectly configured Sqoop 2 connector. I recommend using ‘http://www.jsoneditoronline.org/’ to analyze the error, as messages from Sqoop 2 come in JSON format.
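If you prefer not to paste log content into a web page, the same re-indenting can be done locally. This is only a small sketch using Python's standard `json` module; the sample payload in it is hypothetical and is not a real Sqoop 2 message.

```python
import json


def pretty_json(raw):
    """Re-indent a JSON string so nested error details are readable."""
    return json.dumps(json.loads(raw), indent=2, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical payload for illustration only, not a real Sqoop 2 response.
    sample = '{"status": "error", "details": {"field": "connection", "message": "example"}}'
    print(pretty_json(sample))
```

Copy the JSON part of the message from the Hue log into `sample` (or read it from a file) to see its structure.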

BUT the job still did not work.(((

I did some brainstorming and figured out one more important log for Sqoop 2: http://localhost:8088/cluster/apps
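The same information can also be pulled as JSON instead of clicking through the page. The sketch below assumes the YARN ResourceManager REST API is reachable on the same host and port under `/ws/v1/cluster/apps`; I did not use it during this investigation, so treat the URL and the selected fields as assumptions to verify on your VM.

```python
import json
from urllib.request import urlopen

# Assumption: the ResourceManager REST API listens on the same port as the web UI.
RM_APPS_URL = "http://localhost:8088/ws/v1/cluster/apps?states=FAILED,KILLED"


def failed_apps(payload):
    """Extract (id, name, finalStatus, diagnostics) from a cluster-apps response."""
    # "apps" can be null when no application matches the filter.
    apps = (payload.get("apps") or {}).get("app") or []
    return [
        (app.get("id"), app.get("name"), app.get("finalStatus"), app.get("diagnostics", ""))
        for app in apps
    ]


if __name__ == "__main__":
    try:
        with urlopen(RM_APPS_URL, timeout=10) as response:
            payload = json.load(response)
    except OSError as error:
        print(f"Could not reach the ResourceManager: {error}")
    else:
        for app_id, name, status, diagnostics in failed_apps(payload):
            print(app_id, name, status)
            print("  ", diagnostics.strip())
```

The `diagnostics` text is the interesting part: that is where the reason for a failed application usually shows up.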


There I finally found the last issue (for me): the sqoop2 user has restricted permissions and cannot write into the folder ‘/user/cloudera/emp’.
Solution: change the folder to ‘/user/sqoop2/emp’.
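To see in advance whether the sqoop2 user may write somewhere, you can look at the owner and permission bits of the parent folder. Below is a rough sketch that asks WebHDFS for them. The address `localhost:50070` is an assumption about where the NameNode web interface runs, and the check is simplified on purpose: it ignores group membership and the HDFS superuser.

```python
import json
from urllib.request import urlopen

# Assumption: WebHDFS is enabled and the NameNode web port is 50070.
WEBHDFS_URL = "http://localhost:50070/webhdfs/v1"


def fetch_status(path, base=WEBHDFS_URL):
    """Return the FileStatus dictionary that WebHDFS reports for an HDFS path."""
    with urlopen(f"{base}{path}?op=GETFILESTATUS", timeout=10) as response:
        return json.load(response)["FileStatus"]


def can_write(status, user):
    """Rough check of the write bit for the owner or for 'other' (groups ignored)."""
    # The permission is an octal string such as "755"; keep the last three digits.
    perm = status["permission"].zfill(3)[-3:]
    owner_bits, _group_bits, other_bits = (int(digit, 8) for digit in perm)
    bits = owner_bits if status["owner"] == user else other_bits
    return bool(bits & 2)


if __name__ == "__main__":
    for folder in ("/user/cloudera", "/user/sqoop2"):
        try:
            status = fetch_status(folder)
        except OSError as error:
            print(f"{folder}: could not query WebHDFS ({error})")
            continue
        print(folder, status["owner"], status["permission"], can_write(status, "sqoop2"))
```

For a folder owned by cloudera with permission 755 this reports `False` for sqoop2, which matches the failure I saw.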
I also tried the Sqoop 2 Java client API. It works fine for me, except that the errors and exceptions could be more detailed.
Some details that I found about the Sqoop 2 API:

Summary:

Sqoop 2 looks interesting, but currently it has less functionality than Sqoop 1 (for example, I did not see how to import data into Hive, as Sqoop 1.4.4 can do, and so on). Uninformative logging is also a problem.
So for now I would prefer to use Sqoop 1.4.4.