I want to share my experience investigating sqoop2.
I tried configuring sqoop2 on the Hortonworks 2.1 VM.
I managed to start the server side, but for some reason I could not
connect the sqoop2 client application to the server.
I used two sets of instructions:
This one is archived, but at first I did not notice that. I
did everything as described in the manual and got a working server. Unfortunately, the manual does
not contain any client-side details.
This one is the
latest (Jun 09, 2013), but it looks like it is not up to date.
The parts about building, creating binaries, installing, and starting/stopping the server
are correct, but the sections on starting the client and the examples are out of date.
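 Before digging into client configuration, it can help to check whether the server answers HTTP at all. The sketch below assumes the usual sqoop2 server defaults (port 12000, web application name "sqoop") and assumes the server exposes a `version` resource under that path; adjust host, port and webapp to match your setup. The function names are hypothetical helpers, not part of sqoop2.

```python
import json
import urllib.error
import urllib.request


def version_url(host, port=12000, webapp="sqoop"):
    """Build the URL of the server's version resource (assumed path layout)."""
    return f"http://{host}:{port}/{webapp}/version"


def check_server(url, timeout=5.0):
    """Return (True, parsed JSON) if the server answered, else (False, error text)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return True, json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError) as exc:
        # URLError covers refused connections and HTTP errors;
        # ValueError covers a non-JSON response body.
        return False, str(exc)


if __name__ == "__main__":
    ok, info = check_server(version_url("localhost"))
    print("server reachable" if ok else f"server NOT reachable: {info}")
```

If this fails from the client machine while the server log shows a successful start, the problem is more likely host, port or firewall settings than the client itself.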
I think that if I had spent some more time I would have succeeded.
But as it was not my main goal, I decided to work with the sqoop2 installed in the
latest Cloudera VM (Hue 5.0).
I also
want to note that the web UI present in the Cloudera VM is not part of the
original sqoop2 distribution; it is part of Cloudera Hue.
The original sqoop2 client is a Java command-line application,
but it is more user-friendly than the sqoop 1 interface.
So I downloaded the latest Cloudera VM from:
Following the instructions, I configured sqoop2 there (it is stopped
by default) and created a sqoop import job based on the example they
suggested.
For some reason Hue
did not run the job. I found the sqoop2 logs in the Cloudera VM (they are located in the
'/var/log/sqoop2' folder), but I did not find any information there explaining the
reason for the job failure.
I also tried to configure and run the sqoop2 job with the original
command-line client and got the same issue, again without any details.
So my next step was to investigate the various Hadoop logs and
actually find the reason why it failed.
The first place of interest is the Hue server logs:
There I found a description of why the sqoop2 job had not
run. In my case it was an incorrectly configured sqoop2 connector. I recommend
using 'http://www.jsoneditoronline.org/' to analyze the error, as messages from
sqoop2 come in JSON format.
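If you prefer not to paste log contents into a website, the standard library can re-indent the same JSON locally. This is a minimal sketch: the sample message below is hypothetical and only stands in for whatever JSON you copy out of the Hue server log.

```python
import json


def pretty_error(raw):
    """Re-indent a JSON error message so nested fields are readable."""
    return json.dumps(json.loads(raw), indent=2, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical message shape; paste the real JSON from the Hue log here.
    sample = '{"message": "example failure", "cause": {"message": "example cause"}}'
    print(pretty_error(sample))
```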
BUT the job still did not work. (((
I did some brainstorming and figured out one more important log
for sqoop2, the YARN ResourceManager web UI: http://localhost:8088/cluster/apps
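The same ResourceManager also serves its application list as JSON under `/ws/v1/cluster/apps`, which makes it easy to pull out only the failed jobs and their diagnostics. A sketch, assuming the ResourceManager listens on localhost:8088 as in the VM; the helper names are hypothetical, and the filtering is done on the client side to keep the request simple.

```python
import json
import urllib.request


def failed_apps(payload):
    """Return (id, user, name, diagnostics) for apps whose finalStatus is FAILED."""
    # The "apps" field is null when the cluster has no applications yet.
    apps = (payload.get("apps") or {}).get("app", [])
    return [
        (a.get("id"), a.get("user"), a.get("name"), a.get("diagnostics", ""))
        for a in apps
        if a.get("finalStatus") == "FAILED"
    ]


def fetch_apps(rm="http://localhost:8088"):
    """Fetch the application list from the ResourceManager REST API."""
    with urllib.request.urlopen(f"{rm}/ws/v1/cluster/apps", timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    for app_id, user, name, diag in failed_apps(fetch_apps()):
        print(f"{app_id} ({user}, {name}):\n  {diag}")
```

The diagnostics field is where messages like a permission error usually end up, so printing it next to the submitting user saves clicking through the web UI.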
There I finally found the last issue (in my case): the sqoop2 user has restricted permissions and
cannot write into the folder '/user/cloudera/emp'.
Solution: change the output
folder to '/user/sqoop2/emp'.
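To check in advance whether a given user can create the output folder, you can look at the parent directory's owner, group and permission bits. The sketch below is a simplified POSIX-style check (it needs both write and execute on the directory, and ignores HDFS ACLs and the superuser), so treat the answer as a hint only. It assumes WebHDFS is enabled on the NameNode at localhost:50070; the helper names are hypothetical.

```python
import json
import urllib.request


def can_write_dir(status, user, groups=()):
    """Simplified check: may `user` create entries in this directory?"""
    mode = int(status["permission"], 8) & 0o777  # drop sticky bit if present
    if user == status["owner"]:
        bits = (mode >> 6) & 0o7
    elif status["group"] in groups:
        bits = (mode >> 3) & 0o7
    else:
        bits = mode & 0o7
    return (bits & 0o3) == 0o3  # write (2) and execute (1) both set


def file_status(path, namenode="http://localhost:50070"):
    """Fetch owner/group/permission for an HDFS path via WebHDFS GETFILESTATUS."""
    url = f"{namenode}/webhdfs/v1{path}?op=GETFILESTATUS"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))["FileStatus"]


if __name__ == "__main__":
    status = file_status("/user/cloudera")
    print("sqoop2 can write:", can_write_dir(status, "sqoop2"))
```

With a typical 755 home directory owned by another user, the check returns False for sqoop2, which matches the error above and explains why writing under the sqoop2 user's own home folder works.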
I also tried the sqoop2
Java client API, and it works fine for me, except that its errors and exceptions could
be more detailed.
Some details that I found about the sqoop2 API:
Summary:
Sqoop 2 looks interesting, but currently it has less functionality than sqoop 1 (e.g., I did not see how to import data into Hive, as sqoop 1.4.4 can, and so on). Uninformative logging is also a problem.
So for now I would prefer to use sqoop 1.4.4.