Exporting and importing table data between an Oracle database and Hive is one of the most common activities in the Hadoop world. Getting a few basics sorted out up front makes first-time integration seamless and helps avoid a variety of parsing and loading errors.
We will work through the activities below sequentially, covering all the integration points between the Oracle database, Sqoop, HDFS and Hive.
Step 1: Extract data from a source Oracle database table to Hadoop file system using Sqoop
Step 2: Load the above Sqoop extracted data into a Hive table
Step 3: Use Hive query to generate a file extract in the Hadoop file system
Step 4: Load the generated file in Step 3 to a new target Oracle database table
Step 1: Sqoop import data from Oracle database to Hive table
Our first task is to identify the source Oracle database table, and then use Sqoop to fetch its data into HDFS.
Note that Sqoop needs a primary key on the source Oracle table in order to split the import across mappers; otherwise it fails with the error “Error during import: No primary key could be found for table”. If we want to skip assigning a key, we can instead pass the parameter -m 1, which runs the import with a single mapper and removes the need for a split column.
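A minimal sketch of the import command, assuming a hypothetical source table EMPLOYEES, a database host dbhost with service ORCL, and an HDFS staging path of our choosing (all of these names are placeholders to adapt):

```shell
# Hypothetical names: dbhost/ORCL, user scott, table EMPLOYEES, staging path.
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott -P \
  --table EMPLOYEES \
  --target-dir /user/hive/staging/employees \
  --fields-terminated-by ',' \
  -m 1    # single mapper: no primary key or --split-by column required
```

With -m 1 Sqoop writes one output file under the target directory; dropping it in favour of multiple mappers requires the table to have a primary key or an explicit --split-by column.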
Step 2: Load the above Sqoop extracted data to a Hive table
Assuming the Hive table already exists, we load the file created in Step 1 into it using the LOAD DATA syntax below.
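A sketch of the load, assuming the same hypothetical staging path from Step 1 and a Hive table named employees whose row format matches the delimiter used in the Sqoop import:

```shell
# LOAD DATA INPATH moves the files from the staging directory
# into the Hive table's warehouse location (the staging copy is consumed).
hive -e "LOAD DATA INPATH '/user/hive/staging/employees' INTO TABLE employees;"
```

Because LOAD DATA is a file move rather than a parse-and-insert, the table's declared field delimiter must match what Sqoop wrote, or queries will return NULL columns.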
Step 3: Export a file using Hive query to be consumed by Sqoop
Now that we have the data in our Hive table, we can use the command below to write the result of a custom Hive query to a file in a chosen HDFS path. The field delimiter can be changed according to our requirement, but it must then match the delimiter specified in Step 4.
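One way to sketch this, assuming a hypothetical export directory and the employees table from the earlier steps (Hive 0.11+ supports specifying a row format on INSERT OVERWRITE DIRECTORY):

```shell
# Write the query result as comma-delimited text files under the export path.
# The directory is overwritten each time the statement runs.
hive -e "INSERT OVERWRITE DIRECTORY '/user/hive/export/employees'
         ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
         SELECT * FROM employees;"
```

The SELECT can of course filter or reshape the data; whatever delimiter is chosen here is the one Sqoop must be told about in Step 4.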
Step 4: Load data from Hive table exported file to Oracle database table
The command below uses the file exported by Hive (from the same HDFS path used in Step 3) to load our target Oracle database table.
The input parsing options — the field and line terminators — tell Sqoop how to read the exported records while writing them to the target database table; without them we may encounter the error “Can’t parse input data”.
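A sketch of the export, assuming a hypothetical pre-created target table EMPLOYEES_TGT and the same host, credentials and export path as above:

```shell
# The --input-* terminator options must match the delimiters Hive wrote in
# Step 3; a mismatch is the usual cause of "Can't parse input data" errors.
sqoop export \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott -P \
  --table EMPLOYEES_TGT \
  --export-dir /user/hive/export/employees \
  --input-fields-terminated-by ',' \
  --input-lines-terminated-by '\n'
```

Sqoop export requires the target table to already exist in Oracle with columns compatible with the delimited file's fields.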
Thus we have successfully loaded a table from an Oracle database into Hive, and back again from Hive into Oracle, using Sqoop. We can query both the Oracle and Hive sides to verify that the data loaded correctly. How do you prefer to move your data between these two systems?