
Sqoop for database imports in Hadoop

Introduction

Sqoop transfers bulk data between relational databases and Hadoop. It saves time and avoids error-prone manual copying.

You want to analyze data stored in a MySQL or Oracle database using Hadoop tools.
You need to import large tables from a database into HDFS for big data processing.
You want to automate regular data transfers from a database to Hadoop.
You want to combine data from databases with other big data sources in Hadoop.
You want to export processed data back from Hadoop to a database.
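The export direction mentioned above uses the sqoop export command. A minimal sketch, assuming a hypothetical MySQL table daily_summary and HDFS path /user/hadoop/summary_output:

```shell
# Export processed results from HDFS back into a relational table.
# The database, table, and HDFS path below are hypothetical examples.
sqoop export \
  --connect jdbc:mysql://localhost:3306/salesdb \
  --username admin \
  --password secret \
  --table daily_summary \
  --export-dir /user/hadoop/summary_output \
  --input-fields-terminated-by ','
```

The target table must already exist in the database; Sqoop maps the delimited HDFS files onto its columns.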
Syntax
Hadoop
sqoop import \
  --connect jdbc:mysql://hostname:port/dbname \
  --username user \
  --password pass \
  --table tablename \
  --target-dir /hdfs/path \
  [other options]

Use --connect to specify the JDBC connection string for the source database.

--table tells Sqoop which table to import, and --target-dir sets the destination directory in HDFS.

Examples
Imports the customers table from MySQL database salesdb into HDFS directory /user/hadoop/customers.
Hadoop
sqoop import \
  --connect jdbc:mysql://localhost:3306/salesdb \
  --username admin \
  --password secret \
  --table customers \
  --target-dir /user/hadoop/customers
Imports transactions table from PostgreSQL, splitting import by transaction_id for parallelism.
Hadoop
sqoop import \
  --connect jdbc:postgresql://dbserver:5432/finance \
  --username finance_user \
  --password pass123 \
  --table transactions \
  --split-by transaction_id \
  --target-dir /data/transactions
Imports only sales department employees using a custom query. When using --query, the $CONDITIONS placeholder is required so Sqoop can split the work across mappers.
Hadoop
sqoop import \
  --connect jdbc:mysql://localhost:3306/hrdb \
  --username hr \
  --password hrpass \
  --query 'SELECT * FROM employees WHERE department = "Sales" AND $CONDITIONS' \
  --split-by employee_id \
  --target-dir /user/hadoop/sales_employees
Sample Program

This command imports the employees table from the MySQL database testdb into the HDFS directory /user/hadoop/employees. It uses 2 mappers to speed up the import.

Hadoop
sqoop import \
  --connect jdbc:mysql://localhost:3306/testdb \
  --username testuser \
  --password testpass \
  --table employees \
  --target-dir /user/hadoop/employees \
  --num-mappers 2
Output: Success
Important Notes

Always use $CONDITIONS in custom queries to allow Sqoop to split the data for parallel import.

Use --num-mappers to control parallelism; more mappers can speed up import but use more resources.

Instead of passing plain-text passwords on the command line, store the password in a file and reference it with --password-file.
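A minimal sketch of the --password-file approach, reusing the testdb database from the sample program (the file location is a hypothetical example):

```shell
# Store the password in a file readable only by its owner.
echo -n "testpass" > /home/hadoop/.sqoop.pwd
chmod 400 /home/hadoop/.sqoop.pwd

# Reference the file instead of typing the password on the command line;
# the path may point to the local filesystem or to HDFS.
sqoop import \
  --connect jdbc:mysql://localhost:3306/testdb \
  --username testuser \
  --password-file file:///home/hadoop/.sqoop.pwd \
  --table employees \
  --target-dir /user/hadoop/employees
```

This keeps the password out of shell history and process listings, which show command-line arguments to other users.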

Summary

Sqoop imports data from relational databases into Hadoop with a single command.

Use connection string, table name, and target directory to specify import.

Parallel imports speed up data transfer using split columns and mappers.