How to Use Sqoop for Data Import in Hadoop Easily
Use the sqoop import command to transfer data from relational databases such as MySQL into Hadoop HDFS. Specify the connection details, target directory, and table name in the command to import data efficiently.

Syntax

The basic syntax of the sqoop import command includes the database connection, the target directory in HDFS, and the table to import.
- --connect: JDBC URL of the database
- --username: Database username
- --password: Database password
- --table: Name of the table to import
- --target-dir: HDFS directory to store imported data
- --split-by: Column used to split data for parallel import
```bash
sqoop import \
  --connect jdbc:mysql://hostname:3306/dbname \
  --username user \
  --password pass \
  --table tablename \
  --target-dir /user/hadoop/tablename \
  --split-by id
```
Example
This example imports the employees table from a MySQL database into HDFS directory /user/hadoop/employees. It uses the emp_id column to split the import into parallel tasks.
```bash
sqoop import \
  --connect jdbc:mysql://localhost:3306/companydb \
  --username root \
  --password rootpass \
  --table employees \
  --target-dir /user/hadoop/employees \
  --split-by emp_id
```
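The --split-by column is what drives the parallelism: Sqoop queries the column's minimum and maximum values and divides that range evenly across the mappers, one range per parallel task. A rough sketch of that arithmetic, assuming emp_id runs from 1 to 1000 and the default of 4 mappers (the boundary values here are made up for illustration):

```shell
# Sketch of how Sqoop partitions a numeric --split-by column (assumed min/max).
min=1; max=1000; mappers=4
step=$(( (max - min + 1) / mappers ))   # rows per mapper, roughly

lo=$min
for i in $(seq 1 "$mappers"); do
  if [ "$i" -eq "$mappers" ]; then
    hi=$max                             # last mapper takes any remainder
  else
    hi=$(( lo + step - 1 ))
  fi
  echo "mapper $i: emp_id BETWEEN $lo AND $hi"
  lo=$(( hi + 1 ))
done
```

A skewed or non-unique split column makes these ranges uneven, which is why the pitfalls below recommend a well-distributed unique column.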
Output

```
INFO mapreduce.ImportJobBase: Beginning import of employees
INFO mapreduce.ImportJobBase: Transferred 1000 records
INFO mapreduce.ImportJobBase: Completed import of employees
```
Common Pitfalls

- Not specifying --split-by on a column with unique, well-distributed values can cause the import to run in a single task, slowing down the process.
- An incorrect JDBC URL or wrong credentials will cause connection failures.
- For large tables, not setting --target-dir properly may overwrite existing data.
- For password security, avoid using --password directly; use --password-file instead.
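To act on the --password-file recommendation, the file must contain only the password, ideally with no trailing newline, and should be readable only by its owner. A minimal sketch, where the filename and password are examples (Sqoop can read the file from the local filesystem via a file:// URI or from HDFS):

```shell
# Create a password file readable only by its owner (name and value are examples).
umask 077                          # files created below get mode 600
printf '%s' 'rootpass' > sqoop.pw  # printf avoids a trailing newline

# The file could then be referenced as, e.g.:
#   --password-file file://$PWD/sqoop.pw
# or copied into HDFS first:
#   hdfs dfs -put sqoop.pw /user/hadoop/sqoop.pw
```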
```bash
# Wrong way: password on the command line, no target directory or split column
sqoop import --connect jdbc:mysql://localhost:3306/companydb --username root --password rootpass --table employees

# Right way: password read from a file, explicit target directory and split column
sqoop import --connect jdbc:mysql://localhost:3306/companydb --username root --password-file /path/to/passwordfile --table employees --target-dir /user/hadoop/employees --split-by emp_id
```
Quick Reference
| Option | Description |
|---|---|
| --connect | JDBC URL of the source database |
| --username | Database username |
| --password | Database password (avoid in scripts) |
| --password-file | File containing password for security |
| --table | Name of the table to import |
| --target-dir | HDFS directory to store imported data |
| --split-by | Column to split data for parallel import |
| --num-mappers | Number of parallel tasks (default 4) |
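Pulling the options above together, one way to assemble the command in a script is a bash array, which keeps each flag and its value as separate words and avoids quoting mistakes. The values are the example ones used earlier (the password-file path is assumed); the command is only echoed here, not executed:

```shell
#!/usr/bin/env bash
# Build the sqoop import command as an array (all values are illustrative).
cmd=(sqoop import
  --connect jdbc:mysql://localhost:3306/companydb
  --username root
  --password-file /user/hadoop/sqoop.pw
  --table employees
  --target-dir /user/hadoop/employees
  --split-by emp_id
  --num-mappers 8)

echo "${cmd[@]}"   # inspect the full command; run it later with "${cmd[@]}"
```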
Key Takeaways

- Use sqoop import with correct connection and table details to import data into Hadoop.
- Always specify a unique column with --split-by for faster parallel imports.
- Avoid putting passwords directly in commands; use --password-file for security.
- Set --target-dir to control where data lands in HDFS and prevent overwriting existing data.
- Check connection details carefully to avoid import failures.