
Sqoop for database imports in Hadoop

Introduction

Sqoop transfers bulk data between relational databases and Hadoop. It saves time and avoids error-prone manual copying.

You want to analyze data stored in a MySQL or Oracle database using Hadoop tools.
You need to import large tables from a database into HDFS for big data processing.
You want to automate regular data transfers from a database to Hadoop.
You want to combine data from databases with other big data sources in Hadoop.
You want to export processed data back from Hadoop to a database.
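The export direction mentioned above uses the sqoop export command. A minimal sketch, assuming a hypothetical MySQL table daily_summary and HDFS path /user/hadoop/summary_output:

```shell
# Export processed results from HDFS back into a relational table.
# The database, table, and HDFS path below are hypothetical examples.
sqoop export \
  --connect jdbc:mysql://localhost:3306/salesdb \
  --username admin \
  --password secret \
  --table daily_summary \
  --export-dir /user/hadoop/summary_output \
  --input-fields-terminated-by ','
```

The target table must already exist in the database; Sqoop maps the delimited HDFS files onto its columns.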
Syntax
Hadoop
sqoop import \
  --connect jdbc:mysql://hostname:port/dbname \
  --username user \
  --password pass \
  --table tablename \
  --target-dir /hdfs/path \
  [other options]

Use --connect to specify the JDBC connection string for the source database.

--table tells Sqoop which table to import, and --target-dir sets the destination directory in HDFS.

Examples
Imports the customers table from MySQL database salesdb into HDFS directory /user/hadoop/customers.
Hadoop
sqoop import \
  --connect jdbc:mysql://localhost:3306/salesdb \
  --username admin \
  --password secret \
  --table customers \
  --target-dir /user/hadoop/customers
Imports transactions table from PostgreSQL, splitting import by transaction_id for parallelism.
Hadoop
sqoop import \
  --connect jdbc:postgresql://dbserver:5432/finance \
  --username finance_user \
  --password pass123 \
  --table transactions \
  --split-by transaction_id \
  --target-dir /data/transactions
Imports only sales department employees using a custom query. When using --query, the $CONDITIONS placeholder is required so Sqoop can split the work across mappers.
Hadoop
sqoop import \
  --connect jdbc:mysql://localhost:3306/hrdb \
  --username hr \
  --password hrpass \
  --query 'SELECT * FROM employees WHERE department = "Sales" AND $CONDITIONS' \
  --split-by employee_id \
  --target-dir /user/hadoop/sales_employees
Sample Program

This command imports the employees table from the MySQL database testdb into the HDFS directory /user/hadoop/employees. It uses 2 mappers to speed up the import.

Hadoop
sqoop import \
  --connect jdbc:mysql://localhost:3306/testdb \
  --username testuser \
  --password testpass \
  --table employees \
  --target-dir /user/hadoop/employees \
  --num-mappers 2
Output: Success
Important Notes

Always use $CONDITIONS in custom queries to allow Sqoop to split the data for parallel import.

Use --num-mappers to control parallelism; more mappers can speed up import but use more resources.

Instead of passing plain-text passwords on the command line, store the password in a file and reference it with --password-file.
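A minimal sketch of the --password-file approach, reusing the testdb database from the sample program (the file location is a hypothetical example):

```shell
# Store the password in a file readable only by its owner.
echo -n "testpass" > /home/hadoop/.sqoop.pwd
chmod 400 /home/hadoop/.sqoop.pwd

# Reference the file instead of typing the password on the command line;
# the path may point to the local filesystem or to HDFS.
sqoop import \
  --connect jdbc:mysql://localhost:3306/testdb \
  --username testuser \
  --password-file file:///home/hadoop/.sqoop.pwd \
  --table employees \
  --target-dir /user/hadoop/employees
```

This keeps the password out of shell history and process listings, which show command-line arguments to other users.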

Summary

Sqoop imports data from relational databases into Hadoop with a single command.

Use connection string, table name, and target directory to specify import.

Parallel imports speed up data transfer using split columns and mappers.