
Sqoop for database imports in Hadoop - Step-by-Step Execution

Concept Flow - Sqoop for database imports
Start Sqoop Import Command
Connect to Database
Run SQL Query to Extract Data
Convert Data to Hadoop Format
Store Data in HDFS
Import Complete
Sqoop imports data by connecting to a database, extracting data with SQL, converting it, and storing it in Hadoop.
Execution Sample
sqoop import \
  --connect jdbc:mysql://dbhost/dbname \
  --username user \
  --password pass \
  --table employees \
  --target-dir /user/hadoop/employees
This command imports the 'employees' table from MySQL into the /user/hadoop/employees directory in HDFS. Note that --password puts the password in plain text on the command line; --password-file or -P is safer in practice.
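The same import can be wrapped in a small, reusable script. This is a sketch: dbhost, dbname, the password-file path, and the split column `id` are placeholder assumptions, while --password-file, --split-by, and -m are standard Sqoop import options.

```shell
#!/bin/sh
# Sketch of a reusable import wrapper. dbhost, dbname, and the
# password-file path are placeholders, not values from a real cluster.
run_employees_import() {
  # --password-file keeps the password out of the command line and shell
  # history; --split-by (assuming "id" is a numeric primary key) together
  # with -m 4 lets Sqoop fetch with four parallel map tasks instead of one.
  sqoop import \
    --connect jdbc:mysql://dbhost/dbname \
    --username user \
    --password-file /user/hadoop/.db_password \
    --table employees \
    --split-by id \
    -m 4 \
    --target-dir /user/hadoop/employees
}
```

Splitting on a numeric key is what lets Sqoop divide the SELECT into ranges, one per mapper, so the import scales with -m.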
Execution Table
| Step | Action | Details | Result |
|------|--------|---------|--------|
| 1 | Start Sqoop Import | Run sqoop import command | Sqoop process begins |
| 2 | Connect to Database | Connect to jdbc:mysql://dbhost/dbname | Connection established |
| 3 | Authenticate | Use username and password | Authentication successful |
| 4 | Run SQL Query | SELECT * FROM employees | Data rows fetched |
| 5 | Convert Data | Convert rows to Hadoop format (e.g., text files) | Data converted |
| 6 | Store Data | Write data to /user/hadoop/employees in HDFS | Data stored in HDFS |
| 7 | Import Complete | Close connections and finish | Import finished successfully |
💡 Import finishes after data is stored in HDFS and connections close
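After Step 6, the result can be confirmed from the Hadoop side. A minimal sketch, assuming Sqoop's default text output (one part-m-NNNNN file per mapper, plus a _SUCCESS marker) and the target directory used above:

```shell
# Sketch: confirm the import landed in HDFS. The directory matches the
# --target-dir above; part-file names assume Sqoop's default text output.
check_import() {
  # List the imported files (expect _SUCCESS plus part-m-* files)
  hdfs dfs -ls /user/hadoop/employees
  # Count imported rows across all part files
  hdfs dfs -cat /user/hadoop/employees/part-m-* | wc -l
}
```

The row count printed here should match the number of rows in the source employees table if the import completed cleanly.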
Variable Tracker
| Variable | Start | After Step 2 | After Step 4 | After Step 6 | Final |
|----------|-------|--------------|--------------|--------------|-------|
| Connection | None | Connected | Connected | Connected | Closed |
| Data Rows | None | None | Fetched | Stored | Stored |
| HDFS Directory | Empty | Empty | Empty | /user/hadoop/employees | /user/hadoop/employees |
Key Moments - 3 Insights
Why does Sqoop need a database connection before importing?
Sqoop must connect to the database (see Step 2 in the Execution Table) to access and extract the data.
What happens to the data after it is fetched from the database?
After fetching (Step 4), Sqoop converts the data into a Hadoop-compatible format (Step 5) before storing it in HDFS (Step 6).
Why is the target directory important in Sqoop import?
The target directory (tracked in the Variable Tracker) is where the imported data is saved in HDFS for Hadoop to use.
Visual Quiz - 3 Questions
Test your understanding
Looking at the Execution Table, what is the result after Step 3?
A. Authentication successful
B. Data rows fetched
C. Connection established
D. Import finished successfully
💡 Hint
Check the 'Result' column for Step 3 in the Execution Table.
According to the Variable Tracker, what is the state of 'Data Rows' after Step 6?
A. None
B. Stored
C. Converted
D. Fetched
💡 Hint
Look at the 'Data Rows' row under 'After Step 6' in the Variable Tracker.
If the database connection fails at Step 2, what will happen to the import process?
A. Data will be fetched anyway
B. An empty dataset will be stored in HDFS
C. The import will stop before fetching data
D. The import will complete successfully
💡 Hint
Refer to Steps 2 and 4 in the Execution Table to understand the flow dependency.
Concept Snapshot
Sqoop Import Syntax:
sqoop import --connect <jdbc_url> --username <user> --password <pass> --table <table_name> --target-dir <hdfs_path>

Behavior:
Connects to DB, extracts data, converts it, stores in HDFS.

Key Rule:
The target directory must not already exist in HDFS; the import fails if it does (pass --delete-target-dir to remove it first).

Import stops if DB connection fails.
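Because Sqoop's underlying MapReduce job refuses to write into an existing output directory, a re-run needs the directory cleared first. One way to script a repeatable import, using the standard --delete-target-dir option (connection details are the same placeholders used earlier):

```shell
# Sketch: a re-runnable import. --delete-target-dir removes the HDFS
# directory before the job writes, so a second run does not fail with
# an "output directory already exists" error. Connection details are
# placeholders, not values from a real cluster.
reimport_employees() {
  sqoop import \
    --connect jdbc:mysql://dbhost/dbname \
    --username user \
    --password-file /user/hadoop/.db_password \
    --table employees \
    --delete-target-dir \
    --target-dir /user/hadoop/employees
}
```

This makes the import idempotent for full refreshes; for append-style loads, Sqoop's incremental import options are the alternative.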
Full Transcript
Sqoop imports data from a database into Hadoop by running a command that connects to the database using JDBC. It authenticates with username and password, then runs a SQL query to fetch data from the specified table. The data is converted into a Hadoop-friendly format and saved into a target directory in HDFS. The process ends by closing connections. Variables like connection status, data rows, and HDFS directory change state step-by-step during the import. If the connection fails, the import stops early. The target directory is where the data is stored for Hadoop to use.