Consider the following Sqoop command to import data from a MySQL database into HDFS:
sqoop import \
  --connect jdbc:mysql://localhost/employees \
  --username user \
  --password pass \
  --table employees \
  --target-dir /user/hadoop/employees_data \
  --num-mappers 1
What will be the result of running this command?
Think about the default file format and the effect of '--num-mappers 1'.
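The command imports the rows of the 'employees' table into the HDFS directory /user/hadoop/employees_data. Because no format option is given, Sqoop writes delimited text, its default (comma-separated fields, one record per line); options such as '--as-avrodatafile' or '--as-parquetfile' would change that. With '--num-mappers 1', a single map task performs the whole import, so the directory receives one part file (part-m-00000).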
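Because the default output is plain delimited text, ordinary command-line tools can read it. The sketch below assumes a hypothetical column order of id, name, salary and pipes in made-up rows in place of a real part file; on a cluster the input would come from something like 'hdfs dfs -cat /user/hadoop/employees_data/part-m-00000'. The function name 'print_name_salary' is illustrative only.

```shell
# Sketch: read Sqoop-style default text output (comma-separated fields,
# newline-terminated records). Column order id,name,salary is an assumption.
print_name_salary() {
  awk -F',' '{ print $2 " earns " $3 }'
}

# Made-up sample rows standing in for the contents of part-m-00000
printf '%s\n' '1,Alice,60000' '2,Bob,45000' | print_name_salary
# Alice earns 60000
# Bob earns 45000
```

The naive comma split assumes no field value itself contains a comma, since Sqoop does not enclose fields in quotes by default.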
You want to speed up a Sqoop import by running several tasks in parallel. Which command-line option controls how many parallel mappers Sqoop uses during import?
Look for the option that specifies the number of mappers.
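The '--num-mappers' option (short form '-m') sets how many map tasks Sqoop runs in parallel; if it is omitted, Sqoop uses 4. Sqoop divides the table on its primary key, or on the column named with '--split-by', so that each mapper imports a different range of rows.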
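To picture how the work is divided, recall that Sqoop looks up the minimum and maximum of the split column and gives each mapper a contiguous range between them. The function below is a simplified approximation of that idea, not Sqoop's exact boundary arithmetic; the name 'split_ranges', the even-size rounding, and the id range 1 to 1000 are all assumptions for illustration.

```shell
# Hypothetical helper: approximate how an integer split column (min..max)
# is divided into N contiguous ranges, one per mapper.
split_ranges() {
  min=$1 max=$2 n=$3
  span=$((max - min + 1))
  i=0
  lo=$min
  while [ "$i" -lt "$n" ]; do
    # Spread the remainder over the first ranges so sizes differ by at most 1
    size=$((span / n))
    if [ "$i" -lt $((span % n)) ]; then size=$((size + 1)); fi
    hi=$((lo + size - 1))
    echo "mapper $i: $lo <= id <= $hi"
    lo=$((hi + 1))
    i=$((i + 1))
  done
}

split_ranges 1 1000 4
# mapper 0: 1 <= id <= 250
# mapper 1: 251 <= id <= 500
# mapper 2: 501 <= id <= 750
# mapper 3: 751 <= id <= 1000
```

Equal-width ranges only mean equal work when key values are spread evenly; a skewed split column can leave some mappers with far more rows than others.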
Given this command:
sqoop import \
  --connect jdbc:mysql://localhost:3306/employees \
  --username user \
  --password pass \
  --table employees \
  --target-dir /user/hadoop/employees_data
The command fails with a connection refused error. What is the most likely cause?
Connection refused usually means the server is unreachable.
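A 'connection refused' error means the TCP connection to the given host and port was actively rejected, so Sqoop never reached a MySQL server there. Usually MySQL is not running, is listening on a different port or address, or a firewall is rejecting the connection. Other problems produce different errors: wrong credentials, for instance, give an 'Access denied' message. Note also that the map tasks open their own database connections, so on a multi-node cluster 'localhost' refers to each worker node rather than to the database host.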
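A sensible first step is to confirm which host and port the JDBC URL actually points at. The hypothetical helper 'jdbc_host_port' below extracts them from a 'jdbc:mysql://' URL using only POSIX parameter expansion, falling back to MySQL's default port 3306 when the URL omits one. It is a sketch: it does not handle IPv6 literals or multi-host URLs.

```shell
# Hypothetical helper: print "host port" for a jdbc:mysql:// URL.
# Assumes MySQL's default port 3306 when no port is given.
jdbc_host_port() {
  rest=${1#jdbc:mysql://}   # strip the scheme prefix
  hostport=${rest%%/*}      # keep everything before the first '/'
  case $hostport in
    *:*) host=${hostport%%:*}; port=${hostport##*:} ;;
    *)   host=$hostport;       port=3306 ;;
  esac
  echo "$host $port"
}

jdbc_host_port jdbc:mysql://localhost:3306/employees   # localhost 3306
jdbc_host_port jdbc:mysql://localhost/employees        # localhost 3306
```

With the host and port known, a probe such as 'nc -z localhost 3306' (if netcat is installed) shows whether anything is accepting connections there.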
Run this Sqoop import command:
sqoop import \
  --connect jdbc:mysql://localhost/employees \
  --username user \
  --password pass \
  --table employees \
  --target-dir /user/hadoop/employees_data \
  --num-mappers 4
Assuming the import succeeds, how many part files will be created in the target directory?
Each mapper writes one output file.
Each of the 4 map tasks writes its own output file, so the target directory ends up with 4 part files, named part-m-00000 through part-m-00003. The directory typically also contains an empty _SUCCESS marker, which is not a data file.
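Map-only jobs such as a Sqoop import name their output files part-m-NNNNN, numbered from zero. The hypothetical helper 'expected_part_files' below lists the names expected for a given mapper count; on a real cluster its output could be compared against 'hdfs dfs -ls /user/hadoop/employees_data'.

```shell
# Hypothetical helper: list the part-file names a map-only import
# with N mappers is expected to write.
expected_part_files() {
  i=0
  while [ "$i" -lt "$1" ]; do
    printf 'part-m-%05d\n' "$i"   # zero-padded to five digits
    i=$((i + 1))
  done
}

expected_part_files 4
# part-m-00000
# part-m-00001
# part-m-00002
# part-m-00003
```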
You want to import only employees with salary greater than 50000 from the 'employees' table using Sqoop. Which option should you use to filter rows during import?
Look for the option that allows SQL WHERE clause filtering.
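The '--where' option adds a SQL WHERE condition, for example --where "salary > 50000", so that only matching rows are imported. A free-form '--query' could apply the same filter, but the query must contain the $CONDITIONS placeholder that Sqoop uses to split work among mappers, and it also requires '--target-dir' (plus '--split-by' when more than one mapper is used), which makes it more involved for a simple filter.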
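A common pitfall with the '--query' alternative is shell quoting: inside double quotes the shell expands $CONDITIONS itself (usually to an empty string) before Sqoop sees it, leaving a dangling AND and a query that Sqoop rejects for lacking the token. The sketch below uses single quotes to keep the token literal; the query text mirrors the salary filter from the question, and the variable name 'query' is illustrative only.

```shell
# Sketch: build a free-form query for Sqoop's --query option.
# Single quotes keep $CONDITIONS literal; Sqoop replaces it with each
# mapper's split range at run time.
query='SELECT * FROM employees WHERE salary > 50000 AND $CONDITIONS'
printf '%s\n' "$query"
# SELECT * FROM employees WHERE salary > 50000 AND $CONDITIONS
```

If double quotes are needed for other reasons, the token must be escaped as \$CONDITIONS. For a filter this simple, '--where' sidesteps the issue entirely.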