Practice - 5 Tasks

Answer the questions below

1. Fill in the blank

easy

Complete the code to create a SparkSession named spark.

Apache Spark

from pyspark.sql import SparkSession
spark = SparkSession.builder.appName([1]).getOrCreate()

A. "MyApp"

B. SparkContext

C. Executor

D. ClusterManager

2. Fill in the blank

medium

Complete the code to get the SparkContext from the SparkSession.

Apache Spark

sc = spark.[1]

A. context

B. sparkContext

C. SparkContext

D. getContext

3. Fill in the blank

hard

Complete the code so that every element of the RDD is doubled. Calling collect() is the action that submits the job to the cluster.

Apache Spark

rdd = sc.parallelize([1, 2, 3, 4])
result = rdd.[1](lambda x: x * 2).collect()

A. flatMap

B. filter

C. reduce

D. map
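
As a reference for the four options above, the following is a minimal sketch of what each kind of operation does to a small list. It uses plain Python built-ins as stand-ins, not the Spark RDD API, so it runs without a cluster. The variable names are illustrative assumptions. In Spark itself, map, filter, and flatMap are lazy transformations, while reduce is an action.

```python
from functools import reduce
from itertools import chain

numbers = [1, 2, 3, 4]

# map-style: exactly one output element per input element
mapped = list(map(lambda x: x * 2, numbers))

# filter-style: keep only the elements for which the predicate is true
filtered = list(filter(lambda x: x % 2 == 0, numbers))

# flatMap-style: each input yields a sequence, and the sequences are flattened
flat_mapped = list(chain.from_iterable(map(lambda x: [x, x * 2], numbers)))

# reduce-style: combine all elements into a single value
reduced = reduce(lambda a, b: a + b, numbers)

print(mapped, filtered, flat_mapped, reduced)
```

Note that the function passed to a flatMap-style operation must return a sequence, and the function passed to reduce must take two arguments, so a one-argument lambda returning a number only fits some of these operations.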

4. Fill in the blank

hard

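Fill both blanks to build a dictionary that maps executor IDs to their memory usage. The snippet is illustrative: the keys of getExecutorMemoryStatus() are plain "host:port" strings, so the attribute access on executor is pseudocode rather than a working PySpark call.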

Apache Spark

executor_info = {executor.[1]: executor.[2] for executor in sc._jsc.sc().getExecutorMemoryStatus().keySet()}

A. id

B. memoryUsed

C. host

D. memoryTotal

5. Fill in the blank

hard

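Fill all three blanks to keep only the executors with more than 4 GB of memory and list their hostnames. Treat the snippet as pseudocode: getExecutorMemoryStatus() returns a Scala map through Py4J, which has no Python-style items() method.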

Apache Spark

hosts = [executor.[1] for executor, memory in sc._jsc.sc().getExecutorMemoryStatus().items() if memory [2] [3]]

A. host

B. >

C. 4 * 1024 * 1024 * 1024

D. id
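
The filtering pattern in this task can be tried without a cluster. The sketch below assumes a hypothetical snapshot of executor memory, stored as a plain Python dictionary that maps "host:port" strings to (max memory, remaining memory) pairs in bytes. The worker names, the numbers, and the helper hosts_above are made up for illustration and do not come from Spark.

```python
GIB = 1024 * 1024 * 1024

# Hypothetical snapshot: "host:port" -> (max memory, remaining memory), in bytes.
memory_status = {
    "worker-1:40001": (8 * GIB, 6 * GIB),
    "worker-2:40002": (2 * GIB, 1 * GIB),
    "worker-3:40003": (16 * GIB, 9 * GIB),
}


def hosts_above(status, threshold_bytes):
    """Return hostnames whose max memory is strictly greater than threshold_bytes."""
    return [
        address.split(":")[0]  # drop the port, keep the hostname
        for address, (max_memory, _remaining) in status.items()
        if max_memory > threshold_bytes
    ]


big_hosts = hosts_above(memory_status, 4 * GIB)
print(big_hosts)
```

Writing the threshold as 4 * 1024 * 1024 * 1024 keeps the comparison in bytes, the same unit as the stored values, and the strict comparison excludes an executor with exactly 4 GiB.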
