Practice - 5 Tasks
Answer the questions below.
1. Fill in the blank (easy)
Complete the code to read a streaming data source in Kappa architecture.
stream = spark.readStream.format([1]).load()
Common Mistakes
Using batch file formats like csv or json instead of streaming sources.
Confusing batch processing with streaming.
Explanation: In Kappa architecture, streaming data is typically read from Kafka, so the format should be 'kafka'.
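The completed read can be sketched as follows. The broker address `localhost:9092` and the topic name `events` are illustrative assumptions, not part of the exercise, and running this requires the Spark-Kafka connector package:

```python
# Sketch: reading a Kafka stream in Spark Structured Streaming.
# Assumes a running Kafka broker at localhost:9092 and a topic
# named "events" -- both hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("kappa-read")
    .getOrCreate()
)

# Blank [1] is filled with "kafka", the streaming source format.
stream = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "events")
    .load()
)
```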
2. Fill in the blank (medium)
Complete the code to write streaming data to a sink in Kappa architecture.
query = stream.writeStream.format([1]).start()
Common Mistakes
Using batch sinks like parquet or jdbc which are not suitable for streaming output.
Choosing 'memory' which is for in-memory tables, not direct streaming output.
Explanation: For testing or debugging, writing streaming output to the console is common in Kappa architecture, so the format should be 'console'.
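A minimal sketch of the completed write, assuming `stream` is a streaming DataFrame such as the Kafka source from task 1:

```python
# Sketch: writing a streaming DataFrame to the console sink,
# which prints each micro-batch for debugging.
query = (
    stream.writeStream
    .format("console")      # blank [1]: "console"
    .outputMode("append")
    .start()
)
query.awaitTermination()    # block until the query is stopped
```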
3. Fill in the blank (hard)
Fix the error in the code to correctly select the value field from Kafka streaming data.
values = stream.selectExpr("CAST([1] AS STRING)")
Common Mistakes
Selecting 'key' instead of 'value' which contains the message.
Selecting metadata fields like 'topic' or 'timestamp' instead of the message.
Explanation: Kafka streaming data stores the actual message in the 'value' field, which must be cast to a string before use.
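Sketched with the blank filled in, assuming `stream` is a Kafka source DataFrame as in task 1:

```python
# Sketch: extracting the Kafka message payload as a string.
# Kafka rows expose binary `key` and `value` columns; the payload
# lives in `value`, so it is cast before further processing.
values = stream.selectExpr("CAST(value AS STRING)")  # blank [1]: value
```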
4. Fill in the blank (hard)
Fill both blanks to filter streaming data for messages containing the word 'error'.
filtered = stream.filter([1].contains([2]))
Common Mistakes
Filtering on 'key' instead of 'value'.
Using the wrong filter string like 'warning'.
Explanation: Filter the 'value' field for messages containing the string 'error'.
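One way to express the completed filter, assuming `stream` has a string `value` column (e.g. after the cast in task 3); `col("value")` is one of several equivalent ways to reference the column:

```python
from pyspark.sql.functions import col

# Sketch: keep only messages whose payload contains 'error'.
# Blank [1] is the 'value' column; blank [2] is the string 'error'.
filtered = stream.filter(col("value").contains("error"))
```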
5. Fill in the blank (hard)
Fill all three blanks to create a streaming aggregation counting messages by key.
counts = stream.groupBy([1]).count().orderBy([2], [3])
Common Mistakes
Ordering by 'key' instead of 'count'.
Using ascending order instead of descending.
Explanation: Group by 'key', count messages, then order by 'count' descending so the most frequent keys appear first.
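The completed aggregation can be sketched as below, assuming `stream` is a streaming DataFrame with a `key` column. Note that sorting a streaming aggregation requires the "complete" output mode, and `col("count").desc()` is one way to express the descending order the two final blanks call for:

```python
from pyspark.sql.functions import col

# Sketch: count messages per key, most frequent keys first.
counts = (
    stream.groupBy("key")            # blank [1]: "key"
    .count()
    .orderBy(col("count").desc())    # blanks [2]/[3]: "count", descending
)

# Sorting requires emitting the full result table each trigger.
query = (
    counts.writeStream
    .format("console")
    .outputMode("complete")
    .start()
)
```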