We have a requirement where two incoming DataSets/DataFrames need to go through multiple operations (join, groupBy, etc.) to reach a final state.
For example, the incoming DataFrames are df1 and df2:
df3 = df1.groupBy("key").agg(...)
df4 = df3.join(df2)
...
Let's say that df7 is the final DataFrame that I need to send to writeStream.
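For concreteness, here is a minimal sketch of the kind of pipeline I mean (in Scala; the rate source, the column names, and the console sink are only placeholders for the real sources and sink):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder.appName("multi-step-streaming").getOrCreate()

// Two streaming inputs; the rate source only stands in for the real sources.
val df1 = spark.readStream.format("rate").load().withColumnRenamed("value", "key")
val df2 = spark.readStream.format("rate").load().withColumnRenamed("value", "key")

// Step 1: a streaming aggregation.
val df3 = df1.groupBy("key").agg(count("*").as("cnt"))

// Step 2: join the aggregated stream with the second stream.
val df4 = df3.join(df2, "key")

// ... further transformations until df7 ...
val df7 = df4

// As far as I can tell, starting the query is where the chain is rejected:
// a streaming aggregation followed by further stateful operations is not
// accepted in this straightforward form.
val query = df7.writeStream
  .outputMode("append")
  .format("console")
  .start()
query.awaitTermination()
```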
Questions:
- Is there a way to achieve this in Structured Streaming?
- What is the main reason this is not supported in a straightforward manner?
PS: I came across this question and a possible solution using flatMapGroupsWithState: Multiple aggregations in Spark Structured Streaming.
Can you please give an example of how the above scenario can be done using flatMapGroupsWithState? That covers my first question; my second question is not addressed by the link above.
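For reference, this is the rough shape of the flatMapGroupsWithState call as I understand it, reusing spark and df1 from the sketch above; RunningCount, KeyCount and updateCounts are names I made up for illustration, and I do not see how the join with df2 and the remaining steps would fit around it:

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}
import spark.implicits._

// Hypothetical state and output types for a running count per key.
case class RunningCount(count: Long)
case class KeyCount(key: Long, count: Long)

// Fold the rows seen in this trigger into the per-key state and emit the new count.
def updateCounts(
    key: Long,
    rows: Iterator[Row],
    state: GroupState[RunningCount]): Iterator[KeyCount] = {
  val previous = state.getOption.map(_.count).getOrElse(0L)
  val updated  = RunningCount(previous + rows.size)
  state.update(updated)
  Iterator(KeyCount(key, updated.count))
}

// Replace the groupBy().agg() step with explicit state management.
val df3 = df1
  .groupByKey(row => row.getAs[Long]("key"))
  .flatMapGroupsWithState[RunningCount, KeyCount](
    OutputMode.Append(), GroupStateTimeout.NoTimeout())(updateCounts _)
  .toDF()

// ...but it is not clear to me how the join with df2 and the remaining
// steps up to df7 should be expressed after this.
```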
Thanks in advance