The linked resources describe two different scenarios.
- The blog post discusses an upsert DataStream -> Table conversion.
- The documentation describes the inverse upsert Table -> DataStream conversion.
The following discussion is based on Flink 1.4.0 (Jan. 2018).
Upsert DataStream -> Table Conversion
Converting a DataStream into a Table by upserting on keys is not natively supported, but it is on the roadmap. Meanwhile, you can emulate this behavior using an append Table and a query with a user-defined aggregation function.
If you have an append Table Logins with the schema (user, loginTime, ip) that tracks logins of users, you can convert that into an upsert Table keyed on user with the following query:
SELECT user, LAST_VAL(loginTime), LAST_VAL(ip) FROM Logins GROUP BY user
The LAST_VAL aggregation function is a user-defined aggregation function that always returns the latest added value.
Native support for upsert DataStream -> Table conversion would work in essentially the same way, but with a more concise API.
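To make the idea concrete, here is a minimal plain-Java sketch of what LAST_VAL does and how grouping by user turns an append log into one row per key. It deliberately has no Flink dependency so that it runs on its own. The class names (LastVal, LastValAcc, UpsertEmulation) are made up for illustration. In an actual Flink job, LastVal would extend org.apache.flink.table.functions.AggregateFunction and be registered with the table environment before being used in the query; the method names below only mirror that contract.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Accumulator: holds the most recently added value (null if none yet).
class LastValAcc<T> {
    T last;
}

// Hypothetical stand-in for a Flink AggregateFunction<T, LastValAcc<T>>.
class LastVal<T> {
    public LastValAcc<T> createAccumulator() {
        return new LastValAcc<>();
    }

    // Called once per input row; simply overwrites the previous value.
    public void accumulate(LastValAcc<T> acc, T value) {
        acc.last = value;
    }

    public T getValue(LastValAcc<T> acc) {
        return acc.last;
    }
}

class UpsertEmulation {
    // Emulates:
    //   SELECT user, LAST_VAL(loginTime), LAST_VAL(ip) FROM Logins GROUP BY user
    // Each input row is {user, loginTime, ip}, in arrival order.
    static Map<String, List<String>> latestPerUser(List<String[]> logins) {
        LastVal<String> lastVal = new LastVal<>();
        Map<String, LastValAcc<String>> timeAccs = new LinkedHashMap<>();
        Map<String, LastValAcc<String>> ipAccs = new LinkedHashMap<>();

        for (String[] row : logins) {
            String user = row[0];
            // One accumulator per key and per aggregated column.
            lastVal.accumulate(
                timeAccs.computeIfAbsent(user, k -> lastVal.createAccumulator()), row[1]);
            lastVal.accumulate(
                ipAccs.computeIfAbsent(user, k -> lastVal.createAccumulator()), row[2]);
        }

        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String user : timeAccs.keySet()) {
            result.put(user, Arrays.asList(
                lastVal.getValue(timeAccs.get(user)),
                lastVal.getValue(ipAccs.get(user))));
        }
        return result;
    }
}
```

Each new login for an existing user overwrites that user's row, which is exactly the upsert behavior the query emulates.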
Upsert Table -> DataStream Conversion
Converting a Table into an upsert DataStream is not supported. This is also properly reflected in the documentation:
"Please note that only append and retract streams are supported when converting a dynamic table into a DataStream."
We deliberately chose not to support upsert Table -> DataStream conversions, because an upsert DataStream can only be processed if the key attributes are known. These depend on the query and are not always straightforward to identify. It would be the responsibility of the developer to make sure that the key attributes are correctly interpreted. Failing to do so would result in faulty programs. To avoid such problems, we decided not to offer the upsert Table -> DataStream conversion.
Instead, users can convert a Table into a retraction DataStream. Moreover, we support the UpsertStreamTableSink interface, which writes an upsert DataStream to an external system, such as a database or key-value store.
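For readers unfamiliar with retraction streams: in Flink, toRetractStream produces records that pair a boolean flag with a row, where true means "add this row" and false means "retract this previously emitted row". The following plain-Java sketch shows how a consumer could apply such messages without knowing any key attributes. It is an illustration only and does not use the Flink API; the names RetractSketch, msg and materialize are hypothetical, and Map.Entry stands in for Flink's Tuple2<Boolean, Row>.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RetractSketch {
    // Builds one (flag, row) message; the row is a list of field values.
    static Map.Entry<Boolean, List<Object>> msg(boolean isAdd, Object... fields) {
        return new AbstractMap.SimpleEntry<>(isAdd, Arrays.asList(fields));
    }

    // Applies add/retract messages in order and returns the resulting
    // multiset of rows (row -> number of occurrences).
    static Map<List<Object>, Integer> materialize(
            List<Map.Entry<Boolean, List<Object>>> messages) {
        Map<List<Object>, Integer> table = new HashMap<>();
        for (Map.Entry<Boolean, List<Object>> m : messages) {
            if (m.getKey()) {
                // Add message: insert the row or bump its count.
                table.merge(m.getValue(), 1, Integer::sum);
            } else {
                // Retract message: remove one occurrence of the row.
                table.computeIfPresent(m.getValue(), (row, n) -> n > 1 ? n - 1 : null);
            }
        }
        return table;
    }
}
```

For a query such as a per-user login count, an update for a user arrives as a retraction of the old row followed by an addition of the new one, so the consumer never has to guess which fields form the key.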