I'm streaming data into a BigQuery table by building an InsertAllRequest, which is then inserted using the insertAll method of com.google.cloud.bigquery.BigQuery. I got it all to work in the sense that I can insert data into the table, but I'm after a specific behavior: I'd like to implement some kind of "composite key" in the table.
Here's what the table looks like:
Field name | Type | Mode
--------------------------------------
order_id | STRING | REQUIRED
modified_ts | TIMESTAMP | REQUIRED
order_sum | INTEGER | NULLABLE
order_reference | STRING | NULLABLE
So, I'd like the key to be order_id and modified_ts; in other words, I'd like to be able to track changes to an order over time. If an existing key is inserted again, I'd hope for an error, though simply ignoring the new row (treating it as a duplicate) would work fine for me as well.
Unfortunately, I haven't yet succeeded in telling BigQuery to do so. Here's the code I tested:
String rowId = String.valueOf("order_id, modified_ts");
InsertAllRequest req = InsertAllRequest.newBuilder(ORDER)
    .addRow(rowId, mapOrder(o, modifiedTs))
    .build();
InsertAllResponse resp = bigQuery.insertAll(req);
log.info("response was: {}", resp.toString());
ORDER in newBuilder is a TableId object, and mapOrder(o, modifiedTs) maps the incoming object to a Map<String, Object>.
Everything works fine if I define rowId as String.valueOf("order_id"), but obviously all updates of an order then just update the existing row, so no history is generated. The attempt above with comma-separated column names behaves the same way, simply ignoring modified_ts.
So, my question is simply: how can I get this to work? What I want is - somewhat simplified - the following:
order_id | modified_ts | data
------------------------------------------
1 | 2020-12-10 | some data
1 | 2020-12-15 | some changed data
2 | 2020-12-15 | some more data
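One direction that might produce the rows above, sketched under an assumption I have not verified against my table: the first argument of addRow is treated as an opaque per-row ID compared by value, not as a list of column names. If so, the ID would have to be built from the values of order_id and modified_ts. The class CompositeRowId, its method of, and the "|" separator are hypothetical names chosen for illustration only.

```java
import java.time.Instant;

public class CompositeRowId {

    // Hypothetical helper: joins the VALUES of order_id and modified_ts
    // (not the column names) into one string usable as a row ID.
    // The "|" separator is an arbitrary choice; it assumes order IDs
    // never contain that character.
    static String of(String orderId, Instant modifiedTs) {
        return orderId + "|" + modifiedTs;
    }

    public static void main(String[] args) {
        String first = of("1", Instant.parse("2020-12-10T00:00:00Z"));
        String changed = of("1", Instant.parse("2020-12-15T00:00:00Z"));
        String repeated = of("1", Instant.parse("2020-12-10T00:00:00Z"));

        // Same order, different timestamp: the IDs differ.
        System.out.println(first.equals(changed));
        // Same order, same timestamp: the IDs are identical.
        System.out.println(first.equals(repeated));
    }
}
```

With such a helper, the call in my snippet would read .addRow(CompositeRowId.of(orderId, modifiedTs), mapOrder(o, modifiedTs)), where orderId stands for however the order's ID is obtained from o. As far as I understand, this ID only drives best-effort de-duplication of streamed rows, so it would probably not act as a strict key constraint.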