Optimizing Stragglers in Google Cloud Dataflow

17/11/2017

I’m currently benchmarking Flink against Google Cloud Dataflow using the same Apache Beam pipeline for quantitative analytics. One thing I’ve noticed with Flink is high tail latency on some shards, i.e. stragglers that finish well after the rest.

Google Cloud Dataflow can optimise away stragglers in large jobs using “Dynamic Work Rebalancing”, which reassigns unprocessed work from slow workers to idle ones while the job is running. As far as I know, Flink is currently unable to perform similar optimisations.
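
To get a feel for why this matters, here is a toy simulation in Python. It is only a sketch under simplifying assumptions, not Dataflow's actual algorithm: one shard per worker, a fixed time step, and idle workers that take half of the busiest worker's remaining work. The function names and the `step` parameter are made up for illustration.

```python
def static_makespan(shard_costs):
    """Static sharding: one worker per shard, so the job ends with the slowest shard."""
    return max(shard_costs)


def rebalanced_makespan(shard_costs, step=1.0):
    """Toy rebalancing model (an assumption, not Dataflow's real scheduler).

    On every tick, each idle worker takes half of the remaining work
    from whichever worker currently has the most left to do.
    """
    remaining = [float(cost) for cost in shard_costs]
    elapsed = 0.0
    while any(r > 0 for r in remaining):
        for i in range(len(remaining)):
            if remaining[i] <= 0:
                busiest = max(range(len(remaining)), key=lambda j: remaining[j])
                # Only split work that would take more than one tick to finish.
                if remaining[busiest] > step:
                    half = remaining[busiest] / 2
                    remaining[busiest] -= half
                    remaining[i] = half
        # Every worker makes one tick of progress on whatever it holds.
        remaining = [max(0.0, r - step) for r in remaining]
        elapsed += step
    return elapsed


if __name__ == "__main__":
    costs = [10, 10, 10, 100]  # three quick shards and one straggler
    print("static:    ", static_makespan(costs))      # 100
    print("rebalanced:", rebalanced_makespan(costs))  # close to the ideal 130 / 4 = 32.5
```

With static sharding the whole job waits on the single slow shard. In the toy model the idle workers pick up the straggler's backlog, so the finish time gets close to total work divided by the number of workers.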
