KAFKA VS GOOGLE CLOUD PUBSUB: 2020 COMPARISON

| Feature | Confluent Cloud Kafka | Google Cloud Pubsub | Notes |
| --- | --- | --- | --- |
| Data Retention | Set retention per topic in Confluent Cloud, including unlimited retention with log compaction. | Retains unacknowledged messages in persistent storage for 7 days from the moment of publication. There is no limit on the number of retained messages. | Have to write a custom subscriber/publisher to save beyond 7 days [L] + ongoing BAU [S] |
| Replay | A consumer requests an “offset”; however, the retention period is dictated by the broker config. | “Snapshots” can be created for later replay, but these are limited to 7 days as per the retention policy. | As per above, a custom subscriber/publisher is needed to save/replay messages [L] + ongoing BAU [S] |
| Message Ordering | Yes, within a partition. In general, messages are written to the broker in the same order that they are received by the producer client. | No. Pub/Sub provides a highly-available, scalable message delivery service. The tradeoff for having these properties is that the order in which messages are received by subscribers is not guaranteed. While the lack of ordering may sound burdensome, there are very few use cases that actually require strict ordering. | |
| Delivery Semantics | Exactly-once delivery semantics | At-least-once; exactly-once is possible with Dataflow | |
| Latency | Advertised as being able to “Achieve sub 30 ms latency at scale”; no mention of this in the SLA. | Pub/Sub does not guarantee message delivery latency | |
| Uptime | “Service Level” Monthly Uptime Percentage of at least 99.95%. Is this 99.95% of GCP’s 99.95%? | “Service Level Objective” Monthly Uptime Percentage to Customer of at least 99.95% | Both offer account credits if the target is not met, but this is unlikely to be suitable for an enterprise org |
| Schema Registry | Yes, for Avro schemas, and very new (GA August 2019) | No. Data Catalog is in beta and could be used to build one [XL] | |
| IAM / ACL | “Preview” for Role/SAML ACLs. You are provided auth keys that you need to store/share/rotate. These could be stored in Cloud KMS; however, this would need to be automated [XL] + ongoing BAU [M] | Standard Google IAM | |
| Encryption | Yes, in transit and at rest, with NO payload encryption. Clients are responsible for writing custom encryption/decryption connecting to (e.g.) Cloud KMS. A custom library would need to be written and used by everyone for publishing and subscribing [L] | Cloud KMS (HSM/Software/BYOK/External Key Manager) with CMEK | |
| VPC Security | Unknown/No. Can Confluent Cloud be made to respect VPC Service Controls? | VPC Service Controls protection applies to all push and pull operations except existing Pub/Sub push subscriptions | |
| Stream Processing | “fully-managed KSQL”, no Kafka Streams. Would have to run a Kafka Streams/Storm cluster connecting to Confluent Cloud, which is likely to introduce latency | Apache Beam / Cloud Dataflow, fully managed | |
| Costs per 130GB | $37 | $39 | Based on the example calculations in the Confluent and Google pricing calculators |
| Priority Queues | Yes | No, but can segregate by topic | |
| Multi-zone high availability | Not advertised, “Contact Confluent” | Yes | |
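
The "Delivery Semantics" row matters most on the consuming side: with at-least-once delivery a subscriber can see the same message more than once. Below is a minimal sketch of one common mitigation, deduplicating on a message ID before processing. It does not use either vendor's client library; `DedupingHandler`, `max_tracked` and the sample message IDs are hypothetical names chosen for illustration, and a real deployment would need the seen-ID store to survive restarts.

```python
from collections import OrderedDict


class DedupingHandler:
    """Wraps a handler so that redelivered messages are processed only once."""

    def __init__(self, handler, max_tracked=10_000):
        self._handler = handler
        self._seen = OrderedDict()  # message_id -> None, oldest first
        self._max_tracked = max_tracked  # assumed bound on remembered IDs

    def handle(self, message_id, payload):
        """Return True if the payload was processed, False if it was a duplicate."""
        if message_id in self._seen:
            return False
        self._handler(payload)
        self._seen[message_id] = None
        if len(self._seen) > self._max_tracked:
            self._seen.popitem(last=False)  # evict the oldest tracked ID
        return True


# Simulated at-least-once delivery: message "m1" arrives twice.
processed = []
dedup = DedupingHandler(processed.append)
results = [
    dedup.handle(msg_id, body)
    for msg_id, body in [("m1", "a"), ("m2", "b"), ("m1", "a")]
]
```

The bounded `OrderedDict` keeps memory use predictable at the cost of forgetting very old IDs, so a duplicate that arrives after more than `max_tracked` other messages would be processed again.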

FINDING GOOGLE CLOUD IP RANGES: REFERENCE GUIDE

GCP Cloud IPs by region 

IP range with geolocation

More info here

MULTI-LANGUAGE PIPELINES WITH APACHE BEAM

Apache Beam is an open-source unified programming model and framework for defining and executing big data processing pipelines. It provides a way to write data processing code that is portable across different execution engines or runtimes, such as Apache Flink, Apache Spark, Google Cloud Dataflow, and more.

GETTING STARTED WITH TERRAFORM CLOUD DEVELOPMENT KIT

Terraform’s Cloud Development Kit (CDK) lets you define your cloud infrastructure in general-purpose programming languages rather than HCL.

TERRAFORM 0.13: KEY FEATURES AND IMPROVEMENTS

No more copying and pasting modules.

DETECTING CRYPTO MINERS IN KUBEFLOW

“During April, we observed deployment of a suspect image from a public repository on many different clusters. The image is ddsfdfsaadfs/dfsdf:99. By inspecting the image’s layers, we can see that this image runs an XMRIG miner:” Source

OPTIMIZE DATAFLOW FOR REAL-TIME AND AGGREGATE DATA

A great way to split up your pipeline based on the urgency of results: aggregate-data-with-dataflow

CALLING NATIVE LIBRARIES FROM JAVA

When you need to call native libraries from Java, there are several approaches available. Let’s explore each option with its pros and cons.

HOW TO CREATE YOUR OWN CRYPTOCURRENCY

To create your own cryptocurrency, you will need to:

  1. Create a blockchain. This is the underlying technology that will support your cryptocurrency. There are many different blockchain platforms available, such as Ethereum, Bitcoin, and EOS.
  2. Design your cryptocurrency. This includes deciding on the name, symbol, total supply, and distribution method. You will also need to create a mining algorithm.
  3. Create a wallet. This is where your cryptocurrency will be stored. There are many different wallets available, both hardware and software.
  4. Mine your cryptocurrency. This is the process of adding new blocks to the blockchain and earning rewards in the form of your cryptocurrency.
  5. List your cryptocurrency on an exchange. This will allow people to buy and sell your cryptocurrency.
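
Steps 1 and 4 above can be illustrated with a toy example. The following is a minimal sketch of a hash-linked chain with proof-of-work mining, not how Ethereum, Bitcoin or EOS actually implement it. The function names, the `difficulty` parameter, the block layout and the sample transaction are all assumptions made for illustration.

```python
import hashlib
import json


def block_hash(index, prev_hash, data, nonce):
    """SHA-256 over the block's contents, returned as a hex string."""
    payload = json.dumps([index, prev_hash, data, nonce]).encode()
    return hashlib.sha256(payload).hexdigest()


def mine_block(index, prev_hash, data, difficulty=3):
    """Try nonces until the hash starts with `difficulty` zero hex digits."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = block_hash(index, prev_hash, data, nonce)
        if digest.startswith(prefix):
            return {
                "index": index,
                "prev_hash": prev_hash,
                "data": data,
                "nonce": nonce,
                "hash": digest,
            }
        nonce += 1


def is_valid_chain(chain, difficulty=3):
    """Check every block's hash, its proof of work, and its link to the previous block."""
    prefix = "0" * difficulty
    for i, block in enumerate(chain):
        expected = block_hash(
            block["index"], block["prev_hash"], block["data"], block["nonce"]
        )
        if block["hash"] != expected or not expected.startswith(prefix):
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True


# Mine a genesis block, then one block carrying a hypothetical transaction.
genesis = mine_block(0, "0" * 64, "genesis")
chain = [genesis, mine_block(1, genesis["hash"], {"to": "alice", "amount": 50})]
chain_ok = is_valid_chain(chain)
```

Because each block stores the previous block's hash, changing the data in any block invalidates its own hash and breaks the link from every later block, which is what makes the history tamper-evident.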

Here are some of the steps involved in minting your own cryptocurrency:

FIXING GCP IAM PERMISSION ISSUES AFTER OUTAGES

After the recent GCP outage related to IAM, I found some odd behaviour with gsutil/gcloud. A script that had faithfully run for many months stopped working with: