Posts

Reverse engineering an existing GCP project with terraformer

It can be tough to try to reverse engineer an existing project that has never used terraform. Terraformer can look at an existing project and generate the corresponding terraform code for you. I tried it out on an existing legacy project which used Google Cloud Storage, BigQuery and various service accounts. The setup was a little tricky so I put together a script to simply things. The script assumes you have gcloud setup or a service account key/impersonation and you may need to adjust the –resources parameter.

Posts

AWS vs Azure vs GCP

Posts

Raspberry Pi/Raspbian - chromium/chromedriver crash after upgrade to 99.0.4844.51

Upgraded to chromedriver 99.0.4844.51 on Raspbian(bullseye) and seeing this in your chromedriver.log?

[0312/111354.689372:ERROR:egl_util.cc(74)] Failed to load GLES library: /usr/lib/chromium-browser/libGLESv2.so: /usr/lib/chromium-browser/libGLESv2.so: cannot open shared object file: No such file or directory [0312/111354.709636:ERROR:viz_main_impl.cc(188)] Exiting GPU process due to errors during initialization [0312/111354.735541:ERROR:gpu_init.cc(454)] Passthrough is not supported, GL is disabled, ANGLE is

Add “–disable-gpu” as an option when setting up the browser. e.g. for selenium/java:

ChromeOptions options = new ChromeOptions() options.addArguments("–disable-gpu")

It looks like the behaviour has changed as this “shouldn’t” be required. More about flags here

Posts

Undelete bigquery table

One hour ago:

bq cp mydataset.table@-3600000 mydataset.table_restored

Absolute (ms since UNIX epoch) GMT: Wednesday, 26 May 2021 13:41:53 = 1622036513000 https://www.epochconverter.com/

bq cp mydataset.table@1622036513000 mydataset.table_restored

Java 17 Features

Pseudo-Random Number Generators (PRNGs) are getting a major update in Java with the release of JEP 356. New interfaces and implementations make it easier to use different algorithms interchangeably and offer better support for stream-based programming. This is a great improvement for Java developers who require randomness in their applications.

The JDK is constantly evolving and improving, and part of that process is ensuring that internal APIs are properly encapsulated. JEP 403 represents a step in that direction, by removing the –illegal-access flag. This will prevent JDK users from accessing internal APIs, except for critical ones like sun.misc.Unsafe.

Posts

Bigquery row level security

To enable BigQuery row-level security, you can follow these steps:

Go to the BigQuery console and select the Datasets tab.
Click the name of the dataset that you want to enable row-level security for.
Click the Security tab.
Click the Row-level security tab.
Click the Create policy button.
In the Create policy dialog, enter a name for your policy.
In the Principals section, select the users or groups that you want to grant access to the data.
In the Filter section, enter a SQL filter that defines the rows that the users or groups will be able to access.
Click the Create button.

Once you have created a row-level security policy, users or groups that are not included in the policy will not be able to access the data in the table.

Posts

Confluent Cloud Kafka vs Google Cloud Pubsub Feature compare 2020

Feature	Confluent Cloud Kafka	Google Cloud Pubsub	Notes
Data Retention	Set retention per topic in Confluent Cloud, including unlimited retention with log compaction.	Retains unacknowledged messages in persistent storage for 7 days from the moment of publication. There is no limit on the number of retained messages. Have to write custom subscriber/publisher to save beyond 7 days [L] + ongoing BAU [S]
Replay	A consumer request an “offset”, however the retention period is dictated by the broker config	“Snapshots” can be created for later replay by these are limited to 7 days as per retention policy. As per above, a custom subscriber/publisher to save/replay message. [L] + ongoing BAU [S]
Message Ordering	Yes, within a partition, In general, messages are written to the broker in the same order that they are received by the producer client.	No, Pub/Sub provides a highly-available, scalable message delivery service. The tradeoff for having these properties is that the order in which messages are received by subscribers is not guaranteed. While the lack of ordering may sound burdensome, there are very few use cases that actually require strict ordering.
Delivery Semantics	Exactly-once delivery semantics	At-least-once, exactly once possible with dataflow
Latency	Advertised at being able to “Achieve sub 30 ms latency at scale”, no mention of this in SLA.	Pub/Sub does not guarantee message delivery latency
Uptime	“Service Level” Monthly Uptime Percentage of at least 99.95%.Is this 99.95% of GCP’s 99.95%?	“Service Level Objective” Monthly Uptime Percentage to Customer of at least 99.95%	Both offer credits to account if not met but this is unlikely to be suitable for an enterprise org
Schema Registry	Yes for Avro schemas and very new (GA August 2019)	No, Data Catalog is in beta and could be used to build one [XL]
IAM / ACL	“Preview” for Role/SAML, ACLs You are provided auth keys that you need to store/share/rotate. These could be stored in Cloud KMS, however, this would need to be automated [XL] + ongoing BAU [M]	Standard Google IAM
Encryption	Yes in transit and at rest with NO payload encryption. Clients are responsible for writing custom encryption/decryption connecting to (e.g) Cloud KMS. A custom library would need to be written that is used by everyone for publishing and subcribing [L]	Cloud KMS(HSM/Software/BYOK/External Key Manager) with CMEK
VPC Security	Unknown/No, can Confluent Cloud be made to respect VPC service controls?	VPC Service Controls protection applies to all push and pull operations except existing Pub/Sub push subscriptions
Stream Processing	“fully-managed KSQL”, no kafka streams, would have to run Kafka Streams/Storm cluster connecting to Confleunt Cloud which is likely to introduce latency	Apache Beam / Cloud Dataflow fully managed
Costs per 130GB	$37	$39	Based on example calc on confluent and google pricing calc
Priority Queues	Yes	No, but can segregate by topic
Multi-zone high availability	Not advertised “Contact Confluent”	Yes

Posts

GCP ip ranges

GCP Cloud IPs by region

IP range with geolocation

More info here

Posts

multi language beam pipelines

Apache Beam is an open-source unified programming model and framework for defining and executing big data processing pipelines. It provides a way to write data processing code that is portable across different execution engines or runtimes, such as Apache Flink, Apache Spark, Google Cloud Dataflow, and more.

Apache Beam’s portability framework allows you to write your data processing logic once and then run it on different execution engines without modifying the code. This eliminates the need to rewrite or refactor the code for each specific execution engine, saving time and effort.

Posts

Terraform Cloud Development Kit

Terraform’s Cloud Development Kit (CDK) let’s you use other languages to define your cloud infra.

https://github.com/hashicorp/terraform-cdk/blob/master/examples/