Managing Flink Jobs
The DA Platform is a huge step forward for running Flink at scale. I was lucky enough to see a demo and was really impressed. Far more advanced than what can be achieved with Dataflow at the moment.
How to create an effective SRE culture
Here are some tips on how to create an effective SRE culture:
- Start with the right mindset. SRE treats reliability as everyone’s responsibility, not just the SRE team’s. It is important to create a culture where everyone is empowered to take ownership of reliability and to make decisions that improve the reliability of the systems they work on.
- Embrace failure. Failure is inevitable, so it is important to create a culture where failure is seen as an opportunity to learn and improve. The SRE team should be empowered to experiment and to take risks, knowing that they will not be punished for failure.
- Promote collaboration. SRE is a team sport, so it is important to create a culture where collaboration is encouraged. The SRE team should work closely with other teams, such as development, operations, and security, to ensure that the systems are reliable.
- Automate everything. Automation is essential for SRE. By automating tasks, the SRE team can free up time to focus on more strategic work. It is also important to automate the collection of data so that the SRE team can have a clear understanding of the health of the systems.
- Measure everything. SRE is data-driven, so it is important to measure everything. The SRE team should collect data on the performance of the systems, the number of incidents, and the time it takes to resolve incidents. This data can be used to identify areas where improvements can be made (see the error-budget sketch after this list).
- Celebrate successes. It is important to celebrate successes, both big and small. This will help to keep the SRE team motivated and to create a positive culture.
By following these tips, you can create an effective SRE culture that will help to improve the reliability of your systems.
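To make “measure everything” concrete, here is a minimal sketch of the error-budget arithmetic many SRE teams track. The 99.9% availability target and 30-day window are assumptions for illustration, not values from this post.
Python
# Error-budget arithmetic: how much downtime a 99.9% SLO allows per 30 days.
# The SLO target and window are illustrative assumptions.
slo = 0.999                      # availability target
window_minutes = 30 * 24 * 60    # 30-day window = 43,200 minutes

error_budget_minutes = (1 - slo) * window_minutes
print(f"Error budget: {error_budget_minutes:.1f} minutes per 30 days")
# -> Error budget: 43.2 minutes per 30 days

Burning through the budget faster than expected is the data-driven signal to slow feature work and invest in reliability.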
How to deliver microservices
Here are some tips on how to deliver reliable, high-throughput, low-latency (micro)services:
- Design your services for reliability. This means designing your services to be fault-tolerant, scalable, and resilient. You can do this by using techniques such as redundancy, load balancing, and caching (a retry-with-backoff sketch follows this list).
- Use the right tools and technologies. There are a number of tools and technologies that can help you to deliver reliable, high-throughput, low-latency microservices. These include messaging systems, load balancers, and caching solutions.
- Automate your deployments. Automated deployments can help you to quickly and easily deploy new versions of your microservices. This can help to improve reliability by reducing the risk of human errors.
- Monitor your services. It is important to monitor your services so that you can identify and address problems quickly. You can use a variety of monitoring tools to collect data on the performance of your services.
- Respond to incidents quickly. When incidents occur, it is important to respond quickly to minimize the impact on your users. You should have a process in place for responding to incidents that includes identifying the root cause of the problem and taking steps to fix it.
By following these tips, you can deliver reliable, high-throughput, low-latency microservices.
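As a small illustration of the fault-tolerance point above, here is a minimal retry-with-exponential-backoff sketch in Python. fetch_quote is a hypothetical flaky downstream call invented for the example, not a real API.
Python
import random
import time

def call_with_retries(fn, attempts=3, base_delay=0.1):
    # Retry a flaky call with exponential backoff plus jitter,
    # re-raising the last error once the attempts are exhausted.
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.05))

def fetch_quote():
    # Hypothetical stand-in for a network call that times out half the time
    if random.random() < 0.5:
        raise TimeoutError("downstream timed out")
    return {"price": 101.25}

print(call_with_retries(fetch_quote))

In a real service you would retry only idempotent operations, and combine retries with deadlines and circuit breakers so they cannot amplify an outage.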
Predict the stock market
The premise was simple. Use “big” data analytics and machine learning models to predict the movement of stock prices. However, we had really “dirty” data, and our Data Scientists were struggling to separate the noise from the signals. We spent a lot of time cleaning the data and introducing good old principles like “how can I run the model somewhere other than a laptop?”. This was a true startup, a bunch of people in a room trying to get stuff working. No red tape, no calling the “helpdesk” to sort out your IT problems (I actually was the helpdesk).
Delta risk
QuantLib is a free and open-source software library for quantitative finance. It provides a wide range of functionality for pricing and risk-managing financial derivatives, including interest rate swaps.
To calculate the delta risk of an interest rate swap in Python using QuantLib, you can follow these steps:
- Import the necessary QuantLib modules:
Python
import QuantLib as ql
- Create a QuantLib YieldTermStructure object to represent the current interest rate curve.
- Build and price the swap, then bump the curve by one basis point and reprice; the change in NPV is the delta.
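A minimal end-to-end sketch of these steps follows. The evaluation date, the flat 3% curve, and the 5-year swap terms are illustrative assumptions, not figures from this post.
Python
import QuantLib as ql

# Illustrative evaluation date and a flat 3% curve standing in for a
# real bootstrapped curve
today = ql.Date(15, ql.June, 2023)
ql.Settings.instance().evaluationDate = today

rate = ql.SimpleQuote(0.03)
curve = ql.FlatForward(today, ql.QuoteHandle(rate), ql.Actual365Fixed())
curve_handle = ql.YieldTermStructureHandle(curve)

# A 5-year payer swap: 3% fixed versus Euribor 6M, forecast and
# discounted off the same curve
index = ql.Euribor6M(curve_handle)
swap = ql.MakeVanillaSwap(ql.Period('5Y'), index, 0.03, ql.Period('0D'))
swap.setPricingEngine(ql.DiscountingSwapEngine(curve_handle))

base_npv = swap.NPV()

# Bump the whole curve up by 1bp and reprice; the change in NPV is the
# swap's delta (its DV01, per unit notional here)
rate.setValue(0.0301)
delta = swap.NPV() - base_npv
print(f"Base NPV: {base_npv:.6f}  Delta per 1bp: {delta:.6f}")

In practice you would bump each pillar of a bootstrapped curve separately to get bucketed deltas, rather than the single parallel-shift number shown here.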
Taming the stragglers in Google Cloud Dataflow
I’m currently benchmarking Flink against Google Cloud Dataflow using the same Apache Beam pipeline for quantitative analytics. One thing I’ve observed with Flink is the tail latency associated with some shards.
Google Cloud Dataflow can optimise away stragglers in large jobs using “Dynamic Work Rebalancing”. As far as I know, Flink is currently unable to perform similar optimisations.
Crypto - why?
The point of cryptocurrency is to provide a decentralized, secure, and efficient way to transfer value. Cryptocurrencies are not issued by any central authority, such as a government or bank, and they are not backed by any physical asset. Instead, they are created and maintained by a network of computers that are running a special software program. This software program is designed to verify and record cryptocurrency transactions, and to prevent fraud.
Latency Sensitive Microservices
Great talk by Peter Lawrey on latency in microservices. https://www.infoq.com/presentations/latency-sensitive-microservices/
Differences between Beam and Flink
Apache Beam vs. Apache Flink: Choosing the Right Distributed Processing Framework
Apache Beam and Apache Flink are both powerful open-source frameworks for distributed data processing, enabling efficient handling of massive datasets. While they share the common goal of parallel data processing, they differ significantly in their architecture, programming model, and execution strategies. Understanding these differences is crucial for choosing the right tool for your specific needs. This article will help you navigate the decision-making process.
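To make the programming-model difference concrete, here is a minimal sketch of a Beam pipeline: the pipeline definition is runner-agnostic, so the same code can execute on Flink, on Dataflow, or locally just by changing the runner option. The tiny word-count logic is illustrative only.
Python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Swap 'DirectRunner' for 'FlinkRunner' or 'DataflowRunner' to move the
# same pipeline onto a different execution engine
options = PipelineOptions(runner='DirectRunner')

with beam.Pipeline(options=options) as p:
    (p
     | 'Read' >> beam.Create(['flink', 'beam', 'flink'])
     | 'Pair' >> beam.Map(lambda word: (word, 1))
     | 'Count' >> beam.CombinePerKey(sum)
     | 'Print' >> beam.Map(print))

Flink, by contrast, is an execution engine with its own native APIs; Beam acts as the portability layer that can target Flink as one of several runners.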
Pushing the limits of the Google Cloud Platform
This one is best explained by the presentation below. If you want to learn how to run quantitative analytics at scale, it’s well worth a watch.