The premise was simple. Use “big” data analytics and machine learning models to predict the movement of stock prices. However, we had really “dirty” data and our Data Scientists were struggling to separate the noise from the signals. We spent a lot of time cleaning the data and introducing good old engineering principles like “how can I run the model somewhere other than a laptop?”. This was a true startup, a bunch of people in a room trying to get stuff working. No red tape, no calling the “helpdesk” to sort out your IT problems (I actually was the helpdesk).
I helped with the design and development of critical components needed for data collection and signal generation. The models were taking ~8 hours to backtest, which was a real bottleneck. Using the power of Google Cloud we got this down to ~20 minutes. It could have run even faster on higher-spec VMs, but this was the sweet spot for us in terms of cost.
The models were written in R, and I quickly understood enough to see that they were built around one gigantic for-each loop. They needed to be modified to allow for parallelism (cue an install of Shiny and some throwbacks to understand this Lisp descendant).
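To give a flavour of the kind of rewrite involved, here's a minimal sketch using R's `foreach` and `doParallel` packages. The names `run_backtest` and `symbols` are hypothetical stand-ins for our actual models and universe, not the real code:

```r
library(foreach)
library(doParallel)

# Hypothetical stand-in for one iteration of the original serial loop
run_backtest <- function(symbol) {
  # ... load prices, apply the model, score the signals ...
  data.frame(symbol = symbol, sharpe = runif(1))
}

symbols <- c("AAPL", "GOOG", "MSFT")

# Original shape: one gigantic serial loop
# results <- list()
# for (s in symbols) results[[s]] <- run_backtest(s)

# Parallel shape: spread independent iterations across all cores on the VM
cl <- makeCluster(detectCores() - 1)
registerDoParallel(cl)

results <- foreach(s = symbols, .combine = rbind) %dopar% {
  run_backtest(s)
}

stopCluster(cl)
```

Because each backtest iteration was independent, this kind of restructuring scales almost linearly with cores on a beefy GCP VM, which is roughly where the 8-hours-to-20-minutes win came from.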
I also coached colleagues in GCP, Excel, G-Suite, Git and Bash. Yes, Excel. We were using a Bloomberg plugin for Excel that needed some TLC (Tender Loving Care) to get it working correctly.
We also used some Java and Spring Boot for the mini front-, middle- and back-office work.