Stock Market Prediction With Machine Learning
We set out to build equity-trading signals using large, messy datasets and machine learning models. Initial prototypes ran on laptops and took eight hours to backtest a single strategy.
Contributions
- Designed cloud-native ingestion pipelines to clean and normalise vendor data (Bloomberg, alternative datasets).
- Parallelised the R backtests across Google Compute Engine instances, cutting runtime from eight hours to about 20 minutes.
- Stood up supporting services in Java/Spring Boot for portfolio control, audit logging, and operations dashboards.
- Coached the team on Git, Bash, Google Workspace, and GCP fundamentals to improve day-to-day velocity.
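The parallel backtest idea can be sketched roughly as follows. This is an illustration in Python, not the project's R code (which used R's parallel package): the moving-average crossover strategy, the synthetic PRICES series, and the names backtest_one and run_sweep are all assumptions made for the example. The point it tries to show is that each parameter combination is an independent task, so the sweep can be farmed out to worker processes.

```python
import math
from concurrent.futures import ProcessPoolExecutor

# Synthetic, deterministic price path used only for illustration (not real data).
PRICES = [100 + 10 * math.sin(i / 5) + 0.1 * i for i in range(250)]


def sma(values, window, end):
    """Simple moving average of the `window` values ending at index `end` (inclusive)."""
    return sum(values[end - window + 1:end + 1]) / window


def backtest_one(params):
    """Backtest one (fast, slow) moving-average crossover on PRICES.

    Hypothetical strategy. The function only reads its argument and the
    read-only PRICES constant, so runs do not share mutable state.
    """
    fast, slow = params
    if fast >= slow:
        raise ValueError("fast window must be shorter than slow window")
    pnl = 0.0
    for t in range(slow - 1, len(PRICES) - 1):
        # Hold one unit over the next step when the fast average is above the slow one.
        position = 1 if sma(PRICES, fast, t) > sma(PRICES, slow, t) else 0
        pnl += position * (PRICES[t + 1] - PRICES[t])
    return fast, slow, round(pnl, 4)


def run_sweep(grid, workers=None):
    """Run every parameter pair in `grid`; results come back in grid order.

    workers=0 runs serially (handy for debugging); any other value is passed
    to ProcessPoolExecutor as max_workers (None lets Python pick a default).
    """
    if workers == 0:
        return [backtest_one(p) for p in grid]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(backtest_one, grid))
```

When using the process pool, call run_sweep from under an `if __name__ == "__main__":` guard so worker processes can import the module safely. The serial path (workers=0) returns the same results and is a convenient baseline for checking that the parallel run has not changed the numbers.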
Technology Stack
- Compute: Google Cloud Compute Engine, managed instance groups
- Analytics: R (parallel and Shiny packages), Python utilities for data prep
- Tooling: Git, Cloud Storage, Bloomberg Excel add-ins for data sanity checks
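As an illustration of the kind of Python data-prep utility listed above, here is a minimal sketch of deduplication plus basic record validation. The Bar record, its fields, and the specific checks are assumptions for the example; the actual vendor schemas and rules are not described in this summary.

```python
import math
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Bar:
    """Hypothetical daily price record; field names are illustrative."""
    ticker: str
    day: date
    close: float
    volume: int


def deduplicate(bars):
    """Keep the last record seen for each (ticker, day) key.

    Output order follows the first appearance of each key.
    """
    latest = {}
    for bar in bars:
        latest[(bar.ticker, bar.day)] = bar
    return list(latest.values())


def validate(bar):
    """Return a list of problems found in one record (empty list means it passed)."""
    problems = []
    if not bar.ticker.strip():
        problems.append("empty ticker")
    if not math.isfinite(bar.close) or bar.close <= 0:
        problems.append("non-positive or non-finite close")
    if bar.volume < 0:
        problems.append("negative volume")
    return problems


def clean(bars):
    """Deduplicate, then split records into (valid, rejected-with-reasons)."""
    valid, rejected = [], []
    for bar in deduplicate(bars):
        problems = validate(bar)
        if problems:
            rejected.append((bar, problems))
        else:
            valid.append(bar)
    return valid, rejected
```

Returning the rejected records together with their reasons, rather than silently dropping them, keeps the cleaning step auditable.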
Lessons Learned
- Data hygiene beats model complexity; invest early in deduplication and feature validation pipelines.
- Parallelism only helps when tasks avoid shared state; refactor loops into vectorised or embarrassingly parallel form.
- Balance cost vs. speed: tune VM shapes to hit an optimal price/performance point before scaling out.
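The cost-versus-speed trade-off in the last point comes down to simple arithmetic: cost per backtest is the hourly price times the runtime in hours. A minimal sketch, assuming you have measured the runtime of one backtest on each candidate VM shape. The Shape class, the cheapest_within helper, and any prices you plug in are hypothetical, not actual GCP figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shape:
    """A candidate VM shape with a hypothetical price and a measured runtime."""
    name: str
    hourly_usd: float   # price per hour (assumed input, not a real quote)
    runtime_min: float  # wall-clock minutes for one backtest on this shape

    @property
    def cost_per_run(self):
        # Cost of one backtest = hourly price * runtime in hours.
        return self.hourly_usd * self.runtime_min / 60


def cheapest_within(shapes, max_runtime_min):
    """Return the cheapest shape per run that meets the runtime limit, or None."""
    candidates = [s for s in shapes if s.runtime_min <= max_runtime_min]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.cost_per_run)
```

For example, with made-up shapes at $0.10/h taking 120 minutes, $0.40/h taking 40 minutes, and $1.60/h taking 20 minutes, the per-run costs are about $0.20, $0.27, and $0.53. A 60-minute limit therefore selects the middle shape, since the cheapest one is too slow and the fastest one costs roughly twice as much per run.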
Compliance Reminder
Trading strategies and market predictions carry material financial risk. Coordinate with compliance, risk management, and legal teams before running models with live capital or distributing investment insights.