The premise was simple: use “big” data analytics and machine learning models to predict the movement of stock prices. However, we had really “dirty” data and our data scientists were struggling to separate the signal from the noise.
In the context of share trading, signal generation is the process of identifying patterns in market data that suggest that a share price is likely to rise or fall in the near future. These patterns can be based on a variety of factors, such as historical price movements, volume data, and technical indicators.
Signal generators can be used to generate buy and sell signals for shares. A buy signal is generated when the signal generator identifies a pattern that suggests that a share price is likely to rise in the near future. A sell signal is generated when the signal generator identifies a pattern that suggests that a share price is likely to fall in the near future.
Signal generators can be used by traders to help them make informed decisions about when to buy and sell shares. However, it is important to note that signal generators are not always accurate. They can generate false signals, and they can be affected by market volatility.
Here are some of the most common signal generation techniques used in share trading:
- Moving averages: Moving averages are a simple but effective way to identify trends in market data. A moving average is a line that is calculated by averaging the closing prices of a share over a specified period of time. Moving averages can be used to generate buy and sell signals by identifying when a share price has crossed above or below a moving average.
- Bollinger bands: Bollinger bands are a technical indicator that is used to measure volatility. They are calculated by plotting an upper and a lower band a set number of standard deviations (typically two) above and below a moving average. Bollinger bands can be used to generate buy and sell signals by identifying when a share price has moved outside of the bands.
- Relative strength index (RSI): The RSI is a momentum indicator that is used to measure the strength of a trend. The RSI is calculated by comparing the magnitude of recent price gains to recent price losses. The RSI can be used to generate buy and sell signals by identifying when the RSI has reached overbought or oversold levels.
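The three techniques above can be sketched in a few lines of Python. This is a minimal illustration, not the code we ran: the function names, default windows and the use of plain lists are all assumptions made for the example, and the RSI here uses simple averages of gains and losses rather than Wilder's smoothed version.

```python
from statistics import mean, pstdev


def sma(prices, window):
    """Simple moving average of closes; None until a full window exists."""
    return [
        mean(prices[i - window + 1 : i + 1]) if i >= window - 1 else None
        for i in range(len(prices))
    ]


def ma_cross_signals(prices, window):
    """'buy' when the close crosses above its SMA, 'sell' when it crosses below.

    A close exactly on the average is treated as "not above".
    """
    avg = sma(prices, window)
    signals = [None] * len(prices)
    for i in range(1, len(prices)):
        if avg[i - 1] is None:
            continue  # need an average on both bars to detect a cross
        was_above = prices[i - 1] > avg[i - 1]
        is_above = prices[i] > avg[i]
        if is_above and not was_above:
            signals[i] = "buy"
        elif was_above and not is_above:
            signals[i] = "sell"
    return signals


def bollinger_bands(prices, window=20, num_std=2.0):
    """(lower, middle, upper) per bar; None until the window is full."""
    bands = []
    for i in range(len(prices)):
        if i < window - 1:
            bands.append(None)
            continue
        chunk = prices[i - window + 1 : i + 1]
        mid = mean(chunk)
        width = num_std * pstdev(chunk)  # population standard deviation
        bands.append((mid - width, mid, mid + width))
    return bands


def rsi(prices, period=14):
    """RSI from simple averages of gains and losses over `period` changes."""
    values = [None] * len(prices)
    for i in range(period, len(prices)):
        changes = [prices[j] - prices[j - 1] for j in range(i - period + 1, i + 1)]
        avg_gain = sum(c for c in changes if c > 0) / period
        avg_loss = sum(-c for c in changes if c < 0) / period
        if avg_gain == 0 and avg_loss == 0:
            values[i] = 50.0  # flat prices: treated as neutral (an assumption)
        elif avg_loss == 0:
            values[i] = 100.0  # only gains in the window
        else:
            values[i] = 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)
    return values
```

A signal generator then compares these values against a rule: a close outside the Bollinger bands, or an RSI above 70 or below 30 (the conventional overbought and oversold thresholds), is flagged for the trader to review.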
We spent a lot of time cleaning the data and introducing good old principles like “how can I run the model somewhere other than a laptop?”. This was a true startup: a bunch of people in a room trying to get stuff working. No calling the “helpdesk” to sort out your IT problems (I actually was the helpdesk).
I helped with the design and development of critical components needed for data collection and signal generation. The models were taking ~8 hours to backtest, which was a real bottleneck. Using the power of Google Cloud we got this down to 20 minutes. It could run even faster on higher-spec VMs, but this was a sweet spot for us in terms of cost.
The models were written in R, and I quickly learned enough to see that they were written as one gigantic for-each loop. They needed to be modified to allow for parallelism (cue an install of Shiny and some throwbacks to understand this Lisp descendant).
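The shape of that change can be sketched as follows. The original models were in R and are not reproduced here; this Python version is only an illustration, and the `backtest` rule, the `(prices, window)` job tuple and the function names are all hypothetical. The point is that once each iteration of the loop is an independent function call, the same work can be handed to a pool of workers.

```python
from concurrent.futures import ProcessPoolExecutor
from statistics import mean


def backtest(job):
    """Hypothetical single backtest: P&L of a long-only moving-average rule.

    `job` is a (prices, window) tuple so it can be pickled and sent to a worker.
    """
    prices, window = job
    pnl = 0.0
    for i in range(window, len(prices)):
        avg = mean(prices[i - window : i])
        if prices[i - 1] > avg:  # rule says "hold the share over bar i"
            pnl += prices[i] - prices[i - 1]
    return window, pnl


def run_serial(prices, windows):
    """The 'gigantic loop' shape: one parameter set after another."""
    return dict(backtest((prices, w)) for w in windows)


def run_parallel(prices, windows, executor_cls=ProcessPoolExecutor, max_workers=None):
    """Same work, but each parameter set is an independent task for a pool."""
    jobs = [(prices, w) for w in windows]
    with executor_cls(max_workers=max_workers) as pool:
        # map() preserves job order, so results match the serial version
        return dict(pool.map(backtest, jobs))
```

Calling `run_parallel(prices, windows=[5, 10, 20, 50])` from a script should sit under an `if __name__ == "__main__":` guard, because process pools re-import the module on some platforms. The speed-up is bounded by the number of cores, which is why the VM size becomes a cost trade-off.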
Here are some of the key differences between R and Java:
- Execution and typing: R is an interpreted, dynamically typed language. Java is a statically typed language that is compiled to bytecode and run on the JVM.
- Main purpose: R is primarily used for statistical computing and graphics. Java is a more general-purpose language that can be used for a wide variety of tasks, including web development, mobile development, and enterprise applications.
- Libraries: R has a wide range of statistical libraries, including packages for linear regression, time series analysis, and machine learning. Java also has a number of libraries for statistical computing, but the R libraries are generally considered to be more comprehensive and easier to use.
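- Speed: R is generally slower than Java, especially for loop-heavy code. This is because R is interpreted, while Java is compiled to bytecode that the JVM compiles further at run time. Vectorised R operations that call into compiled code narrow the gap.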
- Community: Both have a large and active community of users and developers.
Here is a table summarizing the key differences between R and Java:
Feature | R | Java |
---|---|---|
Execution and typing | Interpreted, dynamically typed | Compiled to bytecode, statically typed |
Main purpose | Statistical computing and graphics | General-purpose |
Libraries | Wide range of statistical libraries | Statistical libraries, but not as comprehensive as R |
Speed | Slower than Java | Faster than R |
Community | Large and active | Large and active |
I also coached colleagues in GCP, Excel, G-Suite, Git and Bash. We used a Bloomberg plugin for Excel that needed some TLC (Tender Loving Care) to get it working correctly.
We also used some Java and Spring Boot for the mini front/middle/back office work.