This article is for algorithmic traders and for anyone who wants test results to stay stable when a strategy is moved to real trading.
Can the parameters of a trading algorithm be reduced to 0?
Many people doubt that it is possible to create a strategy without parameters. We all know that market mechanisms are very complex: discretionary traders analyze a huge amount of information, including company reports, correlations between markets, interest rates, order book parities, price levels, news, macroeconomic statistics, etc. So a strategy without parameters looks, at first glance, like an idea doomed to be ineffective.
We leave aside "pseudo-indicators" that have no predictive power, as well as indicators that are used incorrectly. Suppose we ignore all of the "parameters" listed above (those that can be formalized) and some others. Does ignoring them mean that they no longer affect the results of our trading?
Are all parameters essentially equal?
In theory, there should be some criterion that identifies the parameters that objectively exist and cannot be ignored.
Examples of hidden parameters that are present in almost every strategy:
- the time-frame, if we work with time-based candles (with other types of candles the number of parameters may become even larger);
- risk management and money management settings. Even if you have never heard of them, you are still using them; they are just chaotic and unformalized;
- if a trader draws levels, for example, those levels carry a fair number of rigid parameters. The same applies to triangles, Fibonacci levels and all kinds of patterns;
- if you use volatility in your trading, the choice of the volatility calculation method is itself a parameter that requires analysis, unless you know exactly which method works best. One way or another, all of this affects the results.
To avoid uncertainty, you need to choose the right strategy.
Below we will analyze one of the ways to determine:
- which parameter actually affects the result;
- what parameters maximize our profit;
- what ranges of parameters are preferable for us.
The first thing we need is a formalized strategy that generates an equity curve. Such a strategy can be written for any strategy optimizer that outputs the equity curve values, for example WealthLab or TSLab.
Next, we need:
- a universal metric of strategy success that lets you compare strategies with each other (the choice is left to the reader);
- the free RapidMiner, or the R language and an environment for working with it.
We optimize the strategy:
- it is desirable that the optimization window include a large number of different market phases (flat, up-trend, down-trend);
- it is advisable to run a brute-force (exhaustive) optimization; if that is too expensive in time and resources, use a very detailed genetic or swarm optimization that covers >= 70% of the results.
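A brute-force optimization simply evaluates every combination of parameter values and stores one row per run. The sketch below illustrates the idea only: `run_backtest`, the parameter names and their ranges are hypothetical placeholders, and in practice the rows would come from the optimizer's export.

```python
import itertools

# Hypothetical parameter grid; names and ranges are placeholders.
GRID = {
    "fast_period": range(5, 30, 5),      # 5, 10, 15, 20, 25
    "slow_period": range(20, 120, 20),   # 20, 40, 60, 80, 100
    "use_filter": (0, 1),
}

def run_backtest(params):
    """Placeholder for a real optimizer run; returns a made-up metric."""
    # Toy formula so the sketch runs; replace with the real strategy metric.
    return (params["slow_period"]
            - 2 * abs(params["fast_period"] - 15)
            + 3 * params["use_filter"])

def brute_force(grid, backtest):
    """Evaluate every parameter combination and return one row per run."""
    names = list(grid)
    rows = []
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        rows.append({**params, "metric": backtest(params)})
    return rows

rows = brute_force(GRID, run_backtest)
print(len(rows), "runs")
```

Each row holds the parameter values plus the metric, which is the table layout the later analysis steps work with.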
Next, we need to cut the parameters that affect the result least and fix them in the area that is both the most stable and profitable. This removes emotion and speculation from the choice of parameters and gives us numbers we can rely on.
As I wrote earlier, stability tables require strategy optimization data for 2 parameters; a third parameter with a small spread of 3-10 values is also possible. In that case we can build several stability tables and select one of them, for example by the average value of the entire table.
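Selecting one of several stability tables by its overall average can be sketched as follows. The tables and the values of the third parameter are made-up toy numbers, and `table_mean` is a hypothetical helper.

```python
from statistics import fmean

def table_mean(table):
    """Average of all cells in a 2D table (list of rows)."""
    return fmean([cell for row in table for cell in row])

# One toy stability table per value of the third parameter (made-up data).
tables = {
    5:  [[0.2, 0.4], [0.1, 0.3]],
    10: [[0.5, 0.7], [0.6, 0.4]],
    15: [[1.5, -0.9], [-0.6, 0.2]],
}

means = {value: table_mean(table) for value, table in tables.items()}
best_value = max(means, key=means.get)
print(means, "-> chosen third-parameter value:", best_value)
```

In this toy data the table for value 15 contains the single highest cell, but its average is the lowest, so the table for value 10 is chosen.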
So, let's go directly to the practice of forming the parameters.
Introduction to Data Mining
One of the most effective ways to determine how strongly each predictor influences the result is to use dedicated data mining and machine learning algorithms. Today we will look at one of the best known and most effective of them, Random Forest.
So, RandomForest and similar algorithms will help us determine the importance of a particular strategy parameter.
RapidMiner is written in Java; it is much faster than R and has more settings to configure, but on our data volume (a million runs in the example) it crashes when working with more than 60 trees. Its advantages are that it loads all processor cores and that you don't need to learn to program!
1. Loading the data
The data should look like this:
2. Find RandomForest in the search (or another algorithm):
An example of a decision tree
3. Our main task is to identify how strongly each parameter influences the result; for this we use the Weight by Importance function.
4. Analyzing the result
In this example, we see that the first three parameters have minimal effect on the trading result of our strategy, that is, they are the least significant.
- R and RStudio
R uses only 13% of the CPU. For those who are not in a hurry this is convenient, since you can perform other tasks in parallel. It is also possible to train 1000 trees (tested), but on our data volume you need to wait 3 days; as I mentioned above, RapidMiner may not work correctly with as few as 70 trees.
1. Loading the data
2. Let's separate the independent parameters from the dependent ones and run Random Forest.
3. Consider the significance of the predictors. We get the following result:
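For a quick sanity check without any library, a much simpler stand-in for Random Forest importance is the correlation ratio (eta squared): the share of the metric's variance explained by grouping the runs on a single parameter. This is not the Random Forest measure used in the article, it ignores interactions between parameters, and the data and names below are toy assumptions.

```python
from collections import defaultdict
from statistics import fmean

def correlation_ratio(rows, param, target="metric"):
    """Share of the target's variance explained by one parameter (eta squared)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[param]].append(row[target])
    overall = fmean([row[target] for row in rows])
    total = sum((row[target] - overall) ** 2 for row in rows)
    if total == 0:
        return 0.0
    between = sum(len(vals) * (fmean(vals) - overall) ** 2
                  for vals in groups.values())
    return between / total

# Toy runs: the metric depends strongly on "a", weakly on "b", not on "c".
rows = [
    {"a": a, "b": b, "c": c, "metric": 10 * a + b}
    for a in range(4) for b in range(3) for c in range(2)
]

scores = {p: correlation_ratio(rows, p) for p in ("a", "b", "c")}
for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: {score:.3f}")
```

Parameters with a score near zero are candidates for fixing as constants, but a tree-based method should confirm this, since a parameter can matter only in combination with others.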
Another advantage of R is that it handles as many rows as in our example, which Excel simply cannot process. As a rule, the data needs to be prepared, and here R helps us out.
Thus, it is convenient to create the dataset in R, while the analysis is faster in RapidMiner; the two can be used together.
After we have determined which parameters are the least important in our case, we need to minimize them. RapidMiner is well suited for this task.
Minimization of parameters in Random Forest
The peculiarity of Random Forest in RapidMiner is that there are many settings on which the result depends:
There is a detailed help section on each of the items.
The settings should be selected according to the task; a superficial analysis will reveal only the most obvious dependencies.
Since RapidMiner works very quickly, we can choose the settings so as to achieve the required depth.
Because our trees are random and their number is small, it is worth making several runs to be sure and looking at the consolidated indicators.
We have identified which parameters are of great importance, and we can determine a stable positive area for each of them, but first we will analyze the less important ones.
Stable positive parameter values can be found in different ways; the simplest is to average the results over all runs. Since the optimization was an exhaustive search, there is no need to be intimidated by low average results; in the end we will single out a fairly large but profitable area of the strategy. The first parameter has only 2 values, so just see which value provided the best results on average.
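Averaging the metric over all runs for each value of a parameter can be sketched like this. The parameter names `use_filter` and `period` and the numbers are made up for illustration.

```python
from collections import defaultdict
from statistics import fmean

def mean_metric_by_value(rows, param, target="metric"):
    """Average the target over all runs, grouped by one parameter's value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[param]].append(row[target])
    return {value: fmean(vals) for value, vals in sorted(groups.items())}

# Toy optimization results (made-up numbers).
rows = [
    {"use_filter": 0, "period": 10, "metric": -2.0},
    {"use_filter": 0, "period": 20, "metric": 1.0},
    {"use_filter": 0, "period": 30, "metric": 0.5},
    {"use_filter": 1, "period": 10, "metric": -0.5},
    {"use_filter": 1, "period": 20, "metric": 2.0},
    {"use_filter": 1, "period": 30, "metric": 1.5},
]

averages = mean_metric_by_value(rows, "use_filter")
best_value = max(averages, key=averages.get)
print(averages, "-> fix use_filter at", best_value)
```

The same function applied to a parameter with more values gives the data for a histogram of averages.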
The next parameter has 6 possible options. We look at the histogram of averages and see something close to a normal distribution. This is good: the market changes frequently, but with such a shape the strategy should not start to underperform too much or too abruptly.
Then there are 3 parameters.
For the 3rd parameter, you can take 10 values and build a stability table (HeatMap) for each of them. Or, as I showed above, you can build a histogram of the average values of the main metric and build a stability table for the best of them.
As a result, we fixed the three parameters with the lowest importance as constants, having first selected the best value for each. For the two remaining parameters, which are of great importance, we will build a stability table and select a stable zone, which we will launch in real trading.
Strategy stability table (HeatMap)
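One way to read a stability table numerically is to score each cell by the average of its neighbourhood, so that an isolated lucky cell loses to a broad profitable plateau. The 3x3 window and the toy table below are assumptions made for illustration, not a rule taken from the article.

```python
from statistics import fmean

def neighbourhood_scores(table, radius=1):
    """Average each cell with its neighbours (window clipped at the edges)."""
    n_rows, n_cols = len(table), len(table[0])
    scores = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            window = [
                table[r][c]
                for r in range(max(0, i - radius), min(n_rows, i + radius + 1))
                for c in range(max(0, j - radius), min(n_cols, j + radius + 1))
            ]
            scores[i][j] = fmean(window)
    return scores

def argmax_cell(grid):
    """Return (row, column) of the largest value in a 2D grid."""
    cells = [(i, j) for i in range(len(grid)) for j in range(len(grid[0]))]
    return max(cells, key=lambda ij: grid[ij[0]][ij[1]])

# Toy table: rows are values of parameter A, columns values of parameter B.
table = [
    [ 3.0, -0.8,  0.1, -0.5,  0.0],   # 3.0 is an isolated spike
    [-0.6,  1.0,  1.2,  0.9, -0.2],
    [ 0.1,  1.1,  1.5,  1.0,  0.1],
    [-0.4,  0.8,  1.1,  0.7,  0.2],
    [-0.9, -0.3,  0.2, -0.1, -1.0],
]

scores = neighbourhood_scores(table)
best_raw = argmax_cell(table)
best_stable = argmax_cell(scores)
print("best single cell:", best_raw, "| centre of stable zone:", best_stable)
```

Here the best single cell is the spike in the corner, while the neighbourhood score points to the middle of the profitable plateau, which is the kind of zone the stability table is meant to reveal.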
So, today we:
– learned how to conduct research with data mining algorithms, using RandomForest (decision trees) as an example, in 2 different ways (RapidMiner and RStudio);
– showed that the parameters are not equal: they have different predictive ability and, accordingly, a different effect on the result;
– figured out how to reduce the number of parameters so as to maximize the stability of the strategy's positive results;
– learned how to identify the most stable parameters of a strategy using stability tables.