How Machine Learning Can Help Horse Racing Betting

Editor’s Note: This article is for educational and entertainment purposes only. If you wish to use the model presented for real money gambling, you do so at your own risk. Make sure it complies with the terms and conditions of your bookmaker.

Machine learning is widely used for the analysis and forecasting of many time series. With massive amounts of historical data and processing power, machine learning models can sometimes provide extremely useful information and guidance when making sports betting decisions. This article shows you how machine learning can help you design your horse racing betting strategy. We will use data from the home page of the Hong Kong Jockey Club, one of the oldest and largest horse racing institutions in the world. To avoid data leakage and estimate the actual performance of the model, we will only use race data from early 2007 to fall 2019 to build the model, and then bet on new upcoming races.

We use this model and develop a unique investment strategy to place bets for a period of two months (2019/09 – 2019/11) and achieve positive returns in the experiment. As mentioned above, we will use all Hong Kong 2007-2019 races as the training and testing sets, and the winter 2019 data as a separate test set for assessing the overall performance of the betting portfolio.
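A split by date, rather than a random split, is what keeps future races out of the training data. The following is a minimal sketch of such a split using only the standard library; the field names (`race_id`, `race_date`), the toy rows and the cutoff date are illustrative assumptions, not the actual data set or the exact boundary used in the experiment.

```python
from datetime import date

def chronological_split(rows, cutoff):
    """Split rows by race date: before the cutoff for training, on or after it for the hold-out."""
    train = [r for r in rows if r["race_date"] < cutoff]
    holdout = [r for r in rows if r["race_date"] >= cutoff]
    return train, holdout

# Toy rows with hypothetical field names; real rows would carry all 61 columns.
races = [
    {"race_id": 1, "race_date": date(2007, 1, 14)},
    {"race_id": 2, "race_date": date(2015, 6, 3)},
    {"race_id": 3, "race_date": date(2019, 9, 8)},
    {"race_id": 4, "race_date": date(2019, 10, 20)},
]

# The cutoff is an assumed value chosen only for this illustration.
train, holdout = chronological_split(races, cutoff=date(2019, 9, 1))
print(len(train), "training rows,", len(holdout), "hold-out rows")
```

Because every hold-out race is later than every training race, nothing the model learns can come from results that would not have been known at betting time.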

The training data contains 109,085 rows and 61 columns containing different information about each race:

Race track (AWT does not have a specific race track description)
Class: Race class of the race
Distance
Rfinishm: Finish time of the race in centiseconds (1/100 of a second)
Rm1: Finish time of section 1 of the race in centiseconds
Rm2: Finish time of section 2 of the race in centiseconds
Rm3: Finish time of section 3 of the race in centiseconds
Rm4: Finish time of section 4 of the race in centiseconds
Rm5: Finish time of section 5 of the race in centiseconds
Rm6: Finish time of section 6 of the race in centiseconds
Horsenum: Horse ID
Jname: Jockey
Tname: Trainer
Exweight: Handicap weight carried by the horse
Equipment for the horse
Qualification: Horse's qualification
Qualification change: Change in the horse's qualification with respect to the previous race
Upper weight: Weight of the horse compared to the previous race
Best time: The horse's best finishing time in a race at the same location, distance and route (in minutes, seconds and centiseconds)

Age: Horse's age
Priority: Horse's priority in the race as indicated by the trainer
Lastix: Place in the previous 6 races
Rank: Place in the current race
Positions: The horse's place in each section of the race
P1: The horse's place in section 1
P2: The horse's place in section 2
P3: The horse's place in section 3
P4: The horse's place in section 4
P5: The horse's place in section 5
P6: The horse's place in section 6
M1: Finish time of the horse's 1st leg in centiseconds
M2: Finish time of the horse's 2nd leg in centiseconds
M3: Finish time of the horse's 3rd leg in centiseconds
M4: Finish time of the horse's 4th leg in centiseconds
M5: Finish time of the horse's 5th leg in centiseconds
M6: Finish time of the horse's 6th leg in centiseconds
Finishm: The horse's finish time in centiseconds
D1: Distance from the rank 1 horse in section 1
D2: Distance from the rank 1 horse in section 2
D3: Distance from the rank 1 horse in section 3
D4: Distance from the rank 1 horse in section 4
D5: Distance from the rank 1 horse in section 5
D6: Distance from the rank 1 horse in section 6
(For D1 to D6, 0.25 means that the distance is within the distance of 1 horse.)
Date: Difference in dates between the horse's previous race and its current race
Money: Cash prize of the race
Windist: Distance from the rank 1 horse
Win_t5: The horse's win odds 5 minutes before the race
Win: The horse's final win odds
Place_t5: The horse's place odds 5 minutes before the race
Ind_pla: Indicator of a top 3 place in the race (1 for the top 3, otherwise 0)

The original data contains a lot of information.

We need to filter out which fields are useful and try to create new features from the data to help predict the results. I’m not going to go into details about feature engineering, but here is a key point if you want to try it yourself: once you have a reasonably useful model, list the predicted top 1 and top 3 probabilities for each race. I’ve spent a lot of time experimenting and exploring how models can generate positive returns.

Races come with a lot of uncertainty, and with human effort to eliminate any possible unfair advantage. This makes the betting strategy extremely important. After experimenting a lot with models in real races, I developed a strategy with three main concepts.

Let’s talk about expected return first. The most common and simple betting strategy is to set a return threshold and to bet only if the rate of return (win probability * win odds) is greater than the threshold. We only need to calculate the win probability and the rate of return for each horse in a race based on the model’s predictions, and bet if the rate of return is above the threshold. However, the choice of the threshold becomes a big problem. A low threshold generally results in aggressive betting and large capital gains or losses. Due to the great uncertainty in racing, the results vary greatly for both low and high thresholds.

The rate of return alone is not enough, because in a race we also have to consider the performance of each horse compared to the other horses in the same race.
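To make the threshold rule concrete, here is a minimal sketch. It assumes that the rate of return is the model's predicted win probability multiplied by decimal win odds (the total payout per unit staked); the field names, the toy numbers and the threshold value are all hypothetical and are not taken from the experiment.

```python
def expected_return(win_prob, decimal_odds):
    """Rate of return per unit staked: predicted win probability times decimal odds."""
    return win_prob * decimal_odds

def select_bets(horses, threshold):
    """Keep only the horses whose rate of return exceeds the threshold."""
    return [
        h for h in horses
        if expected_return(h["win_prob"], h["odds"]) > threshold
    ]

# Hypothetical model outputs for one race (illustrative values only).
horses = [
    {"horse": "A", "win_prob": 0.40, "odds": 2.0},  # rate of return 0.80
    {"horse": "B", "win_prob": 0.25, "odds": 5.0},  # rate of return 1.25
    {"horse": "C", "win_prob": 0.10, "odds": 9.0},  # rate of return about 0.90
]

picks = select_bets(horses, threshold=1.1)
print([h["horse"] for h in picks])
```

Lowering the threshold in this sketch admits more horses per race, which is the more aggressive behaviour described above.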

In other words, we need to find the horse with the highest probability of winning compared to all other horses in the same race. To extend this concept, we can also find the horses with the highest probability of winning not just in one race, but among all races in one day. We only trust these horses, which significantly reduces risk. I call these lower-risk bets. You can find these horses by building another model based on the logarithmically transformed sum of the original prediction results.

Now that we have all the low-risk horses and their returns, how much money should we invest in each horse? It turns out that the Kelly criterion gives the best result. For simple bets with two outcomes, one of which involves losing the entire bet and the other winning the bet multiplied by the payout odds, the Kelly bet is f* = (b*p - q) / b, where f* is the fraction of the fund to bet, b is the net odds received on the bet, p is the probability of winning, and q = 1 - p is the probability of losing.

Finally, we combine these three concepts together.

First, filter out all the low-risk horses and calculate their expected returns. Next, determine the optimal return threshold based on a simulation of previous investment results. If the return is above the threshold, use the Kelly criterion to determine what percentage of the fund to bet.

We spent two months applying the final model and betting strategy to real races. The result is very satisfactory. We bet on 76% of all races and made a positive profit at the end of the two-month period.
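The sizing step can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in the experiment: the input list is assumed to contain only horses already selected as low-risk, odds are assumed to be decimal odds (so the net odds are b = odds - 1), and the field names, threshold and fund size are hypothetical. The standard Kelly fraction for a two-outcome bet is f* = (b*p - q) / b with q = 1 - p.

```python
def kelly_fraction(win_prob, decimal_odds):
    """Kelly fraction f* = (b*p - q) / b, floored at zero so we never bet a negative amount."""
    b = decimal_odds - 1.0  # net odds: profit per unit staked on a win
    if b <= 0:
        return 0.0
    q = 1.0 - win_prob  # probability of losing
    return max(0.0, (b * win_prob - q) / b)

def size_bets(low_risk_horses, threshold, fund):
    """Stake a Kelly fraction of the fund on each horse whose rate of return beats the threshold."""
    stakes = {}
    for h in low_risk_horses:
        rate_of_return = h["win_prob"] * h["odds"]
        if rate_of_return > threshold:
            stakes[h["horse"]] = fund * kelly_fraction(h["win_prob"], h["odds"])
    return stakes

# Hypothetical low-risk candidates (illustrative values only).
candidates = [
    {"horse": "A", "win_prob": 0.40, "odds": 2.0},  # rate of return 0.80, skipped
    {"horse": "B", "win_prob": 0.25, "odds": 5.0},  # rate of return 1.25, Kelly fraction 0.0625
]

stakes = size_bets(candidates, threshold=1.0, fund=1000.0)
print(stakes)
```

Note that the Kelly fraction is only positive when the rate of return exceeds 1, so in this sketch a threshold at or above 1 never produces a zero-sized bet.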

I won’t reveal the details of the implementation, but if you have any questions or are interested in my results, please leave a message below. Thanks for reading. I look forward to your questions and thoughts. If you want to know more about data science and cloud computing, you can find me on LinkedIn. Make studying your daily ritual.