A predictive model for analysing the starting pitchers' performance using time series classification methods

Authors: Soto Valero Cesar*; Gonzalez Castellanos Mabel; Perez Morales Irvin
Source: International Journal of Performance Analysis in Sport, 2017, 17(4): 492-509.
DOI: 10.1080/24748668.2017.1354544

Abstract

A pitcher's performance is a key factor in winning or losing baseball games. Predicting when a starting pitcher will enter an unfortunate pitching sequence is one of the most difficult decision-making problems for baseball managers. Since 2007, vast amounts of pitch-by-pitch records have been freely available via the PITCHf/x system, but obtaining useful knowledge from this volume of data is a complex task. In this paper, we propose a novel model for analysing the performance of starting pitchers and determining when they should be removed from the game and replaced by a reliever. Our approach represents pitch-by-pitch sequences as time series data using baseball's linear runs and builds an instance-based model that learns from past experience using the k-Nearest Neighbours classification method. To compare time series of pitchers' performance, Dynamic Time Warping is used as the dissimilarity measure in conjunction with Keogh's lower bound technique. We validate the proposed model using real data from 20 Major League Baseball starting pitchers during the 2009 regular season. The experimental results show good performance of the predictive model for all pitchers, with values of Precision, Recall and F1 close to 0.9 when the outcomes of their last 10 throws are unknown.
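
The combination of k-Nearest Neighbours, Dynamic Time Warping and Keogh's lower bound mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the warping window size, the value of k, the "keep"/"replace" labels and the toy run-value series are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter
from math import inf


def dtw_distance(a, b, window):
    """DTW distance with squared point costs and a Sakoe-Chiba band."""
    n, m = len(a), len(b)
    w = max(window, abs(n - m))  # band must be wide enough to reach (n, m)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] ** 0.5


def lb_keogh(query, candidate, window):
    """Keogh's lower bound on the banded DTW distance (equal-length series)."""
    total = 0.0
    for i, q in enumerate(query):
        lo, hi = max(0, i - window), min(len(candidate), i + window + 1)
        upper, lower = max(candidate[lo:hi]), min(candidate[lo:hi])
        if q > upper:
            total += (q - upper) ** 2
        elif q < lower:
            total += (q - lower) ** 2
    return total ** 0.5


def knn_predict(query, train, k=1, window=2):
    """Majority label among the k nearest series; train is [(series, label)]."""
    best = []  # up to k (distance, label) pairs, sorted by distance
    for series, label in train:
        # Skip the full DTW computation when the cheap bound already shows
        # this series cannot beat the current k-th nearest neighbour.
        if (len(best) == k and len(series) == len(query)
                and lb_keogh(query, series, window) >= best[-1][0]):
            continue
        best.append((dtw_distance(query, series, window), label))
        best.sort(key=lambda pair: pair[0])
        best = best[:k]
    return Counter(label for _, label in best).most_common(1)[0][0]


# Hypothetical toy data: cumulative run values over six pitches (made up).
train = [
    ([0.0, -0.1, -0.2, -0.1, 0.0, -0.2], "keep"),
    ([-0.1, -0.2, -0.1, -0.3, -0.2, -0.1], "keep"),
    ([0.1, 0.4, 0.5, 0.9, 1.2, 1.4], "replace"),
    ([0.0, 0.3, 0.6, 0.8, 1.0, 1.5], "replace"),
]
query = [0.0, 0.2, 0.5, 0.7, 1.1, 1.3]
prediction = knn_predict(query, train, k=1, window=2)
print(prediction)
```

The lower bound only serves to prune candidates: because it never exceeds the banded DTW distance for equal-length series, skipping a series whose bound is already no better than the current k-th neighbour does not change the prediction.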

  • Publication date: 2017