Overcoming the Problem of Multicollinearity in Sports Performance Data: A Novel Application of Partial Least Squares Correlation Analysis

Weaving, D.; Jones, B.; Ireton, M.; Whitehead, S.; Till, K.; Beggs, C. B.

Objectives:
The aim of this study was to identify the training load (TL) variables that most influence “end fitness” in young rugby league players using a novel “leave one variable out” (LOVO) partial least squares correlation analysis (PLSCA) methodology. This approach was designed to address the challenge of multicollinearity often encountered when analyzing data sets with multiple highly correlated variables in a sporting context.

Methods:
Sixteen male professional youth rugby league players (mean age 17.7 ± 0.9 years) participated in a 6-week pre-season training period. TL variables, including data from global positioning systems (GPS), micro-electrical-mechanical systems (MEMS), and players’ session rating of perceived exertion (sRPE), were collected. Participants also underwent a 30–15 intermittent fitness test (30-15IFT) before and after the training period to assess “starting fitness” and “end fitness.” Multicollinearity issues in the data prevented stable multiple linear regression (MLR), so a novel LOVO PLSCA adaptation was developed to quantify the relative importance of predictor variables and refine the MLR process.
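
The leave-one-variable-out idea can be sketched in a few lines of Python. This is only one plausible reading of the approach, not the authors' implementation: it assumes that PLSCA takes the singular value decomposition of the cross-correlation matrix between standardized predictors and outcomes, and that a variable's importance is the drop in summed singular values (inertia) when that variable is removed. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def zscore(a):
    """Standardize each column to zero mean and unit sample variance."""
    a = np.asarray(a, dtype=float)
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)

def plsca_inertia(X, Y):
    """Sum of singular values of the cross-correlation matrix between Y and X."""
    Zx, Zy = zscore(X), zscore(Y)
    R = Zy.T @ Zx / (X.shape[0] - 1)        # (outcomes x predictors) correlations
    return np.linalg.svd(R, compute_uv=False).sum()

def lovo_importance(X, Y):
    """Drop in inertia when each predictor column is left out in turn."""
    full = plsca_inertia(X, Y)
    drops = []
    for j in range(X.shape[1]):
        X_reduced = np.delete(X, j, axis=1)  # leave variable j out
        drops.append(full - plsca_inertia(X_reduced, Y))
    return np.array(drops)

# Synthetic illustration only (not the study's data): 16 "players",
# two collinear predictors that track the outcome and one unrelated predictor.
rng = np.random.default_rng(0)
n = 16
driver = rng.normal(size=n)
X = np.column_stack([
    driver,                                  # predictor 0: drives the outcome
    driver + 0.1 * rng.normal(size=n),       # predictor 1: collinear with 0
    rng.normal(size=n),                      # predictor 2: unrelated noise
])
Y = (2.0 * driver + 0.1 * rng.normal(size=n)).reshape(-1, 1)

importance = lovo_importance(X, Y)
print(importance)  # larger drop = more important under this assumed criterion
```

Because only correlations and a singular value decomposition are involved, no matrix of collinear predictors has to be inverted, which is why this kind of ranking remains stable where MLR does not.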

Results:
The LOVO PLSCA identified the distance accumulated at very-high speed (>7 m/s) as the most important TL variable influencing improvement in player fitness. When this variable was included in a refined MLR model with “starting fitness” as a covariate, the model explained 73% of the variance in “end fitness” and effectively eliminated the multicollinearity issues.
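
A refined model of this shape can be illustrated as follows. This is a minimal sketch on made-up numbers, not a reproduction of the study's result: it fits ordinary least squares with two predictors (a hypothetical starting-fitness score and a hypothetical very-high-speed distance), reports R², and computes variance inflation factors (VIF) as one common multicollinearity check. The abstract does not state which diagnostic the authors used, so the VIF is an assumption here.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns coefficients and R^2."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    return beta, 1.0 - ss_res / ss_tot

def vif(X):
    """Variance inflation factor of each column, regressed on the other columns."""
    out = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        _, r2 = fit_ols(others, X[:, j])
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Synthetic values for illustration only (hypothetical units and effect sizes).
rng = np.random.default_rng(1)
n = 16
start_fitness = rng.normal(19.0, 1.0, n)     # hypothetical pre-season test score
vhs_distance = rng.normal(300.0, 80.0, n)    # hypothetical distance above 7 m/s
end_fitness = 4.0 + 0.8 * start_fitness + 0.004 * vhs_distance + rng.normal(0, 0.3, n)

X = np.column_stack([start_fitness, vhs_distance])
beta, r2 = fit_ols(X, end_fitness)
vifs = vif(X)
print("coefficients:", beta)
print("R^2:", r2)
print("VIF:", vifs)  # values near 1 indicate little collinearity
```

With only two weakly related predictors, the VIF values stay close to 1, which is the practical sense in which a filtered model avoids the instability of the full MLR.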

Conclusions:
The LOVO PLSCA technique proved to be a valuable tool for evaluating the relative importance of predictor variables in data sets with significant multicollinearity. By using LOVO PLSCA as a filtering tool, an MLR model was developed that demonstrated a significant relationship between “end fitness” and the predictor variable “accumulated distance at very-high speed” when controlling for “starting fitness.” This approach may help sport scientists and coaches analyze data obtained from GPS and MEMS technologies more effectively.
