Data-Driven Coaching

Adding Science to Maximal Athletic Performance

Why group-level research falls short for the individual, and how to use your own data to build a smarter training plan.

The underlying theme of this series is that we can significantly accelerate an individual's athletic development by taking a scientific approach. But what does "scientific" actually mean in this context? The answer might surprise you.

Too often, a "scientific" or "evidence-based" approach is treated as synonymous with "Is there a peer-reviewed paper that supports what I'm doing?" While published research can be a useful starting point, it is generally the wrong framework for optimizing an individual's training plan. In this post, I'll explain why, and, just as importantly, I'll walk through a different approach: one that uses the individual athlete's own data to refine their training over time.

The Central Question

What Does a "Scientific Approach" Actually Mean?

The Oxford English Dictionary defines science as the systematic and logical study of the natural world through observation, experimentation, and the testing of hypotheses. Encyclopedia Britannica's definition of the scientific method expands on this: observe, ask questions, develop hypotheses, conduct experiments, analyze data, draw conclusions.

Notice what's absent from those definitions: population sampling, statistical significance, and the other apparatus of traditional research methodology. Those concepts are tools of inferential statistics, the branch of statistics focused on making predictions about a population based on a sample. They're necessary for studying effects across large groups, but they can actively mislead you when working with a single athlete.

Traditional sports science research often runs counter to our first and most important principle: individualization. As we saw in the previous post, athletes exhibit a massive range of responses to identical training programs.

Variability in Training Response to a Standardized Program
Same program, radically different outcomes (HERITAGE Study)
[Chart: ΔVO₂max (ml/kg/min) for individual participants, sorted by response, ranging from roughly -2 to +13 with a mean of about +5]

While genetics plays a role in this variability, the bulk of it is not explained by genetic factors. Instead, it's been my experience that training response varies dramatically according to the type of training undertaken. Some athletes respond powerfully to one modality or zone while others respond to a completely different one.

Every healthy athlete is a training "responder," provided they are given the right type of training for them.

Looking at what works at the group level and assuming it will transfer to a given individual offers only limited help. We need a better approach.

Inferential statistics vs. machine learning

Inferential statistics focuses on drawing conclusions about a population from a sample. It works well if your athlete happens to match the population average. But if they're an outlier in any meaningful way, conclusions drawn from group data can be actively misleading.

Machine learning takes a fundamentally different approach. Rather than inferring population-level truths, it focuses on finding patterns in data and making predictions about new data. You feed it an individual athlete's training and performance history, it identifies patterns, and then it helps you predict what would happen if you changed the inputs.

That second paradigm is a far better fit for the modern coach. We're presented with lots of individual data, we want to find the patterns in it, and we want to make "what would happen if we did X" predictions. Let's walk through how to do this, step by step.

Step One

Don't Be Intimidated

"Predicting better than pure guesswork, even if not accurately, delivers real value. A hazy view of what's to come outperforms complete darkness by a landslide."

– Eric Siegel

While AI and machine learning can become complex at the fringes, most people underestimate the power of even the simplest approaches. Right now, there is very little competition because sports coaching remains largely a field of traditional "truths" handed down over generations. To this day, few coaches are actually measuring the efficacy of their approach. They cannot quantify, on any meaningful level, how right or wrong they are.

The first step: have the courage to call your shots. Make a prediction based on the data you have, assess the error, then see if you can get a little help from a machine to bring that error down.

Step Two

Choose Appropriate Tests

Before building any model, you need to answer a simple question: How will we assess that the training is working?

The answer varies by sport. For a sprint swimmer who competes regularly in a controlled environment, competition itself can serve as a practical and useful measure of progress. For an Ironman triathlete who might race only two or three times per year over wildly varied courses, you'll need intermediate field tests that correlate strongly with race performance.

An equally important consideration: does the test provide enough information to direct the training? Performance is often multi-faceted. For a sprint swimmer, it's not just "did the time improve?" but rather a breakdown: reaction time, breakout speed, first-half versus second-half splits, stroke rate versus stroke length. For an Ironman athlete, the factors multiply further: economy, aerodynamics, aerobic power, threshold, fueling, muscular endurance.

Factors Correlated with Ironman Performance
Based on 15 years of performance and lab data across a range of athletes
[Chart: correlation with race performance, from low to very strong, for W at VT1, W at AeT, W at LT2, VO₂max, power, economy, body composition, and fat oxidation]

For Ironman athletes, watts at the first Ventilatory Threshold stands out as both the strongest performance correlate and a highly repeatable test. The intensity is mild enough to be run frequently without disrupting training. In the absence of metabolic testing, watts at the Aerobic Threshold (first inflection on the lactate curve) also correlates strongly and provides an easy output measure to track regularly.

Step Three

Define Your Training Inputs

There are many ways to describe the training you prescribe, from broad markers of total training load down to specific set/rep/rest structures. More granularity means more prescriptive power (you might discover that 10-minute intervals work better than 3-minute intervals for a particular athlete), but it also means splitting your data into smaller and smaller buckets, requiring increasingly complex models and more data to feed them.

Rule of Thumb

Don't underestimate the power of a simple model. A simple model is always a good starting point for creating an initial benchmark, and you'll often surprise yourself with how useful it is, even with relatively basic training descriptors.

A solid starting point for inputs is Time in Zone: define a set of intensity categories and tally the time the athlete spends in each over a given training block. These zones could be based on heart rate, power, or pace, and could reference the athlete's personal thresholds or simply use fixed bins (e.g., every 25 watts).
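
As a concrete illustration, here is a minimal Python sketch of the fixed-bin variant, assuming one power sample per second; the function name and sample data are hypothetical:

    # Tally time in fixed-width power bins from per-second samples.
    from collections import Counter

    def time_in_zone(watts_per_second, bin_width=25):
        """Hours spent in each fixed-width power bin (e.g. every 25 W)."""
        counts = Counter(int(w // bin_width) * bin_width for w in watts_per_second)
        return {f"{lo}-{lo + bin_width} W": secs / 3600
                for lo, secs in sorted(counts.items())}

    # Example: two hours at ~180 W, then 30 minutes at ~260 W.
    samples = [180] * 7200 + [260] * 1800
    print(time_in_zone(samples))  # {'175-200 W': 2.0, '250-275 W': 0.5}

Swapping the fixed bins for the athlete's personal thresholds only changes the binning function; the tallying logic stays the same.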

Step Four

Put It All Together

For our hypothetical sprint swimmer, the setup might look like this: six pace zones (defined in seconds per 100m), with race performance over 100m Long Course as the output. After six training blocks, each ending with a competition, you'd have a data table like the following:

Training History: Sprint Swimmer
Hours in each pace zone per training block, with competition results
Block | Z0 (>1:20) | Z1 (1:15-1:20) | Z2 (1:10-1:15) | Z3 (1:05-1:10) | Z4 (1:00-1:05) | Z5 (<1:00) | Total (h) | 100m (s)
1 | 25 | 20 | 10 | 8 | 2 | 0.5 | 65.5 | 55.4
2 | 28 | 30 | 12 | 10 | 3 | 1 | 84 | 54.2
3 | 26 | 32 | 14 | 9 | 3 | 1 | 85 | 53.8
4 | 30 | 40 | 8 | 8 | 4 | 1.5 | 91.5 | 53.2
5 | 30 | 45 | 10 | 9 | 4 | 1.5 | 99.5 | 52.9
6 | 32 | 47 | 9 | 11 | 4 | 2 | 105 | 52.7

Step Five

Spot Patterns in the Data

Even just eyeballing the table above, a few patterns start to emerge. The swimmer's two fastest times came from the blocks with the highest Z1 volume (and the lowest Z2 volume). Z0 and Z2 don't seem to matter much. Total volume lines up almost perfectly with performance. And a bump in Z3 in Block 2 seemed helpful, but that block also had the third-highest total volume, so it's hard to disentangle.

Based on this quick-and-dirty analysis, you might plan the next block around higher total volume, more Z1 in particular, and perhaps a modest Z3 increase if it doesn't come at the cost of the above. Then you make a prediction: given the slowing rate of improvement, maybe you estimate 52.58 for the next competition.
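
If you want to formalize the eyeballing, a short pandas sketch (column names are mine) can load the table above and check each input's correlation with the 100m time:

    import pandas as pd

    # The six training blocks from the table above.
    df = pd.DataFrame({
        "Z0": [25, 28, 26, 30, 30, 32],
        "Z1": [20, 30, 32, 40, 45, 47],
        "Z2": [10, 12, 14, 8, 10, 9],
        "Z3": [8, 10, 9, 8, 9, 11],
        "Z4": [2, 3, 3, 4, 4, 4],
        "Z5": [0.5, 1, 1, 1.5, 1.5, 2],
        "total": [65.5, 84, 85, 91.5, 99.5, 105],
        "time_100m": [55.4, 54.2, 53.8, 53.2, 52.9, 52.7],
    })

    # Negative correlations are "good": more of that input goes with
    # faster (lower) 100m times.
    print(df.corr()["time_100m"].drop("time_100m").round(2))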

Step Six

Make a Prediction and Measure Your Error

This process of using known numerical data points to predict an unknown one is, in modeling terms, called regression. The goal is to fit a model to the data, make a prediction, and then compare that prediction to reality when the result comes in.

Let's say the swimmer goes 52.7 in the next competition, despite the increased load. The error on your "back of the napkin" prediction was 0.12 seconds. Not bad, but can a machine do better?

Why This Matters

"Calling your shot" and then assessing the error is one of the most important habits for a coach committed to improvement. Without circling back to see what actually happened, you're in a prime position to fool yourself, and more importantly, to not learn from your mistakes.

Step Seven

Enlist the Help of a Machine

Computers are exceptionally good at pattern recognition. Even a simple Excel scatterplot of total hours versus performance gives you a useful linear regression equation. In our swimmer's case, plotting total hours against 100m time yields:

y = -0.1397 × hours + 67.243

Plugging in 105 hours gives a prediction of 52.57 seconds, basically the same as our mental estimate. The error: 0.13s. Slightly worse than our manual calculation, which isn't surprising since we used only one variable while our mental model was (informally) weighing several.
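
If you'd rather stay out of Excel, the same single-variable fit is a few lines of Python. A sketch, fitting on the first five blocks and predicting block 6; the exact coefficients depend on which blocks and what rounding went into the original spreadsheet, so expect small differences from the figures quoted above:

    import numpy as np

    # Total hours and 100m times for blocks 1-5.
    hours = np.array([65.5, 84.0, 85.0, 91.5, 99.5])
    times = np.array([55.4, 54.2, 53.8, 53.2, 52.9])

    # Ordinary least squares: time = slope * hours + intercept.
    slope, intercept = np.polyfit(hours, times, 1)
    print(f"y = {slope:.4f} * hours + {intercept:.3f}")
    print(f"Prediction at 105 hours: {slope * 105 + intercept:.2f}s")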

Multiple regression: the real power

Excel can do more than single-variable scatterplots. Its Analysis ToolPak includes a multiple regression function that weighs all available variables simultaneously. When we feed in time-in-zone data across all six zones, the model produces:

y = -0.08 × Z0 - 0.12 × Z1 + 0.18 × Z2 + 0.26 × Z3 + 56.44

Remember that for swim times, negative coefficients are "good" (they reduce the predicted time). So the model confirms that Z0 and Z1 training are beneficial for this swimmer, while Z2 and Z3 time is actually counterproductive. Z4 and Z5 got zero coefficients, meaning they added nothing useful.

Multiple Regression Coefficients by Zone
Negative = faster times (good). Positive = slower times (bad). For this specific swimmer.
[Chart: Z0 -0.08, Z1 -0.12, Z2 +0.18, Z3 +0.26, Z4 0, Z5 0]

Plugging in the Block 6 data, the model predicts 52.72 seconds, an error of only 0.02 from the actual 52.7. That's a dramatic improvement over both the back-of-the-napkin calculation (0.12s error) and the simple linear regression (0.13s error).

More importantly, the model revealed a crucial insight we missed entirely in our manual analysis: Z2 and Z3 work was actually hurting this swimmer's performance. We had increased Z3 thinking it might help, when all it did was offset the gains from the additional Z1 work.
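
The same multiple regression is equally easy outside Excel. A minimal sketch, assuming (as above) that we fit Z0 through Z3 plus an intercept on blocks 1-5 and predict block 6; with this few blocks the fit is exactly determined, and the coefficients may differ slightly from the rounded ones quoted above:

    import numpy as np

    # Z0, Z1, Z2, Z3 hours for blocks 1-5, plus the 100m results.
    X = np.array([
        [25, 20, 10, 8],
        [28, 30, 12, 10],
        [26, 32, 14, 9],
        [30, 40, 8, 8],
        [30, 45, 10, 9],
    ], dtype=float)
    y = np.array([55.4, 54.2, 53.8, 53.2, 52.9])

    # Append an intercept column and solve the least-squares problem.
    A = np.hstack([X, np.ones((len(X), 1))])
    coefs, *_ = np.linalg.lstsq(A, y, rcond=None)
    print("Z0, Z1, Z2, Z3, intercept:", coefs.round(3))

    # Predict block 6 (Z0=32, Z1=47, Z2=9, Z3=11).
    block6 = np.array([32, 47, 9, 11, 1.0])
    print(f"Predicted block 6 time: {block6 @ coefs:.2f}s")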

Using the model for "what if" analysis

Now the coach can ask the model predictive questions. What if we drop Z3 from 11 hours down to 9?

Predicted: 52.2s

That's a meaningful improvement from a simple reallocation of training emphasis (the short sketch after the loop below shows the arithmetic). This is the iterative process of optimization:

Collect data → build model → assess error → adjust plan → feed the results back into the next block.
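
Concretely, a "what if" query is nothing more than evaluating the fitted equation with the proposed inputs, here using the rounded coefficients quoted above:

    # The fitted model for this swimmer, as a plain function.
    def predict_100m(z0, z1, z2, z3):
        return -0.08 * z0 - 0.12 * z1 + 0.18 * z2 + 0.26 * z3 + 56.44

    print(round(predict_100m(32, 47, 9, 11), 2))  # block 6 as trained: 52.72
    print(round(predict_100m(32, 47, 9, 9), 2))   # Z3 cut to 9 hours: ~52.2
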
A Critical Nuance

Test vs. Training Data

In our example, we put a lot of value on that single 52.7 result. But maybe the swimmer didn't improve simply because of the Z3 increase. Maybe they had a bad night's sleep, or the pool conditions weren't ideal. To draw more robust conclusions, we need to test the model on data it hasn't seen.

This is a core principle of machine learning: always evaluate your model on held-out data. Instead of training on all five completed blocks, you train on the first four and have the model predict blocks five and six. The coefficients shift slightly, the predictions are a bit less accurate (because the model had less data to learn from), but you get a much more honest picture of the model's real-world predictive power.

As a general rule, the more data you can feed the model, the more accurate its predictions become. When you're working with an athlete over years, you accumulate enough blocks to comfortably hold 20% back as a test set to validate model accuracy.
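
Here is what that split looks like in the same numpy sketch, holding out blocks 5 and 6. Note that with only four training rows and five unknowns the fit is underdetermined (lstsq returns the minimum-norm solution), which is exactly why accumulating more blocks over the years matters:

    import numpy as np

    # All six blocks: Z0-Z3 hours and 100m results.
    X = np.array([
        [25, 20, 10, 8],
        [28, 30, 12, 10],
        [26, 32, 14, 9],
        [30, 40, 8, 8],
        [30, 45, 10, 9],
        [32, 47, 9, 11],
    ], dtype=float)
    y = np.array([55.4, 54.2, 53.8, 53.2, 52.9, 52.7])

    A = np.hstack([X, np.ones((len(X), 1))])
    train, test = slice(0, 4), slice(4, 6)

    # Fit on blocks 1-4 only, then score on the held-out blocks 5-6.
    coefs, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
    pred = A[test] @ coefs
    print("Held-out predictions:", pred.round(2))
    print("Held-out errors:", (pred - y[test]).round(2))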

Scaling Up

Let an AI Do the Heavy Lifting

Excel is fine for small problems, but it becomes tedious (at best) and impractical (at worst) when dealing with years of data across multiple athletes. Historically, that meant learning a programming language like Python to scale up your analysis. And while Python remains a powerful option for those inclined to learn it, we now have a dramatically more accessible alternative: large language models.

Tools like Claude, ChatGPT, and others can analyze your training data directly. You can upload a spreadsheet, describe what you're trying to figure out in plain English, and get back regression analyses, visualizations, pattern identification, and actionable predictions without writing a single line of code. The barrier to entry has essentially dropped to zero.

A Practical Workflow

1. Export your training data from your platform (TrainingPeaks, Garmin, intervals.icu, etc.) as a CSV or spreadsheet.

2. Upload it to an LLM like Claude and ask a specific question: "Which training zones are most strongly correlated with my race performance?" or "Build a regression model predicting my 100m time from my time-in-zone data."

3. Review the output, ask follow-up questions, and iterate. "What would my predicted time be if I shifted 5 hours from Z3 to Z1 next block?"

For our swimmer example, you could simply paste the training table into Claude and say: "Here are six blocks of training data with time in each pace zone and competition results. Which zones have the strongest relationship with faster times? What training distribution would you recommend for the next block to optimize my 100m time?" The model would identify the same patterns we found through multiple regression (Z1 beneficial, Z2/Z3 counterproductive) and could suggest specific adjustments, all in conversational language.

The underlying statistical methods haven't changed. What has changed is that you no longer need to be a programmer to access them. The coach's domain knowledge (knowing what to measure, what questions to ask, and how to interpret the results in context) remains irreplaceable. The LLM just handles the computation.

Where to go from here

In future posts, we'll explore including time as an additional dimension (how does Block 1 training affect Block 4 results?), non-linear modeling that can capture the curves and thresholds that linear regression misses, and how to structure your data collection so that these analyses become a natural part of your coaching workflow rather than an afterthought.

We've covered a lot of ground: from choosing the right inputs and outputs, to building a linear regression model in Excel, to implementing the same model in a few lines of Python, to handing the heavy lifting to an LLM. But beyond the practical mechanics, the core message is this: base your training decisions not on the group-level results of a six-week study on untrained college students, but on actual data from the individual you're trying to help.

In upcoming posts, we'll go deeper into specific decisions, from macro-level periodization over long time frames to micro-level daily adjustments based on athlete readiness. Throughout, the principle remains the same.

Every athlete is an experiment of one.

References

Oxford English Dictionary (Google Books) · Bouchard et al., HERITAGE Family Study, on training response variability (PubMed) · Encyclopedia of Machine Learning (Emerald) · Siegel, Predictive Analytics (Google Books)

Next: 7 Principles of Effective Training