Wearables: What They Can and Can't Measure
This page is educational. It describes what published research has measured about consumer wearables and the limits of those measurements. It is not medical advice or a device recommendation, and it does not replace consultation with a qualified healthcare professional.
Why this matters
Consumer wearables have surfaced personal health data at a scale that didn't exist a decade ago. Apple Watch alone is on more than 100 million wrists. Oura, Whoop, Garmin, Fitbit, Polar, and others add tens of millions more. Most consumers now have access to continuous data on heart rate, heart rate variability, sleep, activity, and increasingly metabolic and respiratory metrics.
The question this page addresses is what those numbers actually mean. Some wearable measurements are highly accurate. Others are derived estimates with meaningful error bars. A few are interesting but currently more marketing than measurement.
This page describes what published validation studies have measured. It does not recommend a device, because the right device for any individual depends on what they want to measure and how precise they need to be.
The sensor types
Most consumer wearables combine a small set of sensor types:
| Sensor | What it directly measures | Used to derive |
|---|---|---|
| Optical (PPG) | Light absorption in skin tissue | Heart rate, HRV, blood-oxygen estimate, respiratory rate estimate |
| Accelerometer | 3-axis motion | Activity, sleep movement, posture, fall detection |
| Gyroscope | Rotational movement | Exercise type detection, sleep position |
| Skin temperature | Skin surface temperature | Trend monitoring, cycle tracking |
| ECG (single-lead) | Electrical heart signal | Heart rhythm, atrial fibrillation detection (some devices) |
| GPS | Position | Distance, pace, route |
| Altimeter | Pressure | Elevation, stairs |
| Microphone | Sound | Sleep snoring detection (some devices) |
Everything else is a downstream calculation from these raw signals. Understanding what the device is measuring versus what it is inferring is the key distinction.
What wearables measure well
Heart rate (during steady state)
Optical heart rate measurement at rest and during steady-state activity has been validated extensively. Apple Watch, Garmin, Fitbit, and similar devices have reported correlation coefficients above 0.95 with chest-strap or ECG reference measurement in steady-state conditions [Bent et al. 2020; Nelson et al. 2019]. Error margins are typically within 2-5 beats per minute.
This is the most-validated wearable measurement and is reliable enough for most personal use cases.
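Validation studies usually report both a correlation coefficient and an error margin because the two answer different questions. The sketch below uses made-up paired readings (not data from any study) in which the wrist device reads exactly 3 bpm high: the correlation is essentially perfect, yet every reading is off by 3 bpm. The function names and sample values are illustrative assumptions.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def mean_absolute_error(xs, ys):
    """Average size of the disagreement, in the same units as the input."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical paired readings in beats per minute (illustrative only).
chest_strap = [62, 64, 70, 85, 110, 128, 131, 129, 112, 90]
wrist = [65, 67, 73, 88, 113, 131, 134, 132, 115, 93]  # constant +3 bpm offset

r = pearson_r(chest_strap, wrist)
mae = mean_absolute_error(chest_strap, wrist)
print(f"correlation: {r:.3f}, mean absolute error: {mae:.1f} bpm")
```

A high correlation says the device tracks changes well; the error margin says how far off an individual reading can be.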
Steps and distance (with GPS)
Step counting on the wrist has reported correlation coefficients of approximately 0.85-0.95 with reference step counters. Distance via GPS is accurate within a few percent on outdoor activity. Indoor distance estimation from wrist motion is less accurate and varies meaningfully by activity type [Case et al. 2015].
Sleep timing (not depth)
Sleep onset and offset detection has been validated reasonably well. Wearables identify when sleep starts and ends within roughly 10-20 minutes of polysomnography (the gold-standard sleep lab measurement) in healthy adults [Roomkham et al. 2018]. Total sleep time estimates are typically within 30 minutes.
This is enough for tracking sleep duration trends. It is not enough for diagnosing sleep disorders.
Heart rate during exercise (some conditions)
For steady-state cardiovascular exercise (running, cycling at constant pace), wrist-worn heart rate has reported good agreement with chest-strap measurement. For variable-intensity exercise (interval training, weightlifting, sports with rapid motion), accuracy degrades [Gillinov et al. 2017].
What wearables measure with meaningful error
Heart rate during high-intensity intervals
Wrist-worn HR sensors struggle with rapid changes in heart rate, motion artefact during high-impact activity, and tattooed skin under the sensor. Validation studies have reported 10-20% error margins during high-intensity interval training compared with chest-strap measurement [Gillinov et al. 2017].
For interval training where precise HR matters, chest straps remain the more accurate option.
HRV (heart rate variability)
Consumer HRV measurement is more nuanced than the marketing suggests. Most wearables measure HRV during sleep (a low-noise window) and report a single daily number. The underlying calculation uses RMSSD or SDNN depending on the device, and devices report results on different scales.
Comparison between devices is unreliable. Comparison within a single device over time is more reliable but still subject to noise from:
- Position during sleep (side vs back)
- Movement during the night
- Optical signal quality
- Ambient temperature changes
For tracking personal HRV trends over weeks and months, wearables have utility. For comparing HRV between people or across devices, the numbers are not directly comparable [Stone et al. 2021].
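To see why numbers from different devices do not line up, it helps to compute both metrics on the same data. The sketch below applies the standard RMSSD and SDNN definitions to a short, made-up series of beat-to-beat (RR) intervals. It assumes the intervals are already cleaned of artefacts and uses the sample standard deviation for SDNN; real devices add their own filtering and scaling, which is not modelled here.

```python
from math import sqrt
from statistics import stdev

def rmssd(rr_ms):
    """Root mean square of successive differences between RR intervals (ms)."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return sqrt(sum(d * d for d in diffs) / len(diffs))

def sdnn(rr_ms):
    """Standard deviation of the RR intervals (ms), sample formula."""
    return stdev(rr_ms)

# Hypothetical RR intervals in milliseconds (illustrative only).
rr = [800, 810, 790, 805, 795, 820, 780, 800]

print(f"RMSSD: {rmssd(rr):.1f} ms")  # about 22.2 ms
print(f"SDNN:  {sdnn(rr):.1f} ms")   # about 12.2 ms
```

The same eight heartbeats give two quite different "HRV" values, so a reading from a device that uses one metric cannot be compared directly with a reading from a device that uses the other.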
Sleep stages
This is where most wearables overpromise. Polysomnography distinguishes sleep stages (light, deep, REM) using EEG, EOG, and EMG signals from electrodes placed directly on the skin. Wearables infer sleep stages from motion and heart rate patterns.
Validation studies have reported wide variability. Apple Watch sleep stages agreed with polysomnography 70-75% of the time for total sleep but only 50-60% of the time for specific stages [Roberts et al. 2020]. Oura ring has reported similar figures in published validation studies [de Zambotti et al. 2019]. Whoop showed comparable agreement levels [Berryhill et al. 2020].
The pattern across devices: total sleep time and sleep timing are reasonably accurate; specific stage detection has substantial error. Individual nights can be quite wrong; trends across many nights are more reliable.
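Agreement figures like these typically come from comparing the device's label with the polysomnography label for each short scoring window (epoch). The sketch below uses ten made-up epochs and hypothetical helper names to show how the same night can score well on sleep versus wake and much worse on specific stages.

```python
def agreement(reference, device):
    """Fraction of epochs where the device label matches the reference label."""
    matches = sum(r == d for r, d in zip(reference, device))
    return matches / len(reference)

def sleep_wake(stages):
    """Collapse stage labels into a two-class sleep/wake sequence."""
    return ["wake" if s == "wake" else "sleep" for s in stages]

# Hypothetical epoch labels (illustrative only, not study data).
psg = ["wake", "light", "light", "deep", "deep",
       "light", "rem", "rem", "light", "wake"]
watch = ["wake", "light", "deep", "deep", "light",
         "light", "light", "rem", "light", "light"]

stage_agreement = agreement(psg, watch)
sleep_wake_agreement = agreement(sleep_wake(psg), sleep_wake(watch))
print(f"stage agreement: {stage_agreement:.0%}")            # 60%
print(f"sleep/wake agreement: {sleep_wake_agreement:.0%}")  # 90%
```

Raw percent agreement is the simplest summary; published studies often also report chance-corrected statistics, which this sketch leaves out.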
Blood oxygen (SpO2)
Consumer SpO2 measurement from the wrist has variable accuracy. Apple Watch and Garmin have reported approximately 4-5% mean absolute error compared with fingertip pulse oximetry — meaningful enough that a wrist SpO2 reading should not be used to make medical decisions [Pipek et al. 2021].
The signal can be useful for trend detection but is too noisy for clinical use.
Respiratory rate
Wearable respiratory rate estimation has reported reasonable accuracy at rest (within 1-2 breaths per minute) but degrades during activity. The signal is derived from heart-rate variation patterns, which track breathing well at rest but are contaminated by motion during exercise [Charlton et al. 2016].
What wearables estimate (not measure)
VO2 max
Apple Watch, Garmin, Fitbit, and Polar all report VO2 max estimates. None of them measure VO2 max — gas-exchange analysis during an incremental exercise test is the lab gold standard for that.
Wearables estimate VO2 max from heart rate response, age, sex, weight, and sometimes pace and altitude data. Published validation studies have reported correlation coefficients of approximately 0.85 with lab-measured VO2 max in trained populations and lower correlations in untrained populations [Cao et al. 2022].
The estimates are usable for tracking personal fitness trends and for getting a rough population percentile. They are not accurate enough for elite training prescription.
Cardiovascular age / fitness age
Several devices report a "cardiovascular age" or "fitness age" number derived from VO2 max estimates and demographic data. This is a downstream calculation from an already-estimated number. The label suggests precision that the underlying measurement does not support.
Caloric expenditure
Wearable calorie estimates have reported some of the largest error margins in the validation literature, typically 25-40% off reference indirect calorimetry [Shcherbina et al. 2017]. The error is large enough that calorie tracking from wearables is broadly unreliable for individual-meal-level decisions, though it may show usable trends across many days.
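As a rough worked example, the sketch below turns a displayed calorie figure into a plausible range. It treats the 25-40% figure as a symmetric band around the displayed value, which is a simplification, and the 500 kcal workout is a made-up number.

```python
def plausible_range(reported_kcal, relative_error):
    """Return (low, high) assuming a symmetric relative error band."""
    low = reported_kcal * (1 - relative_error)
    high = reported_kcal * (1 + relative_error)
    return low, high

reported = 500  # hypothetical workout reading, kcal

for error in (0.25, 0.40):
    low, high = plausible_range(reported, error)
    print(f"{error:.0%} error: roughly {low:.0f} to {high:.0f} kcal")
```

At the upper end of the reported error, a 500 kcal reading is consistent with anything from about 300 to about 700 kcal, which is wider than the size of a typical meal adjustment.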
Stress
"Stress" scores combine HRV, heart rate, and sometimes skin temperature into a single number. There is no validated reference measurement for "stress" in the way there is for sleep stages — researchers can measure cortisol or self-reported stress but neither is what a wearable computes. Stress scores have weak validation evidence and should be interpreted as broad trend indicators, not precise readings.
Recovery scores
Whoop, Garmin, Oura, and others compute recovery scores combining HRV, sleep, and activity. These scores are useful for visualising trends, but the underlying calculations are proprietary and not directly validated against athletic performance outcomes in published studies.
How to read your wearable data
For consumers interested in using wearable data effectively, the published research suggests:
- Trust steady-state heart rate and total sleep time. These are the best-validated measurements.
- Treat sleep stage breakdowns as approximate. Total sleep matters more than the deep/REM split your device shows.
- Compare HRV to yourself over weeks, not to other people. Different devices use different formulas and aren't directly comparable.
- Treat VO2 max and recovery scores as personal trend indicators. Useful for "am I improving" — not for absolute fitness assessment.
- Disregard calorie counts for any meaningful decision. The error margins are large enough to be misleading.
- Cross-reference unusual readings. If your wearable flags an irregular heart rhythm, follow up with a clinical-grade ECG.
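Comparing a metric to yourself over weeks can be sketched as a rolling personal baseline. The example below uses made-up nightly HRV values and an assumed 7-night window; the window length and function name are illustrative choices, not something any device vendor documents.

```python
from statistics import mean

def rolling_baseline(values, window=7):
    """Mean of the previous `window` values; None until enough history exists."""
    baseline = []
    for i in range(len(values)):
        if i < window:
            baseline.append(None)
        else:
            baseline.append(mean(values[i - window:i]))
    return baseline

# Hypothetical nightly HRV readings in ms, with one unusually low night (38).
nightly_hrv = [52, 48, 55, 50, 47, 53, 51, 49, 38, 50]

baseline = rolling_baseline(nightly_hrv, window=7)
for night, (value, base) in enumerate(zip(nightly_hrv, baseline), start=1):
    if base is not None:
        print(f"night {night}: reading {value}, baseline {base:.1f}")
```

The single low night stands out against the baseline but moves the baseline itself by only about a point and a half, which is why multi-night trends are steadier than any one reading.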
The research-quality lens on wearables: they are useful for population-level pattern detection and personal trend monitoring; they are not clinical instruments and should not replace clinical evaluation.
Related Proco pages
- VO2 max and mortality: what the research shows
- The wellness economy in 2026
- How to read a clinical trial
- Health misinformation: the scale
Sources
- Bent B, Goldstein BA, Kibbe WA, Dunn JP. Investigating sources of inaccuracy in wearable optical heart rate sensors. NPJ Digital Medicine. 2020;3(1):18.
- Nelson BW, Allen NB. Accuracy of consumer wearable heart rate measurement during an ecologically valid 24-hour period: intraindividual validation study. JMIR mHealth and uHealth. 2019;7(3):e10828.
- Case MA, Burwick HA, Volpp KG, Patel MS. Accuracy of smartphone applications and wearable devices for tracking physical activity data. JAMA. 2015;313(6):625-626.
- Roomkham S, Lovell D, Cheung J, Perrin D. Promises and challenges in the use of consumer-grade devices for sleep monitoring. IEEE Reviews in Biomedical Engineering. 2018;11:53-67.
- Gillinov S, Etiwy M, Wang R, et al. Variable Accuracy of Wearable Heart Rate Monitors during Aerobic Exercise. Medicine and Science in Sports and Exercise. 2017;49(8):1697-1703.
- Stone JD, Ulman HK, Tran K, et al. Assessing the Accuracy of Popular Commercial Technologies That Measure Resting Heart Rate and Heart Rate Variability. Frontiers in Sports and Active Living. 2021;3:585870.
- Roberts DM, Schade MM, Mathew GM, et al. Detecting sleep using heart rate and motion data from multisensor consumer-grade wearables. Sleep. 2020;43(7):zsaa045.
- de Zambotti M, Rosas L, Colrain IM, Baker FC. The Sleep of the Ring: Comparison of the Oura Sleep Tracker Against Polysomnography. Behavioral Sleep Medicine. 2019;17(2):124-136.
- Berryhill S, Morton CJ, Dean A, et al. Effect of wearables on sleep in healthy individuals: a randomized crossover trial and validation study. Journal of Clinical Sleep Medicine. 2020;16(5):775-783.
- Pipek LZ, Nascimento RFV, Acencio MMP, Teixeira LR. Comparison of SpO2 and heart rate values on Apple Watch and conventional commercial oximeters devices in patients with lung disease. Scientific Reports. 2021;11:18901.
- Charlton PH, Bonnici T, Tarassenko L, et al. An assessment of algorithms to estimate respiratory rate from the electrocardiogram and photoplethysmogram. Physiological Measurement. 2016;37(4):610-626.
- Cao R, Azimi I, Sarhaddi F, et al. Accuracy Assessment of Oura Ring Nocturnal Heart Rate and Heart Rate Variability in Comparison With Electrocardiography. JMIR mHealth and uHealth. 2022;10(1):e27487.
- Shcherbina A, Mattsson CM, Waggott D, et al. Accuracy in Wrist-Worn, Sensor-Based Measurements of Heart Rate and Energy Expenditure in a Diverse Cohort. Journal of Personalized Medicine. 2017;7(2):3.
Proco provides educational, research-based information. This page describes what published validation studies have measured about consumer wearables. It is not a device recommendation and not medical advice. If your wearable flags an unusual reading, consult a clinician.
Schema (for implementation)
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Wearables: What They Can and Can't Measure",
  "description": "Consumer wearables measure some things accurately, estimate others, and overpromise on a few. This page describes what validation studies have reported about Apple Watch, Whoop, Oura, Garmin, and similar devices.",
  "datePublished": "2026-06-01",
  "dateModified": "2026-05-31",
  "author": {"@type": "Organization", "name": "Proco"},
  "publisher": {"@type": "Organization", "name": "Proco", "url": "https://procohq.com"},
  "about": {"@type": "Thing", "name": "Wearable device measurement accuracy"}
}
Proco provides educational, research-based information. It does not diagnose, treat, cure, or prevent any condition. Individual responses to interventions vary based on age, health status, medications, and other factors. If you are pregnant, breastfeeding, take prescription medication, manage a chronic condition, or are considering health changes for a child, talk to a qualified healthcare professional before relying on any information from Proco.
If you are experiencing a medical emergency, contact your local emergency services.