A single multitask AI system (PanEcho) accurately automates comprehensive TTE interpretation across settings.
Background
Echocardiography is central to cardiovascular diagnosis but depends on expert interpretation of multi-view videos, creating bottlenecks and variability. Prior AI efforts typically addressed single views or single tasks. This study developed and validated PanEcho, a unified, multiview, multitask deep learning system to automate complete transthoracic echocardiography (TTE) interpretation.
Patients
- Internal (YNHHS, USA): 32,265 TTE studies from 24,405 patients (1.2 million videos) for development and validation; temporally distinct internal validation set: 5,130 studies (July–December 2022).
- External (RVENet+, Hungary): 944 complete TTE studies from 831 patients (18,862 videos; 2013–2021) spanning diverse subpopulations.
- Point-of-care cohort (POCUS, USA): 3,310 studies from 3,170 ED patients (25,407 videos; 2015–2023); clinician-acquired, limited-quality bedside scans.
- Public datasets: EchoNet-Dynamic (n=10,032 single-view A4C studies) and EchoNet-LVH (n≈12,000 PLAX videos) for LV function/structure.
Intervention
PanEcho, a multiview, multitask, video-based deep learning system that combines a 2D image encoder, a temporal transformer, and task-specific output heads to produce automated study-level reports for 39 echocardiographic tasks (18 diagnostic classifications; 21 parameter measurements) from B-mode and color Doppler videos.
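To make the described dataflow concrete, the sketch below mimics the high-level pipeline with stand-ins: a fixed random projection in place of the learned 2D encoder, mean-pooling in place of the temporal transformer, and per-task linear heads averaged across a study's videos. All names, dimensions, and weights here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
FEAT = 16  # feature dimension (arbitrary for this sketch)

# Stand-ins for learned weights (random, fixed for the sketch).
W_enc = rng.standard_normal((64 * 64, FEAT)) / 256.0  # "2D image encoder"
heads = {                                             # task-specific heads
    "lv_dysfunction": rng.standard_normal(FEAT),      # classification logit
    "lvef": rng.standard_normal(FEAT),                # regression output
}

def encode_video(video):
    """video: (n_frames, 64, 64) grayscale clip -> one feature vector."""
    frame_feats = video.reshape(len(video), -1) @ W_enc  # per-frame encoding
    # The real model uses a temporal transformer; mean-pooling stands in here.
    return frame_feats.mean(axis=0)

def predict_study(videos):
    """Aggregate each task's output over all videos of a study (any view mix)."""
    feats = np.stack([encode_video(v) for v in videos])
    out = {}
    for task, w in heads.items():
        per_video = feats @ w                 # one scalar per video
        out[task] = float(per_video.mean())   # study-level average
    return out

study = [rng.random((24, 64, 64)) for _ in range(3)]  # 3 mock B-mode clips
report = predict_study(study)
```

Because the heads share one video-level feature vector, adding a 40th task would only add one small head, which is the efficiency argument for a unified multitask design over 39 separate models.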
Control
Reference standard: certified echocardiographer’s final report/measurements extracted from clinical systems; task labels defined per guidelines and local practice.
Outcome
- Primary outcomes:
- Area under the ROC curve (AUC) for diagnostic classification tasks.
- Mean absolute error (MAE) for parameter estimation tasks (with normalized MAE for cross-task summaries).
- Secondary outcomes: External and point-of-care validation performance; robustness to image quality; fairness across sex and race; interpretability via task-specific view relevance.
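The two primary metrics above can be computed from scratch in a few lines: a rank-based AUC for classification tasks and MAE for measurements. Normalizing MAE by the label's interquartile range is an assumption made here for the cross-task summary; the paper's exact normalization convention may differ.

```python
def auc(labels, scores):
    """Rank-based AUC: probability a positive case outranks a negative one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def normalized_mae(y_true, y_pred):
    """MAE scaled by the label IQR (an assumed convention, for illustration)."""
    s = sorted(y_true)
    q1, q3 = s[len(s) // 4], s[(3 * len(s)) // 4]
    return mae(y_true, y_pred) / (q3 - q1)

# Toy example: a binary label (e.g., severe AS) and LVEF (%) estimates.
y_cls = [0, 0, 1, 1]
p_cls = [0.1, 0.4, 0.35, 0.8]
print(auc(y_cls, p_cls))            # 0.75

lvef_true = [60, 55, 35, 25]
lvef_pred = [58, 57, 38, 28]
print(mae(lvef_true, lvef_pred))    # 2.5
```

The normalized form lets a 21-task summary compare errors across quantities with very different units and ranges (e.g., LVEF in percent vs. wall thickness in millimeters).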
Study Design
Model development and retrospective, multisite validation of a multitask, view-agnostic AI system. Internal temporal validation (YNHHS), international external validation (RVENet+), public single-view datasets (EchoNet-Dynamic, EchoNet-LVH), and point-of-care ED POCUS cohort. TRIPOD+AI-aligned reporting.
Level of Evidence
Diagnostic accuracy study with external validation; retrospective design. Oxford CEBM: approximately Level III (non-consecutive/retrospective diagnostic cohort with reference standard).
Follow-up period
None (cross-sectional imaging assessments; no longitudinal clinical follow-up).
Results
Primary outcomes
- Overall diagnostic classification (18 tasks):
- Internal validation (YNHHS): median AUC 0.91 (IQR 0.88–0.93).
- External validation (RVENet+): median AUC 0.91 (IQR 0.85–0.94).
- Parameter estimation (21 tasks):
- Internal: median normalized MAE 0.13 (IQR 0.10–0.18).
- External (RVENet+): median normalized MAE 0.16 (IQR 0.11–0.23).
- Representative tasks:
- LVEF estimation: MAE 4.2% (internal); 4.5% (external).
- Moderate/worse LV systolic dysfunction: AUC 0.98 (internal); 0.99 (external).
- RV systolic dysfunction: AUC 0.93 (internal); 0.94 (external).
- Severe aortic stenosis: AUC 0.98 (internal); 1.00 (external).
Secondary outcomes
- Abbreviated protocols: Restricted to at most one key video per view (PLAX/PSAX/A4C/A5C/A2C), PanEcho achieved median AUC 0.91 (IQR 0.87–0.94) across 15 tasks; LV systolic dysfunction AUC 0.98; severe AS AUC 0.96; MV stenosis AUC 0.94.
- POCUS (ED bedside scans): 14 evaluable tasks median AUC 0.85 (IQR 0.77–0.87); LV systolic dysfunction AUC 0.93; severe AS AUC 0.86; MV stenosis AUC 0.92; pericardial effusion AUC 0.86.
- Public datasets: Maintained strong performance on EchoNet-Dynamic (e.g., LVEF MAE ≈5.5%) and EchoNet-LVH (high AUCs for structural labels; millimeter-scale MAEs).
- Fairness and robustness: Comparable performance across sex and race subgroups; LVEF estimates remained stable across image-quality strata.
- Interpretability: Task-specific view relevance aligned with guidelines (e.g., PLAX for dimensions/AV anatomy; A4C for LV function; color Doppler for regurgitation).
- Number needed to treat (NNT): Not applicable (diagnostic accuracy study without patient-level treatment outcomes).
Limitations
- Limited to 2D grayscale and color Doppler (no spectral Doppler, strain, or 3D data).
- Direct estimation without segmentation may limit interpretability; future overlays could enhance transparency.
- Heterogeneous measurement methods across sites (e.g., 3D vs 2D LV volumes) may affect error metrics.
- Low-prevalence or difficult labels (e.g., bicuspid AV, right atrial abnormalities) lead to class imbalance and potentially noisy ground truth.
- Scope restricted to 39 predefined tasks; some clinical questions may still require expert synthesis within the broader clinical context.
- Retrospective design; prospective workflow studies needed to assess impact on speed, accuracy, and human–AI collaboration.
Funding
- National Institutes of Health (R01HL167858, R01AG089981, K23HL153775).
- Doris Duke Charitable Foundation (Award 2022060).
- Industry-sponsored research through Yale: Bristol Myers Squibb, Novo Nordisk, BridgeBio.
- Hungarian funding: European Union–MILAB, National Research, Development, and Innovation Office of Hungary (FK142573), Hungarian Academy of Sciences.
Citation
Holste G, Oikonomou EK, Tokodi M, Kovács A, Wang Z, Khera R. Complete AI-Enabled Echocardiography Interpretation With Multitask Deep Learning. JAMA. 2025;334(4):306-318. doi:10.1001/jama.2025.8731. Published online June 23, 2025.