A single multitask AI system (PanEcho) accurately automates comprehensive TTE interpretation across settings.
Background
Echocardiography is central to cardiovascular diagnosis but depends on expert interpretation of multi-view videos, creating bottlenecks and variability. Prior AI efforts typically addressed single views or single tasks. This study developed and validated PanEcho, a unified, multiview, multitask deep learning system to automate complete transthoracic echocardiography (TTE) interpretation.
Patients
- Internal (YNHHS, USA): 32,265 TTE studies from 24,405 patients (1.2 million videos) for development and validation; temporally distinct internal validation set: 5,130 studies (July–December 2022).
- External (RVENet+, Hungary): 944 complete TTE studies from 831 patients (18,862 videos; 2013–2021) spanning diverse subpopulations.
- Point-of-care cohort (POCUS, USA): 3,310 studies from 3,170 ED patients (25,407 videos; 2015–2023); clinician-acquired, limited-quality bedside scans.
- Public datasets: EchoNet-Dynamic (n=10,032 single-view A4C studies) and EchoNet-LVH (n≈12,000 PLAX videos) for LV function/structure.
Intervention
PanEcho, a multiview, multitask, video-based deep learning system that combines a 2D image encoder, a temporal transformer, and task-specific output heads to produce automated study-level reports for 39 echocardiographic tasks (18 diagnostic classifications; 21 parameter measurements) from B-mode and color Doppler videos.
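To make the described dataflow concrete, the sketch below mimics the high-level pipeline with stand-ins: a fixed random projection in place of the learned 2D encoder, mean-pooling in place of the temporal transformer, and per-task linear heads averaged across a study's videos. All names, dimensions, and weights here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
FEAT = 16  # feature dimension (arbitrary for this sketch)

# Stand-ins for learned weights (random, fixed for the sketch).
W_enc = rng.standard_normal((64 * 64, FEAT)) / 256.0  # "2D image encoder"
heads = {                                             # task-specific heads
    "lv_dysfunction": rng.standard_normal(FEAT),      # classification logit
    "lvef": rng.standard_normal(FEAT),                # regression output
}

def encode_video(video):
    """video: (n_frames, 64, 64) grayscale clip -> one feature vector."""
    frame_feats = video.reshape(len(video), -1) @ W_enc  # per-frame encoding
    # The real model uses a temporal transformer; mean-pooling stands in here.
    return frame_feats.mean(axis=0)

def predict_study(videos):
    """Aggregate each task's output over all videos of a study (any view mix)."""
    feats = np.stack([encode_video(v) for v in videos])
    out = {}
    for task, w in heads.items():
        per_video = feats @ w                 # one scalar per video
        out[task] = float(per_video.mean())   # study-level average
    return out

study = [rng.random((24, 64, 64)) for _ in range(3)]  # 3 mock B-mode clips
report = predict_study(study)
```

Because the heads share one video-level feature vector, adding a 40th task would only add one small head, which is the efficiency argument for a unified multitask design over 39 separate models.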
Control
Reference standard: certified echocardiographer’s final report/measurements extracted from clinical systems; task labels defined per guidelines and local practice.
Outcome
- Primary outcomes:
- Area under the ROC curve (AUC) for diagnostic classification tasks.
- Mean absolute error (MAE) for parameter estimation tasks (with normalized MAE for cross-task summaries).
- Secondary outcomes: External and point-of-care validation performance; robustness to image quality; fairness across sex and race; interpretability via task-specific view relevance.
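The two primary metrics above can be computed from scratch in a few lines: a rank-based AUC for classification tasks and MAE for measurements. Normalizing MAE by the label's interquartile range is an assumption made here for the cross-task summary; the paper's exact normalization convention may differ.

```python
def auc(labels, scores):
    """Rank-based AUC: probability a positive case outranks a negative one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def normalized_mae(y_true, y_pred):
    """MAE scaled by the label IQR (an assumed convention, for illustration)."""
    s = sorted(y_true)
    q1, q3 = s[len(s) // 4], s[(3 * len(s)) // 4]
    return mae(y_true, y_pred) / (q3 - q1)

# Toy example: a binary label (e.g., severe AS) and LVEF (%) estimates.
y_cls = [0, 0, 1, 1]
p_cls = [0.1, 0.4, 0.35, 0.8]
print(auc(y_cls, p_cls))            # 0.75

lvef_true = [60, 55, 35, 25]
lvef_pred = [58, 57, 38, 28]
print(mae(lvef_true, lvef_pred))    # 2.5
```

The normalized form lets a 21-task summary compare errors across quantities with very different units and ranges (e.g., LVEF in percent vs. wall thickness in millimeters).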
Study Design
Model development and retrospective, multisite validation of a multitask, view-agnostic AI system. Internal temporal validation (YNHHS), international external validation (RVENet+), public single-view datasets (EchoNet-Dynamic, EchoNet-LVH), and point-of-care ED POCUS cohort. TRIPOD+AI-aligned reporting.
Level of Evidence
Diagnostic accuracy study with external validation; retrospective design. Oxford CEBM: approximately Level III (non-consecutive/retrospective diagnostic cohort with reference standard).
Follow-up period
None (cross-sectional imaging assessments; no longitudinal clinical follow-up).
Results
Primary outcomes
- Overall diagnostic classification (18 tasks):
- Internal validation (YNHHS): median AUC 0.91 (IQR 0.88–0.93).
- External validation (RVENet+): median AUC 0.91 (IQR 0.85–0.94).
- Parameter estimation (21 tasks):
- Internal: median normalized MAE 0.13 (IQR 0.10–0.18).
- External (RVENet+): median normalized MAE 0.16 (IQR 0.11–0.23).
- Representative tasks:
- LVEF estimation: MAE 4.2% (internal); 4.5% (external).
- Moderate/worse LV systolic dysfunction: AUC 0.98 (internal); 0.99 (external).
- RV systolic dysfunction: AUC 0.93 (internal); 0.94 (external).
- Severe aortic stenosis: AUC 0.98 (internal); 1.00 (external).
Secondary outcomes
- Abbreviated protocols: Restricted to at most one key video per view (PLAX/PSAX/A4C/A5C/A2C), PanEcho achieved median AUC 0.91 (IQR 0.87–0.94) across 15 tasks; LV systolic dysfunction AUC 0.98; severe AS AUC 0.96; MV stenosis AUC 0.94.
- POCUS (ED bedside scans): 14 evaluable tasks median AUC 0.85 (IQR 0.77–0.87); LV systolic dysfunction AUC 0.93; severe AS AUC 0.86; MV stenosis AUC 0.92; pericardial effusion AUC 0.86.
- Public datasets: Maintained strong performance on EchoNet-Dynamic (e.g., LVEF MAE ≈5.5%) and EchoNet-LVH (high AUCs for structural labels; millimeter-scale MAEs).
- Fairness and robustness: Comparable performance across sex and race subgroups; LVEF estimates remained stable across image-quality strata.
- Interpretability: Task-specific view relevance aligned with guidelines (e.g., PLAX for dimensions/AV anatomy; A4C for LV function; color Doppler for regurgitation).
- Number needed to treat (NNT): Not applicable (diagnostic accuracy study without patient-level treatment outcomes).
Limitations
- Limited to 2D grayscale and color Doppler (no spectral Doppler, strain, or 3D data).
- Direct estimation without segmentation may limit interpretability; future overlays could enhance transparency.
- Heterogeneous measurement methods across sites (e.g., 3D vs 2D LV volumes) may affect error metrics.
- Low-prevalence or difficult labels (e.g., bicuspid AV, right atrial abnormalities) lead to class imbalance and potentially noisy ground truth.
- Scope restricted to 39 predefined tasks; some clinical questions may still require expert synthesis within the broader clinical context.
- Retrospective design; prospective workflow studies needed to assess impact on speed, accuracy, and human–AI collaboration.
Funding
- National Institutes of Health (R01HL167858, R01AG089981, K23HL153775).
- Doris Duke Charitable Foundation (Award 2022060).
- Industry-sponsored research through Yale: Bristol Myers Squibb, Novo Nordisk, BridgeBio.
- Hungarian funding: European Union–MILAB, National Research, Development, and Innovation Office of Hungary (FK142573), Hungarian Academy of Sciences.
Citation
Holste G, Oikonomou EK, Tokodi M, Kovács A, Wang Z, Khera R. Complete AI-Enabled Echocardiography Interpretation With Multitask Deep Learning. JAMA. 2025;334(4):306-318. doi:10.1001/jama.2025.8731. Published online June 23, 2025.