VitalLens: Take a Vital Selfie
VitalLens estimates vital signs such as heart rate and respiratory rate from selfie videos in real time. Powered by a cutting-edge neural network trained on diverse video and physiological datasets, VitalLens outperforms traditional methods and maintains fast inference speeds. You can download VitalLens from the App Store or integrate it into your own products with the VitalLens API.
This blog article distills the key insights from our technical report [1] so you can understand the science behind VitalLens without needing a deep technical background. If you're intrigued and want all the details, feel free to dive into the full paper!
Introduction
The human face is a rich source of physiological signals, including vital signs such as heart rate and respiratory rate. These signals can be extracted from video using a technique known as Remote Photoplethysmography (rPPG) [2]. rPPG has the potential to revolutionize non-invasive, real-time health monitoring.
Several approaches to rPPG exist, ranging from handcrafted algorithms like POS [3] to learning-based models such as DeepPhys [4]. However, these methods often involve trade-offs between accuracy, robustness, and inference speed.
VitalLens is the first widely distributed app to provide real-time rPPG estimation directly from selfie videos. This article highlights the app's development, capabilities, and performance based on extensive evaluations.
Key Contributions
- Real-Time rPPG Application: VitalLens delivers real-time heart rate and respiratory rate estimation.
- Comprehensive Dataset Evaluation: Performance was benchmarked on diverse datasets, including Vital Videos [5] with 289 participants.
- Privacy-First Design: When using the app, all processing is performed locally on the device, ensuring user privacy and eliminating the need for an internet connection. The VitalLens API relies on pre-processed video that is deleted immediately after processing.
- Robust Performance: VitalLens adapts to challenges like motion artifacts, varying lighting, and skin tone diversity, making it reliable across real-world conditions.
How VitalLens Works
VitalLens transforms video frames into vital sign estimates through the following steps:
- Video Input: A selfie video is captured using the device's front-facing camera.
- Signal Extraction: The estimation engine analyzes pixel-level color changes over time to isolate pulse and respiratory signals.
- Vital Sign Estimation: A Fast Fourier Transform (FFT) is applied to these signals to derive heart rate and respiratory rate (see the sketch after this list).
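To make the frequency-analysis step concrete, here is a minimal sketch of how a rate can be read off a 1D signal with an FFT. It assumes a clean pulse signal sampled at a known frame rate (in classical pipelines such a signal might come from averaging skin-pixel colors per frame; VitalLens produces it with its neural engine). The function name and band limits below are illustrative, not part of the VitalLens implementation.

```python
import numpy as np

def estimate_rate_bpm(signal, fps, low_hz, high_hz):
    """Return the dominant frequency in [low_hz, high_hz] as a per-minute rate."""
    signal = signal - np.mean(signal)               # remove the DC offset
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    power = np.abs(np.fft.rfft(signal)) ** 2        # power spectrum
    band = (freqs >= low_hz) & (freqs <= high_hz)   # physiologically plausible band
    return 60.0 * freqs[band][np.argmax(power[band])]

# Synthetic 72 bpm pulse sampled at 30 fps for 10 seconds.
t = np.arange(0, 10, 1.0 / 30)
pulse = np.sin(2 * np.pi * 1.2 * t)
print(estimate_rate_bpm(pulse, fps=30, low_hz=0.7, high_hz=3.0))  # -> 72.0
# For respiratory rate, the same idea applies with a lower band (e.g. 0.1-0.5 Hz).
```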
The VitalLens estimation engine is based on a modified EfficientNetV2 architecture [6], optimized for efficient training and inference in rPPG.
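As an illustration only, the snippet below shows one way an EfficientNetV2 backbone could be given a regression head that outputs per-frame pulse and respiration values, using PyTorch and torchvision. The actual VitalLens architecture involves further modifications described in the paper [1]; the class and variable names here are hypothetical.

```python
import torch
import torch.nn as nn
from torchvision.models import efficientnet_v2_s

class RppgRegressor(nn.Module):
    """Hypothetical EfficientNetV2 backbone with a per-frame vital-sign head."""

    def __init__(self):
        super().__init__()
        backbone = efficientnet_v2_s(weights=None)
        self.features = backbone.features             # keep the conv feature extractor
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.head = nn.Linear(1280, 2)                # pulse and respiration per frame

    def forward(self, clip):                          # clip: (batch, time, 3, H, W)
        b, t, c, h, w = clip.shape
        x = self.features(clip.reshape(b * t, c, h, w))
        x = self.pool(x).flatten(1)                   # (b*t, 1280)
        out = self.head(x).reshape(b, t, 2)
        return out[..., 0], out[..., 1]               # two waveforms over time

model = RppgRegressor()
pulse, resp = model(torch.randn(1, 8, 3, 224, 224))   # dummy 8-frame clip
```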
Datasets
To develop and validate VitalLens, we relied on a combination of proprietary and publicly available datasets. These datasets not only provided the foundation for training but also enabled rigorous evaluation under diverse real-world conditions.
Training Data
- PROSIT: An in-house dataset collected in Australia. Data was captured in natural settings such as homes, offices, and public spaces, and includes synchronized video recordings alongside ground-truth physiological signals such as ECG, PPG, and respiration.
- Vital Videos Africa: To address biases in rPPG models stemming from a lack of diversity in skin tones, we incorporated the Vital Videos Africa dataset. Collected in Ghana, it emphasizes darker skin tones (Fitzpatrick types 5 and 6), helping VitalLens deliver equitable performance across a broader range of users.
By combining these datasets, we ensured that the VitalLens model was exposed to diverse conditions, making it robust and generalizable.
Evaluation Data
- Vital Videos Medium: This large, publicly available dataset consists of over 280 participants recorded under controlled conditions in Belgium. With stationary cameras and minimal participant movement, it offers a benchmark for evaluating accuracy under ideal conditions.
- PROSIT Test Set: This subset of the PROSIT dataset was reserved for testing and includes scenarios with both participant and camera movement. Its real-world variability makes it a critical resource for assessing VitalLens's robustness in less controlled environments.
- Vital Videos Africa Test Set: This smaller test set emphasizes darker skin tones, providing insight into how effectively VitalLens handles challenges related to skin tone diversity. Testing on this dataset underscores VitalLens's ability to address biases present in many existing rPPG models.
Results
VitalLens underwent extensive evaluation to compare its performance against established methods, including POS [3], DeepPhys [4], and MTTS-CAN [7]. The results highlight its superior accuracy and robustness across all datasets.
Benchmark Performance
On the Vital Videos Medium dataset [5], which features stationary participants in controlled lighting, VitalLens achieved outstanding accuracy:
- Heart Rate Estimation: The mean absolute error (MAE) was just 0.71 bpm, outperforming all tested methods.
- Respiratory Rate Estimation: Similarly, an MAE of just 0.76 breaths per minute was achieved.
While this dataset reflects relatively easy, controlled conditions, these results establish VitalLens's strong baseline performance in an ideal setting (the MAE metric itself is illustrated below).
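For readers unfamiliar with the metric, the mean absolute error used above is simply the average absolute difference between estimated and reference values. The numbers in this snippet are made up for illustration and are not from the study.

```python
import numpy as np

def mae(estimates, references):
    """Mean absolute error between estimated and reference rates."""
    return float(np.mean(np.abs(np.asarray(estimates) - np.asarray(references))))

print(mae([71.2, 64.8, 80.4], [72.0, 65.0, 79.9]))  # -> 0.5 (bpm)
```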
On the PROSIT Test Set, which includes significant variability in participant movement and lighting, VitalLens maintained its edge:
- While errors increased compared to the controlled conditions of Vital Videos Medium, VitalLens still outperformed both the handcrafted POS method [3] and the learning-based MTTS-CAN [7], showcasing its robustness.
The Vital Videos Africa Test Set offered a crucial test of skin tone diversity. VitalLens demonstrated reduced biases compared to traditional methods, thanks to its training on diverse datasets. This equitable performance underscores the importance of inclusive training data.
Factors Impacting Performance
The evaluation also revealed key factors influencing performance:
- Participant Movement: Movement introduces noise into the rPPG signal, reducing estimation accuracy. While VitalLens is robust to some motion, users are advised to minimize movement during recordings for best results (a simple filtering sketch follows this list).
- Lighting Conditions: Variations in ambient lighting can degrade accuracy. Even illumination of the face is critical for reliable rPPG estimation. This is particularly important for users recording in non-controlled environments.
- Skin Tone Diversity: Traditional rPPG methods often struggle with darker skin tones due to lower signal strength. By training on datasets like Vital Videos Africa, VitalLens significantly reduces these biases, ensuring equitable performance for all users.
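As a rough illustration of how out-of-band noise (for example, slow drift from movement or changing lighting) can be separated from the pulse band, here is a band-pass filtering sketch using SciPy. The cutoffs are illustrative; VitalLens handles these factors within its learned model rather than with a fixed filter.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass(signal, fps, low_hz=0.7, high_hz=3.0, order=4):
    """Keep only frequencies within a plausible heart-rate band."""
    b, a = butter(order, [low_hz, high_hz], btype="band", fs=fps)
    return filtfilt(b, a, signal)

t = np.arange(0, 10, 1.0 / 30)
pulse = np.sin(2 * np.pi * 1.2 * t)                  # 72 bpm pulse
drift = 0.5 * np.sin(2 * np.pi * 0.2 * t)            # slow motion/lighting drift
cleaned = bandpass(pulse + drift, fps=30)             # drift is strongly attenuated
```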
Conclusion
VitalLens is a groundbreaking tool for real-time, contactless vital sign estimation. By combining the power of neural networks with diverse training datasets, it delivers unparalleled accuracy and robustness. Whether you're curious about your wellness or seeking to integrate health insights into an application, VitalLens is designed to empower users and developers alike.
For more technical details, read our full technical report [1].