
Dr Philipp Rouast

My research focuses on human-centered applications of deep learning and computer vision, especially in the health domain.

How It Started

I have been fascinated with Remote Photoplethysmography (rPPG) ever since I dabbled in a simple implementation back in 2016 for my Master's thesis. It has been great to see the buzz around rPPG, with many researchers and startups diving in. Yet there's a catch: the tech is either tucked away behind an API or requires a tech wizard to unravel.

During the COVID-19 lockdown, I decided to roll up my sleeves and build a modern rPPG system that anyone could use. The results are Rouast Labs and VitalLens, which to this day remain solo projects.


VitalLens: Take A Vital Selfie
Philipp V. Rouast
arXiv preprint arXiv:2312.06892

This report introduces VitalLens, an app that estimates vital signs such as heart rate and respiration rate from selfie video in real time. VitalLens uses a computer vision model trained on a diverse dataset of video and physiological sensor data. We benchmark performance on several diverse datasets, including VV-Medium, which consists of 289 unique participants. VitalLens outperforms several existing methods including POS and MTTS-CAN on all datasets while maintaining a fast inference speed. On VV-Medium, VitalLens achieves absolute errors of 0.71 bpm for heart rate estimation, and 0.76 rpm for respiratory rate estimation.

Single-stage intake gesture detection using CTC loss and extended prefix beam search
Philipp V. Rouast and Marc T. P. Adam
IEEE Journal of Biomedical and Health Informatics 25 (7), 2733–2743 (2020)

Accurate detection of individual intake gestures is a key step towards automatic dietary monitoring. Both inertial sensor data of wrist movements and video data depicting the upper body have been used for this purpose. The most advanced approaches to date use a two-stage approach, in which (i) frame-level intake probabilities are learned from the sensor data using a deep neural network, and then (ii) sparse intake events are detected by finding the maxima of the frame-level probabilities. In this study, we propose a single-stage approach which directly decodes the probabilities learned from sensor data into sparse intake detections. This is achieved by weakly supervised training using Connectionist Temporal Classification (CTC) loss, and decoding using a novel extended prefix beam search decoding algorithm. Benefits of this approach include (i) end-to-end training for detections, (ii) simplified timing requirements for intake gesture labels, and (iii) improved detection performance compared to existing approaches. Across two separate datasets, we achieve relative F1 score improvements between 1.9% and 6.2% over the two-stage approach for intake detection and eating/drinking detection tasks, for both video and inertial sensors.
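To illustrate what "decoding frame-level probabilities directly into sparse detections" means, here is a minimal sketch of standard greedy (best-path) CTC decoding: take the most likely label per frame, collapse consecutive repeats, and drop blanks. This is a simpler, swapped-in decoder, not the extended prefix beam search proposed in the paper. The function name `ctc_greedy_decode`, the blank at class index 0, the hypothetical label 1 = intake, and the choice to time-stamp each event at the first frame of its run are all assumptions made for this example.

```python
import numpy as np

BLANK = 0  # assumption: class index 0 is the CTC blank label


def ctc_greedy_decode(probs: np.ndarray) -> list[tuple[int, int]]:
    """Greedy (best-path) CTC decoding of per-frame class probabilities.

    probs: array of shape (T, C), one probability row per frame.
    Returns (label, frame_index) pairs, one per detected event, where
    frame_index is the first frame of each non-blank run (an assumption).
    """
    best = probs.argmax(axis=1)  # most likely label for every frame
    events = []
    prev = BLANK
    for t, label in enumerate(best):
        # Collapse repeats and drop blanks: a new event starts only when a
        # non-blank label differs from the previous frame's label.
        if label != BLANK and label != prev:
            events.append((int(label), t))
        prev = label
    return events


# Hypothetical two-class example: columns are [blank, intake]
probs = np.array([
    [0.90, 0.10],
    [0.20, 0.80],
    [0.30, 0.70],
    [0.80, 0.20],
    [0.90, 0.10],
    [0.40, 0.60],
    [0.70, 0.30],
    [0.95, 0.05],
])
print(ctc_greedy_decode(probs))  # [(1, 1), (1, 5)]
```

Two frames in a row labelled "intake" collapse into one event, while a blank between them splits them into separate events; this is how CTC can express several sparse detections without frame-exact labels. A beam search decoder instead keeps several candidate label sequences rather than committing to the single best label per frame.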

OREBA: A Dataset for Objectively Recognizing Eating Behaviour and Associated Intake
Philipp V. Rouast, Hamid Heydarian, Marc T. P. Adam, and Megan E. Rollo
IEEE Access 8, 181955–181963 (2020)

Automatic detection of intake gestures is a key element of automatic dietary monitoring. Several types of sensors, including inertial measurement units (IMU) and video cameras, have been used for this purpose. The common machine learning approaches make use of the labelled sensor data to automatically learn how to make detections. One characteristic, especially for deep learning models, is the need for large datasets. To meet this need, we collected the Objectively Recognizing Eating Behavior and Associated Intake (OREBA) dataset. The OREBA dataset aims to provide a comprehensive multi-sensor recording of communal intake occasions for researchers interested in automatic detection of intake gestures. Two scenarios are included, with 100 participants for a discrete dish and 102 participants for a shared dish, totalling 9069 intake gestures. Available sensor data consists of synchronized frontal video and IMU with accelerometer and gyroscope for both hands. We report the details of data collection and annotation, as well as technical details of sensor processing. The results of studies on IMU and video data involving deep learning models are reported to provide a baseline for future research.

Learning deep representations for video-based intake gesture detection
Philipp V. Rouast and Marc T. P. Adam
IEEE Journal of Biomedical and Health Informatics 24 (6), 1727–1737 (2020)

Automatic detection of individual intake gestures during eating occasions has the potential to improve dietary monitoring and support dietary recommendations. Existing studies typically make use of on-body solutions such as inertial and audio sensors, while video is used as ground truth. Intake gesture detection directly based on video has rarely been attempted. In this study, we address this gap and show that deep learning architectures can successfully be applied to the problem of video-based detection of intake gestures. For this purpose, we collect and label video data of eating occasions using 360-degree video of 102 participants. Applying state-of-the-art approaches from video action recognition, our results show that (1) the best model achieves an F1 score of 0.858, (2) appearance features contribute more than motion features, and (3) temporal context in the form of multiple video frames is essential for top model performance.

Deep Learning for Human Affect Recognition: Insights and New Developments
Philipp V. Rouast, Marc T. P. Adam, Raymond Chiong
IEEE Transactions on Affective Computing 12 (2), 524–543 (2021)

Automatic human affect recognition is a key step towards more natural human-computer interaction. Recent trends include recognition in the wild using a fusion of audiovisual and physiological sensors, a challenging setting for conventional machine learning algorithms. Since 2010, novel deep learning algorithms have been applied increasingly in this field. In this paper, we review the literature on human affect recognition between 2010 and 2017, with a special focus on approaches using deep neural networks. By classifying a total of 950 studies according to their usage of shallow or deep architectures, we are able to show a trend towards deep learning. Reviewing a subset of 233 studies that employ deep neural networks, we comprehensively quantify their applications in this field. We find that deep learning is used for learning of (i) spatial feature representations, (ii) temporal feature representations, and (iii) joint feature representations for multimodal sensor data. Exemplary state-of-the-art architectures illustrate the recent progress. Our findings show the role deep architectures will play in human affect recognition, and can serve as a reference point for researchers working on related applications.

Remote heart rate measurement using low-cost RGB face video: a technical literature review
Philipp V. Rouast, Marc T. P. Adam, Raymond Chiong, David Cornforth, Eva Lux
Frontiers of Computer Science 12 (5), 858–872 (2018)

Remote photoplethysmography (rPPG) allows remote measurement of the heart rate using low-cost RGB imaging equipment. In this study, we review the development of the field of rPPG since its emergence in 2008. We also classify existing rPPG approaches and derive a framework that provides an overview of modular steps. Based on this framework, practitioners can use our classification to design algorithms for an rPPG approach that suits their specific needs. Researchers can use the reviewed and classified algorithms as a starting point to improve particular features of an rPPG algorithm.
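As a rough illustration of how modular rPPG steps can fit together, the sketch below estimates heart rate from a per-frame mean green-channel trace: detrend, window, then take the strongest spectral peak within a plausible heart-rate band. It is one simple, classic instance under stated assumptions, not the framework or any specific algorithm from the review. The function name `estimate_heart_rate`, the 0.7 to 4.0 Hz band (42 to 240 bpm), and the assumption that face detection and skin-region averaging have already produced the trace are all illustrative choices.

```python
import numpy as np


def estimate_heart_rate(green_trace, fps: float,
                        min_hz: float = 0.7, max_hz: float = 4.0) -> float:
    """Estimate heart rate in bpm from a per-frame mean green-channel trace.

    Assumption: face detection, skin ROI selection and spatial averaging
    were done upstream, yielding one green value per video frame.
    """
    x = np.asarray(green_trace, dtype=float)
    # Signal processing: remove the DC offset and any linear drift
    n = np.arange(len(x))
    x = x - np.polyval(np.polyfit(n, x, 1), n)
    # Hann window to reduce spectral leakage before the FFT
    x = x * np.hanning(len(x))
    # Frequency estimation: power spectrum, restricted to plausible heart rates
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)
    power = np.abs(np.fft.rfft(x)) ** 2
    band = (freqs >= min_hz) & (freqs <= max_hz)
    peak_hz = freqs[band][np.argmax(power[band])]
    return 60.0 * peak_hz  # Hz -> beats per minute


# Synthetic 20 s trace at 30 fps: offset + drift + 1.2 Hz pulse + noise
fps = 30.0
t = np.arange(int(20 * fps)) / fps
rng = np.random.default_rng(0)
signal = 100 + 0.05 * t + 0.5 * np.sin(2 * np.pi * 1.2 * t) + rng.normal(0, 0.1, t.size)
print(round(estimate_heart_rate(signal, fps), 1))  # 72.0
```

The frequency resolution of this approach is fps / N (0.05 Hz, or 3 bpm, for 600 frames), which is one reason practical rPPG methods add steps such as longer windows, spectral interpolation, or more robust color-channel combinations.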


Rouast Labs: Founder
  • VitalLens
The University of Newcastle: Associate Lecturer
  • COMP3330: Machine Intelligence
  • INFT2060: Applied Artificial Intelligence
The University of Newcastle: PhD Information Systems
Thesis: Using deep learning to detect food intake behaviour from video.
Karlsruhe Institute of Technology: MSc Industrial Engineering
Thesis: Contactless Heart Rate Measurement Using Facial Video: A Real-Time Approach and Evaluation in Information Systems.
Karlsruhe Institute of Technology: BSc Industrial Engineering