Many contemporary systems in human-computer interaction (HCI), including mobile and ubiquitous computing, are based on some form of automated sensor data analysis. Prominent examples are innovative and more intuitive input modalities such as voice and gesture, and automated activity logging and analysis. It is fair to conclude that sensor data analysis is key to context-aware computing as a whole. Such prominence requires robust and reliable methods that can cope with the many challenges of real-world HCI applications and systems: noisy sensor readings; often ambiguous, sometimes erroneous ground truth annotation (labeling); small datasets available for method development; hard real-time constraints for analysis; and so on.
Many researchers have adopted machine learning techniques as a key component of sensor data analysis in HCI (and beyond), especially techniques for the automated analysis of time-series data as recorded by the multitude of sensors used in HCI. In recent years the field of machine learning has grown explosively, and very sophisticated methods now exist that are key enablers for a plethora of application areas. Most appealing to many practitioners is the availability of toolkits that neatly package machine learning methods, such as Matlab, Weka, scikit-learn, and the various deep learning frameworks, to name but a few. These toolkits effectively hide the complexity of machine learning methods, which, I argue, is both a blessing and a curse. Packaging away complex functionality is common practice in, for example, software engineering, where libraries with clear interface specifications provide higher-level functionality to practitioners. To some extent machine learning toolkits provide similar functionality, and as such they make these methods accessible to practitioners in the first place. Yet hiding the complexity of machine learning can be dangerous. Unless they carefully consider whether a method is appropriate for a specific problem, beyond the mere interface fit of the chosen toolkit, practitioners risk drawing unreliable conclusions.
In this talk I will make the case for the enormous potential that machine learning methods have for current and next generations of HCI applications and systems, specifically targeting time-series assessment, as it is most common in HCI-related sensor data analysis problems. In doing so I will focus on common pitfalls that a purely utilitarian use of machine learning methods inevitably brings, and will offer ways to avoid them.