Session 1: Natural Language Processing (9:45-11:00)
Session chair: Anssi Yli-Jyrä
9:45 - Self-Supervised End-to-End ASR for Low Resource L2 Swedish - Ragheb Al-Ghezi (Aalto University), Yaroslav Getman (Aalto University), Mikko Kurimo (Aalto University)
Unlike traditional (hybrid) Automatic Speech Recognition (ASR), end-to-end ASR systems simplify the training procedure by directly mapping acoustic features to sequences of graphemes or characters, thereby eliminating the need for specialized acoustic, language, or pronunciation models. However, one drawback of end-to-end ASR systems is that they require more training data than conventional ASR systems to achieve a similar word error rate (WER). This makes it difficult to develop ASR systems for tasks where transcribed target data is limited, such as developing ASR for Second Language (L2) speakers of Swedish. Nonetheless, recent advancements in self-supervised acoustic learning, manifested in wav2vec models, leverage the available untranscribed speech data to provide compact acoustic representations that can achieve low WER when incorporated in end-to-end systems. To this end, we experiment with several monolingual and cross-lingual self-supervised acoustic models to develop an end-to-end ASR system for L2 Swedish. Even though our test set is very small, it indicates that these systems are competitive in performance with a traditional ASR pipeline. Our best model seems to reduce the WER by 7% relative to our traditional ASR baseline trained on the same target data.
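For readers unfamiliar with the metric, the following is a minimal sketch of how a word error rate and a relative WER reduction are commonly computed. The function names and the example sentences and rates are illustrative assumptions and are not taken from the talk.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the first i-1 reference
    # words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative improvement of new_wer over baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical example: one substitution and one deletion in four words.
print(word_error_rate("a b c d", "a x c"))
# Hypothetical rates chosen so that the relative reduction is about 7%.
print(relative_reduction(0.200, 0.186))
```

Note that a relative reduction compares the two error rates as a ratio, so a 7% relative reduction is not the same as a drop of 7 percentage points.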
10:00 - Speaker Verification Experiments for Adults and Children using a shared embedding space - Tuomas Kaseva (Aalto University), Hemant Kathania (Aalto University), Aku Rouhe (Aalto University), Mikko Kurimo (Aalto University)
In this work, we present our efforts towards developing a robust speaker verification system for children when the data is limited. We propose a novel deep learning-based speaker verification system that combines long short-term memory cells with NetVLAD and additive margin softmax loss. First, we investigated these methods on a large corpus of adult data and then applied the best configuration to child speaker verification. For children, the system trained on a large corpus of adult speakers performed worse than a system trained on a much smaller corpus of children’s speech. This is due to the acoustic mismatch between training and testing data. To capture more acoustic variability, we trained a shared system with mixed data from adults and children. The shared system yields the best EER for children with no degradation for adults. Thus, the single system trained with mixed data is applicable to speaker verification for both adults and children.
10:15 - Convex Surrogates for Unbiased Loss Functions in Extreme Classification With Missing Labels – Mohammadreza Qaraei (Aalto University), Erik Schultheis (Aalto University), Priyanshu Gupta (IIT, Kanpur), Rohit Babbar (Aalto University)
Extreme Classification (XC) refers to supervised learning where each training/test instance is labeled with a small subset of relevant labels that are chosen from a large set of possible target labels. The framework of XC has been widely employed in web applications such as automatic labeling of web encyclopedias, prediction of related searches, and recommendation systems. While most state-of-the-art models in XC achieve high overall accuracy by performing well on the frequently occurring labels, they perform poorly on a large number of infrequent (tail) labels. This arises from two statistical challenges: (i) missing labels, as it is virtually impossible to manually assign every relevant label to an instance, and (ii) a highly imbalanced data distribution in which a large fraction of labels are tail labels. In this work, we consider common loss functions that decompose over labels, and calculate unbiased estimates that compensate for missing labels according to (Natarajan et al., 2017). This turns out to be disadvantageous from an optimization perspective, as important properties such as convexity and lower-boundedness are lost. To circumvent this problem, we use the fact that typical loss functions in XC are convex surrogates of the 0-1 loss, and thus propose to switch to convex surrogates of its unbiased version. These surrogates are further adapted to the label imbalance by combining them with label-frequency-based rebalancing. We show that the proposed loss functions can be easily incorporated into various frameworks for extreme classification. These include (i) linear classifiers, such as DiSMEC, on sparse input data representations, (ii) an attention-based deep architecture, AttentionXML, learnt on dense GloVe embeddings, and (iii) an XLNet-based transformer model for extreme classification, APLC-XLNet. Our results demonstrate consistent improvements over the respective vanilla baseline models on the propensity-scored metrics for precision and nDCG.
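The loss of lower-boundedness mentioned above can be seen in a small numerical sketch. It assumes a common missing-label model in which each relevant label is observed independently with a known probability (the propensity) and irrelevant labels are never observed; the logistic loss and all names below are illustrative choices, not the authors' code.

```python
import math


def logistic_loss(label: int, score: float) -> float:
    """Binary logistic loss for a label in {0, 1} and a real-valued score."""
    margin = score if label == 1 else -score
    return math.log1p(math.exp(-margin))


def unbiased_loss(observed: int, score: float, propensity: float) -> float:
    """Loss on the observed label whose expectation equals the clean loss.

    Assumption: a relevant label is observed with probability `propensity`.
    """
    if observed == 1:
        positive = logistic_loss(1, score) / propensity
        correction = (1.0 - 1.0 / propensity) * logistic_loss(0, score)
        return positive + correction
    return logistic_loss(0, score)


def expected_unbiased_loss(true_label: int, score: float, propensity: float) -> float:
    """Average of unbiased_loss over the randomness of label observation."""
    if true_label == 1:
        seen = propensity * unbiased_loss(1, score, propensity)
        missed = (1.0 - propensity) * unbiased_loss(0, score, propensity)
        return seen + missed
    return unbiased_loss(0, score, propensity)


# The expectation matches the loss on the true (unobserved) label.
print(expected_unbiased_loss(1, 0.3, 0.4), logistic_loss(1, 0.3))
# With a small propensity and a large score the estimate becomes negative,
# so the estimator is no longer bounded from below.
print(unbiased_loss(1, 10.0, 0.1))
```

The negative coefficient on the second term is what destroys convexity and lower-boundedness, which motivates replacing the estimator with a convex surrogate.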
10:30 - Dialog Modelling Experiments with Finnish One-to-One Chat Data – Lili Aunimo (Haaga-Helia University of Applied Sciences), Janne Kauttonen (Haaga-Helia University of Applied Sciences)
We analyzed two conversational corpora in Finnish: a public library question-answering (QA) dataset and a private medical chat dataset. We developed response retrieval (ranking) models using TF-IDF, StarSpace, ESIM and BERT methods. These four represent techniques ranging from simple, classical ones to recent pretrained transformer neural networks. We evaluated the effect of different preprocessing strategies, including raw, casing, lemmatization and spell-checking, for the different methods. Using our medical chat data, we also developed a novel three-stage preprocessing pipeline with speaker role classification. We found the BERT model pretrained on Finnish (FinBERT) to be an unambiguous winner in ranking accuracy, reaching 92.2% for the medical chat and 98.7% for the library QA in the 1-out-of-10 response ranking task where the chance level was 10%. The best accuracies were reached using uncased text with spell-checking (BERT models) or lemmatization (non-BERT models). Preprocessing had less impact on BERT models than on the classical and other neural network models. Furthermore, we found the TF-IDF method to still be a strong baseline for the vocabulary-rich library QA task, even surpassing the more advanced StarSpace method. Our results highlight the complex interplay between preprocessing strategies and model type when choosing the optimal approach in chat-data modelling. Our study is the first work on dialogue modelling using neural networks for the Finnish language. It is also the first of its kind to use real medical chat data. Our work contributes towards the development of automated chatbots in the professional domain.
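As a rough illustration of the simplest of the four approaches, the sketch below ranks candidate responses by TF-IDF cosine similarity to a query. The smoothed IDF formula, the whitespace tokenizer, the choice to fit IDF on the query together with its candidates, and the English toy sentences are all assumptions made for brevity; the study's actual pipeline and data are not reproduced here.

```python
import math
from collections import Counter


def tfidf_vectors(texts):
    """Return one sparse TF-IDF vector (a dict) per text."""
    tokenized = [text.lower().split() for text in texts]
    n_texts = len(tokenized)
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed inverse document frequency (one common variant).
    idf = {tok: math.log((1 + n_texts) / (1 + df)) + 1.0
           for tok, df in doc_freq.items()}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append({tok: counts[tok] * idf[tok] for tok in counts})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(tok, 0.0) for tok, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_responses(query, candidates):
    """Indices of candidates, sorted from most to least similar to query."""
    vectors = tfidf_vectors([query] + candidates)
    query_vec, cand_vecs = vectors[0], vectors[1:]
    scores = [cosine(query_vec, vec) for vec in cand_vecs]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


# Hypothetical toy data, not from either corpus.
candidates = [
    "the library opens at nine",
    "take one tablet daily",
    "books can be renewed online",
]
print(rank_responses("when does the library open", candidates))
```

In a 1-out-of-10 ranking task, accuracy would be the fraction of queries for which the true response is ranked first among ten candidates.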
10:45 - Inferring Case-Based Reasoners’ Knowledge to Enhance Interactivity – Pierre-Alexandre Murena (Aalto University), Marie Al-Ghossein (University of Helsinki)
When interacting with a human user, an artificial intelligence needs to have a clear model of the human’s behaviour to make the correct decisions, be it recommending items, helping the user in a task or teaching a language. This is particularly the case for intelligent tutoring systems, which must maintain a good understanding of what the user knows. In practice, this raises two questions: what did the user memorize, and how does the user reuse this knowledge? These two questions are at the core of the domain of Case-Based Reasoning (CBR). In this paper, we explore the feasibility of modelling the human as a case-based reasoning agent through the question of how to infer the state of a CBR agent from interaction data. We identify the main parameters to be inferred, and propose a Bayesian belief update as a possible way to infer both the parameters of the agent and the content of their case base. We illustrate our ideas with the simple application of an agent learning Finnish grammar rules throughout a sequence of observations and show that the teacher can indeed predict the user's knowledge and reasoning parameters.
Session 2: Machine Learning (11:15-12:15)
Session chair: Arto Klami
11:15 - Differentially Private Hamiltonian Monte Carlo – Ossi Räisä (University of Helsinki), Antti Koskela (University of Helsinki), Antti Honkela (University of Helsinki)
Markov chain Monte Carlo (MCMC) algorithms have long been the main workhorses of Bayesian inference. Among them, Hamiltonian Monte Carlo (HMC) has recently become very popular due to its efficiency resulting from effective use of the gradients of the target distribution. In privacy-preserving machine learning, differential privacy (DP) has become the gold standard in ensuring that the privacy of data subjects is not violated. Existing DP MCMC algorithms either use random-walk proposals, or do not use the Metropolis-Hastings (MH) acceptance test to ensure convergence without decreasing their step size to zero. We present a DP variant of HMC using the MH acceptance test that builds on a recently proposed DP MCMC algorithm called the penalty algorithm, and adds noise to the gradient evaluations of HMC. We prove that the resulting algorithm converges to the correct distribution, and is ergodic. We compare DP-HMC with the existing penalty, DP-SGLD and DP-SGNHT algorithms, and find that DP-HMC performs as well as or better than the penalty algorithm, and performs more consistently than DP-SGLD or DP-SGNHT.
11:30 - d3p - A Python Package for Differentially-Private Probabilistic Programming – Lukas Prediger (Aalto University), Niki Loppi (NVIDIA), Samuel Kaski (Aalto University and University of Manchester), Antti Honkela (University of Helsinki)
We present d3p, a software package designed to help field runtime-efficient, widely applicable Bayesian inference under differential privacy guarantees. d3p achieves general applicability to a wide range of probabilistic modelling problems by implementing the differentially private variational inference algorithm, allowing users to fit any parametric probabilistic model with a differentiable density function. d3p adopts the probabilistic programming paradigm as a powerful way for the user to flexibly define such models. We demonstrate the use of our software on a hierarchical logistic regression example, showing the expressiveness of the modelling approach as well as the ease of running the parameter inference. We also perform an empirical evaluation of the runtime of the private inference on a complex model and find a roughly 10-fold speed-up compared to an implementation using TensorFlow Privacy.
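Differentially private gradient-based inference is commonly built on clipping each example's gradient and adding Gaussian noise to the sum. The sketch below shows only that generic mechanism in plain Python, under the assumption that per-example gradients are already available; it does not use or reproduce the d3p API, and every name in it is hypothetical.

```python
import math
import random


def clip(gradient, max_norm):
    """Scale a gradient so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in gradient))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in gradient]


def private_gradient_sum(per_example_grads, max_norm, noise_multiplier, rng):
    """Sum of clipped per-example gradients plus Gaussian noise.

    The noise standard deviation is noise_multiplier * max_norm, because
    clipping bounds each example's contribution to the sum by max_norm.
    """
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for gradient in per_example_grads:
        for k, g in enumerate(clip(gradient, max_norm)):
            total[k] += g
    sigma = noise_multiplier * max_norm
    return [t + rng.gauss(0.0, sigma) for t in total]


# Hypothetical gradients for three examples in two dimensions.
grads = [[3.0, 4.0], [0.1, 0.0], [-6.0, 8.0]]
rng = random.Random(0)
print(private_gradient_sum(grads, max_norm=1.0, noise_multiplier=1.0, rng=rng))
```

The privacy guarantee that results from repeating such a step depends on the noise multiplier, the subsampling scheme and the number of iterations, and has to be tracked with a separate privacy accountant.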
11:45 - Behaviour conditioned Policies for Reinforcement Learning Tasks – Antti Keurulainen (Aalto University and Bitville Oy), Isak Westerlund (Bitville Oy), Ariel Kwiatkowski (Bitville Oy), Samuel Kaski (Aalto University and University of Manchester), Alexander Ilin (Aalto University)
Cooperation among AI systems, and between AI systems and humans, is becoming increasingly important. In various real-world tasks, an agent needs to cooperate with unknown partner agent types. This requires the agent to assess the behaviour of the partner agent during a cooperative task and to adjust its own policy to support the cooperation. Deep reinforcement learning models can be trained to deliver the required functionality but are known to suffer from sample inefficiency and slow learning. However, adapting to a partner agent's behaviour during the ongoing task requires the ability to assess the partner agent type quickly. We suggest a method in which we synthetically produce populations of agents with different behavioural patterns, together with ground truth data of their behaviour, and use this data for training a meta-learner. We additionally suggest an agent architecture that can efficiently use the generated data and gain the meta-learning capability. When an agent is equipped with such a meta-learner, it is capable of quickly adapting to cooperation with unknown partner agent types in new situations. This method can be used to automatically form a task distribution for meta-training from emerging behaviours that arise, for example, through self-play.
12:00 - Automating Privilege Escalation with Deep Reinforcement Learning – Kalle Kujanpää (Aalto University), Willie Victor (F-Secure), Alexander Ilin (Aalto University)
AI-based defensive solutions are necessary to defend against intelligent automated attacks, but gathering enough realistic data for training machine learning-based defenses is a significant practical challenge. In this work, we present a reinforcement learning agent that can perform local privilege escalation in a Windows 7 environment using a wide variety of techniques, depending on the environment configuration it encounters. Hence, our agent is usable for generating realistic attack sensor data for training and evaluating defense systems.
Session 3: Constraint Optimization and Search (13:00-14:00)
Session chair: Jussi Rintanen
13:00 - Responsive and Personalized Web Layouts with Integer Programming – Markku Laine (Aalto University), Yu Zhang (Aalto University), Simo Santala (Aalto University), Jussi P. P. Jokinen (University of Helsinki), Antti Oulasvirta (Aalto University)
Over the past decade, responsive web design (RWD) has become the de facto standard for adapting web pages to a wide range of devices used for browsing. While RWD has improved the usability of web pages, it is not without drawbacks and limitations: designers and developers must manually design the web layouts for multiple screen sizes and implement associated adaptation rules, and its “one responsive design fits all” approach lacks support for personalization. This paper presents a novel approach for automated generation of responsive and personalized web layouts. Given an existing web page design and preferences related to design objectives, our integer programming-based optimizer generates a consistent set of web designs. Where relevant data is available, these can be further automatically personalized for the user and browsing device. The paper also presents techniques for runtime adaptation of the generated designs into a fully responsive grid layout for web browsing. Results from our ratings-based online studies with end users (N = 86) and designers (N = 64) show that the proposed approach can automatically create high-quality responsive web layouts for a variety of real-world websites.
13:15 - Enabling Incrementality in the Implicit Hitting Set Approach to MaxSAT under Changing Weights – Andreas Niskanen (University of Helsinki), Jeremias Berg (University of Helsinki), Matti Järvisalo (University of Helsinki)
Recent advances in solvers for the Boolean satisfiability (SAT) based optimization paradigm of maximum satisfiability (MaxSAT) have turned MaxSAT into a viable approach to finding provably optimal solutions for various types of hard optimization problems. In various types of real-world problem settings, a sequence of related optimization problems needs to be solved. This calls for studying ways of enabling incremental computations in MaxSAT, with the hope of speeding up the overall computation times. However, current state-of-the-art MaxSAT solvers offer no or limited forms of incrementality. In this work, we study ways of enabling incremental computations in the context of the implicit hitting set (IHS) approach to MaxSAT solving, as both one of the key MaxSAT solving approaches today and a relatively well-suited candidate for extending to incremental computations. In particular, motivated by several recent applications of MaxSAT in the context of interpretability in machine learning calling for this type of incrementality, we focus on enabling incrementality in IHS under changes to the objective function coefficients (i.e., to the weights of soft clauses). To this end, we explain to what extent different search techniques applied in IHS-based MaxSAT solving can and cannot be adapted to this incremental setting. As a practical result, we develop an incremental version of an IHS MaxSAT solver, and show that it provides significant runtime improvements in recent application settings which can benefit from incrementality but in which MaxSAT solvers have so far been applied only non-incrementally, i.e., by calling a MaxSAT solver from scratch after each change to the problem instance at hand.
We introduce a novel problem for diversity-aware clustering. We assume that the potential cluster centers belong to a set of groups defined by protected attributes, such as ethnicity, gender, etc. We then ask to find a minimum-cost clustering of the data into $k$ clusters so that a specified minimum number of cluster centers are chosen from each group. We thus require that all groups are represented in the clustering solution as cluster centers, according to specified requirements.
We show that in the general case where the facility groups may overlap, the diversity-aware $k$-median problem is NP-hard, fixed-parameter intractable, and inapproximable to any multiplicative factor. On the other hand, when the facility groups are disjoint, approximation algorithms can be obtained by reduction to the matroid median and red-blue median problems. Experimentally, we evaluate our approximation methods for the tractable cases, and present a relaxation-based heuristic for the theoretically intractable case, which can provide high-quality and efficient solutions for real-world datasets.
13:45 - Approximating the Permanent with Deep Rejection Sampling – Juha Harviainen (University of Helsinki), Antti Röyskö (ETH Zürich), Mikko Koivisto (University of Helsinki)
We present a randomized approximation scheme for the permanent of a matrix with nonnegative entries. Our scheme extends a recursive rejection sampling method of Huber and Law (SODA 2008) by replacing the permanent upper bound with a linear combination of the subproblem bounds at a moderately large depth of the recursion tree. This method, which we call deep rejection sampling, is empirically shown to outperform the basic, depth-zero variant, as well as a related method by Kuck et al. (NeurIPS 2019). We analyze the expected running time of the scheme on random (0, 1)-matrices where each entry is independently 1 with probability p. Our bound is superior to a previous one for p less than 1/5, matching another bound that was only known to hold when every row and column has density exactly p.
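For reference, the permanent of an n-by-n matrix is the sum, over all permutations of the columns, of the product of the selected entries (the determinant without the alternating signs). The brute-force sketch below computes it directly from that definition; it is not the rejection sampling scheme of the talk, and its factorial running time is exactly why approximation schemes are of interest.

```python
from itertools import permutations
from math import prod


def permanent(matrix):
    """Exact permanent by summing over all column permutations (O(n! * n))."""
    n = len(matrix)
    return sum(
        prod(matrix[row][cols[row]] for row in range(n))
        for cols in permutations(range(n))
    )


# For [[1, 2], [3, 4]] the two permutations contribute 1*4 and 2*3.
print(permanent([[1, 2], [3, 4]]))
```

For a (0, 1)-matrix, the permanent counts the perfect matchings of the corresponding bipartite graph, so the all-ones 3-by-3 matrix has permanent 3! = 6.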
Session 4: Multidisciplinary Applications (14:15-15:15)
Session chair: Tapio Pahikkala
14:15 - EYES-project case study: Selecting Feature Sets and Comparing Classification Methods for Cognitive State Estimation – Kati Pettersson (VTT), Jaakko Tervonen (VTT), Johanna Närväinen (VTT), Pentti Henttonen (UH), Ilmari Määttänen (UH) and Jani Mäntyjärvi (VTT)
Acute stress and high workload are part of everyday work in safety-critical fields. Adaptive human-computer interaction (HCI) systems could support and guide professionals in hectic situations. Seamless HCI requires accurate estimation of the person's cognitive state. The Academy project EYES aims to explore and develop novel and seamless cognitive state estimation methods for real-time, real-life settings. The cognitive state estimation focuses on biosensor data combined with information from the eyes.
This study demonstrates a classification of different types of cognitive states by using feature combinations from the eyes (measured with electro-oculography, EOG) and heart (measured with electrocardiography, ECG) in general and personalized approaches, comparing three different classifiers. The classification is evaluated for features extracted from both signals separately and together, and the most important features are selected and reported. Results indicate that the best performance is achieved when features from both EOG and ECG signals are used, and approximately twenty features from EOG and ECG signals are enough to distinguish the two/three states. A personalized approach together with feature selection and support vector machine classifier achieves accuracies of 96.9% and 86.3% in classifying between two states (relaxation and stress) and three states (relaxation, psycho-social stress, and physiological stress), respectively, which exceed state-of-the-art performance. Thus cognitive state estimation benefits from combining selected eye and heart parameters, which suggests a promising basis for real-time estimation in the future.
K. Pettersson, J. Tervonen, J. Närväinen, P. Henttonen, I. Määttänen and J. Mäntyjärvi, "Selecting Feature Sets and Comparing Classification Methods for Cognitive State Estimation," 2020 IEEE 20th International Conference on Bioinformatics and Bioengineering (BIBE), 2020, pp. 683-690, doi: 10.1109/BIBE50027.2020.00115.
14:30 - Self-Swarming for Multi-Robot Systems Deployed for Situational Awareness – Fabrice Saffre (VTT), Hanno Hildmann (TNO), Hannu Karvonen (VTT), Timo Lind (VTT)
Machine-based situational awareness is a key element of conscious and intelligent interaction with the complex world we live in, be it for the individual unit, a complex dynamical system, or even complex systems of systems. To create this awareness, frequent gathering of accurate, real-time intelligence data is required to ensure timely, accurate, and actionable information. Unmanned aerial vehicles (UAVs) and other semi-autonomous cyber-physical systems are increasingly among the mechanisms and systems employed to assess the state of the world around us and collect intelligence through surveillance and reconnaissance missions. The current state of the art for humanitarian and military operations still relies on human-controlled flight/asset operations, but with increasingly autonomous systems comes an opportunity to offload this to the devices themselves. In this paper, we present a principled and expandable methodology for evaluating the relative performance of a collective of autonomous devices in various scenarios. The proposed approach, which is illustrated with drone swarms as an example use case, is expected to develop into a generic tool to inform the deployment of such collectives, providing the means to infer key parameter values from problem specifications, known constraints, and objective functions.
14:45 - Flexible Motion Optimization with Modulated Assistive Forces – Nam Hee Kim (Aalto University), Hung Yu Ling (University of British Columbia), Zhaoming Xie (University of British Columbia), Michiel van de Panne (University of British Columbia)
Animated motions should be simple to direct while also being plausible. We present a flexible keyframe-based character animation system that generates plausible simulated motions for both physically-feasible and physically-infeasible motion specifications. We introduce a novel control parameterization, optimizing over internal actions, external assistive-force modulation, and keyframe timing. Our method allows for emergent behaviors between keyframes, does not require advance knowledge of contacts or exact motion timing, supports the creation of physically impossible motions, and allows for near-interactive motion creation. The use of a shooting method allows for the use of any black-box simulator. We present results for a variety of 2D and 3D characters and motions, using sparse and dense keyframes. We compare our control parameterization scheme against other possible approaches for incorporating external assistive forces.
15:00 - GANSpaceSynth: Organising the Latent Space for Alternative Autonomous Features and Intelligent Behaviours on New Musical Instruments – Koray Tahiroglu (Aalto University), Miranda Kastemaa (Aalto University) and Oskar Koli (Aalto University)
Generative models make it possible, in the audio domain, to represent timbre as vectors in a high-dimensional latent space using Generative Adversarial Networks (GANs). In typical GAN models, however, the musician’s control over timbre is mostly limited to sampling random points from the space and interpolating between them. In this talk, I present our novel hybrid GAN architecture, GANSpaceSynth, which allows musicians to explore the GAN latent space in a more controlled manner, identifying the audio features in the trained checkpoints and making it possible to specify particular audio features to be present or absent in the generated audio samples. The applications of GANSpaceSynth, the Hallu composition tool and the AI-terity musical instrument, contribute to work on generative systems in the audio domain by making it possible to explore the GAN latent space with more awareness of what is happening musically and to control the development of musical creativity in cooperation between a human musician and AI.
Session 5: Machine Learning (15:30-16:30)
Session chair: Rohit Babbar
15:30 - L1-constrained hierarchical non-stationary Gaussian processes – Zheng Zhao (Aalto University), Rui Gao (Aalto University), Simo Särkkä (Aalto University)
This work is concerned with regularized extensions of hierarchical non-stationary temporal Gaussian processes (NSGPs) in which the parameters (e.g., length-scale) are modeled as GPs. In particular, we consider two commonly used NSGP constructions which are based on explicitly constructed non-stationary covariance functions and stochastic differential equations, respectively. We extend these NSGPs by including L1-regularization on the processes in order to induce sparseness. To solve the resulting regularized NSGP (R-NSGP) regression problem we develop a method based on the alternating direction method of multipliers (ADMM) and we also analyze its convergence properties theoretically.
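To illustrate how ADMM handles an L1 penalty, the toy sketch below solves the scalar problem of minimizing 0.5 * (x - y)^2 + lam * |z| subject to x = z. This is far simpler than the regularized NSGP regression problem of the talk and is not the authors' algorithm; the variable names and the penalty parameter rho are assumptions. The point is that the L1 term only ever enters through a soft-thresholding step.

```python
def soft_threshold(value: float, kappa: float) -> float:
    """Proximal operator of kappa * |.|: shrink value towards zero by kappa."""
    if value > kappa:
        return value - kappa
    if value < -kappa:
        return value + kappa
    return 0.0


def admm_scalar_l1(y: float, lam: float, rho: float = 1.0, iterations: int = 100) -> float:
    """ADMM for min 0.5*(x - y)**2 + lam*|z| subject to x = z."""
    x = z = u = 0.0  # u is the scaled dual variable
    for _ in range(iterations):
        # x-update: minimize the quadratic data term plus the coupling term.
        x = (y + rho * (z - u)) / (1.0 + rho)
        # z-update: the L1 penalty reduces to soft-thresholding.
        z = soft_threshold(x + u, lam / rho)
        # Dual update: accumulate the constraint violation x - z.
        u = u + x - z
    return z


# For this toy problem the exact solution is soft_threshold(y, lam).
print(admm_scalar_l1(3.0, 1.0), soft_threshold(3.0, 1.0))
print(admm_scalar_l1(0.5, 1.0), soft_threshold(0.5, 1.0))
```

Inputs smaller in magnitude than lam are mapped exactly to zero, which is the sparsity-inducing effect that L1 regularization is used for.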
15:45 - System identification using Bayesian neural networks with nonparametric noise models – Christos Merkatas (Aalto University), Simo Särkkä (Aalto University)
System identification is of special interest in science and engineering. This article is concerned with a system identification problem arising in stochastic dynamic systems, where the aim is to estimate the parameters of a system along with its unknown noise processes. In particular, we propose a Bayesian nonparametric approach for system identification in discrete-time nonlinear random dynamical systems, assuming only that the order of the Markov process is known. The proposed method replaces the assumption of Gaussian distributed error components with a highly flexible family of probability density functions based on Bayesian nonparametric priors. Additionally, the functional form of the system is estimated by leveraging Bayesian neural networks, which also leads to flexible uncertainty quantification. Asymptotically in the number of hidden neurons, the proposed model converges to a full nonparametric Bayesian regression model. A Gibbs sampler for posterior inference is proposed and its effectiveness is illustrated on simulated and real time series.
16:00 - Likelihood-Free Inference in State-Space Models with Unknown Dynamics – Alexander Aushev (Aalto University), Thong Tran (Aalto University), Henri Pesonen (University of Oslo), Andrew Howes (Birmingham University and Aalto University), Samuel Kaski (Aalto University and University of Manchester)
We introduce a method for inferring and predicting latent states in the important and difficult case of state-space models where observations can only be simulated, and transition dynamics are unknown. In this setting, the likelihood of observations is not available and only synthetic observations can be generated from a black-box simulator. We propose a way of doing likelihood-free inference (LFI) of states and state prediction with a limited number of simulations. Our approach uses a multi-output Gaussian process for state inference, and a Bayesian Neural Network as a model of the transition dynamics for state prediction. We improve upon existing LFI methods for the inference task, while also accurately learning transition dynamics. The proposed method is necessary for modelling inverse problems in dynamical systems with computationally expensive simulations, as demonstrated in experiments with non-stationary user models.
16:15 - Unbiased Loss Functions for Evaluation and Training with Missing Labels – Erik Schultheis (Aalto University), Rohit Babbar (Aalto University)
This talk considers extreme multilabel classification (XMC) problems in a setting where labels are missing independently and with a known rate. The goal in XMC typically is to maximize either precision or recall at the top-ranked predictions, which can be achieved by reducing the multilabel problem into a series of binary (One-vs-All) or multiclass (Pick-all-Labels) problems. Missing labels are a ubiquitous phenomenon in XMC tasks, yet the interaction of missing labels and reductions has hitherto only been investigated for the case of One-vs-All reduction. In this paper, we close this gap by providing unbiased estimates for the Pick-all-Labels reduction, as well as the normalized reductions which are required for consistency with the recall metric. These estimators suffer from increased variance and may lead to ill-posed optimization problems, which we address by switching to convex upper-bounds. The theoretical considerations are supplemented by experiments showing that the unbiased estimators significantly alter the bias-variance trade-off.
Session 6: Human Aspects, Interactions and Applications (16:45-18:00)
Session chair: Simo Särkkä
16:45 - Practices and Infrastructures for ML Systems -- An Interview Study – Dennis Muiruri (University of Helsinki), Lucy Ellen Lwakatare (University of Helsinki), Jukka K. Nurminen (University of Helsinki), Tomi Mikkonen (University of Helsinki)
The best practices and infrastructures for developing and maintaining machine learning (ML) enabled software systems are often reported by large and experienced data-driven organizations. However, little is known about the state of practice across other organizations. Using interviews, we investigated practices and toolchains for ML-enabled systems from sixteen organizations in various domains. Our study makes three broad observations related to data management practices, monitoring practices and automation practices in ML model training, and serving workflows. To a large extent, there is a limited number of generic practices and tools applicable across organizations in different domains. We further use this work to inform the choices of practices and infrastructure decisions within the VesselAI project. The VesselAI project aims to apply ML techniques within the maritime domain, which is characterized by extreme-scale challenges. Nonetheless, there is great potential for ML-enabled applications in diverse use cases such as forecasting trajectories and predicting potential collisions of vessels.
17:00 - Sociotechnical Envelopment of Artificial Intelligence: An Approach to Organizational Deployment of Inscrutable Artificial Intelligence Systems – Aleksandre Asatiani (University of Gothenburg), Pekka Malo (Aalto University), Per Rådberg Nagbøl (IT University of Copenhagen), Esko Penttinen (Aalto University), Tapani Rinta-Kahila (University of Queensland), Antti Salovaara (Aalto University)
The paper presents an approach for implementing inscrutable (i.e., nonexplainable) artificial intelligence (AI) in an accountable and safe manner in organizational settings. Drawing on an exploratory case study and the recently proposed concept of envelopment, it describes how an organization successfully “enveloped” its AI solutions to balance the AI's flexible performance with the risks that inscrutable models can entail. The paper presents several envelopment methods (establishing clear boundaries within which the AI is to interact with its surroundings, choosing and curating the training data well, and appropriately managing input and output sources) alongside their influence on the choice of AI models within the organization. This work illustrates how sociotechnical envelopment enables an organization to manage the trade-off between low explainability and high performance presented by inscrutable models. These contributions pave the way for more responsible, accountable AI implementations in organizations.
17:15 - mmWave Radar based Gesture Recognition: From Research to Practice – Dariush Salami (Aalto University), Ramin Hasibi (University of Bergen), Sameera Palipana (Aalto University), Luis Leiva (University of Luxembourg), Tom Michoel (University of Bergen), Stephan Sigg (Aalto University) [click for abstract]
Gesture recognition provides a natural and device-free way of non-verbal communication in a wide range of applications, from vehicular scenarios to smart homes. RGB-Depth-based gesture recognition systems suffer from privacy issues since they make it possible to recognize people in the environment and every detail about them. Moreover, they fail to generalize across different lighting and weather conditions. To tackle these problems, we introduce mmWave FMCW radar-based gesture recognition systems that not only recognize gestures with high accuracy (up to 100% when multiple radars are used) but are also robust to lighting and weather conditions while preserving privacy. In the first work, "Pantomime: Mid-Air Gesture Recognition with Sparse Millimeter-Wave Radar Point Clouds", published in IMWUT 2021, we proposed a neural network-based pipeline to sense the environment using the radar and process the data to recognize gestures. Although recognition accuracy was 95%, the model was too computationally expensive to run on embedded devices like the Raspberry Pi. In the second work, "Tesla-Rapture: A Lightweight Gesture Recognition System from mmWave Radar Point Clouds", submitted to TMC, we introduced a graph-based neural network model that captures the spatio-temporal dependencies in a single forward pass, resulting in 98% recognition accuracy and a 40-fold improvement in computational efficiency over Pantomime. Finally, in the last paper, "Integrating Sensing and Communication in Cellular Networks via NR Sidelink", submitted to JSAC, we extended the radar-based gesture recognition idea to the NR Sidelink concept to address the problem of congestion in the Radio Frequency (RF) spectrum. To demonstrate the concept, we used eight different NR Sidelink radars, which addressed the shadowing problem and achieved 100% recognition accuracy.
17:30 - Directing and Combining Multiple Queries for Exploratory Search by Visual Interactive Intent Modeling – Jonathan Strahl (Aalto University), Jaakko Peltonen (Tampere University), Patrik Floreen (University of Helsinki) [click for abstract]
In interactive information-seeking, a user often performs many interrelated queries and interactions covering multiple aspects of a broad topic of interest. Especially in difficult information-seeking tasks, the user may need to find what such multiple aspects have in common. Therefore, the user may need to compare and combine results across queries. While methods to combine queries or rankings have been proposed, little attention has been paid to interactive support for combining multiple queries in exploratory search. We introduce an interactive information retrieval system for exploratory search with multiple simultaneous search queries that can be combined. The user is able to direct the search in each of the multiple queries and to combine queries using two operations: intersection, which reveals what is relevant to the user intent of both queries, and difference, which reveals what is relevant to one but not the other. Search is directed by relevance feedback on visualized user intent models of each query. Operations on queries act directly on the intent models, inferring a combined user intent model. Each combination yields a new result (ranking) and acts as a new search that can be interactively directed and further combined. User experiments on difficult information-seeking tasks show that our novel system with query operations yields more relevant top-ranked documents in a shorter time than a baseline multiple-query system.
17:45 - EntityBot: Supporting Everyday Digital Tasks with Entity Recommendations – Tung Vuong (University of Helsinki), Salvatore Andolina (Università degli Studi di Palermo), Giulio Jacucci (University of Helsinki), Pedram Daee (Aalto University), Khalil Klouche (University of Helsinki), Mats Sjöberg (CSC), Tuukka Ruotsalo (University of Helsinki), Samuel Kaski (Aalto University) [click for abstract]
Everyday digital tasks can benefit greatly from systems that recommend the right information to use at the right time. However, existing solutions typically support only specific applications and tasks. We demonstrate the EntityBot, a system that captures context across application boundaries and recommends information entities related to the current task. The user's digital activity is continuously monitored by capturing all content on the computer screen using optical character recognition. This covers all applications and services in use, whatever the individual's computer usage, such as instant messaging, emailing, web browsing, and word processing. A linear model is then applied to detect the user's task context and retrieve entities such as applications, documents, contact information, and several keywords that characterize the task. The system has been evaluated with real-world tasks, demonstrating that the recommendations had an impact on the tasks and led to high user satisfaction.