In many contexts, a decision-maker can choose to assign one of a number of "treatments" to individuals. The treatments may be drugs, offers, advertisements, algorithms, or government programs. One setting for evaluating such treatments involves randomized controlled trials, for example A/B testing platforms or clinical trials. In such settings, we show how to optimize supervised machine learning methods for the problem of estimating heterogeneous treatment effects, while preserving a key desideratum of randomized trials: providing valid confidence intervals for estimates. We also discuss approaches for estimating optimal policies and online learning. In environments with observational (non-experimental) data, different methods are required to separate correlation from causality. We show how supervised machine learning methods can be adapted to this problem.
Susan Athey is The Economics of Technology Professor at Stanford Graduate School of Business. She received her bachelor's degree from Duke University and her Ph.D. from Stanford, and she holds an honorary doctorate from Duke University. She previously taught in the economics departments of MIT, Stanford and Harvard. In 2007, Professor Athey received the John Bates Clark Medal, awarded by the American Economic Association to "that American economist under the age of forty who is adjudged to have made the most significant contribution to economic thought and knowledge." She was elected to the National Academy of Science in 2012 and to the American Academy of Arts and Sciences in 2008. Professor Athey’s research focuses on the economics of the internet, online advertising, the news media, marketplace design, virtual currencies and the intersection of computer science, machine learning and economics. She advises governments and businesses on marketplace design and platform economics, notably serving since 2007 as a long-term consultant to Microsoft Corporation in a variety of roles, including consulting chief economist.
I will describe the "Automatic Statistician" (http://www.automaticstatistician.com/), a project which aims to automate the exploratory analysis and modelling of data. Our approach starts by defining a large space of related probabilistic models via a grammar over models, and then uses Bayesian marginal likelihood computations to search over this space for one or a few good models of the data. The aim is to find models which have both good predictive performance, and are somewhat interpretable. The Automatic Statistician generates a natural language summary of the analysis, producing a 10-15 page report with plots and tables describing the analysis. I will also link this to recent work we have been doing in the area of Probabilistic Programming (including a new system in Julia) to automate inference, and on the rational allocation of computational resources (and our entry in the AutoML competition).
Zoubin Ghahramani FRS is Professor of Information Engineering at the University of Cambridge, where he leads the Machine Learning Group, and the Cambridge Liaison Director of the Alan Turing Institute, the UK’s national institute for Data Science. He studied computer science and cognitive science at the University of Pennsylvania, obtained his PhD from MIT in 1995, and was a postdoctoral fellow at the University of Toronto. His academic career includes concurrent appointments as one of the founding members of the Gatsby Computational Neuroscience Unit in London, and as a faculty member of CMU's Machine Learning Department for over 10 years. His current research interests include statistical machine learning, Bayesian nonparametrics, scalable inference, probabilistic programming, and building an automatic statistician. He has published over 250 research papers, and has held a number of leadership roles as programme and general chair of the leading international conferences in machine learning including: AISTATS (2005), ICML (2007, 2011), and NIPS (2013, 2014). In 2015 he was elected a Fellow of the Royal Society.
The game of Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and moves. Here we introduce a new approach to computer Go that uses 'value networks' to evaluate board positions and 'policy networks' to select moves. These deep neural networks are trained by a novel combination of supervised learning from human expert games, and reinforcement learning from games of self-play. We also introduce a new search algorithm that combines Monte Carlo simulation with the value and policy networks.
Using this search algorithm, our program AlphaGo achieved a 99.8% winning rate against other Go programs and beat the human European Go champion Fan Hui by 5 games to 0, a feat thought to be at least a decade away by Go and AI experts alike. Finally, in a dramatic and widely publicised match, AlphaGo defeated Lee Sedol, the top player of the past decade, 4 games to 1.
In this talk, I will explain how AlphaGo works, describe our process of evaluation and improvement, and discuss what we can learn about computational intuition and creativity from the way AlphaGo plays.
Thore Graepel is a research group lead at Google DeepMind and holds a part-time position as Chair of Machine Learning at University College London. He studied physics at the University of Hamburg, Imperial College London, and Technical University of Berlin, where he also obtained his PhD in machine learning in 2001. He spent time as a postdoctoral researcher at ETH Zurich and Royal Holloway College, University of London, before joining Microsoft Research in Cambridge in 2003, where he co-founded the Online Services and Advertising group. Major applications of Thore’s work include Xbox Live’s TrueSkill system for ranking and matchmaking, the AdPredictor framework for click-through rate prediction in Bing, and the Matchbox recommender system which inspired the recommendation engine of Xbox Live Marketplace. More recently, Thore’s work on the predictability of private attributes from digital records of human behaviour has been the subject of intense discussion among privacy experts and the general public. Thore’s current research interests include probabilistic graphical models and inference, reinforcement learning, games, and multi-agent systems. He has published over one hundred peer-reviewed papers, is a named co-inventor on dozens of patents, serves on the editorial boards of JMLR and MLJ, and is a founding editor of the book series Machine Learning & Pattern Recognition at Chapman & Hall/CRC. At DeepMind, Thore has returned to his original passion of understanding and creating intelligence, and recently contributed to creating AlphaGo, the first computer program to defeat a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away.
Sequences arise in many online and offline settings: URLs to visit, songs to listen to, videos to watch, restaurants to dine at, and so on. User-generated sequences are tightly related to mechanisms of choice, where a user must select one from a finite set of alternatives. In this talk, we will discuss a class of problems arising from studying such sequences and the role discrete choice theory plays in these problems. We will present modeling and algorithmic approaches to some of these problems and illustrate them in the context of large-scale data analysis.
Ravi Kumar has been a senior staff research scientist at Google since 2012. Prior to this, he was a research staff member at the IBM Almaden Research Center and a principal research scientist at Yahoo! Research. His research interests include Web search and data mining, algorithms for massive data, and the theory of computation.
Tools such as Johnson-Lindenstrauss dimensionality reduction and 1-bit minwise hashing have been successfully used to transform problems involving very high-dimensional real vectors into lower-dimensional equivalents, at the cost of introducing a random distortion of distances/similarities among vectors. While this can alleviate the computational cost associated with high dimensionality, the effect on the outcome of the computation (compared to working on the original vectors) can be hard to analyze and interpret. For example, the behavior of a basic kNN classifier is easy to describe and interpret, but if the algorithm is run on dimension-reduced vectors with distorted distances it is much less transparent what is happening.
The talk starts with an introduction to randomized (data-independent) dimensionality reduction methods and gives some example applications in machine learning. Based on recent work in the theoretical computer science community we describe tools for dimension reduction that give stronger guarantees on approximation, replacing probabilistic bounds on distance/similarity with bounds that hold with certainty. For example, we describe a "distance sensitive Bloom filter": a succinct representation of high-dimensional boolean vectors that can identify vectors within distance r with certainty, while far vectors are only thought to be close with a small "false positive" probability. We also discuss work towards a deterministic alternative to random feature maps (i.e., dimension-reduced vectors from a high-dimensional feature space), and settings in which a pair of dimension-reducing mappings outperform single-mapping methods. While there are limits to what performance can be achieved with certainty, such techniques may be part of the toolbox for designing transparent and scalable machine learning and knowledge discovery methods.
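For concreteness, here is a minimal sketch of the randomized, data-independent baseline the talk starts from: a Gaussian Johnson-Lindenstrauss projection. The function name and parameters are illustrative, not from the talk.

```python
import numpy as np

def jl_project(X, k, seed=0):
    """Data-independent dimensionality reduction: multiply by a random
    Gaussian matrix scaled so squared norms are preserved in expectation.
    By the Johnson-Lindenstrauss lemma, all pairwise distances are
    preserved up to a (1 +/- eps) factor with high probability once
    k = O(log(n) / eps^2)."""
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((X.shape[1], k)) / np.sqrt(k)
    return X @ R

# Pairwise distances survive the projection, but only probabilistically;
# the tools described in the talk replace this probabilistic guarantee
# with bounds that hold with certainty.
X = np.random.default_rng(1).standard_normal((50, 10_000))
Y = jl_project(X, k=1_000)
print(np.linalg.norm(X[0] - X[1]), np.linalg.norm(Y[0] - Y[1]))
```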
Rasmus Pagh graduated from Aarhus University in 2002, and is now a full professor at the IT University of Copenhagen. His work is centered around efficient algorithms for big data, with an emphasis on randomized techniques. His publications span theoretical computer science, databases, information retrieval, knowledge discovery, and parallel computing. His most well-known work is the cuckoo hashing algorithm (2001), which has led to new developments in several fields. In 2014 he received the best paper award at the WWW Conference for a paper with Pham and Mitzenmacher on similarity estimation, and started a 5-year research project funded by the European Research Council on scalable similarity search.
Human decisions are heavily influenced by social interaction, so that predicting or influencing individual behavior requires modeling these interaction effects. In addition, the distributed learning strategies exhibited by human communities suggest methods of improving both machine learning and human-machine systems. Several practical examples will be described.
Professor Alex "Sandy" Pentland directs the MIT Connection Science and Human Dynamics labs and previously helped create and direct the MIT Media Lab and the Media Lab Asia in India. He is one of the most-cited scientists in the world, and Forbes recently declared him one of the "7 most powerful data scientists in the world" along with Google founders and the Chief Technical Officer of the United States. He has received numerous awards and prizes such as the McKinsey Award from Harvard Business Review, the 40th Anniversary of the Internet from DARPA, and the Brandeis Award for work in privacy.
He is a founding member of advisory boards for Google, AT&T, Nissan, and the UN Secretary General, a serial entrepreneur who has co-founded more than a dozen companies including social enterprises such as the Data Transparency Lab, the Harvard-ODI-MIT DataPop Alliance and the Institute for Data Driven Design. He is a member of the U.S. National Academy of Engineering and leader within the World Economic Forum.
The next decade will see a deep transformation of industrial applications by big data analytics, machine learning and the internet of things. Industrial applications have a number of unique features, setting them apart from other domains. Central for many industrial applications in the internet of things is time series data generated by often hundreds or thousands of sensors at a high rate, e.g. by a turbine or a smart grid. In a first wave of applications this data is centrally collected and analyzed in Map-Reduce or streaming systems for condition monitoring, root cause analysis, or predictive maintenance. The next step is to shift from centralized analysis to distributed in-field or in situ analytics, e.g. in smart cities or smart grids. The final step will be distributed, partially autonomous decision making and learning in massively distributed environments.
In this talk I will give an overview on Siemens’ journey through this transformation, highlight early successes, products and prototypes and point out future challenges on the way towards machine intelligence. I will also discuss architectural challenges for such systems from a Big Data point of view.
Michael May is Head of the Technology Field Business Analytics & Monitoring at Siemens Corporate Technology, Munich, and responsible for eleven research groups in Europe, US, and Asia. Michael is driving research at Siemens in data analytics, machine learning and big data architectures. In the last two years he was responsible for creating the Sinalytics platform for Big Data applications across Siemens’ business.
Before joining Siemens in 2013, Michael was Head of the Knowledge Discovery Department at the Fraunhofer Institute for Intelligent Analysis and Information Systems in Bonn, Germany. In cooperation with industry he developed Big Data Analytics applications in sectors ranging from telecommunication, automotive, and retail to finance and advertising.
Between 2002 and 2009 Michael coordinated two Europe-wide Data Mining Research Networks (KDNet, KDubiq). He was local chair of ICML 2005, ILP 2005 and program chair of the ECML/PKDD Industrial Track 2015. Michael did his PhD on machine discovery of causal relationships at the Graduate Programme for Cognitive Science at the University of Hamburg.
At Amazon, some of the world's largest and most diverse problems in e-commerce, logistics, digital content management, and cloud computing services are being addressed by machine learning on behalf of our customers. In this talk, I will give an overview of a number of key areas and associated machine learning challenges.
Matthias Seeger received his PhD from the University of Edinburgh. He held academic appointments at UC Berkeley, MPI Tuebingen, Saarbruecken, and EPF Lausanne. Currently, he is a principal applied scientist at Amazon in Berlin. His interests are in Bayesian methods, large-scale probabilistic learning, active decision making and forecasting.
Abstract:
Mining itemsets that are the most interesting under a statistical model of the underlying data is a commonly used and well-studied technique for exploratory data analysis, with the most recent interestingness models exhibiting state-of-the-art performance. Continuing this highly promising line of work, we propose the first, to the best of our knowledge, generative model over itemsets, in the form of a Bayesian network, and an associated novel measure of interestingness. Our model is able to efficiently infer interesting itemsets directly from the transaction database using structural EM, in which the E-step employs the greedy approximation to weighted set cover. Our approach is theoretically simple, straightforward to implement, trivially parallelizable and retrieves itemsets whose quality is comparable to, if not better than, existing state-of-the-art algorithms as we demonstrate on several real-world datasets.
Abstract:
The analysis of complex biological data deriving from metabolomic analytical platforms is a challenge. In this paper, we propose a new approach for processing massive and complex metabolomic data generated by such platforms with appropriate methods for discovering meaningful biological patterns. The analyzed datasets consist of a limited set of individuals and a large set of attributes or features, among which predictive markers of clinical outcomes should be mined. The experiments are based on a hybrid knowledge discovery process combining numerical methods such as SVM, Random Forests (RF) and ANOVA, with a symbolic method, Formal Concept Analysis (FCA). The outputs of these experiments highlight the best potential predictive biomarkers of metabolic diseases and the fact that RF and ANOVA are the best-suited methods for feature selection and discovery. The visualization of predictive biomarkers is based on correlation graphs and heatmaps while FCA is used for visualizing the best feature selection methods within a concept lattice easily interpretable by an expert.
Abstract:
Covariance-guided One-Class Support Vector Machine (COSVM) is a very competitive kernel classifier, as it emphasizes the low variance projectional directions of the training data, which results in high accuracy. However, COSVM training involves solving a constrained convex optimization problem, which requires large memory and an enormous amount of training time, especially for large scale datasets. Moreover, it has difficulties in classifying sequentially obtained data. For these reasons, this paper introduces an incremental COSVM method by controlling the possible changes of support vectors after the addition of new data points. The control procedure is based on the relationship between the Karush-Kuhn-Tucker conditions of COSVM and the distribution of the training set. Comparative experiments have been carried out to show the effectiveness of our proposed method, both in terms of execution time and classification accuracy. Incremental COSVM results in better classification performance when compared to canonical COSVM and contemporary incremental one-class classifiers.
Abstract:
We present an extension of the DP-means algorithm, a hard-clustering approximation of nonparametric Bayesian models. Although recent work reports that the DP-means can converge to a local minimum, the conditions under which this happens are still unknown. This paper demonstrates one reason why the DP-means converges to a local minimum: it cannot assign the optimal number of clusters when many data points lie within small distances of each other. As a first attempt to avoid such local minima, we propose an extension of the DP-means with a split-merge technique. The proposed algorithm splits clusters that contain many data points, so as to bring the number of clusters closer to the optimum. Experimental results on multiple datasets show the robustness of the proposed algorithm.
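For context, here is a minimal sketch of the baseline DP-means iteration that the split-merge extension builds on. This is the standard algorithm (Kulis and Jordan, 2012), not the paper's extension; the penalty parameter lam is illustrative.

```python
import numpy as np

def dp_means(X, lam, n_iter=20):
    """Baseline DP-means: k-means-style updates, except that a point whose
    squared distance to every centroid exceeds lam spawns a new cluster.
    lam thus implicitly controls the number of clusters."""
    centroids = [X[0].copy()]
    assign = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        for i, x in enumerate(X):
            d2 = np.array([np.sum((x - c) ** 2) for c in centroids])
            j = int(d2.argmin())
            if d2[j] > lam:                  # too far from every cluster
                centroids.append(x.copy())
                j = len(centroids) - 1
            assign[i] = j
        # Recompute centroids as cluster means, dropping empty clusters.
        labels = sorted(set(assign.tolist()))
        centroids = [X[assign == j].mean(axis=0) for j in labels]
        assign = np.array([labels.index(a) for a in assign])
    return np.array(centroids), assign
```

When many points lie within small distances of each other, the d2[j] > lam test rarely fires, which is exactly the under-splitting failure mode the proposed split step attacks.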
Abstract:
Machine learning approaches that utilize human experts combine domain experience with data to generate novel knowledge. Unfortunately, most methods either provide only a limited form of communication with the human expert and/or are overly reliant on the human expert to specify their knowledge upfront. Thus, the expert is unable to understand what the system could learn without his or her involvement. Allowing the learning algorithm to query the human expert in the most useful areas of the feature space takes full advantage of the data as well as the expert. We introduce active advice-seeking for relational domains. Relational logic allows for compact, but expressive interaction between the human expert and the learning algorithm. We demonstrate our algorithm empirically on several standard relational datasets.
Abstract:
Recurrent Neural Networks (RNNs) are powerful models that achieve exceptional performance on a plethora of pattern recognition problems. However, the training of RNNs is a computationally difficult task owing to the well-known "vanishing/exploding" gradient problem. Algorithms proposed for training RNNs either exploit no (or limited) curvature information and have cheap per-iteration complexity, or attempt to gain significant curvature information at the cost of increased per-iteration cost. The former set includes diagonally-scaled first-order methods such as ADAGRAD and ADAM, while the latter consists of second-order algorithms like Hessian-Free Newton and K-FAC. In this paper, we present adaQN, a stochastic quasi-Newton algorithm for training RNNs. Our approach retains a low per-iteration cost while allowing for non-diagonal scaling through a stochastic L-BFGS updating scheme. The method uses a novel L-BFGS scaling initialization scheme and is judicious in storing and retaining L-BFGS curvature pairs. We present numerical experiments on two language modeling tasks and show that adaQN is competitive with popular RNN training algorithms.
Abstract:
Crowdsourcing allows the collection of labels from a crowd of workers at low cost. In this paper, we focus on ordinal labels, whose underlying order is important. However, the labels can be noisy as there may be amateur workers, spammers and/or even malicious workers. Moreover, some workers/items may have very few labels, making the estimation of their behavior difficult. To alleviate these problems, we propose a novel Bayesian model that clusters workers and items together using nonparametric Dirichlet process priors. This allows workers/items in the same cluster to borrow strength from each other. Instead of directly computing the posterior of this complex model, which is infeasible, we propose a new variational inference procedure. Experimental results on a number of real-world data sets show that the proposed algorithm is more accurate than the state-of-the-art.
Abstract:
The main advantage of Constraint Programming (CP) approaches for sequential pattern mining (SPM) is their modularity, which includes the ability to add new constraints (regular expressions, length restrictions, etc.). The current best CP approach for SPM uses a global constraint (module) that computes the projected database and enforces the minimum frequency; it does this with a filtering algorithm similar to the PrefixSpan method. However, the resulting system is not as scalable as some of the most advanced mining systems like Zaki’s cSPADE. We show how, using techniques from both data mining and constraint programming, one can use a generic constraint solver and yet outperform existing specialized systems. This is mainly due to two improvements in the module that computes the projected frequencies: first, computing the projected database can be sped up by pre-computing the positions at which an item can become unsupported by a sequence, thereby avoiding scanning the full sequence each time; and second, by taking inspiration from the trailing used in CP solvers to devise a backtracking-aware data structure that allows fast incremental storing and restoring of the projected database. Detailed experiments show how this approach outperforms existing CP as well as specialized systems for SPM, and that the gain in efficiency translates directly into increased efficiency for constraint-based settings, such as mining with regular expressions.
Abstract:
The hierarchical Dirichlet process (HDP) is a Bayesian nonparametric model that provides flexible mixed-membership modeling of documents. In this paper, we develop a novel mini-batch online Gibbs sampler algorithm for the HDP which can be easily applied to massive and streaming data. For this purpose, a new prior process, called the generalized hierarchical Dirichlet process (gHDP), is proposed. The gHDP is an extension of the standard HDP where some prespecified topics can be included in the top-level Dirichlet process. By analyzing various datasets, we show that the proposed mini-batch online Gibbs sampler algorithm performs significantly better than the online variational algorithm for the HDP.
Abstract:
The Anti Imitation-based Policy Learning (AIPoL) approach, taking inspiration from the Energy-based learning framework (LeCun et al. 2006), aims at learning a pseudo-value function that induces the same local order on the state space as a (nearly optimal) value function. By construction, the greedification of such a pseudo-value function induces the same policy as the value function itself. The approach assumes that, thanks to prior knowledge, not-to-be-imitated demonstrations can easily be generated. For instance, applying a random policy to a good initial state (e.g., a bicycle in equilibrium) will on average lead to visiting states with decreasing values (the bicycle ultimately falls down). Such a demonstration, that is, a sequence of states with decreasing values, is used along a standard learning-to-rank approach to define a pseudo-value function. If the model of the environment is known, this pseudo-value function directly induces a policy by greedification. Otherwise, the bad demonstrations are exploited together with off-policy learning to learn a pseudo-Q-value function, from which a policy is likewise derived by greedification. To the best of our knowledge, the use of bad demonstrations to achieve policy learning is original. The theoretical analysis shows that the loss of optimality of the pseudo-value-based policy is bounded under mild assumptions, and the empirical validation of AIPoL on the mountain car, the bicycle and the swing-up pendulum problems demonstrates the simplicity and the merits of the approach.
Abstract:
Due to personalized needs for evaluating specific aspects of product quality, recent years have witnessed a boom of research on aspect rating prediction, whose goal is to extract ad hoc aspects from online reviews and predict the rating or opinion on each aspect. Most existing works on aspect rating prediction share the basic assumption that the overall rating is the average score of aspect ratings, or at least very close to them. However, after analyzing real datasets, we make an insightful observation: there is an obvious rating bias between the overall rating and aspect ratings. Motivated by this observation, we study the problem of aspect mining with rating bias, and design a novel RAting-center model with BIas (RABI). Different from the widely used review-center models, RABI adopts the overall rating as the center of the probabilistic model, which generates reviews and topics. In addition, a novel aspect rating variable in RABI is designed to effectively integrate prior information on the rating bias. Experiments on two real datasets (Dianping and TripAdvisor) validate that RABI significantly improves prediction accuracy over existing state-of-the-art methods.
Abstract:
Graph algorithms have become an essential component in many real-world applications. An important property of graphs is that they are often dynamic. Many applications must update the computation result periodically on the new graph so as to keep it up-to-date. Incremental computation is a promising technique for this purpose. Traditionally, incremental computation has been performed synchronously, since that is easy to implement. In this paper, we illustrate that incremental computation can be performed asynchronously as well. Asynchronous incremental computation can bypass synchronization barriers and always utilize the most recent values, and thus it is more efficient than its synchronous counterpart. Furthermore, we develop a distributed framework, GraphIn, to facilitate implementations of incremental computation on massive evolving graphs. We evaluate our asynchronous incremental computation approach via extensive experiments on a local cluster as well as the Amazon EC2 cloud. The evaluation results show that it can accelerate the convergence speed by as much as 14x when compared to recomputation from scratch.
Abstract:
We are interested in learning from large-scale datasets with a massive number of possible features in order to obtain more accurate predictors. Specifically, the goal of this paper is to perform effective learning within L1-regularized risk minimization problems in terms of both time and space computational resources. This will be accomplished by concentrating on the effective features out of a large number of unnecessary features. In order to do this, we propose a multi-threaded scheme that simultaneously runs processes for developing seemingly important features in main memory and updating parameters with respect to only the seemingly important features. We verified our method through computational experiments, showing that our proposed scheme can handle terabyte-scale optimization problems on a single machine.
Abstract:
Searching images with multi-attribute queries shows practical significance in various real world applications. The key problem in this task is how to effectively and efficiently learn from the conjunction of query attributes. In this paper, we propose the Attribute Conjunction Recurrent Neural Network (AC-RNN) to tackle this problem. Attributes involved in a query are mapped into the hidden units and combined in a recurrent way to generate the representation of the attribute conjunction, which is then used to compute a multi-attribute classifier as the output. To mitigate the data imbalance problem of multi-attribute queries, we propose a data weighting procedure in attribute conjunction learning with small positive samples. We also discuss the influence of attribute order in a query and present two methods, based on an attention mechanism and on ensemble learning respectively, to further boost performance. Experimental results on the aPASCAL, ImageNet Attributes and LFWA datasets show that our method consistently and significantly outperforms the other comparison methods on all types of queries.
Abstract:
Deep Convolutional Neural Networks (DCNNs) have shown excellent performance in a variety of machine learning tasks. This paper presents Deep Convolutional Neural Fields (DeepCNF), an integration of DCNNs with Conditional Random Fields (CRFs), for sequence labeling with an imbalanced label distribution. The widely used training methods, such as maximum-likelihood and maximum labelwise accuracy, do not work well on imbalanced data. To handle this, we present a new training algorithm called maximum-AUC for DeepCNF. That is, we train DeepCNF by directly maximizing the empirical Area Under the ROC Curve (AUC), which is an unbiased measure for imbalanced data. To fulfill this, we formulate AUC in a pairwise ranking framework, approximate it by a polynomial function, and then apply a gradient-based procedure to optimize it. We then test our AUC-maximized DeepCNF on three very different protein sequence labeling tasks: solvent accessibility prediction, 8-state secondary structure prediction, and disorder prediction. Our experimental results confirm that maximum-AUC greatly outperforms the other two training methods on 8-state secondary structure prediction and disorder prediction, whose label distributions are highly imbalanced, and performs similarly to them on solvent accessibility prediction, which has three equally distributed labels. Furthermore, our experimental results show that our AUC-trained DeepCNF models greatly outperform existing popular predictors on all three tasks.
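To make the pairwise-ranking view of AUC concrete, the following minimal sketch computes the empirical AUC and a smooth pairwise surrogate; the squared-hinge penalty below stands in for the paper's polynomial approximation, and all names are illustrative.

```python
import numpy as np

def empirical_auc(scores, y):
    """Empirical AUC as pairwise ranking: the fraction of
    (positive, negative) pairs where the positive example scores higher."""
    pos, neg = scores[y == 1], scores[y == 0]
    return float(np.mean(pos[:, None] > neg[None, :]))

def pairwise_auc_surrogate(scores, y):
    """Differentiable surrogate: replace the 0/1 pair indicator with a
    polynomial penalty on the pairwise margin, enabling gradient-based
    maximization of (approximate) AUC."""
    pos, neg = scores[y == 1], scores[y == 0]
    margin = pos[:, None] - neg[None, :]
    return float(np.where(margin < 1.0, (1.0 - margin) ** 2, 0.0).mean())

y = np.array([1, 1, 0, 0, 0])
s = np.array([2.1, 0.4, 1.0, -0.3, -1.2])
print(empirical_auc(s, y), pairwise_auc_surrogate(s, y))  # 0.833..., loss
```

Because every (positive, negative) pair contributes equally regardless of class frequencies, this objective is unaffected by label imbalance, which is why it suits the tasks above.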
Abstract:
We present an interesting modification to the traditional leverage score sampling approach by augmenting the scores with information from the data variance, which improves empirical performance on the column subset selection problem (CSSP). We further present, to our knowledge, the first deterministic bounds for this augmented leverage score, and discuss how it compares to the traditional leverage score. We present experimental results demonstrating how the augmented leverage score performs better than traditional leverage score sampling on CSSP in both deterministic and probabilistic settings.
Abstract:
We are interested in estimating individual labels given only coarse, aggregated signal over the data points. In our setting, we receive sets ("bags") of unlabeled instances with constraints on label proportions. We relax the unrealistic assumption of known label proportions, made in previous work; instead, we assume only to have upper and lower bounds, and constraints on bag differences. We motivate the problem and propose an intuitive problem formulation and algorithm, and apply it to real-world problems. Across several domains, we show how using only proportion constraints and no labeled examples, we can achieve surprisingly high accuracy. In particular, we demonstrate how to predict income level using rough stereotypes and how to perform sentiment analysis using very little information. We also apply our method to guide exploratory analysis, recovering geographical differences in Twitter dialect.
Abstract:
Most people now participate in more than one online social network (OSN). However, the alignment indicating which accounts belong to the same natural person is not revealed. Aligning these isolated networks can provide a unified environment for users and help improve online personalization services. In this paper, we propose a bootstrapping approach, BASS, to recover the alignment. It is an unsupervised, general-purpose approach with minimal limitations on the target networks and users, and it is scalable to real OSNs. Specifically, we jointly model user consistencies of usernames, social ties, and user-generated contents, and then employ an EM algorithm for parameter learning. For analysis and evaluation, we collect and publish large-scale data sets covering various types of OSNs and multi-lingual scenarios. We conduct extensive experiments to demonstrate the performance of BASS, concluding that our approach significantly outperforms state-of-the-art approaches.
Abstract:
Problems of class imbalance appear in diverse domains, ranging from gene function annotation to spectra and medical classification. On such problems, the classifier becomes biased in favour of the majority class. This leads to inaccuracy on the important minority classes, such as specific diseases and gene functions. Synthetic oversampling mitigates this by balancing the training set, whilst avoiding the pitfalls of random under- and oversampling. The existing methods are primarily based on the SMOTE algorithm, which employs a bias of randomly generating points between nearest neighbours. The relationship between the generative bias and the latent distribution has a significant impact on the performance of the induced classifier. Our research into gamma-ray spectra classification has shown that the generative bias applied by SMOTE is inappropriate for domains that conform to the manifold property, such as spectra, text, image and climate change classification. To this end, we propose a framework for manifold-based synthetic oversampling, and demonstrate its superiority in terms of robustness to the manifold with respect to the AUC on three spectra classification tasks and 16 UCI datasets.
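The generative bias in question is easy to state in code: SMOTE synthesizes minority points by linear interpolation between nearest minority neighbours. A minimal sketch follows; parameter names are ours.

```python
import numpy as np

def smote(X_min, n_new, k=5, seed=0):
    """Core SMOTE bias: each synthetic point lies on the line segment
    between a minority point and one of its k nearest minority
    neighbours. Assumes len(X_min) > k."""
    rng = np.random.default_rng(seed)
    # Pairwise squared distances within the minority class.
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)             # exclude self-neighbours
    nn = np.argsort(d2, axis=1)[:, :k]       # k nearest neighbours
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        j = nn[i, rng.integers(k)]
        t = rng.random()                     # interpolation coefficient
        out.append(X_min[i] + t * (X_min[j] - X_min[i]))
    return np.array(out)

X_min = np.random.default_rng(2).normal(size=(30, 2))
print(smote(X_min, n_new=5).shape)           # (5, 2)
```

On data lying on a curved manifold, these straight interpolation segments can leave the manifold entirely, which is the inappropriateness the proposed manifold-based framework addresses.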
Abstract:
A system of nested dichotomies is a method of decomposing a multi-class problem into a collection of binary problems. Such a system recursively splits the set of classes into two subsets, and trains a binary classifier to distinguish between each subset. Even though ensembles of nested dichotomies with random structure have been shown to perform well in practice, a more sophisticated class subset selection method can improve classification accuracy. We investigate an approach to this problem called random-pair selection, and evaluate its effectiveness compared to other published methods of subset selection. We show that our method outperforms other methods in many cases, and is at least on par in almost all other cases.
Abstract:
Subtropical algebra is a semi-ring over the nonnegative real numbers with standard multiplication and the addition defined as the maximum operator. Factorizing a matrix over the subtropical algebra gives us a representation of the original matrix with element-wise maximum over a collection of nonnegative rank-1 matrices. Such structure can be compared to the well-known Nonnegative Matrix Factorization (NMF) that gives an element-wise sum over a collection of nonnegative rank-1 matrices. Using the maximum instead of the sum changes the 'parts-of-whole' interpretation of NMF to a 'winner-takes-it-all' interpretation. We recently introduced an algorithm for subtropical matrix factorization, called Capricorn, that was designed to work on discrete-valued data with discrete noise [Karaev & Miettinen, SDM '16]. In this paper we present another algorithm, called Cancer, that is designed to work over continuous-valued data with continuous noise - arguably, the more common case. We show that Cancer is capable of finding sparse factors with excellent reconstruction error, being better than either Capricorn, NMF, or SVD in continuous subtropical data. We also show that the winner-takes-it-all interpretation is usable in many real-world scenarios and lets us find structure that is different, and often easier to interpret, than what is found by NMF.
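The max-times structure is compact enough to state directly. A small sketch contrasting subtropical reconstruction with the ordinary NMF-style product; the matrices are toy examples.

```python
import numpy as np

def max_times_reconstruct(B, C):
    """Subtropical (max-times) product: element-wise maximum over the
    nonnegative rank-1 layers b_k c_k^T, instead of NMF's sum.
    B is n x r, C is r x m, all entries nonnegative."""
    terms = B[:, :, None] * C[None, :, :]   # shape (n, r, m): rank-1 layers
    return terms.max(axis=1)                # winner-takes-it-all per entry

B = np.array([[2.0, 0.0], [1.0, 3.0]])
C = np.array([[1.0, 4.0], [2.0, 1.0]])
print(max_times_reconstruct(B, C))          # max over the rank-1 layers
print(B @ C)                                # NMF-style sum, for comparison
```

Because each output entry is explained by a single winning rank-1 layer rather than a sum of parts, the recovered factors admit the 'winner-takes-it-all' reading described above.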
Abstract:
Dynamic selection or combination (DSC) methods select one or more classifiers from an ensemble according to the characteristics of a given test instance x. Most methods proposed for this purpose are based on the nearest neighbors algorithm: it is assumed that if a classifier performed well on a set of instances similar to x, it will also perform well on x. We address the problem of dynamically combining a pool of classifiers by combining two approaches: metalearning and multi-label classification. Taking into account that diversity is a fundamental concept in ensemble learning and the interdependencies between the classifiers cannot be ignored, we solve the multi-label classification problem by using a widely known technique: Classifier Chains (CC). Additionally, we extend a typical metalearning approach by combining metafeatures characterizing the interdependencies between the classifiers with the base-level features. We executed experiments on 42 classification datasets and compared our method with several state-of-the-art DSC techniques, including another metalearning approach. Results show that our method allows an improvement over the other metalearning approach and is very competitive with the other four DSC methods.
Abstract:
We propose an efficient distributed online learning protocol for low-latency real-time services. It extends a previously presented protocol to kernelized online learners for multiple dynamic data streams. While kernelized learners achieve higher predictive performance on many real-world problems, communicating the support vector expansions of their models in-between distributed learners becomes inefficient for large numbers of support vectors. We propose a strong criterion for efficiency and investigate settings in which the proposed protocol fulfills this criterion. For that, we bound its communication, provide a loss bound, and evaluate the trade-off between predictive performance and communication.
Abstract:
In representation learning, it is often desirable to learn features at different levels of scale. For example, in image data, some edges will span only a few pixels, whereas others will span a large portion of the image. We introduce an unsupervised representation learning method called a composite denoising autoencoder (CDA) to address this. We exploit the observation from previous work that in a denoising autoencoder, training with lower levels of noise results in more specific, fine-grained features. In a CDA, different parts of the network are trained with different versions of the same input, corrupted at different noise levels. We introduce a novel cascaded training procedure which is designed to avoid types of bad solutions that are specific to CDAs. We show that CDAs learn effective representations on two different image data sets.
Abstract:
Label tree classifiers are commonly used for efficient multi-class and multi-label classification. They represent a predictive model in the form of a tree-like hierarchy of (internal) classifiers, each of which is trained on a simpler (often binary) subproblem, and predictions are made by (greedily) following these classifiers' decisions from the root to a leaf of the tree. Unfortunately, this approach does not normally assure consistency for different losses on the original prediction task, even if the internal classifiers are consistent for their subtask. In this paper, we thoroughly analyze a class of methods referred to as probabilistic classifier trees (PCT). Thanks to training probabilistic classifiers at internal nodes of the hierarchy, these methods allow for searching the tree-structure in a more sophisticated manner, thereby producing predictions of a less greedy nature. Our main result is a regret bound for 0/1 loss, which can easily be extended to ranking-based losses. In this regard, PCT nicely complements a related approach called filter trees (FT), and can indeed be seen as a natural alternative thereof. We compare the two approaches both theoretically and empirically.
Abstract:
Kernel learning is the problem of determining the best kernel (either from a dictionary of fixed kernels, or from a smooth space of kernel representations) for a given task. In this paper, we describe a new approach to kernel learning that establishes connections between the Fourier-analytic representation of kernels arising out of Bochner's theorem and a specific kind of feed-forward network using cosine activations. We analyze the complexity of this space of hypotheses and demonstrate empirically that our approach provides scalable kernel learning superior in quality to prior approaches.
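A minimal sketch of the Bochner connection the paper builds on: with fixed random weights, a one-layer cosine network yields random Fourier features whose inner products approximate an RBF kernel. The paper's contribution, learning these weights, is omitted here; names and parameters are illustrative.

```python
import numpy as np

def cosine_features(X, n_feat=500, gamma=1.0, seed=0):
    """Bochner-style random features: a one-layer network with cosine
    activations whose inner products approximate the RBF kernel
    k(x, y) = exp(-gamma * ||x - y||^2). By Bochner's theorem, the
    weights W are drawn from the kernel's spectral density (here a
    Gaussian with per-dimension variance 2 * gamma)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(X.shape[1], n_feat))
    b = rng.uniform(0, 2 * np.pi, size=n_feat)
    return np.sqrt(2.0 / n_feat) * np.cos(X @ W + b)

X = np.random.default_rng(1).normal(size=(3, 4))
Z = cosine_features(X, n_feat=20_000)
approx = Z @ Z.T
exact = np.exp(-1.0 * ((X[:, None] - X[None, :]) ** 2).sum(-1))
print(np.abs(approx - exact).max())   # small: the features approximate k
```

Making W and b trainable parameters turns this feature map into the cosine-activation feed-forward network analyzed in the paper, so kernel learning becomes ordinary network training.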
Abstract:
We propose the 'Copula PC' algorithm for causal discovery from a combination of continuous and discrete data, assumed to be drawn from a Gaussian copula model. It is based on a two-step approach. The first step applies Gibbs sampling on rank-based data to obtain samples of correlation matrices. These are then translated into an average correlation matrix and an effective number of data points, which in the second step are input to the standard PC algorithm for causal discovery. A stable version naturally arises when rerunning the PC algorithm on different Gibbs samples. Our 'Copula PC' algorithm extends the 'Rank PC' algorithm, which has been designed for Gaussian copula models for purely continuous data. In simulations, 'Copula PC' indeed outperforms 'Rank PC' in cases with mixed variables, in particular for larger numbers of data points, at the expense of a slight increase in computation time.
Abstract:
In time-series classification, two antagonist notions are at stake. On the one hand, in most cases, the sooner the time series is classified, the more rewarding it is. On the other hand, an early classification is more likely to be erroneous. Most early classification methods have been designed to take a decision as soon as a sufficient level of accuracy is reached. However, in many applications, delaying the decision can be costly. Recently, a framework dedicated to optimizing a trade-off between classification accuracy and the cost of delaying the decision was proposed in [3], together with an algorithm that decides online the optimal time instant to classify an incoming time series. On top of this framework, we build in this paper two different early classification algorithms that optimize a trade-off between decision accuracy and the cost of delaying the decision. As in [3], these algorithms are non-myopic in the sense that, even when classification is delayed, they can provide an estimate of when the optimal classification time is likely to occur. Our experiments on real datasets demonstrate that the proposed approaches are more robust than that of [3].
Abstract:
In this paper, we propose a nonparametric Bayesian mixture model that simultaneously optimizes topic extraction and group clustering while allowing all topics to be shared by all clusters for grouped data. In addition, in order to keep computation efficient for today's large-scale data, we formulate our model so that it can use a closed-form variational Bayesian method to approximately calculate the posterior distribution. Experimental results with corpus data show that our model performs better than existing models, achieving a 22% improvement over a state-of-the-art model. Moreover, an experiment with location data from mobile phones shows that our model performs well in the field of big data analysis.
Abstract:
Online reviews provide viewpoints on the strengths and shortcomings of products/services, possibly influencing customers’ purchase decisions. However, there is an increasing proportion of non-credible reviews — either fake (promoting/demoting an item), incompetent (involving irrelevant aspects), or biased — entailing the problem of identifying credible reviews. Prior works involve classifiers harnessing information about items and users, but fail to provide interpretability as to why a review is deemed non-credible. This paper presents a novel approach to address the above issues. We utilize latent topic models leveraging review texts, item ratings, and timestamps to derive consistency features without relying on item/user histories, unavailable for “long tail” items/users. We develop models, for computing review credibility scores to provide interpretable evidence for non-credible reviews, that are also transferable to other domains — addressing the scarcity of labeled data. Experiments on real review datasets demonstrate improvements over state-of-the-art baselines.
Abstract:
We propose Deep Stochastic Neighbor Compression (DSNC), a data compression algorithm that learns a subset of compressed data in the feature space induced by a deep neural network. The learned compressed data is a small representation of the entire data, enabling traditional kNN for fast testing and significantly boosted test accuracy. We conduct comprehensive empirical evaluation on several benchmark datasets and show that DSNC achieves unprecedented test accuracy and compression ratio compared to related methods.
Abstract:
With the increasing use of online communication platforms, such as email, Twitter, and messaging applications, we are faced with a growing amount of data that combine content (what is said), time (when), and user (by whom) information. An important computational challenge is to analyze these data, discover meaningful patterns, and understand what is happening. We consider the problem of mining online communication data to find top-k temporal events. An event is a topic that is discussed frequently, in a relatively short time span, while the information flow respects the underlying network structure. We construct our model for detecting temporal events in two steps. We first introduce the notion of an interaction meta-graph, which connects associated interactions. Using this notion, we define a temporal event to be a subset of interactions that (i) are topically and temporally close and (ii) correspond to a tree that captures the information flow. The problem of finding the best temporal event leads to a budgeted version of the prize-collecting Steiner-tree (PCST) problem, which we solve using three different methods: a greedy approach, a dynamic-programming algorithm, and an adaptation of an existing approximation algorithm. The problem of finding the top-k events among a set of candidate events maps to the maximum set-cover problem, and is thus solved greedily. We compare and analyze our algorithms on both synthetic and real datasets, such as Twitter and email communication. The results show that our methods are able to detect meaningful temporal events.
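As an illustration of the last step, here is a minimal sketch of greedy maximum set cover for selecting the top-k events, where each candidate event is viewed as the set of interactions it covers; the data is a toy example.

```python
def greedy_top_k(candidates, k):
    """Greedy maximum set cover: repeatedly pick the candidate event
    covering the most not-yet-covered interactions. This is the classic
    (1 - 1/e)-approximation for coverage-style top-k selection."""
    covered, chosen = set(), []
    pool = [set(c) for c in candidates]
    for _ in range(k):
        gains = [len(c - covered) for c in pool]
        best = max(range(len(pool)), key=lambda i: gains[i])
        if gains[best] == 0:        # nothing new left to cover
            break
        chosen.append(best)
        covered |= pool[best]
    return chosen, covered

events = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1, 7}]
print(greedy_top_k(events, k=2))    # picks the two most complementary events
```

Greedily preferring complementary events keeps the selected top-k from repeatedly reporting the same underlying discussion.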
Abstract:
Discovering patterns in long event sequences is an important data mining task. Most existing work focuses on frequency-based quality measures that allow algorithms to use the anti-monotonicity property to prune the search space and efficiently discover the most frequent patterns. In this work, we step away from such measures, and evaluate patterns using cohesion --- a measure of how close to each other the items making up the pattern appear in the sequence on average. We tackle the fact that cohesion is not an anti-monotonic measure by developing a novel pruning technique in order to reduce the search space. By doing so, we are able to efficiently unearth rare, but strongly cohesive, patterns that existing methods often fail to discover.
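To make the notion concrete, here is one plausible instantiation of a cohesion-style measure; the paper's exact definition may differ. For each occurrence of a pattern item, we find the shortest window containing the whole pattern, and compare the pattern size to the average window length.

```python
def min_window(seq, items, i):
    """Length of the shortest window of seq containing position i and at
    least one occurrence of every item in `items` (None if impossible)."""
    best = None
    for lo in range(i + 1):
        need = set(items)
        for hi in range(lo, len(seq)):
            need.discard(seq[hi])
            if not need:                      # all items covered
                length = max(hi, i) - lo + 1  # extend right to include i
                if best is None or length < best:
                    best = length
                break
    return best

def cohesion(seq, items):
    """Cohesion as |items| / (average minimal window length): equals 1.0
    when the pattern's items always occur side by side, and shrinks as
    they spread out, regardless of how frequent the pattern is."""
    occ = [i for i, e in enumerate(seq) if e in items]
    windows = [w for w in (min_window(seq, items, i) for i in occ)
               if w is not None]
    if not windows:
        return 0.0
    return len(items) / (sum(windows) / len(windows))

print(cohesion(list("abxxxxab"), {"a", "b"}))   # 1.0: always adjacent
print(cohesion(list("axxxb"), {"a", "b"}))      # 0.4: items far apart
```

Note that adding more occurrences of a pattern can lower this score if the new occurrences are spread out, so the measure is not anti-monotonic, which is why the standard frequency-based pruning does not apply.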
Abstract:
The availability of massive volumes of data and recent advances in data collection and processing platforms have motivated the development of distributed machine learning algorithms. In numerous real-world applications large datasets are inevitably noisy and contain outliers. These outliers can dramatically degrade the performance of standard machine learning approaches such as regression trees. To this end, we present a novel distributed regression tree approach that utilizes robust regression statistics, statistics that are more robust to outliers, for handling large and noisy data. We propose to integrate robust statistics based error criteria into the regression tree. A data summarization method is developed and used to improve the efficiency of learning regression trees in the distributed setting. We implemented the proposed approach and baselines based on Apache Spark, a popular distributed data processing platform. Extensive experiments on both synthetic and real datasets verify the effectiveness and efficiency of our approach.
Abstract:
Estimating traffic conditions in arterial networks with probe data is a practically important yet substantially challenging problem. Limited by the lack of reliability and the low sampling frequency of GPS probes, probe data are usually not sufficient for fully estimating the traffic conditions of a large arterial network. For the first time, this paper studies how to explore social media as an auxiliary data source and incorporate it with probe data to enhance traffic congestion estimation. Motivated by the increasingly available traffic information on Twitter, we first extensively collect tweets that report various traffic events such as congestion, accidents, and road construction. Next we propose an extended Coupled Hidden Markov Model to estimate the traffic conditions of an arterial network by effectively combining the two types of data. To address the computational challenge, a sequential importance sampling based EM algorithm is also given. Evaluations on the arterial network of Chicago demonstrate the effectiveness of the proposed method.
Abstract:
Guiding representation learning towards temporally stable features improves object identity encoding from video. Existing models have applied temporal coherence uniformly over all features based on the assumption that optimal object identity encoding only requires temporally stable components. We explore the effects of mixing temporally coherent ‘invariant’ features alongside ‘variable’ features in a single representation. Applying temporal coherence to different proportions of available features, we introduce a mixed representation autoencoder. Trained on several datasets, model outputs were passed to an object classification task to compare performance. Whilst the inclusion of temporal coherence improved object identity recognition in all cases, the majority of tests favoured a mixed representation.
Abstract:
We discuss a method to improve the exact F-measure maximization algorithm called GFM, proposed in [Dembczynski et al., NIPS 2011] for multi-label classification, assuming the label set can be partitioned into conditionally independent subsets given the input features. If the labels were all independent, the estimation of only $m$ parameters ($m$ denoting the number of labels) would suffice to derive Bayes-optimal predictions in $O(m^2)$ operations [Ye et al., ICML 2012]. In the general case, $m^2 + 1$ parameters are required by GFM to solve the problem in $O(m^3)$ operations. In this work, we show that the number of parameters can be reduced further to $m^2/n$, in the best case, assuming the label set can be partitioned into $n$ conditionally independent subsets. As this label partition needs to be estimated from the data beforehand, we first use the procedure proposed in [Gasse et al., ICML 2015] that finds such a partition, and then infer the required parameters locally in each label subset. The latter are aggregated and serve as input to GFM to form the Bayes-optimal prediction. We show in a synthetic experiment that the reduction in the number of parameters brings significant benefits in terms of performance.
Abstract:
Decision makers increasingly require near-instant models to make sense of fast evolving data streams. Learning from such evolutionary environments is, however, a challenging task. This challenge is partially due to the fact that the distribution of data often changes over time, thus potentially leading to degradation in the overall performance. In particular, classification algorithms need to adapt their models after facing such distributional changes (also referred to as concept drifts). Usually, drift detection methods are utilized in order to accomplish this task. It follows that detecting concept drifts as soon as possible, while producing fewer false positives and false negatives, is a major objective of drift detectors. To this end, we introduce the Fast Hoeffding Drift Detection Method (FHDDM), which detects drift points using a sliding window and Hoeffding's inequality. FHDDM detects a drift when a significant difference between the maximum probability of correct predictions and the most recent probability of correct predictions is observed. Experimental results confirm that FHDDM detects drifts with shorter detection delays and fewer false positives and false negatives, when compared to the state-of-the-art.
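A condensed sketch of the detection rule as described above; the window size and delta defaults are illustrative, not necessarily the paper's.

```python
import math
from collections import deque

class FHDDMSketch:
    """Sliding-window drift detector in the spirit of FHDDM: signal a
    drift when the window's accuracy drops below the best accuracy seen
    so far by more than the Hoeffding bound
    eps = sqrt(ln(1/delta) / (2 * n))."""
    def __init__(self, n=100, delta=1e-6):
        self.n = n
        self.eps = math.sqrt(math.log(1.0 / delta) / (2.0 * n))
        self.window = deque(maxlen=n)
        self.mu_max = 0.0

    def add(self, correct):
        """Feed 1 if the classifier's prediction was correct, else 0.
        Returns True when a drift is detected."""
        self.window.append(1 if correct else 0)
        if len(self.window) < self.n:
            return False                    # wait until the window fills
        mu = sum(self.window) / self.n
        self.mu_max = max(self.mu_max, mu)
        if self.mu_max - mu > self.eps:
            self.window.clear()             # reset after signalling a drift
            self.mu_max = 0.0
            return True
        return False
```

By Hoeffding's inequality, a drop of the windowed accuracy below the running maximum by more than eps is unlikely under a stationary distribution, so crossing that gap is treated as evidence of drift.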
Abstract:
Real-time auctions have become an important trading mechanism in online advertising. A crucial issue for advertisers is to model the market competition, i.e., bid landscape forecasting. This is formulated as predicting the market price distribution for each ad auction given its side information. Existing solutions mainly focus on parameterized heuristic forms of the market price distribution and learn the parameters to fit the data. In this paper, we present a functional bid landscape forecasting method that automatically learns the function mapping from each ad auction's features to the market price distribution without any assumption about the functional form. Specifically, to deal with categorical feature input, we propose a novel decision tree model with a node-splitting scheme based on attribute value clustering. Furthermore, to deal with the problem of right-censored market price observations, we propose to incorporate a survival model into tree learning and prediction, which largely reduces the model bias. Experiments on real-world data demonstrate that our models achieve substantial performance gains over previous work in various metrics.
Abstract:
Several machine learning problems arising in natural language processing can be modelled as a sequence labelling problem. Gaussian processes (GPs) provide a Bayesian approach to learning such problems in a kernel based framework. We propose Gaussian process models based on a pseudo-likelihood to solve sequence labelling problems. The pseudo-likelihood model enables one to capture multiple dependencies among the output components of the sequence without becoming computationally intractable. We use an efficient variational Gaussian approximation method to perform inference in the proposed model. We also provide an iterative algorithm which can effectively make use of the information from the neighbouring labels to perform prediction. The ability to capture multiple dependencies makes the proposed approach useful for a wide range of sequence labelling problems. Numerical experiments on some sequence labelling problems in natural language processing demonstrate the usefulness of the proposed approach.
Abstract:
Since instances in multi-label problems are associated with several labels simultaneously, most traditional feature selection algorithms for single-label problems are inapplicable. Therefore, new criteria to evaluate features and new methods to model label correlations are needed. In this paper, we adopt a graph model to capture the label correlation, and propose a feature selection algorithm for multi-label problems that combines this graph with large-margin theory. The proposed multi-label feature selection algorithm, GMBA, can efficiently utilize high-order label correlation. Experiments on real-world data sets demonstrate the effectiveness of the proposed method.
Abstract:
Structured high-cardinality data arises in many domains and poses a major challenge for both modeling and inference. Graphical models are a popular approach to modeling structured data but they are not suitable for high-cardinality variables. The count-min (CM) sketch is a popular approach to estimating probabilities in high-cardinality data but it does not scale well beyond a few variables. In this paper, we bring together graphical models and count sketches, and address the problem of estimating probabilities in structured high-cardinality data. We view these data as a stream $(x^{(t)})_{t = 1}^n$ of $n$ observations from an unknown distribution $P$, where $x^{(t)} \in [M]^K$ is a $K$-dimensional vector and $M$ is the cardinality of its entries, which is very large. Suppose that the graphical model $\mathcal{G}$ of $P$ is known, and let $\bar{P}$ be the maximum-likelihood estimate (MLE) of $P$ from $(x^{(t)})_{t = 1}^n$ conditioned on $\mathcal{G}$. We design and analyze algorithms that approximate any $\bar{P}$ with $\hat{P}$, such that $\hat{P}(x) \approx \bar{P}(x)$ for any $x \in [M]^K$ with at least $1 - \delta$ probability, in space independent of $M$. The key idea of our approximations is to use the structure of $\mathcal{G}$ and approximately estimate its factors by "sketches". The sketches hash high-cardinality variables using random projections. Our approximations are computationally and space efficient, being independent of $M$. Our error bounds are multiplicative and significantly improve over those of the CM sketch, a state-of-the-art approach to estimating the frequency of values in streams. We evaluate our algorithms on synthetic and real-world problems, and report an order of magnitude improvements over the CM sketch.
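For comparison, the CM-sketch baseline that the paper improves on can be sketched in a few lines; the multiplicative hashing here is simplified for illustration.

```python
import numpy as np

class CountMinSketch:
    """Baseline count-min sketch: d hash rows of width w; the estimate
    of an item's count is the minimum of its d counters, so estimates
    can only overestimate, never underestimate."""
    def __init__(self, w=2048, d=5, seed=0):
        rng = np.random.default_rng(seed)
        self.w, self.d = w, d
        self.table = np.zeros((d, w), dtype=np.int64)
        # Random multipliers give each row its own (simplified) hash.
        self.salts = [int(s) for s in rng.integers(1, 2**61 - 1, size=d)]

    def _cols(self, x):
        h = hash(x)
        return [(h * s) % self.w for s in self.salts]

    def add(self, x, count=1):
        for row, col in enumerate(self._cols(x)):
            self.table[row, col] += count

    def query(self, x):
        return min(self.table[row, col]
                   for row, col in enumerate(self._cols(x)))

cms = CountMinSketch()
for word in ["a", "b", "a", "c", "a"]:
    cms.add(word)
print(cms.query("a"), cms.query("b"))   # >= 3, >= 1 (overestimates only)
```

The paper's approach keeps such sketches per factor of $\mathcal{G}$ rather than one sketch over full joint configurations, which is what removes the dependence on $M^K$.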
Abstract:
In order to avoid overfitting, it is common practice to regularize linear classification models using squared or absolute-value norms as regularization terms. In this article we consider a new method of regularization in the SVM framework: Huber-norm regularization imposes a mixture of $\ell_1$- and $\ell_2$-norm regularization on the features. We derive the dual optimization problem, prove an upper bound on the statistical risk of the model class by means of Rademacher complexity, and establish a simple type of oracle inequality on the optimality of the decision rule. Empirically, we observe that the Huber-norm regularizer outperforms $\ell_1$-norm, $\ell_2$-norm, and elastic-net regularization on a wide range of benchmark data sets.
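One common parameterization of the Huber function, applied coordinate-wise to the weight vector, behaves quadratically ($\ell_2$-like) near zero and linearly ($\ell_1$-like) in the tails; whether this matches the paper's exact mixture weighting is an assumption made for illustration:

```python
def huber_penalty(w, delta=1.0):
    """Huber function applied coordinate-wise to the weight vector:
    quadratic (l2-like) for |w_i| <= delta, linear (l1-like) beyond.
    The crossover point delta controls the l1/l2 mixture; the two
    branches agree at |w_i| = delta, so the penalty is continuous."""
    return sum(0.5 * wi * wi / delta if abs(wi) <= delta
               else abs(wi) - 0.5 * delta
               for wi in w)
```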
Abstract:
We present a novel data embedding that significantly reduces the estimation error of the locality sensitive hashing (LSH) technique when used in a reproducing kernel Hilbert space (RKHS). Efficient and accurate kernel approximation techniques typically involve either the kernel principal component analysis (KPCA) approach or the Nyström approximation method. In this work we show that extant LSH methods in this space suffer from a bias problem that is, moreover, difficult to estimate a priori. Consequently, the LSH estimate of a kernel differs from that of the KPCA/Nyström approximation. We provide a theoretical rationale for this bias, which is also confirmed empirically. We propose an LSH algorithm that can reduce this bias, and consequently our approach can match the KPCA or Nyström methods' estimation accuracy while retaining the traditional benefits of LSH. We evaluate our algorithm on a wide range of real-world image datasets (for which kernels are known to perform well) and show the efficacy of our algorithm using a variety of principled evaluations including mean estimation error, KL divergence and the Kolmogorov-Smirnov test.
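As background, the classical random-hyperplane LSH for cosine similarity in input space looks as follows; the paper's contribution concerns correcting the bias of such estimates when the embedding lives in an RKHS, which this sketch does not address:

```python
import numpy as np

def simhash_signatures(X, n_bits=64, seed=0):
    """Random-hyperplane LSH: each bit is the sign of a random projection.
    X: (n_points, dim) data matrix. Returns a boolean (n_points, n_bits)
    signature matrix."""
    rng = np.random.default_rng(seed)
    H = rng.standard_normal((X.shape[1], n_bits))
    return X @ H > 0

def estimated_cosine(sig_a, sig_b):
    """The collision rate estimates the angle between two points
    (P[bits differ] = angle / pi), hence their cosine similarity."""
    hamming_frac = np.mean(sig_a != sig_b)
    return np.cos(np.pi * hamming_frac)
```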
Abstract:
Traditionally, in health surveillance, high-risk zones are identified based only on the residence address or workplace of diseased individuals. This provides little information about the places where people are actually infected, which is the truly important information for disease control. The recent availability of spatial data generated by geotagged social media posts offers a unique opportunity: by identifying and following diseased individuals, we obtain a collection of sequential geolocated events, each sequence being issued by a social media user. The sequence of map positions implicitly provides an estimate of the users' social trajectories as they drift on the map. Existing data mining techniques for spatial cluster detection fail to address this new setting, as they require a single location for each individual under analysis. In this paper we present two stochastic models, with their associated algorithms, to mine this new type of data. The Visit Model finds the most likely zones that a diseased person visits, while the Infection Model finds the most likely zones where a person gets infected while visiting. We demonstrate the applicability and effectiveness of our proposed models by applying them to more than 100 million geotagged tweets from Brazil in 2015. In particular, we target the identification of infection hot spots associated with dengue, a mosquito-transmitted disease that affects millions of people in Brazil annually, and billions worldwide. We applied our algorithms to data from 11 large cities in Brazil and found infection hot spots, showing the usefulness of our methods for disease surveillance.
Abstract:
Interactive learning is a process in which a machine learning algorithm is provided with meaningful, well-chosen examples, as opposed to the randomly chosen examples typical of standard supervised learning. In this paper, we propose a new method for interactive learning from multiple noisy labels, in which we exploit the disagreement among annotators to quantify the easiness (or meaningfulness) of an example. We demonstrate the usefulness of this method in estimating the parameters of a latent variable classification model, and conduct experimental analyses on a range of synthetic and benchmark data sets. Furthermore, we theoretically analyze the performance of the perceptron in this interactive learning framework.
Abstract:
Data visualization and iterative/interactive data mining are attracting rapidly growing attention, both in research and in industry. However, integrated methods and tools that combine advanced visualization and data mining techniques are rare, and those that exist are often specialized to a single problem or domain. In this paper, we introduce a novel, generic method for interactive visual exploration of high-dimensional data. In contrast to most visualization tools, it is not based on the traditional dogma of manually zooming and rotating data. Instead, the tool initially presents the user with an `interesting' projection of the data and then employs data randomization with constraints to allow users to flexibly and intuitively express their interests or beliefs using visual interactions that correspond to exactly defined constraints. The constraints expressed by the user are then taken into account by a projection-finding algorithm to compute a new `interesting' projection, a process that can be iterated until the user runs out of time or finds that the constraints explain everything she needs from the data. We present the tool by means of two case studies, one controlled study on synthetic data and another on real census data.
Abstract:
One transfer learning approach that has gained wide popularity lately is attribute-based zero-shot learning. Its goal is to learn novel classes that were never seen during training. The classical route towards realizing this goal is to incorporate prior knowledge, in the form of a semantic embedding of classes, and to learn to predict classes indirectly via their semantic attributes. Despite the amount of research devoted to this subject lately, no known algorithm has yet reported a predictive accuracy that could exceed the accuracy of supervised learning with very few training examples. For instance, the direct attribute prediction (DAP) algorithm, which forms a standard baseline for the task, is known to be as accurate as supervised learning when as few as two examples from each hidden class are used for training on some popular benchmark datasets! In this paper, we argue that this lack of significant results in the literature is not a coincidence; attribute-based zero-shot learning is fundamentally an ill-posed strategy. The key insight is the observation that the mechanical task of predicting an attribute is, in fact, quite different from the epistemological task of learning the "correct meaning" of the attribute itself; this mismatch renders attribute-based zero-shot learning fundamentally ill-posed. In more precise mathematical terms, attribute-based zero-shot learning is equivalent to the mirage goal of learning with respect to one distribution of instances in the hope of being able to predict with respect to any arbitrary distribution. We demonstrate this overlooked fact on some synthetic and real datasets.
Abstract:
We demonstrate the connection between slice sampling and Hamiltonian Monte Carlo (HMC) with double-exponential kinetic energy using the Hamilton-Jacobi equation. Based on this connection, we propose an approach to performing slice sampling with a numerical integrator, in an HMC fashion. This makes slice sampling in high-dimensional spaces more feasible. We provide a theoretical analysis of the performance of such a sampler in several univariate cases. Furthermore, the proposed approach extends standard HMC by enabling sampling from discrete distributions. We compare our method with standard HMC on both synthetic and real data, and discuss its limitations and potential improvements.
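For context, a standard HMC transition with Gaussian kinetic energy and a leapfrog integrator is sketched below; the paper's construction replaces the Gaussian momentum with a double-exponential one, which this baseline sketch does not implement:

```python
import numpy as np

def hmc_step(x, logp, grad_logp, step=0.1, n_leap=20, rng=np.random):
    """One standard HMC transition: Gaussian momentum + leapfrog integrator."""
    x = np.asarray(x, dtype=float)
    p = rng.standard_normal(x.shape)                      # resample momentum
    x_new, p_new = x.copy(), p + 0.5 * step * grad_logp(x)  # half kick
    for _ in range(n_leap - 1):
        x_new += step * p_new                             # position drift
        p_new += step * grad_logp(x_new)                  # full momentum kick
    x_new += step * p_new
    p_new += 0.5 * step * grad_logp(x_new)                # final half kick
    # Metropolis correction on the Hamiltonian H = -logp(x) + 0.5 * p.p
    log_accept = (logp(x_new) - 0.5 * p_new @ p_new) - (logp(x) - 0.5 * p @ p)
    return x_new if np.log(rng.uniform()) < log_accept else x
```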
Abstract:
In this paper, we study the problem of source detection in the context of information diffusion through online social networks. We propose a representation learning approach that leads to a robust model able to deal with the sparsity of the data. From learned continuous projections of the users, our approach is able to efficiently predict the source of any newly observed diffusion episode. Our model relies neither on a known diffusion graph nor on a hypothetical probabilistic diffusion law, but directly infers the source from diffusion episodes. It is also less complex than alternative state-of-the-art models. It shows good performance on artificial and real-world datasets, compared with various state-of-the-art baselines.
Abstract:
Semantic Based Regularization (SBR) is a general framework to integrate semi-supervised learning with application-specific background knowledge, which is assumed to be expressed as a collection of first-order logic (FOL) clauses. While SBR has proved to be a useful tool in many applications, the underlying learning task often requires solving an optimization problem that has been empirically observed to be challenging. Heuristics and experience are therefore key to achieving good results when applying SBR. The main contribution of this paper is to study why and when training in SBR is easy. In particular, this paper shows that there exists a large class of prior knowledge that can be expressed as convex constraints, which can be exploited during training in a very efficient and effective way. This class of constraints provides a natural way to break the complexity of learning by building a training plan that uses the convex constraints as an effective initialization step for the final full optimization problem. Whereas previously published results on SBR have employed kernel machines to approximate the underlying unknown predicates, this paper employs neural networks for the first time, showing the flexibility of the framework. The experimental results show the effectiveness of the training plan on the categorization of real-world images.
Abstract:
In this paper, we propose a framework for a class of learning problems that we refer to as ``learning to aggregate''. Roughly, learning-to-aggregate problems are supervised machine learning problems in which instances are represented as a composition of a (variable) number of constituents; such compositions are associated with an evaluation, score, or label, which is the target of the prediction task, and which can presumably be modeled as a suitable aggregation of the properties of the constituents. Our learning-to-aggregate framework establishes a close connection between machine learning and a branch of mathematics devoted to the systematic study of aggregation functions. We specifically focus on a class of functions called uninorms, which combine conjunctive and disjunctive modes of aggregation. Experimental results for a corresponding model are presented on a review data set, for which the aggregation problem consists of combining different reviewer opinions about a paper into an overall decision of acceptance or rejection.
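A concrete uninorm with neutral element e can be built from a t-norm below e and a t-conorm above e; the generators chosen below (product t-norm and probabilistic sum) are illustrative assumptions, not necessarily the family used in the paper:

```python
def uninorm(x, y, e=0.5):
    """A uninorm on [0, 1] with neutral element e: conjunctive below e,
    disjunctive above e. Note uninorm(x, e) == x for all x."""
    if x <= e and y <= e:                       # both low: t-norm behaviour
        return e * (x / e) * (y / e)
    if x >= e and y >= e:                       # both high: t-conorm behaviour
        u, v = (x - e) / (1 - e), (y - e) / (1 - e)
        return e + (1 - e) * (u + v - u * v)
    # Mixed region: any value between min and max is admissible;
    # min gives a conjunctive-leaning uninorm.
    return min(x, y)
```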
Abstract:
In contextual bandit problems, an agent has to choose an action from a set of available ones at each decision step, according to features observed on them. The goal is to define a decision strategy that maximizes the cumulative reward of actions over time. We focus on the specific case where the features of each action correspond to a constant profile, which can be used to determine its intrinsic utility for the task at hand. If there exists an unknown linear mapping from profiles to rewards, it can be leveraged to greatly improve the exploitation-exploration trade-off of stationary stochastic methods like UCB. In this paper, we consider the case where action profiles are unknown beforehand. Instead, the agent only observes sample vectors, with mean equal to the true profiles, for a subset of actions at each decision step. We propose a new algorithm, called SampLinUCB, and derive a finite-time high-probability upper bound on its regret. We also provide numerical experiments on a task of focused data capture from online social networks.
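As a baseline for intuition, here is a minimal LinUCB sketch with known, fixed arm profiles; SampLinUCB's setting differs precisely in that only noisy samples of the profiles are observed, so treat `profiles` below as an assumption of this simplified variant:

```python
import numpy as np

def lin_ucb(profiles, pull, alpha=1.0, horizon=1000):
    """Minimal LinUCB with known arm profiles.
    profiles: (n_arms, d) array of arm feature vectors;
    pull(arm) returns an observed stochastic reward."""
    n_arms, d = profiles.shape
    A = np.eye(d)                    # regularized design matrix
    b = np.zeros(d)
    for t in range(horizon):
        A_inv = np.linalg.inv(A)
        theta = A_inv @ b            # ridge estimate of the reward mapping
        # Optimism bonus: confidence ellipsoid width along each profile.
        bonus = np.sqrt(np.einsum('id,dk,ik->i', profiles, A_inv, profiles))
        arm = int(np.argmax(profiles @ theta + alpha * bonus))
        r = pull(arm)
        A += np.outer(profiles[arm], profiles[arm])
        b += r * profiles[arm]
    return A, b
```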
Abstract:
In 1963, Polyak proposed a simple condition that is sufficient to show a global linear convergence rate for gradient descent. This condition is a special case of the Lojasiewicz inequality proposed in the same year, and it does not require strong convexity (or even convexity). In this work, we show that this much-older Polyak-Lojasiewicz (PL) inequality is actually weaker than the five main conditions that have been explored to show linear convergence rates without strong convexity over the last 25 years. We also use the PL inequality to give new analyses of coordinate descent and stochastic gradient for many non-strongly-convex (and some non-convex) functions. We further propose a generalization that applies to proximal-gradient methods for non-smooth optimization, leading to simple proofs of linear convergence for support vector machines and L1-regularized least squares without additional assumptions.
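To make the argument concrete, here is the standard one-step reasoning that the PL inequality enables, for gradient descent with step size $1/L$ on an $L$-smooth function $f$ with minimum value $f^*$:
\[
\text{PL:}\quad \tfrac{1}{2}\,\|\nabla f(x)\|^2 \;\ge\; \mu\,\bigl(f(x) - f^*\bigr).
\]
Smoothness gives the descent bound $f(x_{k+1}) \le f(x_k) - \tfrac{1}{2L}\|\nabla f(x_k)\|^2$, and applying the PL inequality to the gradient term yields
\[
f(x_{k+1}) - f^* \;\le\; \Bigl(1 - \tfrac{\mu}{L}\Bigr)\bigl(f(x_k) - f^*\bigr),
\]
i.e., a global linear rate, with no convexity assumption anywhere in the argument.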
Abstract:
Predicting the link state of a network at a future time, given a collection of link states at earlier times, is an important task with many real-life applications. In the existing literature this task is known as link prediction in dynamic networks. Solving this task is more difficult than its counterpart in static networks because an effective feature representation of node-pair instances is hard to obtain for dynamic networks. In this work we solve this problem by designing a novel graphlet-transition-based feature representation of the node-pair instances of a dynamic network. We propose a method, GraTFEL, which uses unsupervised feature learning on graphlet-transition-based features to give a low-dimensional feature representation of the node-pair instances. GraTFEL models the feature learning task as an optimal coding task where the objective is to minimize the reconstruction error, and it solves this optimization task using a gradient descent method. We validate the effectiveness of the learned feature representations by utilizing them for link prediction in real-life dynamic networks. Specifically, we show that GraTFEL, which uses the extracted feature representation of graphlet transition events, outperforms existing methods that use well-known link prediction features.
Abstract:
Subgoal discovery in reinforcement learning is an effective way of partitioning a problem domain with a large state space. Recent research mainly focuses on the automatic identification of such subgoals during learning, making use of state transition information gathered during exploration. Mostly based on the options framework, an identified subgoal leads the learning agent to an intermediate region that is known to be useful on the way to the goal. In this paper, we propose a novel automatic subgoal discovery method based on the analysis of predicted shortcut history segments derived from experience, which are then used to generate useful options to speed up learning. Compared to similar existing methods, it performs significantly better in terms of time complexity and usefulness of the subgoals identified, without sacrificing solution quality. The effectiveness of the method is shown empirically via experiments on various benchmark problems, in comparison with well-known subgoal identification methods.
Abstract:
Recent graph computation approaches have demonstrated that a single PC can perform efficiently on billion-scale graphs. While these approaches achieve scalability by optimizing I/O operations, they do not fully exploit the capabilities of modern hard drives and processors. To improve on their performance, in this work we explore a bimodal block processing strategy (BBP) that is able to boost computation speed by minimizing I/O cost. With this strategy, we make the following contributions: (1) a scalable and general graph computation framework named M-Flash; (2) a flexible and simple programming model allowing popular and essential graph algorithms to be implemented easily, including the first single-machine billion-scale eigensolver; and (3) extensive experiments on real graphs with up to 6.6 billion edges, demonstrating M-Flash's consistent and significant speedup over state-of-the-art approaches.
Abstract:
Given a large-scale and high-order tensor, how can we find dense blocks in it? Can we find them in near-linear time but with a quality guarantee? Extensive previous work has shown that dense blocks in tensors as well as graphs correspond to anomalous and potentially fraudulent behavior. However, available methods for detecting such dense blocks are not satisfactory in terms of speed, accuracy, or flexibility. In this work, we propose M-Zoom, a flexible framework for finding dense blocks in tensors, which works with a broad class of density metrics. M-Zoom has the following properties: (1) Scalable: M-Zoom scales linearly with all aspects of tensors and is up to 114× faster than state-of-the-art methods with similar accuracy. (2) Provably accurate: M-Zoom provides a guarantee on the lowest density of the blocks it finds. (3) Flexible: M-Zoom supports multi-block detection and size bounds as well as diverse density measures. (4) Effective: M-Zoom successfully detected edit wars and bot activities in Wikipedia, and spotted network attacks from a TCP dump with near-perfect accuracy (AUC=0.98).
Abstract:
Influence maximization is a well-studied problem of finding a small set of highly influential individuals in a social network such that the spread of influence under a certain diffusion model is maximized. We propose new diffusion models that incorporate the time-decaying phenomenon by which the power of influence decreases with elapsed time. In standard diffusion models such as the independent cascade and linear threshold models, each edge in a network has a fixed power of influence over time. However, in practical settings, such as rumor spreading, it is natural for the power of influence to depend on the time influenced. We generalize the independent cascade and linear threshold models with time-decaying effects. Moreover, we show that by using an analysis framework based on submodular functions, a natural greedy strategy obtains a solution that is provably within (1-1/e) of optimal. In addition, we propose theoretically and practically fast algorithms for the proposed models. Experimental results show that the proposed algorithms are scalable to graphs with millions of edges and outperform baseline algorithms based on a state-of-the-art algorithm.
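The (1-1/e) guarantee comes from the classical greedy algorithm for monotone submodular maximization; a generic sketch is below, where the spread estimator is assumed to be supplied externally (e.g., by Monte Carlo simulation of the paper's time-decaying diffusion models):

```python
def greedy_seed_selection(nodes, spread, k):
    """Classical greedy for monotone submodular maximization: repeatedly
    add the node with the largest marginal gain. spread(S) should return
    (an estimate of) the expected influence of seed set S; monotonicity
    and submodularity of spread give the (1 - 1/e) approximation."""
    seeds = set()
    for _ in range(k):
        base = spread(seeds)
        best = max((v for v in nodes if v not in seeds),
                   key=lambda v: spread(seeds | {v}) - base)
        seeds.add(best)
    return seeds
```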
Abstract:
In feature selection algorithms, ``stability'' is the sensitivity of the chosen feature set to variations in the supplied training data. As such it can be seen as a concept analogous to the statistical variance of a predictor. However, unlike variance, there is no unique definition of stability, with numerous measures proposed over 15 years of literature. In this paper, instead of defining a new measure, we start from an axiomatic point of view and identify what properties would be desirable. Somewhat surprisingly, we find that the simple Pearson's correlation coefficient has all the necessary properties, yet has somehow been overlooked in favour of more complex alternatives. Finally, we illustrate how the use of this measure in practice can provide better interpretability and more confidence in the model selection process.
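Under one natural reading of the measure (an assumption on our part, not the paper's exact protocol), stability is the average pairwise Pearson correlation between the binary selection masks produced on resampled training sets:

```python
import numpy as np
from itertools import combinations

def selection_stability(masks):
    """Average pairwise Pearson correlation between 0/1 selection vectors,
    one per training-data resample. masks: (n_runs, n_features) array.
    Constant masks (all or no features selected) have undefined
    correlation and are not handled by this sketch."""
    masks = np.asarray(masks, dtype=float)
    return float(np.mean([np.corrcoef(a, b)[0, 1]
                          for a, b in combinations(masks, 2)]))
```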
Abstract:
Users express their preferences for items in diverse forms, through their liking for items, as well as through the sequence in which they consume items. The latter, referred to as ``sequential preference'', manifests itself in scenarios such as song or video playlists, topics one reads or writes about in social media, etc. The current approach to modeling sequential preferences relies primarily on the sequence information, i.e., which item follows another item. However, there are other important factors, due to either the user or the context, which may dynamically affect the way a sequence unfolds. In this work, we develop generative modeling of sequences, incorporating dynamic user-biased emission and context-biased transition for sequential preference. Experiments on publicly-available real-life datasets as well as synthetic data show significant improvements in accuracy at predicting the next item in a sequence.
Abstract:
Learning from graph data has been attracting much attention recently due to its importance in many scientific applications, where objects are represented as graphs. In this paper, we study the problem of multi-graph clustering (i.e., clustering multiple graphs). Conventional multi-graph clustering methods either use the entire graph as a feature or attempt to find subgraph patterns as features for clustering. The former considers only the global structure of each graph, ignoring latent local structures that could also be important. The latter works well only when the graph data set is very large or side information is available. However, in the real world only a limited number of graphs are often available in some application scenarios. Thus, a more feasible way of extracting structure from graphs should be found and appropriately employed for further learning tasks. In this paper, we propose a multi-graph clustering approach (MGCT) based on the interior-node topology of graphs, which can effectively capture the topological structure of each graph and facilitate the multi-graph clustering task. Specifically, we extract the interior-node topological structure of each graph through a sparsity-inducing interior-node clustering, which groups more compact interior nodes into different clusters. We merge the interior-node clustering stage and the multi-graph clustering stage into a unified iterative framework, where the multi-graph clustering influences the interior-node clustering and the updated interior-node clustering results are in turn fed back into the multi-graph clustering. By mutual reinforcement of the two clustering stages, an optimal multi-graph clustering is obtained. We apply MGCT to two real brain network data sets (i.e., ADHD and HIV). Experimental results demonstrate the superior performance of the proposed model on multi-graph clustering.
Abstract:
We are interested in discovering user groups from collaborative rating datasets. Each user has a set of attributes that help find labeled groups, such as young computer scientists in France or American female designers. We formalize the problem of finding user groups whose quality is optimized in multiple dimensions and show that it is NP-complete. We develop alpha-MOMRI, an alpha-approximation algorithm, and h-MOMRI, a heuristic-based algorithm, for multi-objective optimization to find high-quality groups. Our extensive experiments on real datasets from the social Web examine the performance of our algorithms and report cases where alpha-MOMRI and h-MOMRI are useful.
Abstract:
We consider the problem of node classification in heterogeneous graphs, where both nodes and relations may be of different types and a different set of categories is associated with each node type. While graph node classification has mainly been addressed for homogeneous graphs, heterogeneous classification is a recent problem motivated by applications in fields such as social networks, where graphs are intrinsically heterogeneous. We propose a transductive approach to this problem based on learning graph embeddings, and model the uncertainty associated with the node representations using Gaussian embeddings. A comparison with representative baselines is provided on three heterogeneous datasets.
Abstract:
We study native advertisement selection and placement in social media post feeds. In the prevalent pay-per-click model, each ad click yields a given amount of revenue for the platform. The click probability of an ad depends on various attributes that are inherent to the ad (e.g., ad quality), related to the user profile and activity, or related to the post feed. While the first two types of attributes are also encountered in web-search advertising, the third one fundamentally differentiates native from web-search advertising, and it is the one we model and study in this paper. Evidence from online platforms suggests that prime attributes of the third type affecting ad clicks are the relevance of ads to preceding posts, and the distance between consecutively displayed ads; e.g., the fewer the intervening posts between ads, the smaller the click probability, due to user saturation. We model ad-click events with Bernoulli random variables. We seek the ad selection and allocation policy that optimizes a metric combining (i) the platform's expected revenue, and (ii) the uncertainty in revenue, captured by the variance of the provisionally consumed budget of selected ads. Uncertainty in revenue should be minimal, since it translates into reduced profit or wasted advertising opportunities for the platform. On the other hand, the expected revenue from ad clicks should be maximal. The constraint is that the expected revenue attained for each selected ad should not exceed its a priori set budget. We show that this optimization problem reduces to an instance of a resource-constrained minimum-cost path problem on a weighted directed acyclic graph. Through numerical evaluation, we assess the impact of various parameters on the objective and the way they shape the trade-off between revenue and uncertainty.
Abstract:
Anomaly detection is a vital task for maintaining and improving any dynamic system. In this paper, we address the problem of anomaly detection in time-evolving graphs, where graphs are a natural representation for data in many types of applications. A key challenge in this context is how to process large volumes of streaming graphs. We propose a pre-processing step before running any further analysis on the data, where we permute the rows and columns of the adjacency matrix. This pre-processing step expedites graph mining techniques such as anomaly detection, PageRank, graph coloring and visualization. We can then detect graph anomalies based on rank correlations of the reordered nodes. The merits of our approach lie in its simplicity and resilience to challenges such as unsupervised input, large volumes and high velocities of data. We evaluate the scalability and accuracy of our method on real graphs, where our method facilitates graph processing while producing more deterministic orderings. We show that the proposed approach is capable of revealing anomalies in a more efficient manner based on node rankings. Furthermore, our method can produce visual representations of graphs that are useful for graph compression.
Abstract:
The ubiquity of data streams has encouraged the development of new incremental and adaptive learning algorithms. Data stream learners must be fast and memory-bounded, but above all they must be tailored to adapt to possible changes in the data distribution, a phenomenon named concept drift. Recently, several works have shown the impact of a so-far nearly neglected type of drift: feature drift. Feature drifts occur whenever a subset of features becomes, or ceases to be, relevant to the learning task. In this paper we (i) provide insights into how the relevance of features can be tracked as a stream progresses, using information-theoretic Symmetrical Uncertainty; and (ii) show how this can be used to boost two learning schemes: Naive Bayes and k-Nearest Neighbors. Furthermore, we investigate the use of these two new dynamically weighted learners as prediction models in the leaves of the Hoeffding Adaptive Tree classifier. Results show improvements in accuracy (an average of 10.69% for k-Nearest Neighbors, 6.23% for Naive Bayes and 4.42% for Hoeffding Adaptive Trees) on both synthetic and real-world datasets, at the expense of a bounded increase in both memory consumption and processing time.
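Symmetrical Uncertainty for two discrete variables is the normalized mutual information $SU(X,Y) = 2\,I(X;Y) / (H(X)+H(Y))$. A direct batch sketch follows; a streaming version would maintain the underlying counts incrementally, which is omitted here:

```python
import math
from collections import Counter

def entropy(xs):
    """Plug-in Shannon entropy (bits) of a discrete sample."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())

def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), normalized to [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mi = hx + hy - entropy(list(zip(x, y)))   # I(X;Y) = H(X)+H(Y)-H(X,Y)
    return 2.0 * mi / (hx + hy)
```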
Abstract:
The convergence of Stochastic Gradient Descent (SGD) using convex loss functions has been widely studied. However, vanilla SGD methods using convex losses cannot perform well with noisy labels, which adversely affect the update of the primal variable. Unfortunately, noisy labels are ubiquitous in real-world applications such as crowdsourcing. To handle noisy labels, in this paper we present a family of robust losses for SGD methods. By employing our robust losses, SGD methods successfully reduce the negative effects caused by noisy labels on each update of the primal variable. We not only show that the convergence rate is O(1/T) for SGD methods using robust losses, but also provide a robustness analysis of two representative robust losses. Comprehensive experimental results on six real-world datasets show that SGD methods using robust losses are clearly more robust than other baseline methods in most situations, with fast convergence.
Abstract:
The joint density of a data stream is suitable for performing data mining tasks without access to the original data. However, the methods proposed so far only target a small to medium number of variables, since their estimates rely on representing all the interdependencies between the variables of the data. High-dimensional data streams, which are becoming more and more frequent due to the increasing number of interconnected devices, therefore push these methods to their limits. To mitigate these limitations, we present an approach that projects the original data stream into a vector space and uses a set of representatives to provide an estimate. Due to the structure of the estimates, it enables density estimation of high-dimensional data streams and approaches the true density with increasing dimensionality of the vector space. Moreover, it is designed to estimate not only homogeneous data, i.e., where all variables are nominal or all variables are numeric, but also heterogeneous data. The evaluation is conducted on synthetic and real-world data.
Abstract:
In this paper, we present OSLa -- an online structure learner for MLNs that exploits background knowledge axiomatization in order to constrain the space of possible structures. Many domains of interest today are characterized by uncertainty and complex relational structure. Markov Logic Networks (MLNs) are a state-of-the-art Statistical Relational Learning (SRL) framework that can naturally be applied in domains governed by these characteristics. Learning MLNs from data is challenging, as their relational structure increases the complexity of the learning process. In addition, due to the dynamic nature of many real-world applications, it is desirable to incrementally learn or revise the model's structure and parameters. Experimental results are presented in the domain of activity recognition under uncertainty, using the axiomatization of a probabilistic variant of the Event Calculus (MLNEC) as background knowledge and a benchmark dataset for video surveillance.
Abstract:
Users in online social networks often have very different structural positions which may be attributed to a latent factor: roles. In this paper, we analyze dynamic networks from two datasets (Facebook and Scratch) to find roles which define users' structural positions. Each dynamic network is partitioned into snapshots and we independently find roles for each network snapshot. We present our role discovery methodology and investigate how roles differ between snapshots and datasets. Six persistent roles are found and we investigate user role membership, transitions between roles, and interaction preferences.
Abstract:
Recently, information-theoretic principles for learning and acting have been proposed to solve particular classes of Markov Decision Problems. Mathematically, such approaches are governed by a variational free energy principle and allow solving MDP planning problems with information-processing constraints expressed in terms of a Kullback-Leibler divergence with respect to a reference distribution. Here we consider a generalization of such MDP planners by taking model uncertainty into account. As model uncertainty can also be formalized as an information-processing constraint, we can derive a unified solution from a single generalized variational principle. We provide a generalized value iteration scheme together with a convergence proof. As limit cases, this generalized scheme includes standard value iteration with a known model, Bayes Adaptive MDP planning, and robust planning. We demonstrate the benefits of this approach in a grid world simulation.
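A minimal sketch of the KL-regularized (free-energy) Bellman backup with a known model is given below; the paper's generalized scheme additionally folds model uncertainty into the same variational principle, which this sketch omits, and the array layout is an illustrative assumption:

```python
import numpy as np

def kl_value_iteration(P, R, rho, beta, gamma=0.95, iters=500):
    """Value iteration with a KL information cost toward a prior policy rho.
    P: (S, A, S) transition tensor, R: (S, A) rewards, rho: (A,) prior.
    beta -> inf recovers standard (max) value iteration; beta -> 0
    simply follows the prior policy."""
    V = np.zeros(R.shape[0])
    for _ in range(iters):
        Q = R + gamma * P @ V                                  # (S, A) backup
        # Soft-max (log-sum-exp) backup: the free-energy optimal value.
        V = np.log((rho * np.exp(beta * Q)).sum(axis=1)) / beta
    return V
```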
Abstract:
We propose a framework for learning new target tasks by leveraging existing heterogeneous knowledge sources. Unlike traditional transfer learning, we do not require explicit relations between source and target tasks, and instead let the learner actively mine transferable knowledge from a source dataset. To this end, we develop (1) a transfer learning method for source datasets with heterogeneous feature and label spaces, and (2) a proactive learning framework which progressively builds bridges between the target and source domains in order to improve transfer accuracy. Experiments on a challenging transfer learning scenario (learning from hetero-lingual datasets with non-overlapping label spaces) show the efficacy of the proposed approach.
Abstract:
We consider a variant of the pure exploration problem in Multi-Armed Bandits, where the goal is to find the arm for which the λ-quantile is maximal. Within the PAC framework, we provide a lower bound on the sample complexity of any (ε,δ)-correct algorithm, and propose algorithms with matching upper bounds. Our bounds sharpen existing ones by explicitly incorporating the quantile factor λ. We further provide experiments that compare the sample complexity of our algorithms with that of previous works.
Abstract:
This paper presents a novel dictionary learning method in a kernel feature space that is part of a reproducing kernel Hilbert space (RKHS). Our method focuses on several popular kernels, e.g., radial basis function kernels like the Gaussian, that implicitly map data to a Hilbert sphere, a Riemannian manifold, in the RKHS. Our method exploits this manifold structure of the mapped data in the RKHS, unlike typical methods for kernel dictionary learning that use linear methods in the RKHS. We show that dictionary learning on a Hilbert sphere in the RKHS is possible without the explicit lifting map underlying the kernel, using solely the Gram matrix. Unlike the typical L1-norm sparsity prior, we incorporate a non-convex Lp quasi-norm based penalty, with p < 1, on the coefficients to enforce a stronger sparsity prior and achieve more robust dictionary learning in the presence of corrupted training data. We evaluate our method for image classification on two large publicly available datasets and demonstrate the improved performance of our method over state-of-the-art dictionary learning methods.
Abstract:
Principal Components Analysis (PCA) is a data analysis technique widely used in dimensionality reduction. It extracts a small number of orthonormal vectors that explain most of the variation in a dataset, which are called the Principal Components. Conventional PCA is sensitive to outliers because it is based on the L2-norm, so to improve robustness several algorithms based on the L1-norm have been introduced in the literature. We present a new algorithm for robust L1-norm PCA that computes components iteratively in reverse, using a new heuristic based on Linear Programming. This solution is focused on finding the projection that minimizes the variance of the projected points. It has only one parameter to tune, making it simple to use. On common benchmarks it performs competitively compared to other methods.
Abstract:
In machine learning, hyperparameter optimization is a challenging but necessary task that is usually approached in a computationally expensive manner such as grid search. For this reason, surrogate-based black-box optimization techniques such as sequential model-based optimization have been proposed, which allow for faster hyperparameter optimization. Recent research proposes to also integrate hyperparameter performances on past data sets to allow for a faster and more efficient hyperparameter optimization. In this paper, we use products of Gaussian process experts as surrogate models for hyperparameter optimization. Gaussian processes are a natural choice as they offer good prediction accuracy as well as estimates of their uncertainty, and their hyperparameters can be tuned very effectively. However, in the light of large meta-data sets, learning a single Gaussian process is not feasible as it involves inverting a large kernel matrix. This directly limits their usefulness for hyperparameter optimization when large-scale hyperparameter performances on past data sets are given. Using products of Gaussian process experts, these scalability issues can be circumvented; however, this usually comes at the price of lower predictive accuracy. In our experiments, we show empirically that products of experts nevertheless perform very well compared to a variety of published surrogate models. Thus, we propose a surrogate model that performs as well as the current state of the art, is scalable to large-scale meta-knowledge, has no hyperparameters of its own, and is even very easy to parallelize.
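The basic product-of-experts combination of independent Gaussian predictions multiplies densities, i.e., it sums precisions and precision-weights the means. A sketch of that rule is below; variants such as generalized PoE reweight the experts, which is omitted here as it may differ from the paper's exact formulation:

```python
import numpy as np

def product_of_experts(means, variances):
    """Combine per-expert Gaussian predictions by multiplying densities.
    means, variances: (n_experts, n_points) arrays of predictive means
    and variances. Returns the combined (mean, variance) per point."""
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    precision = (1.0 / variances).sum(axis=0)          # summed precisions
    mean = (means / variances).sum(axis=0) / precision  # precision-weighted
    return mean, 1.0 / precision
```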
Abstract:
Recommender Systems are an important tool in e-business, for both companies and customers. Several algorithms are available to developers; however, there is little guidance concerning which is the best algorithm for a specific recommendation problem. In this study, a metalearning approach is proposed to address this issue. It consists of relating the characteristics of problems (metafeatures) to the performance of recommendation algorithms. We propose a set of metafeatures, obtained by applying a systematic procedure for metafeature development and by extending and generalizing the state-of-the-art metafeatures for recommender systems. The approach is tested on a set of Matrix Factorization algorithms and a collection of real-world Collaborative Filtering datasets. The performance of these algorithms on these datasets is evaluated using several standard metrics. The algorithm selection problem is formulated as a set of classification tasks, where the target attribute is the best Matrix Factorization algorithm according to each metric. The results show that the approach is viable and that the metafeatures used contain information that is useful for predicting the best algorithm for a dataset.
Abstract:
Brain networks characterize the temporal and/or spectral connections between brain regions and are inherently represented by multi-way arrays (tensors). In order to discover the underlying factors driving such connections, we need to derive compact representations from brain network data. Such representations should be discriminative so as to facilitate the identification of subjects performing different cognitive tasks or with different neurological disorders. In this paper, we propose semiBAT, a novel semi-supervised Brain network Analysis approach based on constrained Tensor factorization. semiBAT 1) leverages unlabeled resting-state brain networks for task recognition, 2) explores the temporal dimension to capture progress, 3) incorporates a classifier learning procedure to introduce supervision from labeled data, and 4) selects discriminative latent factors for different tasks. The Alternating Direction Method of Multipliers (ADMM) framework is utilized to solve the optimization objective. Experimental results on EEG brain networks illustrate the superior performance of the proposed semiBAT model on graph classification, with a significant improvement of 31.60% over plain vanilla tensor factorization. Moreover, the data-driven factors can be readily visualized, which should be informative for investigating cognitive mechanisms.
Abstract:
Event sequences are common in many domains, e.g., finance, medicine, and social media. Often the same underlying phenomenon, such as television advertisements during the Super Bowl, is reflected in a number of independent event sequences, like those of different Twitter users. In such situations it is of interest to find combinations of temporal segments and subsets of sequences where an event of interest, like a particular hashtag, has an increased probability of occurrence. Such patterns are useful for exploring the event sequences in terms of their evolving temporal dynamics, and provide more fine-grained insights into the data than, for example, straightforward clustering can reveal. We formulate the task of finding such patterns as a novel matrix tiling problem, and propose two algorithms for solving it. Our first algorithm is a greedy set-cover heuristic, while in the second approach we view the problem as time-series segmentation. We apply both algorithms to real and artificial datasets and obtain promising results.
Abstract:
This paper proposes a novel approach to sequential data classification. In this approach, each sequence in a data stream is approximated and represented by a state space model -- a liquid state machine. Each sequence is mapped into the state space of the approximating model. Instead of carrying out classification on the sequences directly, we discuss measuring the dissimilarity between models under different hypotheses. Classification conducted on binary synthetic data proves to be more robust with an appropriate measurement. Classifications are also conducted on benchmark univariate and multivariate data. The experimental results confirm the advantages of the proposed method compared with some state-of-the-art algorithms.
Abstract:
In this paper, we formulate the $K$-sparse compressed signal recovery problem with the $\ell_0$ norm within a Stochastic Local Search (SLS) framework. Within this randomized framework, we generalize the popular recovery algorithm CoSaMP, creating Stochastic CoSaMP (StoCoSaMP). Interestingly, our deterministic worst-case analysis shows that, under the RIP, even a purely random version of StoCoSaMP is guaranteed to recover a notion of strong components of a sparse signal, thereby leading to support convergence. Empirically, we find that randomness helps StoCoSaMP outperform CoSaMP in terms of both signal recoverability and computational cost, even in high dimensions (up to 1 million dimensions), on a variety of problems. Further, StoCoSaMP outperforms many other popular recovery algorithms (including StoGradMP and StoIHT) on large-scale real-world gene-expression datasets.
Abstract:
As information spreads across social links, it may reach different people and form cascades in social networks. However, the elusive micro-foundations of social behaviors and the complex underlying social networks make it very difficult to model and predict the information diffusion process precisely. From a different perspective, we can often observe the interplay between information diffusion and cascade structures. For example, homophily-driven cascades may evolve into simple, star-like structures, while influence-driven cascades are more likely to become complex and rapid. On the one hand, different mechanics of information diffusion can drive cascades to evolve into different structures; on the other hand, the cascade structures also have an effect on the range of susceptible users for information diffusion. In this paper, we explore the relationships between information diffusion and cascade structures in social networks. By embedding the cascades in a lower-dimensional space and employing a spectral clustering algorithm, we find that cascades generally evolve into five typical structure patterns with distinguishable characteristics. In addition, these patterns can be identified by observing the initial footprints of the cascades. Based on this observation, we propose that the structure patterns can be predictive of cascade growth. The experimental results show that the accuracy of predicting both the structure and the virality of cascades can be improved significantly by incorporating the cascade structure patterns.
Abstract:
Subgroup Discovery is the process of finding and describing sufficiently large subsets of a given population that have unusual distributional characteristics with regard to some target attribute. Such subgroups can be used as a statistical summary which improves on the default summary of stating the overall distribution in the population. A natural way to evaluate such summaries is to quantify the difference between predicted and empirical distribution of the target. In this paper we propose to use proper scoring rules, a well-known family of evaluation measures for assessing the goodness of probability estimators, to obtain theoretically well-founded evaluation measures for subgroup discovery. From this perspective, one subgroup is better than another if it has lower divergence of target probability estimates from the actual labels on average. We demonstrate empirically on both synthetic and real-world data that this leads to higher quality statistical summaries than the existing methods based on measures such as Weighted Relative Accuracy.
Abstract:
PageRank is one of the most popular measures for ranking the nodes of a network according to their importance. However, PageRank is defined as the steady state of a random walk, which implies that the underlying network needs to be fixed and static. Thus, to extend PageRank to networks with a temporal component, the temporal information has to be incorporated into the model. Although numerous recent works study the problem of computing PageRank on dynamic graphs, most of them consider the case of updating static PageRank under node/edge insertions/deletions. In other words, PageRank is always defined as the static PageRank of the current instance of the graph. In this paper we introduce temporal PageRank, a generalization of PageRank for temporal networks, where activity is represented as a sequence of time-stamped edges. Our model uses the random-walk interpretation of static PageRank, generalized by the concept of a temporal random walk. By highlighting the actual information flow in the network, temporal PageRank captures the network dynamics more accurately. A main feature of temporal PageRank is that it adapts to concept drifts: the importance of nodes may change during the lifetime of the network, according to changes in the distribution of edges. On the other hand, if the distribution of edges remains constant, temporal PageRank is equivalent to static PageRank. We present variants of temporal PageRank along with efficient algorithms, suitable for online streaming scenarios. We conduct experiments on various real and semi-real datasets, and provide empirical evidence that temporal PageRank is a flexible measure that adjusts to changes in the network dynamics.
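For reference, static PageRank, which temporal PageRank recovers when the edge distribution is constant, can be computed by power iteration. A dense-matrix sketch with uniform teleportation (real implementations would use sparse structures):

```python
import numpy as np

def pagerank(adj, alpha=0.85, tol=1e-10):
    """Static PageRank by power iteration. adj: dense (n, n) 0/1 adjacency
    matrix with adj[i, j] = 1 for an edge j -> i. Dangling nodes are
    assigned uniform outgoing probability."""
    n = adj.shape[0]
    out = adj.sum(axis=0)                                  # out-degree per column
    P = np.where(out > 0, adj / np.maximum(out, 1), 1.0 / n)  # column-stochastic
    r = np.full(n, 1.0 / n)
    while True:
        r_next = alpha * (P @ r) + (1.0 - alpha) / n       # walk + teleport
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
```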
Abstract:
While the Matrix Generalized Inverse Gaussian (MGIG) distribution arises naturally in some settings as a distribution over symmetric positive semi-definite matrices, certain key properties of the distribution and effective ways of sampling from it have not been carefully studied. In this paper, we show that the MGIG is unimodal, and that the mode can be obtained by solving an Algebraic Riccati Equation (ARE) [8]. Based on this property, we propose an importance sampling method for the MGIG where the mode of the proposal distribution matches that of the target. The proposed sampling method is more efficient than existing approaches [30, 31], which use proposal distributions that may have a mode far from the MGIG's mode. Further, we show that the posterior distribution in latent factor models, such as probabilistic matrix factorization (PMF) [23], when marginalized over one latent factor, follows an MGIG distribution. This characterization leads to a novel Collapsed Monte Carlo (CMC) inference algorithm for such latent factor models. We illustrate that CMC has lower log loss or perplexity than MCMC, and needs fewer samples.
Abstract:
A cross-domain recommendation algorithm exploits user preferences from multiple domains to solve the data sparsity and cold-start problems, in order to improve recommendation accuracy. In this study, we propose an efficient Joint cross-domain user Clustering and Similarity Learning recommendation algorithm, namely JCSL. We formulate a joint objective function to perform adaptive user clustering in each domain when calculating the user-based and cluster-based similarities across the multiple domains. In addition, the objective function uses an L2,1 regularization term to account for the sparsity that occurs in the user-based and cluster-based similarities between multiple domains. The joint problem is solved via an efficient alternating optimization algorithm, which adapts the clustering solutions in each iteration so as to jointly compute the user-based and cluster-based similarities. Our experiments on ten cross-domain recommendation tasks show that JCSL outperforms other state-of-the-art cross-domain strategies.
Abstract:
Mining frequent tree patterns is useful in many areas, including XML documents, bioinformatics and the world wide web. The crucial step in finding frequent patterns is deciding whether a tree pattern is a subtree, under a matching operator, of a database tree. A widely used matching operator for tree-structured data is subtree homeomorphism, where an edge in the tree pattern is mapped onto an ancestor-descendant relationship in the database tree. In this paper, we study the problem of finding frequent tree patterns in a transactional setting, where the frequency of a pattern is defined as the number of database trees to which the pattern is subtree homeomorphic. Our extensive study of the properties of real-world datasets reveals that while many vertices in a database tree may have the same label, no two vertices on the same path are labeled identically. We restrict ourselves to this class of database trees and propose a novel and efficient method for deciding whether a tree pattern is subtree homeomorphic to a database tree. Our algorithm is based on a compact data structure, called EMET, that stores all the information required for subtree homeomorphism, namely the distinct pairs of rightmost and second-rightmost vertices of occurrences. We propose an efficient algorithm to generate the EMETs of larger patterns using the EMETs of smaller ones. Based on the proposed subtree homeomorphism method, we introduce TTM, an effective algorithm for finding frequent tree patterns in rooted ordered trees. We evaluate the efficiency of TTM on several real-world and synthetic datasets and show that it outperforms well-known existing algorithms by an order of magnitude.
Abstract:
In the trust-centric context of signed networks, the social links among users are associated with specific polarities to denote the attitudes (trust vs. distrust) among the users. Unlike in traditional unsigned social networks, the diffusion of information in signed networks can be significantly affected by the link polarities and the users' positions. In this paper, a new concept called “trust hole” is introduced to characterize the advantages of specific user positions in signed networks. To uncover the trust holes, a novel trust hole detection framework named “Social Community based tRust hOLe expLoration” (SCROLL) is proposed. The framework is based on signed community detection techniques. By removing potential trust hole candidates, SCROLL aims to maximize the drop in community detection cost so as to identify the optimal set of trust holes. Extensive experiments on real-world signed network datasets show the effectiveness of SCROLL.
Abstract:
The choice of hyperparameters and the selection of algorithms is a crucial part of machine learning. Bayesian optimization methods have been used very successfully to tune hyperparameters automatically, in many cases even outperforming the human expert. Recently, these techniques have been massively improved by using meta-knowledge: the idea is to use knowledge of the performance of an algorithm on other data sets to automatically accelerate the hyperparameter optimization for a new data set. In this work we present a model that transfers this knowledge in two stages. At the first stage, the function that maps hyperparameter configurations to hold-out validation performances is approximated for previously seen data sets. At the second stage, these approximations are combined to rank the hyperparameter configurations for a new data set. In extensive experiments on the problem of hyperparameter optimization as well as the problem of combined algorithm selection and hyperparameter optimization, we outperform the state-of-the-art methods.
Abstract:
Manifold structure learning is often applied to exploit geometric information among data in semi-supervised feature learning algorithms. In this paper, we find that local discriminative information is also important for semi-supervised feature learning. We propose a method that utilizes both the manifold structure of the data and local discriminant information. Specifically, we define a local clique for each data point. The k-nearest-neighbors (kNN) method is used to determine the structural information within each clique. We then employ a variant of the Fisher criterion for each clique as a local discriminant evaluation, and sum over all cliques for global integration into the framework. In this way, local discriminant information is embedded. Labels are also utilized to minimize the distances between data from the same class. Meanwhile, we use the kernel method to extend our proposed model, facilitating feature learning in a high-dimensional space after feature mapping. Experimental results show that our method is superior to all other compared methods over a number of datasets.
Abstract:
Matrix factorisation is a widely used tool with applications in collaborative filtering, image analysis and genomics. Several extensions of the classical model have been proposed, such as joint modelling of multiple related ``data views'' or accounting for side information on the latent factors. However, as the complexity of these models increases, even subtle mismatches of the distributional assumptions can severely affect model performance. Here, we propose a simple yet effective solution to address this challenge by modelling the observed data in a transformed or warped space. We derive a joint model of a multi-view matrix factorisation that infers view-specific data transformations, and provide a computationally efficient variational approach for parameter inference which exploits the low-rank structure of the side information. We validate the model on synthetic data before applying it to a major matrix completion problem in genomics, finding that our model improves the imputation of missing values in gene-disease association analysis and allows us to discover enhanced consensus structures across multiple data views.
Abstract:
We propose a novel distributed algorithm for mining frequent subgraphs from a single, very large, labeled network. Our approach is the first distributed method to mine a massive input graph that is too large to fit in the memory of any individual compute node. The input graph thus has to be partitioned among the nodes, which can lead to potential false negatives. Furthermore, for scalable performance it is crucial to minimize the communication among the compute nodes. Our algorithm, DistGraph, ensures that there are no false negatives, and uses a set of optimizations and efficient collective communication operations to minimize information exchange. To our knowledge DistGraph is the first approach demonstrated to scale to graphs with over a billion vertices and edges. Scalability results on up to 2048 IBM Blue Gene/Q compute nodes, with 16 cores each, show very good speedup.
Abstract:
This paper presents a new efficient exact algorithm for listing triangles in a large graph. While the problem of listing triangles in a graph has been considered before, dealing with large graphs continues to be a challenge. Although previous research has attempted to tackle the challenge, this is the first contribution that addresses the problem on a compressed copy of the input graph. In fact, the proposed solution lists the triangles without decompressing the graph. This yields interesting improvements in both the storage requirements for the graphs and their processing time.
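For context, the standard in-memory triangle-listing scheme orients edges by degree rank and intersects out-neighbourhoods, listing each triangle exactly once; the paper's contribution is doing this over a compressed graph representation, which this uncompressed sketch does not attempt:

```python
def list_triangles(adj):
    """List each triangle once via degree ordering.
    adj: dict mapping node -> set of neighbours (undirected graph)."""
    # Total order on nodes: by degree, ties broken by node id.
    rank = {v: i for i, v in
            enumerate(sorted(adj, key=lambda v: (len(adj[v]), v)))}
    # Orient every edge from lower to higher rank.
    out = {v: {u for u in adj[v] if rank[u] > rank[v]} for v in adj}
    for u in adj:
        for v in out[u]:
            for w in out[u] & out[v]:   # common higher-ranked neighbours
                yield (u, v, w)
```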
Abstract:
User tastes constantly drift over time as users are exposed to different types of products. The ability to model the evolution of both user preferences and product attractiveness is vital to the success of recommender systems (RSs). We propose a Bayesian Wishart matrix factorization method that models the temporal dynamics of user preferences and item attractiveness from a novel algorithmic perspective. The proposed method can model and properly control both the diverse rating behaviors across time frames and the related temporal effects within time frames in the evolution of user preferences and item attractiveness. We evaluate the proposed method on two synthetic and three real-world benchmark datasets for RSs. Experimental results demonstrate that our method significantly outperforms a variety of state-of-the-art methods.
Abstract:
We study the problem of extracting cross-lingual topics from non-parallel multilingual text datasets with partially overlapping thematic content (e.g., aligned Wikipedia articles in two different languages). To this end, we develop a new bilingual probabilistic topic model called comparable bilingual latent Dirichlet allocation (C-BiLDA), which is able to deal with such comparable data and, unlike the standard bilingual LDA model (BiLDA), does not assume the availability of document pairs with identical topic distributions. We present a full overview of C-BiLDA and show its utility in the task of cross-lingual knowledge transfer for multi-class document classification on two benchmark datasets for three language pairs. The proposed model outperforms the baseline LDA model, as well as the standard BiLDA model and two standard low-rank approximation methods (CL-LSI and CL-KCCA) used in previous work on this task.
Abstract:
We propose ClusPath, a novel algorithm for detecting general evolution tendencies in a population of entities. We show how abstract notions, such as the Swedish socioeconomic model (in a political dataset) or companies' fiscal optimization (in an economic dataset), can be inferred from low-level descriptive features. Such high-level regularities in the evolution of entities are detected by combining spatial and temporal features into a spatio-temporal dissimilarity measure and by using semi-supervised clustering techniques. The relations between the evolution phases are modeled with a graph structure, inferred simultaneously with the partition under a “slow changing world” assumption. The idea is to ensure a smooth passage for entities along their evolution paths, which captures the long-term trends in the dataset. Additionally, we provide a method, based on an evolutionary algorithm, to tune the parameters of ClusPath on new, unseen datasets. This method assesses the fitness of a solution using four opposing quality measures and proposes a balanced compromise.
Abstract:
Label noise can be a major problem in classification tasks, since most machine learning algorithms rely on data labels in their inductive process. Consequently, various techniques for label noise identification have been investigated in the literature. The bias of each technique determines how suitable it is for each dataset. Moreover, while some techniques identify a large number of examples as noisy and have a high false positive rate, others are very restrictive and therefore unable to identify all noisy examples. This paper investigates how label noise detection can be improved by using an ensemble of noise filtering techniques. These filters, individual and ensembles, are experimentally compared. Another concern in this paper is the computational cost of ensembles, since, for a particular dataset, an individual technique can match the predictive performance of an ensemble; in such cases the individual technique should be preferred. To deal with this situation, this study also proposes the use of meta-learning to recommend the best filter for a new dataset. An extensive experimental evaluation of individual filters, ensemble filters and meta-learning was performed using public datasets with injected label noise. The results show that ensembles of noise filters can improve noise filtering performance and that a recommendation system based on meta-learning can successfully recommend the best filtering technique for new datasets. A case study using a real dataset from the ecological niche modeling domain is also presented and evaluated, with the results validated by an expert.
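A minimal sketch of the ensemble idea, assuming scikit-learn and three illustrative base filters (the paper's actual filter set may differ): an instance is flagged as noisy when a majority of cross-validated classifiers misclassify it.

```python
# A hedged sketch of a majority-vote ensemble noise filter.
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

def ensemble_noise_filter(X, y, threshold=0.5):
    filters = [DecisionTreeClassifier(), KNeighborsClassifier(), GaussianNB()]
    votes = np.zeros(len(y))
    for clf in filters:
        pred = cross_val_predict(clf, X, y, cv=5)
        votes += (pred != y)                 # each filter votes "noisy" when it errs
    return votes / len(filters) > threshold  # boolean mask of suspect labels
```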
Abstract:
Many complex multi-target prediction problems that concern large target spaces are characterised by a need for efficient prediction strategies that avoid the computation of predictions for all targets explicitly. Examples of such problems emerge in several subfields of machine learning, such as collaborative filtering, multi-label classification, dyadic prediction and biological network inference. In this article we analyse efficient and exact algorithms for computing the top-K predictions in the above problem settings, using a general class of models that we refer to as separable linear relational models. We show how to use those inference algorithms, which are modifications of well-known information retrieval methods, in a variety of machine learning settings. Furthermore, we study the possibility of scoring items incompletely, while still retaining an exact top-K retrieval. Experimental results in several application domains reveal that the so-called threshold algorithm is very scalable, often performing many orders of magnitude more efficiently than the naive approach.
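For concreteness, here is a hedged sketch of the threshold algorithm for a score that is a plain sum of per-list partial scores; padding missing items with zero and assuming equal list lengths are simplifications.

```python
# A minimal sketch of the threshold algorithm for exact top-K retrieval.
import heapq

def threshold_top_k(lists, k):
    """lists[f]: [(item, partial_score)] sorted by score descending.
    Full score of an item = sum of its partial scores across all lists."""
    lookup = [dict(l) for l in lists]            # random access per list
    seen, heap = set(), []                       # min-heap of (score, item)
    for row in zip(*lists):                      # sorted access, one depth at a time
        threshold = sum(s for _, s in row)       # best possible score of any unseen item
        for item, _ in row:
            if item not in seen:
                seen.add(item)
                full = sum(d.get(item, 0.0) for d in lookup)
                heapq.heappush(heap, (full, item))
                if len(heap) > k:
                    heapq.heappop(heap)
        if len(heap) == k and heap[0][0] >= threshold:
            break                                # no unseen item can enter the top-k
    return sorted(heap, reverse=True)

lists = [[("a", 0.9), ("b", 0.8), ("c", 0.1)],
         [("b", 0.7), ("a", 0.2), ("c", 0.1)]]
print(threshold_top_k(lists, k=1))               # [(1.5, 'b')], found without a full scan
```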
Abstract:
Shapelets are discriminative subsequences of time series, usually embedded in shapelet-based decision trees. The enumeration of time series shapelets is, however, computationally costly, which, in addition to the inherent difficulty decision tree learning algorithms have in effectively handling high-dimensional data, severely limits the applicability of shapelet-based decision tree learning to large (multivariate) time series databases. This paper introduces a novel tree-based ensemble method for univariate and multivariate time series classification using shapelets, called the generalized random shapelet forest (gRSF) algorithm. The algorithm generates a set of shapelet-based decision trees, where both the choice of instances used for building a tree and the choice of shapelets are randomized. For univariate time series, an extensive empirical investigation demonstrates that the proposed algorithm yields predictive performance comparable to the current state-of-the-art and significantly outperforms several alternative algorithms, while being at least an order of magnitude faster. Similarly, for multivariate time series, the algorithm is shown to be significantly less computationally costly and more accurate than the current state-of-the-art.
Abstract:
The problem of sampling from data streams has attracted significant interest in the last decade. Whichever sampling criterion is considered (uniform sample, maximally diverse sample, etc.), the challenges stem from the relatively small amount of memory available in the face of unbounded streams. In this work we consider an interesting extension of this problem, motivated by recent improvements in sensing technologies and robotics. In some situations it is not only possible to digitally sense some aspects of the world, but also to physically capture a tangible aspect of it. Currently deployed examples include devices that capture water or air samples, and devices that capture individual insects or fish. Such devices create an interesting twist on the stream sampling problem because, in most cases, the decision to take a physical sample is irrevocable. In this work we show how to generalize diversification sampling strategies to this irrevocable-choice setting, demonstrating our ideas on several real-world domains.
Abstract:
This paper studies the problem of identifying a single contagion source when partial timestamps of a contagion process are available. We formulate the source localization problem as a ranking problem on graphs, where infected nodes are ranked according to their likelihood of being the source. Two ranking algorithms, cost-based ranking (CR) and tree-based ranking (TR), are proposed in this paper. Experimental evaluations with synthetic and real-world data show that our algorithms significantly improve the ranking accuracy compared with four existing algorithms.
Abstract:
Mining frequent tree patterns has many applications in areas such as XML data, bioinformatics and the World Wide Web. The crucial step in frequent pattern mining is frequency counting, which involves a matching operator to find occurrences (instances) of a tree pattern in a given collection of trees. A widely used matching operator for tree-structured data is subtree homeomorphism, where an edge in the tree pattern is mapped onto an ancestor-descendant relationship in the given tree. Tree patterns that are frequent under subtree homeomorphism are usually called embedded patterns. In this paper, we present an efficient algorithm for subtree homeomorphism with application to frequent pattern mining. We propose a compact data structure, called occ, which stores only information about the rightmost paths of occurrences and hence can encode and represent several occurrences of a tree pattern. We then define efficient join operations on the occ data structure, which help us count occurrences of tree patterns according to occurrences of their proper subtrees. Based on the proposed subtree homeomorphism method, we develop an effective pattern mining algorithm, called TPMiner. We evaluate the efficiency of TPMiner on several real-world and synthetic datasets. Our extensive experiments confirm that TPMiner always outperforms well-known existing algorithms, and in several cases the improvement over existing algorithms is significant.
Abstract:
Spectral measures have long been used to quantify the robustness of real-world graphs. For example, the spectral radius (or principal eigenvalue) is related to the effective spreading rate of dynamic processes (e.g., rumor, disease, information propagation) on graphs. Algebraic connectivity (or the Fiedler value), which is a lower bound on the node and edge connectivity of a graph, captures the “partitionability” of a graph into disjoint components. In this work we address the problem of modifying a given graph’s structure under a given budget so as to maximally improve its robustness, as quantified by spectral measures. We focus on modifications based on degree-preserving edge rewiring, such that the expected load (e.g., airport flight capacity) or physical/hardware requirement (e.g., count of ISP router traffic switches) of nodes remains unchanged. In contrast to a vast literature of measure-independent heuristic approaches, we propose an algorithm, called EdgeRewire, that directly optimizes a specific measure of interest. Notably, EdgeRewire is general enough to accommodate six different spectral measures. Experiments on real-world datasets from three different domains (Internet AS-level, P2P, and airport flight graphs) show the effectiveness of our approach: EdgeRewire produces graphs with both (i) higher robustness and (ii) higher attack tolerance than several state-of-the-art methods.
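The following is a hypothetical sketch of the underlying idea rather than EdgeRewire itself: propose degree-preserving two-edge swaps and keep those that lower the spectral radius (one of the six measures). A real implementation would update the eigenvalue incrementally instead of recomputing it from scratch.

```python
# A hedged sketch, assuming networkx and numpy: greedy degree-preserving
# rewiring toward a lower spectral radius (higher epidemic threshold).
import random
import numpy as np
import networkx as nx

def spectral_radius(G):
    return np.linalg.eigvalsh(nx.to_numpy_array(G)).max()

def rewire(G, budget, seed=0):
    random.seed(seed)
    best = spectral_radius(G)
    for _ in range(budget):
        (a, b), (c, d) = random.sample(list(G.edges()), 2)
        # degree-preserving swap: (a,b),(c,d) -> (a,d),(c,b)
        if len({a, b, c, d}) < 4 or G.has_edge(a, d) or G.has_edge(c, b):
            continue
        G.remove_edges_from([(a, b), (c, d)])
        G.add_edges_from([(a, d), (c, b)])
        new = spectral_radius(G)
        if new < best:
            best = new                           # keep improving swaps
        else:                                    # otherwise undo the swap
            G.remove_edges_from([(a, d), (c, b)])
            G.add_edges_from([(a, b), (c, d)])
    return G
```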
Abstract:
Time series classification tries to mimic the human understanding of similarity. When it comes to long or large time series datasets, state-of-the-art classifiers reach their limits because of unreasonably high training or testing times. One representative example is the 1-nearest-neighbor DTW classifier (1-NN DTW), commonly used as the benchmark to compare against. It has several shortcomings: its time complexity is quadratic in the time series length, and its accuracy degenerates in the presence of noise. To reduce the computational complexity, early-abandoning techniques, cascading lower bounds and, recently, a nearest centroid classifier have been introduced. Still, classification times on datasets of a few thousand time series are on the order of hours. We present the Bag-Of-SFA-Symbols in Vector Space (BOSS VS) classifier, which is accurate, fast and robust to noise. We show that it is significantly more accurate than 1-NN DTW while being multiple orders of magnitude faster. Its low computational complexity combined with its good classification accuracy makes it relevant for use cases such as long time series, large numbers of time series, or real-time analytics.
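For reference, the quadratic-time DTW recurrence that makes 1-NN DTW so costly on long series looks as follows (a minimal sketch with no lower bounding or early abandoning):

```python
# Plain dynamic-programming DTW: O(n*m) time and space per distance call.
import numpy as np

def dtw(a, b):
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return np.sqrt(D[n, m])

print(dtw(np.array([0., 1., 2.]), np.array([0., 1., 1., 2.])))  # 0.0: warping absorbs the repeat
```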
Abstract:
This paper presents a framework for the exact discovery of the top-k sequential patterns under leverage. It combines (1) a novel definition of the expected support for a sequential pattern, a concept on which most interestingness measures directly rely, with (2) SkOPUS, a new branch-and-bound algorithm for the exact discovery of top-k sequential patterns under a given measure of interest. Our interestingness measure employs the partition approach: a pattern is interesting to the extent that it is more frequent than can be explained by assuming independence between any of the pairs of patterns from which it can be composed. The larger the support compared to the expectation under independence, the more interesting the pattern. We build on these two elements to exactly extract the k sequential patterns with highest leverage, consistent with our definition of expected support. We conduct experiments on both synthetic data with known patterns and real-world datasets; both confirm the consistency and relevance of our approach with regard to the state of the art.
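A heavily simplified sketch of the leverage computation, considering only prefix/suffix splits of a pattern (SkOPUS considers all pairs of sub-patterns from which the pattern can be composed); the support counts below are illustrative placeholders.

```python
# A hedged sketch: leverage = observed support minus the largest support
# expected under independence of the two halves of any split.
def leverage(pattern, support, n_sequences):
    """support: dict mapping a pattern (tuple of items) to its observed support."""
    obs = support[pattern]
    expected = max(
        support[pattern[:i]] * support[pattern[i:]] / n_sequences
        for i in range(1, len(pattern))
    )
    return obs - expected

supp = {('a',): 60, ('b',): 50, ('a', 'b'): 45}
print(leverage(('a', 'b'), supp, 100))  # 45 - 60*50/100 = 15.0
```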
Abstract:
Finding dense subgraphs is an important problem in graph mining and has many practical applications. At the same time, while large real-world networks are known to have many communities that are not well separated, the majority of the existing work focuses on the problem of finding a single densest subgraph. Hence, it is natural to consider the question of finding the top-k densest subgraphs. One major challenge in addressing this question is how to handle overlaps: eliminating overlaps completely is one option, but this may lead to extracting subgraphs less dense than would be possible by allowing a limited amount of overlap. Furthermore, overlaps are desirable, as in most real-world graphs there are vertices that belong to more than one community, and thus to more than one densest subgraph. In this paper we study the problem of finding top-k overlapping densest subgraphs, and we present a new approach that improves over the existing techniques, both in theory and in practice. First, we reformulate the problem definition in a way that allows us to obtain an algorithm with a constant-factor approximation guarantee. Our approach relies on techniques for solving the max-sum diversification problem, which, however, we need to extend in order to make them applicable to our setting. Second, we evaluate our algorithm on a collection of benchmark datasets and show that it convincingly outperforms the previous methods, both in terms of quality and efficiency.
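As background, the classical greedy peeling procedure for a single densest subgraph (a 1/2-approximation) that top-k approaches build on can be sketched as follows; the paper's overlap handling and max-sum diversification machinery are not shown.

```python
# A minimal sketch of Charikar-style greedy peeling, assuming networkx:
# repeatedly remove the minimum-degree vertex, remembering the densest
# intermediate subgraph (density = |E| / |V|).
import networkx as nx

def densest_subgraph(G):
    H, best, best_nodes = G.copy(), 0.0, set(G)
    while H.number_of_nodes() > 0:
        density = H.number_of_edges() / H.number_of_nodes()
        if density >= best:
            best, best_nodes = density, set(H)
        v = min(H.degree, key=lambda t: t[1])[0]   # peel the min-degree vertex
        H.remove_node(v)
    return best_nodes, best
```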
Abstract:
In many data analysis tasks it is important to understand the relationships between different datasets. Several methods exist for this task, but many of them are limited to two datasets and linear relationships. In this paper, we propose a new efficient algorithm, termed cocoreg, for the extraction of variation common to all datasets in a given collection of arbitrary size. cocoreg extends redundancy analysis to more than two datasets, utilizing chains of regression functions to extract the shared variation in the original data space. The algorithm can be used with any linear or non-linear regression function, which makes it robust, straightforward, fast, and easy to implement and use. We empirically demonstrate the efficacy of shared variation extraction using the cocoreg algorithm on five artificial and three real datasets.
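A hedged sketch of the chain-of-regressions idea, assuming scikit-learn; the single fixed chain ordering and variable names are illustrative simplifications, and cocoreg's exact procedure may aggregate over chains differently.

```python
# Push a dataset's signal through a closed chain X1 -> X2 -> X3 -> X1;
# only variation shared with every dataset survives the round trip.
import numpy as np
from sklearn.linear_model import LinearRegression

def chain_shared_variation(datasets, Regressor=LinearRegression):
    """Return, for the first dataset, an estimate of its shared variation."""
    chain = datasets + [datasets[0]]       # close the loop back to X1
    pred = chain[0]
    for X_next in chain[1:]:
        model = Regressor().fit(pred, X_next)
        pred = model.predict(pred)         # any regressor can be plugged in here
    return pred                            # lives in the space of X1

rng = np.random.default_rng(0)
shared = rng.normal(size=(200, 1))                                    # common signal
Xs = [np.hstack([shared, rng.normal(size=(200, 2))]) for _ in range(3)]
print(chain_shared_variation(Xs).shape)                               # (200, 3)
```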
Abstract:
Restricted Boltzmann Machines (RBMs) and models derived from them have been successfully used as basic building blocks in deep artificial neural networks for automatic feature extraction and unsupervised weight initialization, but also as density estimators. Thus, their generative and discriminative capabilities, as well as their computational cost, are instrumental to a wide range of applications. Our main contribution is to look at RBMs from a topological perspective, bringing insights from network science. First, we show that RBMs and Gaussian RBMs (GRBMs) are bipartite graphs which naturally have a small-world topology. Second, we demonstrate on both synthetic and real-world datasets that by constraining RBMs and GRBMs to a scale-free topology (while still considering local neighborhoods and the data distribution), we reduce the number of weights that need to be computed by a few orders of magnitude, at virtually no loss in generative performance. Third, we show that, for a fixed number of weights, our proposed sparse models (which by design have more hidden neurons) achieve better generative capabilities than standard fully connected RBMs and GRBMs (which by design have fewer hidden neurons), at no additional computational cost.
Abstract:
We study the use of replicator dynamics for data clustering and structure identification. We observe that replicator dynamics, while running, reveal informative transitions that correspond to significant cuts over the data, and that such transitions occur significantly earlier than the convergence of the dynamics. We exploit this observation to design an efficient two-step clustering algorithm: (i) cut identification, and (ii) cluster pruning. We propose an appropriate regularization that accelerates the appearance of transitions, leading to an adaptive replicator dynamics. A main computational advantage of this regularization is that the optimal solution of the corresponding objective function can still be computed by running replicator dynamics. Our experiments on synthetic and real-world datasets show the effectiveness of our algorithm compared to the alternatives.
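The core iteration is compact; below is a minimal sketch of discrete replicator dynamics on a non-negative similarity matrix A (the paper's regularization and transition detection are not shown).

```python
# The replicator update keeps x on the probability simplex, since the
# normalizer x'Ax equals the sum of the unnormalized entries.
import numpy as np

def replicator_dynamics(A, steps=100):
    x = np.full(A.shape[0], 1.0 / A.shape[0])   # uniform start on the simplex
    trajectory = [x.copy()]
    for _ in range(steps):
        x = x * (A @ x) / (x @ A @ x)           # multiplicative replicator step
        trajectory.append(x.copy())
    return np.array(trajectory)                 # watch for sharp transitions here
```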
Abstract:
This paper introduces Accelerated Logistic Regression: a hybrid generative-discriminative approach to training Logistic Regression with higher-order features. We present two main results: (1) our combined generative-discriminative approach significantly improves the efficiency of Logistic Regression, and (2) incorporating higher-order features (i.e., features that are Cartesian products of the original features) reduces the bias of Logistic Regression, which in turn significantly reduces its error on large datasets. We assess the efficacy of Accelerated Logistic Regression in an extensive set of experiments on 75 standard datasets. We demonstrate its competitiveness, particularly on large datasets, by comparing against state-of-the-art classifiers, including Random Forest and Averaged n-Dependence Estimators.
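To make the feature construction concrete, here is a minimal sketch of second-order Cartesian-product features over categorical attributes; the dictionary encoding is an illustrative choice, not the paper's implementation.

```python
# Augment a record with all pairwise joint-value features; a downstream
# vectorizer would one-hot encode these before logistic regression.
from itertools import combinations

def higher_order_features(record, order=2):
    """record: tuple of attribute values; returns original plus product features."""
    feats = {(i,): v for i, v in enumerate(record)}
    for idxs in combinations(range(len(record)), order):
        feats[idxs] = tuple(record[i] for i in idxs)   # joint value of the attribute pair
    return feats

print(higher_order_features(('sunny', 'hot', 'weekend')))
```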
Abstract:
We provide a unifying perspective on two decades of work on cost-sensitive Boosting algorithms. Analyzing the literature from 1997 to 2016, we find 15 distinct cost-sensitive variants of the original algorithm; each has its own motivation and claims to superiority -- so whom should we believe? In this work we critique the Boosting literature using four theoretical frameworks: Bayesian decision theory, the functional gradient descent view, margin theory, and probabilistic modelling. Our finding is that only three algorithms are fully supported -- and the probabilistic model view suggests that all require their outputs to be calibrated for best performance. Experiments on 18 datasets across 21 degrees of imbalance support this hypothesis -- showing that, once calibrated, these algorithms perform equivalently and outperform all others. Our final recommendation -- based on simplicity, flexibility and performance -- is to use the original AdaBoost algorithm with a shifted decision threshold and calibrated probability estimates.
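A hedged sketch of that final recommendation, assuming scikit-learn: calibrate AdaBoost's probability estimates, then shift the decision threshold to the Bayes-optimal value under the costs (c_fn and c_fp below are illustrative, not from the paper).

```python
# Plain AdaBoost + isotonic calibration + cost-shifted decision threshold.
from sklearn.ensemble import AdaBoostClassifier
from sklearn.calibration import CalibratedClassifierCV

def cost_sensitive_adaboost(X, y, c_fn=5.0, c_fp=1.0):
    clf = CalibratedClassifierCV(AdaBoostClassifier(), method="isotonic", cv=5)
    clf.fit(X, y)
    # predict positive when p >= c_fp / (c_fp + c_fn): Bayes-optimal under these costs
    threshold = c_fp / (c_fp + c_fn)
    return lambda X_new: (clf.predict_proba(X_new)[:, 1] >= threshold).astype(int)
```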
Abstract:
Recommendation systems often rely on point-wise loss metrics such as the mean squared error. However, in real recommendation settings only a few items are presented to a user. This observation has recently encouraged the use of rank-based metrics. LambdaMART is the state-of-the-art learning-to-rank algorithm that relies on such a metric. Motivated by the fact that the users' and items' descriptions, as well as the preference behavior, can often be well summarized by a small number of hidden factors, we propose a novel algorithm, LambdaMART Matrix Factorization (LambdaMART-MF), that learns latent representations of users and items using gradient boosted trees. The algorithm factorizes LambdaMART by defining relevance scores as the inner product of the learned representations of the users and items. We regularise the learned latent representations so that they reflect the user and item manifolds as defined by their original feature-based descriptors and the preference behavior. Finally, we propose a weighted variant of NDCG to reduce the penalty for similar items with large rating discrepancies. We experiment on two very different recommendation datasets, meta-mining and movies-users, and evaluate the performance of LambdaMART-MF, with and without regularization, in the cold-start setting as well as in the simpler matrix completion setting. The experiments show that the factorization of LambdaMART brings significant performance improvements in both the cold-start and the matrix completion settings. The incorporation of regularisation appears to have a smaller impact on performance.
Abstract:
Frank-Wolfe (FW) algorithms have often been proposed over the last few years as efficient solvers for a variety of optimization problems arising in machine learning. The ability to work with cheap projection-free iterations and the incremental nature of the method make FW a very effective choice for many large-scale problems where computing a sparse model is desirable. In this paper, we present a high-performance implementation of the FW method tailored to solving large-scale Lasso regression problems, based on a randomized iteration, and prove that the convergence guarantees of the standard FW method are preserved in the stochastic setting. We show experimentally that our algorithm outperforms several existing state-of-the-art methods, including the Coordinate Descent algorithm of Friedman et al. (one of the fastest known Lasso solvers), on several benchmark datasets with a very large number of features, without sacrificing the accuracy of the model. Our results show that the algorithm can generate the complete regularization path on problems with up to four million variables in less than one minute.
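For reference, a minimal deterministic FW iteration for the constrained Lasso, min ||Ax - b||^2 s.t. ||x||_1 <= t, looks as follows; the paper's solver additionally randomizes the linear-minimization step.

```python
# Each FW step moves toward a single vertex of the scaled l1 ball, so the
# iterate after k steps has at most k+1 nonzeros: sparsity comes for free.
import numpy as np

def frank_wolfe_lasso(A, b, t, iters=500):
    n = A.shape[1]
    x = np.zeros(n)
    for k in range(iters):
        grad = 2 * A.T @ (A @ x - b)
        i = np.argmax(np.abs(grad))          # linear minimizer over the l1 ball
        s = np.zeros(n)
        s[i] = -t * np.sign(grad[i])         # the selected vertex
        gamma = 2.0 / (k + 2)                # standard step-size schedule
        x = (1 - gamma) * x + gamma * s
    return x
```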
Abstract:
In domain adaptation, the goal is to find common ground between two, potentially differently distributed, datasets. By finding common concepts present in two sets of words pertaining to different domains, one can leverage the performance of a classifier trained on one domain for use on the other. We propose a solution to the domain adaptation task by efficiently solving an optimization problem through Stochastic Gradient Descent. We provide update rules that allow us to run Stochastic Gradient Descent directly on a matrix manifold: the steps compel the solution to stay on the Stiefel manifold. This manifold encompasses projection matrices of word vectors onto low-dimensional latent feature representations, which allows us to interpret the results: the rotation magnitude of the word vector projection for a given word corresponds to the importance of that word in making the adaptation. Beyond this interpretability benefit, experiments show that the Stiefel manifold method outperforms state-of-the-art methods.
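A generic sketch of one such manifold-constrained step, assuming NumPy: project the Euclidean gradient onto the tangent space at W and retract with a QR factorization. The paper derives its own update rules, which need not coincide with this standard recipe.

```python
# One SGD step constrained to the Stiefel manifold (W^T W = I).
import numpy as np

def stiefel_sgd_step(W, euclid_grad, lr=0.1):
    # project the Euclidean gradient onto the tangent space at W
    G = euclid_grad - W @ (W.T @ euclid_grad + euclid_grad.T @ W) / 2
    Q, R = np.linalg.qr(W - lr * G)          # retract back onto the manifold
    return Q * np.sign(np.diag(R))           # fix the sign ambiguity of QR

W, _ = np.linalg.qr(np.random.default_rng(0).normal(size=(50, 5)))
W = stiefel_sgd_step(W, np.ones((50, 5)))
print(np.allclose(W.T @ W, np.eye(5)))       # True: still on the manifold
```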
Abstract:
We focus on the problem of detecting clients that attempt to exhaust server resources by flooding a service with protocol-compliant HTTP requests. Attacks are usually coordinated by an entity that controls many clients. Modeling the application as a structured-prediction problem allows the prediction model to jointly classify a multitude of clients based on the cohesion of their otherwise inconspicuous features. Since the resulting output space is too vast to search exhaustively, we employ greedy search and techniques in which a parametric controller guides the search. We apply a known method that sequentially learns the controller and the structured-prediction model. We then derive an online policy-gradient method that finds the parameters of the controller and of the structured-prediction model in a joint optimization problem; we obtain a convergence guarantee for the latter method. We evaluate and compare the various methods on a large collection of traffic data from a web-hosting service.
Abstract:
Markov logic networks (MLNs) are a well-known statistical relational learning formalism that combines Markov networks with first-order logic. MLNs attach weights to formulas in first-order logic. Learning MLNs from data is a challenging task, as it requires searching through the huge space of possible theories. Additionally, evaluating a theory’s likelihood requires learning the weights of all formulas in the theory, which in turn requires performing probabilistic inference, a problem that is, in general, intractable in MLNs. Lifted inference speeds up probabilistic inference by exploiting symmetries in a model. We explore how to use lifted inference when learning MLNs. Specifically, we investigate generative learning, where the goal is to maximize the likelihood of the model given the data. First, we provide a generic algorithm for learning maximum-likelihood weights that works with any exact lifted inference approach; in contrast, most existing approaches optimize approximate measures such as the pseudo-likelihood. Second, we provide a concrete parameter learning algorithm based on first-order knowledge compilation. Third, we propose a structure learning algorithm that learns liftable MLNs and is the first MLN structure learning algorithm to exactly optimize the likelihood of the model. Finally, we perform an empirical evaluation on three real-world datasets. Our parameter learning algorithm produces more accurate models than several competing approximate approaches, in terms of both test-set log-likelihood and prediction tasks. Furthermore, our tractable learner outperforms intractable models on prediction tasks, suggesting that liftable models are a powerful hypothesis space that may be sufficient for many standard learning problems.
Abstract:
There is no uniform approach in the literature for modelling sequential correlations in sequence classification problems. It is easy to find examples of unstructured models (e.g., logistic regression) in which correlations are not taken into account at all, but there are also many examples where the correlations are explicitly incorporated into a -- potentially computationally expensive -- structured classification model (e.g., conditional random fields). In this paper we lay theoretical and empirical foundations for clarifying the types of problem that necessitate direct modelling of correlations in sequences, and the types of problem where unstructured models that capture sequential aspects solely through features are sufficient. The theoretical work in this paper shows that the rate of decay of auto-correlations within a sequence is related to the excess classification risk incurred by ignoring the structural aspect of the data. This is an intuitively appealing result, demonstrating the intimate link between auto-correlations and excess classification risk. Drawing directly on this theory, we develop well-founded visual analytics tools that can be applied a priori to data sequences, and we demonstrate how these tools can guide practitioners in specifying feature representations based on auto-correlation profiles. Empirical analysis is performed on three sequential datasets. With baseline feature templates, structured and unstructured models achieve similar performance, indicating no initial preference for either model. We then apply the visual analytics tools to the datasets and show that classification performance in all cases improves over the baseline results when our tools are involved in defining feature representations.
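A minimal sketch of the auto-correlation profile such tools are built around: a quickly decaying profile suggests an unstructured model with lag features may suffice, while slow decay favours direct modelling of the correlations.

```python
# Sample auto-correlation of a (numeric) sequence at lags 1..max_lag.
import numpy as np

def autocorrelation_profile(x, max_lag=20):
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, max_lag + 1)])

rng = np.random.default_rng(0)
print(autocorrelation_profile(rng.normal(size=500), 3))  # near zero at every lag for noise
```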