Introduction

The cross-disciplinary research topic of multiple clustering has received significant attention in recent years. However, because the field is relatively young, important research challenges remain. In particular, we observe an emerging interest in discovering multiple clustering solutions from very high dimensional and complex databases. Detecting alternative clusterings while avoiding redundancy is a key challenge for multiple clustering solutions. Toward this goal, important research issues include: how to define redundancy among clusterings; whether existing algorithms can be modified to find multiple solutions; how many solutions should be extracted; how to select among a very large number of possible solutions; how to evaluate and visualize results; and how to most effectively help data analysts find what they are looking for. Recent work tackles this problem by looking for non-redundant, alternative, disparate, or orthogonal clusterings. Research in this area benefits from well-established related areas, such as ensemble clustering, constraint-based clustering, frequent pattern mining, theory on result summarization, consensus mining, and general techniques for coping with complex and high dimensional databases.

Objectives

The aim of the MultiClust mini-symposium, to be held in conjunction with the 2014 SIAM Data Mining Conference, is to establish a venue for the growing community interested in multiple clustering solutions and in the research topics related to this field. The mini-symposium will not only increase the visibility of the topic itself but also bridge closely related research areas such as ensemble clustering, co-clustering, clustering with constraints, and frequent pattern mining. In particular, we solicit discussions on approaches for solving emerging problems in clustering ensembles, semi-supervised clustering, subspace/projected clustering, co-clustering, and multi-view clustering. Of particular interest will be a discussion panel that draws new and insightful connections between these techniques and explores ideas toward a unified framework combining two or more of them.

Summary of previous editions of the MultiClust workshop

The workshop series “MultiClust” has attracted considerable attention at its previous venues: KDD 2010 (11 submissions, 49 attendees), ECML PKDD 2011 (12 submissions, 40 attendees), and SIAM Data Mining 2012 (8 submissions, 23 attendees). The first MultiClust workshop, held at KDD 2010, was of high quality and attracted a large number of participants. Well-known researchers gave invited talks discussing related research directions in ensemble and alternative clustering, and the final discussion revealed high interest in the topic. This positive experience continued at the following workshops. Notably, in 2012, a special issue of the Springer journal Machine Learning was dedicated to this topic.
The 2013 MultiClust workshop, held in conjunction with KDD 2013, merged the 2012 3Clust workshop (held at PAKDD 2012; 8 submissions, 35 attendees) with the 2012 MultiClust workshop. The merger placed an emphasis on the similarity, and possibly the unification, of the different research areas concerned with the general problem of multiple clusterings. The technical program of the 2013 MultiClust workshop demonstrated once more the strong interest of different research communities. It featured five peer-reviewed papers covering multiple research directions, as well as two excellent invited talks that provided an overview of challenges in related fields, given by Michael Berthold (University of Konstanz, Germany) and Shai Ben-David (University of Waterloo, Canada). A report on the 2013 MultiClust workshop is forthcoming in SIGKDD Explorations.

Why a mini-symposium?   As discussed above, each previous edition of the workshop succeeded in providing a venue for researchers to share their expertise in advanced data clustering fields, address open questions, identify emerging trends and challenges in those fields, and explore unified approaches to clustering problems. The special issue of the Machine Learning journal is further evidence of the success of the MultiClust idea. Now that high-quality MultiClust papers are being published in mainstream conferences and journals, the field has reached sufficient maturity, and the time has come to “graduate” the MultiClust workshops to a mini-symposium of invited talks given by experts in the field. We believe that this setting will provide a suitable stage for fruitful discussions that will both generate follow-up interest and push forward the state of the art in data clustering. We have already contacted well-known researchers, who have agreed to give talks at the mini-symposium; their names are listed below.

As a core knowledge discovery task, clustering remains an important research topic at data mining conferences. In particular, in recent years the SDM conferences have been a premier stage for presenting progress in the area of multiple clustering solutions and related research areas. Hosting the mini-symposium at SDM-14 will strengthen the connection between this highly active research topic and the knowledge discovery community, and will attract interested researchers and students from related disciplines.

Target audience.  The target audience consists of researchers and practitioners working on clustering. Besides researchers directly working on non-redundant clustering, alternative clustering, ensemble clustering, subspace clustering, and clustering with constraints, we will also actively encourage researchers from related disciplines to attend the mini-symposium.

Topics of interest

● Clustering Ensembles
● Co-clustering Ensembles
● Subspace/Projected Clustering
● Semi-supervised Clustering
● Multi-view/Alternative Clustering
● Handling Redundancy in Clustering Results
● Bayesian Learning for Clustering
● Model Selection Issues: How Many Clusters?
● Co-clustering with External Knowledge for Relational Learning
● Probabilistic Clustering with Constraints
● Kernels for Semi-supervised Clustering
● Active Learning of Constraints in Clustering Ensembles
● Constraint-based Clustering for Uncertain Data Management and Mining
● Integration of Frequent Pattern Mining in (Semi-supervised) Multi-view Clustering
● Evaluation Criteria for Multi-view Data Clustering
● Benchmark Data for Multi-view Data Clustering
● Incorporating User Feedback in Semi-supervised Clustering
● Clustering Ensembles for Uncertain Data Management and Mining
● Multiple Clusterings and Multi-view Data in Heterogeneous Information Networks
● Applications (document mining; health care; privacy and trustworthiness; etc.)

Program

[...coming soon]

A preliminary list of invited speakers is as follows:
● Arindam Banerjee (University of Minnesota, Minneapolis): Clustering with Linear Programming
● Ricardo J. G. B. Campello (University of São Paulo at São Carlos, Brazil): Clustering Evaluation and Validation: Some Results, Challenges, and Research Questions
● Xiaoli Fern (Oregon State University): Utilizing multiple clusterings: beyond consensus clustering
● George Karypis (University of Minnesota, Minneapolis): [to be announced]

Short bios of the organizers

Ira Assent has been an Associate Professor at Aarhus University, Denmark, since 2010. She earned her Ph.D. in 2008 at RWTH Aachen University, Germany, with a thesis on “Efficient Adaptive Retrieval and Mining in Large Multimedia Databases”.
Besides serving on many program committees and as a reviewer for several journals, she co-organized the MultiClust workshop in 2011 (in conjunction with ECMLPKDD) and the MultiClust workshop in 2013 (in conjunction with KDD), and she is a guest editor of the special issue of the Machine Learning journal “MultiClust: Discovering, Summarizing and Using Multiple Clusterings”.

Carlotta Domeniconi has been an Associate Professor in the Department of Computer Science at George Mason University since August 2008. She was an Assistant Professor in the same department from August 2002 to August 2008. Her research interests include machine learning, pattern recognition, and data mining, with applications in text mining and bioinformatics. She has published extensively in premier journals and conferences in machine learning and data mining. She was program co-chair of SDM 2012. Dr. Domeniconi has served as a PC member for KDD, ICDM, SDM, ECML-PKDD, and AAAI, and she is an Associate Editor of the IEEE Transactions on Neural Networks and Learning Systems. Dr. Domeniconi is a recipient of an ORAU Ralph E. Powe Junior Faculty Enhancement Award. She has worked as PI or co-PI on projects supported by the US Army, the Air Force, and the DoD. Her research has been supported in part by an NSF CAREER Award.

Francesco Gullo is a research scientist at Yahoo! Research Barcelona. He received his Ph.D. from the University of Calabria (Italy) in January 2010. During his Ph.D., in 2009, he spent five months as an intern at George Mason University (GMU), Fairfax, VA, USA. After completing his Ph.D., he was a research fellow at the University of Calabria until September 2011, when he joined Yahoo! Research, first as a postdoctoral researcher and, since September 2013, as a research scientist.
His research interests fall into the areas of data mining and machine learning, and currently include, among others, graph mining/querying and (social) web mining.
He co-chaired the 4th MultiClust workshop at KDD 2013 and the 3Clust workshop at PAKDD 2012. He has also served as a PC member of major CS conferences, including WWW, WSDM, CIKM, and SDM.

Andrea Tagarelli has been an assistant professor of computer science at the University of Calabria (Italy) since 2006. He obtained his Ph.D. in Computer and Systems Engineering in 2006, with a thesis focused on information and knowledge extraction from semistructured text data. In 2007, he was a visiting researcher at the Department of Computer Science & Engineering, University of Minnesota at Minneapolis. His research interests include topics in knowledge discovery and text/data mining, with applications to Web and semistructured data, social networks, spatio-temporal databases, and bioinformatics. Besides serving as a reviewer for leading journals in knowledge discovery, data and knowledge engineering, and information systems, he has been a PC member of premier conferences in data mining fields, including SDM, CIKM, ECML-PKDD, and ASONAM. He was a co-organizer of the 3Clust workshop in 2012 (in conjunction with PAKDD) and of the MultiClust workshop in 2013 (in conjunction with KDD).

Arthur Zimek is a research and teaching assistant in the database systems and data mining group of Hans-Peter Kriegel at the Ludwig-Maximilians-Universität München, Germany. From 2012 to 2013, he was a postdoctoral fellow in the Department of Computing Science at the University of Alberta, Edmonton, AB, Canada. He completed his Ph.D. in computing science, with a thesis on “Correlation Clustering”, in summer 2008. Together with his co-authors, he received the “Best Paper Honorable Mention Award” at SDM 2008 and the “Best Demonstration Paper Award” at SSTD 2011. He received the “SIGKDD Doctoral Dissertation Award (runner-up)” in 2009. His research interests include data mining for high-dimensional data and structured data.
For several years he has served on the program committees of SIGKDD and ECMLPKDD and as a reviewer for journals such as ACM TKDD, IEEE TKDE, Data Min. Knowl. Disc., and Machine Learning. For CIKM 2013, he served as a senior PC member. Zimek was a co-organizer of the MultiClust workshops in 2012 (in conjunction with SDM) and 2013 (in conjunction with KDD).

Contact information for all organizers

Ira Assent
Department of Computer Science, Aarhus University
Aabogade 34, 8200 Aarhus N, Denmark
ira@cs.au.dk
+45-22962341

Carlotta Domeniconi
Department of Computer Science, George Mason University
4400 University Drive, MS 4A5, Fairfax, VA 22030, USA
carlotta@cs.gmu.edu
+1-703-993-1697

Francesco Gullo
Yahoo! Research
Av. Diagonal, 177, 08018 Barcelona, Spain
gullo@yahoo-inc.com
+34 93 183 8891

Andrea Tagarelli
Department of Computer Engineering, Modeling, Electronics, and Systems Science, University of Calabria
via Pietro Bucci 41C, 87036 Arcavacata di Rende, Italy
tagarelli@dimes.unical.it
+39 0984 494751

Arthur Zimek
Institut für Informatik, Ludwig-Maximilians-Universität München
Oettingenstr. 67, 80538 Munich, Germany
zimek@dbs.ifi.lmu.de
+49 89 2180 9325
