This year I participated in the European Summer School in Information Retrieval (ESSIR), which took place in wonderful Thessaloníki. The whole event was perfectly organized, the lecturers gave really interesting talks and, last but not least, I met a lot of people who share my interests. Even though I was taking notes, I wanted to go over some of the slides and posters presented during the school and keep them somewhere for future reference. Having found them, why not list them all in one place?

Evangelos Kanoulas: Experimental design for collection-based comparative evaluation of search engines

Information retrieval effectiveness evaluation typically takes one of two forms: batch experiments based on static test collections, or online experiments tracking users' interactions with a live system. Test collection experiments are sometimes viewed as introducing too many simplifying assumptions to accurately predict the usefulness of a system to its users. As a result, there is great interest in creating test collections that better model the variability encountered in real-life search scenarios. This includes experimenting over a variety of queries, corpora or even users and their interactions with the search results. In this talk I will discuss different ways of incorporating user behaviour in batch experimentation, how to model the variance introduced to measurements of effectiveness, and how to extend our statistical significance test arsenal to allow comparing search algorithms.
(source: ESSIR website)
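The significance-testing part of the talk is easy to make concrete. A common batch-experiment setup is to score two systems on the same queries and run a paired test over the per-query differences. The sketch below computes a paired t statistic by hand; the per-query average-precision scores are made-up numbers for illustration, not real data.

```python
# A minimal sketch of paired significance testing for batch IR evaluation:
# compare two systems on a shared test collection via per-query differences.
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(scores_a, scores_b):
    """t statistic over per-query score differences (system A minus B)."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# Hypothetical per-query average-precision scores for two systems.
ap_system_a = [0.41, 0.55, 0.30, 0.62, 0.48, 0.37, 0.59, 0.44]
ap_system_b = [0.38, 0.51, 0.33, 0.58, 0.42, 0.35, 0.55, 0.40]

t = paired_t_statistic(ap_system_a, ap_system_b)
print(f"paired t statistic over {len(ap_system_a)} queries: {t:.3f}")
```

The point of pairing is that both systems see the same queries, so per-query difficulty cancels out; with real data you would compare the statistic against a t distribution with n-1 degrees of freedom to get a p-value.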

Julio Gonzalo: Tutorial on clustering & filtering evaluation metrics

… in this talk we will review and compare the most popular evaluation metrics for Clustering and Filtering tasks. In order to compare and assess the adequacy of metrics, we will specify a few intuitive formal constraints for each task, which every suitable metric should satisfy. The analysis leads to some practical conclusions: for the clustering problem, there is only one metric pair (BCubed Precision and Recall) that satisfies all formal constraints. For filtering metrics, we end up distinguishing three metric families which have mutually exclusive properties. The results provide useful guidance to select the most adequate evaluation metric for each application scenario.
(source: ESSIR website)
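Since BCubed Precision and Recall came up as the metric pair satisfying all the clustering constraints, here is a small sketch of the standard per-item definition: for each item, precision is the fraction of its system cluster that shares its gold category, recall is the fraction of its gold category that lands in its cluster, and both are averaged over items. Variable names and the toy data are mine.

```python
# A minimal sketch of BCubed Precision/Recall for clustering evaluation.
def bcubed(clusters, gold):
    """clusters/gold: dicts mapping each item to a cluster id / gold category."""
    items = list(clusters)
    prec_sum = rec_sum = 0.0
    for e in items:
        same_cluster = [x for x in items if clusters[x] == clusters[e]]
        same_category = [x for x in items if gold[x] == gold[e]]
        correct = [x for x in same_cluster if gold[x] == gold[e]]
        prec_sum += len(correct) / len(same_cluster)
        rec_sum += len(correct) / len(same_category)
    n = len(items)
    return prec_sum / n, rec_sum / n

# Toy example: the system splits gold category "y" across two clusters.
system = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2}
gold   = {"a": "x", "b": "x", "c": "y", "d": "y", "e": "y"}
p, r = bcubed(system, gold)
print(f"BCubed precision={p:.3f} recall={r:.3f}")
```

The per-item averaging is what lets BCubed penalize both mixing categories inside a cluster (precision) and scattering a category across clusters (recall), which is exactly the behaviour the formal constraints demand.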

Katja Hofmann: Machine Learning for IR

Machine Learning (ML) forms the basis of many Information Retrieval (IR) technologies, ranging from early work on text classification to recent approaches to entity linking, sentiment detection, and document ranking. In addition to serving as a key application area for ML, IR continuously pushes ML towards novel approaches.
In this talk I discuss and exemplify the dual role of IR as both a consumer of ML technology, and as a driver towards new challenging ML problems. I start with an overview of typical ML applications to IR, including an overview of learning to rank approaches. In the second part of the lecture I focus on a recent trend towards online learning approaches that allow continuous learning from user interactions. I discuss existing solutions, and conclude by highlighting open questions and directions for future research.
(source: ESSIR website)
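One building block behind learning from user interactions is interleaved comparison: show the user a merged list from two rankers and credit clicks to whichever ranker contributed the clicked document. The sketch below is a simplified team-draft-style interleaving; the document ids and the click set are hypothetical.

```python
# A minimal sketch of team-draft-style interleaving for online comparison
# of two rankers; simplified for illustration.
import random

def team_draft_interleave(ranking_a, ranking_b, rng):
    interleaved, team_a, team_b = [], set(), set()
    while True:
        remaining_a = [d for d in ranking_a if d not in interleaved]
        remaining_b = [d for d in ranking_b if d not in interleaved]
        if not remaining_a and not remaining_b:
            break
        # The team with fewer picks goes next; ties are broken randomly.
        pick_a = (len(team_a) < len(team_b)
                  or (len(team_a) == len(team_b) and rng.random() < 0.5))
        if (pick_a and remaining_a) or not remaining_b:
            interleaved.append(remaining_a[0]); team_a.add(remaining_a[0])
        else:
            interleaved.append(remaining_b[0]); team_b.add(remaining_b[0])
    return interleaved, team_a, team_b

rng = random.Random(0)
mixed, ta, tb = team_draft_interleave(["d1", "d2", "d3"], ["d3", "d4", "d5"], rng)

clicks = {"d3"}  # hypothetical clicked documents from one user session
score_a, score_b = len(clicks & ta), len(clicks & tb)
print(mixed, "A credit:", score_a, "B credit:", score_b)
```

Aggregated over many sessions, these per-session credits give a preference signal between the two rankers without needing editorial relevance judgements, which is what makes continuous learning from interactions possible.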

Michalis Vazirgiannis: Graph-of-words: boosting text mining with graphs

The Bag-of-words model has been the dominant approach for IR and Text mining for many years, assuming word independence and using frequencies as the main feature for feature selection and for query-to-document similarity. Despite its long and successful usage, bag-of-words ignores words' order and distance within the document, weakening the expressive power of the distance metrics. We propose graph-of-word, an alternative approach that capitalizes on a graph representation of documents and challenges the word independence assumption by taking into account words' order and distance. We applied graph-of-word in various tasks such as ad-hoc Information Retrieval, Single-Document Keyword Extraction, Text Categorization and Sub-event Detection in Textual Streams. In all cases the graph-of-word approach, assisted by degeneracy at times, outperforms the state-of-the-art baselines.
(source: ESSIR website)
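The core idea is easy to sketch: build a graph whose nodes are terms and whose edges link terms co-occurring within a sliding window, then score nodes by some graph property. The version below uses node degree as a simple keyword score; the window size and the scoring choice are my illustrative simplifications, not the exact method from the talk.

```python
# A minimal sketch of a graph-of-words representation: terms as nodes,
# edges between terms co-occurring within a sliding window.
from collections import defaultdict

def graph_of_words(tokens, window=3):
    """Undirected co-occurrence graph over a token sequence."""
    edges = defaultdict(set)
    for i, term in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != term:
                edges[term].add(other)
                edges[other].add(term)
    return edges

text = ("information retrieval systems rank documents "
        "information retrieval models score documents").split()
graph = graph_of_words(text)

# Rank terms by degree as a crude keyword-extraction score.
keywords = sorted(graph, key=lambda t: len(graph[t]), reverse=True)
print(keywords[:3])
```

Unlike a bag of words, repeating a term does not inflate its score here; a term matters because it co-occurs with many distinct terms, which is what lets the graph view capture order and proximity that term frequencies discard.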

Fabrizio Sebastiani: Text Classification, Sentiment Analysis & Opinion Mining

Text Classification (TC) is a basic enabling technology in nowadays' IR, since many text-related prediction tasks can be framed in terms of classification. As a result, scores of applications (ranging from webpage/website classification under folksonomies to author identification for texts of uncertain paternity) have a TC engine under the hood. Modern text classification methods rely on supervised machine learning; according to this paradigm, a general-purpose learning algorithm learns the characteristics a text should have in order to be classified under class X, by analysing a set of texts which were previously classified as belonging or not belonging to X by a human. This tutorial will discuss the main steps towards the construction of a text classifier, from the generation of vectorial representations of the texts, to training a classifier from examples, to evaluating its accuracy on benchmark datasets.

Until 15 years ago, text classification was almost a synonym of "classification by topic", i.e., classifying textual documents according to what they are about. More recently, the classification of texts according to dimensions other than topic (e.g., by language, as in language identification; by author, as in authorship attribution) has also been investigated. The most important among these dimensions is certainly sentiment, as when classifying a product review according to whether it expresses a positive or a negative opinion towards the topic. Sentiment classification is an instance of a more general task called "opinion mining", which encompasses all tasks having to do with the analysis of text according to the sentiments and opinions expressed therein. The key difference between classification by topic and classification by sentiment lies in the way vectorial representations of the texts are generated. This tutorial will explore these key differences by discussing the text representation techniques adopted in state-of-the-art sentiment classification systems, with particular emphasis on systems that tackle text arising within social media.
(source: ESSIR website)
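A concrete example of why sentiment needs different vectorial representations than topic: for topic classification, "good" and "not good" map to nearly the same bag of words, but for sentiment they are opposites. One widely used representation trick is to mark tokens inside a negation scope so they become distinct features. The marking scheme below (suffix `_NEG`, scope ending at punctuation) is a toy choice for illustration.

```python
# A minimal sketch of negation marking, a representation technique used in
# sentiment classification so that "good" and "not ... good" differ as features.
NEGATIONS = {"not", "no", "never", "n't"}
PUNCT = {".", ",", "!", "?", ";"}

def mark_negation(tokens):
    """Append _NEG to tokens between a negation word and the next punctuation."""
    marked, in_scope = [], False
    for tok in tokens:
        if tok in PUNCT:
            in_scope = False
            marked.append(tok)
        elif tok in NEGATIONS:
            in_scope = True
            marked.append(tok)
        else:
            marked.append(tok + "_NEG" if in_scope else tok)
    return marked

print(mark_negation("the plot was not good at all .".split()))
# → ['the', 'plot', 'was', 'not', 'good_NEG', 'at_NEG', 'all_NEG', '.']
```

After this preprocessing, the usual vectorial machinery (term weighting, a supervised learner) applies unchanged; the representation, not the learning algorithm, is what carries the sentiment-specific knowledge.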

Some other resources mentioned @ ESSIR