Audio-visual search for the consumer - how to generalise the results?
Jussi Karlgren

Many approaches aim to provide audio-visual retrieval and search for the end consumer. Some are commercial in themselves, some are industrially motivated but not designed to be market-driven, others are academic and research oriented. The results given by these projects are often interesting, sometimes useful, and in some cases successful in the market or in terms of consumer uptake. But what are the general results we can build upon for future projects?
One of the defining structural bases of textual retrieval research, and of the success of its commercial instantiations, is its effective and widely accepted evaluation framework. Topically oriented retrieval hinges on an adequate topical analysis of text, on a commonly accepted target notion of topical relevance, and on an underlying and implicit set of use cases ranging from "full recall for a domain expert" to "above the fold for the casual browser" retrieval. These underpinnings are less well suited to multi-media retrieval. We don't quite know why users retrieve objects; the topical content of objects is arguably less important than many other aspects of their content; relevance is less adequate as a target notion; and recall is difficult to model in a dynamic and changing stream of information.
This session on audio-visual search for the consumer is intended to bring together approaches, both industrial and academic, to explore which underlying assumptions they make about their user community and target user groups. Can these various starting points be used to formulate common notions, so that evaluation and assessment results can be shared across research projects? Can some general results on usability and potential for uptake be proposed?
This session looks for presentations that address one or more of the following central topics (or some variation of them, or, indeed, that question them!):

1. What viable use cases for consumer oriented audio-visual search do we have?
Use cases are a useful research vehicle for the generalisation and sharing of research results. They are informal descriptions of how a system is intended to be used or how it might be used, formulated without addressing technology, as a goal-oriented set of interactions between external actors, primarily users, and the system. They answer questions such as "Who does what, and why?" Evaluation across projects, systems, and programs will be considerably simplified through cross-program formulation of use cases. Can the starting points of the various projects we know about today be used to formulate common use cases for furthering future research?

2. Target notions for evaluation - what will replace relevance?
The concept of relevance in textual retrieval lies at the convergence of understanding users, information needs, items of information, and interaction. It ties together almost every development and research project in context-sensitive information access. Relevance is a function of task, collection characteristics, user preferences and background, situation, tool, temporal constraints, and untold other factors.
But in the case of consumer-oriented audio-visual media, it is not as clear what relevance can be understood to mean. While topical content may be a parameter whereby end consumers select information objects to fulfil their viewing or entertainment needs, it certainly is not the only one and arguably not even the most important one. What target notion should we use for evaluation in its stead?

3. Features of information objects - what should we look for to describe our information objects saliently?
Content analysis is the central task for textual information retrieval, but moving to audio-visual data it becomes much more arduous, and the features to look for become in some ways much more arbitrarily chosen. Textual content wears its semantics on its sleeve, in the sense that words are a useful content descriptor to start analysis from. Images and video in particular have no such obvious content descriptors. Where should we look? Context? Usage? Metadata? Other features of similarity? What features are most attractive to the user community at hand?