Audio-visual search for the consumer - how to generalise the results?
Jussi Karlgren
Many approaches aim to provide audio-visual retrieval and search for
the end consumer. Some are commercial in themselves, some are
industrially motivated but not designed to be market-driven, others
are academic and research oriented. The results given by these
projects are often interesting, sometimes useful, and in some cases
successful in the market or in terms of consumer uptake. But what are
the general results we can build upon for future projects?
One of the defining structural bases of textual retrieval research
and success of its commercial instantiations is its effective and
widely accepted evaluation framework. Topically oriented retrieval
hinges on an adequate topical analysis of text, on a commonly
accepted target notion of topical relevance, and an underlying and
implicit set of use cases ranging from "full recall for a domain
expert" to "above the fold for the casual browser" retrieval. These
underpinnings are less well suited to multi-media retrieval. We don't
quite know why users retrieve objects; the topical content of objects
is arguably less important than many other aspects of their content;
relevance is less adequate as a target notion, and recall is difficult
to model in a dynamic and changing stream of information.
This session on audio-visual search for the consumer is intended to
bring together approaches, both industrial and academic, to explore
which underlying assumptions they make of their user community and
target user groups. Can these various starting points be used to
formulate common notions, so that evaluation and assessment results
can be shared across research projects? Can some general results
of usability and potential for uptake be proposed?
This session invites presentations that address one or more of the
following central topics (or some variation of them, or that, indeed,
question them!):
1. What viable use cases for consumer oriented audio-visual search do
we have?
Use cases are a useful research vehicle for the generalisation and
sharing of research results. A use case is an informal
description of how a system is intended to be used or how it might
be used, formulated without reference to technology as a goal-oriented
set of interactions between external actors, primarily users, and the
system. It answers questions such as "Who does what, and why?" Evaluation
across projects, systems, and programs will be considerably
simplified through cross-program formulation of use cases. Can the
starting points of the various projects we know about today be used
to formulate common use cases for furthering future research?
2. Target notions for evaluation - what will replace relevance?
The concept of relevance in textual retrieval lies at the convergence
of understanding users, information needs, items of information, and
interaction. It ties together almost every development and research
project in context-sensitive information access. Relevance is a
function of task, collection characteristics, user preferences and
background, situation, tool, temporal constraints, and untold other
factors.
But in the case of consumer oriented audio-visual media, it is not as
clear what relevance can be understood to mean. While topical content
may be a parameter whereby end consumers select information objects
to fulfil their viewing or entertainment needs, it certainly is not
the only one and arguably not even the most important one. What
target notion should we use for evaluation in its stead?
3. Features of information objects - what should we look for to
describe our information objects saliently?
Content analysis is the central task for textual information
retrieval, but moving to audio-visual data it becomes much more
arduous, and the features to look for become in some ways much more
arbitrarily chosen. Textual content wears its semantics on its
sleeve, in the sense that words are a useful content descriptor to
start analysis from. Images and video, in particular, have no such
obvious content descriptors. Where should we look? Context? Usage?
Metadata? Other features of similarity? What features are most
attractive to the user community at hand?

