In the context of the AVIR project, the University of Brescia has developed an application to show the results of procedures for the automatic analysis and indexing of audio-visual material. This application is only a demo: its aim is not to provide an end-user tool, but only to demonstrate one way of using the metadata generated automatically by a Service Provider System.
This work has also been developed in conjunction with the MPEG-7 activity, in order to propose a possible Description Scheme. The scheme provides a hierarchical description of the sequential temporal structure of a multimedia document (Table of Contents, ToC), suitable for browsing, together with an "Analytical Index" (AI) of the document's key items, suitable for retrieval.
This page describes only the ToCAI demo application. For further information about the ToCAI Description Scheme, see the bibliography at the end of this page.
The initial idea originates from the structures used to describe the information content of technical books. A reader can easily understand the sequential organization of a book by looking at its Table of Contents, while elements of interest can be found quickly by means of the Analytical Index of keywords. The ToCAI offers a similar mechanism for multimedia material, with a couple of extensions. First, it allows information to be retrieved at any given level of abstraction, which is not normally the case in a book (each keyword in a book index normally points only to page numbers, not to the sections or paragraphs where the topic of interest can be found). Second, it allows the items of the Analytical Index (key items) to be arranged according to ordering criteria relevant to the application context.
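To make the two extensions concrete, here is a minimal Python sketch of an analytical index. All names (`Segment`, `AnalyticalIndex`, the level labels, the segment ids) are hypothetical and chosen for illustration; this is not the ToCAI data model. Each key item points to segments at any level of the hierarchy, lookups can be restricted to one level, and the ordering criterion for listing key items is supplied by the caller.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass(frozen=True)
class Segment:
    # Hypothetical reference to a segment of the AV document.
    seg_id: str
    level: str  # e.g. "program", "scene", "shot" or "microsegment"


@dataclass
class AnalyticalIndex:
    # Each key item maps to the segments it points to, at any level.
    entries: dict[str, list[Segment]] = field(default_factory=dict)

    def add(self, key_item: str, segment: Segment) -> None:
        self.entries.setdefault(key_item, []).append(segment)

    def lookup(self, key_item: str, level: Optional[str] = None) -> list[Segment]:
        # Unlike a book index, results can be restricted to one level of abstraction.
        hits = self.entries.get(key_item, [])
        return [s for s in hits if level is None or s.level == level]

    def ordered_keys(self, criterion: Optional[Callable[[str], object]] = None) -> list[str]:
        # The ordering criterion is supplied by the application context;
        # without one, key items are listed alphabetically.
        return sorted(self.entries, key=criterion)


if __name__ == "__main__":
    idx = AnalyticalIndex()
    idx.add("goal", Segment("scene_2", "scene"))
    idx.add("goal", Segment("shot_17", "shot"))
    idx.add("interview", Segment("shot_4", "shot"))
    print([s.seg_id for s in idx.lookup("goal", level="shot")])  # ['shot_17']
    # Order key items by how many segments they point to (most first).
    print(idx.ordered_keys(lambda k: -len(idx.entries[k])))  # ['goal', 'interview']
```

Passing the ordering criterion as a function keeps the index itself independent of any particular application context.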
The interface structure is the same for both the ToC part and the AI part. It is composed of the following items:
(A) - The part (ToC or AI) currently in use is indicated in the form caption.
(B) - The left panel contains a hierarchical view of the analyzed data, organized by temporal segmentation in the ToC part and by semantic criteria in the AI part.
(C) - The right panel contains a player for the audio-visual data.
(D) - Buttons for the usual playback functions (play, forward, backward, stop, ...).
(E) - Button to access the ordering functions.
(F) - Button to switch between the ToC and AI parts.
(G) - Five small frames containing the key images that represent the content of the item selected in the tree (B).
(H) - Button to scroll the list of key images when it contains more than five entries.
(I) - Supplemental space to show additional camera motion information.
The ToC describes the temporal structure of an audio-visual document at multiple levels of abstraction. The lower levels provide a detailed characterization of the temporal structure of the document, while the higher ones offer a more compact description with associated semantics. Currently the temporal segmentation is organized as a tree with three levels. The highest is the scene level (a scene is a multimedia segment with its own semantics), for instance a goal in a soccer match or a dialogue in a movie. Below it are the shots (a shot is a sequence of frames captured in a single, continuous camera recording). At the lowest level are the microsegments, each containing a sequence of frames with homogeneous camera motion (such as zoom in/out, pan left/right and so on). Automatic descriptor extraction provides both the shot and the microsegment decomposition, while the scene decomposition is at the moment performed by hand (techniques based on cross-modal analysis are currently under investigation, and initial results are expected by the end of this project).
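The three-level segmentation (below a program root) can be modelled as a tree in which every node covers a frame interval contained in its parent's. The following Python sketch assumes frame-based boundaries and uses hypothetical names (`TocNode`, `LEVELS`), labels and made-up frame numbers; it only illustrates the structure described above, not ToCAI's internal representation.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator

# Levels from coarsest to finest; "program" is the root of the tree.
LEVELS = ("program", "scene", "shot", "microsegment")


@dataclass
class TocNode:
    label: str
    level: str
    start_frame: int
    end_frame: int  # inclusive
    children: list[TocNode] = field(default_factory=list)

    def add(self, child: TocNode) -> TocNode:
        # A child must sit exactly one level below its parent...
        if LEVELS.index(child.level) != LEVELS.index(self.level) + 1:
            raise ValueError(f"{child.level} cannot be a child of {self.level}")
        # ...and must lie inside the parent's time interval.
        if not (self.start_frame <= child.start_frame <= child.end_frame <= self.end_frame):
            raise ValueError("child segment lies outside its parent")
        self.children.append(child)
        return child

    def walk(self, depth: int = 0) -> Iterator[tuple[int, TocNode]]:
        # Depth-first traversal in insertion (temporal) order, like a fully expanded folder tree.
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


if __name__ == "__main__":
    program = TocNode("Program", "program", 0, 999)
    scene = program.add(TocNode("San Sebastian church", "scene", 600, 999))
    shot = scene.add(TocNode("Shot 17", "shot", 600, 749))
    shot.add(TocNode("zoom in", "microsegment", 600, 679))
    shot.add(TocNode("pan left", "microsegment", 680, 749))
    for depth, node in program.walk():
        print("  " * depth + f"{node.level}: {node.label} [{node.start_frame}-{node.end_frame}]")
```

The containment check in `add` encodes the property that makes the ToC a strict hierarchy: every microsegment lies inside one shot, and every shot inside one scene.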
The tree in the left panel can be used to browse the temporal segmentation. Each segment (program, scene or shot) is represented by a folder, as in a directory tree. When a folder is opened, the segmentation at the level below is shown in the five frames (G) by means of key images (a key image is an image representing a segment of the AV material, typically a frame belonging to that segment).
The figure below shows a typical screenshot of ToCAI running in the ToC part. The tree in the left panel shows the temporal segmentation: three scenes are visible ("Typical jobs in Santillana", "Downtown Tour" and "San Sebastian church"), and the last one is open to show the shots that compose it. The five frames show the key images of the microsegments that compose shot number 17 (in this case the shot contains only two microsegments, so three frames are empty).
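The behaviour of the five key-image frames (G), together with the scroll button (H), amounts to paging through the key images of the selected segment's children in groups of five. This Python sketch assumes that a child's key image is its middle frame (a hypothetical choice; the page only says that a key image is typically a frame of the segment), and the function names and frame numbers are illustrative. With two microsegments it reproduces the situation in the screenshot: two filled frames and three empty ones.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")

SLOTS = 5  # number of key-image frames (G) in the interface


def default_key_frame(start_frame: int, end_frame: int) -> int:
    # Hypothetical choice: use the middle frame of the segment as its key image.
    return (start_frame + end_frame) // 2


def page_count(n_children: int, slots: int = SLOTS) -> int:
    # Ceiling division; at least one page, even for a segment with no children.
    return max(1, -(-n_children // slots))


def key_image_page(key_images: Sequence[T], page: int, slots: int = SLOTS) -> list[Optional[T]]:
    # Key images shown on a given page; unused frames are None (shown empty).
    if not 0 <= page < page_count(len(key_images), slots):
        raise IndexError("page out of range")
    window = list(key_images[page * slots:(page + 1) * slots])
    return window + [None] * (slots - len(window))


if __name__ == "__main__":
    # Two hypothetical microsegments of shot 17: frames 600-679 and 680-749.
    keys = [default_key_frame(600, 679), default_key_frame(680, 749)]
    print(key_image_page(keys, 0))  # [639, 714, None, None, None]
```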
At the microsegment level, information about mosaics and camera movements is available, where a mosaic represents the background of a shot. Each image contains a small button marked with an M, which displays the mosaic in a separate window. The mosaic shows the whole background and conveys more information than the usual frame-by-frame view. In the figure below, for example, the whole image is reconstructed: in this way all the information about the background is available without viewing the entire shot.
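A mosaic is built by registering the frames of a shot to a common background coordinate system and pasting them together. The Python sketch below covers only the simplest case: it assumes that the per-frame displacements are already known and are pure integer translations (as in a pan), whereas a real mosaicking tool must first estimate the camera motion, including zoom. The function name and the pixel representation (rows of grey levels) are hypothetical.

```python
from typing import Optional, Sequence

Frame = Sequence[Sequence[int]]  # rows of grey-level pixel values; all frames the same size


def build_mosaic(frames: Sequence[Frame],
                 offsets: Sequence[tuple[int, int]]) -> list[list[Optional[int]]]:
    """Paste frames into a common background canvas.

    offsets[i] = (dx, dy) is the position of frame i's top-left corner in the
    background coordinate system, assumed to be already estimated from the
    camera motion (pure translation only). Cells no frame covers stay None.
    """
    if len(frames) != len(offsets) or not frames:
        raise ValueError("need one offset per frame and at least one frame")
    h, w = len(frames[0]), len(frames[0][0])
    min_x = min(dx for dx, _ in offsets)
    min_y = min(dy for _, dy in offsets)
    width = max(dx for dx, _ in offsets) + w - min_x
    height = max(dy for _, dy in offsets) + h - min_y
    canvas: list[list[Optional[int]]] = [[None] * width for _ in range(height)]
    for frame, (dx, dy) in zip(frames, offsets):
        ox, oy = dx - min_x, dy - min_y
        for y, row in enumerate(frame):
            for x, pixel in enumerate(row):
                # First observation wins; a per-pixel median over all frames
                # would better suppress moving foreground objects.
                if canvas[oy + y][ox + x] is None:
                    canvas[oy + y][ox + x] = pixel
    return canvas


if __name__ == "__main__":
    # Two 2x2 frames; the camera pans right by one pixel between them.
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    print(build_mosaic([a, b], [(0, 0), (1, 0)]))  # [[1, 2, 6], [3, 4, 8]]
```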
In the area indicated by the letter I there are five frames, one for each key image, which give information about camera movements. Each frame contains six small circles (simulating LEDs) that show the main camera movements contained in the microsegment: when a circle is bright red, the corresponding movement is present; otherwise it is absent. Only 6 of the 14 camera movements are represented, to avoid information overload. For a more detailed report of the camera movements, click the ^ button and a window like the following will appear.
In this diagram each kind of movement in the microsegment is highlighted with a colour, and a measure of its magnitude is shown.
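The LED display can be thought of as thresholding the per-movement magnitudes of a microsegment and showing only a fixed subset of them. The page does not list the 14 movement classes, which 6 are displayed, or the threshold used, so the movement names and the value `0.1` in this Python sketch are assumptions.

```python
# Hypothetical names: the page does not list the 14 movement classes or which 6 are shown.
LED_MOVEMENTS = ("pan_left", "pan_right", "tilt_up", "tilt_down", "zoom_in", "zoom_out")


def led_states(magnitudes: dict[str, float], threshold: float = 0.1,
               shown: tuple[str, ...] = LED_MOVEMENTS) -> dict[str, bool]:
    # A LED is lit when the movement's magnitude exceeds the (assumed) threshold;
    # movements outside the displayed subset are ignored.
    return {m: magnitudes.get(m, 0.0) > threshold for m in shown}


def render_leds(states: dict[str, bool]) -> str:
    # Text rendering: "*" for a lit LED, "." for an unlit one.
    return "".join("*" if on else "." for on in states.values())


if __name__ == "__main__":
    # "roll_clockwise" stands for one of the movements not shown on the LEDs.
    motion = {"pan_left": 0.8, "zoom_in": 0.05, "roll_clockwise": 0.9}
    print(render_leds(led_states(motion)))  # *.....
```

The detailed diagram opened by the ^ button would correspond to showing the full `magnitudes` dictionary instead of the thresholded subset.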
Clearly this level of detail is not useful for an ordinary end user, but it can be useful for an operator who wants to view only the microsegments containing, for instance, a pan-left component (in a soccer match, for example, this can help find segments containing a player in fast movement).
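The operator query described above is essentially a filter on the camera-motion descriptors. A minimal Python sketch, assuming that each microsegment stores a dictionary of movement magnitudes (the `Microsegment` class, movement names and frame numbers are hypothetical):

```python
from dataclasses import dataclass, field


@dataclass
class Microsegment:
    seg_id: str
    start_frame: int
    end_frame: int
    motion: dict[str, float] = field(default_factory=dict)  # movement -> magnitude


def with_movement(segments: list[Microsegment], movement: str,
                  min_magnitude: float = 0.0) -> list[Microsegment]:
    # Keep the microsegments whose given movement exceeds min_magnitude,
    # sorted by that magnitude, strongest first.
    hits = [s for s in segments if s.motion.get(movement, 0.0) > min_magnitude]
    return sorted(hits, key=lambda s: s.motion[movement], reverse=True)


if __name__ == "__main__":
    segs = [
        Microsegment("m1", 0, 49, {"pan_left": 0.3}),
        Microsegment("m2", 50, 99, {"zoom_in": 0.7}),
        Microsegment("m3", 100, 149, {"pan_left": 0.9, "zoom_in": 0.2}),
    ]
    print([s.seg_id for s in with_movement(segs, "pan_left")])  # ['m3', 'm1']
```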
Other ordering keys are available depending on the context: to see the available ordering keys, click the E button and the following window will appear.
In this case the segments can be ordered according to their dominant hue, computed over a key frame. This ordering key can be applied to the shots of the Portuguese news programme 'Jornal da Noite': in this way it is possible, for example, to cluster all the shots of the newsreader regardless of their temporal position in the multimedia stream, as shown in the figure below.
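The page does not say how the dominant hue is computed. One plausible approach, sketched here in Python as an assumption rather than the ToCAI method, is to build a coarse hue histogram over the key frame's sufficiently saturated and bright pixels and take the centre of the most populated bin. Sorting shots by this value brings shots with similar colour content together, as with the repeated studio shots of a news programme. The bin count and saturation/value thresholds are arbitrary illustrative choices.

```python
import colorsys
from collections import Counter
from typing import Iterable, Optional

RGB = tuple[int, int, int]


def dominant_hue(pixels: Iterable[RGB], bins: int = 12,
                 min_sat: float = 0.2, min_val: float = 0.2) -> Optional[float]:
    """Dominant hue of a key frame, in degrees (centre of the most populated bin).

    Nearly grey or very dark pixels carry no reliable hue and are skipped;
    returns None if no pixel qualifies.
    """
    counts = Counter()
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        if s >= min_sat and v >= min_val:
            counts[int(h * bins) % bins] += 1
    if not counts:
        return None
    best_bin, _ = counts.most_common(1)[0]
    return (best_bin + 0.5) * 360 / bins


def order_by_hue(key_frames: dict[str, list[RGB]]) -> list[str]:
    """Sort shot ids by the dominant hue of their key frame; achromatic shots go last.

    Python's sort is stable, so shots with the same hue keep their original order.
    """
    hues = {shot: dominant_hue(px) for shot, px in key_frames.items()}
    return sorted(hues, key=lambda shot: (hues[shot] is None,
                                         hues[shot] if hues[shot] is not None else 0.0))


if __name__ == "__main__":
    red, blue, grey = (200, 30, 30), (30, 30, 200), (128, 128, 128)
    shots = {"shot_1": [blue] * 10, "shot_2": [red] * 10,
             "shot_3": [grey] * 10, "shot_4": [red] * 6 + [blue] * 4}
    print(order_by_hue(shots))  # ['shot_2', 'shot_4', 'shot_1', 'shot_3']
```

Hue is circular, so a linear sort places reds at both ends of the ordering; a real tool might rotate the origin or cluster by bin instead.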
Another example is shown in the next screenshot. Ordering the shots by a measure of their loudness yields a meaningful arrangement: the first frame shows a key frame of Morientes' first goal, the second shows the beginning of the match and the third shows Morientes' second goal. In this way a potentially interesting ordering is obtained entirely from automatically extracted features.
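Likewise, the loudness measure is not specified on this page. As a simple stand-in, the Python sketch below uses the RMS level of each shot's audio in dBFS (not a perceptual loudness model) and sorts shots loudest first. Shot names and sample values are invented for illustration.

```python
import math
from typing import Sequence


def rms_dbfs(samples: Sequence[float]) -> float:
    """RMS level of a shot's audio in dB relative to full scale (samples in [-1, 1]).

    Silence (or an empty shot) gives -inf, so it sorts as the quietest.
    """
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def order_by_loudness(shots: dict[str, Sequence[float]]) -> list[str]:
    # Loudest shots first.
    return sorted(shots, key=lambda shot: rms_dbfs(shots[shot]), reverse=True)


if __name__ == "__main__":
    shots = {
        "kickoff": [0.2, -0.2] * 500,
        "goal_1": [0.8, -0.8] * 500,   # e.g. crowd noise after a goal
        "replay": [0.05, -0.05] * 500,
    }
    print(order_by_loudness(shots))  # ['goal_1', 'kickoff', 'replay']
```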
The ToCAI application has been developed at the University of Brescia by the Signals and Communications group (Riccardo Leonardi, Pierangelo Migliorati, Lorenzo Rossi, Nicola Adami, Alessandro Bugatti). The development environment is Visual Basic 6.0, the player is Windows Media Player 7.0 and the metadata are stored in an Access database.
The shot segmentation algorithms have been implemented at our University under Linux.
Microsegmentation and camera motion extraction have been implemented in the LEP laboratories.
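For readers who want to reproduce the storage side, the hierarchical segmentation maps naturally onto a single self-referencing table. The sketch below uses Python's built-in `sqlite3` module as a stand-in for the Access database; the schema, labels and frame numbers are hypothetical, since the actual ToCAI tables are not documented here. A recursive query retrieves a segment together with all the segments beneath it, the kind of query needed to expand a branch of the ToC tree.

```python
import sqlite3

# Hypothetical schema: the actual ToCAI Access tables are not documented on this page.
SCHEMA = """
CREATE TABLE segment (
    id          INTEGER PRIMARY KEY,
    parent_id   INTEGER REFERENCES segment(id),
    level       TEXT NOT NULL
                CHECK (level IN ('program', 'scene', 'shot', 'microsegment')),
    label       TEXT,
    start_frame INTEGER NOT NULL,
    end_frame   INTEGER NOT NULL
);
"""

# Recursive query returning a segment and everything below it, level by level.
SUBTREE = """
WITH RECURSIVE sub(id, level, label, depth) AS (
    SELECT id, level, label, 0 FROM segment WHERE id = ?
    UNION ALL
    SELECT s.id, s.level, s.label, sub.depth + 1
    FROM segment AS s JOIN sub ON s.parent_id = sub.id
)
SELECT id, level, label, depth FROM sub ORDER BY depth, id
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def subtree(conn: sqlite3.Connection, root_id: int) -> list[tuple]:
    return conn.execute(SUBTREE, (root_id,)).fetchall()


if __name__ == "__main__":
    conn = open_db()
    conn.executemany(
        "INSERT INTO segment VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, None, "program", "Program", 0, 999),
            (2, 1, "scene", "San Sebastian church", 600, 999),
            (3, 2, "shot", "Shot 17", 600, 749),
            (4, 3, "microsegment", None, 600, 679),
            (5, 3, "microsegment", None, 680, 749),
        ],
    )
    for row in subtree(conn, 2):
        print(row)
```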