4.2 Storage System Technologies

4.2.1 External Interface

The Interconnect System is the gateway for applications running on clients like DVB Set Top Boxes or PCs towards the SMASH Storage Device working as a remote server.

Figure 2: The SMASH Storage Device as a Remote Server for Applications

The IEEE 1394 serial bus, which fulfils all the requirements of the SMASH storage device and its applications, was chosen as the interconnect system for the Combo. At the beginning of the project very few devices with 1394 interfaces were on the market, and those were designed for just a single application. Since then more and more devices with an IEEE 1394 interface have been announced or launched, so from today’s point of view the choice made at the beginning of the project has proven to be the right one.

The 1394 digital bus as specified by IEEE provides only the basic layers for data transport: the "physical", "link" and "transaction" layers. It made no provision for the needs of real-time data streams such as DVB (MPEG2 TS) or DVC. An extension to meet these "streaming" requirements has meanwhile been standardized as IEC 61883. This standard adds another layer on top of IEEE 1394, the "Common Isochronous Packet" (CIP), which provides mechanisms to adapt various data formats to 1394. Applying these mechanisms to the project applications and their real-time stream formats was one of the basic tasks of the project. As a result, one important building block of an "open interface" has been developed and proven by a reference implementation.
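The adaptation step for MPEG 2 transport streams can be illustrated with a minimal sketch. It assumes, as we understand the MPEG2-TS part of IEC 61883, that each 188-byte transport stream packet is prefixed with a 4-byte header carrying a 25-bit time stamp (13-bit cycle count plus 12-bit cycle offset of the 1394 cycle timer), which gives a 192-byte source packet. The function name and the checks are illustrative only and are not taken from the reference implementation.

```python
import struct

TS_PACKET_SIZE = 188   # size of one MPEG 2 transport stream packet
TS_SYNC_BYTE = 0x47    # first byte of every transport stream packet


def make_source_packet(ts_packet: bytes, cycle_count: int, cycle_offset: int) -> bytes:
    """Prefix one TS packet with a 4-byte time stamp header (assumed layout)."""
    if len(ts_packet) != TS_PACKET_SIZE or ts_packet[0] != TS_SYNC_BYTE:
        raise ValueError("not a valid MPEG 2 TS packet")
    # The 1394 cycle timer counts 8000 cycles per second, 3072 ticks per cycle.
    if not (0 <= cycle_count < 8000 and 0 <= cycle_offset < 3072):
        raise ValueError("time stamp out of range")
    # 13-bit cycle count followed by 12-bit cycle offset; upper 7 bits stay zero.
    timestamp = (cycle_count << 12) | cycle_offset
    return struct.pack(">I", timestamp) + ts_packet
```

The receiver can use the time stamp to restore the original packet timing before the stream is passed to a decoder; the packing of source packets into isochronous bus packets is not shown here.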

For verification a demonstrator based on a commercial DVB Set Top Box and a PC based storage device has been set up (see Fig. 3). In the course of this project the real-time interface based on 1394 has been designed for the demonstrator to support MPEG 2 transport stream for the DVB recorder application.

Figure 3: Demonstrator for IEEE 1394 based DVB/MPEG transmission

The second building block required for an "open interface" is the control module. As a first step, basic commands for remotely controlling the SMASH storage have been specified and implemented. Experiments with the demonstrator have successfully proven the feasibility of using the IEEE 1394 Digital Interface as the Interconnect System for the SMASH Combo.

This has led to the deliverable "Specification of the Interconnect System", which defines the Application Programming Interfaces (API) of the "open interface" as used for the SMASH Combo. The API allows application programmers to use the features of the SMASH device without being bothered by details of its technical implementation.

To support the project applications (see section 4.1) the SMASH server should be able to handle the following types of information or files: text, graphics, animation, digital audio and video.

To access these file types the application programmer can use function calls of the API. To cope with the different types of files, the API consists of the following main groups:

The low level API can handle complete tapes, partitions of a tape and a hard disk as a cache for tape partitions. In addition to basic (hard-)disk commands, e.g. open and close files, read and write files, it provides commands for accessing information from tape, e.g. lookup, restore, archive and archive(FileList).

The file system API is provided by classes in the java.io package. This also makes it possible to handle new file types defined in the future for new applications.

The streams API can handle real-time AV data like MPEG1, AVI, Quicktime, or MPEG2 partial transport stream. The following commands have been implemented: Play, Pause, Resume, Stop, Record and Status.

The operational management API has been introduced to deal with application specific implementations of the generic commands, e.g. OpenSession, CloseSession, List, Free, Mount, Unmount, Initialize.
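As an illustration of how the streams commands could interact, the following minimal sketch models a stream session as a small state machine. The command names are those listed above; the states, the allowed transitions and the class name StreamSession are assumptions made for this example and are not taken from the SMASH API specification. The sketch is written in Python for brevity and does not reproduce the project's Java interfaces.

```python
from enum import Enum


class State(Enum):
    STOPPED = "stopped"
    PLAYING = "playing"
    PAUSED = "paused"
    RECORDING = "recording"


# (current state, command) -> next state; this transition table is an assumption.
TRANSITIONS = {
    (State.STOPPED, "Play"): State.PLAYING,
    (State.STOPPED, "Record"): State.RECORDING,
    (State.PLAYING, "Pause"): State.PAUSED,
    (State.PAUSED, "Resume"): State.PLAYING,
    (State.PLAYING, "Stop"): State.STOPPED,
    (State.PAUSED, "Stop"): State.STOPPED,
    (State.RECORDING, "Stop"): State.STOPPED,
}


class StreamSession:
    """Hypothetical session object tracking the state of one AV stream."""

    def __init__(self):
        self.state = State.STOPPED

    def command(self, name: str) -> State:
        """Apply a streams command and return the resulting state."""
        if name == "Status":
            return self.state  # Status only reports, it never changes state
        key = (self.state, name)
        if key not in TRANSITIONS:
            raise ValueError(f"{name} not allowed while {self.state.value}")
        self.state = TRANSITIONS[key]
        return self.state
```

Keeping the allowed transitions in one table makes it easy to reject commands that make no sense in the current state, for example Play while a recording is in progress.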


Figure 4: Present (hatched blocks) and proposed interconnect for the final SMASH storage device

4.2.2 Prototype

An important part of the SMASH project has been the realisation of a realistic experimental set-up. The experimental SMASH system serves the DVB-VCR, Web-VCR and Remote Education applications, that are described in section 4.3. The technical concept of this storage server is described below.

In a SMASH application, a multimedia client presents the user with an intuitive interface for high-level control over his data, including video, audio, hypertext documents and more. In our current system the SMASH client software runs either on set top boxes or on PCs. This client communicates with the data server, or Combo, which does the actual data handling. A single SMASH Combo can serve several clients. In a SMASH system the tasks of the Combo server include:

The client’s main tasks are

Figure 5: Client – Server approach in general Combo software architecture, used in both DVB-VCR and Remote Education applications

The basic software architecture of our client / server SMASH applications is graphically presented in figure 5. The client runs an application that uses the Combo remotely through its API. On the Combo, the server software handles the communication with the clients and passes commands on to a controller that does the actual work: scheduling data transport, managing device I/O, etc.
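The division of work between server software and controller can be sketched as follows. This is a simplified illustration, not the Combo implementation: the text-based request format and the classes Server and Controller are assumptions, and only the command names OpenSession and CloseSession come from the API description in section 4.2.1.

```python
class Controller:
    """Does the actual work; in this sketch it only tracks open sessions."""

    def __init__(self):
        self.sessions = set()
        self.next_id = 1

    def open_session(self) -> int:
        session_id = self.next_id
        self.next_id += 1
        self.sessions.add(session_id)
        return session_id

    def close_session(self, session_id: int) -> str:
        self.sessions.discard(session_id)
        return "OK"


class Server:
    """Handles client requests and passes commands on to the controller."""

    def __init__(self, controller: Controller):
        self.handlers = {
            "OpenSession": lambda args: controller.open_session(),
            "CloseSession": lambda args: controller.close_session(int(args[0])),
        }

    def handle(self, request: str):
        parts = request.split()
        if not parts or parts[0] not in self.handlers:
            return "ERROR unknown command"
        return self.handlers[parts[0]](parts[1:])
```

Because the server only parses and dispatches, the transport used between client and server (Ethernet today, IEEE 1394 later) can change without touching the controller.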

In the beginning of the project we spent sufficient time analysing the requirements and defining the type of experiments we wanted to carry out with the prototype. We opted for a flexible system that offered the following possibilities: easy experimenting with the system, support for user trials, the ability to suggest improvements to the concept, easy and flexible hardware interfacing of several special functions, and an easy software development environment. Hence a PC platform was the obvious choice for our SMASH prototype.

Figure 6 shows the basic hardware architecture of the system. The server PC runs the Linux operating system, a choice based on its multi-tasking capabilities and its relatively easy hardware control. It contains the hard disk and a Tandberg MLR-3 tape drive, and is connected to set-top boxes and dedicated hardware to deal with DVB video streams coming in and going out. Keyframe extraction is handled on a dedicated TriMedia™ board. This board should also be part of the server, but unfortunately Linux support was not available in time, so we decided to connect the board to the client PC instead, as a temporary solution.

The client PC runs Windows 95, because a TriMedia™ driver is available only for this OS. The TriMedia™ board is used for extracting Electronic Program Guide (EPG) information and keyframes from a DVB transport stream in the DVB-VCR application. The client sends this extracted meta data to the MMDB at the server. For communication between the two PCs we use an Ethernet link, which will be replaced by IEEE 1394 when boards and drivers become available.

The requirements for the Combo server differ between the DVB-VCR, Web-VCR and RE applications. The Web-VCR and RE applications require just a hard disk and a tape drive; they need neither DVB video streams nor keyframe extraction. The hard disk file system is exported to a client using Samba or NFS.

Figure 6: Hardware architecture of the Combo system; the TriMedia™ will be moved from the client PC to the server PC in the near future. TriMedia™, MPEG-2 I/O and STB’s are not required for Remote Education

Figure 7: SMASH Combo server with settop boxes, tape and disk drives and a simple graphical user interface of the DVB-VCR demonstrator: The key-frames represent the contents of the tape.

In the DVB-VCR application, DVB streams are received from satellite by a dish antenna connected to a set top box (STB), from which the MPEG 2 data is collected by a custom-made PCI board for recording and playback. During playback, the stored MPEG 2 data is passed back to an STB for decoding and display on a conventional TV. For processing multiple streams (such as simultaneous recording and playback), a second TV/STB/MPEG I/O chain is present. Figure 7 shows a photograph of the DVB-VCR system.

4.2.3 Visual Navigation

The SMASH storage device allows the recording of huge amounts of information. In particular, in the DVB-VCR application, large volumes of MPEG compressed video will be stored. The retrieval of specific pieces of video material or the browsing of the available stored video therefore becomes time consuming, if not inefficient. In the DVB-VCR application enhanced user interfaces give textual and visual cues on the contents and structure of the recorded programs so that browsing and navigation in stored programs becomes more efficient and user friendly [15]. An impression of such a user interface is given in Figure 8.

Figure 8: Impression of User Interface for the Video Browsing and Navigation.

Since the large amounts of stored MPEG compressed video programs are not easily accessible for browsing and navigation, the user interfaces should rely not on the actually recorded video streams, but on an abstracted version thereof [2, 7, 8, 12]. The solution chosen was to extract the relevant meta-information already at recording time and to store it separately in a multimedia database for later use. This multimedia database is itself stored on the hard disk of the Combo system, so that random and quick access to the abstracted information is possible [11, 26].

The meta-information that has been studied in most detail is the key frame. Other meta-information currently being retrieved and stored on-line is DVB-SI and subtitles. Key frames are those frames in a television program that represent certain relevant actions. The steps taken to obtain the key frames on-line are (see Figure 9): MPEG-TS demultiplexing, partial decoding of I-frames, shot change detection using these partially decoded I-frames, and finally selection of key frame(s) from a shot [4, 6, 17]. The selected frames are stored in a multimedia database, together with proper references to the stored MPEG streams and any additional meta-information. Although the detection and extraction of key frames is not as computationally intensive as full decoding of MPEG video, the initially developed software-only implementation (intended mainly for design and testing purposes) ran significantly slower than real time. Real-time key frame detection and extraction on live DVB broadcasts has been realized in the project by using the Philips TriMedia video signal processor. This hardware implementation also allowed for large-scale performance evaluation of the selected detection and extraction algorithms. Table 1 summarizes the performance of one of the crucial steps, namely the real-time shot change detector, on various types of programs.
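The last two steps of this chain, shot change detection and key frame selection, can be illustrated with a minimal sketch. It assumes that each partially decoded I-frame is available as a flat list of grey values (0 to 255) and that a shot change is declared when the normalised histogram difference between successive I-frames exceeds a threshold. The histogram measure, the threshold and the rule "first frame of each shot" are assumptions for this example; the algorithms actually used in the project are described in [4, 6, 17].

```python
def histogram(frame, bins=16, max_value=256):
    """Count grey values of one frame in equally wide bins."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // max_value] += 1
    return counts


def histogram_distance(a, b):
    """Normalised histogram difference of two equally sized frames, in [0, 1]."""
    ha, hb = histogram(a), histogram(b)
    return sum(abs(x - y) for x, y in zip(ha, hb)) / (2 * len(a))


def detect_shot_changes(frames, threshold=0.5):
    """Return the indices i at which frames[i] starts a new shot."""
    return [i for i in range(1, len(frames))
            if histogram_distance(frames[i - 1], frames[i]) > threshold]


def select_key_frames(frames, threshold=0.5):
    """Select the first frame of every shot as its key frame (simplification)."""
    if not frames:
        return []
    return [0] + detect_shot_changes(frames, threshold)
```

Working on partially decoded I-frames keeps the amount of data per comparison small, which is what makes a real-time implementation on a video signal processor plausible.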

Figure 9: System view on key information (key frames) extraction and organization

 

Sequence name      | Jurassic Park | Monty Python  | Dutch News | Commercials | Football
Genre              | Movie         | Comedy series | News       | Commercials | Sport
Duration (minutes) | 116           | 23            | 16         | 2           | 5
KF dimensions      | 90x72         | 66x72         | 66x72      | 66x72       | 66x72
GOP                | 12            | 12            | 12         | 12          | 12
I-frames           | 14503         | 2891          | 1990       | 247         | 631
Key Frames         | 1084          | 179           | 181        | 43          | 39
% of KF            | 7.47          | 6.19          | 9.1        | 17.42       | 6.18
Average, KF every  | 6.42 s        | 7.75 s        | 5.27 s     | 2.75 s      | 7.76 s
Missed KF          | 57            | 8             | 1          | -           | 0
False alarms       | 24            | 12            | 12         | -           | 3
% of missed KF     | 5.26          | 4.47          | 0.55       | -           | 0.00
% of false alarms  | 2.21          | 6.70          | 6.63       | -           | 7.69

Table 1: Evaluation results of real-time shot change detection.
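The derived rows of Table 1 follow directly from the counts. The short calculation below reproduces them for the first column (Jurassic Park). That the percentages of missed key frames and false alarms are taken relative to the number of key frames is inferred from the table values; the function name is only illustrative.

```python
def key_frame_stats(duration_min, i_frames, key_frames, missed, false_alarms):
    """Recompute the derived rows of Table 1 from the raw counts."""
    return {
        "pct_kf": 100.0 * key_frames / i_frames,             # % of KF
        "seconds_per_kf": 60.0 * duration_min / key_frames,  # Average, KF every
        "pct_missed": 100.0 * missed / key_frames,           # % of missed KF
        "pct_false": 100.0 * false_alarms / key_frames,      # % of false alarms
    }


# Counts for "Jurassic Park" taken from Table 1.
stats = key_frame_stats(116, 14503, 1084, 57, 24)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

Rounded to two decimals this gives 7.47 % key frames, one key frame every 6.42 s, 5.26 % missed key frames and 2.21 % false alarms, in agreement with the table.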

After abstracting the recorded program, postprocessing steps can be carried out on the key frames in order to organize them logically. It is necessary to impose such a meta-structure on the key frames in order to reduce the number of key frames to be used in the user interface (see entry "Key Frames" in Table 1), and in order to break container-like programs (broadcast news, sports programs, etc.) into self-contained items. Two types of programs have been considered in this respect, namely movies (in which case the key frames are hierarchically organized to obtain an episode-like structure) and broadcast news (in which case the key frames are used to delineate the news items).

The method for organizing key frames in episodes (or logical story units – LSU’s) works as follows [18, 33]. Within a single movie episode there is a high temporal consistency of the visual content of key frames. It can therefore be expected that within an episode every now and then similar visual content elements (scenery, background, people, faces, dresses, specific patterns, etc.) appear and some of them most likely repeat. Clearly the matching of the content may not happen immediately in successive video shots, but most likely there will be a match within a certain time interval. Based on this observation, an LSU can be defined as a series of temporally contiguous video shots, characterized by overlapping links that connect shots with similar visual content elements. Figure 10 illustrates the application of this definition in a detection algorithm. Figure 11 shows a hierarchical structure in which the key frames are organized under the LSU structure.
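The LSU definition can be turned into a simple segmentation procedure, sketched below under stated assumptions: each shot is represented by some descriptor of its key frame(s), a user-supplied function similar(a, b) decides whether two shots share visual content, and links are only searched within a fixed look-ahead window of shots. The function name, the window size and the greedy search are choices made for this illustration; the actual method is described in [18, 33].

```python
def segment_lsus(shots, similar, window=3):
    """Group shot indices into LSUs connected by overlapping similarity links."""
    lsus, current, reach = [], [], -1
    for i in range(len(shots)):
        # No link from an earlier shot reaches shot i: the current LSU ends here.
        if current and i > reach:
            lsus.append(current)
            current = []
        current.append(i)
        reach = max(reach, i)
        # Link shot i to the most distant similar shot inside the window.
        for j in range(min(len(shots) - 1, i + window), i, -1):
            if similar(shots[i], shots[j]):
                reach = max(reach, j)
                break
    if current:
        lsus.append(current)
    return lsus


# Toy example: letters stand for visual content, equal letters are "similar".
shots = ["A", "B", "A", "B", "C", "D", "C"]
print(segment_lsus(shots, lambda a, b: a == b))
```

In the toy example the links A-A and B-B overlap and the link C-C starts after both have ended, so the shots fall into two units, [0, 1, 2, 3] and [4, 5, 6].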

Figure 10: LSU characterized by overlapping links connecting similar video shots.

Figure 11: Key frames from a movie represented using LSUs.

For broadcast news we have used the following approach [36, 37]. News items are often encapsulated between (and sometimes mixed with) shots of the news reader or anchor person. The anchor person shots are visually characterized by a studio background and by one or two news readers appearing separately or together, with some possible variations of the camera angle and changes in the news icons appearing in a corner of the screen (see figure 12). These shots can be static or dynamic (containing some camera operations like zooming or panning) and share a certain percentage of identical or similar visual features. After detecting shot boundaries, the presence of a news reader can be detected in the key frames, from which the structure of the news program (including the news topic boundaries) can be derived.
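Once the anchor person shots are known, delineating the news items is straightforward. The sketch below assumes that the detector has already produced one flag per shot (True for an anchor person shot) and applies a deliberately simplified rule: a new item starts at every anchor person shot that follows a non-anchor shot. The detection of the anchor person itself is not shown, and the rule ignores the mixed cases mentioned above; see [36, 37] for the actual approach.

```python
def split_news_items(is_anchor):
    """Group shot indices into news items that each start with anchor shot(s)."""
    items = []
    for i, anchor in enumerate(is_anchor):
        starts_item = anchor and (i == 0 or not is_anchor[i - 1])
        # Shots before the first anchor shot are collected in an item of their own.
        if starts_item or not items:
            items.append([])
        items[-1].append(i)
    return items


# Seven shots: anchor, report, report, anchor, anchor, report, anchor.
flags = [True, False, False, True, True, False, True]
print(split_news_items(flags))
```

For the seven shots above the result is [[0, 1, 2], [3, 4, 5], [6]]: consecutive anchor shots stay together with the report that follows them.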

Figure 12: Structure of a news program. Different colors of blocks belonging to anchorperson shots indicate their different types

4.2.4 Copy Protection

Though there are many advantages to the domestic availability of large capacity storage devices such as the SMASH Combo server, one major disadvantage is that illegal copying of high-quality broadcast material becomes very easy. It is therefore necessary not only to develop technical solutions to the storage problems, but at the same time to investigate and develop ways of preventing illegal copying, i.e., copy protection methods [3, 13, 24]. The importance of copy protection systems is stressed by representatives of the consumer electronics and motion picture industries, who have agreed to seek legislation concerning digital video recorders that would protect both intellectual property and consumers’ rights. A recommendation has been submitted to the US Congress that allows copyright proprietors to prohibit copying from pay-per-view, video-on-demand programming and pre-recorded media. In the European context, DAVIC is looking into standardization of copy protection mechanisms.

Figure 13: Concept for copy protection in a SMASH.

For the SMASH Combo server a copy protection system for MPEG compressed video has been developed [16, 29]. The copy protection method is based on labeling techniques and allows only one copy of digital multimedia (video) data (Figure 13). Labeling or watermarking of visual information is based on embedding hidden signals (labels) in the pictorial information, such that they cannot easily be removed without seriously affecting the quality of the labeled information. At recording time, the MPEG streams to be recorded are checked for the presence of a watermark. If the watermark is present, the Combo device refuses to record that particular MPEG stream. If the watermark is not present, the Combo device will record the MPEG stream, but only after embedding the watermark. This prevents further copying of that information by other Combo systems.
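The copy-once rule itself is independent of the watermarking technique and can be written down in a few lines. In the sketch below the watermark is replaced by a toy marker appended to the byte stream, purely to make the example runnable; a real label is hidden in the video data and, unlike this marker, does not change the size of the stream. All names are hypothetical.

```python
MARK = b"<WM>"  # toy stand-in for a hidden watermark (assumption of this sketch)


def has_watermark(stream: bytes) -> bool:
    """Toy detector: the stream counts as labeled if it ends with the marker."""
    return stream.endswith(MARK)


def embed_watermark(stream: bytes) -> bytes:
    """Toy embedder: append the marker to the stream."""
    return stream + MARK


def record(stream: bytes, storage: list) -> bool:
    """Apply the copy-once rule; return True if the stream was recorded."""
    if has_watermark(stream):
        return False  # stream is already a copy: refuse to record
    storage.append(embed_watermark(stream))  # label first, then store
    return True
```

A broadcast stream is recorded once, with the label embedded; feeding the stored copy to a second Combo makes record return False, so no second-generation copy is made.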

There are several requirements which the labeling technique should satisfy. The labeling technique should not be too complex and operate in real-time on compressed data. Its presence should not affect the size and the quality of the multimedia data. The strength of the copy protection scheme relies on the robustness of the labeling technique. In other words: it should be impossible to remove the label without affecting the quality of the multimedia data significantly.

Two new watermarking algorithms have been developed, specifically intended for MPEG video copy protection, together with a model that allows for the optimal tuning of the method’s parameters. At the same time exploratory investigations have been carried out on the likely attacks that "hackers" could carry out on watermarked information so as to remove the watermark and break the copy protection, and on the strength of existing watermarking techniques against these attacks [31, 34].

The most promising watermarking technique that we have developed is based on discarding parts of the (quantized) block-DCT coefficients in the (compressed MPEG I-frames) video stream [32]. Depending on whether a "0" or "1" bit needs to be embedded, the high frequency components of the top-half or bottom-half of an N by N set of (8x8) DCT-blocks are discarded. This difference in "energy" between top and bottom half of the blocks can be detected by the watermark extractor. In order to avoid unnecessary visual artefacts, the DCT blocks in an MPEG I-frame are pseudo-randomly shuffled (Figure 14). The pseudo random generator seed is the "key" to the decoding of the watermark. In case the process is carried out on compressed streams, only partial decoding is needed for the labeling and no re-encoding is required (except for table lookups and re-multiplexing).
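The principle can be illustrated with a simplified sketch. It assumes that every 8x8 DCT block is given as a list of 64 quantized coefficients in zigzag order, that a fixed number of blocks carries one label bit, and that after pseudo-random shuffling the first half of these blocks plays the role of the "top half" and the second half that of the "bottom half". The cutoff index, the mapping of bit values to halves and all function names are assumptions for this example; the actual algorithm and its parameter tuning are described in [32].

```python
import random

CUTOFF = 10  # zigzag index from which coefficients count as high frequency (assumed)


def shuffled_regions(num_blocks, blocks_per_bit, seed):
    """Pseudo-randomly shuffle block indices and group them, one group per bit."""
    order = list(range(num_blocks))
    random.Random(seed).shuffle(order)  # the seed acts as the key
    return [order[i:i + blocks_per_bit]
            for i in range(0, num_blocks - blocks_per_bit + 1, blocks_per_bit)]


def high_energy(blocks, indices):
    """Sum of squared high-frequency coefficients over the given blocks."""
    return sum(c * c for i in indices for c in blocks[i][CUTOFF:])


def embed(blocks, bits, seed, blocks_per_bit=8):
    """Discard high frequencies in one half of each region, depending on the bit."""
    out = [list(b) for b in blocks]
    half = blocks_per_bit // 2
    for bit, region in zip(bits, shuffled_regions(len(out), blocks_per_bit, seed)):
        discard = region[:half] if bit == 0 else region[half:]
        for i in discard:
            out[i][CUTOFF:] = [0] * (len(out[i]) - CUTOFF)
    return out


def extract(blocks, num_bits, seed, blocks_per_bit=8):
    """Read the label back by comparing the energy of the two halves."""
    half = blocks_per_bit // 2
    bits = []
    for region in shuffled_regions(len(blocks), blocks_per_bit, seed)[:num_bits]:
        top = high_energy(blocks, region[:half])
        bottom = high_energy(blocks, region[half:])
        bits.append(0 if top < bottom else 1)
    return bits
```

Because coefficients are only discarded and never added, the labeled stream cannot grow in size. Regions without any high-frequency content carry no detectable energy difference in this sketch, which is one reason why the number of blocks per bit and the cutoff need tuning in practice.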

Figure 14: Illustration of the pseudo-random block shuffling and the result of embedding a label string: Left: original images, Right: labeled images, Middle: difference images

