Watermarking based on motion vectors
TECHNICAL FIELD The present invention generally relates to the field of watermarking of media signals, preferably video signals, for instance coded according to the MPEG coding scheme. More particularly, the present invention is directed towards a method, device and computer program product for embedding additional data in a media signal, as well as a media signal processing device having such a device for embedding additional data.
DESCRIPTION OF RELATED ART It is well known to watermark media signals in order to protect the rights of content owners against piracy and fraud. A watermark is here normally a pseudo-random noise code that is inserted in the media signal. In the watermarking process it is necessary that the watermark is not perceptible. A watermark that is embedded in for instance a video signal should then not be visible to an end user. It should however be possible to detect the watermark reliably using a watermark detector. At the same time it is often desirable to be able to put as much energy as possible into the watermarking process in order to provide better watermarks, since a stronger watermark is more robust to different types of attacks. One known watermarking scheme for a video signal is described in WO-02/060182. Here a watermark is embedded in an MPEG video signal. An MPEG signal is received and comprises VLC (Variable-Length Coding) coded quantised DCT (Discrete Cosine Transform) samples of a video stream divided into frames, where each frame includes a number of blocks of pixel information. In this watermarking scheme the quantised DCT samples are obtained from the VLC coded stream and the watermark is directly embedded in this domain. A watermark is here embedded in the quantised DCT components of a block of size 8x8 under the control of a bit-rate controller, such that only the smallest DCT levels are modified by ±1 into a zero value. These values are furthermore only modified if the bit rate of the stream is not increased.
It would however be advantageous if the watermarking according to this scheme could be improved to make the watermark more robust.
SUMMARY OF THE INVENTION It is therefore an object of the present invention to provide a watermarking scheme that is more robust. According to a first aspect of the present invention, this object is achieved by a method of embedding additional data in a media signal comprising a number of frames, where at least one frame comprises motion vectors related to a block of signal samples moved in relation to at least one other frame, comprising the steps of: obtaining a media signal divided into frames having blocks of a number of signal sample values, obtaining a motion vector for a frame associated with a block of signal samples, and modifying additional data coefficients to be embedded in said signal samples of said block in dependence of the motion vector. According to a second aspect of the present invention, this object is also achieved by a device for embedding additional data in a media signal, comprising an embedding unit arranged to: obtain a media signal divided into frames having blocks of a number of signal sample values, obtain a motion vector for a frame associated with a block of signal samples, and modify additional data coefficients to be embedded in said signal samples of said block in dependence of the motion vector. According to a third aspect of the present invention, this object is also achieved by a media signal processing device comprising a device for embedding additional data according to the second aspect. 
According to a fourth aspect of the present invention, this object is also achieved by a computer program product for embedding additional data in a media signal, comprising computer program code, to make a computer do, when said program is loaded in the computer: obtain a media signal divided into frames having blocks of a number of signal sample values, obtain a motion vector for a frame associated with a block of signal samples, and modify additional data coefficients to be embedded in said signal samples of said block in dependence of the motion vector.
Claims 2 and 7 are directed towards measuring the length of the motion vector and modifying the depth of the additional data embedding in dependence of the length. The length of the motion vector is an indication of the speed of movement of an object, which can advantageously be used for varying the embedding strength. According to claim 3, the length of the motion vector is compared with a threshold and the embedding depth of the additional data is increased when the length exceeds this threshold. Because a human viewer has difficulty tracking fast-moving objects, he also has difficulty perceiving additional data embedded in these objects. This allows the strength or embedding depth of the additional data to be increased. If the additional data is a watermark, this feature thus provides a more robust watermark. According to claim 4, the frame is a frame not used as a reference for other frames. Because of this feature the processing required is limited. This feature also ensures that the increased embedding strength provided for a moved object is not retained if the object later stands still. The present invention has the advantage of optimising the energy of the additional data in the signal, which makes the additional data more robust against different types of attacks. The essential idea of the invention is that the motion vector of a block in the media signal is used for deciding the embedding strength for the additional data embedded in that block. These and other aspects of the invention will be apparent from and elucidated with reference to the embodiments described hereinafter.
BRIEF DESCRIPTION OF THE DRAWINGS The present invention will now be explained in more detail in relation to the enclosed drawings, where Fig. 1 schematically shows a number of frames of video information in a media signal, Fig. 2 schematically shows one such frame of video information where a watermark has been provided, where the frame is divided into a number of blocks, Fig. 3 shows an example of a number of luminance levels in the spatial domain for one intraframe coded block,
Fig. 4 shows DCT levels corresponding to the luminance levels in Fig. 3 for the block, Fig. 5 shows the default intra quantizer matrix for the block in Figs. 3 and 4, Fig. 6 shows the scanning of quantised DCT coefficients for obtaining a VLC coded video signal, Fig. 7 shows the default inter quantizer matrix for an interframe coded block, Fig. 8 shows a device for embedding additional data according to the present invention, Fig. 9 shows a flow chart of a method of embedding additional data according to the present invention, and Fig. 10 schematically shows a computer program product comprising computer program code for performing the method according to the invention.
DETAILED DESCRIPTION OF EMBODIMENTS The invention is directed towards the embedding of additional data in a media signal. Such additional data is preferably a watermark. However the invention is not limited to watermarks but can be applied for other types of additional data and also to other fields of use such as transcoding of the media signal. The media signal will in the following be described in relation to a video signal and then an MPEG coded video signal. It should be realised that the invention is not limited to MPEG coding, but other types of coding can just as well be contemplated. A video signal or stream X according to the MPEG standard is schematically shown in Fig. 1. An MPEG stream X comprises a number of transmitted frames or pictures denoted I, B and P. Fig. 1 shows a number of such frames shown one after the other. Under the frames a first line of numbers is shown, where these numbers indicate the display order, i.e. the order in which the information relating to the frames is to be displayed. Below the first line of numbers, there is shown a second line of numbers indicating the transmission and decoding order, i.e. the order in which the frames are received and decoded in order to display a video sequence. Above the frames there are shown arrows that indicate how the frames refer to each other. It should be realised that the stream also includes other information such as overhead information. The different types of frames are divided into I-, B- and P-pictures, where one such picture that is a P-picture is indicated with reference numeral 10. An I-picture is denoted
with reference numeral 11. I-pictures are so-called intraframe coded pictures. These pictures are coded independently of other pictures and thus contain all the information necessary for displaying an image. P- and B-pictures are so-called interframe coded pictures that exploit the temporal redundancy between consecutive pictures and use motion compensation to minimize the prediction error. P-pictures refer to one picture in the past, which previous picture can be an I-picture or a P-picture. B-pictures refer to two pictures, one in the past and one in the future, where the picture referred to can be an I- or a P-picture. Because of this the B-picture has to be transmitted after the pictures it refers to, which leads to the transmission order being different from the display order. The principles of coding will now be described in relation to intracoded blocks, because here the principles of the coding are most clearly seen. In an intracoded picture, i.e. an I-picture, the frame contains a number of pixels, where the luminance and chrominance are provided for each pixel. In the following, the focus will be on the luminance, since watermarks are embedded into this property of a pixel. Each such frame is further divided into 8x8 pixel blocks of luminance values. One such frame 11 is shown in Fig. 2, which shows an object 12 provided in the stream. As an example, there are here provided twelve 8x8 pixel blocks of luminance values, where there are four such blocks in the horizontal direction and three in the vertical. All of the blocks in the figure are furthermore watermarked, which is here indicated with the letter w in order to show that a watermark is embedded in these blocks. It should be noted that watermarks are in general not visible. One of the blocks 14 is highlighted and will be used in relation to the description of the MPEG coding. Fig. 3 shows an example of some luminance values y for the block indicated in Fig. 2.
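The difference between display order and transmission order described above can be illustrated with a short sketch. This is an illustrative reconstruction, not part of the specification; the frame labels and function name are hypothetical:

```python
def decode_order(display_frames):
    """Reorder frames from display order into MPEG transmission/decoding order.

    A B-picture refers to the next reference picture (I or P) in display
    order, so that reference must be transmitted first; the held-back
    B-pictures then follow it.
    """
    out, pending_b = [], []
    for frame in display_frames:
        if frame.startswith("B"):
            pending_b.append(frame)   # hold until the future reference is sent
        else:                         # I- or P-picture: a reference frame
            out.append(frame)
            out.extend(pending_b)
            pending_b = []
    return out + pending_b

# Display order I0 B1 B2 P3 B4 B5 P6 becomes I0 P3 B1 B2 P6 B4 B5:
print(decode_order(["I0", "B1", "B2", "P3", "B4", "B5", "P6"]))
```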
In the process of performing the coding of intracoded blocks, a DCT (Discrete
Cosine Transform) operation is performed on these blocks, resulting in 8x8 blocks of DCT coefficients. Fig. 4 shows such a DCT coefficient block for the block in Fig. 3. The coefficients contain information on the horizontal and vertical spatial frequencies of the input block. The coefficient corresponding to zero horizontal and vertical frequency is called the DC component, which is the coefficient in the upper left corner of Fig. 4. Typically, for natural images, these coefficients are not evenly distributed; the transformation tends to concentrate the energy in the low-frequency coefficients, which are in the upper left corner of Fig. 4.
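The transform step just described can be sketched as a direct (unoptimised) 2-D DCT-II of an 8x8 block. This is a generic textbook formulation, not code from the specification:

```python
import math

def dct2(block):
    """2-D DCT-II of an 8x8 block, as applied to MPEG luminance blocks.

    block: 8x8 list of lists of pixel values.
    Returns an 8x8 list of coefficients; [0][0] is the DC component.
    """
    n = 8
    def c(k):                        # orthonormalisation factor
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = c(u) * c(v) * s
    return coeffs

# A flat block concentrates all its energy in the DC coefficient:
flat = [[100] * 8 for _ in range(8)]
print(round(dct2(flat)[0][0]))   # 800; all AC coefficients are ~0
```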
Thereafter the AC coefficients in the intracoded block are quantised by applying a quantisation step q * Qintra(m, n)/16. Fig. 5 shows the default quantisation values Qintra used here. The quantisation step q can be set differently from block to block and can vary between 1 and 112. After this quantisation the coefficients in the blocks are serialised into a one-dimensional array of 64 coefficients. This serialisation scheme is here a zigzag scheme as shown in Fig. 6, where the first coefficient is the DC component and the last entry represents the highest spatial frequencies in the lower right corner. From the DC component to this last entry the coefficients are connected to each other in a zigzag pattern. The one-dimensional array is then compressed or entropy coded using a VLC
(Variable-Length Code). This is done by providing a limited number of code words based on the array. Each code word denotes a run of zero values, i.e. the number of zero-valued coefficients preceding a non-zero quantised DCT coefficient of a particular level. This leads to the creation of the following line of code words for the values in Fig. 6:
(0,4), (0,7), (1,-1), (0,1), (0,-1), (0,1), (0,2), (0,1), (2,1), (0,1), (0,-1), (0,-1), (2,1), (3,1), (10,1), EOB
where EOB indicates the end of the block. These so-called run/level pairs are then converted to digital values using a suitable coding table. In this way the luminance information has been highly reduced. As mentioned above, an I-frame only comprises intracoded blocks. P- and B-frames include intercoded blocks, where the coefficients represent prediction errors instead. In the overhead information of such a frame there are also provided motion vectors related to the intercoded blocks. It should however be noted that P- and B-frames might also contain intracoded blocks. An intercoded block is, as was mentioned above, handled in a similar manner as an intracoded block when being coded. The difference here is that the DCT coefficients do not represent luminance values but rather prediction errors, which are however treated in the same way as the intracoded coefficients. In the quantisation a quantisation step is applied according to q * Qnon-intra(m, n)/16. Fig. 7 shows the default quantisation values Qnon-intra used here. The quantisation step q can be set differently from block to block and can also here vary between 1 and 112.
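The zigzag scan and the run/level coding described above can be sketched as follows. This is an illustrative reconstruction; for simplicity the whole scanned array is run/level coded here, although in practice the intra DC component is coded separately:

```python
def zigzag_indices(n=8):
    """Return the zigzag scan order for an n x n block, DC component first."""
    return sorted(((i, j) for i in range(n) for j in range(n)),
                  key=lambda p: (p[0] + p[1],                      # anti-diagonal
                                 p[0] if (p[0] + p[1]) % 2 else -p[0]))

def run_level_pairs(scanned):
    """Turn a zigzag-scanned coefficient array into (run, level) pairs:
    each pair gives the number of zeros preceding a non-zero level."""
    pairs, run = [], 0
    for level in scanned:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return pairs + [("EOB",)]          # trailing zeros collapse into EOB

# A sparse block whose zigzag scan starts 4, 7, 0, -1, 1, 0, 0, ...
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[2][0], block[1][1] = 4, 7, -1, 1
scanned = [block[i][j] for (i, j) in zigzag_indices()]
print(run_level_pairs(scanned))   # [(0, 4), (0, 7), (1, -1), (0, 1), ('EOB',)]
```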
As is indicated above, additional information in the form of a watermark is embedded in the different blocks. A typical algorithm is the so-called run-merge algorithm described in WO-02/060182, which is herein incorporated by reference. According to this document, a watermark w, in the form of a pseudo-random noise sequence, is embedded in the blocks of a frame. A watermark is here provided as a number of identical tiles provided over the whole image, where one tile can have a size of 128x128 pixels. The watermark tile is divided into blocks corresponding to the size of the DCT blocks and transformed into the DCT domain, and these DCT blocks are then stored in a watermark buffer. In this algorithm the watermark is embedded in the quantised DCT coefficients under the control of a bit-rate controller. The watermark is embedded by adding ±1 to the smallest quantised DCT levels. However, since many of the signal coefficients are zero, an addition of ±1 may lead to an increased bit rate, which is disadvantageous. There is furthermore a risk that the watermark will be visible. Therefore the watermark is embedded such that no modification of the signal is performed if a modification would lead to an increased bit-rate. Only the smallest quantised DCT levels ±1 are turned into a zero according to the watermark. This can be seen as: l_out(i, j) = 0 if |l_in(i, j)| = 1, l_in(i, j) + w(i, j) = 0 and the budget allows it, and l_out(i, j) = l_in(i, j) otherwise,
where l_in is the quantised input DCT level, w is the watermark and l_out is the resulting watermarked quantised DCT level. When performing this type of watermarking there is however a need for a watermarking process that allows the watermarking energy to be varied more than previously, especially for intercoded blocks, in order to enable the provision of more robust watermarks. A media processing device according to the invention is shown in a block schematic in Fig. 8. The media processing device includes a parsing unit 18, a device 20 for embedding additional data and an output stage 22. The parsing unit is connected to the device 20 as well as to the output stage 22; the device 20 is also connected to the output stage
22. The device 20 includes a first processing unit 26, connected to an embedding unit 28 and a second processing unit 30. A watermark buffer 24 is connected to the embedding unit 28.
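The run-merge embedding rule cited from WO-02/060182 above can be sketched as follows. This is an illustrative reading of the rule, with a hypothetical stand-in for the bit-rate controller, not the actual implementation:

```python
class Budget:
    """Hypothetical stand-in for the bit-rate controller of WO-02/060182; a
    real controller would track the bits each modification frees or costs."""
    def allows(self, i, j):
        return True

def embed_run_merge(levels, watermark, budget):
    """Set a quantised DCT level of +/-1 to zero when the watermark
    coefficient has the opposite sign (l_in + w = 0) and the budget allows
    it; all other levels pass through unchanged."""
    out = [row[:] for row in levels]
    for i in range(8):
        for j in range(8):
            if (abs(levels[i][j]) == 1
                    and levels[i][j] + watermark[i][j] == 0
                    and budget.allows(i, j)):
                out[i][j] = 0
    return out

levels = [[0] * 8 for _ in range(8)]
levels[0][0], levels[0][1], levels[1][0] = 10, 1, -1
wm = [[0] * 8 for _ in range(8)]
wm[0][1], wm[1][0] = -1, -1   # wm[1][0] = -1 does NOT cancel the -1 level
out = embed_run_merge(levels, wm, Budget())
print(out[0][0], out[0][1], out[1][0])   # 10 0 -1
```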
In operation the parsing unit 18 receives a media signal X in the form of a number of video images or frames including blocks with VLC coded code words. The parsing unit separates the VLC coded code words from other types of information and sends the VLC coded code words to the first processing unit 26 of the device 20, which processes the stream X in order to recreate the run-level pairs of each block. The parsing unit 18 also separates motion vectors V associated with intercoded blocks and provided in the overhead information of B- and P-frames and provides these motion vectors V to the embedding unit 28, which obtains them in this way. The run-level pairs received by the first processing unit 26, i.e. the quantised DCT coefficient matrix, are then sent to the embedding unit 28, which in this way obtains this matrix. The embedding unit 28 embeds a watermark stored in the watermark buffer 24, which will be described in more detail later, and provides the watermarked DCT matrix to the second processing unit 30, which VLC codes it and provides it to the output stage 22 for combination with the other MPEG codes. From the output stage 22 the watermarked signal X' is then provided. The present invention is here described in relation to intercoded blocks, because that is where the principles of the invention are applied. Intracoded blocks are normally handled as outlined in WO-02/060182, but possibly allowing higher or lower levels than ±1 of the watermark coefficients. The embedding unit 28 receives overhead information from the parsing unit 18 indicating that a processed block is an intercoded block, as well as a motion vector relating to the movement of the block, and therefore performs the method of the invention on such a block. Let us here assume that the object 12 in Fig. 2, and thus the block 14, is moved to another position. The motion vector of this block is then used for influencing the watermarking coefficients.
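The way the motion vector influences the watermarking coefficients can be sketched as follows. This is a minimal illustration with hypothetical names; the threshold comparison and the doubling of the energy follow the example given in the description:

```python
import math

def embedding_gain(motion_vector, threshold, base_gain=1.0):
    """Return the watermark embedding strength for an intercoded block:
    when the motion vector is longer than the threshold, the block moves
    fast and is hard to track visually, so the embedding depth is doubled."""
    vx, vy = motion_vector
    if math.hypot(vx, vy) > threshold:   # |V| > T1
        return 2.0 * base_gain
    return base_gain

print(embedding_gain((3, 4), threshold=4.0))   # |V| = 5 > 4, so 2.0
print(embedding_gain((1, 1), threshold=4.0))   # |V| ~ 1.41,  so 1.0
```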
A method according to the present invention is shown in a flow chart in Fig. 9 and will now be described. When the embedding unit 28 is notified that the intercoded block 14 is to be received, it starts by determining the length of the motion vector V of the block, step 34. In the process of watermarking, a row counter j is first set to one, step 36, and a column counter i is set to one, step 38, where the position of row one and column one points at the DC component. This component has to be treated specially. Therefore, if both the column and row counters have the value of one, step 40, a DC component handling is performed, step 42. This is handled by a DC component handling unit (not shown) in a known way. When this has been done, the embedding unit 28 continues to investigate the column counter i to see if it has reached its maximum value, step 50. If the row and column values were not equal
to one, step 40, it is investigated if the length of the vector V exceeds a threshold T1, step 44, which is indicated by the expression |V| > T1. If it exceeds the threshold, step 44, the watermarking energy is increased, and in the present example doubled, step 46, whereupon watermarking of the coefficient follows, step 48. If the vector length did not exceed the threshold T1, step 44, normal watermarking is performed, step 48. In this embodiment more watermarking levels than ±1 are allowed. However, it is preferred that the watermark does not raise the signal level and that a watermark is not added to zero level coefficients. It is furthermore preferred to first dequantise the coefficients and then add the watermark to the dequantised DCT coefficients, also here so that the bit-rate is not increased. Watermarking is thus performed in the dequantised DCT domain, which means that the energy level of the DCT component c(i,j) is changed with the watermark energy level w(i,j). Normally the coefficient level is brought closer to a zero level by adding the watermark. The watermark coefficient w(i,j) for the coefficient c(i,j) is taken from the watermark buffer 24, where it is stored in the DCT domain. The watermark coefficient here has a value that defines the amount and direction (i.e. the sign) that the corresponding dequantised coefficient c(i,j) is allowed to change. When the coefficient in question has been watermarked, step 48, it is investigated if the column counter i has reached its maximum value imax, step 50. If it has not, the counter is incremented by one, step 52, and the method returns to step 40 for checking if the coefficient is the DC coefficient or not. If the counter had reached its maximum value, step 50, it is now investigated if the row counter j has reached its maximum value jmax, step 54. If it has not, step 54, the counter is incremented by one, step 56, and the embedding unit 28 returns to step 38 and sets the column counter i to one, step 38.
If the row counter has reached its maximum value, step 54, the method is ended, step 58. This method is then repeated for further intercoded blocks of the media signal. The use of the threshold for modifying the embedding depth exploits the fact that objects that are moving fast are hard to perceive for the human eye. This means that the watermarking energy can be increased. In this way relatively more energy is allowed to be provided in the watermark, which is of advantage for the robustness of the watermark. This means that the watermark has a higher probability of surviving different attacks, like several different signal conversions. The above described method can be varied. It is possible to have more thresholds that the motion vector is compared with, each providing a different level of increase of the watermark strength, in order to provide a more fine-tuned variation for guaranteeing maximum watermark strength while securing non-visibility of the watermark. It
is furthermore possible to limit the application of the method to B-frames, since these frames are not used as a reference for other frames. This means that when an object stops in a frame, it is then ensured that the increased watermark energy is not propagated into this frame and thus remains invisible. This measure furthermore lowers the complexity of the method. The present invention has been described in relation to a watermark embedding unit. This embedding unit is preferably provided in the form of one or more processors containing program code for performing the method according to the present invention. This program code can also be provided on a computer program medium, like a CD ROM 60, which is generally shown in Fig. 10. The method according to the invention is then performed when the CD ROM is loaded in a computer. The program code can furthermore be downloaded from a server, for example via the Internet. It should be emphasized that the term "comprises/comprising" when used in this specification is taken to specify the presence of stated features, integers, steps or components, but does not preclude the presence or addition of one or more other features, integers, steps, components or groups thereof. It should furthermore be realized that reference signs appearing in the claims should in no way be construed as limiting the scope of the present invention.