WO2009077930A1

WO2009077930A1 - Resolution-independent watermark detection

Info

Publication number: WO2009077930A1
Application number: PCT/IB2008/055186
Authority: WO
Inventors: Mehmet U. Celik; Aweke N. Lemma; Javier F. Aprea
Original assignee: Koninklijke Philips Electronics N.V.
Priority date: 2007-12-17
Filing date: 2008-12-10
Publication date: 2009-06-25
Also published as: TW200941453A

Abstract

The present invention provides a method for performing watermark detection on a signal x comprising a sequence of samples, the samples having a word length L. A method according to the invention comprises determining whether a subsequence xs of x fulfils a watermark attack suspicion criterion corresponding to a specific watermark attack, and in an affirmative applying a counter-algorithm to a part of the signal to produce a detection sequence, and applying a watermark detection algorithm to the detection sequence to determine whether a watermark is present in the signal or not. The invention is applicable for instance in relation to watermarked audio signals or video signals.

Description

RESOLUTION-INDEPENDENT WATERMARK DETECTION

FIELD OF THE INVENTION

The invention relates to watermark detection in a signal, such as an audio or video signal.

BACKGROUND OF THE INVENTION

Watermarking techniques can be used to embed extra information into a signal, such as an audio signal. The goal is to hide specified data carrying some information into the audio stream such that it is not audible to the human ear (i.e., transparent) and is, at the same time, resistant to removal attacks (i.e., robust). Various watermarking systems can easily be tailored for a wide range of applications, and have a good robustness and audibility behavior.

In consumer digital devices, like the CD, the nominal sample width is 16 bits, giving a range of approximately ±32,768 levels. Watermarking systems may carry large amounts of payload, and implementations of watermarking algorithms are being deployed with success on CD-quality audio, i.e. on 16-bit stereo samples at 44.1 kHz. In designing audio watermarking algorithms, 16 bits is, at least at the present time, the resolution of choice. Detection of these watermarks on heavily processed audio, including time-scaling and audio compression at bitrates below 64 kbps, has been tackled.

However, for high-end audio equipment one finds 24 bits or 32 bits as a usual sample width. At these resolutions, detecting a watermark in real-time becomes complex and resource-intensive. Thus, in the design of a robust audio watermark for playback control, it is desirable to find ways to handle these higher resolutions. Generally, a straightforward solution would consist of using only the 16 most significant bits for detection and providing them to a conventional 16-bit watermark detector. This process is illustrated in Fig. 1 and will be described hereinafter. Although it decreases the precision of the watermark, the watermark detectability is not significantly affected.

SUMMARY OF THE INVENTION

The inventors have realized that the abovementioned approach has a significant weakness. If an attacker knows that only, say, the 16 most significant bits of a 24-bit audio signal are considered for the detection, the attacker might use an artifice shown in Fig. 2. This attack consists of reducing the resolution of the audio signal from, say, 24 bits to, say, 16 bits by replacing, in this case, the 16 least significant bits by the 16 most significant bits and using zeros for the 8 most significant bits. Roughly speaking, this corresponds to a down-scaling of the samples by a factor of 2⁸. The word length remains at 24 bits. The audio signal has a reasonable quality - all one would need to do is to increase the volume. However, when this down-scaled signal is further quantized to 16 bits as mentioned above, the effective word length will be only 8 bits and the watermark detection performance will be significantly degraded.
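The effect of the attack on a detector that keeps only the 16 most significant bits can be sketched as follows. This is a minimal illustration assuming unsigned 24-bit samples; the function names `naive_detector_input` and `downscale_attack` and the sample value are hypothetical and do not appear in the source.

```python
L = 24  # signal word length in bits
M = 16  # word length accepted by the conventional detector


def naive_detector_input(sample: int) -> int:
    """Keep only the M most significant bits of an unsigned L-bit sample."""
    return sample >> (L - M)


def downscale_attack(sample: int, shift: int = 8) -> int:
    """Shift the sample down; the word stays L bits wide, the top bits become zero."""
    return sample >> shift


original = 0xABCDEF
attacked = downscale_attack(original)       # 0x00ABCD, still stored in 24 bits

print(hex(naive_detector_input(original)))  # 0xabcd
print(hex(naive_detector_input(attacked)))  # 0xab
```

After the attack, the detector input carries only 8 meaningful bits of the original sample, which is the degradation described above.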

The present invention provides an elegant method of countering this potential attack. According to the invention, a signal is selectively pre-processed before being provided to a watermark detector. The method is independent of the signal word length and watermark word length. The invention will be described with reference to an audio signal. However, the invention is applicable to any signal that takes a similar shape and uses the same watermarking principles. This includes for instance video signals.

In a first aspect, the invention provides a method for performing a watermark detection on a signal x comprising a sequence of signal samples, the signal samples having a word length L , the method comprising:

(a) determining whether a subsequence x_s of x , x_s comprising at least two samples, fulfils a watermark attack suspicion criterion corresponding to a specific watermark attack, and in an affirmative: applying a counter-algorithm to a part of the signal to produce a detection sequence x_d , wherein at least some elements of the detection sequence x_d correspond to samples from the signal x ; and applying a watermark detection algorithm to the detection sequence x_d to determine whether a watermark is present in the signal x or not.

The detection sequence comprises at least two samples. Often, the length of the detection sequence will correspond to the watermark symbol length or repetition period.

A subsequence of samples is, for example, a sequence of 10 successive samples of an audio signal as illustrated by element 201 in Fig. 2. As another example, a subsequence could consist of the samples 1, 3, 5, 7, and 9. The term "watermark attack" refers to an attempt to disable a watermark in a signal by applying algorithmic steps to provide a new signal representing the original audio signal but having a weakened watermark.

The watermark suspicion criterion is one that is suitable for the specific watermark detection algorithm - an attack suspicion would arise under different conditions for different watermarking schemes.

The invention is in part conceived to address issues related to detecting a watermark in a signal without requiring use of all the bits in each of the signal samples. Thus, the watermark detection algorithm typically is of a type that operates on samples or sample parts that can be represented by a word length, M , that is shorter than L . Accordingly, the elements in detection sequence x_d have the same word length. For instance, it is presently desirable to be able to handle watermark detection using only 16 bits out of 24 bits, as described above.

To obtain the best audio quality available for a given word length, the average playback volume is adjusted in such a way that the full resolution is used. In other words, the audio is quantized in such a way that the maximum available amplitude levels are used, and thus there will be one or more subsequences that use all available bits (such as 24 bits).

Accordingly, a useful suspicion criterion is that the samples in the subsequence can be represented using a word length which is shorter than the audio signal word length L . If the maximum amplitude in the subsequence can be represented using a word length which is shorter than L , the subsequence might of course simply correspond to a quiet section. It is, however, also a possibility that a watermark attack has been applied.

As is well known, binary words can be interpreted in different ways. For instance, an 8-bit word can be interpreted as an unsigned integer from 0 to 255, or as a two's complement number from -128 to +127. Using unsigned integers, the word length can be shortened by deleting leading zeros. For instance, the 8-bit representation of the number 33 is "00100001", which can be shortened to the 6-bit word "100001" by deleting the two zeros at bit positions 7 and 8.

When interpreting the words as unsigned integers, the suspicion criterion "the samples in the subsequence can be represented using a word length which is shorter than L" is therefore fulfilled if one or more most significant bits are zero for all samples in the subsequence (let the number of such bits be referred to as N). The fulfillment of this criterion may indicate that the audio signal samples have been down-quantized by N bits and subsequently been inserted into L-bit words. When using two's complement, the word length can be shortened by deleting all leading zeros, or by deleting all leading 1's except one. For instance, -31 in two's complement is represented by "11100001" and can be represented by the 6-bit word "100001", which is obtained by deleting two out of the three leading 1's. A suspicion criterion when the audio samples are represented in two's complement may therefore be considered fulfilled if a highest amplitude in the subsequence can be represented either by a word having at least one leading zero or by a word having at least two leading 1-bits. Thus, when the audio samples in the subsequence are represented using two's complement, a suspicion criterion is that there exists an N, 0 < N < L, for which the N most significant bits are zero for all samples in the subsequence, or for which the N + 1 most significant bits are ones (i.e. indicating negative values) for all samples in the subsequence.
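The word-length reasoning above can be sketched in a few lines. This is an illustrative interpretation rather than the claimed method: it assumes samples are held as Python integers (signed values for the two's complement case), it always reserves a sign bit for signed samples, and the names `min_word_length` and `fulfils_criterion` are hypothetical.

```python
def min_word_length(samples, signed):
    """Smallest word length (in bits) that can hold every sample."""
    if signed:
        # Two's complement: the bits of s (or of ~s when s is negative) plus a sign bit.
        return max((s if s >= 0 else ~s).bit_length() + 1 for s in samples)
    # Unsigned: leading zeros can simply be dropped; use at least one bit.
    return max(max(s.bit_length() for s in samples), 1)


def fulfils_criterion(samples, L, signed=False):
    """True if all samples fit in a word shorter than L bits."""
    return min_word_length(samples, signed) < L


print(min_word_length([33], signed=False))  # 6
print(min_word_length([-31], signed=True))  # 6
print(fulfils_criterion([33, 12, 7], L=8))  # True
```

Because the sketch computes the shortest word that holds every sample, it also covers subsequences that mix positive and negative values, a case the criterion as worded above does not spell out.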

It might be advantageous to not analyze each sample, but only some, to reduce energy consumption (processing power, RAM I/O, etc.). A suspicion criterion might for instance be that the maximum value of all the samples in a given subsequence consisting of every third sample of a given segment of the audio signal can be represented with words shorter than L bits.
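A subsampled check of this kind might look like the sketch below. It assumes unsigned samples and a non-empty segment; the function name `suspicious_every_third` and the sample values are hypothetical.

```python
def suspicious_every_third(segment, L):
    """True if the maximum over every third sample fits in fewer than L bits."""
    peak = max(segment[::3])  # inspect samples 0, 3, 6, ... only
    return peak.bit_length() < L


segment = [0x00ABCD, 0xFFFFFF, 0x000001, 0x001234]
print(suspicious_every_third(segment, L=24))  # True
```

Note that the full-scale sample at index 1 is never inspected, so the check reports suspicion even though the segment as a whole uses all 24 bits. This is the price paid for the reduced processing, and it is one reason the criterion only raises a suspicion rather than proving an attack.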

After detection that a subsequence fulfils a suspicion criterion, a counter-algorithm is applied in what can be considered as an attempt to re-establish parts of the original watermarked audio signal on which x might be based. A detection sequence is provided, and the watermark detection algorithm applied thereto.

A counter-algorithm provides a detection sequence of elements x_d[i] derived from corresponding samples in x, where the corresponding samples are representable using a word length K which is shorter than L. As a simple but useful approach, the elements in x_d are down-quantizations of samples in a sequence of adjacent samples from the audio signal x suspected to have been attacked.

Not all bits of the corresponding samples in x are necessarily used in the detection sequence. Instead, in a general formulation the J most significant bits of at least some of the elements of x_d which have corresponding elements in x correspond to the J most significant bits of a K-bit representation of said corresponding elements. Here, J ≥ 1. In other words, only a number of most significant bits are used in the detection. The J most significant bits in the elements of x_d may for instance be equal to the corresponding J bits in the K-bit representations.

It might be useful if the counter-algorithm provides a detection sequence x_d that has at least one sample x_d which corresponds to a signal sample which is separated from a sample range defined by the subsequence by at most two symbol lengths of said watermark, or is within a distance corresponding to at most two repetition periods of said watermark. This allows determining that an attack has been performed on a first segment, and then applying the watermark detection algorithm on a following section without having to store the first segment. The distance may be shorter or longer, such as one symbol length/repetition period or three symbol lengths/repetition periods. The optimal number can be considered as a tradeoff between detection performance and power consumption. Furthermore, performing the detection in this kind of vicinity to the subsequence fulfilling the suspicion criterion increases the likelihood of finding a watermark, if one has been embedded. This approach is particularly useful if the detection is performed in real time during playback.

In the present context, "a sample range defined by a subsequence" shall be understood as including those samples in x that fall between an earliest sample in the given subsequence and a latest sample in the given subsequence, as well as said earliest and latest samples themselves. As an example, samples 4, 5, 6 and 7 in Fig. 2 fall within the sample range defined by a subsequence that consists of samples 4 and 7. The same sample range is defined by the subsequence consisting of samples 4, 6 and 7.
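The definition can be stated compactly in code. The sketch below assumes the subsequence is given as a list of sample indices; the function name `sample_range` is hypothetical.

```python
def sample_range(subsequence_indices):
    """All sample indices from the earliest to the latest index, inclusive."""
    return list(range(min(subsequence_indices), max(subsequence_indices) + 1))


print(sample_range([4, 7]))     # [4, 5, 6, 7]
print(sample_range([4, 6, 7]))  # [4, 5, 6, 7]
```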

Typically, audio and watermarks are quantized using words of lengths that are multiples of 8. Thus, one will often be handling a situation where L = 24 or 32, and M = 16. This addresses the audio formats and watermarks often applied today.

The length of the sequence used for detection is influenced by the watermark detection algorithm to be applied. Advantageously, the sequence is at least comparable to a symbol length/repetition period of the expected watermark.

A second aspect of the invention provides a computer program product which, when executed on suitable computing hardware, carries out at least one of the methods described above.

A third aspect of the invention provides computing hardware enabled to perform at least one of the methods described above.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the invention will be described, by way of example only, with reference to the drawings.

Fig. 1 illustrates the process of limiting a 24-bit word (element 101) to a 16-bit word.

Fig. 2 illustrates an attack on a watermark by shifting the 16 most significant bits of a number of audio samples downwards.

Fig. 3 is a flowchart describing an embodiment of a watermark detection method in accordance with the invention.

Fig. 4 illustrates how the original audio signal attacked as illustrated in Fig. 2 could be partly recovered by left-shifting the 8 most significant bits.

Fig. 5 exemplifies a detection sequence for detecting a watermark in a subsequence of samples.

Fig. 6 illustrates an attack where 24 bits are quantized to 20 bits.

Fig. 7 illustrates an attack where the signal is down-quantized from 24 bits to 15 bits.

DESCRIPTION OF EMBODIMENTS

Fig. 1 illustrates the process of limiting a 24-bit word (element 101) to a 16-bit word (element 103), including addition (step 111) of an 8-bit dither (element 102). After addition of the dither, the signal is quantized, in step 112, shortening the word length. As described previously, when using an unsigned interpretation, shortening of the word length can be performed simply by retaining only the most significant bits. If the highest amplitude in a sequence of 8-bit words is "00000110" (decimal 6), then the sequence can be represented using just three bits (i.e. 8 levels, from 0 to 7), namely the three least significant bits, simply disregarding the most significant 0-bits. When using two's complement, a "low" number has either one or more leading zeros as most significant bits, or multiple 1's. If the highest amplitude in a sequence of 8-bit two's complement words is "11111101" (decimal -3), 3 bits are sufficient for representing the sequence, since 3 bits provide 8 levels from -4 to +3. Decimal -3 will be represented by "101".
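Steps 111 to 113 of Fig. 1 might be sketched as below for unsigned samples. The source does not specify the dither distribution or the overflow handling, so the uniform dither and the clamping to full scale are assumptions, and the function names are hypothetical.

```python
import random


def requantize_24_to_16(sample, rng):
    """Add an 8-bit dither to an unsigned 24-bit sample, then keep the 16 MSBs."""
    dither = rng.randrange(1 << 8)                  # element 102 (assumed uniform)
    dithered = min(sample + dither, (1 << 24) - 1)  # step 111, clamped (assumption)
    return dithered >> 8                            # step 112: quantize to 16 bits


def fit_into_24(word16):
    """Step 113: place the 16-bit word in the 16 LSBs of a 24-bit word."""
    return word16 & 0xFFFF


rng = random.Random(0)  # seeded only to make the sketch repeatable
word16 = requantize_24_to_16(0xABCDEF, rng)
word24 = fit_into_24(word16)  # the 8 most significant bits are zero
```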

Step 113 illustrates fitting the 16-bit word 103 into a 24-bit word, 104. This may for instance be done by simply inserting the 16 bits into the 16 least significant bits of the 24-bit word, as illustrated.

Fig. 2 illustrates an attack on a watermark by shifting the 16 most significant bits 202 downwards by 8 bits for a segment of 10 samples (element 201), thereby obtaining a new segment, element 203. The 8 most significant bits 204 in the new segment 203 are therefore zero, and if the watermark detector uses the 16 most significant bits for detection, the watermark in the modified segment will be severely degraded. The cost is a reduction of the quality of the signal, though in this example only to a resolution similar to that of CD.

Fig. 3 shows a flowchart describing an embodiment of a watermark detection method in accordance with the invention. After initiating the detection (step 301) the samples of the audio signal are traversed by providing subsequences of samples in step 303. In terms of Fig. 2, a first subsequence could comprise samples 1, 2, and 3. A next subsequence could comprise samples 2, 3 and 4, etc. Alternatively, the first subsequence could comprise samples 1, 3 and 5, the second subsequence samples 3, 5 and 7, etc. Many other choices are possible, but some approaches are more advantageous in terms of speed and processing power than others. In step 305, it is determined whether the subsequence of samples fulfils the suspicion criterion that is being applied. If the criterion is not fulfilled, new samples come under consideration. When using unsigned integers, the detection can be expedited by bitwise OR'ing the samples of the subsequence. Bitwise OR'ing the samples in the subsequence consisting of samples 1, 2 and 3 in 203 results in a 24-bit word in which the 8 most significant bits are zero, which fulfils one suspicion criterion discussed above.
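The traversal of steps 303 and 305 with the bitwise-OR shortcut might be sketched as follows. The sketch assumes unsigned samples and overlapping windows of three adjacent samples; the function names and sample values are hypothetical.

```python
from functools import reduce
from operator import or_


def zero_msbs(subsequence, L=24):
    """Count the MSBs that are zero in every unsigned L-bit sample, via bitwise OR."""
    return L - reduce(or_, subsequence, 0).bit_length()


def first_suspicious_window(signal, window=3, L=24):
    """Slide a window over the signal (step 303) and test each one (step 305)."""
    for start in range(len(signal) - window + 1):
        n = zero_msbs(signal[start:start + window], L)
        if n > 0:
            return start, n  # window position and number of zero MSBs
    return None


signal = [0x00ABCD, 0x001234, 0x00FFEE, 0x000042]
print(first_suspicious_window(signal))  # (0, 8)
```

A single OR per sample followed by one bit-length computation per window replaces a per-sample comparison, which is the speed-up the paragraph above alludes to.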

In that case, a counter-algorithm is applied, step 307, to form a detection sequence. The detection sequence may for instance, advantageously, be a down-quantization of the subsequence itself or the sample range that it defines (in case the subsequence consists not of adjacent samples in the audio signal, but perhaps of every second sample). Finally, the watermark detection algorithm is applied, in step 309, to the detection sequence to determine a presence or not of a watermark. Steps 311 and 313 are included to exemplify how the presence of a watermark may be used to control further actions. Step 313 might for instance consist of determining whether a given player is permitted to play music that contains the detected watermark, and if not, stop the playback.

Applying the suspicion criterion that the samples in the subsequence can be represented using a shorter word length, step 305 in Fig. 3 will reveal that in fact, the subsequence 203 in Fig. 2 can be represented using 16 bits, 8 less than are available (namely 24). Fig. 4 illustrates how the original audio signal could be partly recovered by left-shifting the samples by 8 bits, the number of bits that have been determined to be zero. This results in a sequence 401 having the original most significant bits 202 as most significant bits and least significant bits 402, illustrated with zeros (their values are of little importance). The watermark detection algorithm would now detect a watermark, if one was embedded in the 16 most significant bits of the original signal.

According to some embodiments of the invention, not all available bits are used for producing the detection sequence. Modifications are available. For instance, bits 9 through 16 in 401 might instead be 0. The detectability of the watermark might be reduced, but the modification might be desirable for other reasons, for instance in order to reduce the number of bit manipulations required. In the example just mentioned, where bits 9 through 16 have the value 0 rather than the values 202, the process of producing the detection sequence involves fewer bit manipulations since we need to copy only bits 9 through 16 of part 202 rather than all 16 least significant bits 202.

Fig. 5 illustrates a detection sequence 501 appropriate in the present example. The sequence 401 is transformed into a signal readable by the watermark detection algorithm, which in this case is assumed to be a 16-bit detector accepting 16-bit samples. The 16 most significant bits are derived 513 to form the detection sequence 501, as illustrated by step 511. The watermark detection algorithm can then be applied to the detection sequence 501 (step 309 in Fig. 3) in order to determine the presence or not of a watermark. As described in relation to Fig. 4, minor modifications might be introduced. Rather than producing an exact copy of the bits, step 513 might produce a sequence in which the three least significant bits are simply set to zero (the three right-most columns in 501 would be 0).
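The left shift of Fig. 4 followed by the extraction of Fig. 5 can be sketched as a single function. This is one possible counter-algorithm under the assumption of unsigned samples, L = 24 and a 16-bit detector; the name `counter_algorithm` and the sample values are hypothetical.

```python
def counter_algorithm(samples, L=24, M=16):
    """Undo a suspected down-scaling and return M-bit words for the detector."""
    combined = 0
    for s in samples:
        combined |= s
    n = L - combined.bit_length()                   # MSBs that are zero in all samples
    mask = (1 << L) - 1
    restored = [(s << n) & mask for s in samples]   # Fig. 4: shift left by n bits
    return [r >> (L - M) for r in restored]         # Fig. 5: keep the M MSBs


attacked = [0x00ABCD, 0x001234]     # 16-bit content sitting in the 16 LSBs
print([hex(v) for v in counter_algorithm(attacked)])  # ['0xabcd', '0x1234']
```

The shift amount is estimated from the whole subsequence rather than per sample, so quiet samples inside the subsequence keep their relative scale.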

The forming of a detection sequence as described in the example above does not imply that a certain approach shall necessarily be used to determine the detection sequence. The watermark algorithm might instead operate on other bit lengths, step 513 in that case resulting in a sample sequence having a number of bits different from, in this case, 16. Step 513 might also form part of the watermark algorithm, in which case the 24-bit long words of 401 might be suitable for direct transmission to the watermark detection algorithm. In the latter case, the detection sequence is instead sequence 401 (in Fig. 4).

The invention is not limited to certain down-quantization lengths. Fig. 6 illustrates an attack where 24 bits are quantized to 20 bits by shifting (step 611) the original audio signal (element 202) 4 bits to the right. In this way, the 20 most significant bits of 201 become the 20 least significant bits in the modified audio signal 603. The four most significant bits in 604 are zero. The method according to the first aspect can determine that the 4 most significant bits (element 604) in the audio signal 603 are zero in all the samples, which fulfils one of the suspicion criteria described previously. The counter-algorithm may for instance shift the bits up by 4 bits to re-establish or almost re-establish the original 20 most significant bits of 201. Applying the watermark detection algorithm to section 602 will likely detect a watermark if one was embedded.

Fig. 7 illustrates an attack where the signal is down-quantized from 24 bits to 15 bits, in this case by right-shifting (step 711) the signal 201 by 9 bits. The invention will determine that the signal only contains 15 significant bits, for instance by detecting that the 9 most significant bits are zero (element 704), and the counter-algorithm will form a detection sequence by left-shifting the samples by 9 bits and extracting the 16 most significant bits to be provided to the watermark detection algorithm. Note that in the present example, bit 9 in the samples in the detection sequence (comprising the bits 702) is zero. Only 15 bits are available to the pre-defined detection algorithm. This increases the likelihood of not detecting a watermark that was actually present in the original signal 201. However, the quality of the signal for playback is also reduced to only 15 bits. This tradeoff goes on. If an attack reduces the number of bits to for instance 8 bits, the counter-algorithm will provide a detection sequence in which only the 8 most significant bits are non-zero. The detectability is reduced significantly, but the playback quality of the signal is also very low.
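The tradeoff between attack strength and recoverable detector input can be illustrated numerically. The sketch below assumes unsigned 24-bit samples and a 16-bit detector, and it treats the shift amount as already determined from the zero most significant bits; the function name and sample value are hypothetical.

```python
L, M = 24, 16


def attack_then_recover(sample, shift):
    """Right-shift attack followed by the left-shift counter step and MSB extraction."""
    attacked = sample >> shift                         # attack: `shift` MSBs become zero
    recovered = (attacked << shift) & ((1 << L) - 1)   # counter step: shift back up
    return recovered >> (L - M)                        # 16 MSBs for the detector


original = 0xABCDEF
for shift in (4, 9, 16):
    print(shift, hex(attack_then_recover(original, shift)))
# 4 0xabcd
# 9 0xabcc
# 16 0xab00
```

With a 4-bit shift the detector input is unchanged, with a 9-bit shift its least significant bit is lost, and with a 16-bit shift only the 8 most significant bits survive, mirroring the loss of playback quality the attacker has to accept.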

The invention or some features of the invention can be implemented as computer software running on one or more data processors and/or digital signal processors. The elements and components of an embodiment of the invention may, where applicable, be physically, functionally and logically integrated. The disclosed embodiments are set forth for purposes of exemplification. The selection of embodiments does not imply any sort of limitation. In the claims, the term "comprising" does not exclude the presence of other elements or steps. Additionally, although individual features may have been included in only some of the claims, these may possibly be advantageously combined, where applicable. Thus, the claim dependencies do not in any way imply that a combination of features is not feasible and/or advantageous and/or within the scope of the invention. In addition, singular references do not exclude a plurality. Thus, references to "a", "an", "first", "second" etc. do not preclude a plurality unless specifically stated.

Claims

CLAIMS:

1. A method for determining if a watermark is present in a signal, x , comprising a sequence of signal samples, the samples having a word length L , the method comprising: (a) determining whether a subsequence x_s of x , x_s comprising at least two samples, fulfils a watermark attack suspicion criterion corresponding to a specific watermark attack, and in an affirmative: applying a counter-algorithm to a part of the signal to produce a detection sequence x_d , wherein at least some elements of the detection sequence x_d correspond to samples from the signal x ; and applying a watermark detection algorithm to the detection sequence x_d to determine whether a watermark is present in the signal x or not.

2. A method according to claim 1, wherein the watermark detection is of a type that operates on samples having a word length, M , shorter than L , and a word length of elements in the detection sequence x_d is M .

3. A method according to claim 2, wherein a suspicion criterion is that the samples in the subsequence can be represented using a word length, K , which is shorter than L .

4. A method according to claim 1, wherein a suspicion criterion is that there exists an N , 0 < N < L , for which the N most significant bits in all samples of the subsequence are zero.

5. A method according to claim 1, wherein if the samples of the subsequence are in two's complement, a suspicion criterion is that there exists an N , 0 < N < L , for which the N most significant bits in all samples of the subsequence are zero, or for which the N + 1 most significant bits are ones for all samples in the subsequence.

6. A method according to claim 3, wherein the counter-algorithm provides a detection sequence of elements x_d derived from corresponding samples in x , the corresponding samples being representable using the word length K .

7. A method according to claim 3, wherein J most significant bits in at least some of the elements of x_d which have corresponding elements in x correspond to the J most significant bits of a K -bit representation of said corresponding elements, where J ≥ 1 .

8. A method according to claim 1, wherein the counter-algorithm provides a detection sequence x_d that has at least one sample x_d which corresponds to a signal sample in x being separated from the sample range defined by the subsequence by at most two symbol lengths of said watermark, or is within at most two repetition periods of said watermark.

9. A method according to claim 1, wherein the length of the detection sequence x_d corresponds to at least a symbol length/repetition period of the watermark to be detected.

10. A method according to claim 1, wherein the signal comprises an audio signal and/or a video signal.

11. A computer program product which, when executed on suitable computing hardware, carries out at least one method in accordance with one of claims 1-10.

12. Computing hardware enabled to perform at least one method in accordance with one of claims 1-10.