WO2010135082A1

WO2010135082A1 - Localized weak bit assignment

Info

Publication number: WO2010135082A1
Application number: PCT/US2010/033657
Authority: WO
Inventors: Wenyu Jiang; Claus Bauer
Original assignee: Dolby Laboratories Licensing Corporation
Priority date: 2009-05-19
Filing date: 2010-05-05
Publication date: 2010-11-25

Abstract

A target hash value is identified. Identifying a target hash value includes computing a query hash value, partitioning the query hash value into at least a first portion and a second portion, and locally assigning weak bits within the first portion of the query hash value and weak bits within the second portion of the query hash value. The method further includes determining one or more variations of the query hash value by toggling one or more weak bits in the first portion and one or more weak bits in the second portion of the query hash value; and identifying a target hash value that is identical to a variation of the query hash value.

Description

LOCALIZED WEAK BIT ASSIGNMENT

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims priority to United States Patent Provisional Application No. 61/179,706, filed 19 May 2009, hereby incorporated by reference in its entirety.

TECHNOLOGY

[0002] The present invention relates generally to localized weak bit assignment. More specifically, embodiments of the present invention relate to locally assigning weak bits to each portion of a hash value when searching for the hash value.

BACKGROUND

[0003] Media clips or media content are segments of audio media, video media, or audio/visual (AV) media and include information that is embodied, stored, transmitted, received, processed, or otherwise used with at least one medium. Common media clip formats include FLV format (flash video), Windows Media Video, RealMedia, Quicktime, MPEG, MP3, and DivX. As used herein, the terms "media clips", "media content," "information content," and "content" may be used interchangeably.

[0004] Media clips may be defined with one or more images. For example, video media may be a combination of a set of temporally related frames or images at particular points in time of the video media. Additionally, audio media may be represented as one or more images using many different techniques known in the art. For example, audio information may be captured in a spectrogram. In the spectrogram, the horizontal axis can represent time, the vertical axis can represent frequency, and the amplitude of a particular frequency at a particular time can be represented in a third dimension. Further, in a two dimensional spectrogram, the amplitude may be represented with thicker lines, more intense colors or grey values. Those skilled in the art will appreciate that many different modifications to the above example and other representations may be used to represent an audio clip as an image.

[0005] Images that define media content (audio and/or video) may be associated with a corresponding fingerprint ("fingerprint" used interchangeably with and equivalent to "signature"). Some fingerprints of media content may be derived (e.g., extracted, generated) from information within, or which comprises a part of the media content. A media fingerprint embodies or captures an essence of the media content of the corresponding media and may be uniquely identified therewith. Video fingerprints are media fingerprints that may be derived from images or frames of a video clip. Audio fingerprints are media fingerprints that may be derived from images with embedded audio information (e.g., spectrograms). Further, the term media fingerprint may refer to a low bit rate representation of the media content with which they are associated and from which they are derived.

[0006] Most applications of content identification using media fingerprints rely on a large database of media fingerprints. Any query fingerprint that is extracted from query media is compared against this database of media fingerprints to identify one or more closest matches. As the size of the database increases in terms of number of hours of media, it is desirable that the uniqueness of fingerprint codewords is not reduced. A fingerprint codeword (also known as, and referred to herein as, a hash value, a signature, or a sub-fingerprint) generally represents a sequence of fingerprint bits that is used for indexing (e.g., in a hash table) the media fingerprints. The fewer the fingerprints/media files that correspond to a hash value, the more unique the hash value is. This uniqueness property of the hash values allows for scaling of the fingerprint database to a large number of hours. However, if certain hash values are more likely to occur than others, then as the database size grows the uniqueness reduces since the more likely hash values will each link to a large number of fingerprints/media files. The large number of fingerprints/media files corresponding to hash values results in more computations to perform content identification. For example, in a hash-table based searching method a hash value of a query fingerprint may be used to identify all fingerprints/media files in a fingerprint database that are linked to the same hash value. Multiple fingerprints/media files being linked to the same hash value is referred to as collisions. The larger the number of collisions (e.g., fingerprints/media files) for the same hash value, the greater the computations required to determine which one of the fingerprints/media files corresponding to the fingerprint codeword is equivalent or the best match to the query fingerprint. The fewer the number of collisions (e.g., fingerprints/media files) for the same hash value, the lesser the computations required to determine which one of the fingerprints/media files corresponding to the hash value is equivalent or the best match to the query fingerprint. Thus, the fingerprints that have a small number of average collisions per hash value will result in shorter search duration. Such fingerprints are scalable for searching through a larger database of fingerprints than fingerprints for which the average number of collisions is higher.

[0007] Further, the hash-table look-up based matching could be easily misguided by a single bit-flip in the derived signature of the query media content (e.g., a bit-flip caused by modification of an original media content to obtain the query media content). A notion of global weak bit assignment may be used when comparing signatures of reference and query media content using a hash table based look-up. For example, when using weak bits, a subset of S signature bits is globally selected from all the signature bits in a signature derived from query video and marked as weak. The selection of S signature bits for a signature or hash value may be determined by globally identifying S signature bits from all the signature bits in a signature that are most likely to flip when media content is processed. Given the knowledge of these S "weak" bits, variations of the query signature may be obtained by toggling the S weak bits. For example, if each bit may be assigned one of two values (e.g., 0 and 1), all 2^S possible variations of the query signature may be tried while performing the hash-table look-up to find the hash entry of the target matching signature in the database. The target matching signature is then used to identify the corresponding target media content in the database, from which the query media content was derived or which is identical to the query media content.
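The enumeration of 2^S variations described in paragraph [0007] can be illustrated with a minimal sketch. The function name `toggle_variations`, the 8-bit query value, and the chosen weak bit positions are hypothetical and serve only as an illustration; bit positions are counted from the least significant bit, which is an assumption not stated in the text.

```python
from itertools import product

def toggle_variations(value, weak_positions):
    """Return every variation of `value` obtained by setting each weak bit
    (0-based position, least significant bit = 0) to 0 or 1."""
    variations = []
    for combo in product((0, 1), repeat=len(weak_positions)):
        candidate = value
        for pos, bit in zip(weak_positions, combo):
            # Clear the weak bit, then write the chosen value into it.
            candidate = (candidate & ~(1 << pos)) | (bit << pos)
        variations.append(candidate)
    return variations

# Hypothetical 8-bit query signature with S = 3 globally selected weak bits.
query = 0b10110010
weak_positions = [0, 3, 6]
variants = toggle_variations(query, weak_positions)
```

With S = 3 the list holds 2^3 = 8 distinct candidates, one of which is the unmodified query signature; each candidate would then be looked up in the hash table.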

[0008] The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, issues identified with respect to one or more approaches should not be assumed to have been recognized in any prior art on the basis of this section, unless otherwise indicated.

BRIEF DESCRIPTION OF DRAWINGS

[0009] The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements and in which:

[00010] FIG. 1 depicts an example method of global weak bit assignment;

[0010] FIG. 2 depicts an example method for localized weak bit assignment, according to an embodiment of the present invention;

[0011] FIG. 3 depicts an example method for searching for a query hash value, according to an embodiment of the present invention;

[0012] FIG. 4 depicts an example data structure, according to an embodiment of the present invention;

[0013] FIG. 5 depicts a block diagram that illustrates a computer system upon which an embodiment of the present invention may be implemented; and

[0014] FIG. 6 depicts an example IC device, according to an embodiment of the present invention.

DESCRIPTION OF EXAMPLE EMBODIMENTS

[0015] The example embodiments described herein relate to locally assigning weak bits to each portion of a hash value when searching for the hash value. In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form in order to avoid unnecessarily obscuring the present invention.

[0016] Example embodiments are described herein according to the following outline:

1.0 General Overview

2.0 Localized Weak Bit Assignment

3.0 Derivation of Hash Values

4.0 Reusing Empty Entries in a Hash Table

5.0 Building a Database Storing Hash Values

6.0 Implementation Mechanisms — Hardware Overview

7.0 Equivalents, Extensions, Alternatives, and Miscellaneous

1.0 GENERAL OVERVIEW

[0017] In one or more embodiments, a method includes computing a query hash value, partitioning the query hash value into at least a first portion and a second portion, and locally assigning weak bits within the first portion of the query hash value and weak bits within the second portion of the query hash value. The method further includes determining one or more variations of the query hash value by toggling one or more weak bits in the first portion and one or more weak bits in the second portion of the query hash value and identifying a target hash value that is identical to a variation of the query hash value.

[0018] Identifying the target hash value that is identical to the variation of the query hash value may include determining an index value, of an array, based on a variation of the first portion of the query hash value, wherein data is associated with the index value of the array, identifying a second portion of the target hash value in the data associated with the index value of the array, and determining that the second portion of the target hash value is identical to the second portion of the variation of the query hash value.

[0019] Locally assigning weak bits within the first portion and the second portion of the query hash value may include selecting a first predetermined number of weak bits from the first portion and selecting a second predetermined number of weak bits from the second portion.

[0020] Partitioning the query hash value into at least a first portion and a second portion may include partitioning the query hash value into equal sized portions or unequal sized portions. A ratio of weak bits to other bits may be similar in the first portion and the second portion.

[0021] In one or more embodiments, prior to partitioning the query hash value, bits may be reordered based on a predetermined reordering scheme. The reordering scheme may include transferring hash bits that are expected to have a high bit error rate from the first portion of the hash value to the second portion of the hash value.

[0022] In one or more embodiments, a method includes deriving consecutive hash values from a plurality of consecutive media content frames, selecting at least a portion from each hash value of the consecutive hash values to obtain a plurality of portions extracted from consecutive hash values, and generating a new hash value based on the plurality of portions extracted from consecutive hash values.

[0023] In one or more embodiments, a method includes computing a query hash value, partitioning the query hash value into at least a first portion and a second portion, identifying a first array index of an array based on the first portion of the query hash value, determining a second array index of the array based on an offset stored in data associated with the first array index, and determining that the second portion of the query hash value is stored in data associated with the second array index.

[0024] The method may further include determining that the second portion of the query hash value is not stored in data associated with the first array index prior to determining the second array index based on the offset stored in data associated with the first array index.

[0025] In one or more embodiments, a method includes obtaining a plurality of hash values, partitioning each hash value in the plurality of hash values into at least a first portion and a second portion, sorting each hash value in the plurality of hash values into a plurality of groups based on the first portion of that hash value. The method further includes subsequent to sorting each hash value in the plurality of hash values into the plurality of groups, storing each group of hash values. Storing each group of hash values may include identifying a data structure associated with the group of hash values; and storing the second portion of each hash value in the data structure associated with the group. Each hash value in the plurality of values may be associated with a corresponding pointer value and the second portion of each hash value may be stored in the data structure with the corresponding pointer value of that hash value.

[0026] Other embodiments of the invention may include a system and computer readable storage medium with functionality to execute the steps described above.

2.0 LOCALIZED WEAK BIT ASSIGNMENT

[0027] Examples of possible embodiments, which relate to locally assigning weak bits to each portion of a hash value when searching for the hash value, are described herein. In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are not described in exhaustive detail, in order to avoid unnecessarily occluding, obscuring, or obfuscating the present invention. Furthermore, although the invention is described below in relation to hash values derived from media content, one or more embodiments of the invention may apply to any use of hash values such as finding items in a database, detecting duplicated or similar records in a large file, finding similar stretches in DNA sequences, etc.

[0028] In an embodiment, hash values generally represent any number or set of numbers that are computed using a well-defined procedure or mathematical function (which may be referred to as a hash function) that is applied to possibly larger or variable-sized data. For example, a hash value corresponding to a fingerprint (or sub-fingerprint) of media content may be derived based on one or more features in the media content. Each numerical value computed from one or more features in the media content may be compared to a threshold value to determine the corresponding hash bit (e.g., 1 if the numerical value meets the threshold value or 0 if that numerical value doesn't meet the threshold value). However, a modification to the media content may result in modification of one or more features that are used to calculate the numerical value. Thus, a numerical value that is close to the threshold value, may easily cross over the threshold line with a modified feature. As a result, the corresponding hash bit for the numerical value may easily flip when the media content is modified. A hash bit that easily flips may be referred to as a weak bit, an unreliable bit or a bit with a low confidence measure.
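The relationship between thresholding and weak bits in paragraph [0028] can be sketched as follows. This is a minimal illustration, assuming a single shared threshold and treating "meets the threshold" as greater than or equal; the function names and the feature values are hypothetical.

```python
def features_to_bits(features, threshold):
    """Map each numerical feature value to a hash bit and record how far the
    value lies from the threshold (a small distance suggests a weak bit)."""
    bits = [1 if f >= threshold else 0 for f in features]
    margins = [abs(f - threshold) for f in features]
    return bits, margins

def weakest_positions(margins, count):
    """Return the positions of the `count` bits closest to the threshold."""
    ranked = sorted(range(len(margins)), key=lambda i: margins[i])
    return sorted(ranked[:count])

# Hypothetical feature values; 0.52 and 0.49 sit close to the threshold of 0.5.
features = [0.9, 0.52, 0.1, 0.49, 0.7, 0.3, 0.95, 0.05]
bits, margins = features_to_bits(features, threshold=0.5)
weak = weakest_positions(margins, count=2)
```

Here the bits at positions 1 and 3 are the least reliable, because a small modification to the underlying features could push either value across the threshold.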

[0029] A global assignment of weak bits involves selecting weak bits from the entire hash value. For example, if a hash value has 36 bits, the 6 weakest bits (e.g., corresponding to six numerical values that were closest to the threshold value for determining the corresponding hash bit) may be selected as weak bits. Figure 1 depicts an example method of global weak bit assignment. Hash value (102) has 8 bits numbered 1 through 8 with possible values 0 and 1, of which three bits are identified as the weakest bits in a global weak bit assignment (Step 104). The three weak bits (e.g., bit #1, bit #2, and bit #8) may then be toggled to determine 2³ = 8 possible permutations of the hash value (102). The eight possible permutations of the hash value are then searched for, in a database, to identify a target hash value. If a target hash value is found, then the target media content fingerprint, associated with the target hash value, is compared with the query media content fingerprint associated with query hash value (102).

[0030] Figure 2 depicts an example method for localized weak bit assignment. As an optional step, the hash bits in the hash value (204a) may be reordered (Step 252). The hash bits may be reordered according to a predetermined reordering scheme based on training data. For example, a training set of hash values may be derived from a training set of images, and a determination may be made that particular hash bits in the hash values frequently tend to be weak bits or have a high bit error rate (BER). This knowledge of where weak bits generally fall based on the hash function applied to a set of training data (e.g., media content features), may be used to determine the reordering scheme. In addition, knowledge of the partitioning of the hash value into portions, may also be used to reorder the weak bits. In an embodiment, the reordering scheme may use the knowledge from the training set and the knowledge of the hash value partitioning to reorder hash bits. In the exemplary Figure 2, bit #2 and bit #7 are switched according to a reordering scheme applied to each hash value and previously determined based on a training set of data. The same reordering scheme of hash bits that is determined from a training set and performed in Step 252 on the hash value (204a) to obtain the reordered hash value (204b) may be applied to all hash values that are stored into the database or hash values which are searched for in the database for consistency.

[0031] In one or more embodiments, the hash value (204b) is partitioned into two or more portions (Step 254). Partitioning of hash value (204b) may include designating a portion of the hash value (204b) as part of a first portion (204c) and designating another portion of the hash value (204b) as part of a second portion (204d). Partitioning of the hash value (204b) may or may not involve separately storing different portions of the hash value, or storing the different portions under different variables. Although the partitioning shown in exemplary Figure 2 shows partitioning the hash value into two mutually exclusive portions, one or more embodiments of the invention may involve partitioning the hash value (204b) into any number of portions. Furthermore, one or more bits may be overlapping in different portions. For example, the first portion may include bit #1, bit #7, bit #3, bit #4, and bit #5 and the second portion may include bit #5, bit #6, bit #2, and bit #8.
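The reordering and partitioning steps (Step 252 and Step 254) might be sketched as below. The swap of bit #2 and bit #7 comes from the Figure 2 example; the 4/4 split point and the helper names `reorder` and `partition` are assumptions made for illustration. Bit labels rather than bit values are used so the effect of the permutation is visible.

```python
def reorder(bits, permutation):
    """Apply a fixed reordering scheme: output position k takes the bit at
    input position permutation[k] (0-based)."""
    return [bits[i] for i in permutation]

def partition(bits, split):
    """Split a reordered hash value into two mutually exclusive portions."""
    return bits[:split], bits[split:]

# Labels 1..8 stand in for bit #1..bit #8 of the hash value (204a).
labels = [1, 2, 3, 4, 5, 6, 7, 8]
# Swap bit #2 and bit #7, as in the Figure 2 example.
permutation = [0, 6, 2, 3, 4, 5, 1, 7]
reordered = reorder(labels, permutation)
# Assumed split after four bits; the text permits unequal or overlapping portions.
first_portion, second_portion = partition(reordered, split=4)
```

The same permutation would have to be applied both when hash values are stored and when they are queried, as the paragraph on consistency notes.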

[0032] In an embodiment, the different portions (e.g., 204c and 204d) of the hash value (204b) are each locally assigned weak bits (Step 256). Locally assigning weak bits in a portion of a hash value may include assigning weak bits to a predetermined number of bits within the hash bits of that portion. For example, for a 64 bit hash value that is partitioned into two 32 bit portions, the 8 weakest bits within each portion may be labeled (or assigned as) weak bits. As a variant of the above example, up to 8 weakest bits may be allocated from each 32-bit portion, but not labeled as weak bits if the distances of their corresponding features from the threshold that is used for converting features to hash bits exceed a pre-determined value. In exemplary Figure 2, one weak bit is to be assigned to the weakest bit in the first portion (204c), and two weak bits are to be assigned to the weakest bits in the second portion (204d). The weakest bit in the first portion (204c) is determined to be bit #1. Furthermore, the two weakest bits in the second portion (204d) are determined to be bit #6 and bit #8.

[0033] For example, since the number of weak bits in a portion directly corresponds to the number of variations for that portion, the assignment of hash bits may involve decreasing the number of weak bits to decrease the number of variations that have to be enumerated. As discussed below with relation to Figure 3, different types of data structures may be used for storing different portions of the hash value.
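Local assignment (Step 256), including the variant that skips bits whose features lie too far from the threshold, could look like the following sketch. The margin values are invented so that the result mirrors the Figure 2 outcome under an assumed 4/4 split; `assign_local_weak_bits` and `max_margin` are hypothetical names.

```python
def assign_local_weak_bits(margins, portions, counts, max_margin=None):
    """For each portion (start, end), pick up to `count` positions with the
    smallest distance to the threshold. If `max_margin` is given, positions
    whose distance exceeds it are not labeled weak."""
    assigned = []
    for (start, end), count in zip(portions, counts):
        candidates = sorted(range(start, end), key=lambda i: margins[i])[:count]
        if max_margin is not None:
            candidates = [i for i in candidates if margins[i] <= max_margin]
        assigned.append(sorted(candidates))
    return assigned

# Hypothetical distances to the threshold for the reordered bits
# (positions 0..3 form the first portion, 4..7 the second).
margins = [0.05, 0.30, 0.40, 0.25, 0.35, 0.02, 0.45, 0.10]
portions = [(0, 4), (4, 8)]
weak = assign_local_weak_bits(margins, portions, counts=(1, 2))
capped = assign_local_weak_bits(margins, portions, counts=(1, 2), max_margin=0.08)
```

With one weak bit in the first portion and two in the second, the first portion has 2 variations and the second has 4, so the number of variations per portion is controlled independently.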

[0034] Figure 3 depicts an example method for searching for a query hash value. One or more steps shown in Figure 3 may be omitted altogether, or reordered. In an embodiment, a query hash value is received as input from a user, a program, or from any suitable source (Step 350). The query hash value received may be associated with any application that manages data using hash values.

[0035] In one or more embodiments, Step 352-Step 356 are similar to Step 252-Step 256 described above, for reordering hash bits in the query hash value, partitioning the query hash value into at least two portions, and locally selecting a subset of weak bits from the hash bits in the first portion and from the hash bits in the second portion. Variations of each portion are then determined and may be searched for, as described below.

[0036] In an embodiment, an index value of an array is determined based on a variation of the first portion of the query hash value (Step 358). The variation of the first portion of the query hash value is obtained by toggling the locally assigned weak bits in the first portion of the query hash value. The number of possible variations of the first portion is P^S, where P is the number of possible values the bit can have (for example, P = 2 for a binary bit and P > 2 for a non-binary bit) and S is the number of weak bits. For example, if a first portion of the query hash value has 16 bits where four bits are determined to be weak bits that can be either 0 or 1, then 2⁴ = 16 variations of the first portion of the query hash value are possible. For each variation an index value of an array may be determined. Determining the index value of the array may involve converting a binary hash value portion to a decimal value corresponding to the array index.
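The count of P^S variations and the conversion of each variation to an array index (Step 358) can be sketched as below. The helper names, the 16-symbol example portion, and the weak positions are hypothetical; symbols are treated as base-P digits with the first symbol most significant, which is an assumption.

```python
from itertools import product

def portion_variations(symbols, weak_positions, P=2):
    """Enumerate all P**S variations of a portion, where S is the number of
    weak positions and each weak symbol may take any of P values."""
    variations = []
    for combo in product(range(P), repeat=len(weak_positions)):
        candidate = list(symbols)
        for pos, value in zip(weak_positions, combo):
            candidate[pos] = value
        variations.append(candidate)
    return variations

def to_index(symbols, P=2):
    """Interpret the symbols as base-P digits to obtain an array index."""
    index = 0
    for s in symbols:
        index = index * P + s
    return index

# Hypothetical 16-bit first portion with four locally assigned weak bits.
first_portion = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]
weak_positions = [2, 5, 9, 14]
indices = [to_index(v) for v in portion_variations(first_portion, weak_positions)]
```

For the binary case this yields 2^4 = 16 candidate array indices; passing `P=3` with two weak positions would yield 9.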

[0037] Thereafter a determination is made whether data is associated with the index value of the array (Step 360). Associated data may be stored at that index value of the array or a pointer to the associated data may be stored at that index value of the array. For example, a pointer at an array index may point to a tree based data structure in which the second portion of each hash value, with the first portion equal to the index value, may be stored. Accordingly, in an embodiment, different portions of the hash value are stored in different types of data structures or implicitly stored based on location in an array. Figure 4 depicts an example data structure, in accordance with one or more embodiments. As shown in Figure 4, a data flag (402) may be used to indicate whether data (406) is associated with the index value in the array index (404). In another example, a separate array may be used to indicate whether the data array holds data at the particular index. If the index value was equal to T, then Array Index T of the array index (404) would be identified. Furthermore, data including the second portion of hash values B, C, and G would be identified in association with Array Index T. In this example, each of the hash values B, C, and G has a first portion equivalent to T. However, T does not necessarily have to be stored as this information may be implicitly known based on the array index value. Accordingly, only a portion of the hash value needs to be stored in the data structure.
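A structure in the spirit of Figure 4 might be sketched as follows: the first portion selects an array slot, a data flag records whether the slot holds anything, and only the second portion (with a pointer to the fingerprint) is stored. The class name, the use of a Python list per slot, and the string pointers are assumptions; the text also allows tree based structures.

```python
class PartitionedTable:
    """Array indexed by the first portion; each slot stores only second
    portions, so the first portion is implied by the slot position."""

    def __init__(self, first_bits):
        size = 1 << first_bits
        self.flags = [False] * size   # data flag per array index
        self.data = [None] * size     # list of (second_portion, pointer)

    def insert(self, first, second, pointer):
        if not self.flags[first]:
            self.flags[first] = True
            self.data[first] = []
        self.data[first].append((second, pointer))

    def lookup(self, first, second):
        """Return pointers of stored hash values matching both portions."""
        if not self.flags[first]:
            return []
        return [ptr for stored, ptr in self.data[first] if stored == second]

# Hypothetical 4-bit first portions and 4-bit second portions.
table = PartitionedTable(first_bits=4)
table.insert(0b1010, 0b0110, "fingerprint B")
table.insert(0b1010, 0b0001, "fingerprint C")
hits = table.lookup(0b1010, 0b0110)
misses = table.lookup(0b0011, 0b0110)
```

Checking the flag first lets a search skip an empty slot without touching the data array, which is the role the data flag (402) appears to play.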

[0038] If data is not associated with the array index based on the first portion, another variation of the first portion may be identified by toggling another weak bit in the first portion. However, if data is associated with the array index based on the first portion, a variation of the second portion of the query hash value may be determined by toggling weak bits in the second portion of the query hash value (Step 362). If a variation of the second portion of the query hash value is found in the data associated with the array index (Step 364), then a match is found (Step 370). If a variation of the second portion of the query hash value is not found in the data then a determination is made whether another variation is possible (Step 366). In an embodiment, the comparison of the variations of the second portion of the query hash value with the data found at an array index may be dynamic. For example, if the number of variations is smaller than the number of hash value portions stored in association with the index value, then each of the variations may be searched for in the data. However, if the number of hash value portions stored in association with the index value is smaller than the number of variations that are possible, then the hash value portions may be searched for in the list of possible variations of the second portion of the query hash value. For example, a search may be performed by bitwise-XORing the second portion of a stored hash value with the second portion of the query hash value and then bitwise-ANDing the result with the weak bit pattern in negation (where a weak bit is designated as a "0" bit, other bits designated as "1") in the second portion of the query hash value. A final output of 0 means the second portion of that stored hash value is within the list of possible variations. Additional variations of the first portion may also be exhausted by checking other variations of the first portion (Step 368) in a search for the query hash value. If all variations are exhausted without finding a match, then a determination is made that the match is not found (Step 372).
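The XOR/AND test and the surrounding search loop of Figure 3 can be illustrated with a short sketch. Here the weak bits are given as masks with a 1 at each weak position, so `~weak_mask` is the negated pattern described in the text; the dictionary standing in for the array and all example values are hypothetical.

```python
def within_variations(stored, query, weak_mask):
    """True if `stored` differs from `query` only at weak bit positions.
    XOR marks differing bits; ANDing with the negated weak mask discards
    differences at weak positions."""
    return ((stored ^ query) & ~weak_mask) == 0

def search(table, q_first, q_second, weak_first, weak_second):
    """Try every variation of the first portion as an index, then test each
    stored second portion against the query's second portion."""
    sub = weak_first
    while True:
        first = q_first ^ sub  # flip one subset of the weak bits
        for stored_second in table.get(first, ()):
            if within_variations(stored_second, q_second, weak_second):
                return first, stored_second  # match found (Step 370)
        if sub == 0:
            return None  # all variations exhausted (Step 372)
        sub = (sub - 1) & weak_first  # next subset of the weak mask

# Hypothetical table: first portion -> stored second portions.
table = {0b1010: [0b0110], 0b0001: [0b1111]}
match = search(table, 0b1011, 0b0100, weak_first=0b0001, weak_second=0b0010)
miss = search(table, 0b1011, 0b0100, weak_first=0b0001, weak_second=0b0000)
```

This sketch always scans the stored second portions with the mask test; as the paragraph notes, when the bucket is larger than the number of variations it may be cheaper to enumerate the variations and look each one up instead.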

3.0 DERIVATION OF HASH VALUES

[0039] In one or more embodiments, hash values may be computed by combining hash values derived from media content. Combining hash values may increase the randomness property of the hash value, thus decreasing collisions of different fingerprints for the same hash value. For example, consecutive hash values, which are hash values derived from consecutive frames of media content, may be combined to obtain new hash values. One method of combining hash values may include taking a portion of bits from each hash value of a consecutive set of hash values and concatenating the portions from the different hash values to obtain a new hash value. For example, if the hash index is 32 bits, 4 bits may be taken from each of a set of 8 consecutive hash values and concatenated to form a new 32 bit hash value. Furthermore, any other suitable combination may be used in order to obtain a new hash value from other hash values derived from media content.
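The 8 x 4 bit example can be written out as a minimal sketch. The text does not say which 4 bits are taken from each hash value; taking the lowest 4 bits is an assumption, as are the function name and the sample values.

```python
def combine_hashes(hashes, bits_per_hash=4):
    """Concatenate the lowest `bits_per_hash` bits of each consecutive hash
    value into one new hash value (earliest frame ends up most significant)."""
    mask = (1 << bits_per_hash) - 1
    combined = 0
    for h in hashes:
        combined = (combined << bits_per_hash) | (h & mask)
    return combined

# Hypothetical hash values derived from 8 consecutive frames.
frame_hashes = [0xA1, 0xB2, 0xC3, 0xD4, 0xE5, 0xF6, 0x07, 0x18]
new_hash = combine_hashes(frame_hashes)
```

Eight 4-bit pieces give a 32-bit result, here `0x12345678`, which could then serve as the hash index.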

4.0 REUSING EMPTY ENTRIES IN A HASH TABLE

[0040] In an embodiment, empty entries in a hash table are reused. As discussed above, with reference to Figure 4, in one or more embodiments, storing a hash value comprises partitioning the hash value into at least two portions, and storing only the second portion of the hash value in association with an array index corresponding to the first portion of the hash value. In an embodiment, an array index that is not associated with any data may be used to store overflow data. For example, with reference to Figure 4, Array Index S is not associated with any data because no hash values are stored in the data structure which have a first portion that corresponds to S. Accordingly, Array Index S may be reused to store overflow data that would normally be stored in association with another index value. If the data structure shown in Figure 4 allows a maximum of seven entries to be associated with each array index, then a new hash value Z, with a first portion corresponding to R, cannot be stored directly in association with Array Index R since Array Index R is already associated with 7 entries. In this example, Array Index R may be associated with an index offset value of 1 indicating that additional data associated with Array Index R is stored at index offset 1 from Array Index R. In an embodiment, the offset field is narrower than the data field of the second portion. Hash Value Z, which has a first portion corresponding to R, may then be stored in association with Array Index S, which is at index offset 1 from Array Index R. In another embodiment the offset may be a negative value indicating that associated data is stored at a lower index value. Accordingly, embodiments of the invention allow for storing a portion of a hash value at a different array index with use of an offset, and without necessarily storing the entire array index value.
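The overflow scheme of paragraph [0040] might be sketched as below. The class, the nearest-empty-slot policy, and the `is_overflow` marker (used so that a borrowed slot is not mistaken for data belonging to its own index) are assumptions that go beyond what the text specifies; only a single overflow slot per index is handled.

```python
class OffsetTable:
    """Array of fixed-capacity buckets; a full bucket records a signed
    offset to an otherwise empty index that holds its overflow entries."""

    def __init__(self, size, capacity):
        self.capacity = capacity
        self.buckets = [[] for _ in range(size)]
        self.offset = [0] * size            # 0 means no overflow slot
        self.is_overflow = [False] * size   # slot holds another index's data

    def _find_empty(self, first):
        # Assumed policy: nearest unused index, higher index tried first.
        for dist in range(1, len(self.buckets)):
            for idx in (first + dist, first - dist):
                if (0 <= idx < len(self.buckets)
                        and not self.buckets[idx] and not self.is_overflow[idx]):
                    return idx
        raise RuntimeError("no empty slot available")

    def insert(self, first, second):
        if self.is_overflow[first]:
            raise RuntimeError("slot reused for overflow (not handled here)")
        if len(self.buckets[first]) < self.capacity:
            self.buckets[first].append(second)
            return first
        if self.offset[first] == 0:
            self.offset[first] = self._find_empty(first) - first
        target = first + self.offset[first]
        if len(self.buckets[target]) >= self.capacity:
            raise RuntimeError("overflow slot full (not handled here)")
        self.is_overflow[target] = True
        self.buckets[target].append(second)
        return target

    def contains(self, first, second):
        if self.is_overflow[first]:
            return False
        if second in self.buckets[first]:
            return True
        off = self.offset[first]
        return off != 0 and second in self.buckets[first + off]

# Hypothetical indices standing in for Array Index R and Array Index S.
R, S = 1, 2
table = OffsetTable(size=4, capacity=7)
for second in range(7):
    table.insert(R, second)
stored_at = table.insert(R, 99)  # eighth entry for R spills into S
```

Only the small offset is stored at Array Index R rather than the full index of the overflow slot, which matches the stated goal of keeping the offset field narrower than the data field.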

5.0 BUILDING A DATABASE STORING HASH VALUES

[0041] In one or more embodiments, a database storing hash values is built by sorting the hash values into groups and storing each group of hash values into the database, a group at a time.

[0042] Sorting the hash values into groups may include partitioning the hash values into two or more portions. Thereafter, the hash values may be sorted into groups based on the first portion of each hash value. For example, all hash values with similar first portions are sorted into a group.

[0043] Storing each group of hash values into a database includes storing the hash values a group at a time into the database. In one or more embodiments, only the second portion of each hash value is stored into a data structure associated with that group (e.g., the data structure may be associated with the common first portion shared by all hash values in that group). Storing a group of hash values may include flushing the hash values from a Random Access Memory (RAM) buffer to a less-random-access memory such as CompactFlash memory or hard disks, one group at a time.
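Building the database a group at a time could be sketched as follows. Grouping by an identical first portion is an assumption (the text says "similar"), as are the function name, the 4-bit second portion, and the use of a dictionary in place of the flush to slower storage.

```python
from itertools import groupby

def build_groups(hashes_with_pointers, second_bits):
    """Sort (hash, pointer) pairs by first portion and yield one group per
    first portion, keeping only the second portion and the pointer."""
    mask = (1 << second_bits) - 1

    def first_portion(item):
        return item[0] >> second_bits

    ordered = sorted(hashes_with_pointers, key=first_portion)
    for first, group in groupby(ordered, key=first_portion):
        yield first, [(h & mask, ptr) for h, ptr in group]

# Hypothetical 8-bit hash values: high 4 bits = first portion.
items = [(0xA1, "clip0"), (0xB2, "clip1"), (0xA3, "clip2")]
database = {}
for first, entries in build_groups(items, second_bits=4):
    # Stand-in for flushing one group from the RAM buffer to slower storage.
    database[first] = entries
```

Because each group is written in one pass, writes to the slower medium land in contiguous runs rather than being scattered across the table.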

6.0 IMPLEMENTATION MECHANISMS — HARDWARE OVERVIEW

[0044] FIG. 5 depicts a block diagram that illustrates a computer system 500 upon which an embodiment of the present invention may be implemented. Computer system 500 includes a bus 502 or other communication mechanism for communicating information, and a processor 504 coupled with bus 502 for processing information. Computer system 500 also includes a main memory 506, such as a random access memory (RAM) or other dynamic storage device, coupled to bus 502 for storing information and instructions to be executed by processor 504. Main memory 506 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 504. Computer system 500 further includes a read only memory (ROM) 508 or other static storage device coupled to bus 502 for storing static information and instructions for processor 504. A storage device 510, such as a magnetic disk or optical disk, is provided and coupled to bus 502 for storing information and instructions.

[0045] Computer system 500 may be coupled via bus 502 to a display 512, such as a cathode ray tube (CRT), liquid crystal display (LCD), plasma screen display, or the like, for displaying information to a computer user. An input device 514, including alphanumeric (or non-alphabet based writing systems and/or non-Arabic number based) and other keys, is coupled to bus 502 for communicating information and command selections to processor 504. Another type of user input device is cursor control 516, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor 504 and for controlling cursor movement on display 512. This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), which allows the device to specify positions in a plane.

[0046] Embodiments may relate to the use of computer system 500 for implementing techniques described herein. According to an embodiment of the invention, such techniques are performed by computer system 500 in response to processor 504 executing one or more sequences of one or more instructions contained in main memory 506. Such instructions may be read into main memory 506 from another machine-readable medium, such as storage device 510. Execution of the sequences of instructions contained in main memory 506 causes processor 504 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement the invention. Thus, embodiments of the invention are not limited to any specific combination of hardware circuitry and software.

[0047] The term "machine-readable medium" as used herein refers to any storage medium that participates in providing data that causes a machine to operate in a specific fashion. In an embodiment implemented using computer system 500, various machine-readable media are involved, for example, in providing instructions to processor 504 for execution. Such a medium may take many forms, including but not limited to storage media and transmission media. Storage media includes both non-volatile media and volatile media. Non-volatile media includes, for example, optical or magnetic disks, such as storage device 510. Volatile media includes dynamic memory, such as main memory 506. Transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 502. Transmission media can also take the form of acoustic or electromagnetic waves, such as those generated during radio-wave and infra-red and other optical data communications. Such media are tangible to enable the instructions carried by the media to be detected by a physical mechanism that reads the instructions into a machine.

[0048] Common forms of machine-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, or any other magnetic medium, a CD-ROM, any other optical medium, punchcards, papertape, other legacy media, or any other physical medium with patterns of holes or darkened spots, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave as described hereinafter, or any other medium from which a computer can read.

[0049] Various forms of machine-readable media may be involved in carrying one or more sequences of one or more instructions to processor 504 for execution. For example, the instructions may initially be carried on a magnetic disk of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to computer system 500 can receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal. An infra-red detector can receive the data carried in the infra-red signal and appropriate circuitry can place the data on bus 502. Bus 502 carries the data to main memory 506, from which processor 504 retrieves and executes the instructions. The instructions received by main memory 506 may optionally be stored on storage device 510 either before or after execution by processor 504.

[0050] Computer system 500 also includes a communication interface 518 coupled to bus 502. Communication interface 518 provides a two-way data communication coupling to a network link 520 that is connected to a local network 522. For example, communication interface 518 may be an integrated services digital network (ISDN) card or a digital subscriber line (DSL) or cable modem (traditionally modulator/demodulator) to provide a data communication connection to a corresponding type of telephone line. As another example, communication interface 518 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links may also be implemented. In any such implementation, communication interface 518 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.

[0051] Network link 520 typically provides data communication through one or more networks to other data devices. For example, network link 520 may provide a connection through local network 522 to a host computer 524 or to data equipment operated by an Internet Service Provider (ISP) 526. ISP 526 in turn provides data communication services through the world wide packet data communication network now commonly referred to as the "Internet" 528. Local network 522 and Internet 528 both use electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks and the signals on network link 520 and through communication interface 518, which carry the digital data to and from computer system 500, are exemplary forms of carrier waves transporting the information.

[0052] Computer system 500 can send messages and receive data, including program code, through the network(s), network link 520 and communication interface 518. In the Internet example, a server 530 might transmit a requested code for an application program through Internet 528, ISP 526, local network 522 and communication interface 518.

[0053] The received code may be executed by processor 504 as the received code is received, and/or stored in storage device 510, or other non-volatile storage for later execution. In this manner, computer system 500 may obtain application code in the form of a carrier wave.

[0054] FIG. 6 depicts an example IC device 600, with which a possible embodiment of the present invention may be implemented. IC device 600 may have an input/output (I/O) feature 601. I/O feature 601 receives input signals and routes them via routing fabric 610 to a central processing unit (CPU) 602, which functions with storage 603. I/O feature 601 also receives output signals from other component features of IC device 600 and may control a part of the signal flow over routing fabric 610. A digital signal processing (DSP) feature performs at least a function relating to digital signal processing. An interface 605 accesses external signals and routes them to I/O feature 601, and allows IC device 600 to export signals. Routing fabric 610 routes signals and power between the various component features of IC device 600.

[0055] Configurable and/or programmable processing elements (CPPE) 611, such as arrays of logic gates, may perform dedicated functions of IC device 600, which in an embodiment may relate to deriving and processing media fingerprints that generally correspond to media content. Storage 612 dedicates sufficient memory cells for CPPE 611 to function efficiently. CPPE may include one or more dedicated DSP features 614.

7.0 EQUIVALENTS, EXTENSIONS, ALTERNATIVES, AND MISCELLANEOUS

[0056] In the foregoing specification, embodiments of the invention have been described with reference to numerous specific details that may vary from implementation to implementation. Thus, the sole and exclusive indicator of what is the invention, and is intended by the applicants to be the invention, is the set of claims that issue from this application, in the specific form in which such claims issue, including any subsequent correction. Any definitions expressly set forth herein for terms contained in such claims shall govern the meaning of such terms as used in the claims. Hence, no limitation, element, property, feature, advantage or attribute that is not expressly recited in a claim should limit the scope of such claim in any way. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

[0057] Thus, an embodiment of the invention may relate to one or more of the example embodiments, which are enumerated below.

1. A method, comprising: computing a query hash value; partitioning the query hash value into at least a first portion and a second portion; locally assigning weak bits within the first portion of the query hash value and weak bits within the second portion of the query hash value; determining one or more variations of the query hash value with toggling one or more weak bits in the first portion and one or more weak bits in the second portion of the query hash value; and identifying a target hash value that is identical to a variation of the query hash value; wherein the method is performed by a general purpose machine comprising a processor and configured to be a special purpose machine based on a set of software instructions.

2. The method as recited in Enumerated Example Embodiment 1, wherein identifying the target hash value that is identical to the variation of the query hash value comprises: determining an index value, of an array, based on a variation of the first portion of the query hash value, wherein data is associated with the index value of the array; identifying a second portion of the target hash value in the data associated with the index value of the array; and determining that the second portion of the target hash value is identical to a variation of the second portion of the query hash value.

3. The method as recited in Enumerated Example Embodiment 2, wherein determining the array index based on the variation of the first portion of the query hash value comprises converting the first portion of the query hash value from binary to decimal.

4. The method as recited in Enumerated Example Embodiment 1, wherein identifying the target hash value that is identical to the variation of the query hash value comprises: identifying a first portion of the target hash value that is identical to a first portion of the variation of the query hash value; identifying a second portion of the target hash value associated with the first portion of the target hash value; and determining that the second portion of the target hash value is identical to the second portion of the variation of the query hash value.

5. The method as recited in Enumerated Example Embodiment 1, wherein locally assigning weak bits within the first portion and the second portion of the query hash value comprises selecting a first predetermined number of weak bits from the first portion and selecting a second predetermined number of weak bits from the second portion.

6. The method as recited in Enumerated Example Embodiment 1, wherein locally assigning weak bits within the first portion comprises: selecting a predetermined number of feature values that are closest to a threshold value, wherein the feature values are derived from media content; determining if a distance of each selected feature value is within a predetermined range from the threshold value; and assigning bits in the query hash value as the one or more weak bits if the distance of a corresponding feature value from the threshold value is within the predetermined range.

7. The method as recited in Enumerated Example Embodiment 1, wherein computing the query hash value comprises: deriving one or more feature values from media content; and for at least one feature value, determining two or more bits in the query hash value from a single feature value.

8. The method as recited in Enumerated Example Embodiment 1, wherein computing the query hash value comprises: deriving one or more feature values from media content; partitioning the one or more feature values derived from the media content into three or more intervals; and assigning at least one bit in the query hash value for each feature value based on an interval corresponding to that feature value.

9. The method as recited in Enumerated Example Embodiment 1, wherein partitioning the query hash value into at least a first portion and a second portion comprises partitioning the query hash value into equal sized portions.

10. The method as recited in Enumerated Example Embodiment 1, wherein the ratio of weak bits to other bits in the first portion is similar to the ratio of weak bits to other bits in the second portion.

11. The method as recited in Enumerated Example Embodiment 1, wherein partitioning the query hash value into at least a first portion and a second portion comprises partitioning the query hash value into unequal sized portions.

12. The method as recited in Enumerated Example Embodiment 1, wherein the query hash value is partitioned into three or more portions, wherein subsequent to partitioning the query hash value, weak bits are locally assigned to each portion of the query hash value, and wherein determining one or more variations of the query hash value comprises toggling one or more weak bits in each portion in the three or more portions of the query hash value.

13. The method as recited in Enumerated Example Embodiment 1, wherein the method further comprises: prior to partitioning the query hash value into at least a first portion and a second portion, reordering hash bits in the query hash value.

14. The method as recited in Enumerated Example Embodiment 13, wherein reordering the hash bits in the query hash value comprises transferring hash bits that are expected to have a high bit error rate from the first portion of the hash value to the second portion of the hash value.

15. The method as recited in Enumerated Example Embodiment 1, further comprising: determining that a query fingerprint, associated with the query hash value, is similar to a target fingerprint associated with the target hash value.

16. A method, comprising: deriving consecutive hash values from a plurality of consecutive media content frames; selecting at least a portion from each hash value of the consecutive hash values to obtain a plurality of portions extracted from consecutive hash values; and generating a new hash value based on the plurality of portions extracted from consecutive hash values; wherein the method is performed by a general purpose machine comprising a processor and configured to be a special purpose machine based on a set of software instructions.

17. A method, comprising: computing a query hash value; partitioning the query hash value into at least a first portion and a second portion; identifying a first array index of an array based on the first portion of the query hash value; determining a second array index of the array based on an offset stored in data associated with the first array index; and determining that the second portion of the query hash value is stored in data associated with the second array index; wherein the method is performed by a general purpose machine comprising a processor and configured to be a special purpose machine based on a set of software instructions.

18. The method as recited in Enumerated Example Embodiment 17, further comprising determining that the second portion of the query hash value is not stored in data associated with the first array index prior to determining the second array index based on the offset stored in data associated with the first array index.

19. A method, comprising: obtaining a plurality of hash values; partitioning each hash value in the plurality of hash values into at least a first portion and a second portion; sorting each hash value in the plurality of hash values into a plurality of groups based on the first portion of that hash value; and subsequent to sorting each hash value in the plurality of hash values into the plurality of groups, storing each group of hash values; wherein storing each group of hash values comprises: identifying a data structure associated with the group of hash values; and storing the second portion of each hash value in the data structure associated with the group; wherein the method is performed by a general purpose machine comprising a processor and configured to be a special purpose machine based on a set of software instructions.

20. The method as recited in Enumerated Example Embodiment 19, wherein each hash value in the plurality of hash values is associated with a corresponding pointer value; and wherein the second portion of each hash value is stored in the data structure with the corresponding pointer value of that hash value.

21. The method as recited in Enumerated Example Embodiment 19, wherein storing the second portion of each hash value in the data structure associated with the group comprises: storing the second portion of each hash value on a Random Access Memory (RAM) buffer; and subsequent to storing the pointer value associated with each hash value in the group on the RAM buffer, flushing at least a portion of the RAM buffer to non-volatile solid state memory.

22. A computer readable storage medium having encoded instructions which, when executed by one or more processors, cause performance of the steps of a method as recited in any of Enumerated Example Embodiments 1-21.

23. A system comprising: one or more processors; and a computer readable storage medium having encoded instructions which, when executed by the one or more processors, cause performance of a method as recited in any of Enumerated Example Embodiments 1-21.

24. A system comprising means for performing steps of a method as recited in any of Enumerated Example Embodiments 1-21.

25. A use for a computer system comprising performing steps of a method as recited in any of Enumerated Example Embodiments 1-21.
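The query flow recited in Enumerated Example Embodiments 1, 4, and 5 can be illustrated with a short sketch. It rests on several assumptions that are not taken from the specification: each hash bit comes from one feature value compared against a single threshold, the same predetermined number of weak bits is selected in each of two portions, and the stored hash values live in a dictionary keyed by first portion. All function and variable names are hypothetical.

```python
from itertools import combinations


def weak_bits_local(features, threshold, start, stop, count):
    """Pick `count` positions in [start, stop) closest to the threshold."""
    ranked = sorted(range(start, stop), key=lambda i: abs(features[i] - threshold))
    return sorted(ranked[:count])


def variations(bits, weak):
    """Return `bits` with every subset of the weak positions toggled."""
    result = []
    for size in range(len(weak) + 1):
        for subset in combinations(weak, size):
            changed = list(bits)
            for position in subset:
                changed[position] ^= 1
            result.append(tuple(changed))
    return result


def find_targets(features, threshold, split, weak_per_portion, index):
    """Look up stored hashes equal to some variation of the query hash."""
    bits = tuple(1 if value > threshold else 0 for value in features)
    # Weak bits are chosen separately inside each portion (localized).
    weak_first = weak_bits_local(features, threshold, 0, split, weak_per_portion)
    weak_second = weak_bits_local(features, threshold, split, len(bits), weak_per_portion)
    firsts = {v[:split] for v in variations(bits, weak_first)}
    seconds = {v[split:] for v in variations(bits, weak_second)}
    matches = []
    for first in firsts:
        # `index` maps a first portion to the second portions stored with it.
        for second in index.get(first, ()):
            if second in seconds:
                matches.append(first + second)
    return sorted(matches)


# Hypothetical feature values; the query hash is 1101 0010.
features = [0.9, 0.52, 0.1, 0.8, 0.2, 0.45, 0.95, 0.05]
index = {
    (1, 0, 0, 1): [(0, 1, 1, 0), (1, 1, 1, 1)],
    (1, 1, 0, 1): [(1, 0, 1, 0)],
}
matches = find_targets(features, threshold=0.5, split=4, weak_per_portion=1, index=index)
```

In this example the feature values 0.52 and 0.45 lie closest to the threshold within their respective portions, so one bit is toggled in each portion and the stored value 1001 0110 is found even though it differs from the query hash in two bits.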

Claims

What is claimed is:

1. A method, comprising: computing a query hash value; partitioning the query hash value into at least a first portion and a second portion; locally assigning weak bits within the first portion of the query hash value and weak bits within the second portion of the query hash value; determining one or more variations of the query hash value with toggling one or more weak bits in the first portion and one or more weak bits in the second portion of the query hash value; and identifying a target hash value that is identical to a variation of the query hash value; wherein the method is performed by a general purpose machine comprising a processor and configured to be a special purpose machine based on a set of software instructions.

2. The method as recited in Claim 1, wherein identifying the target hash value that is identical to the variation of the query hash value comprises: determining an index value, of an array, based on a variation of the first portion of the query hash value, wherein data is associated with the index value of the array; identifying a second portion of the target hash value in the data associated with the index value of the array; and determining that the second portion of the target hash value is identical to a variation of the second portion of the query hash value.

3. The method as recited in Claim 2, wherein determining the array index based on the variation of the first portion of the query hash value comprises converting the first portion of the query hash value from binary to decimal.

4. The method as recited in Claim 1, wherein identifying the target hash value that is identical to the variation of the query hash value comprises: identifying a first portion of the target hash value that is identical to a first portion of the variation of the query hash value; identifying a second portion of the target hash value associated with the first portion of the target hash value; and determining that the second portion of the target hash value is identical to the second portion of the variation of the query hash value.

5. The method as recited in Claim 1, wherein locally assigning weak bits within the first portion and the second portion of the query hash value comprises selecting a first predetermined number of weak bits from the first portion and selecting a second predetermined number of weak bits from the second portion.

6. The method as recited in Claim 1, wherein locally assigning weak bits within the first portion comprises: selecting a predetermined number of feature values that are closest to a threshold value, wherein the feature values are derived from media content; determining if a distance of each selected feature value is within a predetermined range from the threshold value; and assigning bits in the query hash value as the one or more weak bits if the distance of a corresponding feature value from the threshold value is within the predetermined range.

7. The method as recited in Claim 1, wherein computing the query hash value comprises: deriving one or more feature values from media content; for at least one feature value, determining two or more bits in the query hash value from a single feature value.

8. The method as recited in Claim 1, wherein computing the query hash value comprises: deriving one or more feature values from media content; partitioning the one or more feature values derived from the media content into three or more intervals; and assigning at least one bit in the query hash value for each feature value based on an interval corresponding to that feature value.

9. The method as recited in Claim 1, wherein partitioning the query hash value into at least a first portion and a second portion comprises partitioning the query hash value into equal sized portions.

10. The method as recited in Claim 1, wherein the ratio of weak bits to other bits in the first portion is similar to the ratio of weak bits to other bits in the second portion.

11. The method as recited in Claim 1, wherein partitioning the query hash value into at least a first portion and a second portion comprises partitioning the query hash value into unequal sized portions.

12. The method as recited in Claim 1, wherein the query hash value is partitioned into three or more portions, wherein subsequent to partitioning the query hash value, weak bits are locally assigned to each portion of the query hash value, and wherein determining one or more variations of the query hash value comprises toggling one or more weak bits in each portion in the three or more portions of the query hash value.

13. The method as recited in Claim 1, wherein the method further comprises: prior to partitioning the query hash value into at least a first portion and a second portion, reordering hash bits in the query hash value.

14. The method as recited in Claim 13, wherein reordering the hash bits in the query hash value comprises transferring hash bits that are expected to have a high bit error rate from the first portion of the hash value to the second portion of the hash value.

15. The method as recited in Claim 1, further comprising: determining that a query fingerprint, associated with the query hash value, is similar to a target fingerprint associated with the target hash value.

18. The method as recited in Claim 17, further comprising determining that the second portion of the query hash value is not stored in data associated with the first array index prior to determining the second array index based on the offset stored in data associated with the first array index.

19. A method, comprising: obtaining a plurality of hash values; partitioning each hash value in the plurality of hash values into at least a first portion and a second portion; sorting each hash value in the plurality of hash values into a plurality of groups based on the first portion of that hash value; and subsequent to sorting each hash value in the plurality of hash values into the plurality of groups, storing each group of hash values; wherein storing each group of hash values comprises: identifying a data structure associated with the group of hash values; storing the second portion of each hash value in the data structure associated with the group; wherein the method is performed by a general purpose machine comprising a processor and configured to be a special purpose machine based on a set of software instructions.

20. The method as recited in Claim 19, wherein each hash value in the plurality of hash values is associated with a corresponding pointer value; and wherein the second portion of each hash value is stored in the data structure with the corresponding pointer value of that hash value.

21. The method as recited in Claim 19, wherein storing the second portion of each hash value in the data structure associated with the group comprises: storing the second portion of each hash value on a Random Access Memory (RAM) buffer; and subsequent to storing the pointer value associated with each hash value in the group on the RAM buffer, flushing at least a portion of the RAM buffer to non-volatile solid state memory.

22. A computer readable storage medium having encoded instructions which, when executed by one or more processors, cause performance of the steps of the method as recited in any of Claims 1-21.

23. A system comprising: one or more processors; a computer readable storage medium having encoded instructions which, when executed by the one or more processors, cause performance of the method as recited in any of Claims 1-21.

24. A system comprising: means for performing one or more steps of a method as recited in one or more of Claims 1-21.

25. A use for a computer system, comprising: performing one or more steps of a method as recited in one or more of Claims 1-21.