WO2004006185A1 - Similarity calculation method and apparatus - Google Patents
- Publication number
- WO2004006185A1 (international application PCT/JP2003/008142)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- distance
- vector
- hierarchical
- calculated
- similarity
- Prior art date
- 238000004364 calculation method Methods 0.000 title claims abstract description 112
- 239000013598 vector Substances 0.000 claims abstract description 357
- 238000000034 method Methods 0.000 claims description 59
- 238000006243 chemical reaction Methods 0.000 claims description 44
- 230000008569 process Effects 0.000 claims description 33
- 230000005236 sound signal Effects 0.000 claims description 19
- 230000009466 transformation Effects 0.000 claims description 16
- 238000001228 spectrum Methods 0.000 claims description 9
- 238000001514 detection method Methods 0.000 abstract description 28
- 239000011159 matrix material Substances 0.000 abstract description 19
- 238000012545 processing Methods 0.000 description 33
- 238000010586 diagram Methods 0.000 description 11
- 230000010354 integration Effects 0.000 description 10
- 238000009826 distribution Methods 0.000 description 7
- 238000005516 engineering process Methods 0.000 description 7
- 238000000605 extraction Methods 0.000 description 6
- 238000012217 deletion Methods 0.000 description 3
- 230000037430 deletion Effects 0.000 description 3
- 230000000694 effects Effects 0.000 description 3
- 230000003287 optical effect Effects 0.000 description 3
- 239000004065 semiconductor Substances 0.000 description 3
- 239000000654 additive Substances 0.000 description 2
- 230000000996 additive effect Effects 0.000 description 2
- 230000008859 change Effects 0.000 description 2
- 238000004590 computer program Methods 0.000 description 2
- 230000007246 mechanism Effects 0.000 description 2
- 230000008707 rearrangement Effects 0.000 description 2
- 238000012935 Averaging Methods 0.000 description 1
- 101000969688 Homo sapiens Macrophage-expressed gene 1 protein Proteins 0.000 description 1
- 241000868953 Hymenocardia acida Species 0.000 description 1
- 102100021285 Macrophage-expressed gene 1 protein Human genes 0.000 description 1
- 238000004458 analytical method Methods 0.000 description 1
- 230000001174 ascending effect Effects 0.000 description 1
- 230000005540 biological transmission Effects 0.000 description 1
- 238000004891 communication Methods 0.000 description 1
- 230000006835 compression Effects 0.000 description 1
- 238000007906 compression Methods 0.000 description 1
- 238000000354 decomposition reaction Methods 0.000 description 1
- 239000006185 dispersion Substances 0.000 description 1
- 239000000284 extract Substances 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000000491 multivariate analysis Methods 0.000 description 1
- 238000010606 normalization Methods 0.000 description 1
- 238000000513 principal component analysis Methods 0.000 description 1
- 239000000126 substance Substances 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2131—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on a transform domain processing, e.g. wavelet transform
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/14—Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/15—Correlation function computation including computation of convolution operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/7715—Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/76—Television signal recording
- H04N5/91—Television signal processing therefor
Definitions
- the present invention relates to a similarity calculation method and apparatus for performing pattern matching between two vectors at high speed, as well as a program and a recording medium.
- the so-called full search, which computes the similarity between the input and every candidate and then selects the closest one, is the simplest method and the least likely to miss a match; it is often used when the amount of data is small. However, when searching, for example, for a portion similar to an input video or input audio within a large store of video or audio, the feature vector per second has a high dimension and the stored material amounts to tens to hundreds of hours, so such a simple full search makes the search time enormous.
- to address this, speed-up techniques such as binary tree search and hashing are used. They speed up processing by organizing the data in advance and, during retrieval, skipping comparisons with tree branches or hash tables that differ from the input data.
- however, because distortion and noise are inherent in such data, symbolized data rarely match exactly, so using these speed-up techniques causes many detection misses.
- moreover, because the data are inherently multidimensional, it is difficult to assign them a unique order in advance.
- Japanese Patent Laid-Open Publication No. Hei 8-123460 proposes a technique in which, at registration time, mutually close vectors are grouped and each group is represented by one representative vector. At search time, the distance between the input vector and each representative vector is calculated, and only the vectors in the group with the shortest distance are compared exhaustively. This speeds up similar-vector search while still accommodating distortion in multidimensional vectors.
- Japanese Patent Publication No. 2000-1-1345073 proposes a technique in which vectors are coded and indexed by short codes, which suppresses the growth in the number of distance calculations and enables fast similar-data search.
- a similarity calculation method according to the present invention calculates the similarity between two input vectors, and the distance between the two input vectors is calculated hierarchically.
- in the threshold comparison step, if the integrated value of the distance calculated up to a certain level exceeds the threshold, control is performed so as to terminate the distance calculation.
- in this method, the distance between two vectors is calculated hierarchically, and if the integrated value of the distance calculated up to a certain level exceeds a predetermined threshold, the method only detects that the distance is equal to or greater than the threshold and does not calculate the actual distance, which speeds up the calculation.
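A minimal Python sketch of the hierarchical distance calculation with threshold comparison, assuming the squared Euclidean distance and treating each vector component as one level (as in the first embodiment); the function name `hierarchical_sq_distance` is hypothetical. It returns -1 as soon as the running sum reaches the threshold, so the exact distance is only computed for vectors below the threshold.

```python
def hierarchical_sq_distance(f, g, threshold):
    """Squared Euclidean distance with early termination.

    Returns the exact squared distance if it stays below `threshold`,
    otherwise -1 as soon as the running sum reaches the threshold.
    """
    if len(f) != len(g):
        raise ValueError("vectors must have the same dimension")
    total = 0.0
    for fi, gi in zip(f, g):
        diff = fi - gi
        total += diff * diff          # integrate one component (cf. equation (3))
        if total >= threshold:        # threshold comparison after each component
            return -1                 # only "at least the threshold" is known
    return total


print(hierarchical_sq_distance([1.0, 2.0, 3.0], [1.0, 2.5, 3.0], 1.0))  # 0.25
print(hierarchical_sq_distance([0.0, 0.0, 0.0], [2.0, 0.0, 5.0], 1.0))  # -1
```

In the second call the first component alone contributes 4.0, so the remaining components are never touched.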
- the similarity calculation method may further include a conversion step of applying a predetermined transform to the two input vectors; in that case, the distance between the two transformed input vectors is calculated in a predetermined order based on that transform.
- the predetermined transform is, for example, a transform that rearranges the components of the input vector according to the magnitude of their variances, a discrete cosine transform, a discrete Fourier transform, a Walsh-Hadamard transform, or the Karhunen-Loeve transform.
- the similarity calculation method may further include a step of extracting, for each of the two input vectors transformed in the conversion step, the components of the input vector in the predetermined order to form a plurality of hierarchical partial vectors.
- in that case, the hierarchical distance calculation step calculates the distance between the components of the partial vectors in order, starting from the partial vector of the highest level. If the integrated value of the distances calculated for all components of the partial vectors up to a certain level is below the threshold, the distance between the components of the partial vectors one level lower is calculated.
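The partial-vector variant can be sketched as follows, assuming the squared Euclidean distance, inputs that are already transformed and ordered as described above, and level sizes chosen arbitrarily for illustration; `split_levels` and `level_wise_distance` are hypothetical names.

```python
def split_levels(vec, sizes):
    """Cut a transformed vector into consecutive partial vectors, highest level first."""
    if sum(sizes) != len(vec):
        raise ValueError("level sizes must add up to the vector dimension")
    levels, start = [], 0
    for size in sizes:
        levels.append(vec[start:start + size])
        start += size
    return levels


def level_wise_distance(f_levels, g_levels, threshold):
    """Integrate squared distances level by level; stop once the total reaches the threshold."""
    total = 0.0
    for f_part, g_part in zip(f_levels, g_levels):
        total += sum((a - b) ** 2 for a, b in zip(f_part, g_part))
        if total >= threshold:
            return -1      # lower levels are never evaluated
    return total


f = split_levels([4.0, 1.0, 0.5, 0.1], [1, 3])
g = split_levels([0.0, 1.0, 0.4, 0.2], [1, 3])
print(level_wise_distance(f, g, 10.0))  # -1: the top level alone contributes 16.0
```

With a threshold of 20.0 the same pair passes both levels and the full squared distance (about 16.02) is returned.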
- a similarity calculation device according to the present invention calculates the similarity between two input vectors and comprises: hierarchical distance calculating means for calculating the distance between the two input vectors hierarchically; threshold comparing means for comparing the integrated value of the distances calculated at each level by the hierarchical distance calculating means with a preset threshold; control means for controlling the distance calculation by the hierarchical distance calculating means according to the comparison result of the threshold comparing means; and output means for outputting, as the similarity, the integrated value of the distance calculated up to the last level. The control means terminates the distance calculation when, as a result of the comparison by the threshold comparing means, the integrated value of the distances calculated up to a certain level exceeds the threshold.
- such a similarity calculation device calculates the distance between two vectors hierarchically, and when the integrated value of the distance calculated up to a certain level exceeds a predetermined threshold, it only detects that the integrated value is equal to or greater than the threshold and does not calculate the actual distance, which speeds up the calculation.
- the similarity calculation device may further include conversion means for applying a predetermined transform to the two input vectors; in that case, the hierarchical distance calculating means calculates the distance between the two input vectors transformed by the conversion means in a predetermined order based on that transform.
- the predetermined transform is, for example, a transform that rearranges the order of each component constituting the input vector according to the magnitude of the variance of each component, a discrete cosine transform, a discrete Fourier transform, a Walsh-Hadamard transform, Or the Karhunen-Loeve transformation.
- the similarity calculation device may further include means for extracting, for each of the two transformed input vectors, the components of the input vector in the predetermined order to obtain a plurality of hierarchical partial vectors.
- in that case, the hierarchical distance calculating means calculates the distance between the components of the partial vectors level by level, starting from the partial vector of the highest level. If the integrated value of the distances calculated for all components of the partial vectors up to a given level is below the threshold, the distance between the components of the partial vectors one level lower is calculated.
- a program according to the present invention causes a computer to execute the similarity calculation processing described above.
- a recording medium according to the present invention is a computer-readable medium having such a program recorded thereon.
- FIG. 1 is a diagram illustrating a schematic configuration of a similar vector detection device according to the first embodiment.
- FIG. 2 is a flowchart illustrating processing at the time of vector registration in the similar vector detection apparatus.
- FIG. 3 is a flowchart illustrating processing at the time of vector search in the similar vector detection apparatus.
- FIG. 4 is a diagram for intuitively explaining the processing in the first embodiment.
- FIG. 5 is a diagram showing an example in which the distribution of vectors in the feature space is biased.
- FIG. 6 is a diagram illustrating a schematic configuration of a similar vector detection device according to the second embodiment.
- FIG. 7 is a flow chart for explaining processing at the time of vector registration in the similar vector detection device.
- FIG. 8 is a flowchart illustrating processing at the time of vector search in the similar vector detection apparatus.
- FIG. 9 is a diagram illustrating a schematic configuration of a similar vector detection device according to the third embodiment.
- FIG. 10 is a flowchart for explaining processing at the time of vector registration in the similar vector detection apparatus.
- FIG. 11 is a flowchart for explaining processing at the time of vector search in the similar vector detection device.
- FIG. 12 is a flowchart illustrating an example of a process of extracting an acoustic feature vector from an acoustic signal.
- FIG. 13 is a diagram illustrating an example of a process of extracting an acoustic feature vector from an acoustic signal.
- FIG. 14 is a diagram for explaining transform coding in an audio signal.
- FIG. 15 is a flowchart illustrating an example of a process of extracting an audio feature vector from an encoded audio signal.
- FIG. 16 is a diagram illustrating an example of a process of extracting an audio feature vector from an encoded audio signal.
- FIG. 17 is a flowchart illustrating an example of a process of extracting a video feature vector from a video signal.
- FIG. 18 is a diagram illustrating an example of a process of extracting a video feature vector from a video signal.
- FIG. 19 is a flowchart illustrating another example of a process of extracting a video feature vector from a video signal.
- FIG. 20 is a diagram illustrating another example of the process of extracting a video feature vector from a video signal.
- FIG. 21 is a flowchart illustrating another example of a process of extracting a video feature vector from an encoded video signal.
- FIG. 22 is a diagram illustrating another example of the process of extracting a video feature vector from an encoded video signal.
- BEST MODE FOR CARRYING OUT THE INVENTION hereinafter, specific embodiments to which the present invention is applied will be described in detail with reference to the drawings.
- the present invention is applied to a similar vector detection method for detecting a vector similar to an input vector from a plurality of registered vectors at high speed, and an apparatus therefor.
- in this embodiment, the actual distance is calculated only for registered vectors whose distance is smaller than a predetermined threshold; for those exceeding the threshold, only the fact that the threshold is exceeded is detected and the actual distance is not calculated, which speeds up similar vector detection. In the similar vector detection device according to this embodiment, -1 is output for convenience when the distance exceeds the threshold.
- $f = (f[1], f[2], \ldots, f[N])^{t} \qquad (1)$
- $g = (g[1], g[2], \ldots, g[N])^{t} \qquad (2)$
- f[1], f[2], ... represent the components of the vector f
- g[1], g[2], ... represent the components of the vector g
- t represents transpose
- N represents the dimension of the vector.
- FIG. 1 shows a schematic configuration of a similar vector detection device according to the first embodiment.
- the similar vector detection device 1 receives a vector f and a vector g and outputs the squared distance between the vectors (or -1).
- it comprises a recording unit 10, a hierarchical distance calculation unit 11, and a threshold determination unit 12.
- in step S1, the recording unit 10 (FIG. 1) receives a registration vector g in advance.
- in general there are many registration vectors g, and their number is often huge.
- in step S2, the recording unit 10 records the input vector g.
- the recording unit 10 is, for example, a magnetic disk, an optical disk, a semiconductor memory, or the like.
- in step S10, the threshold determination unit 12 (FIG. 1) sets the distance threshold S, and in the following step S11, the hierarchical distance calculation unit 11 receives the vector f and acquires one vector g recorded in the recording unit 10.
- in step S12, the hierarchical distance calculation unit 11 sets the internal component number i to 1 and the integrated distance value sum to 0, and in step S13 it performs the integration operation of equation (3) between the ith component f[i] of the vector f and the ith component g[i] of the vector g: $\mathrm{sum} \leftarrow \mathrm{sum} + (f[i] - g[i])^{2} \qquad (3)$
- in step S14, the threshold determination unit 12 determines whether the integrated value sum is less than the threshold S. If sum is less than S (Yes), the process proceeds to step S16. If sum is equal to or greater than S (No), the threshold determination unit 12 outputs -1 in step S15 and the process ends.
- as described above, the output -1 is a conventional value indicating that the distance between the input vector f and the acquired vector g exceeds the threshold S and that the vector g is rejected.
- the threshold determination unit 12 sets the threshold S and terminates the integration operation in the hierarchical distance calculation unit 11 when the integrated value sum exceeds S partway through the integration, which improves the processing speed.
- in step S16, it is determined whether the component number i is equal to or smaller than the number of dimensions N of the vectors f and g. If i is equal to or smaller than N (Yes), i is incremented in step S17 and the process returns to step S13. If i is larger than N (No), the integration has been completed up to the last component of the vectors f and g, so in step S18 the threshold determination unit 12 outputs the integrated value sum and the process ends. The integrated value sum at this time is the square of the distance between the vectors.
- in the above description, the squared distance between the vectors is used.
- however, the same method can be used with other distance measures, not only the squared distance.
- since the integrated value increases monotonically as the distances between components are added, no false rejection occurs.
- because the integrated value is the exact sum of the distances between the components, exactly the same distance as in the simple full search is output for vectors f and g whose distance is below the threshold S, and no error occurs.
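To check the equivalence claim, the following sketch compares a plain full search with the thresholded search on random data. The data, dimension, and threshold are arbitrary assumptions, and `full_search` and `thresholded_search` are hypothetical names; squared Euclidean distance is assumed.

```python
import random


def full_search(query, database, threshold):
    """Reference: compute every squared distance, keep those below the threshold."""
    hits = {}
    for idx, vec in enumerate(database):
        d = sum((a - b) ** 2 for a, b in zip(query, vec))
        if d < threshold:
            hits[idx] = d
    return hits


def thresholded_search(query, database, threshold):
    """Same hits, but stop integrating a vector once its partial sum reaches the threshold."""
    hits, ops = {}, 0
    for idx, vec in enumerate(database):
        total = 0.0
        for a, b in zip(query, vec):
            total += (a - b) ** 2
            ops += 1
            if total >= threshold:
                break                  # rejected: the value -1 in the text
        else:                          # loop finished without reaching the threshold
            hits[idx] = total
    return hits, ops


random.seed(0)
dim, threshold = 16, 8.0
database = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(1000)]
query = database[42][:]                # an exact copy guarantees at least one hit
hits, ops = thresholded_search(query, database, threshold)
print(hits == full_search(query, database, threshold))   # True
print(ops, "component operations instead of", len(database) * dim)
```

Because every partial sum is non-decreasing, a vector is rejected only if its full squared distance would also reach the threshold, so the two searches return identical hits.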
- in addition, registered vectors can be updated and deleted in chronological order, which makes processing and management easy. It is also easy to search in chronological order or to restrict the search to a specified time range.
- as described above, setting the distance threshold S allows a search equivalent to a full search to be performed at high speed. However, in this method the search speed depends on the order in which the vector components are evaluated. For example, if the distribution of vectors in the feature space is biased as shown in FIG. 5, the search speed varies greatly depending on whether the f[1] axis or the f[2] axis is integrated first; in this example, evaluating the f[2] axis first reduces unnecessary integration and speeds up the search.
- in the second embodiment, an orthonormal transform is therefore applied to the input vector f and the registered vector g, and the search is further sped up by evaluating the components of the transformed vectors f' and g' in descending order of significance.
- FIG. 6 shows a schematic configuration of a similar vector detection device according to the second embodiment.
- the similar vector detection device 2 receives a vector f and a vector g and outputs the squared distance between the vectors (or -1).
- it comprises vector conversion units 20 and 21, a recording unit 22, a hierarchical distance calculation unit 23, and a threshold determination unit 24.
- the vector converters 20 and 21 perform the same conversion on the vector g and the vector f, respectively.
- the recording unit 22 is, for example, a magnetic disk, an optical disk, a semiconductor memory, or the like.
- in step S20, the vector conversion unit 20 (FIG. 6) receives a registration vector g in advance, and in step S21 converts the vector g as in the above equation (5) to generate the vector g'. Then, in step S22, the recording unit 22 records the converted vector g'.
- in step S30, the threshold determination unit 24 (FIG. 6) sets the distance threshold S, and in the following step S31, the vector conversion unit 21 receives the vector f, and the hierarchical distance calculation unit 23 acquires one vector g' recorded in the recording unit 22.
- in step S32, the vector conversion unit 21 converts the vector f as in the above equation (4) to generate the vector f'.
- in step S33, the hierarchical distance calculation unit 23 sets the internal component number i to 1 and the integrated distance value sum to 0.
- in step S35, the threshold determination unit 24 determines whether the integrated value sum is less than the threshold S. If sum is less than S (Yes), the process proceeds to step S37. If sum is equal to or greater than S (No), the threshold determination unit 24 outputs -1 in step S36 and the process ends.
- in step S37, it is determined whether the component number i is equal to or smaller than the number of dimensions N of the vectors f' and g'. If i is equal to or smaller than N (Yes), i is incremented in step S38 and the process returns to step S34. If i is larger than N (No), the integration has been completed up to the last component of the vectors f' and g', so the threshold determination unit 24 outputs the integrated value sum and the process ends. The integrated value sum at this time is the square of the distance between the vectors.
- one example of the orthogonal transform is an order (permutation) matrix, which simply rearranges the order of the vector components.
- for example, an 8th-order order matrix P is represented by equation (8).
- the orthogonal transform using this order matrix is effective when the spread of each vector component differs, and it is fast because only rearrangement is required, with no multiplication, division, or conditional branching.
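A minimal sketch of the order-matrix approach, assuming the per-component variances are estimated from a set of sample vectors at registration time (how the variances are obtained is not stated in this text). The same `order` must be applied to both f and g; `variance_order` and `permute` are hypothetical names.

```python
from statistics import pvariance


def variance_order(samples):
    """Component indices sorted by decreasing variance over the sample vectors."""
    dim = len(samples[0])
    variances = [pvariance([v[i] for v in samples]) for i in range(dim)]
    return sorted(range(dim), key=lambda i: variances[i], reverse=True)


def permute(vec, order):
    """Apply the order (permutation) matrix: pure rearrangement, no arithmetic."""
    return [vec[i] for i in order]


samples = [[0.1, 5.0, 0.0], [0.2, -4.0, 1.0], [0.0, 3.0, -1.0]]
order = variance_order(samples)
print(order)                      # [1, 2, 0]
print(permute([7, 8, 9], order))  # [8, 9, 7]
```

Because a permutation only reorders components, the squared distance between permuted vectors equals the original one; only the evaluation order, and hence how early the threshold is reached, changes.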
- when the energy of the feature vector, regarded as a discrete signal, is concentrated in the low-frequency components, it is effective to use as the orthogonal transform the discrete cosine transform (DCT) given by equations (10) and (11), or the discrete Fourier transform (DFT) given by equations (12) and (13), and to evaluate the distance starting from the low-frequency components.
- fast transform algorithms can be used for the discrete cosine transform and the discrete Fourier transform, so the entire transformation matrix need not be held; this is much more advantageous than performing the full matrix calculation.
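For illustration, an orthonormal DCT-II written directly from its definition (O(N²); a fast O(N log N) algorithm would be used in practice, as noted above). Equations (10) and (11) are not reproduced in this text, so the normalization used here, the common orthonormal one, is an assumption; with it, squared distances are preserved up to rounding. `dct_ortho` is a hypothetical name.

```python
import math


def dct_ortho(x):
    """Orthonormal DCT-II: X[k] = s_k * sum_i x[i] * cos(pi * (2i + 1) * k / (2N))."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n)
                               for i in range(n)))
    return out


# A smooth vector concentrates its energy in the first (low-frequency) coefficients.
f = [1.0, 1.1, 1.2, 1.3, 1.3, 1.2, 1.1, 1.0]
g = [0.9, 1.0, 1.2, 1.4, 1.4, 1.3, 1.0, 0.8]
F, G = dct_ortho(f), dct_ortho(g)
d_orig = sum((a - b) ** 2 for a, b in zip(f, g))
d_dct = sum((a - b) ** 2 for a, b in zip(F, G))
print(abs(d_orig - d_dct) < 1e-9)  # True: an orthonormal transform preserves distance
```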
- the Walsh-Hadamard transform is an orthogonal transform in which every element of the transformation matrix is ±1, and it is suitable for high-speed transformation because no multiplication is required during the transform.
- by treating the number of sign changes (sequency) as a concept analogous to frequency and arranging the components in order from the lowest sequency, this transform can be used in the same way as the discrete cosine transform and discrete Fourier transform described above.
- as with those transforms, the distance calculation is sped up for vectors with strong correlation between adjacent components.
- the Walsh-Hadamard transform matrix is constructed according to the sign of the Fourier transform matrix, or by recursive expansion of the matrix.
- equation (14) shows an eighth-order Walsh-Hadamard transform matrix W arranged in order of sequency (number of sign changes).
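The sequency-ordered matrix W can be built as in the following sketch, which uses the recursive (Sylvester) expansion and then sorts the rows by their number of sign changes. The 1/√N scaling that makes the transform orthonormal is an assumption about normalization, and the helper names are hypothetical.

```python
def hadamard(n):
    """Sylvester construction: H_2m = [[H, H], [H, -H]]; n must be a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def sign_changes(row):
    """Sequency of a row: the number of sign changes between adjacent entries."""
    return sum(1 for a, b in zip(row, row[1:]) if a != b)


def walsh_matrix(n):
    """Hadamard rows reordered by sequency, lowest first."""
    return sorted(hadamard(n), key=sign_changes)


def walsh_transform(x):
    """Orthonormal Walsh-Hadamard transform: signed additions, then one scaling."""
    scale = 1.0 / len(x) ** 0.5
    return [scale * sum(v if w > 0 else -v for w, v in zip(row, x))
            for row in walsh_matrix(len(x))]


print([sign_changes(r) for r in walsh_matrix(8)])  # [0, 1, 2, 3, 4, 5, 6, 7]
```

The row sums use only additions and subtractions, which reflects the multiplication-free property mentioned above; the single scaling factor can be dropped if only distance comparisons against a correspondingly scaled threshold are needed.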
- it is also effective to use, as the orthogonal transform, the optimal Karhunen-Loeve transform (hereinafter referred to as the KL transform).
- the KL transform matrix T is obtained by eigenvalue decomposition of the variance matrix V of the sample vectors and is defined by equation (15), where λ1, ..., λN are the eigenvalues.
- the KL transform is an orthogonal transform that completely removes the correlation between components, and the variance of each transformed component equals the eigenvalue λi. Therefore, by constructing the KL transform matrix T so that the eigenvalues are arranged in descending order, redundant information is removed by decorrelating all components, and the distance can then be integrated starting from the axis with the largest variance.
- the vectors themselves can also be compressed by keeping only the components with large eigenvalues and not storing the components with small eigenvalues.
- this also reduces the storage area and data read time of the recording unit (recording unit 22 in FIG. 6).
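Since equation (15) is not reproduced here, the following pure-Python sketch shows one way to obtain a KL transform matrix, assuming the population covariance of the sample vectors and using cyclic Jacobi rotations for the eigenvalue decomposition (a numerical library routine would normally be used instead). Rows of `T` are eigenvectors ordered by decreasing eigenvalue, so transformed component i has variance λi. All function names and the sample data are hypothetical.

```python
import math


def covariance(samples):
    """Population covariance (variance) matrix V of the sample vectors."""
    n, dim = len(samples), len(samples[0])
    mean = [sum(v[i] for v in samples) / n for i in range(dim)]
    return [[sum((v[i] - mean[i]) * (v[j] - mean[j]) for v in samples) / n
             for j in range(dim)] for i in range(dim)]


def jacobi_eigen(a, max_sweeps=50, tol=1e-20):
    """Eigenvalues and eigenvectors of a symmetric matrix via cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]                                   # work on a copy
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):                              # A <- A P
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):                              # A <- P^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):                              # V <- V P
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    values = [a[i][i] for i in range(n)]
    vectors = [[v[k][i] for k in range(n)] for i in range(n)]   # column i of V
    return values, vectors


def kl_matrix(samples):
    """Eigenvalues (descending) and the matrix T whose rows are the matching eigenvectors."""
    values, vectors = jacobi_eigen(covariance(samples))
    order = sorted(range(len(values)), key=lambda k: values[k], reverse=True)
    return [values[k] for k in order], [vectors[k] for k in order]


def transform(t, x):
    """x' = T x."""
    return [sum(tij * xj for tij, xj in zip(row, x)) for row in t]


samples = [[2.0, 1.9, 0.1], [-1.0, -1.1, 0.0], [0.5, 0.4, -0.1], [-1.5, -1.2, 0.0]]
lams, T = kl_matrix(samples)
f, g = [1.0, 0.8, 0.0], [0.2, 0.5, 0.1]
f_kl, g_kl = transform(T, f), transform(T, g)
print([round(lam, 4) for lam in lams])
```

Because T is orthogonal, distances between `f_kl` and `g_kl` equal those between f and g, and dropping the trailing (small-eigenvalue) components gives the compressed representation described above.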
- the embodiments described above speed up the search by speeding up the distance calculation.
- however, in an actual search, the time required to read data from a recording unit such as a hard disk can also be a major contributor.
- the KL transform in the second embodiment described above corresponds to the analysis method called principal component analysis in the field of multivariate analysis, and it is an operation that extracts the principal components constituting a vector. Therefore, in the third embodiment described below, the principal components of the transformed vector g' obtained as in the second embodiment are recorded as an index vector g1, and the remaining components are recorded separately as a detailed vector g2. During a search, the distance is first calculated with reference to the index vector g1, and only when the result is below the threshold S is the detailed vector g2 read to continue the distance calculation, which shortens the data read time.
- FIG. 9 shows a schematic configuration of a similar vector detection device according to the third embodiment.
- the similar vector detection device 3 receives a vector f and a vector g and outputs the squared distance between the vectors (or -1). It comprises vector conversion units 30 and 31, an index recording unit 32, a detailed recording unit 33, a hierarchical distance calculation unit 34, and a threshold determination unit 35.
- the vector conversion units 30 and 31 perform the same transform on the vector g and the vector f, respectively, as in the second embodiment described above.
- the index recording unit 32 and the detailed recording unit 33 are, for example, a magnetic disk, an optical disk, a semiconductor memory, or the like.
- in step S40, the vector conversion unit 30 (FIG. 9) receives a registration vector g in advance, and in step S41 converts it as in the above equation (5) to generate the vector g'.
- the vector conversion unit 30 then splits g', in ascending order of component number (that is, starting from the components with large variance or eigenvalue in the above transform, or from the low-frequency components), into an index vector g1 having a predetermined number M (1 ≤ M < N) of components and a detailed vector g2 having the remaining components.
- in step S42, the index recording unit 32 records the index vector g1, and in step S43, the detailed recording unit 33 records the detailed vector g2.
- in step S50, the threshold determination unit 35 (FIG. 9) sets the distance threshold S, and in the following step S51, the vector conversion unit 31 receives the vector f, and the hierarchical distance calculation unit 34 acquires one index vector g1 recorded in the index recording unit 32.
- in step S52, the vector conversion unit 31 converts the vector f as in the above equation (4) to generate the vector f'. The vector conversion unit 31 further splits f', in ascending order of component number, into an index vector f1 having the predetermined number M (1 ≤ M < N) of components and a detailed vector f2 having the remaining components.
- in step S55, the threshold determination unit 35 determines whether the integrated value sum is less than the threshold S. If sum is less than S (Yes), the process proceeds to step S57. If sum is equal to or greater than S (No), the threshold determination unit 35 outputs -1 in step S56 and the process ends.
- as described above, the output -1 is a conventional value indicating that the distance has exceeded the threshold and the vector has been rejected.
- in step S57, it is determined whether the component number i is equal to or smaller than the number of dimensions M of the index vectors f1 and g1. If i is equal to or smaller than M (Yes), i is incremented in step S58 and the process returns to step S54. If i is larger than M (No), the hierarchical distance calculation unit 34 acquires one detailed vector g2 recorded in the detailed recording unit 33.
- in step S60, the hierarchical distance calculation unit 34 performs the integration operation shown by the above equation (16) between the ith component f'[i] of the vector f' and the ith component g'[i] of the vector g'.
- in step S61, the threshold determination unit 35 determines whether the integrated value sum is less than the threshold S. If sum is less than S (Yes), the process proceeds to step S63. If sum is equal to or greater than S (No), the threshold determination unit 35 outputs -1 in step S62 and the process ends.
- in step S63, it is determined whether the component number i is equal to or smaller than the number of dimensions N of the vectors f' and g'. If i is equal to or smaller than N (Yes), i is incremented in step S64 and the process returns to step S60. If i is larger than N (No), the calculation has been completed up to the last component of the vectors f' and g', so the threshold determination unit 35 outputs the integrated value sum and the process ends. The integrated value sum at this time is the square of the distance between the vectors. The processing described above for one registered vector g' is shown in the flowchart of FIG. 11.
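The two-stage flow of steps S54 to S64 can be sketched as follows, assuming the squared Euclidean distance. `load_g2` is a hypothetical stand-in for reading the detailed vector from the detailed recording unit 33, so its cost is paid only for vectors that survive the index stage; the stores and values below are made up for illustration.

```python
def two_stage_distance(f1, f2, g1, load_g2, threshold):
    """Squared distance using the index part first; the detailed part is read only if needed.

    f1, g1:  index vectors (the first M transformed components)
    f2:      detailed part of the query
    load_g2: callable returning the stored detailed vector g2 (e.g. a disk read)
    """
    total = 0.0
    for a, b in zip(f1, g1):            # index stage (roughly steps S54 to S58)
        total += (a - b) ** 2
        if total >= threshold:
            return -1                   # rejected without reading g2
    g2 = load_g2()                      # detailed vector is read only here
    for a, b in zip(f2, g2):            # detailed stage (roughly steps S60 to S64)
        total += (a - b) ** 2
        if total >= threshold:
            return -1
    return total


reads = []                              # records which detailed vectors were read
index_store = [[5.0, 0.0], [0.1, 0.2]]
detail_store = [[0.0, 0.1], [0.05, 0.0]]


def make_loader(k):
    def load():
        reads.append(k)
        return detail_store[k]
    return load


f1, f2 = [0.0, 0.2], [0.0, 0.0]
results = [two_stage_distance(f1, f2, index_store[k], make_loader(k), 1.0) for k in range(2)]
print(results[0], reads)  # -1 [1]: vector 0 was rejected at the index stage
```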
- compared with the first and second embodiments, the storage capacity and accuracy are unchanged and the computation speed is almost unchanged; however, because most comparisons are rejected at the index-vector stage, the detailed vectors g2 rarely need to be acquired, which eliminates the overhead of data access processing.
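The per-vector flow of steps S54 to S64, together with the two-stage storage just described, can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the embodiment itself: equation (16) is not reproduced in this section, so the accumulation is assumed to be the squared difference of corresponding components (consistent with the statement that the final sum is the squared distance); index vectors are assumed to be held in memory while detailed vectors come from a slower store; and the names `hierarchical_distance`, `search`, `index_store`, `detail_store` and `fetch_detail` are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

REJECTED = -1.0  # convenience value: the running sum reached the threshold


def hierarchical_distance(
    f_index: Sequence[float],
    f_detail: Sequence[float],
    g_index: Sequence[float],
    fetch_detail: Callable[[], Sequence[float]],
    threshold: float,
) -> float:
    """Accumulate squared component differences, index part first.

    Returns the squared distance, or REJECTED as soon as the running sum is
    no longer below `threshold` (a threshold on the squared distance).
    """
    total = 0.0
    # Stage 1: the M index components, always held in memory.
    for a, b in zip(f_index, g_index):
        total += (a - b) ** 2
        if total >= threshold:
            return REJECTED
    # Stage 2: the detailed vector is fetched only when stage 1 did not reject.
    for a, b in zip(f_detail, fetch_detail()):
        total += (a - b) ** 2
        if total >= threshold:
            return REJECTED
    return total


def search(
    query: Sequence[float],
    index_store: Dict[str, List[float]],
    detail_store: Dict[str, List[float]],
    m: int,
    threshold: float,
) -> Tuple[Dict[str, float], int]:
    """Return matches within the threshold and the number of detailed-vector fetches."""
    f_index, f_detail = query[:m], query[m:]
    fetches = 0
    matches: Dict[str, float] = {}
    for key, g_index in index_store.items():

        def fetch(key: str = key) -> List[float]:
            nonlocal fetches
            fetches += 1  # stands in for a slower read from the detail store
            return detail_store[key]

        d = hierarchical_distance(f_index, f_detail, g_index, fetch, threshold)
        if d != REJECTED:
            matches[key] = d
    return matches, fetches
```

For example, with the query `[1.0, 2.0, 0.0, 0.0]`, M = 2 and a squared-distance threshold of 1.0, a registered vector whose index part is `[5.0, 5.0]` is rejected on its first component, so its detailed vector is never fetched; only vectors that survive the index stage cost a detail read.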
- in this embodiment the vector is divided into two stages, an index vector and a detailed vector; similarly, the index vector may be further divided into a higher-order index vector and a detailed index vector, giving three stages.
- in step S70, the audio signal of each time interval T is acquired from the audio signal in the target time interval.
- q is an index representing a discrete frequency, and Q is the maximum discrete frequency.
- in step S73, the average spectrum S'q of the obtained power spectrum coefficients Sq is calculated, and in step S74 the average spectrum S'q is vectorized to generate an acoustic feature vector a.
- this acoustic feature vector a is represented, for example, by equation (17).
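Steps S70 to S74 can be sketched as follows. The formulas for steps S71 and S72 are not reproduced in this section, so the sketch assumes a plain DFT power spectrum |X_q|² for each interval; it uses a direct O(n²) DFT from the standard library only to stay dependency-free (an FFT would normally be used). The function names and the interval length in samples are hypothetical.

```python
import cmath
from typing import List, Sequence


def power_spectrum(frame: Sequence[float]) -> List[float]:
    """Power |X_q|^2 of a direct DFT for discrete frequencies q = 0 .. len(frame) // 2."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * cmath.pi * q * t / n) for t in range(n))) ** 2
        for q in range(n // 2 + 1)
    ]


def acoustic_feature_vector(signal: Sequence[float], interval: int) -> List[float]:
    """Average the per-interval power spectra over the target time section.

    S70: cut the section into intervals of `interval` samples (a trailing
    partial interval is dropped); S71-S72: power spectrum S_q of each interval;
    S73: average S'_q over the intervals; S74: the list of averages is vector a.
    """
    frames = [signal[i:i + interval] for i in range(0, len(signal) - interval + 1, interval)]
    if not frames:
        raise ValueError("target section is shorter than one interval")
    spectra = [power_spectrum(frame) for frame in frames]
    # zip(*spectra) groups the values of each discrete frequency q across intervals.
    return [sum(column) / len(spectra) for column in zip(*spectra)]
```

Passing the whole target section as a single interval gives the undivided variant that the text also allows.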
- the above describes dividing the audio signal in the target time section into time sections T, but the spectrum may instead be calculated over the whole target time section without dividing it into time sections T.
- because acoustic signals are enormous in volume, they are often compressed and encoded before being recorded or transmitted. The encoded audio signal could be decoded back to baseband and the acoustic feature vector a then extracted by the above method, but if a can be extracted with only partial decoding, the extraction process becomes more efficient and faster.
- in transform coding, a commonly used coding method, the original audio signal is divided into frames of time interval T as shown in FIG. 14, an orthogonal transform such as the modified discrete cosine transform (MDCT) is applied to the signal of each frame, and the resulting coefficients are quantized and encoded. At this time, a scale factor, which is a magnitude normalization coefficient, is extracted for each frequency band and encoded separately. Therefore, by decoding only these scale factors, they can be used as the acoustic feature vector a.
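The text does not specify how a codec derives its scale factors, so the following is only an illustration of the idea of a per-band normalization coefficient. It computes an unwindowed, direct-form MDCT (2N samples in, N coefficients out) and, purely as an assumption, takes the peak coefficient magnitude in each band as that band's scale factor. Real codecs window the frames, quantize scale factors (often logarithmically) and define their own band layouts; `band_edges` and both function names are hypothetical.

```python
import math
from typing import List, Sequence


def mdct(frame: Sequence[float]) -> List[float]:
    """Unwindowed MDCT: X_k = sum_n x_n cos[pi/N (n + 1/2 + N/2)(k + 1/2)], 2N -> N."""
    if len(frame) % 2:
        raise ValueError("MDCT frame length must be even (2N)")
    n_coef = len(frame) // 2
    return [
        sum(
            x * math.cos(math.pi / n_coef * (n + 0.5 + n_coef / 2) * (k + 0.5))
            for n, x in enumerate(frame)
        )
        for k in range(n_coef)
    ]


def band_scale_factors(coeffs: Sequence[float], band_edges: Sequence[int]) -> List[float]:
    """Illustrative scale factor per band: the peak |X_k| within [lo, hi),
    so that the normalized coefficients X_k / scale lie in [-1, 1]."""
    return [
        max((abs(c) for c in coeffs[lo:hi]), default=0.0)
        for lo, hi in zip(band_edges[:-1], band_edges[1:])
    ]
```

Only the short list returned by `band_scale_factors` would need to be decoded to build a feature vector, which is the point the text makes about partial decoding.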
- in step S80, the encoded audio signal of a time section T within the target time section is obtained, and in step S81 the scale factor of each frame is partially decoded. Then, in step S82, it is determined whether decoding within the target time section has been completed. If it has (Yes), the process proceeds to step S83; if not (No), the process returns to step S80.
- in step S83, the largest scale factor in each band is detected from the scale factors in the target time interval, and in step S84 these are vectorized to generate an acoustic feature vector a.
- in this way, an acoustic feature vector a equivalent to the one above can be extracted at high speed without fully decoding the encoded audio signal.
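Once the scale factors are available, steps S80 to S84 reduce to a running per-band maximum. The sketch below assumes each frame's scale factors have already been partially decoded into a list with one value per frequency band; the decoding itself is codec-specific and not shown, and `scale_factor_feature` is a hypothetical name.

```python
from typing import Iterable, List, Optional, Sequence


def scale_factor_feature(frames: Iterable[Sequence[float]]) -> List[float]:
    """Build the acoustic feature vector a from partially decoded scale factors.

    Each item of `frames` holds one frame's scale factors, one per band
    (S80-S81). The loop runs until the target time section is exhausted (S82),
    keeping the largest scale factor seen in each band (S83); the result,
    listed band by band, is the vector a (S84).
    """
    peak: Optional[List[float]] = None
    for scale_factors in frames:
        if peak is None:
            peak = list(scale_factors)
        elif len(scale_factors) != len(peak):
            raise ValueError("every frame must have the same number of bands")
        else:
            peak = [max(p, s) for p, s in zip(peak, scale_factors)]
    if peak is None:
        raise ValueError("no frames in the target time section")
    return peak
```

Because frames are consumed one at a time, the same function works on a generator that decodes scale factors lazily from the bitstream.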
- in step S90, as shown in FIG. 18, video frames are obtained from the video signal in the target time interval T.
- in step S91, a time-averaged image 100 is created from all the acquired video frames.
- in step S92, the created time-averaged image 100 is divided into X (horizontal) × Y (vertical) small blocks, and a block-average image 110 is created by averaging the pixel values in each block.
- in step S93, these block averages are arranged, for example, in the order R, G, B from the upper left to the lower right to generate a one-dimensional video feature vector V.
- this video feature vector V is represented, for example, by equation (18).
- the one-dimensional video feature vector V may be generated by rearranging the pixel values of the time average image 100 without creating the block average image 110.
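Steps S91 to S93 can be sketched as follows. Equation (18) is not reproduced here, so the ordering is an assumption: all R block means first, then all G, then all B, each scanned from the upper left to the lower right. Frames are assumed to be nested lists of (R, G, B) tuples whose width and height divide evenly by the block grid; `block_average_feature`, `bx` and `by` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]      # (R, G, B)
Frame = Sequence[Sequence[Pixel]]       # frame[row][column]


def block_average_feature(frames: Sequence[Frame], bx: int, by: int) -> List[float]:
    """S91: time-average the frames; S92: average each of the bx x by blocks;
    S93: list all R block means, then all G, then all B, each scanned from the
    upper left to the lower right."""
    height, width = len(frames[0]), len(frames[0][0])
    if height % by or width % bx:
        raise ValueError("image size must be divisible by the block grid")
    bh, bw = height // by, width // bx
    n = len(frames)
    # Time-averaged image: avg[y][x][c] is the mean of channel c at pixel (x, y).
    avg = [[[sum(f[y][x][c] for f in frames) / n for c in range(3)]
            for x in range(width)] for y in range(height)]
    feature: List[float] = []
    for c in range(3):                  # R, then G, then B
        for j in range(by):             # block rows, top to bottom
            for i in range(bx):         # block columns, left to right
                block = [avg[y][x][c]
                         for y in range(j * bh, (j + 1) * bh)
                         for x in range(i * bw, (i + 1) * bw)]
                feature.append(sum(block) / len(block))
    return feature
```

Setting `bx` to the image width and `by` to the height makes every block a single pixel, which corresponds to the variant that rearranges the time-averaged image directly.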
- in step S100, as shown in FIG. 20, video frames are obtained from the video signal in the target time interval T.
- in step S101, a histogram of the signal values of each color, for example R, G and B, is created from the signal values of each video frame.
- in step S102, these histograms are arranged, for example, in the order R, G, B to generate a one-dimensional video feature vector V.
- this video feature vector V is represented, for example, by equation (19).
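Steps S100 to S102 can be sketched as below. The bin layout is not given in the text, so the sketch assumes 8-bit channel values, a hypothetical `bins` parameter of equal-width bins over 0 to 255, raw (unnormalized) counts accumulated over all frames, and each frame supplied as a flat list of (R, G, B) pixels.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # assumed 8-bit (R, G, B)


def histogram_feature(frames: Sequence[Sequence[Pixel]], bins: int = 4) -> List[int]:
    """Count the R, G and B values of every pixel of every frame in the target
    interval (S100-S101) into `bins` equal-width bins over 0..255, then
    concatenate the R, G and B histograms in that order (S102)."""
    hist = [[0] * bins for _ in range(3)]
    for frame in frames:
        for pixel in frame:
            for c in range(3):
                # value * bins // 256 maps 0..255 onto bin indices 0..bins-1
                hist[c][pixel[c] * bins // 256] += 1
    return hist[0] + hist[1] + hist[2]
```

Unlike the block-average vector, this feature discards spatial layout, so it is insensitive to object motion within the interval; whether that is desirable depends on the application.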
- because video signals are enormous in volume, they are often compressed and encoded before being recorded or transmitted. The encoded video signal could be decoded back to baseband and the video feature vector V then extracted by the above method, but if V can be extracted with only partial decoding, the extraction process becomes more efficient and faster.
- in step S110, for the target time interval T to be vectorized, the encoded video signal of the most recent group of pictures (GOP) is obtained, and the intra-frame coded picture (I picture) 120 in that GOP is acquired.
- the frame image is encoded in units of macroblocks MB (16 × 16 pixels or 8 × 8 pixels) using the discrete cosine transform (DCT).
- the DC coefficient of this DCT corresponds to the average of the pixel values of the image in the macroblock.
- in step S111, the DC coefficients are obtained, and in step S112 they are arranged, for example, in the order Y, Cb, Cr to generate a one-dimensional video feature vector V.
- this video feature vector V is expressed, for example, by equation (20).
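The claim that the DC coefficient corresponds to the block average can be checked numerically. Under the orthonormal 2-D DCT-II scaling used in this sketch, the DC term of an N × N block equals (1/N) times the pixel sum, that is, N times the block mean (8 times the mean for an 8 × 8 block). The sketch recomputes the DCT only to demonstrate this; a real partial decoder would read the DC values from the I-picture bitstream. The block lists, their raster order and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Block = Sequence[Sequence[float]]  # one N x N block of samples


def dct2_coefficient(block: Block, u: int, v: int) -> float:
    """Coefficient (u, v) of the orthonormal 2-D DCT-II of a square block."""
    n = len(block)

    def alpha(k: int) -> float:
        return math.sqrt((1.0 if k == 0 else 2.0) / n)

    total = 0.0
    for y in range(n):
        for x in range(n):
            total += (block[y][x]
                      * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                      * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
    return alpha(u) * alpha(v) * total


def dc_feature_vector(y_blocks: Sequence[Block], cb_blocks: Sequence[Block],
                      cr_blocks: Sequence[Block]) -> List[float]:
    """S111-S112: DC coefficient of every block, all Y blocks first, then Cb,
    then Cr; each input list is assumed to be in raster order already."""
    return [dct2_coefficient(b, 0, 0)
            for blocks in (y_blocks, cb_blocks, cr_blocks)
            for b in blocks]
```

Because every DC value is the block mean times the same constant N, the resulting vector is a scaled copy of a block-average feature, which is why it can stand in for the fully decoded version.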
- the video feature vector V can be extracted at high speed without completely decoding the encoded video signal.
- a hierarchical distance integration operation is performed, and the calculation is terminated as soon as the integrated distance exceeds a preset threshold.
- similar vectors can be detected at high speed.
- when a vector similar to the input vector is detected among a large number of registered vectors, most of the registered vectors are dissimilar and exceed the threshold, so the distance calculation is terminated early and the detection time can be greatly reduced.
- the vector is subjected in advance to an order transform, a discrete cosine transform, a discrete Fourier transform, a Walsh-Hadamard transform, or a KL transform, and the highly significant vector components, that is, the components with a large variance or eigenvalue under that transform, are calculated preferentially.
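The effect of putting significant components first can be illustrated with the simplest case. The sketch assumes that "order transform" means reordering vector components and uses per-component variance over the registered vectors as the significance measure; the other listed transforms are not shown (when normalized to be orthonormal, they likewise preserve Euclidean distance). All function names are hypothetical.

```python
from typing import List, Sequence


def variance_order(vectors: Sequence[Sequence[float]]) -> List[int]:
    """Component indices sorted by decreasing variance over the registered vectors."""
    n, dim = len(vectors), len(vectors[0])
    means = [sum(v[i] for v in vectors) / n for i in range(dim)]
    variances = [sum((v[i] - means[i]) ** 2 for v in vectors) / n for i in range(dim)]
    # sorted() is stable, so equal-variance components keep their original order.
    return sorted(range(dim), key=lambda i: -variances[i])


def permute(vector: Sequence[float], order: Sequence[int]) -> List[float]:
    """Reorder components; the Euclidean distance between vectors is unchanged."""
    return [vector[i] for i in order]


def components_until_rejection(f: Sequence[float], g: Sequence[float],
                               threshold: float) -> int:
    """How many components are accumulated before the running squared distance
    reaches `threshold` (len(f) if it never does)."""
    total = 0.0
    for k, (a, b) in enumerate(zip(f, g), start=1):
        total += (a - b) ** 2
        if total >= threshold:
            return k
    return len(f)
```

For instance, if only the last of four components varies across the registered set, a dissimilar pair is rejected after four accumulations in the original order but after one once the high-variance component is moved to the front.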
- the present invention is not limited to this; arbitrary processing can also be realized by having a CPU (Central Processing Unit) execute a computer program.
- the computer program can be provided by being recorded on a recording medium, or can be provided by being transmitted via the Internet or another transmission medium.
- INDUSTRIAL APPLICABILITY According to the present invention described above, the distance between two vectors is calculated hierarchically, and the calculation can be sped up by detecting only that the distance is not less than the threshold, rather than calculating the actual distance. In particular, when a vector similar to the input vector is detected among a large number of registered vectors, most of the registered vectors are dissimilar and exceed the threshold, so the distance calculation is terminated early and the detection time can be greatly reduced.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Mathematical Physics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Multimedia (AREA)
- Mathematical Optimization (AREA)
- Software Systems (AREA)
- Databases & Information Systems (AREA)
- Pure & Applied Mathematics (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Computing Systems (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Algebra (AREA)
- Signal Processing (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Image Analysis (AREA)
- Complex Calculations (AREA)
- Television Signal Processing For Recording (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
DE60330147T DE60330147D1 (de) | 2002-07-09 | 2003-06-26 | Ähnlichkeitsberechnungsverfahren und einrichtung |
US10/489,012 US7260488B2 (en) | 2002-07-09 | 2003-06-26 | Similarity calculation method and device |
EP03736281A EP1521210B9 (en) | 2002-07-09 | 2003-06-26 | Similarity calculation method and device |
KR1020047003337A KR101021044B1 (ko) | 2002-07-09 | 2003-06-26 | 유사도 산출 방법 및 장치 및 컴퓨터 판독가능한 기록 매체 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2002200481A JP4623920B2 (ja) | 2002-07-09 | 2002-07-09 | 類似度算出方法及び装置、並びにプログラム及び記録媒体 |
JP2002-200481 | 2002-07-09 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2004006185A1 (ja) | 2004-01-15 |
Family
ID=30112514
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2003/008142 WO2004006185A1 (ja) | 類似度算出方法及び装置 | 2002-07-09 | 2003-06-26 |
Country Status (7)
Country | Link |
---|---|
US (1) | US7260488B2 (ja) |
EP (1) | EP1521210B9 (ja) |
JP (1) | JP4623920B2 (ja) |
KR (1) | KR101021044B1 (ja) |
CN (1) | CN1324509C (ja) |
DE (1) | DE60330147D1 (ja) |
WO (1) | WO2004006185A1 (ja) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10691909B2 (en) | 2016-11-11 | 2020-06-23 | Samsung Electronics Co., Ltd. | User authentication method using fingerprint image and method of generating coded model for user authentication |
Families Citing this family (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7539870B2 (en) * | 2004-02-10 | 2009-05-26 | Microsoft Corporation | Media watermarking by biasing randomized statistics |
JP4220449B2 (ja) * | 2004-09-16 | 2009-02-04 | 株式会社東芝 | インデキシング装置、インデキシング方法およびインデキシングプログラム |
JP2006101462A (ja) * | 2004-09-30 | 2006-04-13 | Sanyo Electric Co Ltd | 画像信号処理装置 |
US7552303B2 (en) * | 2004-12-14 | 2009-06-23 | International Business Machines Corporation | Memory pacing |
KR100687207B1 (ko) * | 2005-09-16 | 2007-02-26 | 주식회사 문화방송 | 이미지 전송 장치 및 이미지 수신 장치 |
IL179582A0 (en) * | 2006-11-26 | 2007-05-15 | Algotec Systems Ltd | Comparison workflow automation by registration |
US8738633B1 (en) | 2012-01-31 | 2014-05-27 | Google Inc. | Transformation invariant media matching |
US20170206202A1 (en) * | 2014-07-23 | 2017-07-20 | Hewlett Packard Enterprise Development Lp | Proximity of data terms based on walsh-hadamard transforms |
US9568591B2 (en) * | 2014-11-10 | 2017-02-14 | Peter Dan Morley | Method for search radar processing using random matrix theory |
US9503747B2 (en) * | 2015-01-28 | 2016-11-22 | Intel Corporation | Threshold filtering of compressed domain data using steering vector |
US10783268B2 (en) | 2015-11-10 | 2020-09-22 | Hewlett Packard Enterprise Development Lp | Data allocation based on secure information retrieval |
US11080301B2 (en) | 2016-09-28 | 2021-08-03 | Hewlett Packard Enterprise Development Lp | Storage allocation based on secure data comparisons via multiple intermediaries |
JP6922556B2 (ja) | 2017-08-29 | 2021-08-18 | 富士通株式会社 | 生成プログラム、生成方法、生成装置、及び剽窃検知プログラム |
CN108960537B (zh) * | 2018-08-17 | 2020-10-13 | 安吉汽车物流股份有限公司 | 物流订单的预测方法及装置、可读介质 |
CN112861260B (zh) * | 2021-02-01 | 2022-03-11 | 中国人民解放军国防科技大学 | 固体火箭发动机装药性能匹配方法、装置和设备 |
CN114225361A (zh) * | 2021-12-09 | 2022-03-25 | 栾金源 | 一种网球测速方法 |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS4934246A (ja) * | 1972-07-28 | 1974-03-29 | ||
JPS6227878A (ja) * | 1985-07-29 | 1987-02-05 | Ricoh Co Ltd | マツチング方法 |
JPH02273880A (ja) * | 1989-04-15 | 1990-11-08 | Toshiba Corp | パターン認識装置 |
EP0575815A1 (en) * | 1992-06-25 | 1993-12-29 | Atr Auditory And Visual Perception Research Laboratories | Speech recognition method |
JPH07287753A (ja) * | 1994-04-19 | 1995-10-31 | N T T Data Tsushin Kk | 物品識別システム |
JPH1013832A (ja) * | 1996-06-25 | 1998-01-16 | Nippon Telegr & Teleph Corp <Ntt> | 動画像認識方法および動画像認識検索方法 |
WO1999067696A2 (en) * | 1998-06-23 | 1999-12-29 | Koninklijke Philips Electronics N.V. | A scalable solution for image retrieval |
JP2002008027A (ja) * | 2000-06-20 | 2002-01-11 | Ricoh Co Ltd | パターン認識方法、パターン認識装置およびパターン認識プログラムを記録した記録媒体 |
Family Cites Families (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0711819B2 (ja) * | 1986-06-20 | 1995-02-08 | 株式会社リコー | パターン認識方法 |
JPS6339092A (ja) * | 1986-08-04 | 1988-02-19 | Ricoh Co Ltd | 辞書検索方式 |
JPS6339093A (ja) * | 1986-08-04 | 1988-02-19 | Ricoh Co Ltd | 辞書検索方式 |
JP3224955B2 (ja) * | 1994-05-27 | 2001-11-05 | 株式会社東芝 | ベクトル量子化装置およびベクトル量子化方法 |
TW293227B (ja) * | 1994-11-24 | 1996-12-11 | Victor Company Of Japan | |
KR0165497B1 (ko) * | 1995-01-20 | 1999-03-20 | 김광호 | 블럭화현상 제거를 위한 후처리장치 및 그 방법 |
KR100247969B1 (ko) * | 1997-07-15 | 2000-03-15 | 윤종용 | 대용량패턴정합장치및방법 |
JP3252802B2 (ja) * | 1998-07-17 | 2002-02-04 | 日本電気株式会社 | 音声認識装置 |
US6535617B1 (en) * | 2000-02-14 | 2003-03-18 | Digimarc Corporation | Removal of fixed pattern noise and other fixed patterns from media signals |
JP3816309B2 (ja) | 2000-06-26 | 2006-08-30 | アマノ株式会社 | 駐車場管理装置 |
JP2002191050A (ja) * | 2000-12-22 | 2002-07-05 | Fuji Xerox Co Ltd | 画像符号化装置および方法 |
US6807305B2 (en) * | 2001-01-12 | 2004-10-19 | National Instruments Corporation | System and method for image pattern matching using a unified signal transform |
US6963667B2 (en) * | 2001-01-12 | 2005-11-08 | National Instruments Corporation | System and method for signal matching and characterization |
2002
- 2002-07-09 JP JP2002200481A patent/JP4623920B2/ja not_active Expired - Fee Related
2003
- 2003-06-26 WO PCT/JP2003/008142 patent/WO2004006185A1/ja active Application Filing
- 2003-06-26 CN CNB038009765A patent/CN1324509C/zh not_active Expired - Fee Related
- 2003-06-26 US US10/489,012 patent/US7260488B2/en not_active Expired - Lifetime
- 2003-06-26 EP EP03736281A patent/EP1521210B9/en not_active Expired - Fee Related
- 2003-06-26 KR KR1020047003337A patent/KR101021044B1/ko active IP Right Grant
- 2003-06-26 DE DE60330147T patent/DE60330147D1/de not_active Expired - Lifetime
Non-Patent Citations (2)
Title |
---|
ATSUNORI YOSHIKAWA ET AL.: "Chokko henkan o mochiita kaogazo no shikibetsu", THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS GIJUTSU KENKYU HOKOKU, vol. 95, no. 469, 18 January 1996 (1996-01-18), pages 16, XP002974132 * |
See also references of EP1521210A4 * |
Also Published As
Publication number | Publication date |
---|---|
CN1324509C (zh) | 2007-07-04 |
KR101021044B1 (ko) | 2011-03-14 |
KR20050016278A (ko) | 2005-02-21 |
DE60330147D1 (de) | 2009-12-31 |
EP1521210A1 (en) | 2005-04-06 |
US7260488B2 (en) | 2007-08-21 |
JP2004046370A (ja) | 2004-02-12 |
EP1521210B9 (en) | 2010-09-15 |
US20050033523A1 (en) | 2005-02-10 |
EP1521210A4 (en) | 2007-07-04 |
JP4623920B2 (ja) | 2011-02-02 |
EP1521210B1 (en) | 2009-11-18 |
CN1552042A (zh) | 2004-12-01 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
WO2004006185A1 (ja) | 類似度算出方法及び装置 | |
JP3550681B2 (ja) | 画像検索装置及び方法、並びに類似画像検索プログラムを格納した記憶媒体 | |
CA2364798C (en) | Image search system and image search method thereof | |
CA2814401C (en) | Vector transformation for indexing, similarity search and classification | |
JP4301193B2 (ja) | 画像比較装置及び方法、画像検索装置及び方法、並びにプログラム及び記録媒体 | |
JP4138007B2 (ja) | Dc及び動き符号を用いたmpeg圧縮列のビデオ検索 | |
US7295718B2 (en) | Non-linear quantization and similarity matching methods for retrieving image data | |
JP2004045565A (ja) | 類似時系列検出方法及び装置、並びにプログラム及び記録媒体 | |
US20170026665A1 (en) | Method and device for compressing local feature descriptor, and storage medium | |
JP2006505075A (ja) | 複数のイメージフレームを有するビデオシーケンス検索のための非線形量子化及び類似度マッチング方法 | |
WO2007066924A1 (en) | Real-time digital video identification system and method using scene information | |
Seetharaman et al. | Statistical framework for image retrieval based on multiresolution features and similarity method | |
KR101365989B1 (ko) | 트리 구조를 기반으로 한 엔트로피 부호화 및 복호화 장치및 방법 | |
JP5155210B2 (ja) | 画像比較装置及びその方法、画像検索装置、並びにプログラム及び記録媒体 | |
KR20010039811A (ko) | 디지털 영상 텍스쳐 분석 방법 | |
JP2968666B2 (ja) | 画像符号化方法および装置 | |
Qiu | Embedded colour image coding for content-based retrieval | |
WO2016110125A1 (zh) | 高维向量的哈希方法、向量量化方法及装置 | |
CN113656639A (zh) | 视频检索方法及装置、计算机可读存储介质、电子设备 | |
Arnia et al. | Fast method for joint retrieval and identification of JPEG coded images based on DCT sign | |
JP4697111B2 (ja) | 画像比較装置および方法、並びに、画像検索装置および方法 | |
KR100333744B1 (ko) | 영상 압축이미지를 이용한 유사이미지 검색시스템 및 그 방법과 기록매체 | |
KR20010027936A (ko) | 텍스쳐 영상 검색 장치 및 그 방법 | |
Sha et al. | Low-complexity and high-coding-efficiency image deletion for compressed image sets in cloud servers | |
JP4002212B2 (ja) | 動画像符号化方法,装置,プログラムおよびプログラムの記録媒体 |
Legal Events
Code | Title | Description |
---|---|---|
AK | Designated states | Kind code of ref document: A1; Designated state(s): CN KR US |
AL | Designated countries for regional patents | Kind code of ref document: A1; Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR |
WWE | Wipo information: entry into national phase | Ref document number: 20038009765; Country of ref document: CN |
WWE | Wipo information: entry into national phase | Ref document number: 2003736281; Country of ref document: EP; Ref document number: 1020047003337; Country of ref document: KR |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
WWE | Wipo information: entry into national phase | Ref document number: 10489012; Country of ref document: US |
WWP | Wipo information: published in national office | Ref document number: 2003736281; Country of ref document: EP |