CN111078941B - Similar video retrieval system based on frame correlation coefficient and perceptual hash

Info

Publication number: CN111078941B
Application number: CN201911307045.2A
Authority: CN
Inventors: 魏榕山; 李晨嘉; 吴剑涵; 张鼎盛; 曹嘉祺
Original assignee: Fuzhou University
Current assignee: Fuzhou University
Priority date: 2019-12-18
Filing date: 2019-12-18
Publication date: 2022-10-28
Anticipated expiration: 2039-12-18
Also published as: CN111078941A

Abstract

The invention relates to a similar video retrieval system based on frame correlation coefficients and perceptual hashing. The video acquisition module is connected to the DSP processor through the DDR2SDRAM memory, the video database is connected to the DSP processor through the Nand FLASH memory, and the DSP processor is connected to the NOR FLASH memory, the GPU computing unit and the output display module. The video acquisition module receives the video input by a user; the DDR2SDRAM memory caches the acquired video image frames and performs data conversion on them; the NOR FLASH memory stores the characteristic information extraction algorithm module and the matching retrieval algorithm module; the DSP processor executes the algorithms; the GPU computing unit increases the algorithm computation speed; the Nand FLASH memory stores the video characteristic information of the video database; and the output display module displays the matching retrieval results. The system can quickly extract the characteristic information of the input video and retrieve the target similar video, addressing the problems that traditional manual retrieval is time-consuming and labor-intensive, that system precision is low, and that the degree of hardware implementation is low.

Description

Similar video retrieval system based on frame correlation coefficient and perceptual hash

Technical Field

The invention belongs to the field of video retrieval, and particularly relates to a similar video retrieval system based on frame correlation coefficients and perceptual hashing.

Background

With the rapid development of computer multimedia technology and internet distribution technology, more and more videos are being uploaded and downloaded. Videos are uploaded, downloaded, viewed, edited and then re-uploaded, so the internet is flooded with large numbers of videos with similar content. These near-duplicate videos cause heavy losses to copyright holders, increase the storage costs of operators, reduce the accuracy of user video searches, and seriously degrade the user experience. For these reasons, research on Similar Video Retrieval (SVR) is becoming increasingly important.

In the current big data era, similar video retrieval faces a huge challenge: the explosive growth of the number of videos brings about a severe storage cost problem; how to quickly retrieve a target video from a large-scale video database becomes more challenging. Therefore, in terms of video retrieval accuracy and algorithm computation complexity, a certain gap exists between the actual requirement and the existing solution.

At present, hash-based similar video retrieval is one of the directions attracting the most attention from researchers and is an effective approach to large-scale similar video retrieval. However, most researchers focus on the hash algorithm itself and often neglect how representative the video features are: they use only a single kind of local feature information and ignore both global feature information and the possibility of implementing the algorithm in hardware, which leads to low video retrieval precision and low security performance.

Disclosure of Invention

The invention aims to provide a similar video retrieval system based on frame correlation coefficients and perceptual hash, which can quickly and efficiently extract the characteristic information of a video input by a user and retrieve a target similar video. It addresses the problems that the traditional manual retrieval mode is time-consuming and labor-intensive, that system precision is low, and that the degree of hardware implementation is low.

In order to achieve the purpose, the technical scheme of the invention is as follows: a similar video retrieval system based on frame correlation coefficient and perceptual hash comprises a video acquisition module, a DDR2SDRAM memory, a NOR FLASH memory, a DSP processor, a GPU computing unit, a Nand FLASH memory and an output display module;

the video acquisition module is used for receiving a video input by a user, extracting the n image frames of the input video, and transmitting each image frame to the DDR2SDRAM memory for caching in real time;

the DDR2SDRAM memory is used for caching the collected video image frames and performing data conversion on the image frames;

the NOR FLASH memory is used for storing a characteristic information extraction algorithm module and a matching retrieval algorithm module;

the DSP processes the video image frame by calling a characteristic information extraction algorithm module and a matching retrieval algorithm module in the NOR FLASH memory and transmits the processing result to an output display module;

the GPU computing unit is used for improving the algorithm computing speed;

the Nand FLASH memory is connected with the video database and the DSP and is used for storing video characteristic information of the video database, namely storing video key frame binary hash codes in the video database;

and the output display module displays the matching retrieval result through display equipment.

In an embodiment of the present invention, the feature information extraction algorithm module extracts feature information of an input video, that is, a key frame of the input video, by using a frame correlation coefficient method; the frame correlation coefficient method is to use a correlation coefficient to represent the correlation of two random variables, define the correlation coefficient of two image frames to measure the similarity of adjacent image frames, and if the correlation coefficient of the adjacent image frames is less than a threshold value, determine the image frame as a key frame.

In an embodiment of the present invention, the specific functions of the feature information extraction algorithm module are implemented as follows:

step S31, acquiring from the DDR2SDRAM memory the n image frames of the input video, each image frame being denoted f_k;

step S32, setting a threshold T, where T lies in the range [0, 1];

step S33, initializing the key frames, and traversing the video image frame sequence from the 1st frame to the nth frame;

step S34, calculating the correlation coefficient ρ(f_k, f_{k+1}) of the current frame f_k and the next frame f_{k+1} according to the formula:

$$\rho(f_k, f_{k+1}) = \frac{\operatorname{cov}(f_k, f_{k+1})}{D(f_k)\,D(f_{k+1})}$$

since

$$\operatorname{cov}(f_k, f_{k+1}) = E\big[(f_k - E(f_k))\,(f_{k+1} - E(f_{k+1}))\big]$$

thus

$$\rho(f_k, f_{k+1}) = \frac{\sum_{i=1}^{n}\sum_{j=1}^{m}\left(f_k(i,j)-\bar{f}_k\right)\left(f_{k+1}(i,j)-\bar{f}_{k+1}\right)}{\sqrt{\sum_{i=1}^{n}\sum_{j=1}^{m}\left(f_k(i,j)-\bar{f}_k\right)^2}\,\sqrt{\sum_{i=1}^{n}\sum_{j=1}^{m}\left(f_{k+1}(i,j)-\bar{f}_{k+1}\right)^2}}$$

where m is the number of columns of image frame pixels, n is the number of rows of image frame pixels, f_k(i, j) is the gray value of the pixel in row i and column j, D(f_k) and D(f_{k+1}) are the standard deviations of the image frames, cov(f_k, f_{k+1}) is the covariance of the image frames, E is the expectation of an image frame, and \bar{f}_k and \bar{f}_{k+1} are the mean values of the gray values of the image frames;

step S35, determining whether the correlation coefficient ρ(f_k, f_{k+1}) is greater than the threshold T; if it is greater than the threshold T, the frame f_k is judged to be a video key frame F_k and is output; step S34 is then repeated.
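Steps S31 to S35 can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: it assumes the correlation coefficient is the standard Pearson coefficient computed over all pixels, that frames are plain lists of rows of gray values, and that a completely flat frame (zero variance) is given a correlation of 0.0. The function names and the `keep_if_greater` switch are hypothetical; the switch exists because the text states the key-frame test both as "less than" and as "greater than" the threshold T.

```python
from math import sqrt


def frame_correlation(frame_a, frame_b):
    """Pearson correlation of two equally sized grayscale frames (lists of rows)."""
    a = [p for row in frame_a for p in row]
    b = [p for row in frame_b for p in row]
    if not a or len(a) != len(b):
        raise ValueError("frames must be non-empty and the same size")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    # The 1/(m*n) factors of the covariance and the standard deviations cancel,
    # so plain sums are enough.
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # Assumption: the coefficient is undefined for a flat frame; report 0.0.
        return 0.0
    return cov / sqrt(var_a * var_b)


def select_key_frames(frames, threshold, keep_if_greater=True):
    """Return the indices k whose correlation with frame k+1 passes the test.

    keep_if_greater=True follows step S35 as written (rho > T);
    keep_if_greater=False applies the opposite test (rho < T).
    """
    key_indices = []
    for k in range(len(frames) - 1):
        rho = frame_correlation(frames[k], frames[k + 1])
        passes = rho > threshold if keep_if_greater else rho < threshold
        if passes:
            key_indices.append(k)
    return key_indices
```

Note that the Pearson coefficient ranges over [-1, 1] while T is restricted to [0, 1], so strongly anti-correlated neighbours never pass the "greater than" test.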

In an embodiment of the present invention, the matching retrieval algorithm module calculates a binary hash code of an input video key frame by using a perceptual hash algorithm, performs matching calculation on the calculated binary hash code of the input video key frame and a binary hash code of a video key frame in a video database in a Nand FLASH memory, and a video matched to an optimal solution is a retrieved target video.

In an embodiment of the present invention, the specific functions of the matching search algorithm module are implemented as follows:

step S51, reducing the input video key frame F_k to a size of 8 × 8, 64 pixels in total, so that only the basic structure and light-dark information of the image frame is retained;

step S52, converting the reduced input video key frame F_k into a 64-level grayscale picture, and calculating the average gray value of all pixels in the picture;

step S53, comparing the gray value of each pixel with the average value: a gray value greater than or equal to the average value is recorded as 1, and a gray value less than the average value is recorded as 0;

step S54, combining the 64 comparison results of step S53 in sequence to form a 64-bit binary number, which is the binary hash code of the input video key frame F_k;

step S55, judging whether all video key frames F_k have been traversed; if so, executing step S56; otherwise, executing step S51;

step S56, assembling the binary hash codes of the input video key frames F_k into a set, comparing them with the binary hash codes of the video key frames of the video database stored in the Nand FLASH memory, and counting the number of differing bits between the 64-bit binary numbers; the video with the fewest differing bits is the retrieved target similar video.
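The hash computation of steps S51 to S54 and the bit comparison of step S56 can be sketched as follows. The sketch rests on several assumptions that the text does not fix: the input is an 8-bit grayscale frame given as a list of rows, the reduction to 8 × 8 uses simple block averaging, the 64 gray levels are obtained by scaling 0..255 down to 0..63, and bits are packed row by row with the first pixel as the most significant bit. The function names are hypothetical. Counting differing bits between two codes is the Hamming distance.

```python
def average_hash(gray, size=8, levels=64):
    """Return a size*size-bit integer hash of an 8-bit grayscale frame."""
    rows, cols = len(gray), len(gray[0])
    if rows < size or cols < size:
        raise ValueError("frame must be at least size x size pixels")
    small = []
    for i in range(size):
        r0, r1 = i * rows // size, (i + 1) * rows // size
        for j in range(size):
            c0, c1 = j * cols // size, (j + 1) * cols // size
            # S51: shrink by averaging each block of source pixels.
            block = [gray[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            mean = sum(block) / len(block)
            # S52: quantize the 0..255 value to the requested number of levels.
            small.append(int(mean) * levels // 256)
    average = sum(small) / len(small)
    code = 0
    for value in small:
        # S53/S54: 1 if the pixel is >= the average, else 0, packed in order.
        code = (code << 1) | (1 if value >= average else 0)
    return code


def hamming_distance(code_a, code_b):
    """Number of bit positions in which two hash codes differ (step S56)."""
    return bin(code_a ^ code_b).count("1")
```

For example, a frame whose left half is black and whose right half is white yields the bit pattern 00001111 in every row, and a frame of uniform gray yields all ones because every pixel equals the average.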

Compared with the prior art, the invention has the following beneficial effects: the system can quickly and efficiently extract the characteristic information of the video input by the user and retrieve the target similar video, addressing the problems that the traditional manual retrieval mode is time-consuming and labor-intensive, that system precision is low, and that the degree of hardware implementation is low. The main advantages of the invention are as follows:

1. Retrieval algorithm: the invention extracts the video characteristic information by a frame correlation coefficient method, which accurately reflects the dynamic content of the video, effectively reduces key frame redundancy and makes the video characteristic information more representative. Its accuracy is higher than that of the traditional inter-frame difference method for extracting video key frames. A perceptual hash algorithm is used to calculate the binary hash code of each image frame, which is then matched against the videos in the video database. The binary representation effectively reduces the storage consumed by large amounts of video data and greatly accelerates video retrieval;

2. System design: the invention ports the video retrieval algorithm to a DSP hardware system, which is convenient and fast to operate, handles a large volume of information, can accurately match and retrieve the target video from the video database, and has a wider range of practical application. In addition, the system exploits the high storage density and high write speed of the Nand FLASH memory to store the characteristic information of the video database, and introduces an algorithm acceleration unit to increase the algorithm computation speed.

Drawings

Fig. 1 shows a block diagram of a video retrieval system.

Fig. 2 shows a flow chart of a feature information extraction algorithm module.

Fig. 3 shows a flow chart of a matching retrieval algorithm module.

Fig. 4 shows a workflow diagram of the video retrieval system.

Detailed Description

The technical scheme of the invention is described in detail below with reference to the accompanying drawings.

The invention provides a similar video retrieval system based on frame correlation coefficients and perceptual hash, which comprises a video acquisition module, a DDR2SDRAM memory, a NOR FLASH memory, a DSP processor, a GPU computing unit, a Nand FLASH memory and an output display module;

the DSP processor processes video image frames by calling a characteristic information extraction algorithm module and a matching retrieval algorithm module in the NOR FLASH memory and transmits the processing result to an output display module;

the GPU computing unit is used for improving the algorithm computing speed;

the Nand FLASH memory is connected with the video database and the DSP and is used for storing video characteristic information of the video database, namely storing a binary hash code of a video key frame in the video database;

The following is a specific implementation of the present invention.

The invention is applied to the field of video retrieval. The video retrieval algorithm runs on the DSP, so that the characteristic information of the video input by a user can be quickly and efficiently extracted and the target similar video can be retrieved. This addresses the problems that the traditional manual retrieval mode is time-consuming and labor-intensive, that system precision is low, and that the degree of hardware implementation is low.

A block diagram of the video retrieval system of the present invention is depicted in fig. 1. The video retrieval system mainly comprises seven parts, namely a video acquisition module, a DDR2SDRAM memory, a NOR FLASH memory, a DSP processor, a GPU computing unit, a Nand FLASH memory and an output display module, wherein the functions of each part in the video retrieval system are as follows:

1. video acquisition module

The video acquisition module receives a video input by a user and extracts the n image frames of the input video. It transmits each image frame to the DDR2SDRAM memory for caching in real time.

2. DDR2SDRAM memory

The input end of the DDR2SDRAM memory is connected with the video acquisition module. The memory serves as the working memory of the video retrieval system: it caches the collected video image frames and performs data conversion on them. The DDR2SDRAM memory capacity should be greater than 256M.

3. NOR FLASH memory

The NOR FLASH memory stores the video retrieval system itself, namely the characteristic information extraction algorithm module and the matching retrieval algorithm module carried by the system. The NOR FLASH memory capacity should be greater than 1 Gbit.

4. DSP processor

The DSP processor is connected with the DDR2SDRAM memory, the NOR FLASH memory, the GPU computing unit, the Nand FLASH memory and the output display module. It is responsible for running the video retrieval system and for calling the algorithm instructions that carry out the target operations.

5. GPU computing unit

The GPU calculation unit is used for accelerating the algorithm and improving the working speed of the system.

6. Nand FLASH memory

The input end of the Nand FLASH memory is connected with the video database. The memory stores the video characteristic information of the video database, namely the binary hash codes of the video key frames in the video database. The Nand FLASH memory capacity should be greater than 32 Gbit.

7. Output display module

The output display module displays the matching retrieval result: the retrieved target video is shown on a PC monitor, an LED television screen or other video playback equipment.
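To give a feel for the Nand FLASH capacity stated in part 6, the following back-of-the-envelope sketch estimates how many 64-bit key-frame hash codes fit in 32 Gbit. It assumes 1 Gbit = 2^30 bits and ignores file-system, error-correction and indexing overhead, so the result is an upper bound under those assumptions rather than a figure taken from the patent. The names are illustrative.

```python
HASH_BITS = 64   # one perceptual hash code per key frame
GBIT = 2 ** 30   # assumption: binary gigabit


def max_hash_codes(capacity_gbit, hash_bits=HASH_BITS):
    """Upper bound on the number of hash codes that fit in the given capacity."""
    return capacity_gbit * GBIT // hash_bits


print(max_hash_codes(32))
```

Under these assumptions the bound is 2^29, roughly 537 million key-frame hash codes.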

The characteristic information extraction algorithm module adopts a frame correlation coefficient method to extract the characteristic information of the input video, namely the key frame of the input video. The frame correlation coefficient method mainly uses a correlation coefficient to represent the correlation of two random variables, defines the correlation coefficient of two image frames to measure the similarity of adjacent image frames, and judges that the image frame is a key frame if the correlation coefficient of the adjacent image frame is less than a threshold value.

The characteristic information extraction algorithm module flow is shown in fig. 2, and includes the following steps:

step one, acquiring from the DDR2SDRAM memory the n image frames of the input video, each image frame being denoted f_k;

step two, setting a threshold T, where T lies in the range [0, 1];

step three, initializing the key frames, and traversing the video image frame sequence from the 1st frame to the nth frame;

step four, calculating the correlation coefficient ρ(f_k, f_{k+1}) of the current frame f_k and the next frame f_{k+1} according to the formula:

$$\rho(f_k, f_{k+1}) = \frac{\operatorname{cov}(f_k, f_{k+1})}{D(f_k)\,D(f_{k+1})}$$

and because

$$\operatorname{cov}(f_k, f_{k+1}) = E\big[(f_k - E(f_k))\,(f_{k+1} - E(f_{k+1}))\big]$$

it follows that

$$\rho(f_k, f_{k+1}) = \frac{\sum_{i=1}^{n}\sum_{j=1}^{m}\left(f_k(i,j)-\bar{f}_k\right)\left(f_{k+1}(i,j)-\bar{f}_{k+1}\right)}{\sqrt{\sum_{i=1}^{n}\sum_{j=1}^{m}\left(f_k(i,j)-\bar{f}_k\right)^2}\,\sqrt{\sum_{i=1}^{n}\sum_{j=1}^{m}\left(f_{k+1}(i,j)-\bar{f}_{k+1}\right)^2}}$$

where m is the number of columns of image frame pixels, n is the number of rows of image frame pixels, f_k(i, j) is the gray value of the pixel in row i and column j, D(f_k) and D(f_{k+1}) are the standard deviations of the image frames, cov(f_k, f_{k+1}) is the covariance of the image frames, E is the expectation of an image frame, and \bar{f}_k and \bar{f}_{k+1} are the mean values of the gray values of the image frames;

step five, judging whether the correlation coefficient ρ(f_k, f_{k+1}) is greater than the threshold T. If it is greater than the threshold T, the frame f_k is judged to be a video key frame F_k and is output, and step four is repeated.
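As a small worked illustration of step four, consider two adjacent 2 × 2 frames. The gray values are invented for the example, and the correlation coefficient is assumed to take the standard Pearson form built from the covariance and standard deviations defined above.

$$f_k = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \qquad f_{k+1} = \begin{pmatrix} 1 & 2 \\ 4 & 3 \end{pmatrix}, \qquad \bar{f}_k = \bar{f}_{k+1} = 2.5$$

$$\sum_{i,j}\left(f_k(i,j)-\bar{f}_k\right)\left(f_{k+1}(i,j)-\bar{f}_{k+1}\right) = (-1.5)(-1.5) + (-0.5)(-0.5) + (0.5)(1.5) + (1.5)(0.5) = 4$$

$$\sum_{i,j}\left(f_k(i,j)-\bar{f}_k\right)^2 = \sum_{i,j}\left(f_{k+1}(i,j)-\bar{f}_{k+1}\right)^2 = 5, \qquad \rho(f_k, f_{k+1}) = \frac{4}{\sqrt{5 \cdot 5}} = 0.8$$

With a hypothetical threshold T = 0.5, ρ = 0.8 is greater than T, so step five as written would mark f_k as a key frame.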

The matching retrieval algorithm module calculates the binary hash code of each input video key frame by using a perceptual hash algorithm. The Nand FLASH memory mainly stores the characteristic information of the videos in the video database. The matching retrieval algorithm module matches the calculated binary hash codes of the input video key frames against the binary hash codes of the video database in the Nand FLASH memory, and the video yielding the best match is the retrieved target video.

The matching search algorithm module flow is shown in fig. 3, and includes the following steps:

step one, reducing the input video key frame F_k to a size of 8 × 8, 64 pixels in total, so that only the basic structure and light-dark information of the image frame is retained;

step two, converting the reduced input video key frame F_k into a 64-level grayscale picture, and calculating the average gray value of all pixels in the picture;

step three, comparing the gray value of each pixel with the average value: a gray value greater than or equal to the average value is recorded as 1, and a gray value less than the average value is recorded as 0;

step four, combining the 64 comparison results of step three in sequence to form a 64-bit binary number, which is the binary hash code of the input video key frame F_k;

step five, judging whether all video key frames F_k have been traversed. If so, executing step six; otherwise, executing step one;

step six, assembling the binary hash codes of the input video key frames F_k into a set, comparing them with the binary hash codes of the video key frames of the video database stored in the Nand FLASH memory, and counting the number of differing bits between the 64-bit binary numbers. The video with the fewest differing bits is the retrieved target similar video.
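Step six leaves open how the per-frame bit differences are combined when a video has several key frames. The sketch below shows one possible reading, stated as an assumption: for every query key-frame hash it takes the smallest number of differing bits against any key-frame hash of a candidate video, averages those minima, and returns the candidate with the lowest average. The dictionary layout of the database and all names are hypothetical.

```python
def hamming_distance(code_a, code_b):
    """Number of differing bits between two integer hash codes."""
    return bin(code_a ^ code_b).count("1")


def video_distance(query_hashes, candidate_hashes):
    """Average, over query key frames, of the best match in the candidate video."""
    if not query_hashes or not candidate_hashes:
        raise ValueError("both hash sets must be non-empty")
    total = sum(
        min(hamming_distance(q, c) for c in candidate_hashes)
        for q in query_hashes
    )
    return total / len(query_hashes)


def retrieve_most_similar(query_hashes, database):
    """Return the name of the database video with the smallest distance.

    database maps a video name to the list of its key-frame hash codes.
    """
    if not database:
        raise ValueError("database must not be empty")
    return min(database, key=lambda name: video_distance(query_hashes, database[name]))
```

Other aggregation rules, such as summing distances over aligned key frames or voting per key frame, would fit the same description; the averaging used here is only one choice.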

The work flow diagram of the whole video retrieval system of the invention is shown in fig. 4, and comprises the following steps:

step one, a user inputs a video to the video retrieval system;

step two, the collected video image frames are cached in the DDR2SDRAM memory;

step three, the DSP processor calls the frame correlation coefficient method to extract the key frames of the input video;

step four, the DSP processor calls the perceptual hash algorithm to calculate the binary hash codes of the input video key frames;

step five, the binary hash codes of the key frames are matched against the video characteristic information of the video database, and the target video is retrieved;

step six, the retrieved target video is transmitted to the output display module for display.

The invention is applied to the field of video retrieval. The video retrieval algorithm runs on the DSP, so that the characteristic information of the video provided by the user can be extracted quickly and efficiently and the target similar video can be retrieved, addressing the problems that the traditional manual retrieval mode is time-consuming and labor-intensive and that the degree of hardware implementation is low.

The above are preferred embodiments of the present invention. All changes made according to the technical scheme of the present invention, whose functional effects do not go beyond the scope of that technical scheme, fall within the protection scope of the present invention.

Claims

1. A similar video retrieval system based on frame correlation coefficient and perceptual hash is characterized by comprising a video acquisition module, a DDR2SDRAM memory, a NOR FLASH memory, a DSP processor, a GPU computing unit, a Nand FLASH memory and an output display module;

the characteristic information extraction algorithm module adopts a frame correlation coefficient method to extract the characteristic information of the input video, namely the key frame of the input video; the frame correlation coefficient method is that the correlation coefficient is used for representing the correlation of two random variables, the correlation coefficient of two image frames is defined to measure the similarity of adjacent image frames, and if the correlation coefficient of the adjacent image frames is greater than a threshold value, the image frame is judged to be a key frame; the specific functions of the characteristic information extraction algorithm module are realized as follows:

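step S31, acquiring from the DDR2SDRAM memory the n image frames of the input video, each image frame being denoted f_k;

step S34, calculating the correlation coefficient ρ(f_k, f_{k+1}) of the current frame f_k and the next frame f_{k+1} according to the formula:

$$\rho(f_k, f_{k+1}) = \frac{\operatorname{cov}(f_k, f_{k+1})}{D(f_k)\,D(f_{k+1})}$$

since

$$\operatorname{cov}(f_k, f_{k+1}) = E\big[(f_k - E(f_k))\,(f_{k+1} - E(f_{k+1}))\big]$$

thus

$$\rho(f_k, f_{k+1}) = \frac{\sum_{i=1}^{n}\sum_{j=1}^{m}\left(f_k(i,j)-\bar{f}_k\right)\left(f_{k+1}(i,j)-\bar{f}_{k+1}\right)}{\sqrt{\sum_{i=1}^{n}\sum_{j=1}^{m}\left(f_k(i,j)-\bar{f}_k\right)^2}\,\sqrt{\sum_{i=1}^{n}\sum_{j=1}^{m}\left(f_{k+1}(i,j)-\bar{f}_{k+1}\right)^2}}$$

where m is the number of columns of image frame pixels, n is the number of rows of image frame pixels, and \bar{f}_k and \bar{f}_{k+1} are the mean values of the gray values of the image frames;

step S35, determining whether the correlation coefficient ρ(f_k, f_{k+1}) is greater than the threshold T; if it is greater than the threshold T, the frame f_k is judged to be a video key frame F_k and is output; step S34 is then repeated;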

the GPU computing unit is used for improving the algorithm computing speed;

2. The system of claim 1, wherein the matching search algorithm module calculates the binary hash code of the input video key frame by using a perceptual hash algorithm, performs matching calculation on the calculated binary hash code of the input video key frame and the binary hash code of the video key frame in the video database in the Nand FLASH memory, and obtains the video with the optimal solution as the searched target video.

3. The system according to claim 2, wherein the matching search algorithm module is implemented by the following specific functions:

step S52, converting the reduced input video key frame F_k into a 64-level grayscale picture, and calculating the average gray value of all pixels in the picture;

step S53, comparing the gray value of each pixel with the average value: a gray value greater than or equal to the average value is recorded as 1, and a gray value less than the average value is recorded as 0;

step S54, combining the 64 comparison results of step S53 in sequence to form a 64-bit binary number, which is the binary hash code of the input video key frame F_k;

step S56, assembling the binary hash codes of the input video key frames F_k into a set, comparing them with the binary hash codes of the video key frames of the video database stored in the Nand FLASH memory, and counting the number of differing bits between the 64-bit binary numbers; the videos with the fewest differing bits are the retrieved target similar videos.