CN111339369A - Video retrieval method, system, computer equipment and storage medium based on depth features - Google Patents


Info

Publication number: CN111339369A
Application number: CN202010115194.5A
Authority: CN (China)
Prior art keywords: video, frame, key frame, DenseNet, key
Legal status: Pending
Original language: Chinese (zh)
Inventors: 曾凡智, 程勇, 周燕, 陈嘉文
Current Assignee: Foshan University
Original Assignee: Foshan University
Application filed by Foshan University
Priority date / Filing date: 2020-02-25
Publication date: 2020-06-26

Classifications

    • G06F16/783 — Information retrieval of video data; retrieval characterised by metadata automatically derived from the content
    • G06F16/71 — Information retrieval of video data; indexing; data structures therefor; storage structures
    • G06F16/738 — Information retrieval of video data; querying; presentation of query results
    • G06F16/75 — Information retrieval of video data; clustering; classification
    • G06N3/045 — Neural networks; architectures; combinations of networks
    • G06N3/084 — Neural networks; learning methods; backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Library & Information Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a video retrieval method, system, computer equipment and storage medium based on depth features. The method comprises the following steps: constructing a convolutional neural network, wherein the convolutional neural network is a DenseNet model; acquiring a plurality of videos; extracting depth feature vectors of the video frames in each video using the DenseNet model; for each video, extracting key frames according to the depth feature vectors of its video frames and outputting a key frame set; establishing an index relationship between each video and its key frame set and storing the relationship in a video feature database; and retrieving videos from the video feature database according to an image or short video provided by the user and outputting a video retrieval result. The invention realizes a video retrieval function with high accuracy and high recall.

Description

Video retrieval method, system, computer equipment and storage medium based on depth features
Technical Field
The invention relates to a video retrieval method, a system, computer equipment and a storage medium based on depth features, and belongs to the field of video retrieval.
Background
Currently, video retrieval based on text labeling is relatively mature and widely applied in the market. This approach requires the videos in a library to be manually summarized and annotated in advance, so the retrieval result depends entirely on the user's wording and on the manually labelled information. However, as the number of videos grows and their content diversifies, conventional retrieval based on manual text annotation can no longer meet the demand for higher-level video retrieval. Meanwhile, most content-based video retrieval systems rely on features such as color, texture, shape and SIFT, which are susceptible to video blur, noise and illumination changes.
In recent years, deep learning has achieved excellent results in video and image processing. Deep feature descriptors have a strong capacity to describe image content, so retrieval built on them can satisfy the demand for higher-level video retrieval, with broad application prospects in security monitoring, remote online education, film and television copyright protection, network short-video review and other fields.
Disclosure of Invention
In view of the above, the present invention provides a video retrieval method, system, computer device and storage medium based on depth features, which realize a video retrieval function with high accuracy and high recall.
A first object of the invention is to provide a video retrieval method based on depth features.
A second object of the invention is to provide a video retrieval system based on depth features.
A third object of the invention is to provide a computer device.
A fourth object of the invention is to provide a storage medium.
The first purpose of the invention can be achieved by adopting the following technical scheme:
a method for depth feature-based video retrieval, the method comprising:
constructing a convolutional neural network; wherein the convolutional neural network is a DenseNet model;
acquiring a plurality of videos;
extracting a depth feature vector of a video frame in each video by using a DenseNet model;
for each video, extracting key frames according to the depth feature vectors of the video frames, and outputting a key frame set;
establishing an index relationship between each video and the key frame set of each video, and storing the index relationship in a video feature database;
and retrieving videos from the video feature database according to an image or short video provided by the user, and outputting a video retrieval result.
Further, the DenseNet model adopts a DenseNet-201 model;
the DenseNet-201 model comprises a convolution layer, a pooling layer, a first dense block, a first transition layer, a second dense block, a second transition layer, a third dense block, a third transition layer, a fourth dense block and a classification layer which are sequentially connected.
Further, the extracting key frames according to the depth feature vectors of the video frames and outputting a key frame set specifically comprises:
setting the 1st frame as the reference frame, taking the reference frame as a key frame, and adding it to the key frame set;
calculating the cosine angle similarity between the current frame and the reference frame according to their depth feature vectors;
if the cosine angle similarity is smaller than a threshold, comparing the current frame with the key frame set; if it does not repeat an existing key frame, taking the current frame as a key frame, adding it to the key frame set, and updating the current frame to be the reference frame;
if the updated reference frame is not the last frame, repeating the cosine angle similarity calculation between the current frame and the reference frame and executing the subsequent operations; if the updated reference frame is the last frame, outputting the key frame set.
Further, the cosine angle similarity is calculated as follows:

T = cos(I_k, I_ref) = (I_k · I_ref) / (||I_k|| ||I_ref||)

where I_k denotes the depth feature vector of the current frame and I_ref denotes the depth feature vector of the reference frame.
Further, retrieving videos from the video feature database according to an image provided by the user and outputting a video retrieval result specifically comprises:
extracting the features of the image with the DenseNet model according to the image provided by the user, comparing them against the database by cosine angle similarity, and outputting the N most similar videos in descending order of similarity.
Further, retrieving videos from the video feature database according to a short video provided by the user and outputting a video retrieval result specifically comprises:
extracting the features of the short video with the DenseNet model according to the short video provided by the user, matching the key frame set of the short video against all key frame sets in the database by similarity in a sliding-window manner, sorting by similarity in descending order, and outputting the N most similar videos.
The second purpose of the invention can be achieved by adopting the following technical scheme:
a depth feature based video retrieval system, the system comprising:
the convolutional neural network construction module is used for constructing a convolutional neural network; wherein the convolutional neural network is a DenseNet model;
the video acquisition module is used for acquiring a plurality of videos;
the video frame feature extraction module is used for extracting a depth feature vector of a video frame in each video by using a DenseNet model;
the key frame extraction module is used for extracting key frames and outputting a key frame set according to the depth feature vectors of the video frames aiming at each video;
the index establishing module is used for establishing an index relationship between each video and the key frame set of each video, and storing the index relationship in a video feature database;
and the video retrieval module is used for retrieving videos from the video feature database according to an image or short video provided by the user, and outputting a video retrieval result.
Further, the DenseNet model adopts a DenseNet-201 model;
the DenseNet-201 model comprises a convolution layer, a pooling layer, a first dense block, a first transition layer, a second dense block, a second transition layer, a third dense block, a third transition layer, a fourth dense block and a classification layer which are sequentially connected.
The third purpose of the invention can be achieved by adopting the following technical scheme:
a computer device comprises a processor and a memory for storing processor executable programs, and when the processor executes the programs stored in the memory, the video retrieval method is realized.
The fourth purpose of the invention can be achieved by adopting the following technical scheme:
a storage medium stores a program which, when executed by a processor, implements the video retrieval method described above.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention first uses the DenseNet model as the convolutional neural network. The DenseNet model extends convolutional network connectivity beyond the ResNet model: for any layer within a dense block, the feature maps of all preceding layers are its input, and its own feature map is an input to all subsequent layers. This design alleviates the vanishing-gradient problem, strengthens feature-map propagation, improves feature reuse, greatly reduces the number of parameters, and makes the extracted features richer and more diverse. Secondly, whereas the color, texture and shape features adopted by traditional content-based video retrieval are easily disturbed by noise and illumination, the convolutional neural network can extract highly abstract, well-generalizing and robust deep features of images; on this basis, video shot segmentation, video frame depth feature extraction, key frame extraction and video feature database construction are realized, and finally a content-based video retrieval function is achieved.
2. The invention provides an image depth feature descriptor: when video frame features are extracted, a DenseNet model is introduced and the feature vector of its penultimate, fully connected layer is used as the image feature. The network model reaches 95% top-5 classification accuracy on the large-scale ImageNet dataset, its depth features overcome the susceptibility of traditional color, texture and shape features to noise and illumination, and it generalizes well; experiments show that the method outperforms the current state of the art, realizing a video retrieval function with high accuracy and high recall.
3. In extracting video key frames, a reference frame mechanism is introduced and key frames are extracted adaptively according to a threshold, eliminating the shot segmentation and key frame clustering that traditional key frame extraction methods must perform first.
4. The invention retrieves videos in the video feature database by image or by short video, so videos with similar content features can be found directly, quickly and accurately among massive videos; this image- or short-video-based retrieval model provides precise querying for a search engine, letting users find the most relevant videos and improving work efficiency.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in their description are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the present invention; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a flowchart of a video retrieval method according to embodiment 1 of the present invention.
Fig. 2 is a structural diagram of the DenseNet model.
Fig. 3 is a video inter-frame similarity curve according to embodiment 1 of the present invention.
Fig. 4 is a flowchart of extracting key frames according to embodiment 1 of the present invention.
Fig. 5 is a flowchart of retrieving a video according to an image provided by a user according to embodiment 1 of the present invention.
Fig. 6 is a flowchart of retrieving a video according to a short video provided by a user according to embodiment 1 of the present invention.
Fig. 7 is a block diagram of a video retrieval system according to embodiment 2 of the present invention.
Fig. 8 is a block diagram of a computer device according to embodiment 3 of the present invention.
Fig. 9 is a block diagram of a main program of video retrieval in video retrieval software installed in a computer device according to embodiment 3 of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below with reference to the drawings. It is obvious that the described embodiments are only some, not all, embodiments of the present invention; all other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
Example 1:
as shown in fig. 1, the present embodiment provides a video retrieval method based on depth features, which includes the following steps:
and S101, constructing a convolutional neural network.
In the field of computer vision, convolutional neural networks have become the mainstream method. Compared with low-level handcrafted features, the image features used here are based on a Convolutional Neural Network (CNN) model: a trained convolutional neural network model is used to extract the image depth features of a single key frame.
In recent years, five classical convolutional neural network models have appeared — in order of appearance, AlexNet, VGGNet, InceptionNet (GoogLeNet), ResNet and DenseNet — which respectively won championships in the image recognition task of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) from 2012 to 2017. Compared with traditional video retrieval algorithms based on shape, color, texture, SIFT and the like, these network models perform excellently in the field of image recognition. The core of the ResNet model is to train a deeper convolutional neural network by establishing "short-circuit connections" between earlier and later layers, which helps gradients propagate backwards during training; the DenseNet model follows the same basic idea but establishes dense connections from all earlier layers to all later layers, as shown in fig. 2. Another great feature of the DenseNet model is feature reuse through concatenating features along the channel dimension, which allows it to achieve better performance than the ResNet model with fewer parameters and less computation; the DenseNet paper accordingly won the CVPR 2017 Best Paper Award.
The advantages of the DenseNet model are mainly the following: 1) owing to the dense connection pattern, the DenseNet model promotes backward propagation of gradients, making the network easier to train; 2) it has fewer parameters and computes more efficiently, because the short-circuit connections are realized by concatenating features, features are reused, a small growth rate is adopted, and each layer's own feature map is small; 3) thanks to feature reuse, the final classifier also exploits low-level features.
The DenseNet model of this embodiment adopts the DenseNet-201 model, implemented with the TensorFlow framework. The number of convolutional layers reaches 201, yet the parameter size is only 80 MB, making it a lightweight network model, and it reaches 95% top-5 classification accuracy on the large-scale ImageNet dataset. The specific structure of the DenseNet-201 network is shown in Table 1: it comprises, connected in sequence, a convolutional layer (Convolution), a pooling layer (Pooling), a first dense block (Dense Block 1), a first transition layer (Transition Layer 1), a second dense block (Dense Block 2), a second transition layer (Transition Layer 2), a third dense block (Dense Block 3), a third transition layer (Transition Layer 3), a fourth dense block (Dense Block 4) and a classification layer (Classification Layer), where k = 32 denotes the growth rate of the number of channels.
TABLE 1 DenseNet-201 model structure (growth rate k = 32)

Layer                | Output size | Configuration
Convolution          | 112×112     | 7×7 conv, stride 2
Pooling              | 56×56       | 3×3 max pool, stride 2
Dense Block 1        | 56×56       | [1×1 conv, 3×3 conv] × 6
Transition Layer 1   | 28×28       | 1×1 conv; 2×2 average pool, stride 2
Dense Block 2        | 28×28       | [1×1 conv, 3×3 conv] × 12
Transition Layer 2   | 14×14       | 1×1 conv; 2×2 average pool, stride 2
Dense Block 3        | 14×14       | [1×1 conv, 3×3 conv] × 48
Transition Layer 3   | 7×7         | 1×1 conv; 2×2 average pool, stride 2
Dense Block 4        | 7×7         | [1×1 conv, 3×3 conv] × 32
Classification Layer | 1×1         | 7×7 global average pool; 1000D fully connected, softmax
S102, acquiring a plurality of videos.
The videos of this embodiment can be acquired by collection, for example by shooting a plurality of videos with a camera.
S103, extracting the depth feature vector of the video frame in each video by using a DenseNet model.
Using the DenseNet-201 network model loaded with pre-trained parameters, a 1920-dimensional feature vector is extracted for each video frame in every video, taken from the output of the penultimate, fully connected layer of the network model.
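Although the patent gives no code, this step maps directly onto the TensorFlow framework it names. A minimal sketch, assuming the Keras-bundled DenseNet-201 ImageNet weights and using global average pooling to obtain the 1920-dimensional vector; the helper name extract_frame_features is illustrative:

```python
import numpy as np
import tensorflow as tf

# Sketch, not the patent's code: DenseNet-201 with ImageNet weights and the
# classifier removed. pooling='avg' applies global average pooling, yielding
# exactly the 1920-dimensional vector that feeds the classification layer.
feature_extractor = tf.keras.applications.DenseNet201(
    include_top=False, weights="imagenet", pooling="avg")

def extract_frame_features(frames):
    """frames: list of HxWx3 uint8 RGB video frames (convert from BGR first
    if they were decoded with OpenCV). Returns an (N, 1920) float array."""
    batch = np.stack([tf.image.resize(f, (224, 224)).numpy() for f in frames])
    batch = tf.keras.applications.densenet.preprocess_input(batch)
    return feature_extractor.predict(batch, verbose=0)
```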
And S104, aiming at each video, extracting key frames according to the depth feature vectors of the video frames, and outputting a key frame set.
In this embodiment, the cosine angle distance is used to measure the similarity between consecutive frames, and key frame extraction is achieved by threshold comparison. The cosine angle similarity takes values in [0, 1] and is calculated as follows:

T = cos(I_k, I_{k-1}) = (I_k · I_{k-1}) / (||I_k|| ||I_{k-1}||)    (1)

where I_k denotes the depth feature vector of the current frame and I_{k-1} denotes the depth feature vector of the previous frame. The resulting inter-frame similarity curve is shown in fig. 3.
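Formula (1) is a few lines of NumPy; the following sketch (an illustration, not taken from the patent) is reused by the later steps. The [0, 1] range holds here because the ReLU-activated DenseNet features are non-negative:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine angle similarity of two depth feature vectors; lies in [0, 1]
    for non-negative (ReLU) features, with 1 meaning identical direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```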
In the key frame extraction process, a reference frame mechanism is introduced so that gradual shot transitions and key frame de-duplication are handled at the same time. As shown in fig. 4, the key frames are extracted according to the depth feature vectors of the video frames and a key frame set is output, specifically as follows (a code sketch of the complete loop follows step S1044):
S1041, setting the 1st frame as the reference frame and adding it, as a key frame, to the key frame set.
Let the video have N video frames. The 1st frame (i.e. the 1st video frame) is set as the reference frame, taken as a key frame, and added to the key frame set.
S1042, calculating the cosine angle similarity T between the current frame and the reference frame according to their depth feature vectors.
The cosine angle similarity T of this embodiment follows the cosine angle distance of formula (1) above, with the previous frame replaced by the reference frame:

T = cos(I_k, I_ref) = (I_k · I_ref) / (||I_k|| ||I_ref||)    (2)

where I_k denotes the depth feature vector of the current frame and I_ref denotes the depth feature vector of the reference frame.
S1043, if the cosine angle similarity T is smaller than the threshold e, comparing the current frame with the key frame set; if it does not repeat an existing key frame, taking the current frame as a key frame, adding it to the key frame set, updating the current frame to be the reference frame, and proceeding to step S1044. If the cosine angle similarity T is greater than or equal to the threshold e, taking the next frame as the current frame and returning to step S1042.
S1044, if the updated reference frame is not the last frame (i.e. the Nth frame), the loop has not ended: return to step S1042. If the updated reference frame is the last frame, the loop ends: output the key frame set.
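Steps S1041 to S1044 amount to a single pass over the frame features. A sketch under the assumptions that `features` holds the per-frame vectors from the DenseNet model and `cosine_similarity` is the helper above; the threshold values and the exact de-duplication test are illustrative, since the patent only says the candidate is "compared with the key frame set":

```python
def extract_key_frames(features, e=0.85, dup_threshold=0.95):
    """Reference-frame key frame extraction (steps S1041-S1044).
    features: depth feature vectors of the N frames, in temporal order.
    e: shot-change threshold (illustrative value).
    dup_threshold: candidates at least this similar to an existing key
        frame are treated as repeats and skipped (assumed rule).
    Returns the indices of the key frames."""
    key_frames = [0]  # S1041: frame 1 is both the reference and a key frame
    ref = 0
    for k in range(1, len(features)):
        t = cosine_similarity(features[k], features[ref])       # S1042
        if t < e:                                                # S1043
            if not any(cosine_similarity(features[k], features[j])
                       >= dup_threshold for j in key_frames):
                key_frames.append(k)
                # The patent leaves open whether the reference also moves
                # when the candidate is a repeat; here it moves only when
                # a key frame is added.
                ref = k
    return key_frames                                            # S1044
```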
S105, establishing an index relationship between each video and the key frame set of each video, and storing it in the video feature database.
Specifically, an index relationship is established between the video id of each video and its key frame set, and the relationship is stored in the video feature database, as shown in Table 2 below.
TABLE 2 Video feature database

Video id (Video_id) | Key frame feature (Key_frame_feature) | Time (Time)
Video A             | Key frame 1                           | 0:30
Video A             | Key frame 2                           | 1:34
Video B             | Key frame 1                           | 0:17
Video C             | Key frame 1                           | 0:19
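One possible realization of this index, sketched with SQLite; the patent does not name a storage engine, and the schema simply mirrors Table 2, with the 1920-dimensional vector serialized as a BLOB (an illustrative choice):

```python
import sqlite3
import numpy as np

conn = sqlite3.connect("video_features.db")
conn.execute("""CREATE TABLE IF NOT EXISTS key_frames (
                    video_id TEXT,
                    key_frame_feature BLOB,
                    time TEXT)""")

def index_video(video_id, key_frame_features, timestamps):
    """Store each key frame's depth feature vector under its video id,
    mirroring the Video_id / Key_frame_feature / Time columns of Table 2."""
    for vec, ts in zip(key_frame_features, timestamps):
        blob = np.asarray(vec, dtype=np.float32).tobytes()
        conn.execute("INSERT INTO key_frames VALUES (?, ?, ?)",
                     (video_id, blob, ts))
    conn.commit()
```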
Steps S101 to S105 constitute the video storage stage, and step S106 is the video retrieval stage. It can be understood that steps S101 to S105 are completed on one computer device; the retrieval stage of step S106 can then be performed on that computer device or on other networked computer devices.
S106, retrieving videos from the video feature database according to the image or short video provided by the user, and outputting a video retrieval result.
At present, mainstream video retrieval is mainly keyword-based, and with the emergence of massive videos, keyword retrieval consumes a large amount of manual annotation time. This embodiment can instead retrieve videos from the video feature database in two ways, retrieval by image and retrieval by short video, described as follows:
1) Retrieving videos from the video feature database according to an image provided by the user, and outputting a video retrieval result.
Specifically, according to the image provided by the user, the 1920-dimensional features of the image are extracted with the DenseNet model and compared against the database by cosine angle similarity, and the top N most similar videos are output in descending order of similarity, as shown in fig. 5.
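A sketch of the image query path, reusing extract_frame_features and cosine_similarity from above; scoring each video by its best-matching key frame is an assumption about how the comparison against the database is aggregated:

```python
def retrieve_by_image(image_rgb, db_rows, n=10):
    """db_rows: iterable of (video_id, feature_vector) pairs loaded from the
    video feature database. Returns the top-N (video_id, score) pairs."""
    query = extract_frame_features([image_rgb])[0]    # 1920-D query vector
    best = {}
    for video_id, vec in db_rows:
        s = cosine_similarity(query, vec)
        best[video_id] = max(best.get(video_id, 0.0), s)  # best key frame wins
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:n]
```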
2) Retrieving videos from the video feature database according to a short video provided by the user, and outputting a video retrieval result.
Specifically, according to the short video provided by the user, the features of the short video are extracted with the DenseNet model, the key frame set of the short video is matched against all key frame sets in the database by similarity in a sliding-window manner, and the top N most similar videos are output in descending order of similarity, as shown in fig. 6.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing the associated hardware, and the corresponding program may be stored in a computer-readable storage medium.
It should be noted that although the method operations of the above embodiments are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in that order, or that all of the illustrated operations must be performed, to achieve desirable results. Rather, the depicted steps may be executed in a different order; additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one, and/or one step may be broken down into multiple steps.
Example 2:
as shown in fig. 7, this embodiment provides a depth feature-based video retrieval system, which includes a convolutional neural network construction module 701, a video acquisition module 702, a video frame feature extraction module 703, a key frame extraction module 704, an index establishment module 705, and a video retrieval module 706, where specific functions of each module are as follows:
the convolutional neural network constructing module 701 is configured to construct a convolutional neural network; wherein the convolutional neural network is a DenseNet model.
The video obtaining module 702 is configured to obtain a plurality of videos.
The video frame feature extraction module 703 is configured to extract a depth feature vector of a video frame in each video by using a DenseNet model.
The key frame extracting module 704 is configured to, for each video, extract a key frame according to the depth feature vector of the video frame, and output a key frame set.
The index establishing module 705 is configured to establish an index relationship between each video and the key frame set of each video, and store the index relationship in the video feature database.
The video retrieval module 706 is configured to retrieve videos in the video feature database according to images or short videos provided by the user, and output a video retrieval result.
For the specific implementation of each module in this embodiment, reference may be made to embodiment 1, which is not repeated here. It should be noted that the system provided in this embodiment is only illustrated by the above division of functional modules; in practical applications, the functions may be assigned to different functional modules as needed, that is, the internal structure may be divided into different functional modules to complete all or part of the functions described above.
Example 3:
This embodiment provides a computer device, which may be a computer. As shown in fig. 8, it comprises a processor 802, a memory, an input device 803, a display 804 and a network interface 805 connected by a system bus 801. The processor provides computing and control capability; the memory includes a nonvolatile storage medium 806 and an internal memory 807, where the nonvolatile storage medium 806 stores an operating system, computer programs and a database, and the internal memory 807 provides an environment for running the operating system and the computer programs in the nonvolatile storage medium. When the processor 802 executes the computer programs stored in the memory, the video retrieval method of embodiment 1 above is implemented as follows:
constructing a convolutional neural network; wherein the convolutional neural network is a DenseNet model;
acquiring a plurality of videos;
extracting a depth feature vector of a video frame in each video by using a DenseNet model;
for each video, extracting key frames according to the depth feature vectors of the video frames, and outputting a key frame set;
establishing an index relationship between each video and the key frame set of each video, and storing the index relationship in a video feature database;
and retrieving videos from the video feature database according to an image or short video provided by the user, and outputting a video retrieval result.
Further, the DenseNet model adopts a DenseNet-201 model;
the DenseNet-201 model comprises a convolution layer, a pooling layer, a first dense block, a first transition layer, a second dense block, a second transition layer, a third dense block, a third transition layer, a fourth dense block and a classification layer which are sequentially connected.
Further, the extracting key frames according to the depth feature vectors of the video frames and outputting a key frame set specifically comprises:
setting the 1st frame as the reference frame, taking the reference frame as a key frame, and adding it to the key frame set;
calculating the cosine angle similarity between the current frame and the reference frame according to their depth feature vectors;
if the cosine angle similarity is smaller than a threshold, comparing the current frame with the key frame set; if it does not repeat an existing key frame, taking the current frame as a key frame, adding it to the key frame set, and updating the current frame to be the reference frame;
if the updated reference frame is not the last frame, repeating the cosine angle similarity calculation between the current frame and the reference frame and executing the subsequent operations; if the updated reference frame is the last frame, outputting the key frame set.
Further, retrieving videos from the video feature database according to an image provided by the user and outputting a video retrieval result specifically comprises:
extracting the features of the image with the DenseNet model according to the image provided by the user, comparing them against the database by cosine angle similarity, and outputting the N most similar videos in descending order of similarity.
Further, retrieving videos from the video feature database according to a short video provided by the user and outputting a video retrieval result specifically comprises:
extracting the features of the short video with the DenseNet model according to the short video provided by the user, matching the key frame set of the short video against all key frame sets in the database by similarity in a sliding-window manner, sorting by similarity in descending order, and outputting the N most similar videos.
The computer device of this embodiment can be installed with video retrieval software implementing the above video retrieval method. As shown in fig. 9, the video retrieval program of this software consists of video warehousing and video retrieval. Video warehousing mainly comprises building the convolutional neural network, extracting video frame features, extracting key frames and establishing the index, so that videos and the depth features of their key frames are indexed and stored for subsequent retrieval; video retrieval includes retrieving videos with a picture and retrieving videos with a short video.
Example 4:
This embodiment provides a storage medium, namely a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, the video retrieval method of embodiment 1 above is implemented as follows:
constructing a convolutional neural network; wherein the convolutional neural network is a DenseNet model;
acquiring a plurality of videos;
extracting a depth feature vector of a video frame in each video by using a DenseNet model;
for each video, extracting key frames according to the depth feature vectors of the video frames, and outputting a key frame set;
establishing an index relationship between each video and the key frame set of each video, and storing the index relationship in a video feature database;
and retrieving videos from the video feature database according to an image or short video provided by the user, and outputting a video retrieval result.
Further, the DenseNet model adopts a DenseNet-201 model;
the DenseNet-201 model comprises a convolution layer, a pooling layer, a first dense block, a first transition layer, a second dense block, a second transition layer, a third dense block, a third transition layer, a fourth dense block and a classification layer which are sequentially connected.
Further, the extracting key frames according to the depth feature vectors of the video frames and outputting a key frame set specifically comprises:
setting the 1st frame as the reference frame, taking the reference frame as a key frame, and adding it to the key frame set;
calculating the cosine angle similarity between the current frame and the reference frame according to their depth feature vectors;
if the cosine angle similarity is smaller than a threshold, comparing the current frame with the key frame set; if it does not repeat an existing key frame, taking the current frame as a key frame, adding it to the key frame set, and updating the current frame to be the reference frame;
if the updated reference frame is not the last frame, repeating the cosine angle similarity calculation between the current frame and the reference frame and executing the subsequent operations; if the updated reference frame is the last frame, outputting the key frame set.
Further, retrieving videos from the video feature database according to an image provided by the user and outputting a video retrieval result specifically comprises:
extracting the features of the image with the DenseNet model according to the image provided by the user, comparing them against the database by cosine angle similarity, and outputting the N most similar videos in descending order of similarity.
Further, retrieving videos from the video feature database according to a short video provided by the user and outputting a video retrieval result specifically comprises:
extracting the features of the short video with the DenseNet model according to the short video provided by the user, matching the key frame set of the short video against all key frame sets in the database by similarity in a sliding-window manner, sorting by similarity in descending order, and outputting the N most similar videos.
The storage medium described in this embodiment may be a magnetic disk, an optical disc, a computer memory, a random access memory (RAM), a USB flash drive, a removable hard disk, or other media.
In summary, the invention first uses the DenseNet model as the convolutional neural network. The DenseNet model extends convolutional network connectivity beyond the ResNet model: for any layer within a dense block, the feature maps of all preceding layers are its input, and its own feature map is an input to all subsequent layers. This design alleviates the vanishing-gradient problem, strengthens feature-map propagation, improves feature reuse, greatly reduces the number of parameters, and makes the extracted features richer and more diverse. Secondly, whereas the color, texture and shape features adopted by traditional content-based video retrieval are easily disturbed by noise and illumination, the convolutional neural network can extract highly abstract, well-generalizing and robust deep features of images, realizing video shot segmentation, video frame depth feature extraction, key frame extraction and video feature database construction, and finally a content-based video retrieval function.
The above description covers only preferred embodiments of the present invention, but the protection scope of the present invention is not limited thereto; any substitution or change made by a person skilled in the art within the technical solution and inventive concept of the present invention falls within the protection scope of the present invention.

Claims (10)

1. A method for video retrieval based on depth features, the method comprising:
constructing a convolutional neural network; wherein the convolutional neural network is a DenseNet model;
acquiring a plurality of videos;
extracting a depth feature vector of a video frame in each video by using a DenseNet model;
for each video, extracting key frames according to the depth feature vectors of the video frames, and outputting a key frame set;
establishing an index relationship between each video and the key frame set of each video, and storing the index relationship in a video feature database;
and retrieving videos from the video feature database according to an image or short video provided by the user, and outputting a video retrieval result.
2. The video retrieval method of claim 1, wherein the DenseNet model employs a DenseNet-201 model;
the DenseNet-201 model comprises a convolution layer, a pooling layer, a first dense block, a first transition layer, a second dense block, a second transition layer, a third dense block, a third transition layer, a fourth dense block and a classification layer which are sequentially connected.
3. The video retrieval method according to claim 1, wherein extracting key frames according to the depth feature vectors of the video frames and outputting a key frame set specifically comprises:
setting the 1st frame as the reference frame, taking the reference frame as a key frame, and adding it to the key frame set;
calculating the cosine angle similarity between the current frame and the reference frame according to their depth feature vectors;
if the cosine angle similarity is smaller than a threshold, comparing the current frame with the key frame set; if it does not repeat an existing key frame, taking the current frame as a key frame, adding it to the key frame set, and updating the current frame to be the reference frame;
if the updated reference frame is not the last frame, repeating the cosine angle similarity calculation between the current frame and the reference frame and executing the subsequent operations; if the updated reference frame is the last frame, outputting the key frame set.
4. The video retrieval method of claim 3, wherein the cosine angle similarity is calculated as follows:

T = cos(I_k, I_ref) = (I_k · I_ref) / (||I_k|| ||I_ref||)

wherein I_k denotes the depth feature vector of the current frame and I_ref denotes the depth feature vector of the reference frame.
5. The video retrieval method according to any one of claims 1 to 4, wherein retrieving videos in the video feature database according to an image provided by a user and outputting a video retrieval result specifically comprises:
extracting the features of the image with the DenseNet model according to the image provided by the user, comparing them against the database by cosine angle similarity, and outputting the N most similar videos in descending order of similarity.
6. The video retrieval method according to any one of claims 1 to 4, wherein retrieving videos in the video feature database according to a short video provided by a user and outputting a video retrieval result specifically comprises:
extracting the features of the short video with the DenseNet model according to the short video provided by the user, matching the key frame set of the short video against all key frame sets in the database by similarity in a sliding-window manner, sorting by similarity in descending order, and outputting the N most similar videos.
7. A depth feature-based video retrieval system, the system comprising:
the convolutional neural network construction module is used for constructing a convolutional neural network; wherein the convolutional neural network is a DenseNet model;
the video acquisition module is used for acquiring a plurality of videos;
the video frame feature extraction module is used for extracting a depth feature vector of a video frame in each video by using a DenseNet model;
the key frame extraction module is used for extracting key frames and outputting a key frame set according to the depth feature vectors of the video frames aiming at each video;
the index establishing module is used for establishing an index relationship between each video and the key frame set of each video, and storing the index relationship in a video feature database;
and the video retrieval module is used for retrieving videos from the video feature database according to an image or short video provided by the user, and outputting a video retrieval result.
8. The video retrieval system of claim 7, wherein the DenseNet model employs a DenseNet-201 model;
the DenseNet-201 model comprises a convolution layer, a pooling layer, a first dense block, a first transition layer, a second dense block, a second transition layer, a third dense block, a third transition layer, a fourth dense block and a classification layer which are sequentially connected.
9. A computer device comprising a processor and a memory for storing a program executable by the processor, wherein the processor, when executing the program stored in the memory, implements the video retrieval method of any one of claims 1 to 6.
10. A storage medium storing a program, wherein the program, when executed by a processor, implements the video retrieval method of any one of claims 1 to 6.
CN202010115194.5A 2020-02-25 2020-02-25 Video retrieval method, system, computer equipment and storage medium based on depth features Pending CN111339369A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202010115194.5A | 2020-02-25 | 2020-02-25 | Video retrieval method, system, computer equipment and storage medium based on depth features (CN111339369A)

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202010115194.5A | 2020-02-25 | 2020-02-25 | Video retrieval method, system, computer equipment and storage medium based on depth features (CN111339369A)

Publications (1)

Publication Number Publication Date
CN111339369A (en) 2020-06-26

Family

ID=71185653

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010115194.5A Pending CN111339369A (en) 2020-02-25 2020-02-25 Video retrieval method, system, computer equipment and storage medium based on depth features

Country Status (1)

Country Link
CN (1) CN111339369A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112053313A (en) * 2020-08-31 2020-12-08 西安工业大学 Night vision anti-halation video processing method for heterogeneous image fusion
CN112069967A (en) * 2020-08-31 2020-12-11 西安工业大学 Night-vision anti-halation pedestrian detection and tracking method based on heterogeneous video fusion
CN112487242A (en) * 2020-11-27 2021-03-12 百度在线网络技术(北京)有限公司 Method and device for identifying video, electronic equipment and readable storage medium
CN112836600A (en) * 2021-01-19 2021-05-25 新华智云科技有限公司 Method and system for calculating video similarity
CN113139517A (en) * 2021-05-14 2021-07-20 广州广电卓识智能科技有限公司 Face living body model training method, face living body model detection method, storage medium and face living body model detection system
CN113627342A (en) * 2021-08-11 2021-11-09 人民中科(济南)智能技术有限公司 Method, system, device and storage medium for video depth feature extraction optimization
CN117540047A (en) * 2023-11-24 2024-02-09 中科世通亨奇(北京)科技有限公司 Method, system, equipment and storage medium for retrieving video based on picture
CN117612215A (en) * 2024-01-23 2024-02-27 南京中孚信息技术有限公司 Identity recognition method, device and medium based on video retrieval

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108228915A (en) * 2018-03-29 2018-06-29 华南理工大学 A kind of video retrieval method based on deep learning
CN109359725A (en) * 2018-10-24 2019-02-19 北京周同科技有限公司 Training method, device, equipment and the computer readable storage medium of convolutional neural networks model
CN109918537A (en) * 2019-01-18 2019-06-21 杭州电子科技大学 A kind of method for quickly retrieving of the ship monitor video content based on HBase
CN110162665A (en) * 2018-12-28 2019-08-23 腾讯科技(深圳)有限公司 Video searching method, computer equipment and storage medium
CN110278449A (en) * 2019-06-26 2019-09-24 腾讯科技(深圳)有限公司 A kind of video detecting method, device, equipment and medium
CN110751209A (en) * 2019-10-18 2020-02-04 北京邮电大学 Intelligent typhoon intensity determination method integrating depth image classification and retrieval

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108228915A (en) * 2018-03-29 2018-06-29 华南理工大学 A kind of video retrieval method based on deep learning
CN109359725A (en) * 2018-10-24 2019-02-19 北京周同科技有限公司 Training method, device, equipment and the computer readable storage medium of convolutional neural networks model
CN110162665A (en) * 2018-12-28 2019-08-23 腾讯科技(深圳)有限公司 Video searching method, computer equipment and storage medium
CN109918537A (en) * 2019-01-18 2019-06-21 杭州电子科技大学 A kind of method for quickly retrieving of the ship monitor video content based on HBase
CN110278449A (en) * 2019-06-26 2019-09-24 腾讯科技(深圳)有限公司 A kind of video detecting method, device, equipment and medium
CN110751209A (en) * 2019-10-18 2020-02-04 北京邮电大学 Intelligent typhoon intensity determination method integrating depth image classification and retrieval

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张惠凡 et al.: "Research on bird video image retrieval based on convolutional neural networks" (基于卷积神经网络的鸟类视频图像检索研究) *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112053313A (en) * 2020-08-31 2020-12-08 西安工业大学 Night vision anti-halation video processing method for heterogeneous image fusion
CN112069967A (en) * 2020-08-31 2020-12-11 西安工业大学 Night-vision anti-halation pedestrian detection and tracking method based on heterogeneous video fusion
CN112487242A (en) * 2020-11-27 2021-03-12 百度在线网络技术(北京)有限公司 Method and device for identifying video, electronic equipment and readable storage medium
CN112836600A (en) * 2021-01-19 2021-05-25 新华智云科技有限公司 Method and system for calculating video similarity
CN112836600B (en) * 2021-01-19 2023-12-22 新华智云科技有限公司 Video similarity calculation method and system
CN113139517A (en) * 2021-05-14 2021-07-20 广州广电卓识智能科技有限公司 Face living body model training method, face living body model detection method, storage medium and face living body model detection system
CN113139517B (en) * 2021-05-14 2023-10-27 广州广电卓识智能科技有限公司 Face living body model training method, face living body model detection method, storage medium and face living body model detection system
CN113627342A (en) * 2021-08-11 2021-11-09 人民中科(济南)智能技术有限公司 Method, system, device and storage medium for video depth feature extraction optimization
CN113627342B (en) * 2021-08-11 2024-04-12 人民中科(济南)智能技术有限公司 Method, system, equipment and storage medium for video depth feature extraction optimization
CN117540047A (en) * 2023-11-24 2024-02-09 中科世通亨奇(北京)科技有限公司 Method, system, equipment and storage medium for retrieving video based on picture
CN117612215A (en) * 2024-01-23 2024-02-27 南京中孚信息技术有限公司 Identity recognition method, device and medium based on video retrieval
CN117612215B (en) * 2024-01-23 2024-04-26 南京中孚信息技术有限公司 Identity recognition method, device and medium based on video retrieval

Similar Documents

Publication Publication Date Title
CN111339369A (en) Video retrieval method, system, computer equipment and storage medium based on depth features
CN108108499B (en) Face retrieval method, device, storage medium and equipment
US8232996B2 (en) Image learning, automatic annotation, retrieval method, and device
CN109359725B (en) Training method, device and equipment of convolutional neural network model and computer readable storage medium
CN102184242B (en) Cross-camera video abstract extracting method
CN106649663B (en) A kind of video copying detection method based on compact video characterization
US9665773B2 (en) Searching for events by attendants
CN114694185B (en) Cross-modal target re-identification method, device, equipment and medium
Wang et al. Duplicate discovery on 2 billion internet images
Papadopoulos et al. Automatic summarization and annotation of videos with lack of metadata information
CN111723692B (en) Near-repetitive video detection method based on label features of convolutional neural network semantic classification
Zhong et al. Video-based person re-identification based on distributed cloud computing
WO2024032177A1 (en) Data processing method and apparatus, electronic device, storage medium, and program product
CN109241342B (en) Video scene retrieval method and system based on depth clues
Xu et al. A novel shot detection algorithm based on clustering
CN114973099A (en) Intelligent object searching method and system based on traceable target identification
CN111506754B (en) Picture retrieval method, device, storage medium and processor
Kamde et al. Entropy supported video indexing for content based video retrieval
CN111581420A (en) Medical image real-time retrieval method based on Flink
Ravani et al. Parallel CBIR system based on color coherence vector
CN114612985B (en) Portrait query method, system, equipment and storage medium
Nasreen et al. Parallelizing Multi-featured Content Based Search and Retrieval of Videos through High Performance Computing
Hiriyannaiah et al. Deep learning and its applications for content-based video retrieval
CN112597329B (en) Real-time image retrieval method based on improved semantic segmentation network
CN115442660B (en) Self-supervision countermeasure video abstract extraction method, device, equipment and storage medium

Legal Events

Code | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination