WO2017177661A1 - Video retrieval method and system based on convolutional neural network - Google Patents

Video retrieval method and system based on convolutional neural network

Info

Publication number
WO2017177661A1
WO2017177661A1 (PCT/CN2016/103945)
Authority
WO
WIPO (PCT)
Prior art keywords
layer
convolution layer
convolution
video
calculation model
Prior art date
Application number
PCT/CN2016/103945
Other languages
English (en)
French (fr)
Inventor
刘阳
白茂生
魏伟
蔡砚刚
祁海
李兴玉
Original Assignee
乐视控股(北京)有限公司
乐视云计算有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 乐视控股(北京)有限公司, 乐视云计算有限公司
Publication of WO2017177661A1

Links

Images

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 - Information retrieval of video data
    • G06F16/73 - Querying
    • G06F16/78 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 - Retrieval using metadata automatically derived from the content

Definitions

  • The present disclosure relates to the field of convolutional neural network technologies, and in particular to a video retrieval method and system based on a convolutional neural network.
  • Commonly used methods generally perform video retrieval according to the title of the video or labels manually set on the video in advance, but this approach has limitations and sometimes cannot return results that satisfy the user. For example, when retrieving a video by name, repeated names and language differences mean that the user often still retrieves a large number of videos and cannot accurately obtain the desired one. Moreover, in some special cases the user does not even know the title of the video to be retrieved, and then no search can be performed at all.
  • In view of this, the present disclosure proposes a video retrieval method and system based on a convolutional neural network, which can greatly improve the speed and accuracy of video retrieval.
  • The video retrieval method based on a convolutional neural network includes:
  • constructing, according to the retrieval requirement of the video, a calculation model for classification based on a convolutional neural network, the calculation model including a convolution layer, a pooling layer, a fully connected layer, and a classifier;
  • training the calculation model with image data to obtain an optimized calculation model, and removing the classifier from the optimized calculation model to obtain an extraction calculation model;
  • extracting transition frames from existing video resources, extracting the transition features of the transition frames through the extraction calculation model, and establishing a transition feature database; and
  • extracting transition frames from the video to be retrieved, extracting the transition features of the video through the extraction calculation model, and searching the transition feature database with the transition features to obtain the video retrieval result.
  • The calculation model for classification based on the convolutional neural network includes convolution layer C1, convolution layer C2, pooling layer P2, convolution layer C3, pooling layer P3, convolution layer C4, convolution layer C5, convolution layer C6, pooling layer P6, fully connected layer fc7, fully connected layer fc8, and a classifier, connected in sequence.
  • The kernel sizes of the six convolution layers are all no larger than 5×5: the kernel size of convolution layer C1 is 3×3 and the kernel size of convolution layer C2 is 3×3;
  • the kernel size of convolution layer C3 is 5×5, and the kernel sizes of pooling layer P2, pooling layer P3, convolution layer C4, convolution layer C5, convolution layer C6, and pooling layer P6 are all 3×3;
  • the strides of convolution layers C1, C2, C3, C4, C5, and C6 are all 1, and the strides of pooling layers P2, P3, and P6 are all 2;
  • the pad values of convolution layers C1, C2, C4, C5, and C6 are all 1, the pad value of convolution layer C3 is 2, and the pad values of pooling layers P2, P3, and P6 are 0;
  • the numbers of convolution kernels of convolution layers C1 and C2 are both 96, the numbers of convolution kernels of convolution layers C3 and C6 are both 256, and the numbers of convolution kernels of convolution layers C4 and C5 are both 384.
  • The numbers of nodes of fully connected layer fc7 and fully connected layer fc8 are 4096 and 1000, respectively, and fully connected layer fc7 uses dropout for data processing;
  • the classifier is a softmax classifier.
  • Convolution layers C1 through C6 and fully connected layer fc7 all use the Leaky ReLU activation function to activate the data; an illustrative sketch of this topology follows.
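  • For illustration only (not part of the patent text), the layer specification above can be transcribed into a PyTorch sketch. The 224×224 RGB input size, the use of torch.nn, and average pooling (an option the disclosure states elsewhere) are assumptions here:

    import torch
    import torch.nn as nn

    # Feature extractor C1..P6 as specified above; the Leaky ReLU slope of
    # 0.01 follows the parameter value given later in this document.
    features = nn.Sequential(
        nn.Conv2d(3, 96, kernel_size=3, stride=1, padding=1),     # C1
        nn.LeakyReLU(0.01),
        nn.Conv2d(96, 96, kernel_size=3, stride=1, padding=1),    # C2
        nn.LeakyReLU(0.01),
        nn.AvgPool2d(kernel_size=3, stride=2),                    # P2
        nn.Conv2d(96, 256, kernel_size=5, stride=1, padding=2),   # C3
        nn.LeakyReLU(0.01),
        nn.AvgPool2d(kernel_size=3, stride=2),                    # P3
        nn.Conv2d(256, 384, kernel_size=3, stride=1, padding=1),  # C4
        nn.LeakyReLU(0.01),
        nn.Conv2d(384, 384, kernel_size=3, stride=1, padding=1),  # C5
        nn.LeakyReLU(0.01),
        nn.Conv2d(384, 256, kernel_size=3, stride=1, padding=1),  # C6
        nn.LeakyReLU(0.01),
        nn.AvgPool2d(kernel_size=3, stride=2),                    # P6
    )
    head = nn.Sequential(
        nn.Flatten(),
        nn.LazyLinear(4096),    # fc7; input size inferred at first forward
        nn.LeakyReLU(0.01),
        nn.Dropout(p=0.5),      # dropout on fc7
        nn.Linear(4096, 1000),  # fc8: 1000-dimensional output
        # a softmax classifier sits on top during training and is removed
        # afterwards to obtain the feature-extraction model
    )
    feat = head(features(torch.randn(1, 3, 224, 224)))  # assumed input size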
  • The method for extracting transition frames is:
  • according to a preset algorithm, determining whether the difference between the gray histograms of the Y planes of the frames immediately before and after the current video frame is greater than a preset video threshold; if so, the current video frame is a transition frame; otherwise, it is not a transition frame; the algorithm formula is Σ_{i=0}^{255} |Hp[i] - Hn[i]| > T,
  • where Hp[i] and Hn[i] are the values of the gray histograms of the Y planes of the previous frame and the next frame of the current video frame, respectively, and T is the preset video threshold.
  • The preset video threshold T is calculated as:
  • T = width * height / 8, where width and height are the width and height of the video frame, respectively. A sketch of this test follows.
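  • A minimal sketch of the transition-frame test, assuming the histogram comparison is a sum of absolute differences over 256 gray levels (the patent's formula image does not render in this text) and using OpenCV and NumPy, which the patent does not prescribe:

    import cv2
    import numpy as np

    def y_histogram(frame_bgr):
        # 256-bin gray histogram of the Y plane of a BGR frame
        y = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb)[:, :, 0]
        return cv2.calcHist([y], [0], None, [256], [0, 256]).ravel()

    def is_transition_frame(prev_frame, next_frame):
        # True if the Y-plane histograms of the frames before and after
        # the current frame differ by more than T = width * height / 8
        hp = y_histogram(prev_frame)
        hn = y_histogram(next_frame)
        height, width = prev_frame.shape[:2]
        return np.abs(hp - hn).sum() > width * height / 8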
  • The present disclosure also provides a video retrieval system based on a convolutional neural network, including:
  • a construction module, configured to construct, according to the retrieval requirement of the video, a calculation model for classification based on the convolutional neural network and send the constructed model to the training module, the calculation model including a convolution layer, a pooling layer, a fully connected layer, and a classifier;
  • a training module, configured to receive the calculation model sent by the construction module, train it with image data to obtain an optimized calculation model, remove the classifier from the optimized model to obtain an extraction calculation model, and send the extraction calculation model to the database module and the retrieval module;
  • a database module, configured to receive the extraction calculation model sent by the training module, extract transition frames from existing video resources, extract the transition features of the transition frames through the extraction calculation model, and establish a transition feature database; and
  • a retrieval module, configured to receive the extraction calculation model sent by the training module, extract transition frames from the video to be retrieved, extract the transition features of the video through the extraction calculation model, and search the transition feature database in the database module with the transition features to obtain the video retrieval result.
  • In the system, too, the calculation model for classification based on the convolutional neural network includes convolution layer C1, convolution layer C2, pooling layer P2, convolution layer C3, pooling layer P3, convolution layer C4, convolution layer C5, convolution layer C6, pooling layer P6, fully connected layer fc7, fully connected layer fc8, and a classifier, connected in sequence.
  • The kernel sizes of the six convolution layers are all no larger than 5×5, the kernel size of convolution layer C1 being 3×3 and the kernel size of convolution layer C2 being 3×3;
  • the kernel size of convolution layer C3 is 5×5, and the kernel sizes of pooling layer P2, pooling layer P3, convolution layer C4, convolution layer C5, convolution layer C6, and pooling layer P6 are all 3×3;
  • the strides of convolution layers C1, C2, C3, C4, C5, and C6 are all 1, and the strides of pooling layers P2, P3, and P6 are all 2;
  • the pad values of convolution layers C1, C2, C4, C5, and C6 are all 1, the pad value of convolution layer C3 is 2, and the pad values of pooling layers P2, P3, and P6 are 0;
  • the numbers of convolution kernels of convolution layers C1 and C2 are both 96, the numbers of convolution kernels of convolution layers C3 and C6 are both 256, and the numbers of convolution kernels of convolution layers C4 and C5 are both 384.
  • The numbers of nodes of fully connected layer fc7 and fully connected layer fc8 are 4096 and 1000, respectively, and fully connected layer fc7 uses dropout for data processing;
  • the classifier is a softmax classifier.
  • Convolution layers C1 through C6 and fully connected layer fc7 all use the Leaky ReLU activation function to activate the data.
  • The method for extracting transition frames is:
  • according to a preset algorithm, determining whether the difference between the gray histograms of the Y planes of the frames immediately before and after the current video frame is greater than a preset video threshold; if so, the current video frame is a transition frame; otherwise, it is not a transition frame; the algorithm formula is Σ_{i=0}^{255} |Hp[i] - Hn[i]| > T,
  • where Hp[i] and Hn[i] are the values of the gray histograms of the Y planes of the previous frame and the next frame of the current video frame, respectively, and T is the preset video threshold.
  • The preset video threshold T is calculated as:
  • T = width * height / 8, where width and height are the width and height of the video frame, respectively.
  • The present disclosure also provides a non-transitory storage medium storing computer-executable instructions arranged to perform the convolutional neural network based video retrieval method described above.
  • The present disclosure also provides a computer program product including a computer program stored on a non-transitory computer-readable storage medium, the computer program including program instructions which, when executed by a computer, cause the computer to perform the convolutional neural network based video retrieval method described above.
  • The present disclosure also provides an electronic device including at least one processor and a memory communicatively connected to the at least one processor, the memory storing instructions executable by the at least one processor which, when executed by the at least one processor, cause the at least one processor to perform the convolutional neural network based video retrieval method described above.
  • the convolutional neural network-based video retrieval method and system of the present disclosure improves the robustness of the retrieval process, removes redundant information, and improves the speed and accuracy of video retrieval.
  • FIG. 1 is a flowchart of a video retrieval method based on a convolutional neural network according to an embodiment of the present disclosure
  • FIG. 2 is a schematic structural diagram of a convolutional neural network calculation model according to an embodiment of the present disclosure
  • FIG. 3 is a schematic structural diagram of an embodiment of a video retrieval system based on a convolutional neural network according to an embodiment of the present disclosure
  • FIG. 4 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present disclosure.
  • FIG. 1 is a flowchart of a convolutional neural network based video retrieval method provided by the present disclosure.
  • the convolutional neural network-based video retrieval method includes:
  • Step 101: Construct, according to the retrieval requirement of the video, a calculation model for classification based on a convolutional neural network, the calculation model including a convolution layer, a pooling layer, a fully connected layer, and a classifier.
  • The convolutional neural network (CNN) is an efficient recognition method developed in recent years that has attracted wide attention.
  • Convolutional neural networks have become one of the research hotspots in many scientific fields, especially pattern classification; because the network avoids complicated image pre-processing and can take the original image directly as input, it has found wide application.
  • Given the classification characteristics of the convolutional neural network, it can also be extended to other application fields, for example the extraction of video or picture features in the present disclosure.
  • The calculation model for classification based on a convolutional neural network refers to a class of calculation models that perform feature extraction and finally implement classification.
  • The model contains multiple convolution layers, pooling layers, and fully connected layers:
  • the convolution layers are used for feature extraction;
  • the pooling layers reduce the dimensionality of the feature data, greatly reducing the amount of data while preserving its validity;
  • the fully connected layers implement data mapping;
  • and the classifier classifies the features.
  • Step 102: Train the calculation model with image data to obtain an optimized calculation model, and remove the classifier from the optimized calculation model to obtain an extraction calculation model.
  • The image data is, for example, the ImageNet database.
  • Through iterative training, the calculation model step by step completes a process of self-learning; the result is called the optimized calculation model.
  • After the model's final classifier is removed, the remaining calculation model can only extract picture or video features, that is, a trained extraction calculation model is obtained.
  • Step 103: Extract transition frames from existing video resources, and extract the transition features of the transition frames through the extraction calculation model to establish a transition feature database.
  • After the transition features of all obtainable videos have been extracted through the extraction calculation model, the transition feature database is established; when a user subsequently searches, videos can be retrieved through their transition features.
  • The transition feature database usually needs to be established only once to support all subsequent video retrieval, and it can also be reused in other related fields.
  • Step 104: Extract transition frames from the video to be retrieved, extract the transition features of the video through the extraction calculation model, and search the transition feature database with the transition features to obtain the video retrieval result.
  • The video to be retrieved is usually an incomplete video, a partial clip, or a lower-quality version, and the user wants to use it to retrieve a better, complete video.
  • The transition features of the video to be retrieved are likewise extracted through the extraction calculation model and then searched in the transition feature database established in step 103, yielding the video results related to the video to be retrieved.
  • As the above embodiment shows, the convolutional neural network based video retrieval method builds a classification model based on a convolutional neural network, removes the classifier from the trained model to obtain an extraction calculation model, extracts the transition features of the transition frames of all video resources with this extraction model, and establishes a transition feature database, through which videos can then be retrieved quickly.
  • By taking the transition frames within a video as the object of data processing, the method and system not only improve the robustness of the retrieval process but also remove redundant information; by extracting features with a calculation model based on a convolutional neural network, they greatly improve the speed and accuracy of video retrieval.
  • The retrieval method is not limited to videos: it is equally suitable for retrieving multimedia files such as pictures and audio, requiring only that a corresponding multimedia feature database be established.
  • In some optional embodiments, the calculation model for classification based on the convolutional neural network includes convolution layer C1, convolution layer C2, pooling layer P2, convolution layer C3, pooling layer P3, convolution layer C4, convolution layer C5, convolution layer C6, pooling layer P6, fully connected layer fc7, fully connected layer fc8, and a classifier, connected in sequence; the successive feature extraction of the two front-end convolution layers improves the efficiency and speed of the model's feature extraction.
  • Optionally, the pooling layers use average pooling.
  • Optionally, in one embodiment of the present disclosure, the calculation model uses two fully connected layers, the last of which, fc8, outputs a 1000-dimensional feature. To prevent over-fitting, dropout is used in the fully connected layer.
  • A softmax classifier is used during training.
  • After the design of the above calculation model is completed, the network is trained for classification on the ImageNet image database.
  • The number of training iterations is 300,000.
  • After training, the softmax layer is removed from the model and the remaining parts are used for feature extraction; the feature output is the fully connected layer fc8.
  • Establishing the video database: for all existing video resources, transition frames are extracted in turn, and the trained model (with the softmax layer removed) extracts and saves the features of the transition frames, so each video yields a feature vector.
  • The feature vectors of all videos are saved for use in subsequent retrieval.
  • Retrieval: for the video to be retrieved, transition frames are first extracted, then the trained model extracts their features, and finally the kd-tree algorithm performs a fast search based on the extracted features and the features of the entire video library, as sketched below.
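  • As an illustrative sketch (the patent names the kd-tree algorithm but no library), the fast search could use SciPy's cKDTree; the file name and the pooling of a video's transition-frame features into one vector are assumptions:

    import numpy as np
    from scipy.spatial import cKDTree

    # One feature vector per library video, e.g. the 1000-dimensional fc8
    # outputs of its transition frames averaged into a single vector.
    db_features = np.load("video_feature_db.npy")  # shape (n_videos, 1000)
    tree = cKDTree(db_features)

    def retrieve(query_feature, k=5):
        # indices and distances of the k nearest library videos
        dist, idx = tree.query(query_feature, k=k)
        return idx, dist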
  • In other embodiments, the kernel sizes of the six convolution layers are all no larger than 5×5, the kernel size of convolution layer C1 being 3×3 and the kernel size of convolution layer C2 being 3×3.
  • Successively connected convolution layers with smaller kernels extract the feature data of a video or picture more effectively while also reducing the number of parameters of the neural network calculation model, which contributes substantially to speeding up feature extraction and preventing over-fitting.
  • The kernel size of convolution layer C3 is 5×5, and the kernel sizes of pooling layer P2, pooling layer P3, convolution layer C4, convolution layer C5, convolution layer C6, and pooling layer P6 are all 3×3;
  • the strides of convolution layers C1, C2, C3, C4, C5, and C6 are all 1, and the strides of pooling layers P2, P3, and P6 are all 2;
  • the pad values of convolution layers C1, C2, C4, C5, and C6 are all 1, the pad value of convolution layer C3 is 2, and the pad values of pooling layers P2, P3, and P6 are 0;
  • the numbers of convolution kernels of convolution layers C1 and C2 are both 96, the numbers of convolution kernels of convolution layers C3 and C6 are both 256, and the numbers of convolution kernels of convolution layers C4 and C5 are both 384.
  • Here, the stride of a convolution layer refers to the step size by which the layer's kernel moves each time, and the pad value indicates whether rings of data are added around the input data to participate in the operation, its magnitude being the number of added rings.
  • These settings improve the processing efficiency and speed of the calculation model and hence the efficiency of video feature extraction; a worked size example follows.
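  • As a worked example (standard convolution arithmetic, not quoted from the patent), the kernel, stride, and pad settings fix each layer's output resolution:

    n_out = floor((n_in + 2 * pad - kernel) / stride) + 1

    For convolution layer C1 (kernel 3, stride 1, pad 1), n_out = (n_in + 2 - 3) / 1 + 1 = n_in, so the resolution is preserved; for pooling layer P2 (kernel 3, stride 2, pad 0) on a 224-pixel-wide input, n_out = floor((224 - 3) / 2) + 1 = 111, roughly halving the resolution.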
  • In some embodiments of the present disclosure, the numbers of nodes of fully connected layer fc7 and fully connected layer fc8 are 4096 and 1000, respectively; fully connected layer fc7 uses dropout for data processing; and the classifier is a softmax classifier.
  • The number of nodes can also be understood as the number of features.
  • The dropout method randomly keeps a certain number of activations and discards the rest, which effectively prevents over-fitting of the data and thus improves the speed and efficiency of feature extraction.
  • Convolution layers C1 through C6 and fully connected layer fc7 all use the Leaky ReLU activation function to activate the data.
  • An activation function computes a new output from the previous output data through its internal algorithm and passes that new output as the input data of the next layer.
  • Compared with the traditional ReLU, the Leaky ReLU activation function used in the present disclosure still produces an output when the input value is less than zero, so that this part of the data can also participate in the training process.
  • When the input is less than zero, the output equals the input multiplied by a coefficient a, which is preferably a fixed value, as written out below.
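  • Written out (the standard Leaky ReLU definition, with the coefficient value a = 0.01 stated later in this document):

    f(x) = x        if x >= 0
    f(x) = a * x    if x < 0,    with a = 0.01

    so negative inputs keep a small non-zero output and gradient, unlike the traditional ReLU f(x) = max(0, x), whose negative side is exactly zero.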
  • In an optional embodiment of the present disclosure, the method for extracting transition frames is:
  • according to a preset algorithm, determining whether the difference between the gray histograms of the Y planes of the frames immediately before and after the current video frame is greater than a preset video threshold; if the difference is greater than the threshold, the current video frame is a transition frame; if it is less than or equal to the threshold, the current video frame is not a transition frame; the algorithm formula is Σ_{i=0}^{255} |Hp[i] - Hn[i]| > T,
  • where Hp[i] and Hn[i] are the values of the gray histograms of the Y planes of the previous frame and the next frame of the current video frame, respectively, and T is the preset video threshold.
  • An input video contains a large amount of frame information, and detecting every single frame would be very time-consuming, so the present disclosure first extracts the important frames, that is, frames that represent the video content well.
  • In a video, key frames can generally represent its approximate content, but since key frames change after repeated encoding, the present disclosure takes the transition frames within the video as the processing object in order to improve the robustness of the system; this removes redundant information while ensuring robustness, ultimately improving the accuracy and speed of video retrieval.
  • The preset video threshold T is calculated as:
  • T = width * height / 8, where width and height are the width and height of the video frame, respectively.
  • Optionally, the present disclosure prepares about 1.3 million training images and trains the convolutional neural network on them to obtain the optimized calculation model, then uses the model to extract features from the existing video library on the network and build the database.
  • In the training model, the convolution layers are initialized with a Gaussian distribution with a standard deviation of 0.01.
  • The a parameter of the Leaky ReLU function is 0.01.
  • The parameters of the fully connected layers are initialized with a Gaussian distribution with a standard deviation of 0.002.
  • The dropout parameter is 0.5.
  • The training process uses the back propagation (BP) algorithm to train and update the parameters; a total of 300,000 iterations are trained in the present disclosure (an initialization sketch follows).
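  • A sketch of the stated initialization scheme (PyTorch is an assumption; the patent specifies only the distributions and parameter values):

    import torch.nn as nn

    def init_weights(module):
        # Gaussian initialization with the standard deviations stated above:
        # 0.01 for convolution layers, 0.002 for fully connected layers.
        if isinstance(module, nn.Conv2d):
            nn.init.normal_(module.weight, mean=0.0, std=0.01)
            if module.bias is not None:
                nn.init.zeros_(module.bias)
        elif isinstance(module, nn.Linear):
            nn.init.normal_(module.weight, mean=0.0, std=0.002)
            if module.bias is not None:
                nn.init.zeros_(module.bias)

    # model.apply(init_weights)  # then train for ~300,000 backprop iterations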
  • Referring to FIG. 2, the calculation model based on the convolutional neural network includes convolution layer C1, convolution layer C2, pooling layer P2, convolution layer C3, pooling layer P3, convolution layer C4, convolution layer C5, convolution layer C6, pooling layer P6, fully connected layer fc7, fully connected layer fc8, and the classifier, connected in sequence; all convolution layers and fully connected layer fc7 process data through the Leaky ReLU activation function so that each layer's data can be passed to the next, and fully connected layer fc7 uses dropout for data processing.
  • Referring to FIG. 3, the video retrieval system based on a convolutional neural network includes a construction module 301, a training module 302, a database module 303, and a retrieval module 304.
  • The construction module 301 is configured to construct, according to the retrieval requirement of the video, a calculation model for classification based on the convolutional neural network and send the constructed model to the training module 302, the calculation model including a convolution layer, a pooling layer, a fully connected layer, and a classifier.
  • The training module 302 is configured to receive the calculation model sent by the construction module 301, train it with image data to obtain an optimized calculation model, remove the classifier from the optimized model to obtain an extraction calculation model, and send the extraction calculation model to the database module 303 and the retrieval module 304.
  • The database module 303 is configured to receive the extraction calculation model sent by the training module 302, extract transition frames from existing video resources, extract the transition features of the transition frames through the extraction calculation model, and establish a transition feature database.
  • The retrieval module 304 is configured to receive the extraction calculation model sent by the training module 302, extract transition frames from the video to be retrieved, extract the transition features of the video through the extraction calculation model, and search the transition feature database in the database module 303 with the transition features to obtain the video retrieval result.
  • As the above embodiment shows, in this system the construction module 301 establishes the classification model based on a convolutional neural network; the training module 302 trains it into the optimized calculation model and removes its classifier to obtain the extraction calculation model; the database module 303 establishes the transition feature database based on transition frames; and the retrieval module 304 finally implements accurate retrieval of the video.
  • By taking the transition frames within a video as the object of data processing, the system not only improves the robustness of the retrieval process but also removes redundant information; by extracting features with a calculation model based on a convolutional neural network, it greatly improves the speed and accuracy of video retrieval.
  • Optionally, the present disclosure uses a convolutional neural network to extract and combine the overall features of a video and completes the retrieval with the kd-tree algorithm; the retrieval results are both accurate and fast.
  • In some optional embodiments, the calculation model for classification based on the convolutional neural network includes convolution layer C1, convolution layer C2, pooling layer P2, convolution layer C3, pooling layer P3, convolution layer C4, convolution layer C5, convolution layer C6, pooling layer P6, fully connected layer fc7, fully connected layer fc8, and a classifier, connected in sequence.
  • The kernel sizes of the six convolution layers are all no larger than 5×5, the kernel size of convolution layer C1 being 3×3 and the kernel size of convolution layer C2 being 3×3.
  • The kernel size of convolution layer C3 is 5×5, and the kernel sizes of pooling layer P2, pooling layer P3, convolution layer C4, convolution layer C5, convolution layer C6, and pooling layer P6 are all 3×3;
  • the strides of convolution layers C1, C2, C3, C4, C5, and C6 are all 1, and the strides of pooling layers P2, P3, and P6 are all 2;
  • the pad values of convolution layers C1, C2, C4, C5, and C6 are all 1, the pad value of convolution layer C3 is 2, and the pad values of pooling layers P2, P3, and P6 are 0;
  • the numbers of convolution kernels of convolution layers C1 and C2 are both 96, the numbers of convolution kernels of convolution layers C3 and C6 are both 256, and the numbers of convolution kernels of convolution layers C4 and C5 are both 384.
  • The numbers of nodes of fully connected layer fc7 and fully connected layer fc8 are 4096 and 1000, respectively; fully connected layer fc7 uses dropout for data processing; and the classifier is a softmax classifier.
  • Convolution layers C1 through C6 and fully connected layer fc7 all use the Leaky ReLU activation function to activate the data.
  • The method for extracting transition frames is:
  • according to a preset algorithm, determining whether the difference between the gray histograms of the Y planes of the frames immediately before and after the current video frame is greater than a preset video threshold; if the difference is greater than the threshold, the current video frame is a transition frame; if it is less than or equal to the threshold, the current video frame is not a transition frame; the algorithm formula is Σ_{i=0}^{255} |Hp[i] - Hn[i]| > T,
  • where Hp[i] and Hn[i] are the values of the gray histograms of the Y planes of the previous frame and the next frame of the current video frame, respectively, and T is the preset video threshold.
  • The preset video threshold T is calculated as:
  • T = width * height / 8, where width and height are the width and height of the video frame, respectively.
  • The convolutional neural network based video retrieval system of the present disclosure can be applied to various intelligent terminal devices such as mobile phones, computers, tablets, and smart televisions, and can also be used in various servers, for example a web search server. Likewise, using some of the modules of the system as functional modules of a terminal and of a server, respectively, also falls within the protection scope of the present disclosure.
  • Embodiments of the present disclosure also provide a non-transitory storage medium containing computer-executable instructions which, when executed by a computer processor, perform the convolutional neural network based video retrieval method of the above embodiments.
  • Embodiments of the present disclosure also provide a computer program product including a computer program stored on a non-transitory computer-readable storage medium, the computer program including program instructions which, when executed by a computer, cause the computer to perform the convolutional neural network based video retrieval method of the above embodiments.
  • FIG. 4 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present disclosure.
  • The electronic device includes one or more processors 501 and a memory 502; one processor 501 is taken as an example in FIG. 4.
  • The electronic device may also include an input device 503 and an output device 504.
  • The processor 501, memory 502, input device 503, and output device 504 of the electronic device may be connected by a bus or in other ways; connection by a bus is taken as an example in FIG. 4.
  • As a non-volatile computer-readable storage medium, the memory 502 can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the convolutional neural network based video retrieval method in the embodiments of the present application.
  • The processor 501 executes the various functional applications and data processing of the server by running the non-volatile software programs, instructions, and modules stored in the memory 502, that is, it implements the convolutional neural network based video retrieval method of the above embodiments.
  • The memory 502 may include a program storage area and a data storage area, where the program storage area may store the operating system and the applications required for at least one function, and the data storage area may store data created according to use of the method of taking photos, and the like.
  • memory 502 can include high speed random access memory, and can also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device.
  • memory 502 can optionally include a memory that is remotely located relative to processor 501.
  • the input device 503 can be used to receive input numeric or character information, as well as user settings and key signal inputs related to function control.
  • Output device 504 can include a display device such as a display screen.
  • the one or more modules are stored in the memory 502, and when executed by the one or more processors 501, perform a convolutional neural network based video retrieval method in any of the above method embodiments.
  • The above product can perform the method provided by the embodiments of the present disclosure and has the corresponding functional modules and beneficial effects of the method; for technical details not described in detail here, reference may be made to the method provided by the embodiments of the present disclosure.
  • The electronic device of an embodiment of the present disclosure exists in various forms, including but not limited to:
  • (1) Mobile communication devices: these devices are characterized by mobile communication functions and mainly aim to provide voice and data communication. Such terminals include smart phones (such as the iPhone), multimedia phones, feature phones, and low-end phones.
  • (2) Ultra-mobile personal computer devices: these belong to the category of personal computers, have computing and processing functions, and generally also have mobile Internet access. Such terminals include PDA, MID, and UMPC devices, such as the iPad.
  • (3) Portable entertainment devices: these devices can display and play multimedia content. They include audio and video players (such as the iPod), handheld game consoles, e-book readers, smart toys, and portable car navigation devices.
  • (4) Servers: devices that provide computing services. A server consists of a processor, hard disk, memory, system bus, and so on; its architecture is similar to that of a general-purpose computer, but because it must provide highly reliable services, it has higher requirements for processing power, stability, reliability, security, scalability, and manageability.
  • (5) Other electronic devices with data interaction functions.
  • The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • the convolutional neural network-based video retrieval method and system of the present disclosure improves the robustness of the retrieval process, removes redundant information, and improves the speed and accuracy of video retrieval.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Library & Information Science (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

A video retrieval method and system based on a convolutional neural network. The method includes: first constructing a calculation model based on a convolutional neural network (101); training the calculation model with image data to obtain an optimized calculation model, and removing the classifier from the model to obtain an extraction calculation model (102); extracting transition frames from existing video resources, extracting the transition features of the transition frames through the extraction calculation model, and establishing a transition feature database (103); and extracting transition frames from the video to be retrieved to obtain its transition features, and searching the transition feature database with those transition features to obtain the retrieval result of the video (104).

Description

Video retrieval method and system based on convolutional neural network
This application claims priority to Chinese patent application No. 201610237628.2, entitled "Video retrieval method and system based on convolutional neural network" and filed with the Chinese Patent Office on April 15, 2016, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of convolutional neural network technologies, and in particular to a video retrieval method and system based on a convolutional neural network.
Background
With the continuous development of the Internet and multimedia technology, people increasingly rely on network retrieval to obtain the information they want, for example video information. However, the network contains a huge number of video files and related content, while a user searching usually wants one single video; how to quickly find the desired video among massive video resources is currently the difficulty of video retrieval.
At present, commonly used methods generally perform video retrieval according to the title of the video or labels manually set on the video in advance, but this approach has limitations and sometimes cannot return results that satisfy the user. For example, when retrieving a video by its name, repeated names and language differences mean that the user often still retrieves a large pile of videos and cannot accurately obtain the desired one. Moreover, in some special cases the user does not know the title of the video to be retrieved, and then the search cannot be performed. For example, the user may already have part of the video content or a low-quality version of the video file and want to retrieve the complete video or a higher-quality version without knowing the video's title; in such cases, the retrieval methods of the related art cannot achieve the purpose of video retrieval well.
Summary
In view of this, the present disclosure proposes a video retrieval method and system based on a convolutional neural network, which can greatly improve the speed and accuracy of video retrieval.
A video retrieval method based on a convolutional neural network provided by the present disclosure includes:
constructing, according to the retrieval requirement of the video, a calculation model for classification based on a convolutional neural network, the calculation model including a convolution layer, a pooling layer, a fully connected layer, and a classifier;
training the calculation model with image data to obtain an optimized calculation model, and removing the classifier from the optimized calculation model to obtain an extraction calculation model;
extracting transition frames from existing video resources, extracting the transition features of the transition frames through the extraction calculation model, and establishing a transition feature database; and
extracting transition frames from the video to be retrieved, extracting the transition features of the video through the extraction calculation model, and searching the transition feature database with the transition features to obtain the retrieval result of the video.
Optionally, the calculation model for classification based on the convolutional neural network includes convolution layer C1, convolution layer C2, pooling layer P2, convolution layer C3, pooling layer P3, convolution layer C4, convolution layer C5, convolution layer C6, pooling layer P6, fully connected layer fc7, fully connected layer fc8, and a classifier, connected in sequence.
Optionally, the kernel sizes of the six convolution layers are all no larger than 5×5, the kernel size of convolution layer C1 being 3×3 and the kernel size of convolution layer C2 being 3×3.
Optionally, the kernel size of convolution layer C3 is 5×5, and the kernel sizes of pooling layer P2, pooling layer P3, convolution layer C4, convolution layer C5, convolution layer C6, and pooling layer P6 are all 3×3;
the strides of convolution layers C1, C2, C3, C4, C5, and C6 are all 1, and the strides of pooling layers P2, P3, and P6 are all 2;
the pad values of convolution layers C1, C2, C4, C5, and C6 are all 1, the pad value of convolution layer C3 is 2, and the pad values of pooling layers P2, P3, and P6 are 0;
the numbers of convolution kernels of convolution layers C1 and C2 are both 96, the numbers of convolution kernels of convolution layers C3 and C6 are both 256, and the numbers of convolution kernels of convolution layers C4 and C5 are both 384.
Optionally, the numbers of nodes of fully connected layer fc7 and fully connected layer fc8 are 4096 and 1000, respectively, and fully connected layer fc7 uses dropout for data processing;
the classifier is a softmax classifier.
Optionally, convolution layers C1, C2, C3, C4, C5, and C6 and fully connected layer fc7 all use the Leaky ReLU activation function to activate the data.
Optionally, the method for extracting transition frames is:
according to a preset algorithm, determining whether the difference between the gray histograms of the Y planes of the frames immediately before and after the current video frame is greater than a preset video threshold; if so, the current video frame is a transition frame; otherwise, it is not a transition frame;
the algorithm formula is as follows:
Σ_{i=0}^{255} |Hp[i] - Hn[i]| > T
where Hp[i] and Hn[i] are the values of the gray histograms of the Y planes of the previous frame and the next frame of the current video frame, respectively, and T is the preset video threshold.
Optionally, the preset video threshold T is calculated as:
T = width*height/8, where width and height are the width and height of the video frame, respectively.
The present disclosure also provides a video retrieval system based on a convolutional neural network, including:
a construction module, configured to construct, according to the retrieval requirement of the video, a calculation model for classification based on a convolutional neural network and send the constructed calculation model to the training module, the calculation model including a convolution layer, a pooling layer, a fully connected layer, and a classifier;
a training module, configured to receive the calculation model sent by the construction module, train the calculation model with image data to obtain an optimized calculation model, remove the classifier from the optimized calculation model to obtain an extraction calculation model, and send the extraction calculation model to the database module and the retrieval module;
a database module, configured to receive the extraction calculation model sent by the training module, extract transition frames from existing video resources, extract the transition features of the transition frames through the extraction calculation model, and establish a transition feature database; and
a retrieval module, configured to receive the extraction calculation model sent by the training module, extract transition frames from the video to be retrieved, extract the transition features of the video through the extraction calculation model, and search the transition feature database in the database module with the transition features to obtain the retrieval result of the video.
Optionally, the calculation model for classification based on the convolutional neural network includes convolution layer C1, convolution layer C2, pooling layer P2, convolution layer C3, pooling layer P3, convolution layer C4, convolution layer C5, convolution layer C6, pooling layer P6, fully connected layer fc7, fully connected layer fc8, and a classifier, connected in sequence.
Optionally, the kernel sizes of the six convolution layers are all no larger than 5×5, the kernel size of convolution layer C1 being 3×3 and the kernel size of convolution layer C2 being 3×3.
Optionally, the kernel size of convolution layer C3 is 5×5, and the kernel sizes of pooling layer P2, pooling layer P3, convolution layer C4, convolution layer C5, convolution layer C6, and pooling layer P6 are all 3×3;
the strides of convolution layers C1, C2, C3, C4, C5, and C6 are all 1, and the strides of pooling layers P2, P3, and P6 are all 2;
the pad values of convolution layers C1, C2, C4, C5, and C6 are all 1, the pad value of convolution layer C3 is 2, and the pad values of pooling layers P2, P3, and P6 are 0;
the numbers of convolution kernels of convolution layers C1 and C2 are both 96, the numbers of convolution kernels of convolution layers C3 and C6 are both 256, and the numbers of convolution kernels of convolution layers C4 and C5 are both 384.
Optionally, the numbers of nodes of fully connected layer fc7 and fully connected layer fc8 are 4096 and 1000, respectively, and fully connected layer fc7 uses dropout for data processing;
the classifier is a softmax classifier.
Optionally, convolution layers C1, C2, C3, C4, C5, and C6 and fully connected layer fc7 all use the Leaky ReLU activation function to activate the data.
Optionally, the method for extracting transition frames is:
according to a preset algorithm, determining whether the difference between the gray histograms of the Y planes of the frames immediately before and after the current video frame is greater than a preset video threshold; if so, the current video frame is a transition frame; otherwise, it is not a transition frame;
the algorithm formula is as follows:
Σ_{i=0}^{255} |Hp[i] - Hn[i]| > T
where Hp[i] and Hn[i] are the values of the gray histograms of the Y planes of the previous frame and the next frame of the current video frame, respectively, and T is the preset video threshold.
Optionally, the preset video threshold T is calculated as:
T = width*height/8, where width and height are the width and height of the video frame, respectively.
The present disclosure also provides a non-transitory storage medium storing computer-executable instructions arranged to perform the video retrieval method based on a convolutional neural network described above.
The present disclosure also provides a computer program product including a computer program stored on a non-transitory computer-readable storage medium, the computer program including program instructions which, when executed by a computer, cause the computer to perform the video retrieval method based on a convolutional neural network described above.
The present disclosure also provides an electronic device including at least one processor and a memory communicatively connected to the at least one processor, the memory storing instructions executable by the at least one processor which, when executed by the at least one processor, cause the at least one processor to perform the video retrieval method based on a convolutional neural network described above.
The video retrieval method and system based on a convolutional neural network of the present disclosure improve the robustness of the retrieval process, remove redundant information, and improve the speed and accuracy of video retrieval.
Brief Description of the Drawings
FIG. 1 is a flowchart of a video retrieval method based on a convolutional neural network according to an embodiment of the present disclosure;
FIG. 2 is a schematic structural diagram of a convolutional neural network calculation model according to an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of an embodiment of a video retrieval system based on a convolutional neural network according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of the hardware structure of an electronic device according to an embodiment of the present disclosure.
Detailed Description
The present disclosure is further described in detail below with reference to the accompanying drawings and embodiments. In the absence of conflict, the following embodiments and the features therein may be combined with one another.
It should be noted that all expressions using "first" and "second" in the present disclosure are intended to distinguish two entities with the same name that are not identical, or two non-identical parameters; "first" and "second" are thus merely for convenience of expression and should not be construed as limiting the embodiments of the present disclosure.
FIG. 1 is a flowchart of the video retrieval method based on a convolutional neural network provided by the present disclosure. The method includes:
Step 101: Construct, according to the retrieval requirement of the video, a calculation model for classification based on a convolutional neural network, the calculation model including a convolution layer, a pooling layer, a fully connected layer, and a classifier.
The convolutional neural network (CNN) is an efficient recognition method developed in recent years that has attracted wide attention. Convolutional neural networks have become one of the research hotspots in many scientific fields, especially pattern classification; because the network avoids complicated image pre-processing and can take the original image directly as input, it has found wide application. Given the classification characteristics of convolutional neural networks, they can also be extended to other application fields, for example the extraction of video or picture features in the present disclosure. The calculation model for classification based on a convolutional neural network refers to a class of calculation models that perform feature extraction and finally implement classification. The model contains multiple convolution layers, pooling layers, and fully connected layers: the convolution layers are used for feature extraction; the pooling layers reduce the dimensionality of the feature data, greatly reducing the amount of data while preserving its validity; the fully connected layers implement data mapping; and the classifier classifies the features.
Step 102: Train the calculation model with image data to obtain an optimized calculation model, and remove the classifier from the optimized calculation model to obtain an extraction calculation model.
The image data is, for example, the ImageNet database. Through iterative training, the calculation model step by step completes a process of self-learning; the result is called the optimized calculation model. After the model's final classifier is removed, the remaining calculation model can only extract picture or video features, that is, a trained extraction calculation model is obtained.
Step 103: Extract transition frames from existing video resources, extract the transition features of the transition frames through the extraction calculation model, and establish a transition feature database.
After the transition features of all obtainable videos have been extracted through the extraction calculation model, the transition feature database is established; when a user subsequently searches, videos can be retrieved through their transition features. The transition feature database usually needs to be established only once to support all subsequent video retrieval, and it can also be reused in other related fields.
Step 104: Extract transition frames from the video to be retrieved, extract the transition features of the video through the extraction calculation model, and search the transition feature database with the transition features to obtain the retrieval result of the video.
The video to be retrieved is usually an incomplete video, a partial clip, or a lower-quality version, and the user wants to use it to retrieve a better, complete video. The transition features of the video to be retrieved are likewise extracted through the extraction calculation model and then searched in the transition feature database established in step 103, yielding the video results related to the video to be retrieved.
As the above embodiment shows, the method builds a classification model based on a convolutional neural network, removes the classifier from the trained model to obtain an extraction calculation model, extracts the transition features of the transition frames of all video resources with that model, and establishes a transition feature database through which videos can be retrieved quickly. By taking the transition frames within a video as the object of data processing, the method and system not only improve the robustness of the retrieval process but also remove redundant information; by extracting features with a calculation model based on a convolutional neural network, they greatly improve the speed and accuracy of video retrieval.
It should be noted that the retrieval method is not limited to videos; it is equally suitable for retrieving multimedia files such as pictures and audio, requiring only that a corresponding multimedia feature database be established.
In some optional embodiments, the calculation model for classification based on the convolutional neural network includes convolution layer C1, convolution layer C2, pooling layer P2, convolution layer C3, pooling layer P3, convolution layer C4, convolution layer C5, convolution layer C6, pooling layer P6, fully connected layer fc7, fully connected layer fc8, and a classifier, connected in sequence. In this way, the successive feature extraction of the two front-end convolution layers improves the efficiency and speed of the model's feature extraction.
Optionally, the pooling layers use average pooling.
Optionally, in one embodiment of the present disclosure, the calculation model uses two fully connected layers, the last of which, fc8, outputs a 1000-dimensional feature. To prevent over-fitting, dropout is used in the fully connected layer, and a softmax classifier is used during training.
After the design of the above calculation model is completed, the network is trained for classification on the ImageNet image database, for 300,000 training iterations. After training, the softmax layer is removed from the model, and the other parts of the model are used for feature extraction; the feature output is the fully connected layer fc8.
Establishing the video database: for all existing video resources, transition frames are extracted in turn, and then the trained model (with the softmax layer removed) extracts and saves the features of the transition frames, so each video yields a feature vector. The feature vectors of all videos are saved for use in subsequent retrieval.
Retrieval: for the video to be retrieved, transition frames are first extracted, then the trained model extracts their features, and finally the kd-tree algorithm performs a fast search based on the extracted features and the features of the entire video library.
In other embodiments, the kernel sizes of the six convolution layers are all no larger than 5×5, the kernel size of convolution layer C1 being 3×3 and the kernel size of convolution layer C2 being 3×3. Successively connected convolution layers with smaller kernels extract the feature data of a video or picture more effectively while also reducing the parameters of the neural network calculation model, which contributes substantially to speeding up feature extraction and preventing over-fitting.
Optionally, the kernel size of convolution layer C3 is 5×5, and the kernel sizes of pooling layer P2, pooling layer P3, convolution layer C4, convolution layer C5, convolution layer C6, and pooling layer P6 are all 3×3;
the strides of convolution layers C1, C2, C3, C4, C5, and C6 are all 1, and the strides of pooling layers P2, P3, and P6 are all 2;
the pad values of convolution layers C1, C2, C4, C5, and C6 are all 1, the pad value of convolution layer C3 is 2, and the pad values of pooling layers P2, P3, and P6 are 0;
the numbers of convolution kernels of convolution layers C1 and C2 are both 96, the numbers of convolution kernels of convolution layers C3 and C6 are both 256, and the numbers of convolution kernels of convolution layers C4 and C5 are both 384.
Here, the stride of a convolution layer refers to the step size by which the layer's kernel moves each time, and the pad value indicates whether rings of data are added around the input data to participate in the operation, its magnitude being the number of added rings. These settings improve the processing efficiency and speed of the calculation model and hence the efficiency of video feature extraction.
In some embodiments of the present disclosure, the numbers of nodes of fully connected layer fc7 and fully connected layer fc8 are 4096 and 1000, respectively; fully connected layer fc7 uses dropout for data processing; and the classifier is a softmax classifier. Here, the number of nodes can also be understood as the number of features. The dropout method randomly keeps a certain number of activations and discards the rest, effectively preventing over-fitting of the data and thereby improving the speed and efficiency of feature extraction.
Optionally, convolution layers C1 through C6 and fully connected layer fc7 all use the Leaky ReLU activation function to activate the data. An activation function computes a new output from the previous output data through its internal algorithm and passes the new output as the input data of the next layer. Compared with the traditional ReLU activation function, the Leaky ReLU used in the present disclosure still produces an output when the input value is less than zero, so that this part of the data can also participate in the training process. When the input is less than zero, the output is the input multiplied by a coefficient a, which is preferably a fixed value.
In an optional embodiment of the present disclosure, the method for extracting transition frames is:
according to a preset algorithm, determining whether the difference between the gray histograms of the Y planes of the frames immediately before and after the current video frame is greater than a preset video threshold; if the difference is greater than the threshold, the current video frame is a transition frame; if it is less than or equal to the threshold, the current video frame is not a transition frame;
the algorithm formula is as follows:
Σ_{i=0}^{255} |Hp[i] - Hn[i]| > T
where Hp[i] and Hn[i] are the values of the gray histograms of the Y planes of the previous frame and the next frame of the current video frame, respectively, and T is the preset video threshold.
An input video contains a large amount of frame information, and detecting every single frame would be very time-consuming, so the present disclosure first extracts the video's important frames, that is, frames that represent the video content well. In a video, key frames can generally represent its approximate content, but since key frames change after repeated encoding, the present disclosure takes the transition frames within the video as the processing object in order to improve the robustness of the system; this removes redundant information while ensuring robustness, ultimately improving the accuracy and speed of video retrieval.
Optionally, the preset video threshold T is calculated as:
T = width*height/8, where width and height are the width and height of the video frame, respectively.
Optionally, the present disclosure prepares about 1.3 million training images and trains the convolutional neural network on them to obtain the optimized calculation model, then uses the model to extract features from the existing video library on the network and build the database. In the training model, the convolution layers are initialized with a Gaussian distribution with a standard deviation of 0.01; the a parameter of the Leaky ReLU function is 0.01; the parameters of the fully connected layers are initialized with a Gaussian distribution with a standard deviation of 0.002; and the dropout parameter is 0.5. The training process uses the back propagation (BP) algorithm to train and update the parameters, for a total of 300,000 iterations.
Referring to FIG. 2, which is a schematic structural diagram of the calculation model based on a convolutional neural network provided by the present disclosure, the model includes convolution layer C1, convolution layer C2, pooling layer P2, convolution layer C3, pooling layer P3, convolution layer C4, convolution layer C5, convolution layer C6, pooling layer P6, fully connected layer fc7, fully connected layer fc8, and the classifier SVM, connected in sequence. All convolution layers and fully connected layer fc7 process data through the Leaky ReLU activation function so that each layer's data can be passed to the next layer, and fully connected layer fc7 uses dropout for data processing.
Referring to FIG. 3, which is a schematic structural diagram of an embodiment of the video retrieval system based on a convolutional neural network provided by the present disclosure, the system includes a construction module 301, a training module 302, a database module 303, and a retrieval module 304.
The construction module 301 is configured to construct, according to the retrieval requirement of the video, a calculation model for classification based on a convolutional neural network and send the constructed calculation model to the training module 302, the calculation model including a convolution layer, a pooling layer, a fully connected layer, and a classifier.
The training module 302 is configured to receive the calculation model sent by the construction module 301, train the calculation model with image data to obtain an optimized calculation model, remove the classifier from the optimized calculation model to obtain an extraction calculation model, and send the extraction calculation model to the database module 303 and the retrieval module 304.
The database module 303 is configured to receive the extraction calculation model sent by the training module 302, extract transition frames from existing video resources, extract the transition features of the transition frames through the extraction calculation model, and establish a transition feature database.
The retrieval module 304 is configured to receive the extraction calculation model sent by the training module 302, extract transition frames from the video to be retrieved, extract the transition features of the video through the extraction calculation model, and search the transition feature database in the database module 303 with the transition features to obtain the retrieval result of the video.
As the above embodiment shows, in the system the construction module 301 establishes the classification model based on a convolutional neural network; the training module 302 trains it into the optimized calculation model and removes the classifier to obtain the extraction calculation model; the database module 303 establishes the transition feature database based on transition frames; and the retrieval module 304 finally implements accurate retrieval of the video. By taking the transition frames within a video as the object of data processing, the system not only improves the robustness of the retrieval process but also removes redundant information; by extracting features with a calculation model based on a convolutional neural network, it greatly improves the speed and accuracy of video retrieval.
Optionally, the present disclosure uses a convolutional neural network to extract and combine the overall features of a video and completes the retrieval with the kd-tree algorithm; the retrieval results are both accurate and fast.
In some optional embodiments, the calculation model for classification based on the convolutional neural network includes convolution layer C1, convolution layer C2, pooling layer P2, convolution layer C3, pooling layer P3, convolution layer C4, convolution layer C5, convolution layer C6, pooling layer P6, fully connected layer fc7, fully connected layer fc8, and a classifier, connected in sequence.
In other optional embodiments, the kernel sizes of the six convolution layers are all no larger than 5×5, the kernel size of convolution layer C1 being 3×3 and the kernel size of convolution layer C2 being 3×3.
In some optional embodiments, the kernel size of convolution layer C3 is 5×5, and the kernel sizes of pooling layer P2, pooling layer P3, convolution layer C4, convolution layer C5, convolution layer C6, and pooling layer P6 are all 3×3;
the strides of convolution layers C1, C2, C3, C4, C5, and C6 are all 1, and the strides of pooling layers P2, P3, and P6 are all 2;
the pad values of convolution layers C1, C2, C4, C5, and C6 are all 1, the pad value of convolution layer C3 is 2, and the pad values of pooling layers P2, P3, and P6 are 0;
the numbers of convolution kernels of convolution layers C1 and C2 are both 96, the numbers of convolution kernels of convolution layers C3 and C6 are both 256, and the numbers of convolution kernels of convolution layers C4 and C5 are both 384.
In other optional embodiments, the numbers of nodes of fully connected layer fc7 and fully connected layer fc8 are 4096 and 1000, respectively; fully connected layer fc7 uses dropout for data processing; and the classifier is a softmax classifier.
Optionally, convolution layers C1, C2, C3, C4, C5, and C6 and fully connected layer fc7 all use the Leaky ReLU activation function to activate the data.
Optionally, the method for extracting transition frames is:
according to a preset algorithm, determining whether the difference between the gray histograms of the Y planes of the frames immediately before and after the current video frame is greater than a preset video threshold; if the difference is greater than the threshold, the current video frame is a transition frame; if it is less than or equal to the threshold, the current video frame is not a transition frame;
the algorithm formula is as follows:
Σ_{i=0}^{255} |Hp[i] - Hn[i]| > T
where Hp[i] and Hn[i] are the values of the gray histograms of the Y planes of the previous frame and the next frame of the current video frame, respectively, and T is the preset video threshold.
Optionally, the preset video threshold T is calculated as:
T = width*height/8, where width and height are the width and height of the video frame, respectively.
In some optional embodiments, the video retrieval system based on a convolutional neural network described in the present disclosure is applied to various intelligent terminal devices such as mobile phones, computers, tablets, and smart televisions, and can also be used in various servers, for example web search servers. Likewise, using some of the modules of the system as functional modules of a terminal and of a server, respectively, also falls within the protection scope of the present disclosure.
Embodiments of the present disclosure also provide a non-transitory storage medium containing computer-executable instructions which, when executed by a computer processor, perform the video retrieval method based on a convolutional neural network of the above embodiments.
Embodiments of the present disclosure also provide a computer program product including a computer program stored on a non-transitory computer-readable storage medium, the computer program including program instructions which, when executed by a computer, cause the computer to perform the video retrieval method based on a convolutional neural network of the above embodiments.
FIG. 4 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the present application. As shown in FIG. 4, the electronic device includes one or more processors 501 and a memory 502; one processor 501 is taken as an example in FIG. 4.
The electronic device may also include an input device 503 and an output device 504.
The processor 501, memory 502, input device 503, and output device 504 of the electronic device may be connected by a bus or in other ways; connection by a bus is taken as an example in FIG. 4.
As a non-volatile computer-readable storage medium, the memory 502 can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the video retrieval method based on a convolutional neural network in the embodiments of the present application. By running the non-volatile software programs, instructions, and modules stored in the memory 502, the processor 501 executes the various functional applications and data processing of the server, that is, implements the video retrieval method based on a convolutional neural network of the above embodiments.
The memory 502 may include a program storage area and a data storage area, where the program storage area may store the operating system and the applications required for at least one function, and the data storage area may store data created according to use of the method of taking photos, and the like. In addition, the memory 502 may include high-speed random access memory and may also include non-volatile memory, for example at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 502 optionally includes memory set remotely relative to the processor 501.
The input device 503 can receive input numeric or character information and generate key signal inputs related to user settings and function control. The output device 504 may include a display device such as a display screen.
The one or more modules are stored in the memory 502 and, when executed by the one or more processors 501, perform the video retrieval method based on a convolutional neural network of any of the above method embodiments.
The electronic devices of the embodiments of the present disclosure exist in various forms, including but not limited to:
(1) Mobile communication devices: these devices are characterized by mobile communication functions and mainly aim to provide voice and data communication. Such terminals include smart phones (such as the iPhone), multimedia phones, feature phones, and low-end phones.
(2) Ultra-mobile personal computer devices: these belong to the category of personal computers, have computing and processing functions, and generally also have mobile Internet access. Such terminals include PDA, MID, and UMPC devices, such as the iPad.
(3) Portable entertainment devices: these devices can display and play multimedia content; they include audio and video players (such as the iPod), handheld game consoles, e-book readers, smart toys, and portable car navigation devices.
(4) Servers: devices that provide computing services. A server consists of a processor, hard disk, memory, system bus, and so on; its architecture is similar to that of a general-purpose computer, but because it must provide highly reliable services, it has higher requirements for processing power, stability, reliability, security, scalability, and manageability.
(5) Other electronic devices with data interaction functions.
The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
From the description of the above embodiments, those skilled in the art can clearly understand that the embodiments can be implemented by software plus a general-purpose hardware platform, and of course also by hardware. Based on this understanding, the above technical solution in essence, or the part of it that contributes to the related art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments or in some parts of the embodiments.
Note that the above are only preferred embodiments of the present disclosure and the technical principles applied. Those skilled in the art will understand that the present disclosure is not limited to the specific embodiments described here, and that various obvious changes, readjustments, and substitutions can be made without departing from the protection scope of the present disclosure.
Industrial Applicability
The video retrieval method and system based on a convolutional neural network of the present disclosure improve the robustness of the retrieval process, remove redundant information, and improve the speed and accuracy of video retrieval.

Claims (19)

  1. A video retrieval method based on a convolutional neural network, applied to an electronic device, comprising:
    constructing, according to the retrieval requirement of the video, a calculation model for classification based on a convolutional neural network, the calculation model comprising a convolution layer, a pooling layer, a fully connected layer, and a classifier;
    training the calculation model with image data to obtain an optimized calculation model, and removing the classifier from the optimized calculation model to obtain an extraction calculation model;
    extracting transition frames from existing video resources, extracting transition features of the transition frames through the extraction calculation model, and establishing a transition feature database; and
    extracting transition frames from a video to be retrieved, extracting transition features of the video through the extraction calculation model, and searching the transition feature database with the transition features to obtain a retrieval result of the video.
  2. The method according to claim 1, wherein the calculation model for classification based on the convolutional neural network comprises convolution layer C1, convolution layer C2, pooling layer P2, convolution layer C3, pooling layer P3, convolution layer C4, convolution layer C5, convolution layer C6, pooling layer P6, fully connected layer fc7, fully connected layer fc8, and a classifier, connected in sequence.
  3. The method according to claim 2, wherein the kernel sizes of the six convolution layers are all no larger than 5×5, the kernel size of convolution layer C1 being 3×3 and the kernel size of convolution layer C2 being 3×3.
  4. The method according to claim 3, wherein the kernel size of convolution layer C3 is 5×5, and the kernel sizes of pooling layer P2, pooling layer P3, convolution layer C4, convolution layer C5, convolution layer C6, and pooling layer P6 are all 3×3;
    the strides of convolution layers C1, C2, C3, C4, C5, and C6 are all 1, and the strides of pooling layers P2, P3, and P6 are all 2;
    the pad values of convolution layers C1, C2, C4, C5, and C6 are all 1, the pad value of convolution layer C3 is 2, and the pad values of pooling layers P2, P3, and P6 are 0;
    the numbers of convolution kernels of convolution layers C1 and C2 are both 96, the numbers of convolution kernels of convolution layers C3 and C6 are both 256, and the numbers of convolution kernels of convolution layers C4 and C5 are both 384.
  5. The method according to claim 2, wherein the numbers of nodes of fully connected layer fc7 and fully connected layer fc8 are 4096 and 1000, respectively, and fully connected layer fc7 uses dropout for data processing;
    the classifier is a softmax classifier.
  6. The method according to claim 2, wherein convolution layers C1, C2, C3, C4, C5, and C6 and fully connected layer fc7 all use the Leaky ReLU activation function to activate the data.
  7. The method according to claim 1, wherein the method for extracting transition frames is:
    according to a preset algorithm, determining whether the difference between the gray histograms of the Y planes of the frames immediately before and after the current video frame is greater than a preset video threshold; if the difference is greater than the preset video threshold, the current video frame is a transition frame; if the difference is less than or equal to the preset video threshold, the current video frame is not a transition frame;
    the algorithm formula being as follows:
    Σ_{i=0}^{255} |Hp[i] - Hn[i]| > T
    wherein Hp[i] and Hn[i] are the values of the gray histograms of the Y planes of the previous frame and the next frame of the current video frame, respectively, and T is the preset video threshold.
  8. The method according to claim 7, wherein the preset video threshold T is calculated as:
    T = width*height/8, where width and height are the width and height of the video frame, respectively.
  9. A video retrieval system based on a convolutional neural network, comprising:
    a construction module, configured to construct, according to the retrieval requirement of the video, a calculation model for classification based on a convolutional neural network and send the constructed calculation model to a training module, the calculation model comprising a convolution layer, a pooling layer, a fully connected layer, and a classifier;
    a training module, configured to receive the calculation model sent by the construction module, train the calculation model with image data to obtain an optimized calculation model, remove the classifier from the optimized calculation model to obtain an extraction calculation model, and send the extraction calculation model to a database module and a retrieval module;
    a database module, configured to receive the extraction calculation model sent by the training module, extract transition frames from existing video resources, extract transition features of the transition frames through the extraction calculation model, and establish a transition feature database; and
    a retrieval module, configured to receive the extraction calculation model sent by the training module, extract transition frames from a video to be retrieved, extract transition features of the video through the extraction calculation model, and search the transition feature database in the database module with the transition features to obtain a retrieval result of the video.
  10. The system according to claim 9, wherein the calculation model for classification based on the convolutional neural network comprises convolution layer C1, convolution layer C2, pooling layer P2, convolution layer C3, pooling layer P3, convolution layer C4, convolution layer C5, convolution layer C6, pooling layer P6, fully connected layer fc7, fully connected layer fc8, and a classifier, connected in sequence.
  11. The system according to claim 10, wherein the kernel sizes of the six convolution layers are all no larger than 5×5, the kernel size of convolution layer C1 being 3×3 and the kernel size of convolution layer C2 being 3×3.
  12. The system according to claim 11, wherein the kernel size of convolution layer C3 is 5×5, and the kernel sizes of pooling layer P2, pooling layer P3, convolution layer C4, convolution layer C5, convolution layer C6, and pooling layer P6 are all 3×3;
    the strides of convolution layers C1, C2, C3, C4, C5, and C6 are all 1, and the strides of pooling layers P2, P3, and P6 are all 2;
    the pad values of convolution layers C1, C2, C4, C5, and C6 are all 1, the pad value of convolution layer C3 is 2, and the pad values of pooling layers P2, P3, and P6 are 0;
    the numbers of convolution kernels of convolution layers C1 and C2 are both 96, the numbers of convolution kernels of convolution layers C3 and C6 are both 256, and the numbers of convolution kernels of convolution layers C4 and C5 are both 384.
  13. The system according to claim 10, wherein the numbers of nodes of fully connected layer fc7 and fully connected layer fc8 are 4096 and 1000, respectively, and fully connected layer fc7 uses dropout for data processing;
    the classifier is a softmax classifier.
  14. The system according to claim 10, wherein convolution layers C1, C2, C3, C4, C5, and C6 and fully connected layer fc7 all use the Leaky ReLU activation function to activate the data.
  15. The system according to claim 9, wherein the method for extracting transition frames is:
    according to a preset algorithm, determining whether the difference between the gray histograms of the Y planes of the frames immediately before and after the current video frame is greater than a preset video threshold; if the difference is greater than the preset video threshold, the current video frame is a transition frame; if the difference is less than or equal to the preset video threshold, the current video frame is not a transition frame;
    the algorithm formula being as follows:
    Σ_{i=0}^{255} |Hp[i] - Hn[i]| > T
    wherein Hp[i] and Hn[i] are the values of the gray histograms of the Y planes of the previous frame and the next frame of the current video frame, respectively, and T is the preset video threshold.
  16. The system according to claim 15, wherein the preset video threshold T is calculated as:
    T = width*height/8, where width and height are the width and height of the video frame, respectively.
  17. A non-transitory storage medium storing computer-executable instructions, the computer-executable instructions being configured to perform the video retrieval method based on a convolutional neural network according to any one of claims 1 to 8.
  18. A computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the video retrieval method based on a convolutional neural network according to any one of claims 1 to 8.
  19. An electronic device comprising at least one processor and a memory communicatively connected to the at least one processor, the memory storing instructions executable by the at least one processor which, when executed by the at least one processor, cause the at least one processor to perform the video retrieval method based on a convolutional neural network according to any one of claims 1 to 8.
PCT/CN2016/103945 2016-04-15 2016-10-31 基于卷积神经网络的视频检索方法及系统 WO2017177661A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610237628.2A CN105930402A (zh) 2016-04-15 2016-04-15 基于卷积神经网络的视频检索方法及系统
CN201610237628.2 2016-04-15

Publications (1)

Publication Number Publication Date
WO2017177661A1 true WO2017177661A1 (zh) 2017-10-19

Family

ID=56839123

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/103945 WO2017177661A1 (zh) 2016-04-15 2016-10-31 基于卷积神经网络的视频检索方法及系统

Country Status (2)

Country Link
CN (1) CN105930402A (zh)
WO (1) WO2017177661A1 (zh)


Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105930402A (zh) * 2016-04-15 2016-09-07 乐视控股(北京)有限公司 基于卷积神经网络的视频检索方法及***
CN106682108B (zh) * 2016-12-06 2022-07-12 浙江大学 一种基于多模态卷积神经网络的视频检索方法
CN106886768A (zh) * 2017-03-02 2017-06-23 杭州当虹科技有限公司 一种基于深度学习的视频指纹算法
CN107527010B (zh) * 2017-07-13 2020-07-10 央视国际网络无锡有限公司 一种根据局部特征和运动矢量抽取视频基因的方法
CN107766838B (zh) * 2017-11-08 2021-06-01 央视国际网络无锡有限公司 一种视频场景切换检测方法
CN107766852A (zh) * 2017-12-06 2018-03-06 电子科技大学 一种基于卷积神经网络的人机鼠标轨迹检测方法
CN108665769B (zh) * 2018-05-11 2021-04-06 深圳市鹰硕技术有限公司 基于卷积神经网络的网络教学方法以及装置
CN108882016A (zh) * 2018-07-31 2018-11-23 成都华栖云科技有限公司 一种视频基因数据提取的方法及***
CN109815364B (zh) * 2019-01-18 2020-01-14 上海极链网络科技有限公司 一种海量视频特征提取、存储和检索方法及***
CN111723617B (zh) * 2019-03-20 2023-10-27 顺丰科技有限公司 动作识别的方法、装置、设备及存储介质
CN113139095B (zh) * 2021-05-06 2024-07-12 北京百度网讯科技有限公司 视频检索方法及装置、计算机设备和介质
CN113254703A (zh) * 2021-05-12 2021-08-13 北京百度网讯科技有限公司 视频匹配方法、视频处理方法、装置、电子设备及介质
CN113536031A (zh) * 2021-06-17 2021-10-22 北京百度网讯科技有限公司 视频搜索的方法、装置、电子设备及存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100302269A1 (en) * 2009-05-26 2010-12-02 Kabushiki Kaisha Toshiba Image processing apparatus and image processing method
CN104156464A (zh) * 2014-08-20 2014-11-19 中国科学院重庆绿色智能技术研究院 基于微视频特征数据库的微视频检索方法及装置
CN104980625A (zh) * 2015-06-19 2015-10-14 新奥特(北京)视频技术有限公司 视频转场检测的方法和装置
CN105930402A (zh) * 2016-04-15 2016-09-07 乐视控股(北京)有限公司 基于卷积神经网络的视频检索方法及***

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104657748A (zh) * 2015-02-06 2015-05-27 中国石油大学(华东) 一种基于卷积神经网络的车型识别方法
CN104850836B (zh) * 2015-05-15 2018-04-10 浙江大学 基于深度卷积神经网络的害虫图像自动识别方法


Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108804581A (zh) * 2018-05-24 2018-11-13 广州中大百迅信息技术有限公司 一种基于深度学习的同类物体检索方法及***
CN109543526A (zh) * 2018-10-19 2019-03-29 谢飞 基于深度差异性特征的真假面瘫识别***
CN109726798A (zh) * 2018-12-27 2019-05-07 北京灵汐科技有限公司 一种数据处理方法及装置
CN109857893A (zh) * 2019-01-16 2019-06-07 平安科技(深圳)有限公司 图片检索方法、装置、计算机设备及存储介质
CN109710804A (zh) * 2019-01-16 2019-05-03 信阳师范学院 一种教学视频图像知识点降维分析方法
CN109710804B (zh) * 2019-01-16 2022-10-18 信阳师范学院 一种教学视频图像知识点降维分析方法
CN110084739A (zh) * 2019-03-28 2019-08-02 东南大学 一种基于cnn的画质增强算法的fpga并行加速***
CN110148466A (zh) * 2019-05-15 2019-08-20 东北大学 一种基于迁移学习的心冲击信号房颤计算机辅助诊断方法
CN110348349A (zh) * 2019-07-01 2019-10-18 河南牧业经济学院 一种收集、分析猪行为视频数据的方法和***
CN110321958A (zh) * 2019-07-08 2019-10-11 北京字节跳动网络技术有限公司 神经网络模型的训练方法、视频相似度确定方法
CN110347600A (zh) * 2019-07-11 2019-10-18 中国人民解放军陆军工程大学 面向卷积神经网络的变异覆盖测试方法及计算机存储介质
CN110347600B (zh) * 2019-07-11 2023-04-07 中国人民解放军陆军工程大学 面向卷积神经网络的变异覆盖测试方法及计算机存储介质
CN110674925A (zh) * 2019-08-29 2020-01-10 厦门大学 基于3d卷积神经网络的无参考vr视频质量评价方法
CN110674925B (zh) * 2019-08-29 2023-04-18 厦门大学 基于3d卷积神经网络的无参考vr视频质量评价方法
CN110532959B (zh) * 2019-08-30 2022-10-14 大连海事大学 基于双通道三维卷积神经网络的实时暴力行为检测***
CN110532959A (zh) * 2019-08-30 2019-12-03 大连海事大学 基于双通道三维卷积神经网络的实时暴力行为检测***
CN110806717A (zh) * 2019-11-21 2020-02-18 山东大齐通信电子有限公司 一种矿用通信控制***的集控台及矿用通信控制***
CN111353391A (zh) * 2020-02-17 2020-06-30 西安电子科技大学 雷达干扰效果评估方法、装置、电子设备及其存储介质
CN111476101A (zh) * 2020-03-11 2020-07-31 咪咕文化科技有限公司 视频镜头切换检测方法及装置、计算机可读存储介质
CN111831852A (zh) * 2020-07-07 2020-10-27 北京灵汐科技有限公司 一种视频检索方法、装置、设备及存储介质
CN111831852B (zh) * 2020-07-07 2023-11-24 北京灵汐科技有限公司 一种视频检索方法、装置、设备及存储介质
CN112235434B (zh) * 2020-10-16 2021-10-26 重庆理工大学 融合k-means及其胶囊网络的DGA网络域名检测识别***
CN112235434A (zh) * 2020-10-16 2021-01-15 重庆理工大学 融合k-means及其胶囊网络的DGA网络域名检测识别***
CN112966813A (zh) * 2021-03-15 2021-06-15 神思电子技术股份有限公司 一种卷积神经网络输入层装置及其工作方法
CN117269992A (zh) * 2023-08-29 2023-12-22 中国民航科学技术研究院 基于卷积神经网络的卫星导航多径信号检测方法及***
CN117269992B (zh) * 2023-08-29 2024-04-19 中国民航科学技术研究院 基于卷积神经网络的卫星导航多径信号检测方法及***

Also Published As

Publication number Publication date
CN105930402A (zh) 2016-09-07

Similar Documents

Publication Publication Date Title
WO2017177661A1 (zh) 基于卷积神经网络的视频检索方法及系统
TWI737006B (zh) 一種跨模態訊息檢索方法、裝置和儲存介質
WO2017166586A1 (zh) 基于卷积神经网络的图片鉴别方法、***和电子设备
WO2017045443A1 (zh) 一种图像检索方法及***
WO2019052403A1 (zh) 图像文本匹配模型的训练方法、双向搜索方法及相关装置
WO2017206400A1 (zh) 图像处理方法、装置及电子设备
WO2021237570A1 (zh) 影像审核方法及装置、设备、存储介质
CN112396106B (zh) 内容识别方法、内容识别模型训练方法及存储介质
CN113378770B (zh) 手势识别方法、装置、设备、存储介质
US20230057010A1 (en) Term weight generation method, apparatus, device and medium
TW201633181A (zh) 用於經非同步脈衝調制的取樣信號的事件驅動型時間迴旋
CN113626612A (zh) 一种基于知识图谱推理的预测方法和***
CN113887615A (zh) 图像处理方法、装置、设备和介质
JP2023530796A (ja) 認識モデルトレーニング方法、認識方法、装置、電子デバイス、記憶媒体及びコンピュータプログラム
WO2022227760A1 (zh) 图像检索方法、装置、电子设备及计算机可读存储介质
CN115775349A (zh) 基于多模态融合的假新闻检测方法和装置
CN112861474B (zh) 一种信息标注方法、装置、设备及计算机可读存储介质
CN113360683B (zh) 训练跨模态检索模型的方法以及跨模态检索方法和装置
CN117786058A (zh) 一种多模态大模型知识迁移框架的构建方法
US20170161322A1 (en) Method and electronic device for searching resource
Panda et al. Heritage app: annotating images on mobile phones
KR102595384B1 (ko) 문서 유사도 학습에 기반한 딥러닝 모델의 전이 학습 방법 및 시스템
KR20240052055A (ko) 교차-모달 검색 방법 및 관련 디바이스
CN113139490B (zh) 一种图像特征匹配方法、装置、计算机设备及存储介质
WO2017143979A1 (zh) 图像的检索方法及装置

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16898474

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 16898474

Country of ref document: EP

Kind code of ref document: A1