CN110209880A

CN110209880A - Video content retrieval method, video content retrieval device, and storage medium

Info

Publication number: CN110209880A
Application number: CN201811009469.6A
Authority: CN
Inventors: 孙祥学
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2018-08-31
Filing date: 2018-08-31
Publication date: 2019-09-06

Abstract

The present invention provides a video content retrieval method comprising: obtaining face retrieval information, pattern retrieval information, sound retrieval information, and text retrieval information corresponding to video content; extracting the audio frames of the video content, and extracting video images of the video content at a set time interval; detecting the video images and the audio frames using preset detection algorithms to obtain face information, text information, pattern information, and sound information of the video content; obtaining face similarity information, text similarity information, pattern similarity information, and sound similarity information; and generating a content retrieval report according to the face similarity information, the text similarity information, the pattern similarity information, and the sound similarity information. The present invention also provides a video content retrieval device. By retrieving the face information, text information, pattern information, and sound information in the video content at the same time, the invention improves the accuracy of video content retrieval and reduces its cost.

Description

Video content retrieval method, video content retrieval device, and storage medium

Technical field

The present invention relates to the field of data processing, and in particular to a video content retrieval method, a video content retrieval device, and a storage medium.

Background art

With the development of society, people's demand for shared resources keeps growing, for example for the video resources and text resources shared on the Internet. To prevent illegal video or text resources from spreading on the Internet, some network resource providers review the content of the video or text resources that clients upload.

For text resources, a machine can automatically perform text recognition on the uploaded text, so the overall review workload is low. Video resources, however, require manual identification of their image content and sound content. Offenders may insert illegal images or sounds into a video resource, or modify its image or sound content, and these modifications appear only briefly or are well concealed. This greatly increases the workload and difficulty of manually reviewing video resources, and manual review is prone to omissions. Existing video content retrieval methods therefore have high cost and low accuracy.

Summary of the invention

Embodiments of the present invention provide a video content retrieval method, a video content retrieval device, and a storage medium with lower retrieval cost and higher retrieval accuracy, in order to solve the technical problem that existing video content retrieval methods and devices have high cost and low accuracy.

An embodiment of the present invention provides a video content retrieval method, comprising:

obtaining face retrieval information, pattern retrieval information, sound retrieval information, and text retrieval information corresponding to video content;

extracting the audio frames of the video content, and extracting video images of the video content at a set time interval;

detecting the video images and the audio frames using preset detection algorithms to obtain face information, text information, pattern information, and sound information of the video content;

obtaining face similarity information between the face information of the video content and the face retrieval information, text similarity information between the text information of the video content and the text retrieval information, pattern similarity information between the pattern information of the video content and the pattern retrieval information, and sound similarity information between the sound information of the video content and the sound retrieval information; and

generating a content retrieval report according to the face similarity information, the text similarity information, the pattern similarity information, and the sound similarity information.

An embodiment of the present invention also provides a video content retrieval device, comprising:

a retrieval information obtaining module, configured to obtain face retrieval information, pattern retrieval information, sound retrieval information, and text retrieval information corresponding to video content;

a content extraction module, configured to extract the audio frames of the video content and to extract video images of the video content at a set time interval;

a content detection module, configured to detect the video images and the audio frames using preset detection algorithms to obtain face information, text information, pattern information, and sound information of the video content;

a similarity information obtaining module, configured to obtain face similarity information between the face information of the video content and the face retrieval information, text similarity information between the text information of the video content and the text retrieval information, pattern similarity information between the pattern information of the video content and the pattern retrieval information, and sound similarity information between the sound information of the video content and the sound retrieval information; and

a content retrieval module, configured to generate a content retrieval report according to the face similarity information, the text similarity information, the pattern similarity information, and the sound similarity information.

An embodiment of the present invention also provides a storage medium storing processor-executable instructions, the instructions being loaded by one or more processors to execute any of the video content retrieval methods described above.

Compared with the prior art, the video content retrieval method, video content retrieval device, and storage medium of the present invention retrieve the face information, text information, pattern information, and sound information in video content at the same time, which improves the accuracy of video content retrieval and reduces its cost. This effectively solves the technical problem that existing video content retrieval methods and devices have high cost and low accuracy.

Description of the drawings

Fig. 1 is a flow chart of the first embodiment of the video content retrieval method of the present invention;

Fig. 2 is a flow chart of the second embodiment of the video content retrieval method of the present invention;

Fig. 3 is a flow chart of obtaining the face similarity information in step S206 of the second embodiment of the video content retrieval method of the present invention;

Fig. 4 is a flow chart of obtaining the text similarity information in step S206 of the second embodiment of the video content retrieval method of the present invention;

Fig. 5 is a structural schematic diagram of the first embodiment of the video content retrieval device of the present invention;

Fig. 6 is a structural schematic diagram of the second embodiment of the video content retrieval device of the present invention;

Fig. 7 is a structural schematic diagram of the content detection module of the second embodiment of the video content retrieval device of the present invention;

Fig. 8 is a structural schematic diagram of the similarity information obtaining module of the second embodiment of the video content retrieval device of the present invention;

Fig. 9a is a functional structure diagram of the server side corresponding to a specific embodiment of the video content retrieval method and video content retrieval device of the present invention;

Fig. 9b is a video content retrieval flow chart of a specific embodiment of the video content retrieval method and video content retrieval device of the present invention;

Fig. 10 is a structural schematic diagram of the working environment of the electronic device in which the video content retrieval device of the present invention is located.

Detailed description of the embodiments

Please refer to the drawings, in which identical reference numerals denote identical components. The principles of the present invention are illustrated as implemented in a suitable computing environment. The following description is based on the illustrated specific embodiments of the invention and should not be regarded as limiting other specific embodiments not detailed herein.

In the following description, specific embodiments of the present invention are described with reference to steps and symbols of operations performed by one or more computers, unless otherwise stated. These steps and operations, which are at times referred to as being computer-executed, include the manipulation by a computer processing unit of electronic signals that represent data in a structured form. This manipulation transforms the data or maintains it at locations in the computer's memory system, which reconfigures or otherwise alters the operation of the computer in a manner well understood by those skilled in the art. The data structures in which the data is maintained are physical locations of the memory that have particular properties defined by the format of the data. Although the principles of the invention are described in the foregoing terms, this is not meant as a limitation; those skilled in the art will appreciate that the steps and operations described below may also be implemented in hardware.

The video content retrieval method and video content retrieval device of the present invention may be provided in any electronic device and are used to perform retrieval operations on video content in four respects: face information, text information, pattern information, and sound information, thereby effectively improving the accuracy of the corresponding content retrieval report. The electronic device includes but is not limited to a wearable device, a head-mounted device, a medical and health platform, a personal computer, a server computer, a handheld or laptop device, a mobile device (such as a mobile phone, a personal digital assistant (PDA), or a media player), a multiprocessor system, a consumer electronic device, a minicomputer, a mainframe computer, a distributed computing environment including any of the above systems or devices, and so on. The electronic device is preferably a mobile terminal or fixed terminal that performs retrieval operations on video content. The mobile terminal or fixed terminal can separately retrieve the face information, text information, pattern information, and sound information in the video content, so as to improve the accuracy of video content retrieval and reduce its cost.

Please refer to Fig. 1, which is a flow chart of the first embodiment of the video content retrieval method of the present invention. The video content retrieval method of this embodiment may be implemented using the electronic device described above, and includes:

Step S101, obtain face retrieval information, pattern retrieval information, sound retrieval information, and text retrieval information corresponding to video content;

Step S102, extract the audio frames of the video content, and extract video images of the video content at a set time interval;

Step S103, detect the video images and the audio frames using preset detection algorithms to obtain face information, text information, pattern information, and sound information of the video content;

Step S104, obtain face similarity information between the face information of the video content and the face retrieval information, text similarity information between the text information of the video content and the text retrieval information, pattern similarity information between the pattern information of the video content and the pattern retrieval information, and sound similarity information between the sound information of the video content and the sound retrieval information;

Step S105, generate a content retrieval report according to the face similarity information, the text similarity information, the pattern similarity information, and the sound similarity information.

The detailed process of each step of the video content retrieval method of this embodiment is described below.

In step S101, the video content retrieval device obtains the face retrieval information, pattern retrieval information, sound retrieval information, and text retrieval information corresponding to the video content.

Before performing a retrieval operation on the video content, the video content retrieval device may first create a content database to be retrieved. This database may include face retrieval information, pattern retrieval information, sound retrieval information, text retrieval information, and the like.

Here, the face retrieval information is the face information to be retrieved, such as the face information of Zhang San. The pattern retrieval information is the pattern information to be retrieved, such as the flag of an illegal organization. The sound retrieval information is a preset set of sensitive words for speech recognition, such as the spoken name of Zhang San. The text retrieval information is a preset set of sensitive words for text recognition, such as the written name of Zhang San.

In this way, the video content retrieval device can create the content database to be retrieved from the obtained face retrieval information, pattern retrieval information, sound retrieval information, text retrieval information, and the like.
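As an illustration only, the content database to be retrieved could be modelled as a small container holding the four kinds of retrieval information. The class name `RetrievalDatabase`, its field names, the feature vector values, and the label `illegal_org_flag` are all hypothetical and are not specified by this document.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class RetrievalDatabase:
    """Hypothetical container for the four kinds of retrieval information."""
    # Face retrieval information: label -> face feature vector (assumed layout).
    faces: Dict[str, List[float]] = field(default_factory=dict)
    # Pattern retrieval information: labels of patterns to look for.
    patterns: Set[str] = field(default_factory=set)
    # Sound retrieval information: sensitive words for speech recognition.
    sound_keywords: Set[str] = field(default_factory=set)
    # Text retrieval information: sensitive words for text recognition.
    text_keywords: Set[str] = field(default_factory=set)


# Example entries mirroring the "Zhang San" example in the text;
# the vector values are placeholders, not real face features.
db = RetrievalDatabase(
    faces={"Zhang San": [0.1, 0.2, 0.3]},
    patterns={"illegal_org_flag"},
    sound_keywords={"Zhang San"},
    text_keywords={"Zhang San"},
)
```

Keeping the four kinds of information in separate fields lets each detector in the later steps be compared only against the entries of its own kind.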

In step S102, after the content database to be retrieved has been created, the video content retrieval device performs a content extraction operation on the video content provided by the user. Specifically, the video content retrieval device separates the video content into video frames and audio frames, and then extracts video images of the video content at a set time interval (for example, 500 ms). The audio frames are used to extract the sound information in the video content, and the video images are used to extract the face information, text information, and pattern information in the video content.
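The interval-based sampling can be sketched as follows. This is a minimal sketch that only computes which timestamps and frame indices would be sampled; decoding the video itself is left out. The function names, the millisecond units, and the truncating frame-index rule are assumptions made for illustration.

```python
from typing import List


def sample_timestamps_ms(duration_ms: int, interval_ms: int = 500) -> List[int]:
    """Timestamps (in ms) at which a video image would be extracted."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    return list(range(0, duration_ms, interval_ms))


def frame_index_for(timestamp_ms: int, fps: float) -> int:
    """Index of the video frame displayed at the given timestamp (assumed truncation)."""
    return int(timestamp_ms * fps / 1000)


# A 2-second clip sampled every 500 ms yields four video images.
timestamps = sample_timestamps_ms(2000)                    # [0, 500, 1000, 1500]
indices = [frame_index_for(t, 25.0) for t in timestamps]   # [0, 12, 25, 37]
```

A shorter interval catches briefly inserted content more reliably but increases the number of images each detector has to process.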

In step S103, the video content retrieval device uses preset detection algorithms to detect the video images and audio frames obtained in step S102, so as to obtain the face information, text information, pattern information, and sound information in the video content.

Here, the preset detection algorithms may be a face neural network for detecting face information, an OCR (Optical Character Recognition) algorithm for detecting text information, a pattern neural network for detecting pattern information, an ASR (Automatic Speech Recognition) algorithm for detecting sound information, and the like.

In this way, the video content retrieval device can obtain the face information, text information, pattern information, and sound information in the video content through the preset detection algorithms described above.

In step S104, the video content retrieval device compares the face information obtained in step S103 with the face retrieval information set in step S101, so as to obtain the face similarity information between the face information of the video content and the face retrieval information. The face similarity information includes the correspondence between the face information and the face retrieval information, and the face similarity between the face information and the corresponding face retrieval information.

At the same time, the video content retrieval device compares the text information obtained in step S103 with the text retrieval information set in step S101, so as to obtain the text similarity information between the text information of the video content and the text retrieval information. The text similarity information includes the correspondence between the text information and the text retrieval information.

In addition, the video content retrieval device compares the pattern information obtained in step S103 with the pattern retrieval information set in step S101, so as to obtain the pattern similarity information between the pattern information of the video content and the pattern retrieval information. The pattern similarity information includes the correspondence between the pattern information and the pattern retrieval information.

Furthermore, the video content retrieval device compares the sound information obtained in step S103 with the sound retrieval information set in step S101, so as to obtain the sound similarity information between the sound information of the video content and the sound retrieval information. The sound similarity information includes the correspondence between the sound information and the sound retrieval information.

In step S105, the video content retrieval device generates a content retrieval report for the video content according to the face similarity information, text similarity information, pattern similarity information, and sound similarity information obtained in step S104. The content retrieval report reflects whether the video content contains the face retrieval information, pattern retrieval information, sound retrieval information, and text retrieval information set in step S101, and gives the confidence of each match with the corresponding retrieval information. For example: a "Zhang San" face appears in the video content with a confidence of 95%; the text "Zhang San" appears in the video content; or a sound clip of "Zhang San" appears in the video content.

This completes the video content retrieval process of the video content retrieval method of this embodiment.
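The overall flow of steps S101 to S105 can be sketched as a loop over detector and matcher callables. This is a simplified sketch under stated assumptions, not the implementation described here: the function `run_retrieval`, the dictionary-based report entries, and the toy string-based detector stand in for the real neural networks, OCR, and ASR components.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Match = Tuple[str, float]  # (matched retrieval label, confidence in [0, 1])


def run_retrieval(
    inputs: Dict[str, Iterable[Tuple[int, object]]],
    detectors: Dict[str, Callable[[object], List[object]]],
    matchers: Dict[str, Callable[[object], Optional[Match]]],
) -> List[dict]:
    """Detect items in each timestamped sample, match them, and collect report entries."""
    report = []
    for kind, samples in inputs.items():          # e.g. "face", "text", "pattern", "sound"
        for timestamp_ms, sample in samples:      # video images or audio frames
            for item in detectors[kind](sample):  # detection step (S103)
                match = matchers[kind](item)      # comparison step (S104)
                if match is not None:
                    label, confidence = match
                    report.append({"kind": kind, "time_ms": timestamp_ms,
                                   "label": label, "confidence": confidence})
    report.sort(key=lambda entry: entry["time_ms"])  # report generation (S105)
    return report


# Toy usage: "images" are plain strings and the text detector is a substring check.
keywords = {"Zhang San"}
images = [(0, "frame with Zhang San caption"), (500, "empty frame")]
report = run_retrieval(
    {"text": images},
    {"text": lambda image: [word for word in keywords if word in image]},
    {"text": lambda text: (text, 1.0) if text in keywords else None},
)
```

Passing the detectors and matchers in as callables keeps the four retrieval channels independent, so one channel can be replaced without touching the others.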

The video content retrieval method of this embodiment retrieves the face information, text information, pattern information, and sound information in the video content at the same time, which improves the accuracy of video content retrieval and reduces its cost.

Please refer to Fig. 2, which is a flow chart of the second embodiment of the video content retrieval method of the present invention. The video content retrieval method of this embodiment may be implemented using the electronic device described above, and includes:

Step S201, obtain multiple face sample images, and train a preset face neural network using the face sample images;

Step S202, obtain multiple pattern sample images, and train a preset pattern neural network using the pattern sample images;

Step S203, obtain face retrieval information, pattern retrieval information, sound retrieval information, and text retrieval information corresponding to video content;

Step S204, extract the audio frames of the video content, and extract video images of the video content at a set time interval;

Step S205, detect the video images and the audio frames using preset detection algorithms to obtain face information, text information, pattern information, and sound information of the video content;

Step S206, obtain face similarity information between the face information of the video content and the face retrieval information, text similarity information between the text information of the video content and the text retrieval information, pattern similarity information between the pattern information of the video content and the pattern retrieval information, and sound similarity information between the sound information of the video content and the sound retrieval information;

Step S207, generate a content retrieval report according to the face similarity information, the text similarity information, the pattern similarity information, and the sound similarity information.

In step S201, the video content retrieval device obtains multiple face sample images and, based on a deep convolutional neural network model, trains the preset face neural network using the face sample images. The face neural network is used to recognize faces in the video images of the video content, so as to extract the face information in the video content.

In step S202, the video content retrieval device obtains multiple pattern sample images and, based on the open-source model Faster R-CNN, trains the preset pattern neural network using the pattern sample images. The pattern neural network is used to recognize patterns in the video images of the video content, so as to extract the pattern information in the video content.

In step S203, the video content retrieval device obtains the face retrieval information, pattern retrieval information, sound retrieval information, and text retrieval information corresponding to the video content.

In step S204, the video content retrieval device performs a content extraction operation on the video content provided by the user. Specifically, the video content retrieval device separates the video content into video frames and audio frames, and then extracts video images of the video content at a set time interval (for example, 500 ms). The audio frames are used to extract the sound information in the video content, and the video images are used to extract the face information, text information, and pattern information in the video content.

In step S205, the video content retrieval device uses preset detection algorithms to detect the video images and audio frames obtained in step S204, so as to obtain the face information, text information, pattern information, and sound information in the video content.

In this way, the video content retrieval device can obtain the face information, text information, pattern information, and sound information in the video content through the preset detection algorithms described above.

In step S206, the video content retrieval device compares the face information obtained in step S205 with the face retrieval information set in step S203, so as to obtain the face similarity information between the face information of the video content and the face retrieval information. The face similarity information includes the correspondence between the face information and the face retrieval information, and the face similarity between the face information and the corresponding face retrieval information.

For details, please refer to Fig. 3, which is a flow chart of obtaining the face similarity information in step S206 of the second embodiment of the video content retrieval method of the present invention. Step S206 includes:

Step S301, the video content retrieval device obtains the face information feature vector of the face information of the video content, and the face retrieval information feature vectors of the face retrieval information. Here, the face information may include the left eye, right eye, nose, left mouth corner, and right mouth corner; the video content retrieval device can extract the corresponding face information feature vector based on the coordinates of these facial features. Likewise, the video content retrieval device can extract the corresponding face retrieval information feature vector based on the coordinates of the face retrieval information.

Step S302, the video content retrieval device calculates the vector distance between the face information feature vector obtained in step S301 and the face retrieval information feature vector of every item of face retrieval information. The video content retrieval device can then judge the similarity between the face information and the face retrieval information from these vector distances. Here, the open-source similarity search library faiss may be used to compute the similarity of the feature vectors.

Step S303, the video content retrieval device obtains the vector distances between the face information feature vector and the face retrieval information feature vectors of all the face retrieval information, and sets the face retrieval information whose feature vector has the minimum vector distance as the face retrieval information corresponding to the face information. It then determines the face similarity between the face information and the face retrieval information according to the minimum vector distance between the feature vector of the face information and the corresponding face retrieval information feature vector: the smaller the minimum vector distance, the higher the face similarity; the larger the minimum vector distance, the lower the face similarity.
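Steps S302 and S303 amount to a nearest-neighbour search over feature vectors. The sketch below uses plain Python and Euclidean distance rather than faiss, purely for illustration. The distance metric, the mapping `1 / (1 + d)` from distance to similarity, the name "Li Si", and the vector values are all assumptions; the text only states that a smaller minimum distance means a higher similarity.

```python
import math
from typing import Dict, List, Tuple


def euclidean(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_face(query: List[float], gallery: Dict[str, List[float]]) -> Tuple[str, float]:
    """Return the gallery label with the minimum vector distance, and that distance."""
    label = min(gallery, key=lambda name: euclidean(query, gallery[name]))
    return label, euclidean(query, gallery[label])


def similarity_from_distance(distance: float) -> float:
    """Assumed monotone mapping: smaller distance gives higher similarity, in (0, 1]."""
    return 1.0 / (1.0 + distance)


# Placeholder two-dimensional vectors; real face features have far more dimensions.
gallery = {"Zhang San": [0.0, 0.0], "Li Si": [3.0, 4.0]}
label, distance = nearest_face([0.0, 1.0], gallery)
similarity = similarity_from_distance(distance)
```

A linear scan like this is adequate for a small set of retrieval faces; a dedicated similarity search library becomes worthwhile when the set is large.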

At the same time, the video content retrieval device compares the text information obtained in step S205 with the text retrieval information set in step S203, so as to obtain the text similarity information between the text information of the video content and the text retrieval information. The text similarity information includes the correspondence between the text information and the text retrieval information.

For details, please refer to Fig. 4, which is a flow chart of obtaining the text similarity information in step S206 of the second embodiment of the video content retrieval method of the present invention. Step S206 includes:

Step S401, judge whether the text information is identical to any item of text retrieval information; if it is, go to step S402; if it is not, go to step S403;

Step S402, if the text information is identical to an item of text retrieval information, determine that identical item as the text retrieval information corresponding to the text information.

Step S403, if the text information differs from every item of text retrieval information, determine that the text information does not correspond to any text retrieval information.
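Steps S401 to S403 describe an exact-match lookup, which can be sketched in a few lines. The function name `match_text`, the use of a set for the sensitive words, and the name "Li Si" are illustrative assumptions.

```python
from typing import Optional, Set


def match_text(text: str, text_keywords: Set[str]) -> Optional[str]:
    """Return the identical keyword if one exists (S402), otherwise None (S403)."""
    return text if text in text_keywords else None


keywords = {"Zhang San"}
hit = match_text("Zhang San", keywords)   # corresponds to step S402
miss = match_text("Li Si", keywords)      # corresponds to step S403
```

Because the comparison is an exact match, recognized text that differs from a keyword by even one character produces no correspondence.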

In addition, the video content retrieval device compares the pattern information obtained in step S205 with the pattern retrieval information set in step S203, so as to obtain the pattern similarity information between the pattern information of the video content and the pattern retrieval information. The pattern similarity information includes the correspondence between the pattern information and the pattern retrieval information.

Specifically, the video content retrieval device directly determines the identical pattern retrieval information as the pattern retrieval information corresponding to the pattern information.

Furthermore, the video content retrieval device compares the sound information obtained in step S205 with the sound retrieval information set in step S203, so as to obtain the sound similarity information between the sound information of the video content and the sound retrieval information. The sound similarity information includes the correspondence between the sound information and the sound retrieval information.

Specifically, the video content retrieval device directly determines the identical sound retrieval information as the sound retrieval information corresponding to the sound information.

In step S207, the video content retrieval device generates a content retrieval report for the video content according to the face similarity information, text similarity information, pattern similarity information, and sound similarity information obtained in step S206. The content retrieval report may include a face information retrieval report, a text information retrieval report, a pattern information retrieval report, and a sound information retrieval report.

Specifically, the video content retrieval device determines the face information retrieval report according to the display period of the face information, the corresponding face retrieval information, and the face similarity. For example, a "Zhang San" face appears in the video content in the period from 14 min 30 s to 15 min, with a confidence of 95%. If the "Zhang San" face appears multiple times within a period of a first set value (for example, 30 seconds), the video content retrieval device may merge the corresponding face information retrieval reports, for example: the "Zhang San" face appears three times in the video content in the period from 14 min 30 s to 15 min, with a confidence of 95%.

The video content retrieval device may also determine the text information retrieval report according to the display period of the text information and the corresponding text retrieval information. For example, the text "Zhang San" appears in the video content in the period from 14 min 30 s to 15 min. If the text "Zhang San" appears multiple times within a period of a second set value, the video content retrieval device may merge the corresponding text information retrieval reports, for example: the text "Zhang San" appears three times in the video content in the period from 14 min 30 s to 15 min.

The video content retrieval device may also determine the pattern information retrieval report according to the display period of the pattern information and the corresponding pattern retrieval information. For example, the flag pattern of an illegal organization appears in the video content in the period from 14 min 30 s to 15 min. If the flag pattern of the illegal organization appears multiple times within a period of a third set value, the video content retrieval device may merge the corresponding pattern information retrieval reports, for example: the flag pattern of the illegal organization appears three times in the video content in the period from 14 min 30 s to 15 min.

The video content retrieval device may also determine the sound information retrieval report according to the display period of the sound information and the corresponding sound retrieval information. For example, a sound clip of "Zhang San" appears in the video content in the period from 14 min 30 s to 15 min. If the sound clip of "Zhang San" appears multiple times within a period of a fourth set value, the video content retrieval device may merge the corresponding sound information retrieval reports, for example: the sound clip of "Zhang San" appears three times in the video content in the period from 14 min 30 s to 15 min.
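The merging of repeated hits can be sketched as follows. The text does not fix the exact merge rule, so this sketch assumes that a hit joins the previous entry for the same label when it occurs no later than the set value after that entry's last hit, and that the merged confidence is the maximum of the individual confidences. The function name `merge_hits` and the tuple layout are hypothetical.

```python
from typing import Dict, List, Tuple

Hit = Tuple[int, str, float]  # (timestamp_ms, label, confidence)


def merge_hits(hits: List[Hit], window_ms: int) -> List[dict]:
    """Merge hits of the same label whose gap to the previous hit is at most window_ms."""
    merged: List[dict] = []
    latest: Dict[str, dict] = {}  # label -> most recent merged entry for that label
    for timestamp_ms, label, confidence in sorted(hits):
        entry = latest.get(label)
        if entry is not None and timestamp_ms - entry["end_ms"] <= window_ms:
            entry["end_ms"] = timestamp_ms
            entry["count"] += 1
            # Assumed aggregation: keep the highest confidence seen in the merged period.
            entry["confidence"] = max(entry["confidence"], confidence)
        else:
            entry = {"label": label, "start_ms": timestamp_ms, "end_ms": timestamp_ms,
                     "count": 1, "confidence": confidence}
            latest[label] = entry
            merged.append(entry)
    return merged


# Three "Zhang San" face hits between 14 min 30 s and 15 min, with a 30-second set value.
hits = [(870_000, "Zhang San", 0.95), (885_000, "Zhang San", 0.93), (900_000, "Zhang San", 0.95)]
merged = merge_hits(hits, window_ms=30_000)
```

With these inputs the three hits collapse into a single entry covering 870000 ms to 900000 ms with a count of three, which corresponds to the merged report in the face example. The same routine can be reused for text, pattern, and sound hits, each with its own set value.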

In this way, the content retrieval report reflects whether the video content contains the face retrieval information, pattern retrieval information, sound retrieval information, and text retrieval information set in step S203, and gives the confidence of each match with the corresponding retrieval information.

On the basis of the first embodiment, the video content retrieval method of this embodiment further improves the accuracy of video content retrieval by using the preset face neural network and pattern neural network. The processes for obtaining the face similarity information, text similarity information, pattern similarity information, and sound similarity information in this embodiment further reduce the cost of producing the corresponding content retrieval report.

The present invention also provides a kind of Video content retrieval devices, and referring to figure 5., Fig. 5 is Video content retrieval of the invention The structural schematic diagram of the first embodiment of device.Above-mentioned video content can be used to examine for the Video content retrieval device of the present embodiment The first embodiment of Suo Fangfa is implemented, and the Video content retrieval device 50 of the present embodiment includes retrieval data obtaining module 51, content extraction module 52, content detection module 53, analog information obtain module 54 and content retrieval module 55.

The retrieval data obtaining module 51 is used to obtain the face retrieval information, pattern retrieval information, sound retrieval information, and character search information corresponding to the video content. The content extraction module 52 is used to extract the audio frames of the video content and to extract video images of the video content at a set time interval. The content detection module 53 is used to detect the video images and audio frames using preset detection algorithms, to obtain the face information, text information, pattern information, and acoustic information of the video content. The analog information obtaining module 54 is used to obtain the face analog information between the face information of the video content and the face retrieval information, the text analog information between the text information of the video content and the character search information, the pattern analog information between the pattern information of the video content and the pattern retrieval information, and the sound analog information between the acoustic information of the video content and the sound retrieval information. The content retrieval module 55 is used to generate a content retrieval report according to the face analog information, text analog information, pattern analog information, and sound analog information.

When the video content retrieval device 50 of this embodiment is in use, the retrieval data obtaining module 51 first obtains the face retrieval information, pattern retrieval information, sound retrieval information, and character search information corresponding to the video content.

Before the retrieval data obtaining module 51 performs a retrieval operation on the video content, a to-be-retrieved content database can first be created. This database may include face retrieval information, pattern retrieval information, sound retrieval information, character search information, and the like.

In this way, the retrieval data obtaining module 51 can create the to-be-retrieved content database from the obtained face retrieval information, pattern retrieval information, sound retrieval information, character search information, and the like.

After the to-be-retrieved content database has been created, the content extraction module 52 can perform a content extraction operation on the video content provided by the user. Specifically, the content extraction module 52 can separate the video content into video frames and audio frames, and then extract video images of the video content at a set time interval (e.g. 500 ms). The audio frames are used to extract the acoustic information in the video content, and the video images are used to extract the face information, text information, and pattern information in the video content.
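The interval-based extraction of video images can be sketched as follows. This is a minimal illustration only: the function names `sample_timestamps_ms` and `frame_indices`, and the assumption of a constant integer frame rate, are hypothetical and not taken from the patent.

```python
def sample_timestamps_ms(duration_ms, interval_ms=500):
    """Return the timestamps (in ms) at which a video image is extracted."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    return list(range(0, duration_ms, interval_ms))


def frame_indices(duration_ms, fps, interval_ms=500):
    """Map each sampling timestamp to a frame index (assumes a constant integer fps)."""
    return [t * fps // 1000 for t in sample_timestamps_ms(duration_ms, interval_ms)]


# A 2-second clip at 25 fps sampled every 500 ms yields four video images.
print(sample_timestamps_ms(2000))
print(frame_indices(2000, 25))
```

A shorter interval samples more video images and so costs more detection work per video; the 500 ms value is only the example given in the text.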

The content detection module 53 then uses preset detection algorithms to detect the video images and audio frames obtained by the content extraction module 52, to obtain the face information, text information, pattern information, and acoustic information in the video content.

In this way, the content detection module 53 can obtain the face information, text information, pattern information, and acoustic information in the video content through the above preset detection algorithms.

Subsequently, the analog information obtaining module 54 compares the face information obtained by the content detection module 53 with the face retrieval information set by the retrieval data obtaining module 51, to obtain the face analog information between the face information of the video content and the face retrieval information. The face analog information includes the correspondence between the face information and the face retrieval information, and the face similarity between the face information and the corresponding face retrieval information.

At the same time, the analog information obtaining module 54 compares the text information obtained by the content detection module 53 with the character search information set by the retrieval data obtaining module 51, to obtain the text analog information between the text information of the video content and the character search information. The text analog information includes the correspondence between the text information and the character search information.

In addition, the analog information obtaining module 54 compares the pattern information obtained by the content detection module 53 with the pattern retrieval information set by the retrieval data obtaining module 51, to obtain the pattern analog information between the pattern information of the video content and the pattern retrieval information. The pattern analog information includes the correspondence between the pattern information and the pattern retrieval information.

Furthermore, the analog information obtaining module 54 compares the acoustic information obtained by the content detection module 53 with the sound retrieval information set by the retrieval data obtaining module 51, to obtain the sound analog information between the acoustic information of the video content and the sound retrieval information. The sound analog information includes the correspondence between the acoustic information and the sound retrieval information.

Finally, the content retrieval module 55 generates the content retrieval report of the video content according to the face analog information, text analog information, pattern analog information, and sound analog information obtained by the analog information obtaining module 54. The content retrieval report can reflect whether the video content contains the set face retrieval information, pattern retrieval information, sound retrieval information, and character search information, and it provides the confidence level of each corresponding item of retrieval information; for example, the "Zhang San" face appears in the video content with a confidence of 95%, the text "Zhang San" appears in the video content, or a sound clip of "Zhang San" appears in the video content.

This completes the video content retrieval process of the video content retrieval device 50 of this embodiment.

The video content retrieval device of this embodiment retrieves the face information, text information, pattern information, and acoustic information in the video content at the same time, which improves the accuracy of video content retrieval and reduces its cost.

Referring to Fig. 6, Fig. 6 is a structural schematic diagram of the second embodiment of the video content retrieval device of the present invention. The video content retrieval device of this embodiment can be implemented using the second embodiment of the above video content retrieval method. The video content retrieval device 60 of this embodiment includes a face neural network training module 61, a pattern neural network training module 62, a retrieval data obtaining module 63, a content extraction module 64, a content detection module 65, an analog information obtaining module 66, and a content retrieval module 67.

The face neural network training module 61 is used to obtain multiple face sample images and to train the preset face neural network using the multiple face sample images. The pattern neural network training module 62 is used to obtain multiple pattern sample images and to train the preset pattern neural network using the multiple pattern sample images. The retrieval data obtaining module 63 is used to obtain the face retrieval information, pattern retrieval information, sound retrieval information, and character search information corresponding to the video content. The content extraction module 64 is used to extract the audio frames of the video content and to extract video images of the video content at a set time interval. The content detection module 65 is used to detect the video images and audio frames using preset detection algorithms, to obtain the face information, text information, pattern information, and acoustic information of the video content. The analog information obtaining module 66 is used to obtain the face analog information between the face information of the video content and the face retrieval information, the text analog information between the text information of the video content and the character search information, the pattern analog information between the pattern information of the video content and the pattern retrieval information, and the sound analog information between the acoustic information of the video content and the sound retrieval information. The content retrieval module 67 is used to generate a content retrieval report according to the face analog information, text analog information, pattern analog information, and sound analog information.

Referring to Fig. 7, Fig. 7 is a structural schematic diagram of the content detection module of the second embodiment of the video content retrieval device of the present invention. The content detection module 65 includes a face information detection unit 71, a text information detection unit 72, a pattern information detection unit 73, and a sound information detection unit 74.

The face information detection unit 71 is used to detect the face region in the video image using the preset face neural network, to obtain the face information of the video content. The text information detection unit 72 is used to detect the text region in the video image using a text recognition algorithm, to obtain the text information of the video content. The pattern information detection unit 73 is used to detect the pattern region in the video image using the preset pattern neural network, to obtain the pattern information of the video content. The sound information detection unit 74 is used to perform a speech recognition operation on the audio frames, to obtain the acoustic information of the video content.

Referring to Fig. 8, Fig. 8 is a structural schematic diagram of the analog information obtaining module of the second embodiment of the video content retrieval device of the present invention. The analog information obtaining module 66 includes a feature vector acquiring unit 81, a vector distance computing unit 82, a face analog information acquiring unit 83, a text information judging unit 84, a text analog information acquiring unit 85, a pattern analog information acquiring unit 86, and a sound analog information acquiring unit 87.

The feature vector acquiring unit 81 is used to obtain the face information feature vector of the face information of the video content and the face retrieval information feature vectors of the face retrieval information. The vector distance computing unit 82 is used to calculate the vector distance between the face information feature vector of the face information and the face retrieval information feature vector of every item of face retrieval information. The face analog information acquiring unit 83 is used to determine, according to the smallest vector distance, the face retrieval information corresponding to the face information and the corresponding face similarity. The text information judging unit 84 is used to judge whether the text information is identical to any character search information. The text analog information acquiring unit 85 is used to determine the identical character search information as the character search information corresponding to the text information if the text information is identical to some character search information, and to determine that the text information does not correspond to any character search information if it is identical to none. The pattern analog information acquiring unit 86 is used to determine the identical pattern retrieval information as the pattern retrieval information corresponding to the pattern information. The sound analog information acquiring unit 87 is used to determine the identical sound retrieval information as the sound retrieval information corresponding to the acoustic information.

When the video content retrieval device 60 of this embodiment is in use, the face neural network training module 61 first obtains multiple face sample images and, based on a deep convolutional neural network model, trains the preset face neural network using the multiple face sample images. The face neural network is used to recognize faces in the video images of the video content, so as to extract the face information in the video content.

Subsequently, the pattern neural network training module 62 obtains multiple pattern sample images and, based on the open-source model Faster R-CNN, trains the preset pattern neural network using the multiple pattern sample images. The pattern neural network is used to recognize patterns in the video images of the video content, so as to extract the pattern information in the video content.

The retrieval data obtaining module 63 then obtains the face retrieval information, pattern retrieval information, sound retrieval information, and character search information corresponding to the video content.

Before the retrieval data obtaining module 63 performs a retrieval operation on the video content, a to-be-retrieved content database can first be created. This database may include face retrieval information, pattern retrieval information, sound retrieval information, character search information, and the like.

In this way, the retrieval data obtaining module 63 can create the to-be-retrieved content database from the obtained face retrieval information, pattern retrieval information, sound retrieval information, character search information, and the like.

Subsequently, the content extraction module 64 can perform a content extraction operation on the video content provided by the user. Specifically, the content extraction module 64 can separate the video content into video frames and audio frames, and then extract video images of the video content at a set time interval (e.g. 500 ms). The audio frames are used to extract the acoustic information in the video content, and the video images are used to extract the face information, text information, and pattern information in the video content.

The content detection module 65 then uses preset detection algorithms to detect the obtained video images and audio frames, to obtain the face information, text information, pattern information, and acoustic information in the video content.

Specifically, the face information detection unit 71 of the content detection module 65 detects the face region in the video image using the preset face neural network, to obtain the face information of the video content. The text information detection unit 72 of the content detection module 65 detects the text region in the video image using a text recognition algorithm, to obtain the text information of the video content. The pattern information detection unit 73 of the content detection module 65 detects the pattern region in the video image using the preset pattern neural network, to obtain the pattern information of the video content. The sound information detection unit 74 of the content detection module 65 performs a speech recognition operation on the audio frames, to obtain the acoustic information of the video content.

In this way, the content detection module 65 can obtain the face information, text information, pattern information, and acoustic information in the video content through the above preset detection algorithms.

Subsequently, the analog information obtaining module 66 compares the obtained face information with the set face retrieval information, to obtain the face analog information between the face information of the video content and the face retrieval information. The face analog information includes the correspondence between the face information and the face retrieval information, and the face similarity between the face information and the corresponding face retrieval information.

Specifically, the acquisition process of the face analog information includes:

The feature vector acquiring unit 81 of the analog information obtaining module 66 obtains the face information feature vector of the face information of the video content and the face retrieval information feature vectors of the face retrieval information. Here the face information may include the left eye, right eye, nose, left mouth corner, and right mouth corner; the feature vector acquiring unit 81 can extract the corresponding face information feature vector based on the coordinates of this face information. Likewise, the feature vector acquiring unit 81 can extract the corresponding face retrieval information feature vector based on the coordinates of the face retrieval information.

The vector distance computing unit 82 of the analog information obtaining module 66 calculates the vector distance between the obtained face information feature vector and the face retrieval information feature vector of every item of face retrieval information. In this way, the vector distance computing unit 82 can judge the similarity between the face information and the face retrieval information from these vector distances. The open-source similarity search library faiss can be used here to perform the similarity search over the feature vectors.

The face analog information acquiring unit 83 of the analog information obtaining module 66 obtains the vector distances between the face information feature vector and the face retrieval information feature vectors of all face retrieval information, and sets the face retrieval information whose feature vector has the smallest vector distance as the face retrieval information corresponding to the face information. It then determines the face similarity between the face information and that face retrieval information according to the minimum vector distance between the face information feature vector and the corresponding face retrieval information feature vector: the smaller the minimum vector distance, the higher the face similarity; the larger the minimum vector distance, the lower the face similarity.
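The minimum-distance matching described above can be sketched as follows. This is a hedged illustration, not the patent's implementation: a brute-force Euclidean search stands in for the faiss library, the function name `match_face` is hypothetical, and the mapping `1 / (1 + distance)` is an assumed similarity score chosen only because it decreases as the distance grows; the text does not specify a formula.

```python
import math


def match_face(face_vec, gallery):
    """Find the face retrieval entry closest to face_vec.

    gallery maps a name to its face retrieval information feature vector.
    Returns (name, distance, similarity); name is None for an empty gallery.
    """
    best_name, best_dist = None, math.inf
    for name, vec in gallery.items():
        dist = math.dist(face_vec, vec)  # Euclidean vector distance
        if dist < best_dist:
            best_name, best_dist = name, dist
    # Assumed scoring: smaller distance gives higher similarity, in (0, 1].
    similarity = 1.0 / (1.0 + best_dist)
    return best_name, best_dist, similarity


gallery = {"Zhang San": [0.0, 0.0], "Li Si": [3.0, 4.0]}
print(match_face([0.0, 1.0], gallery))
```

For a large gallery, an index-based search library would normally replace the linear scan; the scan is used here only to keep the sketch self-contained.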

The analog information obtaining module 66 compares the obtained text information with the set character search information, to obtain the text analog information between the text information of the video content and the character search information. The text analog information includes the correspondence between the text information and the character search information.

Specifically, the acquisition process of the text analog information includes:

The text information judging unit 84 of the analog information obtaining module 66 judges whether the text information is identical to any character search information;

If the text information is identical to some character search information, the text analog information acquiring unit 85 of the analog information obtaining module 66 determines the identical character search information as the character search information corresponding to the text information.

If the text information is not identical to any character search information, the text analog information acquiring unit 85 of the analog information obtaining module 66 determines that the text information does not correspond to any character search information.
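The exact-match rule for text can be sketched in a few lines. The function name `match_text` and the use of `None` to mean "no corresponding character search information" are assumptions made for illustration.

```python
def match_text(text_info, search_terms):
    """Return the character search term identical to text_info, or None if there is none."""
    for term in search_terms:
        if text_info == term:  # exact identity, as described in the text
            return term
    return None


terms = ["Zhang San", "Li Si"]
print(match_text("Zhang San", terms))
print(match_text("Wang Wu", terms))
```

Because the comparison is exact, any normalization (for example of whitespace or character width) would have to happen before this step; the text does not describe such a step.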

In addition, the analog information obtaining module 66 compares the obtained pattern information with the set pattern retrieval information, to obtain the pattern analog information between the pattern information of the video content and the pattern retrieval information. The pattern analog information includes the correspondence between the pattern information and the pattern retrieval information.

Specifically, the pattern analog information acquiring unit 86 of the analog information obtaining module 66 directly determines the identical pattern retrieval information as the pattern retrieval information corresponding to the pattern information.

Furthermore, the analog information obtaining module 66 compares the obtained acoustic information with the set sound retrieval information, to obtain the sound analog information between the acoustic information of the video content and the sound retrieval information. The sound analog information includes the correspondence between the acoustic information and the sound retrieval information.

Specifically, the sound analog information acquiring unit 87 of the analog information obtaining module 66 directly determines the identical sound retrieval information as the sound retrieval information corresponding to the acoustic information.

Finally, the content retrieval module 67 generates the content retrieval report of the video content according to the face analog information, text analog information, pattern analog information, and sound analog information obtained by the analog information obtaining module 66. The content retrieval report may include a face information search report, a text information search report, a pattern information search report, and an acoustic information search report.

Specifically, the content retrieval module 67 determines the face information search report according to the display period of the face information, the corresponding face retrieval information, and the face similarity. For example, if the "Zhang San" face appears in the video content in the period from 14 min 30 s to 15 min with a confidence of 95%, and the "Zhang San" face appears multiple times within a period of the first set value (e.g. 30 seconds), the content retrieval module 67 can merge these face information search reports into one, for example: the "Zhang San" face appears three times in the video content in the period from 14 min 30 s to 15 min, with a confidence of 95%.

The content retrieval module 67 can also determine the text information search report according to the display period of the text information and the corresponding character search information. For example, if the text "Zhang San" appears in the video content in the period from 14 min 30 s to 15 min, and the text "Zhang San" appears multiple times within a period of the second set value, the content retrieval module 67 can merge these text information search reports into one, for example: the text "Zhang San" appears three times in the video content in the period from 14 min 30 s to 15 min.

The content retrieval module 67 can also determine the pattern information search report according to the display period of the pattern information and the corresponding pattern retrieval information. For example, if the flag pattern of an illegal organization appears in the video content in the period from 14 min 30 s to 15 min, and that flag pattern appears multiple times within a period of the third set value, the content retrieval module 67 can merge these pattern information search reports into one, for example: the flag pattern of the illegal organization appears three times in the video content in the period from 14 min 30 s to 15 min.

The content retrieval module 67 can also determine the acoustic information search report according to the display period of the acoustic information and the corresponding sound retrieval information. For example, if a sound clip of "Zhang San" appears in the video content in the period from 14 min 30 s to 15 min, and the sound clip of "Zhang San" appears multiple times within a period of the fourth set value, the content retrieval module 67 can merge these acoustic information search reports into one, for example: a sound clip of "Zhang San" appears three times in the video content in the period from 14 min 30 s to 15 min.
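The merge operation shared by the four search reports can be sketched as below. This is one possible reading of the text, under stated assumptions: a hit is a `(timestamp_seconds, label)` pair, hits with the same label are merged while they fall within `window_s` seconds (the "set value") of the first hit in the group, and the names `MergedHit` and `merge_hits` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MergedHit:
    label: str
    start_s: float
    end_s: float
    count: int


def merge_hits(hits, window_s):
    """Merge same-label hits that occur within window_s of a group's first hit."""
    merged = []
    open_groups = {}  # label -> the MergedHit currently being extended
    for t, label in sorted(hits):
        group = open_groups.get(label)
        if group is not None and t - group.start_s <= window_s:
            group.end_s = t
            group.count += 1
        else:
            group = MergedHit(label, t, t, 1)
            open_groups[label] = group
            merged.append(group)
    return merged


# Three hits between 14 min 30 s (870 s) and 15 min (900 s) become one report entry.
hits = [(870, "Zhang San"), (885, "Zhang San"), (900, "Zhang San"), (1200, "Zhang San")]
for entry in merge_hits(hits, window_s=30):
    print(entry)
```

Each of the four report types could call the same routine with its own set value; whether the window is measured from the first hit or from the previous hit is not stated in the text, so the first-hit choice here is an assumption.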

In this way, the content retrieval report can reflect whether the video content contains the set face retrieval information, pattern retrieval information, sound retrieval information, and character search information, and it provides the confidence level of each corresponding item of retrieval information.

This completes the video content retrieval process of the video content retrieval device 60 of this embodiment.

On the basis of the first embodiment, the video content retrieval device of this embodiment further improves the accuracy of video content retrieval by using the preset face neural network and the preset pattern neural network; the acquisition processes of the face analog information, text analog information, pattern analog information, and sound analog information in this embodiment further reduce the cost of producing the corresponding content retrieval report.

The concrete operating principle of the video content retrieval method and video content retrieval device of the present invention is illustrated below through a specific embodiment. Referring to Fig. 9a and Fig. 9b, Fig. 9a is a functional structure diagram of the server side corresponding to the specific embodiment of the video content retrieval method and video content retrieval device of the present invention; Fig. 9b is a video content retrieval flowchart of the specific embodiment of the video content retrieval method and video content retrieval device of the present invention. The video content retrieval device of this specific embodiment is installed in a server terminal that can perform retrieval operations on video content. Referring to Fig. 9a, the server terminal includes a video content access module 91, a video content processing module 92, a face recognition module 93, a text recognition module 94, an image recognition module 95, a speech recognition module 96, a data storage module 97, and an output module 98.

Referring to Fig. 9b, the process by which the video content retrieval device in this specific embodiment performs video content retrieval includes:

Step S901: the video content access module 91 receives the sensitive face database, sensitive sound information, sensitive image information, sensitive text information, and the like through data interface 9A, and stores the sensitive face database, sensitive sound information, sensitive image information, and sensitive text information in the data storage module 97.

Step S902: the video content access module 91 receives the video content on which a retrieval operation is to be performed, after which the video content processing module 92 performs a content extraction operation on the video content. The video content is first separated into video frames and audio frames. The video content processing module 92 then detects the face region in the video image using the preset face neural network, to obtain the face information of the video content; detects the text region in the video image using a text recognition algorithm, to obtain the text information of the video content; detects the pattern region in the video image using the preset pattern neural network, to obtain the pattern information of the video content; and performs transcoding, resampling, and speech recognition operations on the audio frames, to obtain the acoustic information of the video content.

Step S903: the face recognition module 93 compares the face information with the sensitive face information in the sensitive face database, judges whether there is sensitive face information corresponding to the face information, and obtains the sensitive face information corresponding to the face information.

For example, model alignment is performed on the face region to find five coordinates (left eye, right eye, nose, left mouth corner, right mouth corner); a feature vector is then extracted according to the five coordinates and used to query the data layer for comparison; the most similar candidate and the corresponding confidence score are found, and the result is returned to the output module.

Step S904: the text recognition module 94 compares the text information with the sensitive text information, judges whether there is sensitive text information corresponding to the text information, and obtains the sensitive text information corresponding to the text information.

For example, the YouTu text recognition engine is used to perform text comparison on the pictures submitted by the video content access module, and the result is returned to the output module.

Step S905: the image recognition module 95 compares the pattern information with the sensitive pattern information, judges whether there is sensitive pattern information corresponding to the pattern information, and obtains the sensitive pattern information corresponding to the pattern information.

For example, the open-source model Faster R-CNN is used to perform image recognition and comparison on the pattern information submitted by the video content access module, and the result is fed back to the output module.

Step S906: the speech recognition module 96 compares the voice information with the sensitive voice information, judges whether there is sensitive voice information corresponding to the voice information, and obtains the sensitive voice information corresponding to the voice information.

For example, using the WeChat Zhiling ASR speech recognition engine, the transcoded and resampled audio stream is passed uniformly to the speech recognition module, which performs speech recognition after offline VAD (voice activity detection) segmentation and returns the result to the output module.
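The text does not say how the VAD segmentation is done. As a rough illustration of the idea only, a simple energy-threshold segmenter is sketched below; it is not the engine's actual algorithm, and the function name `vad_segments`, the mean-absolute-amplitude energy measure, and the fixed threshold are all assumptions.

```python
def vad_segments(samples, frame_len, threshold):
    """Return (start, end) sample ranges whose frame energy reaches the threshold.

    samples is a sequence of PCM amplitudes; a trailing partial frame is ignored.
    """
    segments = []
    start = None
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(abs(x) for x in frame) / frame_len  # mean absolute amplitude
        if energy >= threshold:
            if start is None:
                start = i * frame_len  # a speech segment begins
        elif start is not None:
            segments.append((start, i * frame_len))  # the segment ends
            start = None
    if start is not None:
        segments.append((start, n_frames * frame_len))
    return segments


audio = [0] * 4 + [10] * 4 + [0] * 4 + [10] * 4
print(vad_segments(audio, frame_len=4, threshold=5))
```

Only the segments returned here would then be handed to speech recognition, which avoids spending recognition time on silence.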

Step S907: the output module 98 generates the content retrieval report of the corresponding video content according to the obtained sensitive face information, sensitive text information, sensitive pattern information, and sensitive voice information, and feeds the content retrieval report back to client 9B. Specifically, the report may state, for example, that the "Zhang San" face appears in the video content with a confidence of 95%, that the text "Zhang San" appears in the video content, or that a sound clip of "Zhang San" appears in the video content.
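How the output module might assemble such a report can be sketched as follows. The function name `build_report`, the input shapes, and the wording of each report line are illustrative assumptions modeled on the examples in the text.

```python
def build_report(face_hits, text_hits, pattern_hits, voice_hits):
    """Assemble human-readable report lines from the four kinds of matches.

    face_hits is a list of (name, confidence) pairs with confidence in [0, 1];
    the other three arguments are lists of matched labels.
    """
    lines = []
    for name, confidence in face_hits:
        lines.append(f'face "{name}" appears in the video content, confidence {confidence:.0%}')
    for term in text_hits:
        lines.append(f'text "{term}" appears in the video content')
    for pattern in pattern_hits:
        lines.append(f'pattern "{pattern}" appears in the video content')
    for label in voice_hits:
        lines.append(f'sound clip of "{label}" appears in the video content')
    return lines


for line in build_report([("Zhang San", 0.95)], ["Zhang San"], [], ["Zhang San"]):
    print(line)
```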

This completes the video content retrieval process of the video content retrieval method and video content retrieval device of this specific embodiment.

The video content retrieval method and video content retrieval device of the present invention retrieve the face information, text information, pattern information, and acoustic information in the video content at the same time, which improves the accuracy of video content retrieval and reduces its cost; this effectively solves the technical problem that existing video content retrieval methods and video content retrieval devices have a higher cost and a lower accuracy.

As used herein, the terms "component", "module", "system", "interface", "process", and the like are generally intended to refer to a computer-related entity: hardware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a controller and the controller can be a component. One or more components may reside within a process and/or thread of execution, and a component may be located on one computer and/or distributed between two or more computers.

Figure 10 and the following discussion provide a brief, general description of the working environment of the electronic device in which the video content retrieval device of the present invention is implemented. The working environment of Figure 10 is only one example of a suitable working environment and is not intended to suggest any limitation as to the scope of use or functionality of the working environment. Example electronic devices 1012 include, but are not limited to, wearable devices, head-mounted devices, medical and health platforms, personal computers, server computers, hand-held or laptop devices, mobile devices (such as mobile phones, personal digital assistants (PDAs), media players, and the like), multiprocessor systems, consumer electronic devices, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.

Although not required, embodiments are described in the general context of "computer-readable instructions" being executed by one or more electronic devices. Computer-readable instructions may be distributed via computer-readable media (discussed below). Computer-readable instructions may be implemented as program modules, such as functions, objects, application programming interfaces (APIs), data structures, and the like, that perform particular tasks or implement particular abstract data types. Typically, the functionality of the computer-readable instructions may be combined or distributed as desired in various environments.

Figure 10 illustrates an example of an electronic device 1012 that includes one or more embodiments of the video content retrieval device of the present invention. In one configuration, the electronic device 1012 includes at least one processing unit 1016 and a memory 1018. Depending on the exact configuration and type of the electronic device, the memory 1018 may be volatile (such as RAM), non-volatile (such as ROM, flash memory, etc.), or some combination of the two. This configuration is illustrated in Figure 10 by dashed line 1014.

In other embodiments, the electronic device 1012 may include additional features and/or functionality. For example, the device 1012 may also include additional storage (e.g., removable and/or non-removable), including but not limited to magnetic storage, optical storage, and the like. Such additional storage is illustrated in Figure 10 by storage device 1020. In one embodiment, computer-readable instructions for implementing one or more embodiments provided herein may be stored in the storage device 1020. The storage device 1020 may also store other computer-readable instructions for implementing an operating system, application programs, and the like. Computer-readable instructions may be loaded into the memory 1018 for execution by, for example, the processing unit 1016.

The term "computer-readable medium" as used herein includes computer storage media. Computer storage media include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions or other data. The memory 1018 and the storage device 1020 are examples of computer storage media. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile discs (DVDs) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by the electronic device 1012. Any such computer storage medium may be part of the electronic device 1012.

The electronic device 1012 may also include a communication connection 1026 that allows the electronic device 1012 to communicate with other devices. The communication connection 1026 may include, but is not limited to, a modem, a network interface card (NIC), an integrated network interface, a radio-frequency transmitter/receiver, an infrared port, a USB connection, or another interface for connecting the electronic device 1012 to other electronic devices. The communication connection 1026 may include a wired connection or a wireless connection. The communication connection 1026 may transmit and/or receive communication media.

The term "computer-readable medium" may include communication media. Communication media typically embody computer-readable instructions or other data in a "modulated data signal", such as a carrier wave or other transport mechanism, and include any information delivery medium. The term "modulated data signal" may include a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.

The electronic device 1012 may include an input device 1024, such as a keyboard, mouse, pen, voice input device, touch input device, infrared camera, video input device, and/or any other input device. The device 1012 may also include an output device 1022, such as one or more displays, speakers, printers, and/or any other output device. The input device 1024 and the output device 1022 may be connected to the electronic device 1012 via a wired connection, a wireless connection, or any combination thereof. In one embodiment, an input device or an output device from another electronic device may be used as the input device 1024 or the output device 1022 of the electronic device 1012.

The components of the electronic device 1012 may be connected by various interconnects, such as a bus. Such interconnects may include Peripheral Component Interconnect (PCI) (such as PCI Express), Universal Serial Bus (USB), FireWire (IEEE 1394), an optical bus structure, and the like. In another embodiment, the components of the electronic device 1012 may be interconnected by a network. For example, the memory 1018 may consist of multiple physical memory units located in different physical locations and interconnected by a network.

Those skilled in the art will recognize that storage devices used to store computer-readable instructions may be distributed across a network. For example, an electronic device 1030 accessible via a network 1028 may store computer-readable instructions for implementing one or more embodiments provided by the present invention. The electronic device 1012 may access the electronic device 1030 and download part or all of the computer-readable instructions for execution. Alternatively, the electronic device 1012 may download pieces of the computer-readable instructions as needed, or some instructions may be executed at the electronic device 1012 and some at the electronic device 1030.

Various operations of the embodiments are provided herein. In one embodiment, one or more of the operations may constitute computer-readable instructions stored on one or more computer-readable media, which, when executed by an electronic device, cause the computing device to perform the operations. The order in which some or all of the operations are described should not be construed as implying that these operations are necessarily order-dependent. Those skilled in the art, having the benefit of this specification, will appreciate alternative orderings. Furthermore, it should be understood that not all operations are necessarily present in each embodiment provided herein.

Moreover, although the disclosure has been shown and described with respect to one or more implementations, equivalent alterations and modifications will occur to those skilled in the art upon reading and understanding this specification and the drawings. The disclosure includes all such modifications and alterations and is limited only by the scope of the following claims. In particular, with regard to the various functions performed by the above-described components (e.g., elements, resources, etc.), the terms used to describe such components are intended to correspond, unless otherwise indicated, to any component that performs the specified function of the described component (i.e., that is functionally equivalent), even if it is not structurally equivalent to the disclosed structure that performs the function in the exemplary implementations of the disclosure illustrated herein. In addition, although a particular feature of the disclosure may have been disclosed with respect to only one of several implementations, such a feature may be combined with one or more other features of the other implementations as may be desired and advantageous for any given or particular application. Furthermore, to the extent that the terms "includes", "has", "contains", or variants thereof are used in the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term "comprising".

The functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and is sold or used as an independent product, it may also be stored in a computer-readable storage medium. The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like. Each of the above devices or systems may execute the method in the corresponding method embodiment.

In conclusion, although the present invention has been disclosed above by way of embodiments, the serial numbers preceding the embodiments are used only for convenience of description and do not limit the order of the embodiments of the present invention. Moreover, the above embodiments are not intended to limit the present invention. Those of ordinary skill in the art can make various changes and refinements without departing from the spirit and scope of the present invention; the protection scope of the present invention is therefore subject to the scope defined by the claims.

Claims

1. A video content retrieval method, characterized by comprising:

Obtaining face retrieval information, pattern retrieval information, sound retrieval information, and text retrieval information corresponding to video content;

Extracting an audio frame of the video content, and extracting a video image of the video content at a set time interval;

Detecting the video image and the audio frame using a preset detection algorithm, to obtain face information, text information, pattern information, and sound information of the video content;

Obtaining face similarity information between the face information of the video content and the face retrieval information, text similarity information between the text information of the video content and the text retrieval information, pattern similarity information between the pattern information of the video content and the pattern retrieval information, and sound similarity information between the sound information of the video content and the sound retrieval information; and

Generating a content retrieval report according to the face similarity information, the text similarity information, the pattern similarity information, and the sound similarity information.

2. The video content retrieval method according to claim 1, characterized in that the step of detecting the video image and the audio frame using the preset detection algorithm, to obtain the face information, text information, pattern information, and sound information of the video content, comprises:

Detecting a face region in the video image using a preset face neural network, to obtain the face information of the video content; detecting a text region in the video image using a text recognition algorithm, to obtain the text information of the video content; detecting a pattern region in the video image using a preset pattern neural network, to obtain the pattern information of the video content; and performing a speech recognition operation on the audio frame, to obtain the sound information of the video content.

3. The video content retrieval method according to claim 2, characterized in that the video content retrieval method further comprises:

Obtaining multiple face sample images, and training the preset face neural network using the multiple face sample images;

Obtaining multiple pattern sample images, and training the preset pattern neural network using the multiple pattern sample images.

4. The video content retrieval method according to claim 2, characterized in that the step of obtaining the face similarity information between the face information of the video content and the face retrieval information comprises:

Obtaining a face information feature vector of the face information of the video content and a face retrieval information feature vector of the face retrieval information;

Calculating the vector distance between the face information feature vector of the face information and the face retrieval information feature vector of every piece of face retrieval information;

Determining, according to the smallest vector distance, the face retrieval information corresponding to the face information and the corresponding face similarity.
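
By way of illustration only, the nearest-vector matching recited in claim 4 can be sketched as follows. The claim does not fix a distance metric or a formula for converting distance into a similarity value, so the Euclidean distance and the mapping 1 / (1 + d) used here are assumptions, as are the names `euclidean_distance` and `match_face`.

```python
import math
from typing import Dict, List, Tuple


def euclidean_distance(a: List[float], b: List[float]) -> float:
    """Vector distance between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_face(
    face_vector: List[float],
    retrieval_vectors: Dict[str, List[float]],
) -> Tuple[str, float, float]:
    """Return (matched retrieval id, smallest distance, similarity).

    retrieval_vectors maps a hypothetical identifier of each piece of
    face retrieval information to its feature vector.
    """
    if not retrieval_vectors:
        raise ValueError("no face retrieval information supplied")
    # Compare the detected face against every piece of retrieval information.
    distances = {
        key: euclidean_distance(face_vector, vector)
        for key, vector in retrieval_vectors.items()
    }
    # The entry with the smallest vector distance is the match.
    best_key = min(distances, key=distances.get)
    best_distance = distances[best_key]
    # Assumed mapping: distance 0 gives similarity 1; larger distances give less.
    similarity = 1.0 / (1.0 + best_distance)
    return best_key, best_distance, similarity
```

In practice a similarity threshold would probably decide whether the closest entry counts as a hit at all; no such threshold is recited in the claim.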

5. The video content retrieval method according to claim 2, characterized in that:

The step of obtaining the text similarity information between the text information of the video content and the text retrieval information comprises:

Judging whether the text information is identical to any text retrieval information;

If so, determining the identical text retrieval information as the text retrieval information corresponding to the text information; if not, the text information does not correspond to any text retrieval information;

The step of obtaining the pattern similarity information between the pattern information of the video content and the pattern retrieval information comprises:

Determining the identical pattern retrieval information as the pattern retrieval information corresponding to the pattern information;

The step of obtaining the sound similarity information between the sound information of the video content and the sound retrieval information comprises:

Determining the identical sound retrieval information as the sound retrieval information corresponding to the sound information.
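
As a minimal sketch of the identity test recited in claim 5, the function below returns the retrieval entry that is identical to a detected item, or nothing when no entry matches. It assumes that detected text, pattern labels, and recognized speech are all represented as strings; the claim does not specify a representation, and the name `match_exact` is hypothetical.

```python
from typing import Iterable, Optional


def match_exact(detected: str, retrieval_items: Iterable[str]) -> Optional[str]:
    """Return the retrieval item identical to `detected`, or None if none is."""
    for item in retrieval_items:
        if item == detected:
            # The identical retrieval item is the one corresponding to the detection.
            return item
    # No identical item: the detection corresponds to no retrieval information.
    return None
```

Under this assumption the same function serves the text, pattern, and sound branches, for example `match_exact(recognized_text, text_retrieval_list)`.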

6. The video content retrieval method according to claim 1, characterized in that the step of generating the content retrieval report according to the face similarity information, the text similarity information, the pattern similarity information, and the sound similarity information comprises:

Determining a face information retrieval report according to the display time period of the face information, the corresponding face retrieval information, and the face similarity; determining a text information retrieval report according to the display time period of the text information and the corresponding text retrieval information; determining a pattern information retrieval report according to the display time period of the pattern information and the corresponding pattern retrieval information; and determining a sound information retrieval report according to the display time period of the sound information and the corresponding sound retrieval information.

7. The video content retrieval method according to claim 6, characterized in that the step of generating the content retrieval report according to the face similarity information, the text similarity information, the pattern similarity information, and the sound similarity information further comprises:

Merging face information retrieval reports whose display time periods differ in time by less than a first set value; merging text information retrieval reports whose display time periods differ in time by less than a second set value; merging pattern information retrieval reports whose display time periods differ in time by less than a third set value; and merging sound information retrieval reports whose display time periods differ in time by less than a fourth set value.
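
The merging operation of claim 7 might look like the sketch below for one report type. Two points are assumptions rather than claim language: the time difference is taken as the gap between the end of one display time period and the start of the next, and only reports that matched the same retrieval information are merged. `ReportEntry` and `merge_reports` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReportEntry:
    retrieval_item: str  # matched retrieval information
    start: float         # start of the display time period, in seconds
    end: float           # end of the display time period, in seconds


def merge_reports(entries: List[ReportEntry], set_value: float) -> List[ReportEntry]:
    """Merge reports for the same item whose time gap is below set_value."""
    merged: List[ReportEntry] = []
    # Sort so that reports for the same item are adjacent and in time order.
    for entry in sorted(entries, key=lambda e: (e.retrieval_item, e.start)):
        last = merged[-1] if merged else None
        if (
            last is not None
            and last.retrieval_item == entry.retrieval_item
            and entry.start - last.end < set_value
        ):
            # Close enough in time: extend the previous report's period.
            last.end = max(last.end, entry.end)
        else:
            # Copy the entry so the caller's objects are not modified.
            merged.append(ReportEntry(entry.retrieval_item, entry.start, entry.end))
    return merged
```

Calling the function once per report type, with the first to fourth set values respectively, would cover the four merging operations.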

8. A video content retrieval device, characterized by comprising:

A retrieval information obtaining module, configured to obtain face retrieval information, pattern retrieval information, sound retrieval information, and text retrieval information corresponding to video content;

A content extraction module, configured to extract an audio frame of the video content and to extract a video image of the video content at a set time interval;

A content detection module, configured to detect the video image and the audio frame using a preset detection algorithm, to obtain face information, text information, pattern information, and sound information of the video content;

A similarity information obtaining module, configured to obtain face similarity information between the face information of the video content and the face retrieval information, text similarity information between the text information of the video content and the text retrieval information, pattern similarity information between the pattern information of the video content and the pattern retrieval information, and sound similarity information between the sound information of the video content and the sound retrieval information; and

A content retrieval module, configured to generate a content retrieval report according to the face similarity information, the text similarity information, the pattern similarity information, and the sound similarity information.
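
The content extraction module above extracts video images at a set time interval. A minimal sketch of choosing the sampling instants is given below; it assumes the video duration is known in seconds and that sampling starts at time zero, neither of which is stated in the claim, and `sample_timestamps` is a hypothetical name. Decoding the frame at each instant is left to whatever video library is used.

```python
from typing import List


def sample_timestamps(duration_s: float, interval_s: float) -> List[float]:
    """Return the instants, in seconds, at which a video image is extracted."""
    if interval_s <= 0:
        raise ValueError("the set time interval must be positive")
    timestamps: List[float] = []
    index = 0
    # Multiply by the index instead of accumulating, to avoid float drift.
    while index * interval_s < duration_s:
        timestamps.append(index * interval_s)
        index += 1
    return timestamps
```

A shorter interval yields more video images to detect and a finer-grained display time period in the report, at a higher processing cost.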

9. The video content retrieval device according to claim 8, characterized in that the content detection module comprises:

A face information detection unit, configured to detect a face region in the video image using a preset face neural network, to obtain the face information of the video content;

A text information detection unit, configured to detect a text region in the video image using a text recognition algorithm, to obtain the text information of the video content;

A pattern information detection unit, configured to detect a pattern region in the video image using a preset pattern neural network, to obtain the pattern information of the video content;

A sound information detection unit, configured to perform a speech recognition operation on the audio frame, to obtain the sound information of the video content.

10. The video content retrieval device according to claim 9, characterized in that the video content retrieval device comprises:

A face neural network training module, configured to obtain multiple face sample images and to train the preset face neural network using the multiple face sample images;

A pattern neural network training module, configured to obtain multiple pattern sample images and to train the preset pattern neural network using the multiple pattern sample images.

11. The video content retrieval device according to claim 9, characterized in that the similarity information obtaining module comprises:

A feature vector obtaining unit, configured to obtain a face information feature vector of the face information of the video content and a face retrieval information feature vector of the face retrieval information;

A vector distance calculating unit, configured to calculate the vector distance between the face information feature vector of the face information and the face retrieval information feature vector of every piece of face retrieval information;

A face similarity information obtaining unit, configured to determine, according to the smallest vector distance, the face retrieval information corresponding to the face information and the corresponding face similarity.

12. The video content retrieval device according to claim 9, characterized in that the similarity information obtaining module comprises:

A text information judging unit, configured to judge whether the text information is identical to any text retrieval information;

A text similarity information obtaining unit, configured to, if the text information is identical to any text retrieval information, determine the identical text retrieval information as the text retrieval information corresponding to the text information, and, if the text information is not identical to any text retrieval information, determine that the text information does not correspond to any text retrieval information;

A pattern similarity information obtaining unit, configured to determine the identical pattern retrieval information as the pattern retrieval information corresponding to the pattern information;

A sound similarity information obtaining unit, configured to determine the identical sound retrieval information as the sound retrieval information corresponding to the sound information.

13. The video content retrieval device according to claim 8, characterized in that the content retrieval module is specifically configured to: determine a face information retrieval report according to the display time period of the face information, the corresponding face retrieval information, and the face similarity; determine a text information retrieval report according to the display time period of the text information and the corresponding text retrieval information; determine a pattern information retrieval report according to the display time period of the pattern information and the corresponding pattern retrieval information; and determine a sound information retrieval report according to the display time period of the sound information and the corresponding sound retrieval information.

14. The video content retrieval device according to claim 13, characterized in that the content retrieval module is further configured to: merge face information retrieval reports whose display time periods differ in time by less than a first set value; merge text information retrieval reports whose display time periods differ in time by less than a second set value; merge pattern information retrieval reports whose display time periods differ in time by less than a third set value; and merge sound information retrieval reports whose display time periods differ in time by less than a fourth set value.

15. A storage medium storing processor-executable instructions, the instructions being loaded by one or more processors to execute the video content retrieval method according to any one of claims 1-7.