CN110209880A - Video content retrieval method, Video content retrieval device and storage medium - Google Patents
Video content retrieval method, video content retrieval device and storage medium
- Publication number
- CN110209880A CN201811009469.6A
- Authority
- CN
- China
- Prior art keywords
- information
- face
- video content
- retrieval
- pattern
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/7834—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using audio features
- G06F16/7837—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content
- G06F16/784—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content the detected or recognised objects being people
- G06F16/7844—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using original textual content or text extracted from visual content or transcript of audio data
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Library & Information Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides a video content retrieval method, comprising: obtaining face retrieval information, pattern retrieval information, sound retrieval information and text retrieval information corresponding to video content; extracting audio frames of the video content, and extracting video images of the video content at a set time interval; detecting the video images and the audio frames using preset detection algorithms, to obtain face information, text information, pattern information and sound information of the video content; obtaining face similarity information, text similarity information, pattern similarity information and sound similarity information; and generating a content retrieval report according to the face similarity information, the text similarity information, the pattern similarity information and the sound similarity information. The present invention also provides a video content retrieval device. By retrieving the face information, text information, pattern information and sound information in the video content at the same time, the invention improves the accuracy of video content retrieval and reduces its cost.
Description
Technical field
The present invention relates to the field of data processing, and in particular to a video content retrieval method, a video content retrieval device and a storage medium.
Background
With the development of society, people place ever higher demands on shared resources, for example the video resources and text resources shared on the Internet. To prevent illegal video resources or text resources from spreading on the Internet, some network resource providers review the content of the video resources or text resources that clients upload.
Text resources can be reviewed automatically: a machine performs text recognition on the uploaded text content, so the overall review workload is low. Video resources, however, require the image content and sound content in them to be identified manually. Offenders may insert illegal images or illegal sounds into a video resource, or modify its image content or sound content, and these modifications appear only briefly or are well concealed. As a result, the workload and difficulty of manually reviewing video resources increase greatly, and manual review is prone to omissions. Existing video content retrieval methods therefore have a high cost and a low accuracy.
Summary of the invention
Embodiments of the present invention provide a video content retrieval method, a video content retrieval device and a storage medium with a lower retrieval cost and a higher retrieval accuracy, so as to solve the technical problem that existing video content retrieval methods and video content retrieval devices have a high cost and a low accuracy.
An embodiment of the present invention provides a video content retrieval method, comprising:
obtaining face retrieval information, pattern retrieval information, sound retrieval information and text retrieval information corresponding to video content;
extracting audio frames of the video content, and extracting video images of the video content at a set time interval;
detecting the video images and the audio frames using preset detection algorithms, to obtain face information, text information, pattern information and sound information of the video content;
obtaining face similarity information between the face information of the video content and the face retrieval information, text similarity information between the text information of the video content and the text retrieval information, pattern similarity information between the pattern information of the video content and the pattern retrieval information, and sound similarity information between the sound information of the video content and the sound retrieval information; and
generating a content retrieval report according to the face similarity information, the text similarity information, the pattern similarity information and the sound similarity information.
An embodiment of the present invention also provides a video content retrieval device, comprising:
a retrieval information obtaining module, configured to obtain face retrieval information, pattern retrieval information, sound retrieval information and text retrieval information corresponding to video content;
a content extraction module, configured to extract audio frames of the video content, and to extract video images of the video content at a set time interval;
a content detection module, configured to detect the video images and the audio frames using preset detection algorithms, to obtain face information, text information, pattern information and sound information of the video content;
a similarity information obtaining module, configured to obtain face similarity information between the face information of the video content and the face retrieval information, text similarity information between the text information of the video content and the text retrieval information, pattern similarity information between the pattern information of the video content and the pattern retrieval information, and sound similarity information between the sound information of the video content and the sound retrieval information; and
a content retrieval module, configured to generate a content retrieval report according to the face similarity information, the text similarity information, the pattern similarity information and the sound similarity information.
An embodiment of the present invention also provides a storage medium that stores processor-executable instructions, the instructions being loaded by one or more processors to execute any of the video content retrieval methods described above.
Compared with the prior art, the video content retrieval method, video content retrieval device and storage medium of the present invention retrieve the face information, text information, pattern information and sound information in video content at the same time, which improves the accuracy of video content retrieval and reduces its cost. This effectively solves the technical problem that existing video content retrieval methods and video content retrieval devices have a high cost and a low accuracy.
Brief description of the drawings
Fig. 1 is a flowchart of the first embodiment of the video content retrieval method of the present invention;
Fig. 2 is a flowchart of the second embodiment of the video content retrieval method of the present invention;
Fig. 3 is a flowchart of obtaining the face similarity information in step S206 of the second embodiment of the video content retrieval method of the present invention;
Fig. 4 is a flowchart of obtaining the text similarity information in step S206 of the second embodiment of the video content retrieval method of the present invention;
Fig. 5 is a structural schematic diagram of the first embodiment of the video content retrieval device of the present invention;
Fig. 6 is a structural schematic diagram of the second embodiment of the video content retrieval device of the present invention;
Fig. 7 is a structural schematic diagram of the content detection module of the second embodiment of the video content retrieval device of the present invention;
Fig. 8 is a structural schematic diagram of the similarity information obtaining module of the second embodiment of the video content retrieval device of the present invention;
Fig. 9a is a schematic diagram of the functional structure of the server side corresponding to a specific embodiment of the video content retrieval method and video content retrieval device of the present invention;
Fig. 9b is a video content retrieval flowchart of the specific embodiment of the video content retrieval method and video content retrieval device of the present invention;
Fig. 10 is a structural schematic diagram of the working environment of the electronic device in which the video content retrieval device of the present invention is located.
Detailed description of the embodiments
Referring to the drawings, in which like reference numerals denote like components, the principles of the present invention are illustrated as implemented in a suitable computing environment. The following description is based on the illustrated specific embodiments of the invention and should not be regarded as limiting other specific embodiments not detailed herein.
In the following description, specific embodiments of the invention are described with reference to steps and symbols of operations performed by one or more computers, unless otherwise stated. These steps and operations, which are at times referred to as being computer-executed, include the manipulation by a computer processing unit of electronic signals that represent data in a structured form. This manipulation transforms the data or maintains it at locations in the memory system of the computer, which reconfigures or otherwise alters the operation of the computer in a manner well understood by those skilled in the art. The data structures in which the data is maintained are physical locations of the memory that have particular properties defined by the format of the data. Although the principles of the invention are described in the foregoing terms, this is not meant as a limitation, and those skilled in the art will appreciate that the steps and operations described below may also be implemented in hardware.
The video content retrieval method and video content retrieval device of the present invention may be provided in any electronic device, and are used to perform retrieval operations on video content in four respects (face information, text information, pattern information and sound information), thereby effectively improving the accuracy of the corresponding content retrieval report. The electronic device includes, but is not limited to, wearable devices, head-mounted devices, medical and health platforms, personal computers, server computers, handheld or laptop devices, mobile devices (such as mobile phones, personal digital assistants (PDAs) and media players), multiprocessor systems, consumer electronic devices, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices. The electronic device is preferably a mobile terminal or fixed terminal that performs retrieval operations on video content; the terminal can retrieve the face information, text information, pattern information and sound information in the video content separately, which improves the accuracy of video content retrieval and reduces its cost.
Referring to Fig. 1, Fig. 1 is a flowchart of the first embodiment of the video content retrieval method of the present invention. The video content retrieval method of this embodiment can be implemented using the electronic device described above, and comprises:
Step S101, obtaining face retrieval information, pattern retrieval information, sound retrieval information and text retrieval information corresponding to video content;
Step S102, extracting audio frames of the video content, and extracting video images of the video content at a set time interval;
Step S103, detecting the video images and the audio frames using preset detection algorithms, to obtain face information, text information, pattern information and sound information of the video content;
Step S104, obtaining face similarity information between the face information of the video content and the face retrieval information, text similarity information between the text information of the video content and the text retrieval information, pattern similarity information between the pattern information of the video content and the pattern retrieval information, and sound similarity information between the sound information of the video content and the sound retrieval information;
Step S105, generating a content retrieval report according to the face similarity information, the text similarity information, the pattern similarity information and the sound similarity information.
The detailed process of each step of the video content retrieval method of this embodiment is described below.
In step S101, the video content retrieval device obtains the face retrieval information, pattern retrieval information, sound retrieval information and text retrieval information corresponding to the video content.
Before the video content retrieval device performs a retrieval operation on video content, it can first create a to-be-retrieved content database, which may include face retrieval information, pattern retrieval information, sound retrieval information, text retrieval information and the like.
Here, the face retrieval information is the face information that needs to be retrieved, such as the face information of Zhang San. The pattern retrieval information is the pattern information that needs to be retrieved, such as the flag of an illegal organization. The sound retrieval information is a preset set of sensitive words for speech recognition, such as the spoken name of Zhang San. The text retrieval information is a preset set of sensitive words for text recognition, such as the written name of Zhang San.
In this way, the video content retrieval device can create the to-be-retrieved content database from the obtained face retrieval information, pattern retrieval information, sound retrieval information, text retrieval information and the like.
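The to-be-retrieved content database described above can be pictured as a small container holding the four kinds of retrieval information. The following Python fragment is only a minimal sketch under stated assumptions: the class name `RetrievalDatabase`, its field names, and the choice to store faces as feature vectors keyed by a label are hypothetical and are not specified in the source.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class RetrievalDatabase:
    """Hypothetical container for the four kinds of retrieval information."""
    faces: Dict[str, List[float]] = field(default_factory=dict)  # label -> face feature vector (assumed)
    patterns: Set[str] = field(default_factory=set)              # pattern labels, e.g. a flag
    sound_words: Set[str] = field(default_factory=set)           # speech-recognition sensitive words
    text_words: Set[str] = field(default_factory=set)            # text-recognition sensitive words


def create_database(faces: Dict[str, List[float]],
                    patterns: Iterable[str],
                    sound_words: Iterable[str],
                    text_words: Iterable[str]) -> RetrievalDatabase:
    """Build the database from the retrieval information obtained in step S101."""
    return RetrievalDatabase(dict(faces), set(patterns), set(sound_words), set(text_words))


db = create_database({"Zhang San": [0.1, 0.2]}, ["illegal_flag"], ["Zhang San"], ["Zhang San"])
```

Keeping the four kinds of retrieval information in separate fields mirrors the four independent comparisons carried out later in step S104.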
In step S102, after the to-be-retrieved content database has been created, the video content retrieval device can perform a content extraction operation on the video content provided by the user. Specifically, the video content retrieval device can separate the video content into video frames and audio frames, and then extract video images of the video content at a set time interval (such as 500 ms). The audio frames are used to extract the sound information in the video content, and the video images are used to extract the face information, text information and pattern information in the video content.
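The interval-based extraction of video images can be illustrated by computing which time points, and which frame indices, would be sampled. This is a minimal sketch under assumptions: the function names, the integer millisecond representation and the constant frame rate are hypothetical, and actual decoding of the video is left out.

```python
from typing import Iterable, List


def sample_timestamps_ms(duration_ms: int, interval_ms: int = 500) -> List[int]:
    """Return the time points (in ms) at which a video image is extracted."""
    if duration_ms < 0 or interval_ms <= 0:
        raise ValueError("duration must be non-negative and interval positive")
    return list(range(0, duration_ms, interval_ms))


def timestamps_to_frame_indices(timestamps_ms: Iterable[int], fps: int) -> List[int]:
    """Map each time point to a frame index, assuming a constant integer frame rate."""
    return [t * fps // 1000 for t in timestamps_ms]


timestamps = sample_timestamps_ms(2000)                   # [0, 500, 1000, 1500]
indices = timestamps_to_frame_indices(timestamps, fps=25)  # [0, 12, 25, 37]
```

With a 500 ms interval, a 25 fps video contributes roughly one frame in every twelve or thirteen to the detection step, which is what keeps the detection workload bounded.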
In step S103, the video content retrieval device uses preset detection algorithms to detect the video images and audio frames obtained in step S102, to obtain the face information, text information, pattern information and sound information in the video content.
Here, the preset detection algorithms may be a face neural network for detecting face information, an OCR (Optical Character Recognition) text recognition algorithm for detecting text information, a pattern neural network for detecting pattern information, an ASR (Automatic Speech Recognition) speech recognition algorithm for detecting sound information, and the like.
In this way, the video content retrieval device can obtain the face information, text information, pattern information and sound information in the video content through the above preset detection algorithms.
In step S104, the video content retrieval device compares the face information obtained in step S103 with the face retrieval information set in step S101, to obtain face similarity information between the face information of the video content and the face retrieval information. The face similarity information includes the correspondence between the face information and the face retrieval information, and the face similarity between the face information and the corresponding face retrieval information.
At the same time, the video content retrieval device compares the text information obtained in step S103 with the text retrieval information set in step S101, to obtain text similarity information between the text information of the video content and the text retrieval information. The text similarity information includes the correspondence between the text information and the text retrieval information.
In addition, the video content retrieval device compares the pattern information obtained in step S103 with the pattern retrieval information set in step S101, to obtain pattern similarity information between the pattern information of the video content and the pattern retrieval information. The pattern similarity information includes the correspondence between the pattern information and the pattern retrieval information.
Furthermore, the video content retrieval device compares the sound information obtained in step S103 with the sound retrieval information set in step S101, to obtain sound similarity information between the sound information of the video content and the sound retrieval information. The sound similarity information includes the correspondence between the sound information and the sound retrieval information.
In step S105, the video content retrieval device generates the content retrieval report of the video content according to the face similarity information, text similarity information, pattern similarity information and sound similarity information obtained in step S104. The content retrieval report can indicate whether the video content contains the face retrieval information, pattern retrieval information, sound retrieval information and text retrieval information set in step S101, and gives the confidence of the match with the corresponding retrieval information. For example: a "Zhang San" face appears in the video content with a confidence of 95%; the text "Zhang San" appears in the video content; or a sound clip of "Zhang San" appears in the video content.
This completes the video content retrieval process of the video content retrieval method of this embodiment.
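As an illustration of how a content retrieval report of this kind might be assembled from the similarity information, the following Python fragment is a minimal sketch. The `Match` record, its fields and the wording of the report lines are assumptions made for illustration; the source only states that the report indicates which retrieval information appears and with what confidence.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Match:
    """Hypothetical record of one piece of similarity information."""
    kind: str                           # "face", "text", "pattern" or "sound"
    label: str                          # the matched retrieval information, e.g. "Zhang San"
    confidence: Optional[float] = None  # only set when a similarity score exists


def build_report(matches: List[Match]) -> List[str]:
    """Turn the similarity information into human-readable report lines."""
    lines = []
    for m in matches:
        line = f'{m.kind} "{m.label}" appears in the video content'
        if m.confidence is not None:
            line += f", confidence {m.confidence:.0%}"
        lines.append(line)
    return lines


report = build_report([Match("face", "Zhang San", 0.95), Match("text", "Zhang San")])
```

The confidence is optional here because, in the source, only the face comparison is described as producing a similarity score; the other comparisons yield a correspondence only.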
The video content retrieval method of this embodiment retrieves the face information, text information, pattern information and sound information in the video content at the same time, which improves the accuracy of video content retrieval and reduces its cost.
Referring to Fig. 2, Fig. 2 is a flowchart of the second embodiment of the video content retrieval method of the present invention. The video content retrieval method of this embodiment can be implemented using the electronic device described above, and comprises:
Step S201, obtaining a plurality of face sample images, and training a preset face neural network using the plurality of face sample images;
Step S202, obtaining a plurality of pattern sample images, and training a preset pattern neural network using the plurality of pattern sample images;
Step S203, obtaining face retrieval information, pattern retrieval information, sound retrieval information and text retrieval information corresponding to video content;
Step S204, extracting audio frames of the video content, and extracting video images of the video content at a set time interval;
Step S205, detecting the video images and the audio frames using preset detection algorithms, to obtain face information, text information, pattern information and sound information of the video content;
Step S206, obtaining face similarity information between the face information of the video content and the face retrieval information, text similarity information between the text information of the video content and the text retrieval information, pattern similarity information between the pattern information of the video content and the pattern retrieval information, and sound similarity information between the sound information of the video content and the sound retrieval information;
Step S207, generating a content retrieval report according to the face similarity information, the text similarity information, the pattern similarity information and the sound similarity information.
The detailed process of each step of the video content retrieval method of this embodiment is described below.
In step S201, the video content retrieval device can obtain a plurality of face sample images and, based on a deep convolutional neural network model, train the preset face neural network using the plurality of face sample images. The face neural network is used to recognize faces in the video images of the video content, so as to extract the face information in the video content.
In step S202, the video content retrieval device can obtain a plurality of pattern sample images and, based on the open source model Faster-rcnn, train the preset pattern neural network using the plurality of pattern sample images. The pattern neural network is used to recognize patterns in the video images of the video content, so as to extract the pattern information in the video content.
In step S203, the video content retrieval device obtains the face retrieval information, pattern retrieval information, sound retrieval information and text retrieval information corresponding to the video content.
Before the video content retrieval device performs a retrieval operation on video content, it can first create a to-be-retrieved content database, which may include face retrieval information, pattern retrieval information, sound retrieval information, text retrieval information and the like.
In this way, the video content retrieval device can create the to-be-retrieved content database from the obtained face retrieval information, pattern retrieval information, sound retrieval information, text retrieval information and the like.
In step S204, the video content retrieval device can perform a content extraction operation on the video content provided by the user. Specifically, the video content retrieval device can separate the video content into video frames and audio frames, and then extract video images of the video content at a set time interval (such as 500 ms). The audio frames are used to extract the sound information in the video content, and the video images are used to extract the face information, text information and pattern information in the video content.
In step S205, the video content retrieval device uses preset detection algorithms to detect the video images and audio frames obtained in step S204, to obtain the face information, text information, pattern information and sound information in the video content.
Here, the preset detection algorithms may be a face neural network for detecting face information, an OCR (Optical Character Recognition) text recognition algorithm for detecting text information, a pattern neural network for detecting pattern information, an ASR (Automatic Speech Recognition) speech recognition algorithm for detecting sound information, and the like.
In this way, the video content retrieval device can obtain the face information, text information, pattern information and sound information in the video content through the above preset detection algorithms.
In step S206, the video content retrieval device compares the face information obtained in step S205 with the face retrieval information set in step S203, to obtain face similarity information between the face information of the video content and the face retrieval information. The face similarity information includes the correspondence between the face information and the face retrieval information, and the face similarity between the face information and the corresponding face retrieval information.
For details, refer to Fig. 3, which is a flowchart of obtaining the face similarity information in step S206 of the second embodiment of the video content retrieval method of the present invention. Step S206 includes:
Step S301, the video content retrieval device obtains the face information feature vector of the face information of the video content, and the face retrieval information feature vector of each piece of face retrieval information. Here, the face information may include the left eye, right eye, nose, left mouth corner and right mouth corner, and the video content retrieval device can extract the corresponding face information feature vector based on the coordinates of these facial features. Likewise, the video content retrieval device can extract the corresponding face retrieval information feature vector based on the coordinates of the face retrieval information.
Step S302, the video content retrieval device calculates the vector distance between the face information feature vector obtained in step S301 and the face retrieval information feature vector of every piece of face retrieval information. The video content retrieval device can then judge the similarity between the face information and the face retrieval information from these vector distances. Here, the open source similarity search library faiss can be used to compute the similarity of the feature vectors.
Step S303, the video content retrieval device obtains the vector distances between the face information feature vector and the face retrieval information feature vectors of all the face retrieval information, and sets the face retrieval information whose feature vector has the minimum vector distance as the face retrieval information corresponding to the face information. It then determines the face similarity between the face information and the face retrieval information according to the minimum vector distance between the face information feature vector and the corresponding face retrieval information feature vector: the smaller the minimum vector distance, the higher the face similarity; the larger the minimum vector distance, the lower the face similarity.
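Steps S302 and S303 amount to a nearest-neighbor search followed by a distance-to-similarity mapping. The Python fragment below is a minimal sketch under stated assumptions: it uses a brute-force Euclidean search from the standard library rather than faiss, the function names are hypothetical, and the mapping `1 / (1 + distance)` is only one possible monotonically decreasing choice, since the source does not give a formula.

```python
import math
from typing import Dict, List, Tuple


def nearest_face(query: List[float], gallery: Dict[str, List[float]]) -> Tuple[str, float]:
    """Return the face retrieval label with the minimum vector distance, and that distance."""
    if not gallery:
        raise ValueError("no face retrieval information to compare against")
    label = min(gallery, key=lambda name: math.dist(query, gallery[name]))
    return label, math.dist(query, gallery[label])


def similarity_from_distance(distance: float) -> float:
    """Assumed mapping: a smaller distance gives a higher similarity, in (0, 1]."""
    return 1.0 / (1.0 + distance)


gallery = {"Zhang San": [0.0, 0.0], "Li Si": [3.0, 4.0]}
label, distance = nearest_face([0.0, 1.0], gallery)
similarity = similarity_from_distance(distance)
```

A brute-force scan is adequate for a small database; for a large one, an indexed search such as the faiss library mentioned in the source would be the natural replacement for `nearest_face`.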
At the same time, the video content retrieval device compares the text information obtained in step S205 with the text retrieval information set in step S203, to obtain text similarity information between the text information of the video content and the text retrieval information. The text similarity information includes the correspondence between the text information and the text retrieval information.
For details, refer to Fig. 4, which is a flowchart of obtaining the text similarity information in step S206 of the second embodiment of the video content retrieval method of the present invention. Step S206 includes:
Step S401, judging whether the text information is identical to any text retrieval information; if so, going to step S402; if not, going to step S403;
Step S402, if the text information is identical to some text retrieval information, determining that identical text retrieval information as the text retrieval information corresponding to the text information.
Step S403, if the text information differs from every piece of text retrieval information, determining that the text information does not correspond to any text retrieval information.
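Steps S401 to S403 describe an exact-match lookup. The following Python fragment is a minimal sketch of that logic; the function name `match_text` and the use of `None` to signal "no corresponding text retrieval information" are assumptions made for illustration.

```python
from typing import Iterable, Optional


def match_text(text_info: str, text_retrieval_words: Iterable[str]) -> Optional[str]:
    """Return the identical text retrieval information, or None if there is none."""
    for word in text_retrieval_words:
        if text_info == word:   # S401: identical to some text retrieval information?
            return word         # S402: that word corresponds to the text information
    return None                 # S403: no corresponding text retrieval information


hit = match_text("Zhang San", ["Zhang San", "Li Si"])
miss = match_text("Wang Wu", ["Zhang San", "Li Si"])
```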
In addition, the video content retrieval device compares the pattern information obtained in step S205 with the pattern retrieval information set in step S203, to obtain pattern similarity information between the pattern information of the video content and the pattern retrieval information. The pattern similarity information includes the correspondence between the pattern information and the pattern retrieval information.
Specifically, the video content retrieval device directly determines the identical pattern retrieval information as the pattern retrieval information corresponding to the pattern information.
Furthermore, the video content retrieval device compares the sound information obtained in step S205 with the sound retrieval information set in step S203, to obtain sound similarity information between the sound information of the video content and the sound retrieval information. The sound similarity information includes the correspondence between the sound information and the sound retrieval information.
Specifically, the video content retrieval device directly determines the identical sound retrieval information as the sound retrieval information corresponding to the sound information.
In step S207, the video content retrieval device generates the content retrieval report of the video content according to the face similarity information, text similarity information, pattern similarity information and sound similarity information obtained in step S206. The content retrieval report may include a face information retrieval report, a text information retrieval report, a pattern information retrieval report and a sound information retrieval report.
Specifically, the video content retrieval device determines the face information retrieval report according to the display period of the face information, the corresponding face retrieval information and the face similarity; for example, the face of "Zhang San" appears in the video content in the period from 14 minutes 30 seconds to 15 minutes, with a confidence of 95%. If the face of "Zhang San" appears multiple times within a period of the first set value (e.g. 30 seconds), the video content retrieval device can merge the above face information retrieval reports, for example: the face of "Zhang San" appears three times in the video content in the period from 14 minutes 30 seconds to 15 minutes, with a confidence of 95%.
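The merge operation can be sketched as grouping detections of the same retrieval entry that fall within the set period. This is only an illustration: the names `Hit`, `ReportEntry` and `merge_hits`, the choice to measure the window from the first detection, and the choice to report the highest confidence are all assumptions not stated in the text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hit:
    label: str         # matched retrieval entry, e.g. "Zhang San"
    time_s: float      # timestamp of the sampled frame, in seconds
    confidence: float  # face similarity expressed as a confidence


@dataclass
class ReportEntry:
    label: str
    start_s: float
    end_s: float
    count: int
    confidence: float  # highest confidence among the merged hits (assumption)


def merge_hits(hits: List[Hit], window_s: float = 30.0) -> List[ReportEntry]:
    """Merge hits of the same label lying within window_s of the entry start."""
    entries: List[ReportEntry] = []
    latest: Dict[str, ReportEntry] = {}  # label -> most recent entry for it
    for hit in sorted(hits, key=lambda h: h.time_s):
        entry = latest.get(hit.label)
        if entry is not None and hit.time_s - entry.start_s <= window_s:
            # Same label inside the window: extend the existing entry.
            entry.end_s = hit.time_s
            entry.count += 1
            entry.confidence = max(entry.confidence, hit.confidence)
        else:
            # First hit for this label, or the window has elapsed: new entry.
            entry = ReportEntry(hit.label, hit.time_s, hit.time_s, 1, hit.confidence)
            entries.append(entry)
            latest[hit.label] = entry
    return entries


hits = [
    Hit("Zhang San", 870.0, 0.95),   # 14 min 30 s
    Hit("Zhang San", 885.0, 0.93),
    Hit("Zhang San", 900.0, 0.94),   # 15 min
    Hit("Zhang San", 1000.0, 0.90),  # outside the 30 s window
]
for entry in merge_hits(hits):
    print(entry)
```

The same grouping would apply to text, pattern and sound hits with their own set values, with the confidence field omitted where no similarity score is produced.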
The video content retrieval device can also determine the text information retrieval report according to the display period of the text information and the corresponding text retrieval information; for example, the text "Zhang San" appears in the video content in the period from 14 minutes 30 seconds to 15 minutes. If the text "Zhang San" appears multiple times within a period of the second set value, the video content retrieval device can merge the above text information retrieval reports, for example: the text "Zhang San" appears three times in the video content in the period from 14 minutes 30 seconds to 15 minutes.
The video content retrieval device can also determine the pattern information retrieval report according to the display period of the pattern information and the corresponding pattern retrieval information; for example, the flag pattern of an illegal organization appears in the video content in the period from 14 minutes 30 seconds to 15 minutes. If the flag pattern of the illegal organization appears multiple times within a period of the third set value, the video content retrieval device can merge the above pattern information retrieval reports, for example: the flag pattern of the illegal organization appears three times in the video content in the period from 14 minutes 30 seconds to 15 minutes.
The video content retrieval device can also determine the sound information retrieval report according to the display period of the sound information and the corresponding sound retrieval information; for example, a sound clip of "Zhang San" appears in the video content in the period from 14 minutes 30 seconds to 15 minutes. If the sound clip of "Zhang San" appears multiple times within a period of the fourth set value, the video content retrieval device can merge the above sound information retrieval reports, for example: the sound clip of "Zhang San" appears three times in the video content in the period from 14 minutes 30 seconds to 15 minutes.
In this way, the content retrieval report can reflect whether the video content contains the face retrieval information, pattern retrieval information, sound retrieval information and text retrieval information set in step S203, and provides the confidence of the corresponding retrieval information.
This completes the video content retrieval process of the video content retrieval method of this embodiment.
On the basis of the first embodiment, the video content retrieval method of this embodiment further improves the accuracy of video content retrieval by presetting the face neural network and the pattern neural network; the process of obtaining the face similarity information, text similarity information, pattern similarity information and sound similarity information in this embodiment further reduces the cost of generating the corresponding content retrieval report.
The present invention also provides a video content retrieval device. Referring to Fig. 5, Fig. 5 is a structural schematic diagram of the first embodiment of the video content retrieval device of the present invention. The video content retrieval device of this embodiment can be implemented using the first embodiment of the above video content retrieval method. The video content retrieval device 50 of this embodiment includes a retrieval information acquisition module 51, a content extraction module 52, a content detection module 53, a similarity information acquisition module 54 and a content retrieval module 55.
The retrieval information acquisition module 51 is used to obtain the face retrieval information, pattern retrieval information, sound retrieval information and text retrieval information corresponding to the video content. The content extraction module 52 is used to extract the audio frames of the video content and to extract video images of the video content at a set time interval. The content detection module 53 is used to detect the video images and audio frames using preset detection algorithms, to obtain the face information, text information, pattern information and sound information of the video content. The similarity information acquisition module 54 is used to obtain the face similarity information between the face information of the video content and the face retrieval information, the text similarity information between the text information of the video content and the text retrieval information, the pattern similarity information between the pattern information of the video content and the pattern retrieval information, and the sound similarity information between the sound information of the video content and the sound retrieval information. The content retrieval module 55 is used to generate the content retrieval report according to the face similarity information, text similarity information, pattern similarity information and sound similarity information.
When the video content retrieval device 50 of this embodiment is in use, the retrieval information acquisition module 51 first obtains the face retrieval information, pattern retrieval information, sound retrieval information and text retrieval information corresponding to the video content.
Before performing a retrieval operation on the video content, the retrieval information acquisition module 51 can first create a to-be-retrieved content database, which may include face retrieval information, pattern retrieval information, sound retrieval information, text retrieval information and so on.
Here the face retrieval information is the face information that needs to be retrieved, such as the face information of Zhang San. The pattern retrieval information is the pattern information that needs to be retrieved, such as the flag information of an illegal organization. The sound retrieval information is a preset sensitive word for speech recognition, such as the spoken name of Zhang San. The text retrieval information is a preset sensitive word for text recognition, such as the written name of Zhang San.
In this way, the retrieval information acquisition module 51 can create the to-be-retrieved content database from the obtained face retrieval information, pattern retrieval information, sound retrieval information, text retrieval information and other information.
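One way to picture the to-be-retrieved content database is as a container with one collection per retrieval type. The sketch below is an assumption for illustration only; the class name `RetrievalDatabase`, its field names and the example values are hypothetical, and the text does not specify any storage layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class RetrievalDatabase:
    """Hypothetical in-memory layout of the to-be-retrieved content database."""
    # Face retrieval information: person name -> face feature vector.
    face_vectors: Dict[str, List[float]] = field(default_factory=dict)
    # Pattern retrieval information: labels of patterns to look for.
    pattern_labels: Set[str] = field(default_factory=set)
    # Sound retrieval information: sensitive words for speech recognition.
    sound_terms: Set[str] = field(default_factory=set)
    # Text retrieval information: sensitive words for text recognition.
    text_terms: Set[str] = field(default_factory=set)


db = RetrievalDatabase()
db.face_vectors["Zhang San"] = [0.1, 0.8, 0.3]  # made-up example vector
db.pattern_labels.add("flag of illegal organization")
db.sound_terms.add("Zhang San")
db.text_terms.add("Zhang San")
print(db)
```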
After the to-be-retrieved content database has been created, the content extraction module 52 can perform a content extraction operation on the video content provided by the user. Specifically, the content extraction module 52 can separate the video content into video frames and audio frames, and then extract video images of the video content at a set time interval (e.g. 500 ms). The audio frames are used to extract the sound information in the video content, and the video images are used to extract the face information, text information and pattern information in the video content.
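Sampling at a set time interval can be sketched as computing the timestamps at which a video image is taken and mapping each timestamp to a frame index. The helper names and the constant frame rate are assumptions for illustration; actual decoding of the video is left out.

```python
from typing import List


def sample_timestamps_ms(duration_ms: int, interval_ms: int = 500) -> List[int]:
    """Timestamps (ms) at which one video image is extracted."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    return list(range(0, duration_ms, interval_ms))


def frame_index(timestamp_ms: int, fps: float) -> int:
    """Index of the frame shown at timestamp_ms, assuming a constant frame rate."""
    return int(timestamp_ms * fps / 1000)


stamps = sample_timestamps_ms(2000)           # a 2-second clip
indices = [frame_index(t, 25.0) for t in stamps]
print(stamps, indices)
```

With a 500 ms interval, a video at 25 frames per second contributes roughly one image out of every twelve or thirteen frames to the detection stage.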
The content detection module 53 then uses preset detection algorithms to detect the video images and audio frames obtained by the content extraction module 52, to obtain the face information, text information, pattern information and sound information in the video content.
Here the preset detection algorithms may be a face neural network for detecting face information, an OCR text recognition algorithm (Optical Character Recognition) for detecting text information, a pattern neural network for detecting pattern information, an ASR speech recognition algorithm (Automatic Speech Recognition) for detecting sound information, and so on.
In this way, the content detection module 53 can obtain the face information, text information, pattern information and sound information in the video content through the above preset detection algorithms.
The similarity information acquisition module 54 then compares the face information obtained by the content detection module 53 with the face retrieval information set by the retrieval information acquisition module, to obtain the face similarity information between the face information of the video content and the face retrieval information. The face similarity information includes the correspondence between the face information and the face retrieval information, and the face similarity between the face information and the corresponding face retrieval information.
At the same time, the similarity information acquisition module 54 compares the text information obtained by the content detection module 53 with the text retrieval information set by the retrieval information acquisition module, to obtain the text similarity information between the text information of the video content and the text retrieval information. The text similarity information includes the correspondence between the text information and the text retrieval information.
In addition, the similarity information acquisition module 54 compares the pattern information obtained by the content detection module 53 with the pattern retrieval information set by the retrieval information acquisition module, to obtain the pattern similarity information between the pattern information of the video content and the pattern retrieval information. The pattern similarity information includes the correspondence between the pattern information and the pattern retrieval information.
Furthermore, the similarity information acquisition module 54 compares the sound information obtained by the content detection module 53 with the sound retrieval information set by the retrieval information acquisition module, to obtain the sound similarity information between the sound information of the video content and the sound retrieval information. The sound similarity information includes the correspondence between the sound information and the sound retrieval information.
Finally, the content retrieval module 55 generates the content retrieval report of the video content according to the face similarity information, text similarity information, pattern similarity information and sound similarity information obtained by the similarity information acquisition module 54. The content retrieval report can reflect whether the video content contains the set face retrieval information, pattern retrieval information, sound retrieval information and text retrieval information, and provides the confidence of the corresponding retrieval information; for example, the face of "Zhang San" appears in the video content with a confidence of 95%, the text "Zhang San" appears in the video content, or a sound clip of "Zhang San" appears in the video content.
This completes the video content retrieval process of the video content retrieval device 50 of this embodiment.
The video content retrieval device of this embodiment retrieves the face information, text information, pattern information and sound information in the video content at the same time, which improves the accuracy of video content retrieval and reduces its cost.
Referring to Fig. 6, Fig. 6 is a structural schematic diagram of the second embodiment of the video content retrieval device of the present invention. The video content retrieval device of this embodiment can be implemented using the second embodiment of the above video content retrieval method. The video content retrieval device 60 of this embodiment includes a face neural network training module 61, a pattern neural network training module 62, a retrieval information acquisition module 63, a content extraction module 64, a content detection module 65, a similarity information acquisition module 66 and a content retrieval module 67.
The face neural network training module 61 is used to obtain multiple face sample images and to train the preset face neural network using the multiple face sample images. The pattern neural network training module 62 is used to obtain multiple pattern sample images and to train the preset pattern neural network using the multiple pattern sample images. The retrieval information acquisition module 63 is used to obtain the face retrieval information, pattern retrieval information, sound retrieval information and text retrieval information corresponding to the video content. The content extraction module 64 is used to extract the audio frames of the video content and to extract video images of the video content at a set time interval. The content detection module 65 is used to detect the video images and audio frames using preset detection algorithms, to obtain the face information, text information, pattern information and sound information of the video content. The similarity information acquisition module 66 is used to obtain the face similarity information between the face information of the video content and the face retrieval information, the text similarity information between the text information of the video content and the text retrieval information, the pattern similarity information between the pattern information of the video content and the pattern retrieval information, and the sound similarity information between the sound information of the video content and the sound retrieval information. The content retrieval module 67 is used to generate the content retrieval report according to the face similarity information, text similarity information, pattern similarity information and sound similarity information.
Referring to Fig. 7, Fig. 7 is a structural schematic diagram of the content detection module of the second embodiment of the video content retrieval device of the present invention. The content detection module 65 includes a face information detection unit 71, a text information detection unit 72, a pattern information detection unit 73 and a sound information detection unit 74.
The face information detection unit 71 is used to detect the face regions in the video images using the preset face neural network, to obtain the face information of the video content. The text information detection unit 72 is used to detect the text regions in the video images using a text recognition algorithm, to obtain the text information of the video content. The pattern information detection unit 73 is used to detect the pattern regions in the video images using the preset pattern neural network, to obtain the pattern information of the video content. The sound information detection unit 74 is used to perform a speech recognition operation on the audio frames, to obtain the sound information of the video content.
Referring to Fig. 8, Fig. 8 is a structural schematic diagram of the similarity information acquisition module of the second embodiment of the video content retrieval device of the present invention. The similarity information acquisition module 66 includes a feature vector acquisition unit 81, a vector distance computing unit 82, a face similarity information acquisition unit 83, a text information judging unit 84, a text similarity information acquisition unit 85, a pattern similarity information acquisition unit 86 and a sound similarity information acquisition unit 87.
The feature vector acquisition unit 81 is used to obtain the face information feature vector of the face information of the video content and the face retrieval information feature vector of the face retrieval information. The vector distance computing unit 82 is used to calculate the vector distances between the face information feature vector of the face information and the face retrieval information feature vectors of all face retrieval information. The face similarity information acquisition unit 83 is used to determine, according to the smallest vector distance, the face retrieval information corresponding to the face information and the corresponding face similarity. The text information judging unit 84 is used to judge whether the text information is identical to any text retrieval information. The text similarity information acquisition unit 85 is used to determine, if the text information is identical to any text retrieval information, the identical text retrieval information as the text retrieval information corresponding to the text information, and, if the text information is not identical to any text retrieval information, that the text information does not correspond to any text retrieval information. The pattern similarity information acquisition unit 86 is used to determine the identical pattern retrieval information as the pattern retrieval information corresponding to the pattern information. The sound similarity information acquisition unit 87 is used to determine the identical sound retrieval information as the sound retrieval information corresponding to the sound information.
When the video content retrieval device 60 of this embodiment is in use, the face neural network training module 61 first obtains multiple face sample images and, based on a deep convolutional neural network model, trains the preset face neural network using the multiple face sample images. The face neural network is used to recognize faces in the video images of the video content, so as to extract the face information in the video content.
The pattern neural network training module 62 then obtains multiple pattern sample images and, based on the open-source model Faster-rcnn, trains the preset pattern neural network using the multiple pattern sample images. The pattern neural network is used to recognize patterns in the video images of the video content, so as to extract the pattern information in the video content.
The retrieval information acquisition module 63 then obtains the face retrieval information, pattern retrieval information, sound retrieval information and text retrieval information corresponding to the video content.
Before performing a retrieval operation on the video content, the retrieval information acquisition module 63 can first create a to-be-retrieved content database, which may include face retrieval information, pattern retrieval information, sound retrieval information, text retrieval information and so on.
In this way, the retrieval information acquisition module 63 can create the to-be-retrieved content database from the obtained face retrieval information, pattern retrieval information, sound retrieval information, text retrieval information and other information.
The content extraction module 64 can then perform a content extraction operation on the video content provided by the user. Specifically, the content extraction module 64 can separate the video content into video frames and audio frames, and then extract video images of the video content at a set time interval (e.g. 500 ms). The audio frames are used to extract the sound information in the video content, and the video images are used to extract the face information, text information and pattern information in the video content.
The content detection module 65 then uses preset detection algorithms to detect the obtained video images and audio frames, to obtain the face information, text information, pattern information and sound information in the video content.
Here the preset detection algorithms may be a face neural network for detecting face information, an OCR text recognition algorithm (Optical Character Recognition) for detecting text information, a pattern neural network for detecting pattern information, an ASR speech recognition algorithm (Automatic Speech Recognition) for detecting sound information, and so on.
Specifically, the face information detection unit 71 of the content detection module 65 detects the face regions in the video images using the preset face neural network, to obtain the face information of the video content; the text information detection unit 72 of the content detection module 65 detects the text regions in the video images using a text recognition algorithm, to obtain the text information of the video content; the pattern information detection unit 73 of the content detection module 65 detects the pattern regions in the video images using the preset pattern neural network, to obtain the pattern information of the video content; and the sound information detection unit 74 of the content detection module 65 performs a speech recognition operation on the audio frames, to obtain the sound information of the video content.
In this way, the content detection module 65 can obtain the face information, text information, pattern information and sound information in the video content through the above preset detection algorithms.
The similarity information acquisition module 66 then compares the obtained face information with the set face retrieval information, to obtain the face similarity information between the face information of the video content and the face retrieval information. The face similarity information includes the correspondence between the face information and the face retrieval information, and the face similarity between the face information and the corresponding face retrieval information.
Specifically, the process of obtaining the face similarity information includes:
The feature vector acquisition unit 81 of the similarity information acquisition module 66 obtains the face information feature vector of the face information of the video content and the face retrieval information feature vector of the face retrieval information. Here the face information may include the left eye, right eye, nose, left mouth corner and right mouth corner; the feature vector acquisition unit 81 can extract the corresponding face information feature vector based on the coordinates of the above face information. Likewise, the feature vector acquisition unit 81 can extract the corresponding face retrieval information feature vector based on the coordinates of the face retrieval information.
The vector distance computing unit 82 of the similarity information acquisition module 66 calculates the vector distances between the face information feature vector of the obtained face information and the face retrieval information feature vectors of all face retrieval information. In this way, the vector distance computing unit 82 can judge the similarity between the face information and the face retrieval information from the above vector distances. Here the open-source similarity search library faiss can be used to compute the similarity of the feature vectors.
The face similarity information acquisition unit 83 of the similarity information acquisition module 66 obtains the vector distances between the face information feature vector and the face retrieval information feature vectors of all face retrieval information, and sets the face retrieval information whose feature vector has the smallest vector distance as the face retrieval information corresponding to the face information. It then determines the face similarity between the face information and the face retrieval information according to the smallest vector distance between the face information feature vector and the corresponding face retrieval information feature vector: the smaller the smallest vector distance, the higher the face similarity; the larger the smallest vector distance, the lower the face similarity.
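The minimum-distance matching described above can be illustrated with a brute-force search. This is a sketch under stated assumptions: the text names faiss for the similarity search, whereas the pure-Python loop below is a stand-in; Euclidean distance and the mapping `1 / (1 + distance)` from distance to similarity are illustrative choices, since the text only requires that a smaller distance gives a higher similarity.

```python
import math
from typing import Dict, List, Tuple


def euclidean(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_face(face_vec: List[float],
               gallery: Dict[str, List[float]]) -> Tuple[str, float, float]:
    """Return (name, distance, similarity) of the closest gallery entry."""
    if not gallery:
        raise ValueError("gallery is empty")
    # Compare against all face retrieval feature vectors and keep the minimum.
    name, dist = min(
        ((n, euclidean(face_vec, v)) for n, v in gallery.items()),
        key=lambda pair: pair[1],
    )
    # Assumed monotone mapping: distance 0 -> 1.0, larger distance -> lower value.
    similarity = 1.0 / (1.0 + dist)
    return name, dist, similarity


gallery = {"Zhang San": [0.0, 0.0], "Li Si": [3.0, 4.0]}
print(match_face([3.0, 3.0], gallery))
```

A practical system would probably also apply a distance threshold so that a face far from every gallery entry is reported as unmatched; the text does not describe such a threshold.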
The similarity information acquisition module 66 compares the obtained text information with the set text retrieval information, to obtain the text similarity information between the text information of the video content and the text retrieval information. The text similarity information includes the correspondence between the text information and the text retrieval information.
Specifically, the process of obtaining the text similarity information includes:
The text information judging unit 84 of the similarity information acquisition module 66 judges whether the text information is identical to any text retrieval information.
If the text information is identical to any text retrieval information, the text similarity information acquisition unit 85 of the similarity information acquisition module 66 determines the identical text retrieval information as the text retrieval information corresponding to the text information.
If the text information differs from every text retrieval information, the text similarity information acquisition unit 85 of the similarity information acquisition module 66 determines that the text information does not correspond to any text retrieval information.
In addition, the similarity information acquisition module 66 compares the obtained pattern information with the set pattern retrieval information, to obtain the pattern similarity information between the pattern information of the video content and the pattern retrieval information. The pattern similarity information includes the correspondence between the pattern information and the pattern retrieval information.
Specifically, the pattern similarity information acquisition unit 86 of the similarity information acquisition module 66 directly determines the identical pattern retrieval information as the pattern retrieval information corresponding to the pattern information.
Furthermore, the similarity information acquisition module 66 compares the obtained sound information with the set sound retrieval information, to obtain the sound similarity information between the sound information of the video content and the sound retrieval information. The sound similarity information includes the correspondence between the sound information and the sound retrieval information.
Specifically, the sound similarity information acquisition unit 87 of the similarity information acquisition module 66 directly determines the identical sound retrieval information as the sound retrieval information corresponding to the sound information.
Finally, the content retrieval module 67 generates the content retrieval report of the video content according to the face similarity information, text similarity information, pattern similarity information and sound similarity information obtained by the similarity information acquisition module 66. The content retrieval report may include a face information retrieval report, a text information retrieval report, a pattern information retrieval report and a sound information retrieval report.
Specifically, the content retrieval module 67 determines the face information retrieval report according to the display period of the face information, the corresponding face retrieval information and the face similarity; for example, the face of "Zhang San" appears in the video content in the period from 14 minutes 30 seconds to 15 minutes, with a confidence of 95%. If the face of "Zhang San" appears multiple times within a period of the first set value (e.g. 30 seconds), the content retrieval module can merge the above face information retrieval reports, for example: the face of "Zhang San" appears three times in the video content in the period from 14 minutes 30 seconds to 15 minutes, with a confidence of 95%.
The content retrieval module 67 can also determine the text information retrieval report according to the display period of the text information and the corresponding text retrieval information; for example, the text "Zhang San" appears in the video content in the period from 14 minutes 30 seconds to 15 minutes. If the text "Zhang San" appears multiple times within a period of the second set value, the content retrieval module 67 can merge the above text information retrieval reports, for example: the text "Zhang San" appears three times in the video content in the period from 14 minutes 30 seconds to 15 minutes.
The content retrieval module 67 can also determine the pattern information retrieval report according to the display period of the pattern information and the corresponding pattern retrieval information; for example, the flag pattern of an illegal organization appears in the video content in the period from 14 minutes 30 seconds to 15 minutes. If the flag pattern of the illegal organization appears multiple times within a period of the third set value, the content retrieval module can merge the above pattern information retrieval reports, for example: the flag pattern of the illegal organization appears three times in the video content in the period from 14 minutes 30 seconds to 15 minutes.
The content retrieval module 67 can also determine the sound information retrieval report according to the display period of the sound information and the corresponding sound retrieval information; for example, a sound clip of "Zhang San" appears in the video content in the period from 14 minutes 30 seconds to 15 minutes. If the sound clip of "Zhang San" appears multiple times within a period of the fourth set value, the content retrieval module can merge the above sound information retrieval reports, for example: the sound clip of "Zhang San" appears three times in the video content in the period from 14 minutes 30 seconds to 15 minutes.
In this way, the content retrieval report can reflect whether the video content contains the set face retrieval information, pattern retrieval information, sound retrieval information and text retrieval information, and provides the confidence of the corresponding retrieval information.
This completes the video content retrieval process of the video content retrieval device 60 of this embodiment.
On the basis of the first embodiment, the video content retrieval device of this embodiment further improves the accuracy of video content retrieval by presetting the face neural network and the pattern neural network; the process of obtaining the face similarity information, text similarity information, pattern similarity information and sound similarity information in this embodiment further reduces the cost of generating the corresponding content retrieval report.
The specific working principle of the video content retrieval method and video content retrieval device of the present invention is illustrated below through a specific embodiment. Referring to Fig. 9a and Fig. 9b, Fig. 9a is a functional structure diagram of the server side corresponding to the specific embodiment of the video content retrieval method and video content retrieval device of the present invention; Fig. 9b is a video content retrieval flowchart of the specific embodiment of the video content retrieval method and video content retrieval device of the present invention. The video content retrieval device of this specific embodiment is installed in a server that can perform retrieval operations on video content. Referring to Fig. 9a, the server includes a video content access module 91, a video content processing module 92, a face recognition module 93, a text recognition module 94, an image recognition module 95, a speech recognition module 96, a data storage module 97 and an output module 98.
Referring to Fig. 9b, the process by which the video content retrieval device in this specific embodiment performs video content retrieval includes:
Step S901: the video content access module 91 receives, through the data interface 9A, a sensitive face database, sensitive sound information, sensitive image information, sensitive text information and so on, and stores the sensitive face database, sensitive sound information, sensitive image information and sensitive text information in the data storage module 97.
Step S902: the video content access module 91 receives the video content on which a retrieval operation is to be performed, and the video content processing module 92 then performs a content extraction operation on that video content. The video content is first separated into video frames and audio frames. The video content processing module 92 then detects face regions in the video images using the preset face neural network to obtain the face information of the video content; detects text regions in the video images using a text recognition algorithm to obtain the text information of the video content; detects pattern regions in the video images using the preset pattern neural network to obtain the pattern information of the video content; and performs transcoding, resampling, and speech recognition on the audio frames to obtain the sound information of the video content.
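As a minimal sketch of the frame-sampling part of step S902, assuming video images are taken at a fixed time interval (as the claims describe) and that the duration of the video is known, the extraction times could be computed as follows. The function name `sample_timestamps` and its parameters are illustrative and do not appear in the original; decoding the frames themselves is left to whatever video library an implementation uses.

```python
import math
from typing import List


def sample_timestamps(duration_s: float, interval_s: float) -> List[float]:
    """Return the times (in seconds) at which a video image is extracted.

    Hypothetical helper: timestamps start at 0 and stay strictly below
    the video duration, spaced by the set time interval.
    """
    if duration_s < 0 or interval_s <= 0:
        raise ValueError("duration must be >= 0 and interval must be > 0")
    # Multiply an integer counter instead of accumulating the interval,
    # so rounding errors do not build up over long videos.
    count = math.ceil(duration_s / interval_s)
    return [i * interval_s for i in range(count)]
```

For a 10-second video sampled every 2.5 seconds, this yields four extraction times, one per interval.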
Step S903: the face recognition module 93 compares the face information with the sensitive face information in the sensitive face database, determines whether there is sensitive face information corresponding to the face information, and, if so, obtains that sensitive face information.
For example, model alignment is performed on the face region to locate five landmark coordinates (left eye, right eye, nose, left mouth corner, right mouth corner); a feature vector is then extracted according to these five coordinates and used to query the data layer, which finds the most similar candidate and the corresponding confidence score; the result is returned to the output module.
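The feature-vector query in step S903 can be sketched as a nearest-neighbour search. The example below assumes Euclidean distance and a simple linear mapping from distance to a confidence score; the source only states that the most similar candidate and a confidence score are returned, so the distance metric, the `max_distance` cut-off, and the names `euclidean_distance` and `best_face_match` are all assumptions made for illustration.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_face_match(
    query: Sequence[float],
    database: Dict[str, Sequence[float]],
    max_distance: float,
) -> Optional[Tuple[str, float]]:
    """Return (name, score) for the nearest database entry, or None.

    The score is a hypothetical linear mapping of distance onto [0, 1];
    a real system would calibrate it on labelled data.
    """
    if max_distance <= 0:
        raise ValueError("max_distance must be > 0")
    if not database:
        return None
    # Pick the entry with the smallest vector distance to the query.
    name, dist = min(
        ((n, euclidean_distance(query, v)) for n, v in database.items()),
        key=lambda item: item[1],
    )
    if dist > max_distance:
        return None  # nothing in the database is close enough
    return name, 1.0 - dist / max_distance
```

A linear scan is enough for a small sensitive-face database; a larger one would typically use an approximate nearest-neighbour index instead.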
Step S904: the text recognition module 94 compares the text information with the sensitive text information and determines whether there is sensitive text information corresponding to the text information, so as to obtain the sensitive text information corresponding to the text information.
For example, the Youtu text recognition engine is used to perform text recognition and comparison on the pictures submitted by the video content access module, and the result is returned to the output module.
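The comparison in step S904 can be sketched as an exact-match lookup, which is how the claims describe text matching (a recognized string corresponds to a retrieval entry only if the two are identical). The sketch assumes the recognition engine has already produced one string per sampled video image; the function name `find_text_hits`, the `(timestamp, text)` input shape, and the whitespace trimming are assumptions for illustration.

```python
from typing import Iterable, List, Tuple


def find_text_hits(
    recognized: Iterable[Tuple[float, str]],
    retrieval_terms: Iterable[str],
) -> List[Tuple[float, str]]:
    """Return (timestamp, term) for each recognized string equal to a term.

    Leading and trailing whitespace is trimmed before comparison; this
    normalization is an assumption, not something the source specifies.
    """
    terms = {term.strip() for term in retrieval_terms}
    return [
        (timestamp, text.strip())
        for timestamp, text in recognized
        if text.strip() in terms
    ]
```

Keeping the timestamp with each hit gives the report stage the display period it needs for every matched entry.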
Step S905: the image recognition module 95 compares the pattern information with the sensitive pattern information and determines whether there is sensitive pattern information corresponding to the pattern information, so as to obtain the sensitive pattern information corresponding to the pattern information.
For example, the open-source model Faster R-CNN is used to perform image recognition and comparison on the pattern information submitted by the video content access module, and the result is fed back to the output module.
Step S906: the speech recognition module 96 compares the voice information with the sensitive voice information and determines whether there is sensitive voice information corresponding to the voice information, so as to obtain the sensitive voice information corresponding to the voice information.
For example, the WeChat Zhiling ASR speech recognition engine is used: the transcoded and resampled audio stream is sent to the speech recognition module, which performs speech recognition after offline VAD (voice activity detection) segmentation and returns the result to the output module.
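The source does not say which VAD algorithm is used for the offline segmentation. Purely as an illustration of the idea, the sketch below splits an audio sample sequence into voiced segments with a fixed energy threshold; the function name `split_by_energy`, the frame length, and the threshold are all hypothetical, and production VAD is usually considerably more sophisticated.

```python
from typing import List, Sequence, Tuple


def split_by_energy(
    samples: Sequence[float], frame_len: int, threshold: float
) -> List[Tuple[int, int]]:
    """Return (start, end) sample-index ranges judged to contain speech.

    A frame counts as voiced when its mean absolute amplitude reaches
    the threshold; consecutive voiced frames form one segment.
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be > 0")
    segments: List[Tuple[int, int]] = []
    seg_start = None
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(abs(s) for s in frame) / len(frame)
        if energy >= threshold:
            if seg_start is None:
                seg_start = i  # a voiced segment begins here
        elif seg_start is not None:
            segments.append((seg_start, i))  # silence closes the segment
            seg_start = None
    if seg_start is not None:
        segments.append((seg_start, len(samples)))
    return segments
```

Each returned range can then be passed to the speech recognition engine separately, so silence is never sent for recognition.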
Step S907: the output module 98 generates a content retrieval report for the video content according to the obtained sensitive face information, sensitive text information, sensitive pattern information, and sensitive voice information, and feeds the content retrieval report back to the client 9B. For example, the report may state that the face of "Zhang San" appears in the video content with a confidence of 95%, that the text "Zhang San" appears in the video content, or that a sound clip of "Zhang San" appears in the video content.
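The claims additionally describe merging retrieval reports whose display periods are separated by less than a set value. A minimal sketch of that merging step follows, assuming each report for a given matched item is reduced to a `(start, end)` display period in seconds; the function name `merge_periods` and the parameter `max_gap` are illustrative names, not taken from the source.

```python
from typing import List, Sequence, Tuple


def merge_periods(
    periods: Sequence[Tuple[float, float]], max_gap: float
) -> List[Tuple[float, float]]:
    """Merge display periods whose time difference is below max_gap.

    Intended to be called once per matched item (for example, once for
    each recognized face), so unrelated hits are never merged together.
    """
    merged: List[Tuple[float, float]] = []
    for start, end in sorted(periods):
        if merged and start - merged[-1][1] < max_gap:
            # Close enough to the previous period: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged
```

With a set value of 2 seconds, periods (0, 2) and (3, 5) collapse into (0, 5), while (10, 12) remains a separate report entry.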
This completes the video content retrieval process of the video content retrieval method and video content retrieval device of this specific embodiment.
The video content retrieval method and video content retrieval device of the present invention simultaneously retrieve the face information, text information, pattern information, and sound information in the video content, which improves the accuracy of video content retrieval and reduces its cost; this effectively solves the technical problem that existing video content retrieval methods and devices have high cost and low accuracy.
As used herein, the terms "component", "module", "system", "interface", "process", and the like are generally intended to refer to a computer-related entity: hardware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a controller and the controller can be a component. One or more components may reside within a process and/or thread of execution, and a component may be localized on one computer and/or distributed between two or more computers.
Figure 10 and the following discussion provide a brief, general description of the working environment of the electronic device in which the video content retrieval device of the present invention is implemented. The working environment of Figure 10 is only one example of a suitable working environment and is not intended to suggest any limitation as to the scope of use or functionality of the working environment. Example electronic devices 1012 include, but are not limited to, wearable devices, head-mounted devices, medical and health platforms, personal computers, server computers, handheld or laptop devices, mobile devices (such as mobile phones, personal digital assistants (PDAs), media players, and the like), multiprocessor systems, consumer electronics, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
Although not required, embodiments are described in the general context of "computer-readable instructions" being executed by one or more electronic devices. Computer-readable instructions may be distributed via computer-readable media (discussed below). Computer-readable instructions may be implemented as program modules, such as functions, objects, application programming interfaces (APIs), data structures, and the like, that perform particular tasks or implement particular abstract data types. Typically, the functionality of the computer-readable instructions may be combined or distributed as desired in various environments.
Figure 10 illustrates an example of an electronic device 1012 that includes one or more embodiments of the video content retrieval device of the present invention. In one configuration, the electronic device 1012 includes at least one processing unit 1016 and memory 1018. Depending on the exact configuration and type of electronic device, the memory 1018 may be volatile (such as RAM), non-volatile (such as ROM, flash memory, etc.), or some combination of the two. This configuration is illustrated in Figure 10 by dashed line 1014.
In other embodiments, the electronic device 1012 may include additional features and/or functionality. For example, the device 1012 may also include additional storage (e.g., removable and/or non-removable), including, but not limited to, magnetic storage, optical storage, and the like. Such additional storage is illustrated in Figure 10 by storage device 1020. In one embodiment, computer-readable instructions for implementing one or more embodiments provided herein may be in the storage device 1020. The storage device 1020 may also store other computer-readable instructions for implementing an operating system, application programs, and the like. Computer-readable instructions may be loaded into the memory 1018 for execution by, for example, the processing unit 1016.
The term "computer-readable media" as used herein includes computer storage media. Computer storage media include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions or other data. The memory 1018 and the storage device 1020 are examples of computer storage media. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVDs) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by the electronic device 1012. Any such computer storage media may be part of the electronic device 1012.
The electronic device 1012 may also include a communication connection 1026 that allows the electronic device 1012 to communicate with other devices. The communication connection 1026 may include, but is not limited to, a modem, a network interface card (NIC), an integrated network interface, a radio-frequency transmitter/receiver, an infrared port, a USB connection, or other interfaces for connecting the electronic device 1012 to other electronic devices. The communication connection 1026 may include a wired connection or a wireless connection. The communication connection 1026 may transmit and/or receive communication media.
The term "computer-readable media" may include communication media. Communication media typically embody computer-readable instructions or other data in a "modulated data signal" such as a carrier wave or other transport mechanism, and include any information delivery media. The term "modulated data signal" may include a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.
The electronic device 1012 may include an input device 1024, such as a keyboard, mouse, pen, voice input device, touch input device, infrared camera, video input device, and/or any other input device. The device 1012 may also include an output device 1022, such as one or more displays, speakers, printers, and/or any other output device. The input device 1024 and the output device 1022 may be connected to the electronic device 1012 via a wired connection, a wireless connection, or any combination thereof. In one embodiment, an input device or an output device from another electronic device may be used as the input device 1024 or the output device 1022 of the electronic device 1012.
The components of the electronic device 1012 may be connected by various interconnects, such as a bus. Such interconnects may include a Peripheral Component Interconnect (PCI) bus (such as PCI Express), a Universal Serial Bus (USB), FireWire (IEEE 1394), an optical bus structure, and the like. In another embodiment, the components of the electronic device 1012 may be interconnected by a network. For example, the memory 1018 may be composed of multiple physical memory units located in different physical locations and interconnected by a network.
Those skilled in the art will recognize that storage devices used to store computer-readable instructions may be distributed across a network. For example, an electronic device 1030 accessible via a network 1028 may store computer-readable instructions for implementing one or more embodiments provided by the present invention. The electronic device 1012 may access the electronic device 1030 and download part or all of the computer-readable instructions for execution. Alternatively, the electronic device 1012 may download pieces of the computer-readable instructions as needed, or some instructions may be executed at the electronic device 1012 and some at the electronic device 1030.
Various operations of embodiments are provided herein. In one embodiment, one or more of the operations may constitute computer-readable instructions stored on one or more computer-readable media which, when executed by an electronic device, cause the computing device to perform the operations. The order in which some or all of the operations are described should not be construed as implying that these operations are necessarily order-dependent. Alternative orderings will be appreciated by those skilled in the art having the benefit of this specification. Further, it should be understood that not all operations are necessarily present in each embodiment provided herein.
Moreover, although the disclosure has been shown and described with respect to one or more implementations, equivalent alterations and modifications will occur to those skilled in the art based on a reading and understanding of this specification and the drawings. The disclosure includes all such modifications and alterations and is limited only by the scope of the following claims. In particular, regarding the various functions performed by the components described above (e.g., elements, resources, etc.), the terms used to describe such components are intended to correspond, unless otherwise indicated, to any component that performs the specified function of the described component (i.e., that is functionally equivalent), even if it is not structurally equivalent to the disclosed structure that performs the function in the exemplary implementations of the disclosure illustrated herein. In addition, although a particular feature of the disclosure may have been disclosed with respect to only one of several implementations, such a feature may be combined with one or more other features of the other implementations as may be desired and advantageous for any given or particular application. Furthermore, to the extent that the terms "includes", "has", "contains", or variants thereof are used in either the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term "comprising".
The functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist physically alone, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like. Each of the above devices or systems may perform the method in the corresponding method embodiment.
In summary, although the present invention has been disclosed above by way of embodiments, the serial numbers preceding the embodiments are used only for convenience of description and do not limit the order of the embodiments of the present invention. Moreover, the above embodiments are not intended to limit the present invention; those of ordinary skill in the art may make various changes and modifications without departing from the spirit and scope of the present invention, and the protection scope of the present invention is therefore defined by the scope of the claims.
Claims (15)
1. A video content retrieval method, characterized by comprising:
obtaining face retrieval information, pattern retrieval information, sound retrieval information, and text retrieval information corresponding to video content;
extracting audio frames of the video content, and extracting video images of the video content at a set time interval;
detecting the video images and the audio frames using a preset detection algorithm to obtain face information, text information, pattern information, and sound information of the video content;
obtaining face similarity information between the face information of the video content and the face retrieval information, text similarity information between the text information of the video content and the text retrieval information, pattern similarity information between the pattern information of the video content and the pattern retrieval information, and sound similarity information between the sound information of the video content and the sound retrieval information; and
generating a content retrieval report according to the face similarity information, the text similarity information, the pattern similarity information, and the sound similarity information.
2. The video content retrieval method according to claim 1, characterized in that the step of detecting the video images and the video frames using the preset detection algorithm to obtain the face information, text information, pattern information, and sound information of the video content comprises:
detecting face regions in the video images using a preset face neural network to obtain the face information of the video content; detecting text regions in the video images using a text recognition algorithm to obtain the text information of the video content; detecting pattern regions in the video images using a preset pattern neural network to obtain the pattern information of the video content; and performing a speech recognition operation on the audio frames to obtain the sound information of the video content.
3. The video content retrieval method according to claim 2, characterized in that the video content retrieval method further comprises:
obtaining a plurality of face sample images, and training the preset face neural network using the plurality of face sample images;
obtaining a plurality of pattern sample images, and training the preset pattern neural network using the plurality of pattern sample images.
4. The video content retrieval method according to claim 2, characterized in that the step of obtaining the face similarity information between the face information of the video content and the face retrieval information comprises:
obtaining a face information feature vector of the face information of the video content and a face retrieval information feature vector of the face retrieval information;
calculating the vector distances between the face information feature vector of the face information and the face retrieval information feature vectors of all the face retrieval information;
determining, according to the smallest vector distance, the face retrieval information corresponding to the face information and the corresponding face similarity.
5. The video content retrieval method according to claim 2, characterized in that
the step of obtaining the text similarity information between the text information of the video content and the text retrieval information comprises:
determining whether the text information is identical to any text retrieval information;
if so, determining the identical text retrieval information as the text retrieval information corresponding to the text information; if not, the text information does not correspond to any text retrieval information;
the step of obtaining the pattern similarity information between the pattern information of the video content and the pattern retrieval information comprises:
determining the identical pattern retrieval information as the pattern retrieval information corresponding to the pattern information;
the step of obtaining the sound similarity information between the sound information of the video content and the sound retrieval information comprises:
determining the identical sound retrieval information as the sound retrieval information corresponding to the sound information.
6. The video content retrieval method according to claim 1, characterized in that the step of generating the content retrieval report according to the face similarity information, the text similarity information, the pattern similarity information, and the sound similarity information comprises:
determining a face information retrieval report according to the display period of the face information, the corresponding face retrieval information, and the face similarity; determining a text information retrieval report according to the display period of the text information and the corresponding text retrieval information; determining a pattern information retrieval report according to the display period of the pattern information and the corresponding pattern retrieval information; and determining a sound information retrieval report according to the display period of the sound information and the corresponding sound retrieval information.
7. The video content retrieval method according to claim 6, characterized in that the step of generating the content retrieval report according to the face similarity information, the text similarity information, the pattern similarity information, and the sound similarity information further comprises:
merging face information retrieval reports whose display periods differ in time by less than a first set value; merging text information retrieval reports whose display periods differ in time by less than a second set value; merging pattern information retrieval reports whose display periods differ in time by less than a third set value; and merging sound information retrieval reports whose display periods differ in time by less than a fourth set value.
8. A video content retrieval device, characterized by comprising:
a retrieval information obtaining module, configured to obtain face retrieval information, pattern retrieval information, sound retrieval information, and text retrieval information corresponding to video content;
a content extraction module, configured to extract audio frames of the video content and to extract video images of the video content at a set time interval;
a content detection module, configured to detect the video images and the audio frames using a preset detection algorithm to obtain face information, text information, pattern information, and sound information of the video content;
a similarity information obtaining module, configured to obtain face similarity information between the face information of the video content and the face retrieval information, text similarity information between the text information of the video content and the text retrieval information, pattern similarity information between the pattern information of the video content and the pattern retrieval information, and sound similarity information between the sound information of the video content and the sound retrieval information; and
a content retrieval module, configured to generate a content retrieval report according to the face similarity information, the text similarity information, the pattern similarity information, and the sound similarity information.
9. The video content retrieval device according to claim 8, characterized in that the content detection module comprises:
a face information detection unit, configured to detect face regions in the video images using a preset face neural network to obtain the face information of the video content;
a text information detection unit, configured to detect text regions in the video images using a text recognition algorithm to obtain the text information of the video content;
a pattern information detection unit, configured to detect pattern regions in the video images using a preset pattern neural network to obtain the pattern information of the video content;
a sound information detection unit, configured to perform a speech recognition operation on the audio frames to obtain the sound information of the video content.
10. The video content retrieval device according to claim 9, characterized in that the video content retrieval device comprises:
a face neural network training module, configured to obtain a plurality of face sample images and to train the preset face neural network using the plurality of face sample images;
a pattern neural network training module, configured to obtain a plurality of pattern sample images and to train the preset pattern neural network using the plurality of pattern sample images.
11. The video content retrieval device according to claim 9, characterized in that the similarity information obtaining module comprises:
a feature vector obtaining unit, configured to obtain a face information feature vector of the face information of the video content and a face retrieval information feature vector of the face retrieval information;
a vector distance calculation unit, configured to calculate the vector distances between the face information feature vector of the face information and the face retrieval information feature vectors of all the face retrieval information;
a face similarity information obtaining unit, configured to determine, according to the smallest vector distance, the face retrieval information corresponding to the face information and the corresponding face similarity.
12. The video content retrieval device according to claim 9, characterized in that the similarity information obtaining module comprises:
a text information judging unit, configured to determine whether the text information is identical to any text retrieval information;
a text similarity information obtaining unit, configured to, if the text information is identical to any text retrieval information, determine the identical text retrieval information as the text retrieval information corresponding to the text information, and, if the text information is not identical to any text retrieval information, determine that the text information does not correspond to any text retrieval information;
a pattern similarity information obtaining unit, configured to determine the identical pattern retrieval information as the pattern retrieval information corresponding to the pattern information;
a sound similarity information obtaining unit, configured to determine the identical sound retrieval information as the sound retrieval information corresponding to the sound information.
13. The video content retrieval device according to claim 8, characterized in that the content retrieval module is specifically configured to determine a face information retrieval report according to the display period of the face information, the corresponding face retrieval information, and the face similarity; determine a text information retrieval report according to the display period of the text information and the corresponding text retrieval information; determine a pattern information retrieval report according to the display period of the pattern information and the corresponding pattern retrieval information; and determine a sound information retrieval report according to the display period of the sound information and the corresponding sound retrieval information.
14. The video content retrieval device according to claim 13, characterized in that the content retrieval module is further configured to merge face information retrieval reports whose display periods differ in time by less than a first set value; merge text information retrieval reports whose display periods differ in time by less than a second set value; merge pattern information retrieval reports whose display periods differ in time by less than a third set value; and merge sound information retrieval reports whose display periods differ in time by less than a fourth set value.
15. A storage medium storing processor-executable instructions, the instructions being loaded by one or more processors to perform the video content retrieval method according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811009469.6A CN110209880A (en) | 2018-08-31 | 2018-08-31 | Video content retrieval method, Video content retrieval device and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110209880A (en) | 2019-09-06 |
Family
ID=67779926
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201811009469.6A | Pending | CN110209880A (en) | 2018-08-31 | 2018-08-31 | Video content retrieval method, Video content retrieval device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110209880A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106610969A (en) * | 2015-10-21 | 2017-05-03 | 上海文广互动电视有限公司 | Multimodal information-based video content auditing system and method |
CN107688571A (en) * | 2016-08-04 | 2018-02-13 | 上海德拓信息技术股份有限公司 | The video retrieval method of diversification |
CN108171207A (en) * | 2018-01-17 | 2018-06-15 | 百度在线网络技术(北京)有限公司 | Face identification method and device based on video sequence |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110866491A (en) * | 2019-11-13 | 2020-03-06 | 腾讯科技(深圳)有限公司 | Target retrieval method, device, computer readable storage medium and computer equipment |
CN110866491B (en) * | 2019-11-13 | 2023-11-24 | 腾讯科技(深圳)有限公司 | Target retrieval method, apparatus, computer-readable storage medium, and computer device |
CN112132794A (en) * | 2020-09-14 | 2020-12-25 | 杭州安恒信息技术股份有限公司 | Text positioning method, device and equipment for audit video and readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||