CN109495766A - A kind of method, apparatus, equipment and the storage medium of video audit - Google Patents


Info

Publication number
CN109495766A
CN109495766A
Authority
CN
China
Prior art keywords
video
audit
classification
content information
pending video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811438719.8A
Other languages
Chinese (zh)
Inventor
石峰
刘振强
梁柱锦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Baiguoyuan Information Technology Co Ltd
Original Assignee
Guangzhou Baiguoyuan Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Baiguoyuan Information Technology Co Ltd filed Critical Guangzhou Baiguoyuan Information Technology Co Ltd
Priority to CN201811438719.8A priority Critical patent/CN109495766A/en
Publication of CN109495766A publication Critical patent/CN109495766A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23418Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/635Overlay text, e.g. embedded captions in a TV program
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/233Processing of audio elementary streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/235Processing of additional data, e.g. scrambling of additional data or processing content descriptors
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/435Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439Processing of audio elementary streams
    • H04N21/4394Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method, apparatus, device, and storage medium for video review. The method comprises: extracting content information of different types from a pending video; fusing the feature vectors of the different types of content information according to the correlations between them, to obtain a feature vector of the pending video; obtaining, from the feature vector of the pending video, the pending video's weight under each of a set of preset review categories; and determining the review category of the pending video according to the weights under the different preset review categories. The technical solution provided by the embodiments of the invention realizes video review with multi-type fusion, solves the limitations of prior-art video review, and improves the comprehensiveness and accuracy of video review.

Description

Method, apparatus, device, and storage medium for video review
Technical field
Embodiments of the present invention relate to the field of Internet technologies, and in particular to a method, apparatus, device, and storage medium for video review.
Background art
With the rapid development of Internet technology, Internet video traffic has grown significantly in recent years, and novel forms of user-generated content such as short videos and live streams have made the videos spread on the Internet increasingly rich. At the same time, large numbers of violating videos involving topics such as terror, violence, pornography, or political sensitivity are uploaded to the Internet by malicious users and spread rapidly. How to filter out these violating videos efficiently and at low cost is a common difficulty faced by Internet video applications such as short-video and live-streaming products.
Because the volume of video resources spread on the Internet keeps growing, manually reviewing every video uploaded to the Internet for violating content necessarily consumes substantial labor and is inefficient. At present, machine learning is typically used to review video content automatically: information under a single modality, such as the pictures, text, or sound in the video content, is analyzed to judge whether the current video contains violating content, and the Internet video is reviewed accordingly.
However, the prior art mostly analyzes only information of a single type in the video, such as pictures, text, or sound, to determine whether the video content violates the rules. This makes the review of Internet videos limited and reduces the accuracy of video review.
Summary of the invention
Embodiments of the present invention provide a method, apparatus, device, and storage medium for video review, to solve the limitations of prior-art video review, realize video review with multi-type fusion, and improve the comprehensiveness and accuracy of video review.
In a first aspect, an embodiment of the present invention provides a video review method, comprising:
extracting content information of different types from a pending video;
fusing the feature vectors of the different types of content information according to the correlations between them, to obtain a feature vector of the pending video;
obtaining, from the feature vector of the pending video, the pending video's weight under each of a set of preset review categories; and
determining the review category of the pending video according to the weights under the different preset review categories.
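The four steps above can be sketched end to end. Everything here is illustrative: random projections stand in for the learned per-type sub-models, fusion by concatenation and the category count of four are assumptions for the sketch, not the patent's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_features(video):
    """Stand-in extractors for the three content types (pictures, audio, text).

    In the patent each type is encoded by its own learning sub-model; here
    fixed random vectors illustrate only the data flow and dimensions.
    """
    return {
        "pictures": rng.standard_normal(128),
        "audio": rng.standard_normal(64),
        "text": rng.standard_normal(32),
    }

def fuse(features):
    # A learned fusion sub-model would exploit cross-type correlation;
    # concatenation is the minimal placeholder for it.
    return np.concatenate([features["pictures"], features["audio"], features["text"]])

def category_weights(fused, n_categories=4):
    # Softmax over a linear map, standing in for the preset regression function.
    W = rng.standard_normal((n_categories, fused.size)) * 0.01
    logits = W @ fused
    e = np.exp(logits - logits.max())
    return e / e.sum()

feats = extract_features("pending_video.mp4")
weights = category_weights(fuse(feats))
print(float(weights.sum()))   # the weights form a distribution over review categories
print(int(weights.argmax()))  # index of the predicted review category
```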
Further, fusing the feature vectors of the different types of content information according to the correlations between them to obtain the feature vector of the pending video comprises:
inputting each type of content information into a pre-built fusion learning model, and extracting the feature vector of each type of content information through the per-type learning sub-models in the fusion learning model; and
fusing the feature vectors of the different types of content information through the fusion sub-model in the fusion learning model according to the correlations between them, to obtain the feature vector of the pending video.
Further, obtaining the pending video's weight under each preset review category from the feature vector of the pending video comprises:
obtaining, from the feature vector of the pending video, the weight of the pending video under each preset review category through a preset regression function in the fusion sub-model.
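A minimal sketch of such a preset regression function, assuming it is a softmax over per-category scores; the patent does not fix the function, and the category names and score values below are hypothetical:

```python
import math

def softmax(logits):
    # Numerically stable softmax: maps per-category scores derived from the
    # fused feature vector to weights over the preset review categories.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Hypothetical scores for categories: normal, violent, pornographic, politically sensitive
scores = [2.0, 0.1, -1.0, 0.5]
w = softmax(scores)
print([round(x, 3) for x in w])
```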
Further, the fusion learning model is built by performing the following operations:
extracting different types of sample content information from a training sample, where the training sample is a historical video under a target review category;
extracting the feature vector of each piece of sample content information through the per-type learning sub-models, and fusing the feature vectors of the sample content information through the fusion sub-model according to the correlations between them, to obtain the feature vector of the training sample;
obtaining, from the feature vector of the training sample, the training sample's weight under each preset review category through the preset regression function in the fusion sub-model;
determining a classification loss from the training sample's target review category and its weights under the different preset review categories, back-propagating the classification loss to revise each learning sub-model and the fusion sub-model, and continuing to acquire new training samples under the target review category until the classification loss under the target review category falls below a preset loss threshold; and
reacquiring training samples under the other review categories and training again, until the classification losses under all preset review categories fall below their corresponding preset loss thresholds, and then building the resulting learning sub-models and fusion sub-model into the fusion learning model.
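The train-until-below-threshold loop can be illustrated with a single logistic model on synthetic data. The real model comprises per-type learning sub-models plus a fusion sub-model, so this is a data-flow sketch only; the loss threshold, learning rate, and data are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in: fused feature vectors for historical videos of one
# target review category (label 1) versus all others (label 0).
X = rng.standard_normal((200, 8))
true_w = rng.standard_normal(8)
y = (X @ true_w > 0).astype(float)

w = np.zeros(8)
loss_threshold = 0.35  # the patent's "preset loss threshold" (value hypothetical)
lr = 0.5

for step in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))  # predicted weight for the target category
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    if loss < loss_threshold:           # stop once the classification loss is low enough
        break
    grad = X.T @ (p - y) / len(y)       # gradient of the classification loss
    w -= lr * grad                      # back-propagation step revising the model

print(round(float(loss), 3))
```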
Further, the different types of content information include the picture sequence, audio sequence, and text sequence in the pending video.
Further, extracting the different types of content information from the pending video comprises:
segmenting the pending video, and extracting video frames from the segments; and
combining the extracted video frames to obtain the corresponding picture sequence.
Further, extracting the different types of content information from the pending video comprises:
segmenting the pending video, and resampling the audio information in the segments;
extracting spectral features from the resampled audio information through the Mel-frequency cepstrum (MFC) algorithm; and
combining the extracted spectral features to obtain the corresponding audio sequence.
Further, extracting the different types of content information from the pending video comprises:
obtaining the text information in the pending video through an optical character recognition (OCR) algorithm, to obtain the corresponding text sequence.
Further, determining the review category of the pending video according to the weights under the different preset review categories comprises:
if the pending video's weight under a violation category exceeds a preset violation threshold, sending the pending video to a manual review platform; and
determining the review category of the pending video according to feedback information from the manual review platform.
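A toy routing function for this rule. The threshold value of 0.5 and the category names are assumptions, and in the real system the manual review platform's feedback, not the model, fixes the final category of an escalated video:

```python
def route_video(category_weights, violation_threshold=0.5):
    """Route a pending video by its weight under the violation categories.

    Any non-"normal" category weight above the preset violation threshold
    sends the video to manual review; otherwise the top category is kept.
    """
    violating = {c: w for c, w in category_weights.items()
                 if c != "normal" and w > violation_threshold}
    if violating:
        return "manual_review"  # final category then comes from platform feedback
    return max(category_weights, key=category_weights.get)

print(route_video({"normal": 0.2, "violent": 0.7, "pornographic": 0.1}))
print(route_video({"normal": 0.9, "violent": 0.05, "pornographic": 0.05}))
```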
In a second aspect, an embodiment of the present invention provides a video review apparatus, comprising:
an information extraction module, configured to extract the different types of content information from a pending video;
a feature fusion module, configured to fuse the feature vectors of the different types of content information according to the correlations between them, to obtain a feature vector of the pending video;
a weight determination module, configured to obtain, from the feature vector of the pending video, the pending video's weight under each preset review category; and
a review category determination module, configured to determine the review category of the pending video according to the weights under the different preset review categories.
Further, the feature fusion module is specifically configured to:
input each type of content information into the pre-built fusion learning model, and extract the feature vector of each type of content information through the per-type learning sub-models in the fusion learning model; and
fuse the feature vectors of the different types of content information through the fusion sub-model in the fusion learning model according to the correlations between them, to obtain the feature vector of the pending video.
Further, the weight determination module is specifically configured to:
obtain, from the feature vector of the pending video, the weight of the pending video under each preset review category through the preset regression function in the fusion sub-model.
Further, the fusion learning model is built by performing the following operations:
extracting different types of sample content information from a training sample, where the training sample is a historical video under a target review category;
extracting the feature vector of each piece of sample content information through the per-type learning sub-models, and fusing the feature vectors of the sample content information through the fusion sub-model according to the correlations between them, to obtain the feature vector of the training sample;
obtaining, from the feature vector of the training sample, the training sample's weight under each preset review category through the preset regression function in the fusion sub-model;
determining a classification loss from the training sample's target review category and its weights under the different preset review categories, back-propagating the classification loss to revise each learning sub-model and the fusion sub-model, and continuing to acquire new training samples under the target review category until the classification loss under the target review category falls below a preset loss threshold; and
reacquiring training samples under the other review categories and training again, until the classification losses under all preset review categories fall below their corresponding preset loss thresholds, and then building the resulting learning sub-models and fusion sub-model into the fusion learning model.
Further, the different types of content information include the picture sequence, audio sequence, and text sequence in the pending video.
Further, the information extraction module is specifically configured to:
segment the pending video, and extract video frames from the segments; and
combine the extracted video frames to obtain the corresponding picture sequence.
Further, the information extraction module is also specifically configured to:
segment the pending video, and resample the audio information in the segments;
extract spectral features from the resampled audio information through the Mel-frequency cepstrum (MFC) algorithm; and
combine the extracted spectral features to obtain the corresponding audio sequence.
Further, the information extraction module is also specifically configured to:
obtain the text information in the pending video through an optical character recognition (OCR) algorithm, to obtain the corresponding text sequence.
Further, the review category determination module is specifically configured to:
if the pending video's weight under a violation category exceeds the preset violation threshold, send the pending video to a manual review platform; and
determine the review category of the pending video according to feedback information from the manual review platform.
In a third aspect, an embodiment of the present invention provides a device, comprising:
one or more processors; and
a storage apparatus for storing one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the video review method described in any embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored, where the program, when executed by a processor, implements the video review method described in any embodiment of the present invention.
Embodiments of the present invention provide a method, apparatus, device, and storage medium for video review. By extracting the feature vectors of the different types of content information in a pending video, fusing those feature vectors according to the correlations between them, and obtaining from the fused feature vector the pending video's weight under each preset review category, the review category of the pending video is determined. This realizes video review with multi-type fusion, solves the limitations of prior-art video review, and improves the comprehensiveness and accuracy of video review.
Brief description of the drawings
Other features, objects, and advantages of the present invention will become more apparent upon reading the following detailed description of non-restrictive embodiments with reference to the accompanying drawings:
Fig. 1 is a flowchart of a video review method provided by Embodiment 1 of the present invention;
Fig. 2 is a schematic diagram of the video review process provided by Embodiment 2 of the present invention;
Fig. 3 is a schematic diagram of building the fusion learning model in the video review method provided by Embodiment 3 of the present invention;
Fig. 4A is a scene architecture diagram of an application scenario to which the video review method provided by Embodiment 4 of the present invention is applicable;
Fig. 4B is a schematic diagram of the video review process provided by Embodiment 4 of the present invention;
Fig. 5 is a structural diagram of a video review apparatus provided by Embodiment 5 of the present invention;
Fig. 6 is a structural diagram of a device provided by Embodiment 6 of the present invention.
Specific embodiments
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are used only to explain the present invention, not to limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.
Embodiments of the present invention involve various Internet technologies such as optical character recognition (Optical Character Recognition, OCR), natural language processing, speech recognition, computer vision, and machine learning, and are primarily applicable to the content review of videos in Internet video streams uploaded by users, such as short videos and live-streamed videos, a novel form of user-generated content (User Generated Content, UGC). Embodiments of the present invention mainly use multiple deep neural network models to analyze the different types of content information in a large number of existing videos, and effectively mine the correlations between the types of content information so as to fuse them, obtaining corresponding trained sub-models. The trained sub-models are built into a fusion learning model; the different types of content information in a pending video are fused according to the fusion learning model to obtain the weights under the different preset review categories, thereby determining the review category of the pending video. The inherent correlations between the types of content information in a pending video can thus be effectively mined, improving the comprehensiveness and accuracy of video review.
Embodiment 1
Fig. 1 is a flowchart of a video review method provided by Embodiment 1 of the present invention. This embodiment can be applied to any video review terminal that reviews original videos uploaded by users, and the scheme of this embodiment is applicable to solving the limitations of video review. The video review method provided by this embodiment can be executed by the video review apparatus provided by embodiments of the present invention; the apparatus can be implemented in software and/or hardware and integrated in the device that executes the method, where the device can be any intelligent terminal device with the corresponding video review capability.
Specifically, with reference to Fig. 1, the method may include the following steps:
S110: extract the different types of content information from the pending video.
Here, the pending video can be any user-original video that has been uploaded to the Internet through any kind of application for distribution and is awaiting review, such as a short video recorded by a user or a live-streamed video. To prevent user-original videos containing violating content from spreading rapidly on the Internet, the content of every kind of user-original video needs to be reviewed before distribution, and videos containing violating content filtered out. The different types of content information are the pieces of content information under the separate types isolated from the pending video, such as pictures, sound, and text; in this embodiment they may include the picture sequence, audio sequence, and text sequence in the pending video. Specifically, to overcome the limitations of video review, the content of each type in the pending video must first be determined, so that the inherent correlations between the different types of content information can subsequently be mined effectively to obtain fused global information.
Optionally, this embodiment first needs to obtain the corresponding pending video. A wireless connection can be established with each user terminal so that the pending video is obtained when the user uploads the corresponding video; alternatively, an administrator can directly input the pending video into the terminal device that executes the video review method provided in this embodiment, thereby obtaining the pending video directly. After the pending video is obtained, the different types of content information it contains, such as pictures, sound, and text, can first be extracted separately through existing information isolation techniques, so that the inherent correlations between the different types of content information in the pending video can subsequently be mined effectively to obtain the global information features of the pending video.
Optionally, since the different types of content information in this embodiment may include the picture sequence, audio sequence, and text sequence in the pending video, extracting the different types of content information from the pending video may include extracting each of these three kinds of content information separately.
1) When extracting the picture sequence from the pending video in this embodiment, extracting the different types of content information from the pending video may specifically include: segmenting the pending video, and extracting video frames from the segments; and combining the extracted video frames to obtain the corresponding picture sequence.
Specifically, since consecutive frames of the obtained pending video may be similar, the pending video can be segmented at a certain time interval in this embodiment to reduce the amount of data to be processed, so that the video frames within each segment have a certain similarity. One frame is then randomly selected from all the video frames in each segment, and the other frames in that segment are discarded. This ensures that the frames extracted from different segments differ from one another to a certain degree, greatly reducing frame redundancy during processing and improving the subsequent analysis rate of the video frames.
Further, since the frame sizes of pending videos uploaded by different users may differ, and the picture sequences of different pending videos must have identical dimensions for subsequent analysis, each extracted video frame needs to be scaled to a predefined size after extraction. The scaled frames are then combined in the time order of the segments to obtain the picture sequence among the different types of content information in the pending video, so that the picture sequence can subsequently be analyzed and fused, improving the rate of video review.
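The segment-and-sample scheme described above can be sketched as pure index arithmetic. The one-second segment length is an assumed parameter, and the actual decoding and scaling of the chosen frames (e.g. with OpenCV) is omitted:

```python
import random

def sample_frame_indices(n_frames, fps, segment_seconds=1.0, seed=0):
    """Pick one random frame per fixed-length segment, discarding the rest.

    Within a segment consecutive frames are similar, so one frame per
    segment keeps the picture sequence small while preserving time order.
    """
    rng = random.Random(seed)
    per_segment = max(1, int(fps * segment_seconds))
    indices = []
    for start in range(0, n_frames, per_segment):
        end = min(start + per_segment, n_frames)
        indices.append(rng.randrange(start, end))
    return indices  # one index per segment, in time order

idx = sample_frame_indices(n_frames=300, fps=30)
print(len(idx))  # 10 one-second segments -> 10 sampled frames
```

Each sampled frame would then be decoded, scaled to the predefined size, and stacked into the picture sequence.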
2) When extracting the audio sequence from the pending video in this embodiment, extracting the different types of content information from the pending video may specifically include: segmenting the pending video, and resampling the audio information in the segments; extracting spectral features from the resampled audio information through the Mel-frequency cepstrum (Mel-Frequency Cepstrum, MFC) algorithm; and combining the extracted spectral features to obtain the corresponding audio sequence.
Here, the MFC is a spectrum that can be used to represent short-term speech signals; it is based on a log spectrum expressed on the nonlinear mel scale and its linear cosine transform. The frequency bands of the MFC are uniformly distributed on the mel scale, which approximates the nonlinear human auditory system more closely than the linearly spaced bands of ordinary cepstrum representations. The MFC algorithm is therefore commonly chosen to observe speech features in voice recognition systems, for example to automatically recognize the digits a user speaks over the telephone.
Specifically, in order to subsequently analyze the inherent correlation between the audio information and the picture sequence, after the pending video is obtained it can be segmented at the same time intervals used when processing the picture sequence, so as to guarantee the match between the audio and the picture sequence. Since different pending videos may have been recorded with different audio sampling frequencies, the audio information in each segment of the pending video needs to be resampled at a predefined frequency, so that subsequent processing operates on audio information at the same frequency.
Further, after the resampled audio information of each segment is obtained, the spectral features of the resampled audio in each segment can be extracted by the MFC algorithm in order to analyze the audio of each segment and recognize the speech information in it. Because the MFC algorithm cannot obtain the spectral features of a whole segment's resampled audio in a single pass, a sliding window of predefined fixed size can be slid over the audio information of each segment, and the MFC algorithm extracts the spectral features of the audio signal inside the window at each step. Since successive window positions may overlap and contain the same audio information, the spectral features of that audio may be extracted repeatedly, which ensures the completeness and accuracy of the spectral features of the audio signal and improves the audit effect. After the spectral features of the resampled audio information are extracted, they can be combined in the corresponding time order to obtain the audio sequence among the different types of content information in the pending video, so that the audio sequence can subsequently be analyzed and fused, improving the speed of the video audit. In addition, other audio feature extraction algorithms, such as the linear prediction cepstral coefficient algorithm, may be used in the present embodiment in place of the MFC algorithm to extract the spectral features of the resampled audio information.
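The sliding-window extraction described above can be sketched as follows. For simplicity this example computes a per-window log power spectrum with NumPy instead of full mel-cepstral features (a real MFC pipeline would additionally apply a mel filterbank and a discrete cosine transform); the 16 kHz resampling rate and the window/hop sizes are assumptions.

```python
import numpy as np

def frame_signal(audio, win_len=400, hop=160):
    """Slice audio into overlapping sliding windows; hop < win_len, so
    adjacent windows share samples, as described in the text above."""
    n_frames = 1 + (len(audio) - win_len) // hop
    return np.stack([audio[i * hop : i * hop + win_len] for i in range(n_frames)])

def log_power_spectrum(frames, n_fft=512):
    """Per-window log power spectrum; a full MFC pipeline would further
    apply a mel filterbank and a DCT to these spectra."""
    spec = np.abs(np.fft.rfft(frames * np.hamming(frames.shape[1]), n=n_fft)) ** 2
    return np.log(spec + 1e-10)

# One second of a 440 Hz tone at the assumed 16 kHz resampling frequency
sr = 16000
audio = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
frames = frame_signal(audio)
features = log_power_spectrum(frames)
print(frames.shape, features.shape)  # (98, 400) (98, 257)
```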
3) When extracting the word sequence from the pending video in the present embodiment, extracting the different types of content information in the pending video may specifically include: obtaining the text information in the pending video by an OCR algorithm to obtain the corresponding word sequence.
Wherein, the OCR algorithm refers to the process by which an electronic device (such as a scanner or a digital camera) examines characters printed on paper or displayed on a screen, determines their shapes by detecting dark and light patterns, and then translates the shapes into computer text with a character recognition method, so that text is recognized automatically.
Specifically, the text information in the pending video may come from the bullet comments appearing in the video, the comment information in the comment area, the text superimposed on the video pictures, and the text appearing in the video scene, but is not limited to these sources; there may also be other text sources. In the present embodiment, the text appearing in the pending video can be recognized by an existing OCR algorithm. Since the bullet comments and the comment-area information are text entered by other users watching the pending video, they can be obtained directly from the background system; therefore, only the text carried by the pending video itself needs to be obtained with the existing OCR algorithm. The Word2Vec algorithm is then applied to the obtained text information to map it effectively into low-dimensional dense feature vectors, yielding the corresponding word sequence. In addition, other word embedding algorithms, such as the GloVe algorithm, the WordRank model, or the FastText classifier, may be used in the present embodiment in place of the Word2Vec algorithm to process the obtained text information and obtain the corresponding word sequence.
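The embedding step can be illustrated with a toy lookup; the vocabulary and the randomly initialized table below are purely hypothetical stand-ins for a Word2Vec model trained on a large corpus.

```python
import numpy as np

# Hypothetical toy vocabulary and embedding table; in practice the table
# would come from a trained Word2Vec (or GloVe/FastText) model.
vocab = {"<unk>": 0, "hello": 1, "world": 2, "video": 3}
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(vocab), 8))  # low-dimensional dense vectors

def text_to_word_sequence(tokens):
    """Map OCR/comment tokens to dense vectors (unknown words -> <unk>)."""
    ids = [vocab.get(t, vocab["<unk>"]) for t in tokens]
    return embeddings[ids]

seq = text_to_word_sequence(["hello", "video", "xyz"])
print(seq.shape)  # (3, 8): one 8-dim dense vector per token
```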
S120, fuse the feature vectors of the different types of content information according to the correlations between the different types of content information, to obtain the feature vector of the pending video.
Specifically, after the different types of content information in the pending video are extracted, the feature vector of each type of content information must first be obtained separately in order to analyze the correlations between the different types of content information. A corresponding single-type analysis model can be set up for each type of content information and trained on a large amount of historical training data of that type by methods such as existing neural networks, support vector machines, random forests, or big-data analysis. Feeding a type of content information into its corresponding single-type analysis model then yields the feature vector of that type of content information; this feature vector represents the characteristics that distinguish the type of content information from the other content information.
Optionally, when the feature vectors of the different types of content information are obtained, the characteristic values under each corresponding dimension of the different types of content information are analyzed to judge the correlations between the types of content information, and the feature vectors of the types of content information are fused accordingly. Optionally, the characteristic values in the same dimension of the feature vectors of the different types of content information are analyzed to judge the correlation of the types of content information in that dimension, and the characteristic values of the feature vectors on the same dimension are fused, obtaining a feature vector that can represent the overall characteristics of the pending video.
Illustratively, when different types of content information such as the picture sequence, the audio sequence, and the word sequence are extracted from the pending video, a single-type analysis is performed on each of them to obtain the feature vectors corresponding to the picture sequence, the audio sequence, and the word sequence. Because the feature vectors extracted for the picture sequence and the audio sequence can be analyzed along the corresponding time dimension, the specific feature values of the two vectors under the same time dimension can be compared to judge the correlation of the two. Meanwhile, because the word sequence may include bullet comments and comment information, its real-time requirement is lower; therefore, when the picture features and audio features on each time dimension are analyzed, the feature values in the feature vector of the whole word sequence can be analyzed as a whole. According to the inherent correlations among the picture sequence, the audio sequence, and the word sequence, their corresponding feature vectors are fused to obtain a feature vector that can represent the overall characteristic information of the pending video, which can subsequently be analyzed under the different audit classifications.
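A minimal sketch of the fusion logic just described, under the assumption that picture and audio features are aligned on the time dimension while the word-sequence features are summarized as a whole; the pooling-and-concatenation scheme and all dimensions here are only one plausible realization, not the patent's trained fusion submodel.

```python
import numpy as np

def fuse_features(pic_seq, aud_seq, word_seq):
    """Time-align picture and audio features, pool the word features
    (lower real-time requirement), and concatenate into one vector."""
    T = min(len(pic_seq), len(aud_seq))           # shared time dimension
    timewise = np.concatenate([pic_seq[:T], aud_seq[:T]], axis=1)
    pooled_av = timewise.mean(axis=0)             # summarize over time
    pooled_txt = word_seq.mean(axis=0)            # whole-sequence text summary
    return np.concatenate([pooled_av, pooled_txt])

pic = np.ones((10, 4))   # 10 time steps of 4-dim picture features
aud = np.ones((12, 3))   # 12 time steps of 3-dim audio features
txt = np.ones((5, 8))    # 5 word embeddings of 8 dims
fused = fuse_features(pic, aud, txt)
print(fused.shape)  # (15,): 4 + 3 audio-visual dims plus 8 text dims
```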
S130, obtain, according to the feature vector of the pending video, the proportion of the pending video under each of the different preset audit classifications.
Wherein, a preset audit classification is any video type to which the pending video may belong, and may include a normal classification and violation classifications, where the violation classifications may be further subdivided into violation types containing various kinds of violation content, such as a violence classification, a horror classification, a pornography classification, and a politically sensitive classification.
Specifically, after the feature vectors of the types of content information are fused to obtain the feature vector of the pending video, this feature vector can be analyzed to judge the degree of difference between the pending video and the videos included under each of the different preset audit classifications, and the proportion of the pending video under each of the different preset audit classifications is determined according to the degree of difference between the feature vector of the pending video and the videos of each preset audit classification, so that the audit classification of the pending video can subsequently be judged from these proportions. Optionally, the present embodiment can analyze the features of a large number of historical videos under the different preset audit classifications to determine the common features that the videos included under each preset audit classification should possess, and then determine the proportions of the pending video under the different preset audit classifications by analyzing the degree of difference between the feature vector of the pending video and the common features of each preset audit classification.
S140, determine the audit classification of the pending video according to the proportions under the different preset audit classifications.
Specifically, when the proportions of the pending video under the different preset audit classifications are obtained in the present embodiment, each proportion is analyzed to judge whether the pending video is a normal video. Illustratively, if the proportion of the pending video in the normal classification is much larger than its proportions in the subdivided violation classifications, the audit classification of the pending video is determined to be the normal classification; if the proportion of the pending video in the normal classification is lower than the sum of its proportions in the subdivided violation classifications, and the proportion in the violence classification subdivided under the violation classification is much higher than the proportions in the other violation classifications, the audit classification of the pending video is determined to be the violence classification under the violation classification. Meanwhile, if the preset audit classifications include only the normal classification and the violation classification, the preset audit classification in which the pending video has the higher proportion is directly determined to be the audit classification of the pending video.
Optionally, in order to avoid audit errors when the audit classification of the pending video is judged by machine audit alone, the present embodiment can further improve the audit accuracy for the pending video by combining machine audit with manual audit. In this case, determining the audit classification of the pending video according to the proportions under the different preset audit classifications may specifically include: when the proportion of the pending video under a violation classification exceeds a preset violation threshold, sending the pending video to a manual audit platform; and determining the audit classification of the pending video according to the feedback information of the manual audit platform.
Specifically, the violation threshold is the lower limit of the proportion at which the pending video is judged to contain the violation content of the corresponding violation classification. In the present embodiment, the violation threshold of each violation classification can be defined from a certain Suspected Illegal Push Ratio (SIPR), where the SIPR is the amount of data pushed to the manual platform per day divided by the total amount of business data per day. If the proportions of the pending video in all the subdivided violation classifications under the violation classification are below the corresponding violation thresholds, the audit classification of the pending video is determined to be the normal classification; if the proportion of the pending video in some subdivided violation classification exceeds the violation threshold corresponding to that subdivided violation classification, the pending video may contain the corresponding violation content.
Optionally, in order to further improve the accuracy of the video audit, when the proportion of the pending video under a certain violation classification exceeds the corresponding preset violation threshold, the present embodiment can send the pending video to the corresponding manual audit platform, where staff further audit the pending video manually and determine its audit classification. After the manual audit is completed, the manual audit platform returns the manual audit result as corresponding feedback information to the video audit terminal executing the video audit method of the present embodiment, so that the video audit terminal determines the corresponding manual audit result according to the feedback information and thereby determines the audit classification of the pending video.
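The decision logic of S140, including the violation thresholds that route a video to the manual audit platform, can be sketched as follows; the classification names, proportions, and threshold values are all hypothetical.

```python
# Hypothetical proportions for one pending video and per-classification
# violation thresholds (the normal classification has no threshold).
proportions = {"normal": 0.55, "violence": 0.30, "horror": 0.05,
               "pornography": 0.05, "political": 0.05}
thresholds = {"violence": 0.25, "horror": 0.40, "pornography": 0.20,
              "political": 0.15}

def decide(props, thr):
    """Return ('manual', cls) when some violation proportion exceeds its
    threshold (the video is queued for manual audit under the most likely
    flagged classification); otherwise return the top classification."""
    flagged = [c for c in thr if props[c] > thr[c]]
    if flagged:
        return ("manual", max(flagged, key=lambda c: props[c]))
    return ("auto", max(props, key=props.get))

print(decide(proportions, thresholds))  # ('manual', 'violence'): 0.30 > 0.25
```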
The technical solution provided in this embodiment extracts the feature vectors of the different types of content information in the pending video, fuses the feature vectors according to the correlations between them, obtains from the fused feature vector the proportions of the pending video under the different preset audit classifications, and thereby determines the audit classification of the pending video. This realizes video audit under multi-type fusion, solves the problem that video audit in the prior art has limitations, and improves the comprehensiveness and accuracy of video audit.
Embodiment two
Fig. 2 is a schematic diagram of a video audit process provided by Embodiment 2 of the present invention. The present embodiment is optimized on the basis of the technical solution provided by the above embodiment. Specifically, the present embodiment mainly explains in detail the process of fusing the feature vectors of the different types of content information and the process of determining the proportions of the pending video under the different preset audit classifications.
Optionally, the present embodiment may include the following steps:
S210, extract the different types of content information in the pending video.
S220, input each type of content information into a pre-built fusion learning model, and extract the feature vectors of the different types of content information respectively through the learning submodels for the different types in the fusion learning model.
Wherein, the fusion learning model is a training model that performs a single-type analysis on each type of content information in the pending video and then directly fuses the types of content information, so as to obtain the proportions of the pending video under the different preset audit classifications under multi-type fusion; it can be composed of multiple submodels. The fusion learning model may include learning submodels that each analyze one single type of content information, and a fusion submodel that fuses the analysis results of the types of content information. Specifically, the learning submodels correspond one-to-one with the different types of content information extracted from the pending video. In the present embodiment, the different types of content information include three kinds: the picture sequence, the audio sequence, and the word sequence. Therefore, as shown in Fig. 2, the fusion learning model in the present embodiment includes three learning submodels, corresponding one-to-one with the picture sequence, the audio sequence, and the word sequence extracted from the pending video, so that a single-type analysis is performed on each of them to extract the feature vectors of the picture sequence, the audio sequence, and the word sequence. The fusion submodel can then fuse the feature vectors extracted by the three learning submodels to obtain a feature vector that can represent the overall characteristics of the pending video. Optionally, each learning submodel and the fusion submodel in the present embodiment is a deep neural network model; through self-learning and adaptive training on a large number of historical videos, each learning submodel and the fusion submodel acquires its particular processing capability and obtains the corresponding target analysis result. The learning submodels and the fusion submodel in the present embodiment may also be trained with other machine learning models, such as XGBoost, support vector machines, or random forest models, in place of the deep neural network model.
Optionally, in the present embodiment, after the different types of content information are extracted from the pending video, each type of content information is first input separately into the pre-built fusion learning model; that is, the picture sequence, the audio sequence, and the word sequence extracted from the pending video are input into the fusion learning model, and the learning submodels corresponding one-to-one with the types of content information perform feature extraction on each type: the feature vector of the picture sequence is extracted by the learning submodel corresponding to the picture sequence; the feature vector of the audio sequence is extracted by the learning submodel corresponding to the audio sequence; and the feature vector of the word sequence is extracted by the learning submodel corresponding to the word sequence, so that the feature vectors can subsequently undergo feature fusion according to the inherent correlations among the picture sequence, the audio sequence, and the word sequence.
S230, fuse the feature vectors of the different types of content information through the fusion submodel in the fusion learning model according to the correlations between the different types of content information, to obtain the feature vector of the pending video.
Specifically, after the feature vector of each type of content information is obtained through the learning submodels, the feature vectors of the types of content information are passed through the fusion submodel in the fusion learning model, so that the fusion submodel effectively mines the correlations between the types of content information and fuses their feature vectors according to these correlations, obtaining a feature vector that can represent the overall characteristics of the pending video. Optionally, in the present embodiment the feature vectors of the picture sequence, the audio sequence, and the word sequence are passed through the fusion submodel, which analyzes the inherent correlations among them; for example, the correlation between the picture sequence and the audio sequence can be judged from the feature values of their feature vectors in the same time dimension. Because the word sequence includes bullet comments, comment information, and the like, its real-time requirement is lower; therefore, after the correlation between the picture sequence and the audio sequence is determined, the correlation of the feature vector of the word sequence with the picture sequence and the audio sequence in other dimensions is analyzed, and the feature vectors of the picture sequence, the audio sequence, and the word sequence are fused to obtain the feature vector of the pending video. Specifically, in the present embodiment the feature vectors of the types of content information can be fused together through the nonlinear mappings between the neurons in the fusion submodel to obtain the feature vector of the pending video. A nonlinear mapping can learn and store a large number of input-output mapping relationships without the mathematical equations describing those relationships being known in advance; as long as enough sample data are provided for the network model to be trained in advance, it can complete a nonlinear mapping from an n-dimensional input space to an m-dimensional output space.
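A toy stand-in for the fusion submodel's n-to-m nonlinear mapping, reusing the per-type feature dimensions assumed earlier; a real fusion submodel would be a trained deep neural network, not the randomly initialized weights used here for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

class FusionSubmodel:
    """Toy fusion submodel: a two-layer network mapping the concatenated
    per-type feature vectors (n dims) to a fused vector (m dims) through
    a nonlinear hidden layer, i.e. an n-to-m nonlinear mapping."""
    def __init__(self, n_in, n_hidden, n_out):
        self.W1 = rng.normal(0, 0.1, (n_in, n_hidden))
        self.W2 = rng.normal(0, 0.1, (n_hidden, n_out))

    def __call__(self, pic_vec, aud_vec, txt_vec):
        x = np.concatenate([pic_vec, aud_vec, txt_vec])
        h = np.tanh(x @ self.W1)   # nonlinear mapping between "neurons"
        return h @ self.W2         # fused feature vector of the pending video

model = FusionSubmodel(n_in=4 + 3 + 8, n_hidden=16, n_out=6)
fused = model(np.ones(4), np.ones(3), np.ones(8))
print(fused.shape)  # (6,)
```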
S240, obtain, according to the feature vector of the pending video, the proportions of the pending video under the different preset audit classifications through the preset regression function in the fusion submodel.
Wherein, the preset regression function is a regression model for multi-classification algorithms in neural networks, used to normalize the classification results of the pending video over the different preset audit classifications. Optionally, the present embodiment uses the Softmax regression function, which maps the outputs of the feature vector of the pending video for each preset audit type into the corresponding (0, 1) interval, obtaining the proportion of the pending video under each preset audit classification while guaranteeing that the proportions of the pending video under the different preset audit classifications sum to 1. It should be noted that the preset regression function in the fusion submodel may be any regression model that can determine the proportions under the different audit classifications from the feature vector of the pending video, such as the Softmax regression function, the Logistic regression function, or a polynomial regression function; the present embodiment does not limit this.
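The Softmax normalization described here can be written directly; the per-classification scores below are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Map per-classification scores into (0, 1) with a sum of 1."""
    z = np.exp(logits - logits.max())  # subtract max for numerical stability
    return z / z.sum()

# Hypothetical scores for (normal, violence, horror, pornography, political)
scores = np.array([2.0, 1.0, -1.0, -0.5, -2.0])
proportions = softmax(scores)
print(round(proportions.sum(), 6))  # 1.0: proportions sum to one
```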
Optionally, in the present embodiment the feature vector of the pending video is analyzed to judge the degree of difference between the pending video and the videos of the audit classifications included under each of the different preset audit classifications, thereby determining the analysis result corresponding to the feature vector of the pending video in each of the different preset audit classifications. The Softmax regression function in the fusion submodel then maps the analysis results output for the preset audit classifications into the corresponding (0, 1) interval, thereby determining the proportion of the pending video under each preset audit classification, from which the audit classification of the pending video can subsequently be determined.
S250, determine the audit classification of the pending video according to the proportions under the different preset audit classifications.
The technical solution provided in this embodiment fuses the feature vectors of the different types of content information through a pre-built fusion learning model, obtains from the fused feature vector the proportions of the pending video under the different preset audit classifications, and thereby determines the audit classification of the pending video. This realizes video audit under multi-type fusion, solves the problem that video audit in the prior art has limitations, and improves the comprehensiveness and accuracy of video audit.
Embodiment three
Fig. 3 is a schematic diagram of the construction of the fusion learning model in the method of video audit provided by Embodiment 3 of the present invention. The present embodiment is optimized on the basis of the above embodiments and mainly explains in detail the specific training process of the fusion learning model.
Optionally, the present embodiment may include steps of:
S310, extract the different types of sample content information in a training sample.
Wherein, a training sample is a historical video under a target audit classification. In order to guarantee that the built fusion learning model can correspondingly audit pending videos of all types, a large number of historical videos whose audit classifications have been determined need to be trained on in advance. First, a large number of historical videos under each classification are obtained as training samples, where the sources of the historical videos can be short videos or live videos uploaded to and spread on the internet, or videos obtained through other channels; the present embodiment does not limit the source of the historical videos. Meanwhile, the large number of acquired training samples are organized into a corresponding training set, and each training sample is labeled: according to the content of the historical video, each training sample is given a corresponding sample label in advance, which is the target audit classification to which the training sample belongs, including the normal classification and the violation classification, or a specific subdivision of the violation classification. Therefore, the violation label in the present embodiment can be set as a single classification label or multiple classification labels, according to whether the violation classification is subdivided into specific violation types.
Optionally, the present embodiment performs deep neural network training separately on the training samples under each audit classification. First, training samples whose label information is a certain target audit classification are obtained from the training set, and the different types of sample information in each training sample are separated, so as to extract the different types of sample content information included in the training sample, such as pictures, sound, and text. Specifically, the sample picture sequence, the sample audio sequence, and the sample word sequence included in the training sample can be extracted respectively in the present embodiment, so that each submodel can subsequently be trained.
S320, extract the feature vectors of the sample content information respectively through the learning submodels for the different types, and fuse the feature vectors of the sample content information through the fusion submodel according to the correlations between the sample content information, to obtain the feature vector of the training sample.
Specifically, after the different types of sample content information are extracted from the training sample, as shown in Fig. 3, each sample content information is input into its corresponding learning submodel: the sample picture sequence is input into the learning submodel for picture training, the sample audio sequence into the learning submodel for audio training, and the sample word sequence into the learning submodel for text training. The features of the sample picture sequence, the sample audio sequence, and the sample word sequence are then extracted respectively according to the preset training parameters in each learning submodel, obtaining the feature vector of each sample content information. The feature vectors of the sample content information are next input into the fusion submodel, where the correlations between the sample content information are judged through the preset training parameters; that is, the inherent correlations among the feature vectors of the sample picture sequence, the sample audio sequence, and the sample word sequence are analyzed according to the training parameters, so that the fusion submodel fuses the feature vectors of the sample content information with its corresponding training parameters according to the correlation analysis results, obtaining a feature vector that can represent the overall characteristics of the training sample. Specifically, in the present embodiment the feature vectors of the different types of sample content information in the training sample are fused together through the nonlinear mappings between the neurons in the fusion submodel to obtain the feature vector of the training sample.
S340, obtain, according to the feature vector of the training sample, the proportions of the training sample under the different preset audit classifications through the preset regression function in the fusion submodel.
Optionally, after the feature vector of the training sample is obtained, the degree of difference between the feature vector of the training sample and the preset model parameter features set in advance under each of the different preset audit classifications in the fusion submodel is judged, obtaining the corresponding analysis result of the feature vector of the training sample under each preset audit classification. Through the preset regression function in the fusion submodel, that is, the Softmax regression function, the analysis results output for the preset audit classifications are mapped into the corresponding (0, 1) interval, thereby determining the proportion of the training sample under each preset audit classification, from which the audit classification attributed to the training sample in this training pass can subsequently be determined.
S350, determine the corresponding classification loss according to the target audit classification of the training sample and its proportions under the different preset audit classifications, back-propagate the classification loss to correct each learning submodel and the fusion submodel, and continue to obtain new training samples under the target audit classification until the classification loss under the target audit classification is lower than a preset loss threshold.
Specifically, when the proportions of the training sample under the different preset audit classifications are obtained, the audit classification that the fusion learning model assigns to the training sample during this training pass can be roughly determined; this audit classification is an estimate. The target audit classification to which the training sample belongs can then be compared with the audit classification determined from the proportions under the different preset audit classifications; that is, the true classification of the video audit is compared with this estimate, and the classification loss of this audit is determined. The classification loss clearly indicates the classification accuracy of the currently trained submodels for the target audit type. Optionally, the classification loss of this training pass can be judged with any existing loss function; this is not limited here. In the present embodiment, the corresponding cross entropy between the sample label of the training sample, that is, the corresponding target audit classification, and the proportions of the training sample under the different preset audit classifications is computed to determine the classification loss of this training pass. Meanwhile, after the classification loss is obtained in the present embodiment, it also needs to be judged: if the classification loss of this training pass exceeds the preset loss threshold, the accuracy of the submodels of this training pass for video audit is not yet high, and retraining is needed. The classification loss of this training pass is then back-propagated according to the model training process; that is, the classification loss is propagated back through the fusion submodel and each learning submodel in turn, and the training parameters in each submodel are corrected according to the classification loss, so that the training parameters in each submodel are adjusted continuously and the classification accuracy of each submodel is continuously improved.
Further, after each submodel is corrected, new training samples belonging to the target audit category continue to be obtained from the pre-constructed training set, and the new classification loss determined while training on those samples is obtained. This cycle repeats until the classification loss under the target audit category falls below the preset loss threshold, which indicates that the trained submodels have reached a certain accuracy in video auditing and no further training on the target audit category's samples is needed; at that point the other audit categories are trained.
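As a minimal sketch of the loss computation and correction loop described above (a NumPy stand-in in which a single linear layer plays the role of the fusion submodel's regression function; the dimensions, learning rate, and loss threshold are illustrative assumptions, not values from the disclosure):

```python
import numpy as np

def softmax(z):
    z = z - z.max()                       # numerical stability
    e = np.exp(z)
    return e / e.sum()

def classification_loss(proportions, target_idx):
    # Cross entropy between the predicted proportions and the
    # sample's target audit category (its sample label).
    return -np.log(proportions[target_idx] + 1e-12)

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4)) * 0.1         # 3 preset audit categories, 4-dim fused feature
x = rng.normal(size=4)                    # fused feature vector of one training sample
target = 1                                # the sample's target audit category
loss_threshold = 0.05                     # preset loss threshold

for step in range(2000):
    p = softmax(W @ x)                    # proportions under each preset audit category
    loss = classification_loss(p, target)
    if loss < loss_threshold:             # stop once the loss drops below the threshold
        break
    grad = p.copy()
    grad[target] -= 1.0                   # gradient of cross entropy w.r.t. the logits
    W -= 0.5 * np.outer(grad, x)          # back-propagate and correct the parameters

print(f"final loss: {loss:.4f}")
```

A real implementation would propagate the gradient further, through the fusion submodel into each learning submodel; the single-layer sketch cannot show that chain.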
S360, reacquire training samples under the other audit categories and train again, until the classification losses under all preset audit categories are below their corresponding preset loss thresholds, then construct the fusion learning model from the resulting learning submodels and fusion submodel.
Specifically, when training on the samples of one target audit category is complete, the training samples of the other audit categories among the preset audit categories must also be trained on. Following the training procedure above, the training samples corresponding to the normal category and to each specific violation type subdivided within the violation category are trained on in turn, until the classification losses under all preset audit categories are below their corresponding preset loss thresholds. At that point the currently trained submodels can accurately judge the audit category of any pending video, and the resulting learning submodels and fusion submodel are assembled into the fusion learning model of this embodiment, so that pending videos can subsequently be audited accurately.
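The category-by-category schedule just described — revisit each preset audit category's samples until every category's classification loss is below its threshold — can be outlined as follows (the `mock_train_step` that simply halves a stored loss is purely illustrative, standing in for a full training pass):

```python
def train_all_categories(categories, train_step, thresholds):
    # Cycle through every preset audit category until the classification
    # loss of each one is below its corresponding preset loss threshold.
    losses = {c: float("inf") for c in categories}
    while any(losses[c] >= thresholds[c] for c in categories):
        for c in categories:
            losses[c] = train_step(c)
    return losses

# Mock state: current loss per category (normal + subdivided violation types).
state = {"normal": 1.0, "vulgar": 1.2, "violent": 0.9}

def mock_train_step(category):
    state[category] *= 0.5                # pretend each pass halves the loss
    return state[category]

thresholds = {c: 0.05 for c in state}
final = train_all_categories(list(state), mock_train_step, thresholds)
print(all(v < 0.05 for v in final.values()))  # True
```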
In the technical solution provided by this embodiment, the different types of content-information samples in a large number of training samples are fed into the learning submodels and the fusion submodel for training, building a fusion learning model that can analyze and fuse the different types of content information in a pending video. This realizes video auditing under multi-type fusion, overcomes the limitations of prior-art video auditing, and improves the comprehensiveness and accuracy of video auditing.
Embodiment Four
Fig. 4A is a scene architecture diagram of an application scenario to which the video auditing method provided by Embodiment Four of the present invention applies, and Fig. 4B is a schematic diagram of the video auditing process provided by Embodiment Four. This embodiment mainly describes in detail the process of auditing a video in a specific application scenario. Referring to Fig. 4A, this embodiment includes a video audit terminal 40, user terminals 41, and a manual review platform 42; the video audit terminal 40 establishes wireless connections with the user terminals 41 and the manual review platform 42.
Optionally, a user may upload a pending video through their user terminal 41. Before the pending video spreads on the internet, the video audit terminal 40 first obtains the newly uploaded pending video from the user terminal 41 and audits it using the video auditing method provided in the embodiments of the present invention, obtaining the pending video's proportions under the different preset audit categories. The proportions of the pending video under the violation categories are then examined: if the proportion under some violation category exceeds the corresponding preset violation threshold, the pending video is sent to the manual review platform 42, where staff further review it manually to determine its audit category. When the manual review is complete, the manual review platform 42 returns the manual review result to the video audit terminal 40 as feedback information, so that the video audit terminal 40 determines the manual review result from the feedback and thereby determines the audit category of the pending video. If the proportions under all violation categories are below the corresponding preset violation thresholds, the pending video can be directly determined to be a normal video. Combining machine auditing with manual review in this embodiment further improves the accuracy of video auditing.
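The routing rule above — machine audit first, manual review only for videos whose violation proportion exceeds its preset violation threshold — might be expressed as follows (the category names and threshold values are hypothetical):

```python
def route_video(proportions, violation_thresholds):
    # proportions: proportion of the pending video under each preset audit
    # category; violation_thresholds covers only the violation categories.
    flagged = [c for c, t in violation_thresholds.items()
               if proportions.get(c, 0.0) > t]
    # Any exceeded violation threshold sends the video to manual review;
    # otherwise it is directly determined to be a normal video.
    return ("manual_review", flagged) if flagged else ("normal", [])

decision, flagged = route_video(
    {"normal": 0.20, "vulgar": 0.70, "violent": 0.10},
    {"vulgar": 0.60, "violent": 0.60},
)
print(decision, flagged)  # manual_review ['vulgar']
```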
Specifically, referring to Fig. 4B, when auditing the pending video, the video audit terminal 40 first segments the pending video and extracts the different types of content information in it: image information, audio information, and text information. From the segmented image information, video frames are extracted and combined in temporal order to obtain the corresponding picture sequence. The segmented audio information is resampled, and the spectral features of each resampled segment are extracted and combined in temporal order to obtain the corresponding audio sequence. The text information in the pending video is obtained and processed with the Word2Vec algorithm to obtain the corresponding word sequence. The picture sequence, audio sequence, and word sequence are then each passed through the corresponding learning submodel in the fusion learning model to extract their feature vectors; the fusion submodel fuses the extracted feature vectors, the fused feature vector yields the pending video's proportions under the different preset audit categories, and the audit category of the pending video is determined from those proportions.
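End to end, the Fig. 4B pipeline can be sketched with untrained stand-ins (mean/std pooling in place of the learning submodels, concatenation as the fusion submodel, and a random softmax layer as the preset regression function — all illustrative assumptions, not the patented models):

```python
import numpy as np

rng = np.random.default_rng(1)

def submodel_feature(seq):
    # Stand-in for a learning submodel: summarize one modality's sequence
    # as a fixed-length feature vector.
    seq = np.asarray(seq, dtype=float).ravel()
    return np.array([seq.mean(), seq.std(), seq.min(), seq.max()])

def fuse(feature_vectors):
    # Stand-in for the fusion submodel: concatenate per-modality features.
    return np.concatenate(feature_vectors)

def proportions(fused, W):
    # Softmax regression over the preset audit categories.
    z = W @ fused
    e = np.exp(z - z.max())
    return e / e.sum()

picture_seq = rng.random((8, 16))      # frames extracted from the segmented video
audio_seq = rng.random((4, 13))        # spectral features of the resampled audio
word_seq = rng.random((10, 2))         # Word2Vec-style embeddings of the text

fused = fuse([submodel_feature(picture_seq),
              submodel_feature(audio_seq),
              submodel_feature(word_seq)])
W = rng.normal(size=(3, fused.size))   # 3 preset audit categories (untrained weights)
p = proportions(fused, W)

VIOLATION_THRESHOLD = 0.5              # hypothetical preset violation threshold
needs_manual_review = any(p[i] > VIOLATION_THRESHOLD for i in (1, 2))  # 1, 2 = violation categories
print(p.shape)  # (3,)
```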
It should be noted that the number of user terminals 41 is not limited in this embodiment; it is determined by the number of users uploading videos.
In the technical solution provided by this embodiment, the feature vectors of the different types of content information in a pending video are extracted and fused according to the correlations between them, the fused feature vector yields the pending video's proportions under the different preset audit categories, and the audit category of the pending video is determined from those proportions. This realizes video auditing under multi-type fusion, overcomes the limitations of prior-art video auditing, and improves the comprehensiveness and accuracy of video auditing.
Embodiment Five
Fig. 5 is a structural schematic diagram of a video auditing apparatus provided by Embodiment Five of the present invention. Specifically, as shown in Fig. 5, the apparatus may include:
an information extraction module 510, configured to extract the different types of content information in a pending video;
a feature fusion module 520, configured to fuse the feature vectors of the different types of content information according to the correlations between the different types of content information, obtaining the feature vector of the pending video;
a proportion determination module 530, configured to obtain, according to the feature vector of the pending video, the pending video's proportions under different preset audit categories;
an audit category determination module 540, configured to determine the audit category of the pending video according to the proportions under the different preset audit categories.
In the technical solution provided by this embodiment, the feature vectors of the different types of content information in a pending video are extracted and fused according to the correlations between them, the fused feature vector yields the pending video's proportions under the different preset audit categories, and the audit category of the pending video is determined from those proportions. This realizes video auditing under multi-type fusion, overcomes the limitations of prior-art video auditing, and improves the comprehensiveness and accuracy of video auditing.
Further, the feature fusion module 520 may be specifically configured to:
input each different type of content information into the pre-constructed fusion learning model, and extract the feature vectors of the different types of content information through the learning submodels for the different types in the fusion learning model;
fuse the feature vectors of the different types of content information through the fusion submodel in the fusion learning model according to the correlations between the different types of content information, obtaining the feature vector of the pending video.
Further, the proportion determination module 530 may be specifically configured to:
obtain, according to the feature vector of the pending video and through the preset regression function in the fusion submodel, the pending video's proportions under the different preset audit categories.
Further, the fusion learning model may be constructed by performing the following operations:
extracting the different types of content-information samples in a training sample, the training sample being a historical video under a target audit category;
extracting the feature vectors of the content-information samples through the learning submodels for the different types, and fusing the feature vectors of the content-information samples through the fusion submodel according to the correlations between the content-information samples, obtaining the feature vector of the training sample;
obtaining, according to the feature vector of the training sample and through the preset regression function in the fusion submodel, the training sample's proportions under the different preset audit categories;
determining the corresponding classification loss according to the target audit category of the training sample and the proportions under the different preset audit categories, back-propagating the classification loss to correct each learning submodel and the fusion submodel, and continuing to obtain new training samples under the target audit category until the classification loss under the target audit category is lower than a preset loss threshold;
reacquiring training samples under the other audit categories and training again, until the classification losses under all preset audit categories are below the corresponding preset loss thresholds, then constructing the fusion learning model from the resulting learning submodels and fusion submodel.
Further, the different types of content information may include the picture sequence, audio sequence, and word sequence in the pending video.
Further, the information extraction module 510 may be specifically configured to:
segment the pending video and extract video frames from the segmented pending video;
combine the extracted video frames to obtain the corresponding picture sequence.
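A sketch of the segment-then-extract step for the picture sequence (the segment and per-segment frame counts are arbitrary assumptions; a real system would decode the frames themselves with a video library such as OpenCV):

```python
import numpy as np

def sample_frame_indices(n_frames, n_segments, frames_per_segment):
    # Split the video's frame range into equal segments and take evenly
    # spaced frame indices from each, preserving temporal order so the
    # combined result forms the picture sequence.
    bounds = np.linspace(0, n_frames, n_segments + 1, dtype=int)
    picked = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        picked.extend(int(i) for i in np.linspace(lo, hi - 1, frames_per_segment, dtype=int))
    return picked

idx = sample_frame_indices(n_frames=300, n_segments=3, frames_per_segment=4)
print(idx)  # [0, 33, 66, 99, 100, 133, 166, 199, 200, 233, 266, 299]
```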
Further, the information extraction module 510 may also be specifically configured to:
segment the pending video and resample the audio information in the segmented pending video;
extract the spectral features in the resampled audio information through a Mel-Frequency Cepstrum (MFC) algorithm;
combine the extracted spectral features to obtain the corresponding audio sequence.
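A simplified sketch of the audio path (an FFT log-magnitude spectrum stands in for the full MFC computation, which would additionally apply a mel filterbank and a discrete cosine transform, e.g. via librosa; the sample rate and coefficient count are assumptions):

```python
import numpy as np

def spectral_features(segment, n_coeffs=13):
    # Simplified spectral feature: log magnitude of the low-frequency
    # FFT bins of one resampled audio segment.
    spectrum = np.abs(np.fft.rfft(segment))
    return np.log(spectrum[:n_coeffs] + 1e-10)

sr = 8000                                 # assumed resampling rate
t = np.linspace(0, 1, sr, endpoint=False)
audio = np.sin(2 * np.pi * 440 * t)       # 1 s of a 440 Hz tone as mock audio
segments = np.split(audio, 4)             # segmented pending audio
audio_sequence = np.stack([spectral_features(s) for s in segments])
print(audio_sequence.shape)  # (4, 13)
```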
Further, the information extraction module 510 may also be specifically configured to:
obtain the text information in the pending video through an optical character recognition (OCR) algorithm, obtaining the corresponding word sequence.
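The text path then reduces to looking up word vectors for the recognized text; the tiny embedding table below is a hypothetical stand-in for a trained Word2Vec model, and a real system would first run OCR (e.g. Tesseract) on the video frames:

```python
# Hypothetical two-dimensional embeddings; a trained Word2Vec model
# would supply real vectors for a full vocabulary.
embeddings = {
    "free": [0.1, 0.9],
    "money": [0.8, 0.2],
    "<unk>": [0.0, 0.0],   # fallback for out-of-vocabulary words
}

def word_sequence(text):
    # Map the OCR text to the word sequence fed to the text submodel.
    return [embeddings.get(w, embeddings["<unk>"]) for w in text.lower().split()]

seq = word_sequence("FREE money now")
print(len(seq))  # 3
```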
Further, the audit category determination module 540 may be specifically configured to:
send the pending video to the manual review platform when the pending video's proportion under a violation category exceeds the preset violation threshold;
determine the audit category of the pending video according to the feedback information of the manual review platform.
The video auditing apparatus provided by this embodiment is applicable to the video auditing method provided by any of the above embodiments, and has the corresponding functions and beneficial effects.
Embodiment Six
Fig. 6 is a structural schematic diagram of a device provided by Embodiment Six of the present invention. As shown in Fig. 6, the device includes a processor 60, a storage apparatus 61, and a communication apparatus 62. The device may have one or more processors 60; one processor 60 is taken as an example in Fig. 6. The processor 60, storage apparatus 61, and communication apparatus 62 in the device may be connected by a bus or in other ways; connection by a bus is taken as an example in Fig. 6.
The storage apparatus 61, as a computer-readable storage medium, may be used to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the video auditing method provided in the embodiments of the present invention. By running the software programs, instructions, and modules stored in the storage apparatus 61, the processor 60 executes the various functional applications and data processing of the device, i.e. realizes the video auditing method described above.
The storage apparatus 61 may mainly include a program storage area and a data storage area: the program storage area may store the operating system and the applications required by at least one function, and the data storage area may store data created through use of the terminal, etc. In addition, the storage apparatus 61 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the storage apparatus 61 may further include memory located remotely from the processor 60, and such remote memory may be connected to the device through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The communication apparatus 62 may be used to realize network connections or mobile data connections between devices.
The device provided by this embodiment may be used to execute the video auditing method provided by any of the above embodiments, and has the corresponding functions and beneficial effects.
Embodiment Seven
Embodiment Seven of the present invention further provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program can realize the video auditing method of any of the above embodiments. The method may specifically include:
extracting the different types of content information in a pending video;
fusing the feature vectors of the different types of content information according to the correlations between the different types of content information, obtaining the feature vector of the pending video;
obtaining, according to the feature vector of the pending video, the pending video's proportions under different preset audit categories;
determining the audit category of the pending video according to the proportions under the different preset audit categories.
Of course, in the storage medium containing computer-executable instructions provided by the embodiments of the present invention, the computer-executable instructions are not limited to the method operations described above, and may also perform related operations in the video auditing method provided by any embodiment of the present invention.
From the above description of the embodiments, those skilled in the art can clearly understand that the present invention may be realized by software plus the necessary general-purpose hardware, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as a computer's floppy disk, read-only memory (ROM), random access memory (RAM), flash memory (FLASH), hard disk, or optical disc, and includes several instructions for making a computer device (which may be a personal computer, a server, a network device, etc.) execute the methods described in the embodiments of the present invention.
It is worth noting that, in the above embodiment of the video auditing apparatus, the included units and modules are divided only according to functional logic, but the division is not limited thereto, as long as the corresponding functions can be realized; in addition, the specific names of the functional units are only for convenience of mutual distinction and are not intended to limit the protection scope of the present invention.
The above are only preferred embodiments of the present invention and are not intended to limit the present invention; various modifications and changes are possible for those skilled in the art. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.

Claims (12)

1. A video auditing method, characterized by comprising:
extracting different types of content information in a pending video;
fusing feature vectors of the different types of content information according to correlations between the different types of content information, obtaining a feature vector of the pending video;
obtaining, according to the feature vector of the pending video, proportions of the pending video under different preset audit categories;
determining an audit category of the pending video according to the proportions under the different preset audit categories.
2. The method according to claim 1, characterized in that fusing the feature vectors of the different types of content information according to the correlations between the different types of content information to obtain the feature vector of the pending video comprises:
inputting each different type of content information into a pre-constructed fusion learning model, and extracting the feature vectors of the different types of content information through learning submodels for the different types in the fusion learning model;
fusing the feature vectors of the different types of content information through a fusion submodel in the fusion learning model according to the correlations between the different types of content information, obtaining the feature vector of the pending video.
3. The method according to claim 2, characterized in that obtaining, according to the feature vector of the pending video, the proportions of the pending video under the different preset audit categories comprises:
obtaining, according to the feature vector of the pending video and through a preset regression function in the fusion submodel, the proportions of the pending video under the different preset audit categories.
4. The method according to claim 2, characterized in that the fusion learning model is constructed by performing the following operations:
extracting different types of content-information samples in a training sample, the training sample being a historical video under a target audit category;
extracting the feature vectors of the content-information samples through the learning submodels for the different types, and fusing the feature vectors of the content-information samples through the fusion submodel according to correlations between the content-information samples, obtaining a feature vector of the training sample;
obtaining, according to the feature vector of the training sample and through the preset regression function in the fusion submodel, proportions of the training sample under the different preset audit categories;
determining a corresponding classification loss according to the target audit category of the training sample and the proportions under the different preset audit categories, back-propagating the classification loss to correct each learning submodel and the fusion submodel, and continuing to obtain new training samples under the target audit category until the classification loss under the target audit category is lower than a preset loss threshold;
reacquiring training samples under other audit categories and training again, until the classification losses under all preset audit categories are below corresponding preset loss thresholds, then constructing the fusion learning model from the resulting learning submodels and fusion submodel.
5. The method according to claim 1, characterized in that the different types of content information comprise a picture sequence, an audio sequence, and a word sequence in the pending video.
6. The method according to claim 5, characterized in that extracting the different types of content information in the pending video comprises:
segmenting the pending video and extracting video frames from the segmented pending video;
combining the extracted video frames to obtain the corresponding picture sequence.
7. The method according to claim 5, characterized in that extracting the different types of content information in the pending video comprises:
segmenting the pending video and resampling audio information in the segmented pending video;
extracting spectral features in the resampled audio information through a Mel-Frequency Cepstrum (MFC) algorithm;
combining the extracted spectral features to obtain the corresponding audio sequence.
8. The method according to claim 5, characterized in that extracting the different types of content information in the pending video comprises:
obtaining text information in the pending video through an optical character recognition (OCR) algorithm, obtaining the corresponding word sequence.
9. The method according to claim 1, characterized in that determining the audit category of the pending video according to the proportions under the different preset audit categories comprises:
sending the pending video to a manual review platform when the proportion of the pending video under a violation category exceeds a preset violation threshold;
determining the audit category of the pending video according to feedback information of the manual review platform.
10. A video auditing apparatus, characterized by comprising:
an information extraction module, configured to extract different types of content information in a pending video;
a feature fusion module, configured to fuse feature vectors of the different types of content information according to correlations between the different types of content information, obtaining a feature vector of the pending video;
a proportion determination module, configured to obtain, according to the feature vector of the pending video, proportions of the pending video under different preset audit categories;
an audit category determination module, configured to determine an audit category of the pending video according to the proportions under the different preset audit categories.
11. A device, characterized in that the device comprises:
one or more processors; and
a storage apparatus for storing one or more programs;
wherein, when the one or more programs are executed by the one or more processors, the one or more processors realize the video auditing method according to any one of claims 1-9.
12. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, realizes the video auditing method according to any one of claims 1-9.
CN201811438719.8A 2018-11-27 2018-11-27 A kind of method, apparatus, equipment and the storage medium of video audit Pending CN109495766A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811438719.8A CN109495766A (en) 2018-11-27 2018-11-27 A kind of method, apparatus, equipment and the storage medium of video audit


Publications (1)

Publication Number Publication Date
CN109495766A true CN109495766A (en) 2019-03-19

Family

ID=65698526

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811438719.8A Pending CN109495766A (en) 2018-11-27 2018-11-27 A kind of method, apparatus, equipment and the storage medium of video audit

Country Status (1)

Country Link
CN (1) CN109495766A (en)

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109862394A (en) * 2019-03-27 2019-06-07 北京周同科技有限公司 Checking method, device, equipment and the storage medium of video content
CN110085213A (en) * 2019-04-30 2019-08-02 广州虎牙信息科技有限公司 Abnormality monitoring method, device, equipment and the storage medium of audio
CN110225373A (en) * 2019-06-13 2019-09-10 腾讯科技(深圳)有限公司 A kind of video reviewing method, device and electronic equipment
CN110647905A (en) * 2019-08-02 2020-01-03 杭州电子科技大学 Method for identifying terrorist-related scene based on pseudo brain network model
CN110781916A (en) * 2019-09-18 2020-02-11 平安科技(深圳)有限公司 Video data fraud detection method and device, computer equipment and storage medium
CN110990631A (en) * 2019-12-16 2020-04-10 腾讯科技(深圳)有限公司 Video screening method and device, electronic equipment and storage medium
CN111079816A (en) * 2019-12-11 2020-04-28 北京金山云网络技术有限公司 Image auditing method and device and server
CN111090776A (en) * 2019-12-20 2020-05-01 广州市百果园信息技术有限公司 Video auditing method, device, auditing server and storage medium
CN111770353A (en) * 2020-06-24 2020-10-13 北京字节跳动网络技术有限公司 Live broadcast monitoring method and device, electronic equipment and storage medium
CN111770352A (en) * 2020-06-24 2020-10-13 北京字节跳动网络技术有限公司 Security detection method and device, electronic equipment and storage medium
CN111813399A (en) * 2020-07-23 2020-10-23 平安医疗健康管理股份有限公司 Machine learning-based auditing rule processing method and device and computer equipment
CN112579771A (en) * 2020-12-08 2021-03-30 腾讯科技(深圳)有限公司 Content title detection method and device
CN114157906A (en) * 2020-09-07 2022-03-08 北京达佳互联信息技术有限公司 Video detection method and device, electronic equipment and storage medium
CN114760484A (en) * 2021-01-08 2022-07-15 腾讯科技(深圳)有限公司 Live video identification method and device, computer equipment and storage medium
CN114915779A (en) * 2022-04-08 2022-08-16 阿里巴巴(中国)有限公司 Video quality evaluation method, device, equipment and storage medium
CN114979051A (en) * 2022-04-18 2022-08-30 中移互联网有限公司 Message processing method and device, electronic equipment and storage medium
CN115297360A (en) * 2022-09-14 2022-11-04 百鸣(北京)信息技术有限公司 Intelligent auditing system for multimedia software video uploading
CN115460433A (en) * 2021-06-08 2022-12-09 京东方科技集团股份有限公司 Video processing method and device, electronic equipment and storage medium
CN115834935A (en) * 2022-12-21 2023-03-21 阿里云计算有限公司 Multimedia information auditing method, advertisement auditing method, equipment and storage medium
CN116824107A (en) * 2023-07-13 2023-09-29 北京万物镜像数据服务有限公司 Processing method, device and equipment for three-dimensional model review information

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050238238A1 (en) * 2002-07-19 2005-10-27 Li-Qun Xu Method and system for classification of semantic content of audio/video data
CN101035280A (en) * 2007-04-19 2007-09-12 鲍东山 Classified content auditing terminal system
CN103049530A (en) * 2012-12-22 2013-04-17 深圳先进技术研究院 System and method for deep fused video examination
US20140037269A1 (en) * 2012-08-03 2014-02-06 Mrityunjay Kumar Video summarization using group sparsity analysis
CN107580259A (en) * 2016-07-04 2018-01-12 北京新岸线网络技术有限公司 A kind of verifying video content method and system
CN108124191A (en) * 2017-12-22 2018-06-05 北京百度网讯科技有限公司 A kind of video reviewing method, device and server


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Chen Bin: "Research on an Automatic Video Classification Algorithm Based on Bimodal Features and Support Vector Machines", Wanfang Database *
Wei Pengcheng et al.: "Statistics and Analysis Based on Data Mining in the R Language", 31 December 2017 *

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109862394A (en) * 2019-03-27 2019-06-07 北京周同科技有限公司 Checking method, device, equipment and the storage medium of video content
CN110085213A (en) * 2019-04-30 2019-08-02 广州虎牙信息科技有限公司 Abnormality monitoring method, device, equipment and the storage medium of audio
CN110225373A (en) * 2019-06-13 2019-09-10 腾讯科技(深圳)有限公司 A kind of video reviewing method, device and electronic equipment
CN110225373B (en) * 2019-06-13 2023-01-24 腾讯科技(深圳)有限公司 Video auditing method and device and electronic equipment
CN110647905A (en) * 2019-08-02 2020-01-03 杭州电子科技大学 Method for identifying terrorist-related scene based on pseudo brain network model
CN110647905B (en) * 2019-08-02 2022-05-13 杭州电子科技大学 Method for identifying terrorist-related scene based on pseudo brain network model
WO2021051607A1 (en) * 2019-09-18 2021-03-25 平安科技(深圳)有限公司 Video data-based fraud detection method and apparatus, computer device, and storage medium
CN110781916A (en) * 2019-09-18 2020-02-11 平安科技(深圳)有限公司 Video data fraud detection method and device, computer equipment and storage medium
CN111079816A (en) * 2019-12-11 2020-04-28 北京金山云网络技术有限公司 Image auditing method and device and server
CN110990631A (en) * 2019-12-16 2020-04-10 腾讯科技(深圳)有限公司 Video screening method and device, electronic equipment and storage medium
CN111090776A (en) * 2019-12-20 2020-05-01 广州市百果园信息技术有限公司 Video auditing method, device, auditing server and storage medium
CN111090776B (en) * 2019-12-20 2023-06-30 广州市百果园信息技术有限公司 Video auditing method and device, auditing server and storage medium
CN111770352A (en) * 2020-06-24 2020-10-13 北京字节跳动网络技术有限公司 Security detection method and device, electronic equipment and storage medium
CN111770353A (en) * 2020-06-24 2020-10-13 北京字节跳动网络技术有限公司 Live broadcast monitoring method and device, electronic equipment and storage medium
CN111813399A (en) * 2020-07-23 2020-10-23 平安医疗健康管理股份有限公司 Machine learning-based auditing rule processing method and device and computer equipment
CN114157906A (en) * 2020-09-07 2022-03-08 北京达佳互联信息技术有限公司 Video detection method and device, electronic equipment and storage medium
CN114157906B (en) * 2020-09-07 2024-04-02 北京达佳互联信息技术有限公司 Video detection method, device, electronic equipment and storage medium
CN112579771A (en) * 2020-12-08 2021-03-30 腾讯科技(深圳)有限公司 Content title detection method and device
CN112579771B (en) * 2020-12-08 2024-05-07 腾讯科技(深圳)有限公司 Content title detection method and device
CN114760484A (en) * 2021-01-08 2022-07-15 腾讯科技(深圳)有限公司 Live video identification method and device, computer equipment and storage medium
CN114760484B (en) * 2021-01-08 2023-11-07 腾讯科技(深圳)有限公司 Live video identification method, live video identification device, computer equipment and storage medium
CN115460433A (en) * 2021-06-08 2022-12-09 京东方科技集团股份有限公司 Video processing method and device, electronic equipment and storage medium
CN115460433B (en) * 2021-06-08 2024-05-28 京东方科技集团股份有限公司 Video processing method and device, electronic equipment and storage medium
CN114915779A (en) * 2022-04-08 2022-08-16 阿里巴巴(中国)有限公司 Video quality evaluation method, device, equipment and storage medium
CN114979051B (en) * 2022-04-18 2023-08-15 中移互联网有限公司 Message processing method and device, electronic equipment and storage medium
CN114979051A (en) * 2022-04-18 2022-08-30 中移互联网有限公司 Message processing method and device, electronic equipment and storage medium
CN115297360A (en) * 2022-09-14 2022-11-04 百鸣(北京)信息技术有限公司 Intelligent review system for video uploads in multimedia software
CN115834935A (en) * 2022-12-21 2023-03-21 阿里云计算有限公司 Multimedia information auditing method, advertisement auditing method, equipment and storage medium
CN116824107A (en) * 2023-07-13 2023-09-29 北京万物镜像数据服务有限公司 Processing method, device and equipment for three-dimensional model review information
CN116824107B (en) * 2023-07-13 2024-03-19 北京万物镜像数据服务有限公司 Processing method, device and equipment for three-dimensional model review information

Similar Documents

Publication Publication Date Title
CN109495766A (en) Method, apparatus, device and storage medium for video audit
WO2021051607A1 (en) Video data-based fraud detection method and apparatus, computer device, and storage medium
CN111462735A (en) Voice detection method and device, electronic equipment and storage medium
CN114694076A (en) Multi-modal emotion analysis method based on multi-task learning and stacked cross-modal fusion
CN105957531B (en) Speech content extraction method and device based on cloud platform
CN110717324B (en) Judgment document answer information extraction method, device, extractor, medium and equipment
CN112182229A (en) Text classification model construction method, text classification method and device
CN110472548B (en) Video continuous sign language recognition method and system based on grammar classifier
CN110991165A (en) Method and device for extracting character relation in text, computer equipment and storage medium
CN113035311A (en) Medical image report automatic generation method based on multi-mode attention mechanism
CN116955699B (en) Video cross-mode search model training method, searching method and device
CN110188195A (en) Text intent recognition method, apparatus and device based on deep learning
CN110232564A (en) Automatic legal judgment method for traffic accidents based on multi-modal data
CN117079299A (en) Data processing method, device, electronic equipment and storage medium
US11238289B1 (en) Automatic lie detection method and apparatus for interactive scenarios, device and medium
CN113076720B (en) Long text segmentation method and device, storage medium and electronic device
CN114372532A (en) Method, device, equipment, medium and product for determining label marking quality
CN113762503A (en) Data processing method, device, equipment and computer readable storage medium
CN113378826B (en) Data processing method, device, equipment and storage medium
CN116186258A (en) Text classification method, equipment and storage medium based on multi-mode knowledge graph
CN112199954B (en) Disease entity matching method and device based on voice semantics and computer equipment
CN114780757A (en) Short media label extraction method and device, computer equipment and storage medium
CN115273856A (en) Voice recognition method and device, electronic equipment and storage medium
CN114064968A (en) News subtitle abstract generating method and system
CN113837910B (en) Test question recommending method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20190319)