CN107944409A

CN107944409A - video analysis method and device

Info

Publication number: CN107944409A
Application number: CN201711243388.8A
Authority: CN
Inventors: 季向阳; 杨武魁; 陈孝罡
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2017-11-30
Filing date: 2017-11-30
Publication date: 2018-04-20
Anticipated expiration: 2037-11-30
Also published as: CN107944409B

Abstract

This disclosure relates to a kind of video analysis method and device, the described method includes：By video input single frames identification model to be identified, the single frames feature of single-frame images in the video to be identified is obtained；According to frame length, start frame and identification step-length, the video to be identified is divided into each video block；The single frames feature and frame length of the single-frame images included according to each video block, determine the feature stream matrix of each video block；By initial notice matrix and the feature stream matrix of video block, input shot and long term memory models is handled, and obtains the notice matrix of video block；According to the notice matrix of video block, the attention force vector of the video to be identified is determined.The disclosure optionally pays close attention to region important on sdi video, and time upper relatively important frame, and then reduces influence of the irrelevant information to video analysis result.

Description

Video analysis method and device

Technical field

This disclosure relates to computer vision field, more particularly to a kind of video analysis method and device.

Background technology

Video analysis is an important directions of computer vision field, and in recent years, neutral net is in art of image analysis Important breakthrough is achieved, but for relative image, video adds time dimension information, therefore allows machine to understand different video Frame is the contact on time dimension becomes it is particularly important that.In traditional method, usually described using the manual feature such as light stream The temporal information of video, often only considers the analysis result of different single-frame images, can not accurately distinguish molar behavior in video In the action of certain Partial key, cause the recognition result of video inaccurate.

The content of the invention

In view of this, the present disclosure proposes a kind of video analysis method and device, to solve traditional video analysis side In method, it is impossible to the problem of accurately distinguishing the key operations in overall work in video, causing the recognition result of video inaccurate.

According to the one side of the disclosure, there is provided a kind of video analysis method, the described method includes：

By video input single frames identification model to be identified, the single frames feature of single-frame images in the video to be identified is obtained；

According to frame length, start frame and identification step-length, the video to be identified is divided into each video block；

The single frames feature and frame length of the single-frame images included according to each video block, determine the feature stream square of each video block Battle array；

By initial notice matrix and the feature stream matrix of video block, input shot and long term memory models is handled, obtained The notice matrix of video block；

According to the notice matrix of video block, the attention force vector of the video to be identified is determined.

In a kind of possible implementation, by initial notice matrix and the feature stream matrix of video block, length is inputted Phase memory models is handled, and obtains the notice matrix of video block, including：

According to the feature of single frames feature is wide, the feature of single frames feature is high and the frame length, the video block is determined Initial notice matrix；

By the initial notice matrix and the feature stream matrix of first video block, input shot and long term memory models carries out Processing, obtains the notice matrix of first video block；

Using second video block and its follow-up video block as current video block, successively by the attention of a upper video block The feature stream matrix of torque battle array and current video block, input shot and long term memory models are handled, and obtain the note of current video block Meaning torque battle array.

In a kind of possible implementation, by the notice matrix of a upper video block and the feature stream of current video block Matrix, input shot and long term memory models are handled, and obtain the notice matrix of current video block, including：

The notice matrix of a upper video block and the feature stream matrix weights of current video block are summed, obtain integrating special Levy matrix；

The integration characteristics Input matrix shot and long term memory models is handled, obtains the attention torque of current video block Battle array.

In a kind of possible implementation, according to the notice matrix of video block, the note of the video to be identified is determined Meaning force vector, including：

The notice matrix of video block where single-frame images is averaged, obtains the single frames vector of single-frame images；

According to the single frames of all single-frame images vector, the attention force vector of the video to be identified is obtained.

In a kind of possible implementation,

By initial notice matrix and the feature stream matrix of video block, input shot and long term memory models is handled, obtained The notice matrix of video block, further includes：

Obtain the class probability of current video block；

Class probability input grader is handled, obtains the video block classification of current video block；

According to the video block classification of video block, the video classification of the video to be identified is determined.

According to another aspect of the present disclosure, there is provided a kind of video analysis device, including：

Single frames characteristic determination module, for by video input single frames identification model to be identified, obtaining the video to be identified The single frames feature of middle single-frame images；

Video block division module, for according to frame length, start frame and identification step-length, the video to be identified to be divided into Each video block；

Feature stream matrix deciding module, for the single frames feature and frame length of the single-frame images included according to each video block, Determine the feature stream matrix of each video block；

Notice matrix deciding module, for by the feature stream matrix of initial notice matrix and video block, inputting length Phase memory models is handled, and obtains the notice matrix of video block；

Pay attention to force vector determining module, for the notice matrix according to video block, determine the note of the video to be identified Meaning force vector.

In a kind of possible implementation, the notice matrix deciding module, including：

Initial notice matrix determination sub-module, for the feature according to single frames feature is wide, feature height of single frames feature with And the frame length, determine the initial notice matrix of the video block；

First notice matrix determination sub-module, for by the feature of the initial notice matrix and first video block Matrix is flowed, input shot and long term memory models is handled, and obtains the notice matrix of first video block；

Follow-up notice matrix determination sub-module, for using second video block and its follow-up video block as working as forward sight Frequency block, successively by the notice matrix of a upper video block and the feature stream matrix of current video block, input shot and long term memory mould Type is handled, and obtains the notice matrix of current video block.

In a kind of possible implementation, the follow-up notice matrix determination sub-module, including：

Submodule is integrated, for by the feature stream matrix weights of the notice matrix of a upper video block and current video block Summation, obtains integration characteristics matrix；

Shot and long term memory models handles submodule, for the integration characteristics Input matrix shot and long term memory models to be carried out Processing, obtains the notice matrix of current video block.

In a kind of possible implementation, the attention force vector determining module, including：

Single frames vector determination sub-module, for the notice matrix of the video block where single-frame images to be averaged, obtains To the single frames vector of single-frame images；

Summation submodule, for the single frames vector according to all single-frame images, obtains the notice of the video to be identified Vector.

In a kind of possible implementation, the notice matrix deciding module, further includes：

Class probability determination sub-module, for obtaining the class probability of current video block；

Grader submodule, for class probability input grader to be handled, obtains regarding for current video block Frequency block classification；

Video classification determination sub-module, for the video block classification according to video block, determines regarding for the video to be identified Frequency classification.

According to the one side of the disclosure, there is provided a kind of video analysis device, including：

Processor；

For storing the memory of processor-executable instruction；

Wherein, the processor is configured as：The method described in any one in this method claim is realized during execution.

According to the one side of the disclosure, there is provided a kind of non-volatile computer readable storage medium storing program for executing, is stored thereon with meter Calculation machine programmed instruction, the computer program instructions are realized in this method claim when being executed by processor described in any one Method.

By the way that video to be identified is divided into video block, and the single frames feature of the single-frame images of video to be identified is obtained, The disclosure optionally pays close attention to region important on sdi video, and time upper relatively important frame, and then reduces Influence of the irrelevant information to video analysis result.In addition, the attention model in time domain can also be used to screen the pass of video Key frame.According to below with reference to the accompanying drawings becoming clear to detailed description of illustrative embodiments, the further feature and aspect of the disclosure Chu.

Brief description of the drawings

Comprising in the description and the attached drawing of a part for constitution instruction and specification together illustrate the disclosure Exemplary embodiment, feature and aspect, and for explaining the principle of the disclosure.

Fig. 1 shows the flow chart of the video analysis method according to one embodiment of the disclosure；

Fig. 2 shows the flow chart of the video analysis method according to one embodiment of the disclosure；

Fig. 3 shows the flow chart of the video analysis method according to one embodiment of the disclosure；

Fig. 4 shows the flow chart of the video analysis method according to one embodiment of the disclosure；

Fig. 5 shows the flow chart of the video analysis method according to one embodiment of the disclosure；

Fig. 6 shows the exemplary schematic diagram of application according to the video analysis method of one embodiment of the disclosure；

Fig. 7 shows the exemplary schematic diagram of application according to the video analysis method of one embodiment of the disclosure；

Fig. 8 shows the exemplary schematic diagram of application according to the video analysis method of one embodiment of the disclosure；

Fig. 9 shows the block diagram of the video analysis device according to one embodiment of the disclosure；

Figure 10 shows the block diagram of the video analysis device according to one embodiment of the disclosure；

Figure 11 shows the block diagram of the video analysis device according to one embodiment of the disclosure.

Embodiment

Describe various exemplary embodiments, feature and the aspect of the disclosure in detail below with reference to attached drawing.It is identical in attached drawing Reference numeral represent functionally the same or similar element.Although the various aspects of embodiment are shown in the drawings, remove Non-specifically point out, it is not necessary to attached drawing drawn to scale.

Dedicated word " exemplary " means " being used as example, embodiment or illustrative " herein.Here as " exemplary " Illustrated any embodiment should not necessarily be construed as preferred or advantageous over other embodiments.

In addition, in order to better illustrate the disclosure, numerous details is given in embodiment below. It will be appreciated by those skilled in the art that without some details, the disclosure can equally be implemented.In some instances, for Method, means, element and circuit well known to those skilled in the art are not described in detail, in order to highlight the purport of the disclosure.

Fig. 1 shows the flow chart of the video analysis method according to one embodiment of the disclosure, as shown in Figure 1, this method includes Following steps：

Step S10, by video input single frames identification model to be identified, obtains the list of single-frame images in the video to be identified Frame feature.

In a kind of possible implementation, video to be identified includes the continuous image of multiframe.Single frames identification model, such as Trained convolutional neural networks model.After in the trained single frames identification model of video input to be identified, according to setting The feature wide (W) of single frames feature, feature high (H) and characteristic dimension (D), obtain the single frames feature of every two field picture.Single frames feature The wide W of feature, including single frames feature vector is in the location of pixels of the cross direction of two field picture, the high H of feature of single frames feature, including list Location of pixels of the frame feature vector in the high direction of two field picture.

Step S20, according to frame length, start frame and identification step-length, each video block is divided into by the video to be identified.

In a kind of possible implementation, frame length (T) includes the quantity of continuous frame, and start frame is each video block Start frame, identification step-length is each division step-length of video block.Such as frame length is 10, start frame is the 1st frame, identifies step-length For 2, refer in video to be identified, using the 1st frame to the 10th frame as first video block, using the 3rd frame to the 13rd frame as second Video block, and so on.The position of start frame can randomly select.

Step S30, the single frames feature and frame length of the single-frame images included according to each video block, determine the spy of each video block Sign stream matrix.

In a kind of possible implementation, the feature stream matrix of each video block, by each single frames figure in the video block Obtained after the single frames merging features of picture.Single frames is characterized as the feature vector that dimension is W*H*D, and therefore, T*W*H*D is each video block Feature stream matrix (F).It is understood that feature stream matrix can give expression to the space characteristics on video block feature.

Step S40, initial notice matrix and the feature stream matrix of video block are inputted at shot and long term memory models Reason, obtains the notice matrix of video block.

In a kind of possible implementation, shot and long term memory models (LSTM Long Short-Term Memory) is A kind of time recurrent neural network, is suitable for being spaced in processing and predicted time sequence and postponing relatively long critical event. The notice matrix of the video block handled using shot and long term memory models, can provide the feature letter in video block in sequential Breath.Since the calculating of shot and long term memory models needed the input information of a upper period, it is therefore desirable to be first video block The initial notice matrix of setting.Initial notice matrix can provide at random, can also be according to the instruction of shot and long term memory models Practice result to provide.

According to the upper video block notice matrix and the feature stream matrix of current video block being calculated, length is inputted After phase memory models, characteristic information of the every two field picture of video block in time series is obtained, draws the attention of each video block The feature locations in frame where frame where power, and the notice of each video block.It is understood that notice matrix The temporal characteristics on video block feature can be given expression to.

Step S50, according to the notice matrix of video block, determines the attention force vector of the video to be identified.

In a kind of possible implementation, after the notice matrix of each video block is carried out integration processing, it can obtain To the attention force vector of video to be identified.If carrying out the processing of above-mentioned steps using video to be identified as a video block, The notice matrix of the video block drawn, is the attention force vector of video to be identified.By the space characteristics of video block and time The attention force vector that feature is drawn after being integrated has the space characteristics and temporal characteristics of video block at the same time.

The disclosure optionally pays close attention to region important on sdi video, and time upper relatively important frame, into And reduce influence of the irrelevant information to video analysis result.In addition, the attention model in time domain can also be used to screen The key frame of video.The disclosure and the process of people's alanysis video are more closely, reduce irrelevant information and redundancy pair The influence of key message.

Fig. 2 shows the flow chart of the video analysis method according to one embodiment of the disclosure, as shown in Fig. 2, in above-mentioned implementation On the basis of example, the step S40 of this method includes：

Step S41, according to the feature of single frames feature is wide, the feature of single frames feature is high and the frame length, determines described The initial notice matrix of video block.

Step S42, by the initial notice matrix and the feature stream matrix of first video block, input shot and long term memory Model is handled, and obtains the notice matrix of first video block.

Step S43, using second video block and its follow-up video block as current video block, successively by a upper video The notice matrix of block and the feature stream matrix of current video block, input shot and long term memory models are handled, obtain working as forward sight The notice matrix of frequency block.

In a kind of possible implementation, frame length T has the information in time series.It is high according to the wide W of feature, feature The T*W*H of H and frame length T compositions is the notice matrix (L) of each video block.Notice matrix (L) have video it is in the block when Between feature.By initial notice matrix L₀With the feature stream matrix F of first video block₁Input trained multilayer LSTM models In handled, obtain the notice matrix L of first video module₁.Again by the notice matrix L of first video block₁With The feature stream matrix F of two video blocks₂Input in trained LSTM models and handled, obtain the note of second video module Meaning power matrix L₂.And so on be iterated calculating, until obtain the notice matrix of all video blocks.

Compared to the spatial attention model of 2 traditional dimensions, the disclosure can not only adaptively pay close attention to information in video frame The region of Relatively centralized, while the key frame in video also can be adaptively filtered out, and keep a close eye on, so as to optimize video The effect of analysis.

Fig. 3 shows the flow chart of the video analysis method according to one embodiment of the disclosure, method as shown in Figure 3, such as On the basis of embodiment shown in Fig. 2, step S43 includes：

Step S431, the notice matrix of a upper video block and the feature stream matrix weights of current video block are summed, Obtain integration characteristics matrix.

Step S42, the integration characteristics Input matrix shot and long term memory models is handled, obtains current video block Notice matrix.

In a kind of possible implementation, using formula (1) to F₁And L₀Weighted sum, the feature f after being integrated₀ ∈R^D.By f₀It is sent into trained LSTM, exports the notice matrix L of second video block₁.Again by feature stream matrix F₂And L₁ It is weighted summation and obtains f₁.And so on, after n iteration, obtain the notice matrix L of n-th video block_n。

Fig. 6 shows the exemplary schematic diagram of application according to the video analysis method of one embodiment of the disclosure, left side one in Fig. 6 The single needle feature of single-frame images in a video block.Each layer plane represents the feature of a two field picture.Three-dimensional square in each layer Represent a vector value in single frames feature.It is understood that each position of vector value in the planes, i.e., each vector value Feature it is high and feature is wide, be the position of vector value corresponding pixel in single-frame images.The height of each vector value represent to Measure dimension.The single frames combinations of features of single-frame images together, forms the feature stream matrix of video block.Right side is video block in Fig. 6 Notice matrix, by feature is wide, feature is high and frame length forms.By the feature stream matrix in left side and the attention torque on right side Battle array, is integrated according to formula (1), obtains integration characteristics matrix.

Fig. 4 shows the flow chart of the video analysis method according to one embodiment of the disclosure, embodiment as shown in Figure 4, On the basis of above-described embodiment, the step S50 of this method includes：

Step S51, the notice matrix of the video block where single-frame images is averaged, obtains the single frames of single-frame images Vector.

Step S52, according to the single frames of all single-frame images vector, obtains the attention force vector of the video to be identified.

In a kind of possible implementation, when video to be identified is divided into different video blocks, same two field picture Multiple video blocks may be belonged to.By each pixel in single-frame images, in the notice matrix of different video block is belonged to Value is averaged.The single frames vector of single-frame images can be obtained according to the average value of each pixel.Again by all single-frame images Single frames vector connect, obtain the attention force vector of video to be identified.

The disclosure can be used for extracting the key frame information in video, available for information sifting.Since notice matrix is anti- What is reflected is influence degree of the different spatio-temporal regions to video analysis result, can on Spatial Dimension to notice Matrix Calculating and, obtain The vector consistent with frame number to length, in notice matrix, the frame more " key " of the higher position correspondence of vector value.

Fig. 5 shows the flow chart of the video analysis method according to one embodiment of the disclosure, on the basis of above-described embodiment, The step S40 of this method, further includes：

Step S60, obtains the class probability of current video block.

Step S70, class probability input grader is handled, obtains the video block classification of current video block.

Step S80, according to the video block classification of video block, determines the video classification of the video to be identified.

In a kind of possible implementation, trained LSTM models can also export the video class probability of video block Vector.

In a kind of possible implementation, LSTM models are trained using the video sample for identifying action classification, Trained LSTM models can export the action classification in video with user.After video block is inputted LSTM models, regarded While the notice matrix of frequency block, LSTM models can also export the action classification probability of video block.The output of LSTM modules Action classification probability, by softmax graders, obtains the action classification of video block.Fig. 7 is shown according to one embodiment of the disclosure Video analysis method the exemplary schematic diagram of application, as shown in fig. 7, LSTM models output notice matrix L while, lead to Cross the probability matrix that softmax graders provide different action classifications.In different action classification probability matrixs, fraction highest Be video classification.Nth iteration gains attention power matrix L_n, have recorded video block key message institute over time and space The position at place.The vector that length is T is obtained after spatially summing, the video frame of the larger position correspondence of vector value is relatively more It is crucial.

Fig. 8 shows the applicating flow chart of the video analysis method according to one embodiment of the disclosure.As shown in figure 8, it will wait to know Other video (leftmost side side's video in the block in figure), inputs trained CNN (convolutional neural networks) model, extracts feature frame by frame Obtain the single frames feature of single-frame images.Then video block is randomly selected in video to be identified.With reference to the notice of each video block Matrix, after being weighted summation to the feature stream matrix of each video block, inputs in multilayer LSTM models and is handled, respectively obtained The notice matrix and classification of motion result of each video block.The notice matrix of wherein each video block needs to be iterated calculating. For example, when people is watching video, not only notice spatially will be different, and the notice on time dimension also can It is different because of video content.For a video to be identified is the case played soccer, the video of first half be people to Run, and latter half is only sportsman's veritably striking action.Therefore in the video to be identified played soccer, using in the disclosure In the attention force vector for the video to be identified that method obtains, latter half striking action is endowed the weight of higher.With it is traditional Visual classification mode based on LSTM is comparatively, the disclosure obtains the nicety of grading of higher.

Fig. 9 shows the block diagram of the video analysis device according to one embodiment of the disclosure, as shown in figure 9, the device includes：

Single frames characteristic determination module 41, for by video input single frames identification model to be identified, obtaining described to be identified regard The single frames feature of single-frame images in frequency.

Video block division module 42, for according to frame length, start frame and identification step-length, the video to be identified to be divided For each video block.

Feature stream matrix deciding module 43, for the single frames feature and frame length of the single-frame images included according to each video block Degree, determines the feature stream matrix of each video block.

Notice matrix deciding module 44, for the feature stream matrix of initial notice matrix and video block, input to be grown Short-term memory model is handled, and obtains the notice matrix of video block.

Pay attention to force vector determining module 45, for the notice matrix according to video block, determine the video to be identified Pay attention to force vector.

Figure 10 shows the block diagram of the video analysis device according to one embodiment of the disclosure, in embodiment as shown in Figure 9 On the basis of,

In a kind of possible volume implementation, the notice matrix deciding module 44, including：

Initial notice matrix determination sub-module 441, for the feature according to single frames feature to be wide, the feature of single frames feature is high And the frame length, determine the initial notice matrix of the video block；

First notice matrix determination sub-module 442, for by the initial notice matrix and first video block Feature stream matrix, input shot and long term memory models are handled, and obtain the notice matrix of first video block；

Follow-up notice matrix determination sub-module 443, for using second video block and its follow-up video block as ought Preceding video block, successively by the notice matrix of a upper video block and the feature stream matrix of current video block, input shot and long term note Recall model to be handled, obtain the notice matrix of current video block.

In a kind of possible volume implementation, the follow-up notice matrix determination sub-module, including：

In a kind of possible volume implementation, the attention force vector determining module 45, including：

Single frames vector determination sub-module 451, for the notice matrix of the video block where single-frame images to be averaged, Obtain the single frames vector of single-frame images；

Summation submodule 452, for the single frames vector according to all single-frame images, obtains the attention of the video to be identified Force vector.

In a kind of possible volume implementation, the notice matrix deciding module 44, further includes：

Class probability determination sub-module 444, for obtaining the class probability of current video block；

Grader submodule 445, for class probability input grader to be handled, obtains current video block Video block classification；

Video classification determination sub-module 446, for the video block classification according to video block, determines the video to be identified Video classification.

Figure 11 is a kind of block diagram for video identification device 800 according to an exemplary embodiment.For example, device 800 can be mobile phone, computer, digital broadcast terminal, messaging devices, game console, tablet device, and medical treatment is set It is standby, body-building equipment, personal digital assistant etc..

With reference to Figure 11, device 800 can include following one or more assemblies：Processing component 802, memory 804, power supply Component 806, multimedia component 808, audio component 810, the interface 812 of input/output (I/O), sensor component 814, and Communication component 816.

The integrated operation of the usual control device 800 of processing component 802, such as with display, call, data communication, phase The operation that machine operates and record operation is associated.Processing component 802 can refer to including one or more processors 820 to perform Order, to complete all or part of step of above-mentioned method.In addition, processing component 802 can include one or more modules, just Interaction between processing component 802 and other assemblies.For example, processing component 802 can include multi-media module, it is more to facilitate Interaction between media component 808 and processing component 802.

Memory 804 is configured as storing various types of data to support the operation in device 800.These data are shown Example includes the instruction of any application program or method for being operated on device 800, and contact data, telephone book data, disappears Breath, picture, video etc..Memory 804 can be by any kind of volatibility or non-volatile memory device or their group Close and realize, as static RAM (SRAM), electrically erasable programmable read-only memory (EEPROM) are erasable to compile Journey read-only storage (EPROM), programmable read only memory (PROM), read-only storage (ROM), magnetic memory, flash Device, disk or CD.

Power supply module 806 provides electric power for the various assemblies of device 800.Power supply module 806 can include power management system System, one or more power supplys, and other components associated with generating, managing and distributing electric power for device 800.

Multimedia component 808 is included in the screen of one output interface of offer between described device 800 and user.One In a little embodiments, screen can include liquid crystal display (LCD) and touch panel (TP).If screen includes touch panel, screen Curtain may be implemented as touch-screen, to receive input signal from the user.Touch panel includes one or more touch sensings Device is to sense the gesture on touch, slip and touch panel.The touch sensor can not only sense touch or sliding action Border, but also detect and the duration and pressure associated with the touch or slide operation.In certain embodiments, more matchmakers Body component 808 includes a front camera and/or rear camera.When device 800 is in operator scheme, such as screening-mode or During video mode, front camera and/or rear camera can receive exterior multi-medium data.Each front camera and Rear camera can be a fixed optical lens system or have focusing and optical zoom capabilities.

Audio component 810 is configured as output and/or input audio signal.For example, audio component 810 includes a Mike Wind (MIC), when device 800 is in operator scheme, during such as call model, logging mode and speech recognition mode, microphone by with It is set to reception external audio signal.The received audio signal can be further stored in memory 804 or via communication set Part 816 is sent.In certain embodiments, audio component 810 further includes a loudspeaker, for exports audio signal.

I/O interfaces 812 provide interface between processing component 802 and peripheral interface module, and above-mentioned peripheral interface module can To be keyboard, click wheel, button etc..These buttons may include but be not limited to：Home button, volume button, start button and lock Determine button.

Sensor component 814 includes one or more sensors, and the state for providing various aspects for device 800 is commented Estimate.For example, sensor component 814 can detect opening/closed mode of device 800, and the relative positioning of component, for example, it is described Component is the display and keypad of device 800, and sensor component 814 can be with 800 1 components of detection device 800 or device Position change, the existence or non-existence that user contacts with device 800,800 orientation of device or acceleration/deceleration and device 800 Temperature change.Sensor component 814 can include proximity sensor, be configured to detect without any physical contact Presence of nearby objects.Sensor component 814 can also include optical sensor, such as CMOS or ccd image sensor, for into As being used in application.In certain embodiments, which can also include acceleration transducer, gyro sensors Device, Magnetic Sensor, pressure sensor or temperature sensor.

Communication component 816 is configured to facilitate the communication of wired or wireless way between device 800 and other equipment.Device 800 can access the wireless network based on communication standard, such as WiFi, 2G or 3G, or combinations thereof.In an exemplary implementation In example, communication component 816 receives broadcast singal or broadcast related information from external broadcasting management system via broadcast channel. In one exemplary embodiment, the communication component 816 further includes near-field communication (NFC) module, to promote junction service.Example Such as, in NFC module radio frequency identification (RFID) technology can be based on, Infrared Data Association (IrDA) technology, ultra wide band (UWB) technology, Bluetooth (BT) technology and other technologies are realized.

In the exemplary embodiment, device 800 can be believed by one or more application application-specific integrated circuit (ASIC), numeral Number processor (DSP), digital signal processing appts (DSPD), programmable logic device (PLD), field programmable gate array (FPGA), controller, microcontroller, microprocessor or other electronic components are realized, for performing the above method.

In the exemplary embodiment, a kind of non-volatile computer readable storage medium storing program for executing is additionally provided, such as including calculating The memory 804 of machine programmed instruction, above computer programmed instruction can be performed above-mentioned to complete by the processor 820 of device 800 Method.

The disclosure can be system, method and/or computer program product.Computer program product can include computer Readable storage medium storing program for executing, containing for making processor realize the computer-readable program instructions of various aspects of the disclosure.

Computer-readable recording medium can keep and store to perform the tangible of the instruction that uses of equipment by instruction Equipment.Computer-readable recording medium for example can be-- but be not limited to-- storage device electric, magnetic storage apparatus, optical storage Equipment, electromagnetism storage device, semiconductor memory apparatus or above-mentioned any appropriate combination.Computer-readable recording medium More specifically example (non exhaustive list) includes：Portable computer diskette, hard disk, random access memory (RAM), read-only deposit It is reservoir (ROM), erasable programmable read only memory (EPROM or flash memory), static RAM (SRAM), portable Compact disk read-only storage (CD-ROM), digital versatile disc (DVD), memory stick, floppy disk, mechanical coding equipment, for example thereon It is stored with punch card or groove internal projection structure and the above-mentioned any appropriate combination of instruction.Calculating used herein above Machine readable storage medium storing program for executing is not interpreted instantaneous signal in itself, and the electromagnetic wave of such as radio wave or other Free propagations, lead to Cross the electromagnetic wave (for example, the light pulse for passing through fiber optic cables) of waveguide or the propagation of other transmission mediums or transmitted by electric wire Electric signal.

Computer-readable program instructions as described herein can be downloaded to from computer-readable recording medium it is each calculate/ Processing equipment, or outer computer or outer is downloaded to by network, such as internet, LAN, wide area network and/or wireless network Portion's storage device.Network can include copper transmission cable, optical fiber is transmitted, is wirelessly transferred, router, fire wall, interchanger, gateway Computer and/or Edge Server.Adapter or network interface in each calculating/processing equipment are received from network to be counted Calculation machine readable program instructions, and the computer-readable program instructions are forwarded, for the meter being stored in each calculating/processing equipment In calculation machine readable storage medium storing program for executing.

For perform the disclosure operation computer program instructions can be assembly instruction, instruction set architecture (ISA) instruction, Machine instruction, machine-dependent instructions, microcode, firmware instructions, condition setup data or with one or more programming languages The source code or object code that any combination is write, programming language of the programming language including object-oriented-such as Smalltalk, C++ etc., and conventional procedural programming languages-such as " C " language or similar programming language.Computer Readable program instructions fully can on the user computer perform, partly perform on the user computer, be only as one Vertical software kit performs, part performs or completely in remote computer on the remote computer on the user computer for part Or performed on server.In the situation of remote computer is related to, remote computer can pass through network-bag of any kind LAN (LAN) or wide area network (WAN)-be connected to subscriber computer are included, or, it may be connected to outer computer (such as profit Pass through Internet connection with ISP).In certain embodiments, by using computer-readable program instructions Status information carry out personalized customization electronic circuit, such as programmable logic circuit, field programmable gate array (FPGA) or can Programmed logic array (PLA) (PLA), the electronic circuit can perform computer-readable program instructions, so as to fulfill each side of the disclosure Face.

Referring herein to the method, apparatus (system) according to the embodiment of the present disclosure and the flow chart of computer program product and/ Or block diagram describes various aspects of the disclosure.It should be appreciated that each square frame and flow chart of flow chart and/or block diagram and/ Or in block diagram each square frame combination, can be realized by computer-readable program instructions.

These computer-readable program instructions can be supplied to all-purpose computer, special purpose computer or other programmable datas The processor of processing unit, so as to produce a kind of machine so that these instructions are passing through computer or other programmable datas When the processor of processing unit performs, generate and realize work(specified in one or more of flow chart and/or block diagram square frame The device of energy/action.These computer-readable program instructions can also be stored in a computer-readable storage medium, these refer to Order causes computer, programmable data processing unit and/or other equipment to work in a specific way, so that, it is stored with instruction Computer-readable medium then includes a manufacture, it includes realizing in one or more of flow chart and/or block diagram square frame The instruction of the various aspects of defined function/action.

Computer-readable program instructions can also be loaded into computer, other programmable data processing units or other In equipment so that series of operation steps is performed on computer, other programmable data processing units or miscellaneous equipment, with production Raw computer implemented process, so that performed on computer, other programmable data processing units or miscellaneous equipment Function/action specified in one or more of flow chart and/or block diagram square frame is realized in instruction.

Flow chart and block diagram in attached drawing show the system, method and computer journey of multiple embodiments according to the disclosure Architectural framework in the cards, function and the operation of sequence product.At this point, each square frame in flow chart or block diagram can generation One module of table, program segment or a part for instruction, the module, program segment or a part for instruction include one or more use In the executable instruction of logic function as defined in realization.At some as the function of in the realization replaced, being marked in square frame Can be with different from the order marked in attached drawing generation.For example, two continuous square frames can essentially be held substantially in parallel OK, they can also be performed in the opposite order sometimes, this is depending on involved function.It is also noted that block diagram and/or The combination of each square frame and block diagram in flow chart and/or the square frame in flow chart, can use function or dynamic as defined in performing The dedicated hardware based system made is realized, or can be realized with the combination of specialized hardware and computer instruction.

The presently disclosed embodiments is described above, described above is exemplary, and non-exclusive, and It is not limited to disclosed each embodiment.In the case of without departing from the scope and spirit of illustrated each embodiment, for this skill Many modifications and changes will be apparent from for the those of ordinary skill in art field.The selection of term used herein, purport Best explain the principle of each embodiment, practical application or technological improvement to the technology in market, or lead this technology Other those of ordinary skill in domain are understood that each embodiment disclosed herein.

Claims

A kind of 1. video analysis method, it is characterised in that the described method includes：

By video input single frames identification model to be identified, the single frames feature of single-frame images in the video to be identified is obtained；

According to frame length, start frame and identification step-length, the video to be identified is divided into each video block；

The single frames feature and frame length of the single-frame images included according to each video block, determine the feature stream matrix of each video block；

By initial notice matrix and the feature stream matrix of video block, input shot and long term memory models is handled, and obtains video The notice matrix of block；

According to the notice matrix of video block, the attention force vector of the video to be identified is determined.
2. according to the method described in claim 1, it is characterized in that, by initial notice matrix and the feature stream square of video block Battle array, input shot and long term memory models are handled, and obtain the notice matrix of video block, including：

According to the feature of single frames feature is wide, the feature of single frames feature is high and the frame length, the initial of the video block is determined Notice matrix；

The initial notice matrix and the feature stream matrix of first video block are inputted at shot and long term memory models Reason, obtains the notice matrix of first video block；

Using second video block and its follow-up video block as current video block, successively by the attention torque of a upper video block The feature stream matrix of battle array and current video block, input shot and long term memory models are handled, and obtain the notice of current video block Matrix.
3. according to the method described in claim 2, it is characterized in that, notice matrix and current video by a upper video block The feature stream matrix of block, input shot and long term memory models are handled, and obtain the notice matrix of current video block, including：

The notice matrix of a upper video block and the feature stream matrix weights of current video block are summed, obtain integration characteristics square Battle array；

The integration characteristics Input matrix shot and long term memory models is handled, obtains the notice matrix of current video block.
4. according to the method described in claim 1, it is characterized in that, according to the notice matrix of video block, determine described to wait to know The attention force vector of other video, including：

The notice matrix of video block where single-frame images is averaged, obtains the single frames vector of single-frame images；

According to the single frames of all single-frame images vector, the attention force vector of the video to be identified is obtained.
5. method according to any one of claim 1 to 4, it is characterised in that by initial notice matrix and video block Feature stream matrix, input shot and long term memory models handled, obtain the notice matrix of video block, further include：

Obtain the class probability of current video block；

Class probability input grader is handled, obtains the video block classification of current video block；

According to the video block classification of video block, the video classification of the video to be identified is determined.
A kind of 6. video analysis device, it is characterised in that including：

Single frames characteristic determination module, it is single in the video to be identified for by video input single frames identification model to be identified, obtaining The single frames feature of two field picture；

Video block division module, for according to frame length, start frame and identification step-length, the video to be identified being divided into and is respectively regarded Frequency block；

Feature stream matrix deciding module, for the single frames feature and frame length of the single-frame images included according to each video block, determines The feature stream matrix of each video block；

Notice matrix deciding module, for the feature stream matrix of initial notice matrix and video block, input shot and long term to be remembered Recall model to be handled, obtain the notice matrix of video block；

Pay attention to force vector determining module, for the notice matrix according to video block, determine the notice of the video to be identified Vector.
7. device according to claim 6, it is characterised in that the notice matrix deciding module, including：

Initial notice matrix determination sub-module, for the feature according to single frames feature to be wide, the feature of single frames feature is high and institute Frame length is stated, determines the initial notice matrix of the video block；

First notice matrix determination sub-module, for by the feature stream square of the initial notice matrix and first video block Battle array, input shot and long term memory models are handled, and obtain the notice matrix of first video block；

Follow-up notice matrix determination sub-module, for using second video block and its follow-up video block as current video Block, successively by the notice matrix of a upper video block and the feature stream matrix of current video block, inputs shot and long term memory models Handled, obtain the notice matrix of current video block.
8. device according to claim 7, it is characterised in that the follow-up notice matrix determination sub-module, including：

Submodule is integrated, for the feature stream matrix weights of the notice matrix of a upper video block and current video block to be asked With obtain integration characteristics matrix；

Shot and long term memory models handles submodule, for by the integration characteristics Input matrix shot and long term memory models Reason, obtains the notice matrix of current video block.
9. device according to claim 6, it is characterised in that the attention force vector determining module, including：

Single frames vector determination sub-module, for the notice matrix of the video block where single-frame images to be averaged, obtains list The single frames vector of two field picture；

Summation submodule, for the single frames vector according to all single-frame images, obtains the attention force vector of the video to be identified.
10. the device according to any one of claim 6 to 9, it is characterised in that the notice matrix deciding module, Further include：

Class probability determination sub-module, for obtaining the class probability of current video block；

Grader submodule, for class probability input grader to be handled, obtains the video block of current video block Classification；

Video classification determination sub-module, for the video block classification according to video block, determines the video class of the video to be identified Not.
A kind of 11. video analysis device, it is characterised in that including：

Processor；

For storing the memory of processor-executable instruction；

Wherein, the processor is configured as：The method described in any one in claim 1 to 5 is realized during execution.
12. a kind of non-volatile computer readable storage medium storing program for executing, is stored thereon with computer program instructions, it is characterised in that institute State the method realized when computer program instructions are executed by processor in claim 1 to 5 described in any one.