CN109977872A - Motion detection method, device, electronic equipment and computer readable storage medium - Google Patents


Info

Publication number
CN109977872A
CN109977872A (application CN201910239759.8A; granted as CN109977872B)
Authority
CN
China
Prior art keywords
feature vector
bounding box
location information
image
motion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910239759.8A
Other languages
Chinese (zh)
Other versions
CN109977872B (en)
Inventor
吴益灵
张弛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Megvii Technology Co Ltd
Beijing Maigewei Technology Co Ltd
Original Assignee
Beijing Maigewei Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Maigewei Technology Co Ltd filed Critical Beijing Maigewei Technology Co Ltd
Priority claimed from application CN201910239759.8A
Publication of CN109977872A
Application granted
Publication of CN109977872B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The present application provides a motion detection method and apparatus, an electronic device, and a computer-readable storage medium, relating to the field of image processing. The method comprises: detecting, based on a preset target detection model, a plurality of first main objects in an image to be detected, and obtaining a bounding box of each first main object, the first main objects including persons and objects; obtaining a fixed-length feature vector of the image region within each bounding box and the location information of each bounding box; generating, using the fixed-length feature vector and the location information, a feature vector containing location information for each bounding box; and inputting the feature vector containing location information of each bounding box into a preset action classification model to determine the action types between the first main objects. The present application can detect the relationships among all the main objects in an image, in particular the interaction relationships between all persons and objects.

Description

Motion detection method, device, electronic equipment and computer readable storage medium
Technical field
The present application relates to the technical field of image processing, and in particular to a motion detection method and apparatus, an electronic device, and a computer-readable storage medium.
Background technique
With the development of computer vision technology, object detection has become increasingly mature. In scene understanding, besides detecting the main objects themselves, it is also necessary to recognize the relationships between them. For example, when detecting the relationship between a person and an object in a given image, the person and the object must first be detected, and the relationship between them is then recognized.
However, existing detection methods can only detect the relationship between a single paired person and object in an image; they cannot detect the relationships among all the main objects in the image.
Summary of the invention
The present application provides a motion detection method and apparatus, an electronic device, and a computer-readable storage medium, which can solve the problem that existing detection methods can only detect the relationship between a paired person and object in an image and cannot detect the relationships among all the main objects in the image. The technical solution is as follows:
In a first aspect, a motion detection method is provided, the method comprising:
detecting, based on a preset target detection model, a plurality of first main objects in an image to be detected, and obtaining a bounding box of each first main object, wherein the first main objects include persons and objects;
obtaining a fixed-length feature vector of the image region within each bounding box and the location information of each bounding box;
generating, using the fixed-length feature vector and the location information, a feature vector containing location information for each bounding box; and
inputting the feature vector containing location information of each bounding box into a preset action classification model, and determining the action types between the first main objects.
Preferably, the step of obtaining the fixed-length feature vector of the image region within each bounding box specifically comprises:
performing feature extraction on the image within each bounding box using a trained convolutional neural network model to obtain each fixed-length feature vector.
Preferably, generating the feature vector containing location information of each bounding box using the fixed-length feature vector and the location information comprises:
concatenating the fixed-length feature vector with the location information to obtain the feature vector containing location information of each bounding box.
Preferably, the preset action classifier is provided with a transformer;
the step of inputting the feature vector containing location information of each bounding box into the preset action classification model and determining the action types between the first main objects comprises:
inputting the feature vector containing location information of each bounding box into the transformer to obtain feature vectors of the image to be detected that contain global information; and
determining, based on the feature vectors containing global information, the action types between the first main objects.
Preferably, the transformer is provided with N layers, each layer consisting of an attention model and a feed-forward network, where N is a positive integer;
the output of the N-th feed-forward network serves as the feature vectors of the image to be detected that contain global information.
Preferably, the location information of a bounding box includes the center coordinates, length, and width of the bounding box.
Preferably, the feature vectors containing location information include feature vectors of persons and feature vectors of objects, and the preset action classifier is provided with an action predictor;
the action predictor discriminates each person's feature vector against each object's feature vector pairwise to obtain the action types between persons and objects.
In a second aspect, a motion detection apparatus is provided, the apparatus comprising:
a target detection module, configured to detect, based on a preset target detection model, a plurality of first main objects in an image to be detected, and to obtain a bounding box of each first main object, wherein the first main objects include persons and objects;
an obtaining module, configured to obtain a fixed-length feature vector of the image region within each bounding box and the location information of each bounding box;
a generation module, configured to generate, using the fixed-length feature vector and the location information, a feature vector containing location information for each bounding box; and
an action classification module, configured to input the feature vector containing location information of each bounding box into a preset action classification model and determine the action types between the first main objects.
Preferably, the obtaining module is specifically configured to invoke a trained convolutional neural network model to perform feature extraction on the image within each bounding box, obtaining each fixed-length feature vector.
Preferably, the generation module is specifically configured to concatenate the fixed-length feature vector with the location information to obtain the feature vector containing location information of each bounding box.
Preferably, the preset action classifier is provided with a transformer;
the action classification module comprises:
an input submodule, configured to input the feature vector containing location information of each bounding box into the transformer to obtain feature vectors of the image to be detected that contain global information; and
an output submodule, configured to determine, based on the feature vectors containing global information, the action types between the first main objects.
Preferably, the transformer is provided with N layers, each layer consisting of an attention model and a feed-forward network, where N is a positive integer; the output of the N-th feed-forward network serves as the feature vectors of the image to be detected that contain global information.
Preferably, the location information of a bounding box includes the center coordinates, length, and width of the bounding box.
Preferably, the feature vectors containing location information include feature vectors of persons and feature vectors of objects; the preset action classifier is provided with an action predictor; the action predictor discriminates each person's feature vector against each object's feature vector pairwise to obtain the action types between persons and objects.
In a third aspect, an electronic device is provided, the electronic device comprising:
a processor, a memory, and a bus;
the bus, configured to connect the processor and the memory;
the memory, configured to store operation instructions; and
the processor, configured to execute, by invoking the operation instructions, the operations corresponding to the motion detection method shown in the first aspect of the present application.
In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the program implements the motion detection method shown in the first aspect of the present application.
The technical solution provided by the present application has the following beneficial effects:
For an image to be detected, a plurality of first main objects in the image are detected based on a preset target detection model; then, based on a preset action classification model, it is determined whether action associations exist between the first main objects, and if so, the corresponding action types are determined and labeled. In this way, compared with the prior art, which can only detect the relationship between a paired person and object in an image, the present application can detect the interaction relationships between all persons and objects in the image.
Detailed description of the invention
To explain the technical solutions in the embodiments of the present application more clearly, the accompanying drawings required in the description of the embodiments are briefly introduced below.
Fig. 1 is a schematic flowchart of a motion detection method provided by one embodiment of the present application;
Fig. 2A is an example of an image to be detected in the present application;
Fig. 2B is an example of an image with bounding boxes in the present application;
Fig. 3 is a schematic structural diagram of a motion detection apparatus provided by another embodiment of the present application;
Fig. 4 is a schematic structural diagram of an electronic device for motion detection provided by another embodiment of the present application.
Specific embodiment
Embodiments of the present application are described in detail below, examples of which are shown in the accompanying drawings, where the same or similar reference numerals denote the same or similar elements, or elements with the same or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary, are only used to explain the present application, and are not to be construed as limiting the claims.
Those skilled in the art will appreciate that, unless expressly stated otherwise, the singular forms "a", "an", "the", and "said" used herein may also include the plural forms. It should be further understood that the word "comprising" used in the specification of the present application indicates the presence of the stated features, integers, steps, operations, elements, and/or components, but does not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It should be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intermediate elements may also be present. In addition, "connected" or "coupled" as used herein may include wireless connection or wireless coupling. The term "and/or" as used herein includes all or any unit of, and all combinations of, one or more of the associated listed items.
To make the objectives, technical solutions, and advantages of the present application clearer, embodiments of the present application are described in further detail below with reference to the accompanying drawings.
The motion detection method and apparatus, electronic device, and computer-readable storage medium provided by the present application are intended to solve the above technical problems of the prior art.
The technical solution of the present application, and how it solves the above technical problems, are described in detail below with specific embodiments. The specific embodiments below may be combined with one another, and the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application are described below with reference to the accompanying drawings.
One embodiment provides a motion detection method. As shown in Fig. 1, the method comprises:
Step S101: detecting, based on a preset target detection model, a plurality of first main objects in an image to be detected, and obtaining a bounding box of each first main object, wherein the first main objects include persons and objects.
In practical applications, the image to be detected is first input into the preset target detection model, which detects the main objects in the image and labels them with bounding boxes. As shown in Fig. 2A, assume Fig. 2A is the image to be detected; inputting Fig. 2A into the preset target detection model yields the image shown in Fig. 2B, in which four main objects are labeled: a person, an umbrella, a bicycle, and a bicycle basket. As another example, if a child in the image to be detected carries a schoolbag, the main objects in the image are the child and the schoolbag. There may also be multiple persons and multiple objects; for example, if multiple children stand in line in the image, each carrying a schoolbag, the main objects are all the children and all the schoolbags.
Further, the image to be detected may be a picture in a format such as BMP, JPG, SWF, CDR, or AI, a frame in a video, or an image in another form; all are applicable to the present application, and the present application places no restriction on this.
It should be noted that the annotation may be made directly in the picture or in other forms; for example, if the annotation is in text, the text and the image are input into the target detection model together. In short, any form that can annotate each main object in a training image, together with the action associations between the main objects and the corresponding action types, is applicable to the present application, and the present application places no restriction on this.
Before training the preset target detection model, a large number of images may first be collected, and all the main objects in each image, as well as the actions between them, are labeled; the labeled images are the training images. For example, the actions between the main objects in Fig. 2B may be annotated: there is an action association between the person and the umbrella, with the corresponding action 'holding', while there is no action association between the person and the bicycle basket and hence no corresponding action, and so on. Once the actions between all the main objects have been annotated, Fig. 2B can serve as a training image.
For each training image, the training image is input into the preset target detection model, which may be a Faster R-CNN model. The model first uses an RPN (region proposal network) to predict ROIs (regions of interest) in the training image, then performs object classification on the basis of the ROIs and regresses the offset of the detected bounding box relative to the ROI, thereby obtaining a target detection model trained on that image. Processing every training image in the same way yields a target detection model trained on a large number of training images.
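The bounding-box offset regression mentioned above can be sketched with the center/size delta parameterization commonly used by Faster R-CNN; the following is a minimal NumPy illustration under the assumption that boxes are given as (x_center, y_center, w, h), not the patent's own code.

```python
import numpy as np

def box_deltas(roi, gt):
    """Offsets of a ground-truth box relative to an ROI, both given as
    (x_center, y_center, w, h): translation scaled by ROI size, log scale."""
    rx, ry, rw, rh = roi
    gx, gy, gw, gh = gt
    return np.array([(gx - rx) / rw, (gy - ry) / rh,
                     np.log(gw / rw), np.log(gh / rh)])

def apply_deltas(roi, d):
    """Invert box_deltas: recover the detected box from the ROI and offsets."""
    rx, ry, rw, rh = roi
    return np.array([rx + d[0] * rw, ry + d[1] * rh,
                     rw * np.exp(d[2]), rh * np.exp(d[3])])

roi = np.array([50.0, 40.0, 20.0, 30.0])
gt = np.array([55.0, 38.0, 24.0, 27.0])
# regressing deltas and re-applying them recovers the target box exactly
assert np.allclose(apply_deltas(roi, box_deltas(roi, gt)), gt)
```

During training, the model learns to predict these four deltas for each ROI; at inference, `apply_deltas` turns predicted deltas back into image-space boxes.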
Step S102: obtaining a fixed-length feature vector of the image region within each bounding box and the location information of each bounding box.
In a preferred embodiment of the present invention, the step of obtaining the fixed-length feature vector of the image region within each bounding box specifically comprises:
performing feature extraction on the image within each bounding box using a trained convolutional neural network model to obtain each fixed-length feature vector.
Specifically, ROI pooling (region-of-interest pooling) is applied to each detected bounding box, followed by a convolutional layer, to obtain the fixed-length feature vector of each bounding box; the location information of each bounding box is then detected, where the location information of a bounding box includes its center coordinates, length, and width (in pixels).
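The ROI-pooling step above can be sketched as follows; this is a minimal NumPy illustration (not the patent's implementation), assuming a single 2-D feature map and a fixed 2x2 output grid, and showing how regions of different sizes are reduced to vectors of the same length.

```python
import numpy as np

def roi_pool(feature_map, box, grid=2):
    """Max-pool the region box = (x0, y0, x1, y1) of a 2-D feature map
    into a fixed grid x grid output, then flatten to a vector."""
    x0, y0, x1, y1 = box
    region = feature_map[y0:y1, x0:x1]
    h, w = region.shape
    out = np.zeros((grid, grid))
    for gy in range(grid):
        for gx in range(grid):
            # split the region into grid*grid roughly equal, non-empty cells
            ys = slice(gy * h // grid, max((gy + 1) * h // grid, gy * h // grid + 1))
            xs = slice(gx * w // grid, max((gx + 1) * w // grid, gx * w // grid + 1))
            out[gy, gx] = region[ys, xs].max()
    return out.ravel()  # fixed length grid*grid regardless of box size

fmap = np.arange(100, dtype=float).reshape(10, 10)
v_small = roi_pool(fmap, (0, 0, 4, 4))   # 4x4 region
v_large = roi_pool(fmap, (2, 1, 9, 8))   # 7x7 region
assert v_small.shape == v_large.shape == (4,)
```

In the patent's pipeline, the pooled output would feed further convolutional layers before becoming the fixed-length vector a_i; the sketch only shows the size-normalizing step.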
Step S103: generating, using the fixed-length feature vector and the location information, a feature vector containing location information for each bounding box.
In a preferred embodiment of the present invention, generating the feature vector containing location information of each bounding box using the fixed-length feature vector and the location information comprises:
concatenating the fixed-length feature vector with the location information to obtain the feature vector containing location information of each bounding box.
Specifically, the center coordinates and the length and width of each bounding box are detected, and each bounding box's feature vector is concatenated with its location information to obtain the feature vector containing location information of each bounding box, f_i ∈ R^(1×d_f), where i is the index of the bounding box. For example, assuming the feature vector obtained in step S102 is a_i, concatenating the location information after a_i yields the feature vector containing location information f_i = [a_i, x_i, y_i, w_i, h_i], where x_i and y_i are the coordinates of the bounding-box center and w_i and h_i are the width and height of the bounding box.
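The concatenation step above can be sketched as follows; a minimal NumPy illustration assuming a 512-dimensional appearance vector a_i (the patent does not fix d_f), mirroring f_i = [a_i, x_i, y_i, w_i, h_i].

```python
import numpy as np

def with_location(a_i, box):
    """Append the bounding box's (x_center, y_center, w, h) to the
    fixed-length appearance vector a_i, giving f_i = [a_i, x, y, w, h]."""
    x, y, w, h = box
    return np.concatenate([a_i, [x, y, w, h]])

a_i = np.random.rand(512)                 # fixed-length feature from ROI pooling
f_i = with_location(a_i, (120.0, 80.0, 60.0, 140.0))
assert f_i.shape == (516,)                # d_f = 512 + 4
assert np.allclose(f_i[-4:], [120.0, 80.0, 60.0, 140.0])
```

The four geometry entries let the later attention stage reason about where each box sits in the image, not only what it looks like.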
Step S104: inputting the feature vector containing location information of each bounding box into a preset action classification model, and determining the action types between the first main objects.
In a preferred embodiment of the present invention, the feature vectors containing location information include feature vectors of persons and feature vectors of objects, and the preset action classifier is provided with an action predictor and a transformer;
the action predictor discriminates each person's feature vector against each object's feature vector pairwise to obtain the action types between persons and objects;
the step of inputting the feature vector containing location information of each bounding box into the preset action classification model and determining the action types between the first main objects comprises:
inputting the feature vector containing location information of each bounding box into the transformer to obtain feature vectors of the image to be detected that contain global information; and
determining, based on the feature vectors containing global information, the action types between the first main objects.
Here, the transformer is provided with N layers, each layer consisting of an attention model and a feed-forward network, where N is a positive integer; the output of the N-th feed-forward network serves as the feature vectors of the image to be detected that contain global information.
Specifically, each feature vector containing location information is input into the Transformer.
The Transformer structure uses the self-attention mechanism and comprises two basic structures: multi-head attention and a feed-forward network.
When the transformer is initialized, the definition of multi-head self-attention must first be given, where multi-head self-attention simply means multiple self-attentions.
A single attention is one head, i.e.
head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V), with Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V,
where Q, K, and V are the inputs and W_i^Q, W_i^K, and W_i^V are parameter matrices.
Multi-head self-attention is then multiple attentions, i.e. MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W^O, where W^O is a parameter matrix.
To combine the feature vectors of the multiple bounding boxes using the Transformer's self-attention mechanism, the feature vectors of the bounding boxes are first stacked to obtain F = [f_1; f_2; ...; f_n] (that is, F is obtained from the multiple f_i); multi-head attention then gives MultiHead(F, F, F) = B (that is, input F yields B), where the square brackets denote the set of feature vectors concatenated by column and B is the output feature.
A feed-forward network is then applied: FFN(B) = max(0, B W_1 + b_1) W_2 + b_2 = D (D is obtained from B), where b_1 and b_2 are row vectors. During computation, a broadcast operation (broadcasting, as in numpy) is required: when feature vectors or matrices of inconsistent dimensions are operated on together, their dimensions are expanded automatically, giving the feed-forward network. In this way, one attention plus one feed-forward network make up one layer of the Transformer; in practice, the Transformer may have multiple layers, each consisting of an attention and a feed-forward network.
That is, assuming the transformer has five layers: the input of the first layer's attention is F, and its result serves as the input of the first feed-forward network; the output of the first feed-forward network serves as the input of the second layer's attention, and the output of the second layer's attention serves as the input of the second layer's feed-forward network; and so on, until the fifth layer's feed-forward network outputs the result.
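The attention-plus-feed-forward layer described above can be sketched in NumPy as follows; this is a simplified single-head illustration under assumed dimensions (the patent uses multi-head attention), showing how every bounding-box vector attends to every other one in a single step.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    d_k = K.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d_k)) @ V

def transformer_layer(F, Wq, Wk, Wv, W1, b1, W2, b2):
    """One layer: self-attention over the stacked box features F,
    followed by the feed-forward network max(0, B W1 + b1) W2 + b2."""
    B = attention(F @ Wq, F @ Wk, F @ Wv)
    return np.maximum(0.0, B @ W1 + b1) @ W2 + b2

rng = np.random.default_rng(0)
n, d = 4, 8                        # 4 bounding boxes, feature size 8 (assumed)
F = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
W1, b1 = rng.normal(size=(d, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, d)), np.zeros(d)
D = transformer_layer(F, Wq, Wk, Wv, W1, b1, W2, b2)
assert D.shape == (n, d)           # one global-information vector per box
```

Stacking several such layers, with each layer's output fed to the next, gives the N-layer transformer described in the text; because attention connects every pair of boxes directly, the connection distance between any two boxes is the same.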
Further, the Transformer outputs a feature vector containing global information for each main object in the picture (that is to say, for all the main objects in the image). Then, whether an action association exists between each person and each object is judged, using binary logistic regression. The features output by the transformer are extracted and concatenated to obtain [d_i, d_j], which is input into the classifier p_{i,j} = sigmoid(W_a [d_i, d_j] + b_a), where the square brackets denote feature concatenation by row, W_a is a linear transformation matrix, b_a is a bias term, and D = [d_1; d_2; ...; d_n]; the classifier determines whether an action association exists between person i and object j.
For persons and objects between which an action association exists, the action types involved are classified using these feature vectors.
For example, assume the main objects include one person and one object. For the detected person i and object j, the features output by the transformer are extracted and concatenated to obtain [d_i, d_j], which is input into the classifier p_{i,j,r} = sigmoid(W_r [d_i, d_j] + b_r), where the square brackets denote feature concatenation by row, W_r is a linear transformation matrix, b_r is a bias term, and D = [d_1; d_2; ...; d_n]; the classifier predicts the actions involving person i and object j. Because there may be multiple actions between a person and an object, logistic regression is used to predict the person-object actions, with cross-entropy as the loss function; the total loss is
L = -(1/N) * sum over (i, j, r) of [ y_{i,j,r} log p_{i,j,r} + (1 - y_{i,j,r}) log(1 - p_{i,j,r}) ],
where N is the total number of traversed (i, j, r) items. For example, for a person and a schoolbag, the classification is whether the person is holding the schoolbag or carrying it on the back, i.e. the classified action is 'hold' or 'carry'. Unlike an LSTM (long short-term memory network), which can only process the bounding boxes' feature vectors sequentially, so that the operation distance between bounding boxes differs, the present application uses a transformer based on the attention mechanism, so that the connection distance between all bounding boxes is the same.
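The pairwise discrimination above can be sketched as follows; a minimal NumPy illustration under assumed dimensions, in which every person vector d_i is concatenated with every object vector d_j and passed through the sigmoid classifiers for association (W_a, b_a) and action type (W_r, b_r).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pairwise_scores(person_feats, object_feats, Wa, ba, Wr, br):
    """For every (person i, object j) pair, score the existence of an
    action association, p_ij, and the per-action probabilities, p_ijr."""
    assoc, actions = {}, {}
    for i, di in enumerate(person_feats):
        for j, dj in enumerate(object_feats):
            dij = np.concatenate([di, dj])             # [d_i, d_j]
            assoc[(i, j)] = sigmoid(Wa @ dij + ba)     # shape (1,): association
            actions[(i, j)] = sigmoid(Wr @ dij + br)   # one probability per action
    return assoc, actions

rng = np.random.default_rng(1)
d, n_actions = 8, 5                       # assumed sizes
people = rng.normal(size=(2, d))          # 2 person vectors from the transformer
objects = rng.normal(size=(3, d))         # 3 object vectors from the transformer
Wa, ba = rng.normal(size=(1, 2 * d)), np.zeros(1)
Wr, br = rng.normal(size=(n_actions, 2 * d)), np.zeros(n_actions)
assoc, actions = pairwise_scores(people, objects, Wa, ba, Wr, br)
assert len(assoc) == 6                    # every person-object pair is scored
assert actions[(0, 0)].shape == (n_actions,)
```

Because every pair is scored, the sketch mirrors the claim that relationships among all persons and objects, not just one chosen pair, are detected.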
Further, the loss function is optimized using Adam, a gradient-descent-based optimization algorithm, and the action classification model is obtained through training.
It should be noted that concatenation in the present application refers to merging multiple parameters. For example, if the first feature vector is a_i, concatenating the location information yields the feature vector containing location information f_i = [a_i, x_i, y_i, w_i, h_i]; that is, a_i and the location information are fused into one set (the a_i, x_i, y_i, w_i, h_i in the square brackets). Of course, this set may be an array, a matrix, or a set in another form.
In this embodiment of the present invention, for an image to be detected, a plurality of first main objects in the image are detected based on a preset target detection model, and a bounding box of each first main object is obtained, the first main objects including persons and objects; then a fixed-length feature vector of the image region within each bounding box and the location information of each bounding box are obtained; next, using the fixed-length feature vector and the location information, a feature vector containing location information is generated for each bounding box; and the feature vector containing location information of each bounding box is input into a preset action classification model to determine the action types between the first main objects. In this way, compared with the prior art, which can only detect the relationship between a paired person and object in an image, the present application can detect the relationships among all the main objects in the image, in particular the interaction relationships between all persons and objects.
Fig. 3 is a schematic structural diagram of a motion detection apparatus provided by another embodiment of the present application. As shown in Fig. 3, the apparatus of this embodiment may comprise:
a target detection module 301, configured to detect, based on a preset target detection model, a plurality of first main objects in an image to be detected, and to obtain a bounding box of each first main object, wherein the first main objects include persons and objects;
an obtaining module 302, configured to obtain a fixed-length feature vector of the image region within each bounding box and the location information of each bounding box;
a generation module 303, configured to generate, using the fixed-length feature vector and the location information, a feature vector containing location information for each bounding box; and
an action classification module 304, configured to input the feature vector containing location information of each bounding box into a preset action classification model and determine the action types between the first main objects.
In a preferred embodiment of the present invention, the obtaining module is specifically configured to invoke a trained convolutional neural network model to perform feature extraction on the image within each bounding box, obtaining each fixed-length feature vector.
In a preferred embodiment of the present invention, the generation module is specifically configured to concatenate the fixed-length feature vector with the location information to obtain the feature vector containing location information of each bounding box.
In a preferred embodiment of the present invention, the preset action classifier is provided with a transformer;
the action classification module comprises:
an input submodule, configured to input the feature vector containing location information of each bounding box into the transformer to obtain feature vectors of the image to be detected that contain global information; and
an output submodule, configured to determine, based on the feature vectors containing global information, the action types between the first main objects.
In a preferred embodiment of the present invention, the transformer is provided with N layers, each layer consisting of an attention model and a feed-forward network, where N is a positive integer; the output of the N-th feed-forward network serves as the feature vectors of the image to be detected that contain global information.
In a preferred embodiment of the present invention, the location information of a bounding box includes the center coordinates, length, and width of the bounding box.
In a preferred embodiment of the present invention, the feature vectors containing location information include feature vectors of persons and feature vectors of objects; the preset action classifier is provided with an action predictor; the action predictor discriminates each person's feature vector against each object's feature vector pairwise to obtain the action types between persons and objects.
Motion detection method shown in the application one embodiment can be performed in the action detection device of the present embodiment, in fact Existing principle is similar, and details are not described herein again.
Another embodiment of the present application provides an electronic device, which includes a memory and a processor; at least one program is stored in the memory and, when executed by the processor, implements the following: for an image to be detected, multiple first main objects in the image are detected based on a preset target detection model, and the bounding box of each first main object is obtained, where the first main objects include people and objects; the fixed-length feature vector of the image region within each bounding box and the location information of each bounding box are then obtained; using the fixed-length feature vectors and the location information, the feature vector containing location information of each bounding box is generated; and the feature vectors containing location information of the bounding boxes are input into a preset action classification model to determine the action types between the first main objects. In this way, whereas the prior art can only detect the relationship between a single pair consisting of a person and an object in an image, the present application can detect the relationships among all main objects in an image, in particular the interaction relationships between all people and objects.
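A compact end-to-end sketch of the flow just described. The detector and the fixed-length feature extractor are stubs standing in for the preset target detection model and the convolutional network; all names and values here are illustrative assumptions.

```python
# Assumed end-to-end wiring: detect -> extract fixed-length features ->
# splice with location information -> hand to the action classifier.

def detect(image):
    """Stub for the preset target detection model: (label, box) pairs."""
    return [("person", (10, 10, 50, 90)), ("bicycle", (40, 30, 120, 90))]

def fixed_length_feature(image, box, length=4):
    """Stub for CNN feature extraction over the region inside a box."""
    x1, y1, x2, y2 = box
    return [float(x2 - x1), float(y2 - y1)] + [0.0] * (length - 2)

def location_info(box, img_w, img_h):
    """Normalized center coordinates, width and height of a box."""
    x1, y1, x2, y2 = box
    return [(x1 + x2) / 2 / img_w, (y1 + y2) / 2 / img_h,
            (x2 - x1) / img_w, (y2 - y1) / img_h]

def build_classifier_inputs(image, img_w, img_h):
    """Detect objects, then splice appearance features with location info."""
    return [(label,
             fixed_length_feature(image, box) + location_info(box, img_w, img_h))
            for label, box in detect(image)]

inputs = build_classifier_inputs(image=None, img_w=200, img_h=100)
# Each entry is a label plus a feature vector containing location information,
# ready for the preset action classification model.
```

In a real system the stubs would be replaced by a trained detector and feature network; the point here is only the order and shape of the data flowing between the stages.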
In an alternative embodiment, an electronic device is provided. As shown in Fig. 4, the electronic device 4000 includes a processor 4001 and a memory 4003, where the processor 4001 is connected to the memory 4003, for example via a bus 4002. Optionally, the electronic device 4000 may further include a transceiver 4004. It should be noted that in practical applications the number of transceivers 4004 is not limited to one, and the structure of the electronic device 4000 does not constitute a limitation on the embodiments of the present application.
The processor 4001 may be a CPU, a general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various illustrative logical blocks, modules, and circuits described in the present disclosure. The processor 4001 may also be a combination realizing computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
The bus 4002 may include a path for transmitting information between the above components. The bus 4002 may be a PCI bus, an EISA bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in Fig. 4, but this does not mean that there is only one bus or only one type of bus.
The memory 4003 may be a ROM or another type of static storage device capable of storing static information and instructions, a RAM or another type of dynamic storage device capable of storing information and instructions, an EEPROM, a CD-ROM or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or other magnetic storage device, or any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
The memory 4003 is used to store the application program code for executing the solution of the present application, and its execution is controlled by the processor 4001. The processor 4001 executes the application program code stored in the memory 4003 to implement the content shown in any of the foregoing method embodiments.
The electronic device includes, but is not limited to, mobile terminals such as mobile phones, laptop computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable media players), and vehicle-mounted terminals (such as vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers.
Another embodiment of the present application provides a computer-readable storage medium on which a computer program is stored; when run on a computer, the program causes the computer to execute the corresponding content of the foregoing method embodiments. Specifically, multiple first main objects in an image to be detected are detected based on a preset target detection model, and the bounding box of each first main object is obtained, the first main objects including people and objects; the fixed-length feature vector of the image region within each bounding box and the location information of each bounding box are obtained; using the fixed-length feature vectors and the location information, the feature vector containing location information of each bounding box is generated; and these feature vectors are input into a preset action classification model to determine the action types between the first main objects. In this way, whereas the prior art can only detect the relationship between a single pair consisting of a person and an object in an image, the present application can detect the relationships among all main objects in an image, in particular the interaction relationships between all people and objects.
It should be understood that although the steps in the flowcharts of the drawings are shown in a sequence indicated by arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, there is no strict restriction on the order of execution, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times; their execution order is not necessarily sequential, and they may be executed in turn or alternately with other steps, or with at least part of the sub-steps or stages of other steps.
The above are only some embodiments of the present invention. It should be noted that those of ordinary skill in the art may make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A motion detection method, characterized by comprising:
detecting multiple first main objects in an image to be detected based on a preset target detection model, and obtaining a bounding box of each first main object, wherein the first main objects include people and objects;
obtaining a fixed-length feature vector of the image region within each bounding box and location information of each bounding box;
generating a feature vector containing location information for each bounding box using the fixed-length feature vector and the location information;
inputting the feature vector containing location information of each bounding box into a preset action classification model, and determining action types between the first main objects.
2. The motion detection method according to claim 1, wherein the step of obtaining the fixed-length feature vector of the image region within each bounding box specifically comprises:
performing feature extraction on the image within each bounding box using a trained convolutional neural network model to obtain each fixed-length feature vector.
3. The motion detection method according to claim 1, wherein generating the feature vector containing location information for each bounding box using the fixed-length feature vector and the location information comprises:
splicing the fixed-length feature vector with the location information to obtain the feature vector containing location information of each bounding box.
4. The motion detection method according to claim 1, wherein the preset action classifier is provided with a converter;
the step of inputting the feature vector containing location information of each bounding box into the preset action classification model and determining the action types between the first main objects comprises:
inputting the feature vector containing location information of each bounding box into the converter to obtain a feature vector containing global information of the image to be detected;
determining the action types between the first main objects based on the feature vector containing global information.
5. The motion detection method according to claim 4, wherein the converter is provided with N layers, each layer composed of an attention model and a feedforward network, where N is a positive integer; the output of the N-th layer feedforward network is used as the feature vector containing global information of the image to be detected.
6. The motion detection method according to claim 1, wherein the location information of the bounding box includes the center coordinates, length, and width of the bounding box.
7. The motion detection method according to claim 1, wherein the feature vectors containing location information include feature vectors of people and feature vectors of objects; the preset action classifier is provided with an action predictor;
the action predictor discriminates pairwise between each person's feature vector and each object's feature vector to obtain action types between the people and the objects.
8. An action detection device, characterized by comprising:
a target detection module, configured to detect multiple first main objects in an image to be detected based on a preset target detection model, and obtain a bounding box of each first main object, wherein the first main objects include people and objects;
an obtaining module, configured to obtain a fixed-length feature vector of the image region within each bounding box and location information of each bounding box;
a generation module, configured to generate a feature vector containing location information for each bounding box using the fixed-length feature vector and the location information;
an action classification module, configured to input the feature vector containing location information of each bounding box into a preset action classification model and determine action types between the first main objects.
9. An electronic device, characterized by comprising:
a processor, a memory, and a bus;
the bus, configured to connect the processor and the memory;
the memory, configured to store operation instructions;
the processor, configured to execute the motion detection method according to any one of claims 1 to 7 by calling the operation instructions.
10. A computer-readable storage medium, characterized in that the computer storage medium is used to store computer instructions which, when run on a computer, cause the computer to execute the motion detection method according to any one of claims 1 to 7.
CN201910239759.8A 2019-03-27 2019-03-27 Motion detection method and device, electronic equipment and computer readable storage medium Active CN109977872B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910239759.8A CN109977872B (en) 2019-03-27 2019-03-27 Motion detection method and device, electronic equipment and computer readable storage medium


Publications (2)

Publication Number Publication Date
CN109977872A true CN109977872A (en) 2019-07-05
CN109977872B CN109977872B (en) 2021-09-17

Family

ID=67081127

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910239759.8A Active CN109977872B (en) 2019-03-27 2019-03-27 Motion detection method and device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN109977872B (en)


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105912991A (en) * 2016-04-05 2016-08-31 湖南大学 Behavior identification method based on 3D point cloud and key bone nodes
CN106529467A (en) * 2016-11-07 2017-03-22 南京邮电大学 Group behavior identification method based on multi-feature fusion
CN108492273A (en) * 2018-03-28 2018-09-04 深圳市唯特视科技有限公司 A kind of image generating method based on from attention model
CN108647591A (en) * 2018-04-25 2018-10-12 长沙学院 Activity recognition method and system in a kind of video of view-based access control model-semantic feature
CN108898067A (en) * 2018-06-06 2018-11-27 北京京东尚科信息技术有限公司 Determine the method, apparatus and computer readable storage medium of people and the object degree of association
CN109241536A (en) * 2018-09-21 2019-01-18 浙江大学 It is a kind of based on deep learning from the sentence sort method of attention mechanism
CN109271999A (en) * 2018-09-06 2019-01-25 北京京东尚科信息技术有限公司 Processing method, device and the computer readable storage medium of image


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MING CHEN ET AL: "TVT: Two-View Transformer Network for Video Captioning", Proceedings of Machine Learning Research *
YU-WEI CHAO ET AL: "Learning to Detect Human-Object Interactions", 2018 IEEE Winter Conference on Applications of Computer Vision *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110378420A (en) * 2019-07-19 2019-10-25 Oppo广东移动通信有限公司 A kind of image detecting method, device and computer readable storage medium
CN110543877A (en) * 2019-09-04 2019-12-06 北京迈格威科技有限公司 Identification recognition method, training method and device of model thereof and electronic system
CN111753730A (en) * 2020-06-24 2020-10-09 国网电子商务有限公司 Image examination method and device
CN113632097A (en) * 2021-03-17 2021-11-09 商汤国际私人有限公司 Method, device, equipment and storage medium for predicting relevance between objects
US11941838B2 (en) 2021-03-17 2024-03-26 Sensetime International Pte. Ltd. Methods, apparatuses, devices and storage medium for predicting correlation between objects
CN113632097B (en) * 2021-03-17 2024-07-19 商汤国际私人有限公司 Method, device, equipment and storage medium for predicting relevance between objects
CN114120160A (en) * 2022-01-25 2022-03-01 成都合能创越软件有限公司 Object space distinguishing method and device based on fast-RCNN, computer equipment and storage medium

Also Published As

Publication number Publication date
CN109977872B (en) 2021-09-17


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant