CN109977872A - Motion detection method, device, electronic equipment and computer readable storage medium - Google Patents


Info

Publication number
CN109977872A
CN109977872A (application CN201910239759.8A; granted as CN109977872B)
Authority
CN
China
Prior art keywords
feature vector
bounding box
location information
image
motion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910239759.8A
Other languages
Chinese (zh)
Other versions
CN109977872B (en)
Inventor
吴益灵
张弛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Megvii Technology Co Ltd
Beijing Maigewei Technology Co Ltd
Original Assignee
Beijing Maigewei Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Maigewei Technology Co Ltd filed Critical Beijing Maigewei Technology Co Ltd
Priority claimed from application CN201910239759.8A
Publication of CN109977872A
Application granted
Publication of CN109977872B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The present application provides a motion detection method and apparatus, an electronic device, and a computer-readable storage medium, relating to the field of image processing. The method comprises: detecting, based on a preset target detection model, a plurality of first main objects in an image to be detected, and obtaining a bounding box of each first main object, the first main objects including persons and objects; obtaining a fixed-length feature vector of the image region within each bounding box and the location information of each bounding box; generating, using the fixed-length feature vector and the location information, a feature vector containing location information for each bounding box; and inputting the feature vector containing location information of each bounding box into a preset action classification model to determine the action types between the first main objects. The present application can detect the relationships among all the main objects in an image, in particular the interaction relationships between all persons and objects.

Description

Motion detection method, device, electronic equipment and computer readable storage medium
Technical field
The present application relates to the technical field of image processing, and in particular to a motion detection method and apparatus, an electronic device, and a computer-readable storage medium.
Background technique
With the development of computer vision technology, object detection has become increasingly mature. In scene understanding, besides detecting the main objects themselves, it is also necessary to recognize the relationships between them. For example, when detecting the relationship between a person and an object in a given image, the person and the object must first be detected, and the relationship between them is then recognized.
However, existing detection methods can only detect the relationship between a single paired person and object in an image; they cannot detect the relationships among all the main objects in the image.
Summary of the invention
The present application provides a motion detection method and apparatus, an electronic device, and a computer-readable storage medium, which can solve the problem that existing detection methods can only detect the relationship between a paired person and object in an image and cannot detect the relationships among all the main objects in the image. The technical solution is as follows:
In a first aspect, a motion detection method is provided, the method comprising:
detecting, based on a preset target detection model, a plurality of first main objects in an image to be detected, and obtaining a bounding box of each first main object, wherein the first main objects include persons and objects;
obtaining a fixed-length feature vector of the image region within each bounding box and the location information of each bounding box;
generating, using the fixed-length feature vector and the location information, a feature vector containing location information for each bounding box; and
inputting the feature vector containing location information of each bounding box into a preset action classification model, and determining the action types between the first main objects.
Preferably, the step of obtaining the fixed-length feature vector of the image region within each bounding box specifically comprises:
performing feature extraction on the image within each bounding box using a trained convolutional neural network model to obtain each fixed-length feature vector.
Preferably, generating the feature vector containing location information of each bounding box using the fixed-length feature vector and the location information comprises:
concatenating the fixed-length feature vector with the location information to obtain the feature vector containing location information of each bounding box.
Preferably, the preset action classifier is provided with a transformer;
the step of inputting the feature vector containing location information of each bounding box into the preset action classification model and determining the action types between the first main objects comprises:
inputting the feature vector containing location information of each bounding box into the transformer to obtain feature vectors of the image to be detected that contain global information; and
determining, based on the feature vectors containing global information, the action types between the first main objects.
Preferably, the transformer is provided with N layers, each layer consisting of an attention model and a feed-forward network, where N is a positive integer;
the output of the N-th feed-forward network serves as the feature vectors of the image to be detected that contain global information.
Preferably, the location information of a bounding box includes the center coordinates, length, and width of the bounding box.
Preferably, the feature vectors containing location information include feature vectors of persons and feature vectors of objects, and the preset action classifier is provided with an action predictor;
the action predictor discriminates each person's feature vector against each object's feature vector pairwise to obtain the action types between persons and objects.
In a second aspect, a motion detection apparatus is provided, the apparatus comprising:
a target detection module, configured to detect, based on a preset target detection model, a plurality of first main objects in an image to be detected, and to obtain a bounding box of each first main object, wherein the first main objects include persons and objects;
an obtaining module, configured to obtain a fixed-length feature vector of the image region within each bounding box and the location information of each bounding box;
a generation module, configured to generate, using the fixed-length feature vector and the location information, a feature vector containing location information for each bounding box; and
an action classification module, configured to input the feature vector containing location information of each bounding box into a preset action classification model and determine the action types between the first main objects.
Preferably, the obtaining module is specifically configured to invoke a trained convolutional neural network model to perform feature extraction on the image within each bounding box, obtaining each fixed-length feature vector.
Preferably, the generation module is specifically configured to concatenate the fixed-length feature vector with the location information to obtain the feature vector containing location information of each bounding box.
Preferably, the preset action classifier is provided with a transformer;
the action classification module comprises:
an input submodule, configured to input the feature vector containing location information of each bounding box into the transformer to obtain feature vectors of the image to be detected that contain global information; and
an output submodule, configured to determine, based on the feature vectors containing global information, the action types between the first main objects.
Preferably, the transformer is provided with N layers, each layer consisting of an attention model and a feed-forward network, where N is a positive integer; the output of the N-th feed-forward network serves as the feature vectors of the image to be detected that contain global information.
Preferably, the location information of a bounding box includes the center coordinates, length, and width of the bounding box.
Preferably, the feature vectors containing location information include feature vectors of persons and feature vectors of objects; the preset action classifier is provided with an action predictor; the action predictor discriminates each person's feature vector against each object's feature vector pairwise to obtain the action types between persons and objects.
In a third aspect, an electronic device is provided, the electronic device comprising:
a processor, a memory, and a bus;
the bus, configured to connect the processor and the memory;
the memory, configured to store operation instructions; and
the processor, configured to execute, by invoking the operation instructions, the operations corresponding to the motion detection method shown in the first aspect of the present application.
In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the program implements the motion detection method shown in the first aspect of the present application.
The technical solution provided by the present application has the following beneficial effects:
For an image to be detected, a plurality of first main objects in the image are detected based on a preset target detection model; then, based on a preset action classification model, it is determined whether action associations exist between the first main objects, and if so, the corresponding action types are determined and labeled. In this way, compared with the prior art, which can only detect the relationship between a paired person and object in an image, the present application can detect the interaction relationships between all persons and objects in the image.
Detailed description of the invention
To explain the technical solutions in the embodiments of the present application more clearly, the accompanying drawings required in the description of the embodiments are briefly introduced below.
Fig. 1 is a schematic flowchart of a motion detection method provided by one embodiment of the present application;
Fig. 2A is an example of an image to be detected in the present application;
Fig. 2B is an example of an image with bounding boxes in the present application;
Fig. 3 is a schematic structural diagram of a motion detection apparatus provided by another embodiment of the present application;
Fig. 4 is a schematic structural diagram of an electronic device for motion detection provided by another embodiment of the present application.
Specific embodiment
Embodiments of the present application are described in detail below, examples of which are shown in the accompanying drawings, where the same or similar reference numerals denote the same or similar elements, or elements with the same or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary, are only used to explain the present application, and are not to be construed as limiting the claims.
Those skilled in the art will appreciate that, unless expressly stated otherwise, the singular forms "a", "an", "the", and "said" used herein may also include the plural forms. It should be further understood that the word "comprising" used in the specification of the present application indicates the presence of the stated features, integers, steps, operations, elements, and/or components, but does not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It should be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intermediate elements may also be present. In addition, "connected" or "coupled" as used herein may include wireless connection or wireless coupling. The term "and/or" as used herein includes all or any unit of, and all combinations of, one or more of the associated listed items.
To make the objectives, technical solutions, and advantages of the present application clearer, embodiments of the present application are described in further detail below with reference to the accompanying drawings.
The motion detection method and apparatus, electronic device, and computer-readable storage medium provided by the present application are intended to solve the above technical problems of the prior art.
The technical solution of the present application, and how it solves the above technical problems, are described in detail below with specific embodiments. The specific embodiments below may be combined with one another, and the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application are described below with reference to the accompanying drawings.
One embodiment provides a motion detection method. As shown in Fig. 1, the method comprises:
Step S101: detecting, based on a preset target detection model, a plurality of first main objects in an image to be detected, and obtaining a bounding box of each first main object, wherein the first main objects include persons and objects.
In practical applications, the image to be detected is first input into the preset target detection model, which detects the main objects in the image and labels them with bounding boxes. As shown in Fig. 2A, assume Fig. 2A is the image to be detected; inputting Fig. 2A into the preset target detection model yields the image shown in Fig. 2B, in which four main objects are labeled: a person, an umbrella, a bicycle, and a bicycle basket. As another example, if a child in the image to be detected carries a schoolbag, the main objects in the image are the child and the schoolbag. There may also be multiple persons and multiple objects; for example, if multiple children stand in line in the image, each carrying a schoolbag, the main objects are all the children and all the schoolbags.
Further, the image to be detected may be a picture in a format such as BMP, JPG, SWF, CDR, or AI, a frame in a video, or an image in another form; all are applicable to the present application, and the present application places no restriction on this.
It should be noted that the annotation may be made directly in the picture or in other forms; for example, if the annotation is in text, the text and the image are input into the target detection model together. In short, any form that can annotate each main object in a training image, together with the action associations between the main objects and the corresponding action types, is applicable to the present application, and the present application places no restriction on this.
Before training the preset target detection model, a large number of images may first be collected, and all the main objects in each image, as well as the actions between them, are labeled; the labeled images are the training images. For example, the actions between the main objects in Fig. 2B may be annotated: there is an action association between the person and the umbrella, with the corresponding action 'holding', while there is no action association between the person and the bicycle basket and hence no corresponding action, and so on. Once the actions between all the main objects have been annotated, Fig. 2B can serve as a training image.
For each training image, the training image is input into the preset target detection model, which may be a Faster R-CNN model. The model first uses an RPN (region proposal network) to predict ROIs (regions of interest) in the training image, then performs object classification on the basis of the ROIs and regresses the offset of the detected bounding box relative to the ROI, thereby obtaining a target detection model trained on that image. Processing every training image in the same way yields a target detection model trained on a large number of training images.
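The bounding-box offset regression mentioned above can be sketched with the center/size delta parameterization commonly used by Faster R-CNN; the following is a minimal NumPy illustration under the assumption that boxes are given as (x_center, y_center, w, h), not the patent's own code.

```python
import numpy as np

def box_deltas(roi, gt):
    """Offsets of a ground-truth box relative to an ROI, both given as
    (x_center, y_center, w, h): translation scaled by ROI size, log scale."""
    rx, ry, rw, rh = roi
    gx, gy, gw, gh = gt
    return np.array([(gx - rx) / rw, (gy - ry) / rh,
                     np.log(gw / rw), np.log(gh / rh)])

def apply_deltas(roi, d):
    """Invert box_deltas: recover the detected box from the ROI and offsets."""
    rx, ry, rw, rh = roi
    return np.array([rx + d[0] * rw, ry + d[1] * rh,
                     rw * np.exp(d[2]), rh * np.exp(d[3])])

roi = np.array([50.0, 40.0, 20.0, 30.0])
gt = np.array([55.0, 38.0, 24.0, 27.0])
# regressing deltas and re-applying them recovers the target box exactly
assert np.allclose(apply_deltas(roi, box_deltas(roi, gt)), gt)
```

During training, the model learns to predict these four deltas for each ROI; at inference, `apply_deltas` turns predicted deltas back into image-space boxes.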
Step S102: obtaining a fixed-length feature vector of the image region within each bounding box and the location information of each bounding box.
In a preferred embodiment of the present invention, the step of obtaining the fixed-length feature vector of the image region within each bounding box specifically comprises:
performing feature extraction on the image within each bounding box using a trained convolutional neural network model to obtain each fixed-length feature vector.
Specifically, ROI pooling (region-of-interest pooling) is applied to each detected bounding box, followed by a convolutional layer, to obtain the fixed-length feature vector of each bounding box; the location information of each bounding box is then detected, where the location information of a bounding box includes its center coordinates, length, and width (in pixels).
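The ROI-pooling step above can be sketched as follows; this is a minimal NumPy illustration (not the patent's implementation), assuming a single 2-D feature map and a fixed 2x2 output grid, and showing how regions of different sizes are reduced to vectors of the same length.

```python
import numpy as np

def roi_pool(feature_map, box, grid=2):
    """Max-pool the region box = (x0, y0, x1, y1) of a 2-D feature map
    into a fixed grid x grid output, then flatten to a vector."""
    x0, y0, x1, y1 = box
    region = feature_map[y0:y1, x0:x1]
    h, w = region.shape
    out = np.zeros((grid, grid))
    for gy in range(grid):
        for gx in range(grid):
            # split the region into grid*grid roughly equal, non-empty cells
            ys = slice(gy * h // grid, max((gy + 1) * h // grid, gy * h // grid + 1))
            xs = slice(gx * w // grid, max((gx + 1) * w // grid, gx * w // grid + 1))
            out[gy, gx] = region[ys, xs].max()
    return out.ravel()  # fixed length grid*grid regardless of box size

fmap = np.arange(100, dtype=float).reshape(10, 10)
v_small = roi_pool(fmap, (0, 0, 4, 4))   # 4x4 region
v_large = roi_pool(fmap, (2, 1, 9, 8))   # 7x7 region
assert v_small.shape == v_large.shape == (4,)
```

In the patent's pipeline, the pooled output would feed further convolutional layers before becoming the fixed-length vector a_i; the sketch only shows the size-normalizing step.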
Step S103: generating, using the fixed-length feature vector and the location information, a feature vector containing location information for each bounding box.
In a preferred embodiment of the present invention, generating the feature vector containing location information of each bounding box using the fixed-length feature vector and the location information comprises:
concatenating the fixed-length feature vector with the location information to obtain the feature vector containing location information of each bounding box.
Specifically, the center coordinates and the length and width of each bounding box are detected, and each bounding box's feature vector is concatenated with its location information to obtain the feature vector containing location information of each bounding box, f_i ∈ R^(1×d_f), where i is the index of the bounding box. For example, assuming the feature vector obtained in step S102 is a_i, concatenating the location information after a_i yields the feature vector containing location information f_i = [a_i, x_i, y_i, w_i, h_i], where x_i and y_i are the coordinates of the bounding-box center and w_i and h_i are the width and height of the bounding box.
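The concatenation step above can be sketched as follows; a minimal NumPy illustration assuming a 512-dimensional appearance vector a_i (the patent does not fix d_f), mirroring f_i = [a_i, x_i, y_i, w_i, h_i].

```python
import numpy as np

def with_location(a_i, box):
    """Append the bounding box's (x_center, y_center, w, h) to the
    fixed-length appearance vector a_i, giving f_i = [a_i, x, y, w, h]."""
    x, y, w, h = box
    return np.concatenate([a_i, [x, y, w, h]])

a_i = np.random.rand(512)                 # fixed-length feature from ROI pooling
f_i = with_location(a_i, (120.0, 80.0, 60.0, 140.0))
assert f_i.shape == (516,)                # d_f = 512 + 4
assert np.allclose(f_i[-4:], [120.0, 80.0, 60.0, 140.0])
```

The four geometry entries let the later attention stage reason about where each box sits in the image, not only what it looks like.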
Step S104: inputting the feature vector containing location information of each bounding box into a preset action classification model, and determining the action types between the first main objects.
In a preferred embodiment of the present invention, the feature vectors containing location information include feature vectors of persons and feature vectors of objects, and the preset action classifier is provided with an action predictor and a transformer;
the action predictor discriminates each person's feature vector against each object's feature vector pairwise to obtain the action types between persons and objects;
the step of inputting the feature vector containing location information of each bounding box into the preset action classification model and determining the action types between the first main objects comprises:
inputting the feature vector containing location information of each bounding box into the transformer to obtain feature vectors of the image to be detected that contain global information; and
determining, based on the feature vectors containing global information, the action types between the first main objects.
Here, the transformer is provided with N layers, each layer consisting of an attention model and a feed-forward network, where N is a positive integer; the output of the N-th feed-forward network serves as the feature vectors of the image to be detected that contain global information.
Specifically, each feature vector containing location information is input into the Transformer.
The Transformer structure uses the self-attention mechanism and comprises two basic structures: multi-head attention and a feed-forward network.
When the transformer is initialized, the definition of multi-head self-attention must first be given, where multi-head self-attention simply means multiple self-attentions.
A single attention is one head, i.e.
head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V), with Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V,
where Q, K, and V are the inputs and W_i^Q, W_i^K, and W_i^V are parameter matrices.
Multi-head self-attention is then multiple attentions, i.e. MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W^O, where W^O is a parameter matrix.
To combine the feature vectors of the multiple bounding boxes using the Transformer's self-attention mechanism, the feature vectors of the bounding boxes are first stacked to obtain F = [f_1; f_2; ...; f_n] (that is, F is obtained from the multiple f_i); multi-head attention then gives MultiHead(F, F, F) = B (that is, input F yields B), where the square brackets denote the set of feature vectors concatenated by column and B is the output feature.
A feed-forward network is then applied: FFN(B) = max(0, B W_1 + b_1) W_2 + b_2 = D (D is obtained from B), where b_1 and b_2 are row vectors. During computation, a broadcast operation (broadcasting, as in numpy) is required: when feature vectors or matrices of inconsistent dimensions are operated on together, their dimensions are expanded automatically, giving the feed-forward network. In this way, one attention plus one feed-forward network make up one layer of the Transformer; in practice, the Transformer may have multiple layers, each consisting of an attention and a feed-forward network.
That is, assuming the transformer has five layers: the input of the first layer's attention is F, and its result serves as the input of the first feed-forward network; the output of the first feed-forward network serves as the input of the second layer's attention, and the output of the second layer's attention serves as the input of the second layer's feed-forward network; and so on, until the fifth layer's feed-forward network outputs the result.
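The attention-plus-feed-forward layer described above can be sketched in NumPy as follows; this is a simplified single-head illustration under assumed dimensions (the patent uses multi-head attention), showing how every bounding-box vector attends to every other one in a single step.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    d_k = K.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d_k)) @ V

def transformer_layer(F, Wq, Wk, Wv, W1, b1, W2, b2):
    """One layer: self-attention over the stacked box features F,
    followed by the feed-forward network max(0, B W1 + b1) W2 + b2."""
    B = attention(F @ Wq, F @ Wk, F @ Wv)
    return np.maximum(0.0, B @ W1 + b1) @ W2 + b2

rng = np.random.default_rng(0)
n, d = 4, 8                        # 4 bounding boxes, feature size 8 (assumed)
F = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
W1, b1 = rng.normal(size=(d, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, d)), np.zeros(d)
D = transformer_layer(F, Wq, Wk, Wv, W1, b1, W2, b2)
assert D.shape == (n, d)           # one global-information vector per box
```

Stacking several such layers, with each layer's output fed to the next, gives the N-layer transformer described in the text; because attention connects every pair of boxes directly, the connection distance between any two boxes is the same.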
Further, the Transformer outputs a feature vector containing global information for each main object in the picture (that is to say, for all the main objects in the image). Then, whether an action association exists between each person and each object is judged, using binary logistic regression. The features output by the transformer are extracted and concatenated to obtain [d_i, d_j], which is input into the classifier p_{i,j} = sigmoid(W_a [d_i, d_j] + b_a), where the square brackets denote feature concatenation by row, W_a is a linear transformation matrix, b_a is a bias term, and D = [d_1; d_2; ...; d_n]; the classifier determines whether an action association exists between person i and object j.
For persons and objects between which an action association exists, the action types involved are classified using these feature vectors.
For example, assume the main objects include one person and one object. For the detected person i and object j, the features output by the transformer are extracted and concatenated to obtain [d_i, d_j], which is input into the classifier p_{i,j,r} = sigmoid(W_r [d_i, d_j] + b_r), where the square brackets denote feature concatenation by row, W_r is a linear transformation matrix, b_r is a bias term, and D = [d_1; d_2; ...; d_n]; the classifier predicts the actions involving person i and object j. Because there may be multiple actions between a person and an object, logistic regression is used to predict the person-object actions, with cross-entropy as the loss function; the total loss is
L = -(1/N) * sum over (i, j, r) of [ y_{i,j,r} log p_{i,j,r} + (1 - y_{i,j,r}) log(1 - p_{i,j,r}) ],
where N is the total number of traversed (i, j, r) items. For example, for a person and a schoolbag, the classification is whether the person is holding the schoolbag or carrying it on the back, i.e. the classified action is 'hold' or 'carry'. Unlike an LSTM (long short-term memory network), which can only process the bounding boxes' feature vectors sequentially, so that the operation distance between bounding boxes differs, the present application uses a transformer based on the attention mechanism, so that the connection distance between all bounding boxes is the same.
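The pairwise discrimination above can be sketched as follows; a minimal NumPy illustration under assumed dimensions, in which every person vector d_i is concatenated with every object vector d_j and passed through the sigmoid classifiers for association (W_a, b_a) and action type (W_r, b_r).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pairwise_scores(person_feats, object_feats, Wa, ba, Wr, br):
    """For every (person i, object j) pair, score the existence of an
    action association, p_ij, and the per-action probabilities, p_ijr."""
    assoc, actions = {}, {}
    for i, di in enumerate(person_feats):
        for j, dj in enumerate(object_feats):
            dij = np.concatenate([di, dj])             # [d_i, d_j]
            assoc[(i, j)] = sigmoid(Wa @ dij + ba)     # shape (1,): association
            actions[(i, j)] = sigmoid(Wr @ dij + br)   # one probability per action
    return assoc, actions

rng = np.random.default_rng(1)
d, n_actions = 8, 5                       # assumed sizes
people = rng.normal(size=(2, d))          # 2 person vectors from the transformer
objects = rng.normal(size=(3, d))         # 3 object vectors from the transformer
Wa, ba = rng.normal(size=(1, 2 * d)), np.zeros(1)
Wr, br = rng.normal(size=(n_actions, 2 * d)), np.zeros(n_actions)
assoc, actions = pairwise_scores(people, objects, Wa, ba, Wr, br)
assert len(assoc) == 6                    # every person-object pair is scored
assert actions[(0, 0)].shape == (n_actions,)
```

Because every pair is scored, the sketch mirrors the claim that relationships among all persons and objects, not just one chosen pair, are detected.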
Further, the loss function is optimized using Adam, a gradient-descent-based optimization algorithm, and the action classification model is obtained through training.
It should be noted that concatenation in the present application refers to merging multiple parameters. For example, if the first feature vector is a_i, concatenating the location information yields the feature vector containing location information f_i = [a_i, x_i, y_i, w_i, h_i]; that is, a_i and the location information are fused into one set (the a_i, x_i, y_i, w_i, h_i in the square brackets). Of course, this set may be an array, a matrix, or a set in another form.
In this embodiment of the present invention, for an image to be detected, a plurality of first main objects in the image are detected based on a preset target detection model, and a bounding box of each first main object is obtained, the first main objects including persons and objects; then a fixed-length feature vector of the image region within each bounding box and the location information of each bounding box are obtained; next, using the fixed-length feature vector and the location information, a feature vector containing location information is generated for each bounding box; and the feature vector containing location information of each bounding box is input into a preset action classification model to determine the action types between the first main objects. In this way, compared with the prior art, which can only detect the relationship between a paired person and object in an image, the present application can detect the relationships among all the main objects in the image, in particular the interaction relationships between all persons and objects.
Fig. 3 is a schematic structural diagram of a motion detection apparatus provided by another embodiment of the present application. As shown in Fig. 3, the apparatus of this embodiment may comprise:
a target detection module 301, configured to detect, based on a preset target detection model, a plurality of first main objects in an image to be detected, and to obtain a bounding box of each first main object, wherein the first main objects include persons and objects;
an obtaining module 302, configured to obtain a fixed-length feature vector of the image region within each bounding box and the location information of each bounding box;
a generation module 303, configured to generate, using the fixed-length feature vector and the location information, a feature vector containing location information for each bounding box; and
an action classification module 304, configured to input the feature vector containing location information of each bounding box into a preset action classification model and determine the action types between the first main objects.
In a preferred embodiment of the present invention, the obtaining module is specifically configured to invoke a trained convolutional neural network model to perform feature extraction on the image within each bounding box, obtaining each fixed-length feature vector.
In a preferred embodiment of the present invention, the generation module is specifically configured to concatenate the fixed-length feature vector with the location information to obtain the feature vector containing location information of each bounding box.
In a preferred embodiment of the present invention, the preset action classifier is provided with a transformer;
the action classification module comprises:
an input submodule, configured to input the feature vector containing location information of each bounding box into the transformer to obtain feature vectors of the image to be detected that contain global information; and
an output submodule, configured to determine, based on the feature vectors containing global information, the action types between the first main objects.
In a preferred embodiment of the present invention, the transformer is provided with N layers, each layer consisting of an attention model and a feed-forward network, where N is a positive integer; the output of the N-th feed-forward network serves as the feature vectors of the image to be detected that contain global information.
In a preferred embodiment of the present invention, the location information of a bounding box includes the center coordinates, length, and width of the bounding box.
In a preferred embodiment of the present invention, the feature vectors containing location information include feature vectors of persons and feature vectors of objects; the preset action classifier is provided with an action predictor; the action predictor discriminates each person's feature vector against each object's feature vector pairwise to obtain the action types between persons and objects.
Motion detection method shown in the application one embodiment can be performed in the action detection device of the present embodiment, in fact Existing principle is similar, and details are not described herein again.
Another embodiment of the present application provides an electronic device, which includes a memory and a processor; at least one program is stored in the memory and, when executed by the processor, implements the following: for an image to be detected, multiple first main objects in the image are detected based on a preset target detection model, and the bounding box of each first main object is obtained, where the first main objects include people and objects; the fixed-length feature vector of the image region within each bounding box and the location information of each bounding box are then obtained; using the fixed-length feature vectors and the location information, the feature vector containing location information of each bounding box is generated; and the feature vectors containing location information of the bounding boxes are input into a preset action classification model to determine the action types between the first main objects. In this way, whereas the prior art can only detect the relationship between a single pair consisting of a person and an object in an image, the present application can detect the relationships among all main objects in an image, in particular the interaction relationships between all people and objects.
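A compact end-to-end sketch of the flow just described. The detector and the fixed-length feature extractor are stubs standing in for the preset target detection model and the convolutional network; all names and values here are illustrative assumptions.

```python
# Assumed end-to-end wiring: detect -> extract fixed-length features ->
# splice with location information -> hand to the action classifier.

def detect(image):
    """Stub for the preset target detection model: (label, box) pairs."""
    return [("person", (10, 10, 50, 90)), ("bicycle", (40, 30, 120, 90))]

def fixed_length_feature(image, box, length=4):
    """Stub for CNN feature extraction over the region inside a box."""
    x1, y1, x2, y2 = box
    return [float(x2 - x1), float(y2 - y1)] + [0.0] * (length - 2)

def location_info(box, img_w, img_h):
    """Normalized center coordinates, width and height of a box."""
    x1, y1, x2, y2 = box
    return [(x1 + x2) / 2 / img_w, (y1 + y2) / 2 / img_h,
            (x2 - x1) / img_w, (y2 - y1) / img_h]

def build_classifier_inputs(image, img_w, img_h):
    """Detect objects, then splice appearance features with location info."""
    return [(label,
             fixed_length_feature(image, box) + location_info(box, img_w, img_h))
            for label, box in detect(image)]

inputs = build_classifier_inputs(image=None, img_w=200, img_h=100)
# Each entry is a label plus a feature vector containing location information,
# ready for the preset action classification model.
```

In a real system the stubs would be replaced by a trained detector and feature network; the point here is only the order and shape of the data flowing between the stages.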
In an alternative embodiment, an electronic device is provided. As shown in Fig. 4, the electronic device 4000 includes a processor 4001 and a memory 4003, where the processor 4001 is connected to the memory 4003, for example via a bus 4002. Optionally, the electronic device 4000 may further include a transceiver 4004. It should be noted that in practical applications the number of transceivers 4004 is not limited to one, and the structure of the electronic device 4000 does not constitute a limitation on the embodiments of the present application.
The processor 4001 may be a CPU, a general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various illustrative logical blocks, modules, and circuits described in the present disclosure. The processor 4001 may also be a combination realizing computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
The bus 4002 may include a path for transmitting information between the above components. The bus 4002 may be a PCI bus, an EISA bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in Fig. 4, but this does not mean that there is only one bus or only one type of bus.
The memory 4003 may be a ROM or another type of static storage device capable of storing static information and instructions, a RAM or another type of dynamic storage device capable of storing information and instructions, an EEPROM, a CD-ROM or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or other magnetic storage device, or any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
The memory 4003 is used to store the application program code for executing the solution of the present application, and its execution is controlled by the processor 4001. The processor 4001 executes the application program code stored in the memory 4003 to implement the content shown in any of the foregoing method embodiments.
The electronic device includes, but is not limited to, mobile terminals such as mobile phones, laptop computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable media players), and vehicle-mounted terminals (such as vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers.
Another embodiment of the present application provides a computer-readable storage medium on which a computer program is stored; when run on a computer, the program causes the computer to execute the corresponding content of the foregoing method embodiments. Specifically, multiple first main objects in an image to be detected are detected based on a preset target detection model, and the bounding box of each first main object is obtained, the first main objects including people and objects; the fixed-length feature vector of the image region within each bounding box and the location information of each bounding box are obtained; using the fixed-length feature vectors and the location information, the feature vector containing location information of each bounding box is generated; and these feature vectors are input into a preset action classification model to determine the action types between the first main objects. In this way, whereas the prior art can only detect the relationship between a single pair consisting of a person and an object in an image, the present application can detect the relationships among all main objects in an image, in particular the interaction relationships between all people and objects.
It should be understood that although the steps in the flowcharts of the drawings are shown in a sequence indicated by arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, there is no strict restriction on the order of execution, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times; their execution order is not necessarily sequential, and they may be executed in turn or alternately with other steps, or with at least part of the sub-steps or stages of other steps.
The above are only some embodiments of the present invention. It should be noted that those of ordinary skill in the art may make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A motion detection method, characterized by comprising:
detecting multiple first main objects in an image to be detected based on a preset target detection model, and obtaining a bounding box of each first main object, wherein the first main objects include people and objects;
obtaining a fixed-length feature vector of the image region within each bounding box and location information of each bounding box;
generating a feature vector containing location information for each bounding box using the fixed-length feature vector and the location information;
inputting the feature vector containing location information of each bounding box into a preset action classification model, and determining action types between the first main objects.
2. The motion detection method according to claim 1, wherein the step of obtaining the fixed-length feature vector of the image region within each bounding box specifically comprises:
performing feature extraction on the image within each bounding box using a trained convolutional neural network model to obtain each fixed-length feature vector.
3. The motion detection method according to claim 1, wherein generating the feature vector containing location information for each bounding box using the fixed-length feature vector and the location information comprises:
splicing the fixed-length feature vector with the location information to obtain the feature vector containing location information of each bounding box.
4. The motion detection method according to claim 1, wherein the preset action classifier is provided with a converter;
the step of inputting the feature vector containing location information of each bounding box into the preset action classification model and determining the action types between the first main objects comprises:
inputting the feature vector containing location information of each bounding box into the converter to obtain a feature vector containing global information of the image to be detected;
determining the action types between the first main objects based on the feature vector containing global information.
5. The motion detection method according to claim 4, wherein the converter is provided with N layers, each layer composed of an attention model and a feedforward network, where N is a positive integer; the output of the N-th layer feedforward network is used as the feature vector containing global information of the image to be detected.
6. The motion detection method according to claim 1, wherein the location information of the bounding box includes the center coordinates, length, and width of the bounding box.
7. The motion detection method according to claim 1, wherein the feature vectors containing location information include feature vectors of people and feature vectors of objects; the preset action classifier is provided with an action predictor;
the action predictor discriminates pairwise between each person's feature vector and each object's feature vector to obtain action types between the people and the objects.
8. An action detection device, characterized by comprising:
a target detection module, configured to detect multiple first main objects in an image to be detected based on a preset target detection model, and obtain a bounding box of each first main object, wherein the first main objects include people and objects;
an obtaining module, configured to obtain a fixed-length feature vector of the image region within each bounding box and location information of each bounding box;
a generation module, configured to generate a feature vector containing location information for each bounding box using the fixed-length feature vector and the location information;
an action classification module, configured to input the feature vector containing location information of each bounding box into a preset action classification model and determine action types between the first main objects.
9. An electronic device, characterized by comprising:
a processor, a memory, and a bus;
the bus, configured to connect the processor and the memory;
the memory, configured to store operation instructions;
the processor, configured to execute the motion detection method according to any one of claims 1 to 7 by calling the operation instructions.
10. A computer-readable storage medium, characterized in that the computer storage medium is used to store computer instructions which, when run on a computer, cause the computer to execute the motion detection method according to any one of claims 1 to 7.
CN201910239759.8A 2019-03-27 2019-03-27 Motion detection method and device, electronic equipment and computer readable storage medium Active CN109977872B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910239759.8A CN109977872B (en) 2019-03-27 2019-03-27 Motion detection method and device, electronic equipment and computer readable storage medium


Publications (2)

Publication Number Publication Date
CN109977872A true CN109977872A (en) 2019-07-05
CN109977872B CN109977872B (en) 2021-09-17

Family

ID=67081127

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910239759.8A Active CN109977872B (en) 2019-03-27 2019-03-27 Motion detection method and device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN109977872B (en)


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105912991A (en) * 2016-04-05 2016-08-31 湖南大学 Behavior identification method based on 3D point cloud and key bone nodes
CN106529467A (en) * 2016-11-07 2017-03-22 南京邮电大学 Group behavior identification method based on multi-feature fusion
CN108492273A (en) * 2018-03-28 2018-09-04 深圳市唯特视科技有限公司 A kind of image generating method based on from attention model
CN108647591A (en) * 2018-04-25 2018-10-12 长沙学院 Activity recognition method and system in a kind of video of view-based access control model-semantic feature
CN108898067A (en) * 2018-06-06 2018-11-27 北京京东尚科信息技术有限公司 Determine the method, apparatus and computer readable storage medium of people and the object degree of association
CN109241536A (en) * 2018-09-21 2019-01-18 浙江大学 It is a kind of based on deep learning from the sentence sort method of attention mechanism
CN109271999A (en) * 2018-09-06 2019-01-25 北京京东尚科信息技术有限公司 Processing method, device and the computer readable storage medium of image


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MING CHEN ET AL: "TVT: Two-View Transformer Network for Video Captioning", Proceedings of Machine Learning Research *
YU-WEI CHAO ET AL: "Learning to Detect Human-Object Interactions", 2018 IEEE Winter Conference on Applications of Computer Vision *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110378420A (en) * 2019-07-19 2019-10-25 Oppo广东移动通信有限公司 A kind of image detecting method, device and computer readable storage medium
CN110543877A (en) * 2019-09-04 2019-12-06 北京迈格威科技有限公司 Identification recognition method, training method and device of model thereof and electronic system
CN111753730A (en) * 2020-06-24 2020-10-09 国网电子商务有限公司 Image examination method and device
CN113632097A (en) * 2021-03-17 2021-11-09 商汤国际私人有限公司 Method, device, equipment and storage medium for predicting relevance between objects
US11941838B2 (en) 2021-03-17 2024-03-26 Sensetime International Pte. Ltd. Methods, apparatuses, devices and storage medium for predicting correlation between objects
CN113632097B (en) * 2021-03-17 2024-07-19 商汤国际私人有限公司 Method, device, equipment and storage medium for predicting relevance between objects
CN114120160A (en) * 2022-01-25 2022-03-01 成都合能创越软件有限公司 Object space distinguishing method and device based on fast-RCNN, computer equipment and storage medium

Also Published As

Publication number Publication date
CN109977872B (en) 2021-09-17


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant