CN109977872A - Motion detection method, device, electronic equipment and computer readable storage medium - Google Patents
- Publication number
- CN109977872A CN109977872A CN201910239759.8A CN201910239759A CN109977872A CN 109977872 A CN109977872 A CN 109977872A CN 201910239759 A CN201910239759 A CN 201910239759A CN 109977872 A CN109977872 A CN 109977872A
- Authority
- CN
- China
- Prior art keywords
- feature vector
- bounding box
- location information
- image
- motion
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Abstract
This application provides a motion detection method, a motion detection device, an electronic device, and a computer-readable storage medium, relating to the field of image processing. The method comprises: detecting, based on a preset target detection model, multiple first main objects in an image to be detected, and obtaining the bounding box of each first main object, where the first main objects include people and objects; obtaining the fixed-length feature vector of the image region inside each bounding box, and the location information of each bounding box; using the fixed-length feature vector and the location information, generating the feature vector containing location information of each bounding box; and inputting the feature vector containing location information of each bounding box into a preset action classification model, to determine the action types between the first main objects. The application can detect the relationships among all main objects in an image, in particular the interactions between all persons and objects.
Description
Technical field
This application relates to the technical field of image processing, and in particular to a motion detection method and device, an electronic device, and a computer-readable storage medium.
Background technique
With the development of computer vision, object detection technology has become increasingly mature. In scene understanding, besides detecting the main objects themselves, it is also necessary to recognize the relationships between them. For example, to detect the relationship between a person and an object in a given image, the person and the object must first be detected, and the relationship between them recognized afterwards.

However, existing detection methods can only detect the relationship between a single paired person and object in an image; they cannot detect the relationships among all main objects in the image.
Summary of the invention
The present application provides a motion detection method and device, an electronic device, and a computer-readable storage medium, which can solve the problem that existing detection methods can only detect the relationship between a single paired person and object in an image, and cannot detect the relationships among all main objects in the image. The technical solution is as follows:
In a first aspect, a motion detection method is provided, the method comprising:

based on a preset target detection model, detecting multiple first main objects in an image to be detected, and obtaining the bounding box of each first main object, where the first main objects include people and objects;

obtaining the fixed-length feature vector of the image region inside each bounding box, and the location information of each bounding box;

using the fixed-length feature vector and the location information, generating the feature vector containing location information of each bounding box;

inputting the feature vector containing location information of each bounding box into a preset action classification model, and determining the action types between the first main objects.
Preferably, the step of obtaining the fixed-length feature vector of the image region inside each bounding box specifically comprises:

performing feature extraction on the image inside each bounding box using a trained convolutional neural network model, to obtain each fixed-length feature vector.

Preferably, the step of using the fixed-length feature vector and the location information to generate the feature vector containing location information of each bounding box comprises:

splicing the fixed-length feature vector with the location information, to obtain the feature vector containing location information of each bounding box.
Preferably, the preset action classifier is provided with a converter;

the step of inputting the feature vector containing location information of each bounding box into the preset action classification model and determining the action types between the first main objects comprises:

inputting the feature vector containing location information of each bounding box into the converter, to obtain the feature vector containing global information of the image to be detected;

based on the feature vector containing global information, determining the action types between the first main objects.

Preferably, the converter is provided with N layers, each layer consisting of one attention model and one feed-forward network, where N is a positive integer; the output of the Nth-layer feed-forward network serves as the feature vector containing global information of the image to be detected.
Preferably, the location information of a bounding box comprises the centre coordinates, length, and width of the bounding box.

Preferably, the feature vectors containing location information include the feature vectors of people and the feature vectors of objects; the preset action classifier is provided with an action predictor; and the action predictor discriminates each person's feature vector against each object's feature vector pairwise, to obtain the action types between people and objects.
In a second aspect, an action detection device is provided, the device comprising:

a target detection module, configured to detect multiple first main objects in an image to be detected based on a preset target detection model, and obtain the bounding box of each first main object, where the first main objects include people and objects;

an acquisition module, configured to obtain the fixed-length feature vector of the image region inside each bounding box, and the location information of each bounding box;

a generation module, configured to use the fixed-length feature vector and the location information to generate the feature vector containing location information of each bounding box;

an action classification module, configured to input the feature vector containing location information of each bounding box into a preset action classification model, and determine the action types between the first main objects.
Preferably, the acquisition module is specifically configured to call a trained convolutional neural network model to perform feature extraction on the image inside each bounding box, obtaining each fixed-length feature vector.

Preferably, the generation module is specifically configured to splice the fixed-length feature vector with the location information, obtaining the feature vector containing location information of each bounding box.

Preferably, the preset action classifier is provided with a converter;

the action classification module comprises:

an input submodule, configured to input the feature vector containing location information of each bounding box into the converter, to obtain the feature vector containing global information of the image to be detected;

an output submodule, configured to determine the action types between the first main objects based on the feature vector containing global information.

Preferably, the converter is provided with N layers, each layer consisting of one attention model and one feed-forward network, where N is a positive integer; the output of the Nth-layer feed-forward network serves as the feature vector containing global information of the image to be detected.

Preferably, the location information of a bounding box comprises the centre coordinates, length, and width of the bounding box.

Preferably, the feature vectors containing location information include the feature vectors of people and the feature vectors of objects; the preset action classifier is provided with an action predictor; and the action predictor discriminates each person's feature vector against each object's feature vector pairwise, to obtain the action types between people and objects.
In a third aspect, an electronic device is provided, the electronic device comprising:

a processor, a memory, and a bus;

the bus, configured to connect the processor and the memory;

the memory, configured to store operational instructions;

the processor, configured to execute, by calling the operational instructions, the operations corresponding to the motion detection method shown in the first aspect of the application.

In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored; when the program is executed by a processor, the motion detection method shown in the first aspect of the application is realized.
The technical solution provided by the present application has the following beneficial effects:

for an image to be detected, multiple first main objects in the image are detected based on a preset object detection model; then, based on a preset action classification model, it is detected whether action associations exist between the first main objects, and if so, the action types corresponding to those associations are determined and marked. Compared with the prior art, which can only detect the relationship between a single paired person and object in an image, the application can thus detect the interactions between all persons and objects in the image.
Brief description of the drawings
In order to explain the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below.
Fig. 1 is a schematic flowchart of a motion detection method provided by one embodiment of the application;
Fig. 2A is an example of an image to be detected in the application;
Fig. 2B is an example of an image with bounding boxes in the application;
Fig. 3 is a schematic structural diagram of an action detection device provided by another embodiment of the application;
Fig. 4 is a schematic structural diagram of an electronic device for motion detection provided by another embodiment of the application.
Specific embodiment
The embodiments of the present application are described in detail below, with examples of the embodiments shown in the accompanying drawings, where the same or similar reference numbers throughout denote the same or similar elements, or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary, are only used to explain the application, and are not to be construed as limiting the claims.
Those skilled in the art will appreciate that, unless expressly stated otherwise, the singular forms "a", "an", "the", and "said" used herein may also include the plural forms. It should be further understood that the word "comprising" used in the description of the present application means that the stated features, integers, steps, operations, elements, and/or components are present, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It should be understood that when an element is said to be "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or intermediate elements may also be present. In addition, "connection" or "coupling" as used herein may include wireless connection or wireless coupling. The term "and/or" as used herein includes all or any units and all combinations of one or more of the associated listed items.
To make the purposes, technical solutions, and advantages of the application clearer, the embodiments of the application are described in further detail below in conjunction with the drawings.

The motion detection method, device, electronic equipment, and computer-readable storage medium provided by the present application are intended to solve the above technical problems of the prior art.

The technical solutions of the application, and how they solve the above technical problems, are described in detail below with specific embodiments. The specific embodiments below may be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments. The embodiments of the application are described below in conjunction with the drawings.
One embodiment provides a motion detection method. As shown in Fig. 1, the method comprises:

Step S101: based on a preset target detection model, detect multiple first main objects in the image to be detected, and obtain the bounding box of each first main object, where the first main objects include people and objects.
In practical applications, the image to be detected is first input into the preset object detection model, which detects the main objects in the image and marks each of them with a bounding box. As shown in Fig. 2A, suppose Fig. 2A is the image to be detected; inputting Fig. 2A into the preset object detection model produces the image shown in Fig. 2B, in which 4 main objects have been marked out: a person, an umbrella, a bicycle, and a bicycle basket. As another example, if the image to be detected shows a child carrying a school bag, the main objects in that image are the child and the school bag. There may also be multiple persons and multiple objects: for example, if the image to be detected shows several children queuing together, each carrying a school bag, the main objects in that image are all the children and all the school bags.
Further, the image to be detected may be a picture in a format such as BMP, JPG, SWF, CDR, or AI, or may be a frame of a video, or an image in another form; any of these is applicable, and the application places no restriction on this.
It should be noted that the annotations may be made directly in the image, or provided in another form — for example, as text that is input into the object detection model together with the image. In short, any form capable of annotating each main object, the action associations between the main objects, and the corresponding action types in a training image is applicable; the application places no restriction on this.
Before training the preset object detection model, a large number of images can first be collected; all the main objects in each image, and the actions between them, are then annotated, and the annotated images serve as the training images. For example, the actions between the main objects in Fig. 2B are annotated: there is an action association between the person and the umbrella, and the corresponding action is "hold"; there is no action association between the person and the bicycle basket, and hence no corresponding action; and so on. After the actions between all the main objects have been annotated, Fig. 2B can serve as a training image.
For a given training image, the training image is input into the preset object detection model, which may be a Faster R-CNN model. The model first uses an RPN (region proposal network) to predict ROIs (regions of interest) for the training image, then performs object classification on the basis of the ROIs and regresses the offsets of the detection bounding boxes relative to the ROIs, thereby being trained on that image. Every training image is processed in the same way, yielding an object detection model trained on a large number of training images.
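As a minimal illustration of the regression step mentioned above — the detector regresses offsets of each detection box relative to its ROI — the following numpy sketch applies the common Faster R-CNN offset parameterization. The box layout (x1, y1, x2, y2) and the offset order (dx, dy, dw, dh) are assumptions for the example, not details fixed by this application:

```python
import numpy as np

def decode_boxes(rois, deltas):
    """Apply predicted (dx, dy, dw, dh) offsets to ROIs given as (x1, y1, x2, y2) rows."""
    w = rois[:, 2] - rois[:, 0]
    h = rois[:, 3] - rois[:, 1]
    cx = rois[:, 0] + 0.5 * w
    cy = rois[:, 1] + 0.5 * h
    # Common R-CNN parameterization: shift the centre, scale the size exponentially.
    pred_cx = cx + deltas[:, 0] * w
    pred_cy = cy + deltas[:, 1] * h
    pred_w = w * np.exp(deltas[:, 2])
    pred_h = h * np.exp(deltas[:, 3])
    return np.stack([pred_cx - 0.5 * pred_w, pred_cy - 0.5 * pred_h,
                     pred_cx + 0.5 * pred_w, pred_cy + 0.5 * pred_h], axis=1)

rois = np.array([[0.0, 0.0, 10.0, 10.0]])
deltas = np.zeros((1, 4))          # zero offsets leave the ROI unchanged
print(decode_boxes(rois, deltas))
```

With zero offsets the decoded box equals the ROI; a delta of (0, 0, log 2, 0) would double the width around the same centre.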
Step S102: obtain the fixed-length feature vector of the image region inside each bounding box, and the location information of each bounding box.

In a preferred embodiment of the present invention, the step of obtaining the fixed-length feature vector of the image region inside each bounding box specifically comprises: performing feature extraction on the image inside each bounding box using a trained convolutional neural network model, to obtain each fixed-length feature vector.

Specifically, ROI Pooling (region-of-interest pooling) is applied to each detected bounding box, followed by a convolutional layer, to obtain the fixed-length feature vector of each bounding box; the location information of each bounding box is then determined, where the location information of a bounding box comprises its centre coordinates, length, and width (in pixels).
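As an illustration of the location information just described, the following minimal sketch converts a detected box into its centre coordinates, length, and width in pixels. The corner-coordinate input format (x1, y1, x2, y2) is an assumption for the example, not something fixed by this application:

```python
import numpy as np

def box_location(box):
    """(x1, y1, x2, y2) in pixels -> (cx, cy, w, h): centre coordinates, length, width."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return np.array([x1 + 0.5 * w, y1 + 0.5 * h, w, h])

print(box_location((10, 20, 50, 100)))    # -> centre (30, 60), length 40, width 80
```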
Step S103: using the fixed-length feature vector and the location information, generate the feature vector containing location information of each bounding box.

In a preferred embodiment of the present invention, the step of using the fixed-length feature vector and the location information to generate the feature vector containing location information of each bounding box comprises: splicing the fixed-length feature vector with the location information, to obtain the feature vector containing location information of each bounding box.
Specifically, the centre coordinates and the length and width of each bounding box are detected, and each bounding box's feature vector is spliced with its location information to obtain the feature vector containing location information, f_i ∈ R^{1×d_f}, where i is the index of the bounding box. For example, suppose the feature vector obtained in step S102 is a_i; splicing the location information after a_i gives the feature vector containing location information f_i = [a_i, x_i, y_i, w_i, h_i], where (x_i, y_i) are the coordinates of the bounding box centre and w_i, h_i are the length and width of the bounding box.
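The splicing step above amounts to a simple concatenation. A toy numpy sketch (the feature length of 8 is illustrative only; real ROI features are much longer):

```python
import numpy as np

a_i = np.random.rand(8)                        # fixed-length ROI feature a_i (toy length 8)
x_i, y_i, w_i, h_i = 30.0, 60.0, 40.0, 80.0    # bounding-box centre, length, width (pixels)

# f_i = [a_i, x_i, y_i, w_i, h_i]: feature and location merged into one row vector.
f_i = np.concatenate([a_i, [x_i, y_i, w_i, h_i]])
print(f_i.shape)    # (12,)
```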
Step S104: input the feature vector containing location information of each bounding box into the preset action classification model, and determine the action types between the first main objects.

In a preferred embodiment of the present invention, the feature vectors containing location information include the feature vectors of people and the feature vectors of objects; the preset action classifier is provided with an action predictor and a converter;

the action predictor discriminates each person's feature vector against each object's feature vector pairwise, to obtain the action types between people and objects;

the step of inputting the feature vector containing location information of each bounding box into the preset action classification model and determining the action types between the first main objects comprises:

inputting the feature vector containing location information of each bounding box into the converter, to obtain the feature vector containing global information of the image to be detected;

based on the feature vector containing global information, determining the action types between the first main objects.

Here, the converter is provided with N layers, each layer consisting of one attention model and one feed-forward network, where N is a positive integer; the output of the Nth-layer feed-forward network serves as the feature vector containing global information of the image to be detected.
Specifically, each feature vector containing location information is input into a Transformer (converter). The Transformer structure uses a self-attention mechanism and contains two basic building blocks: multi-head attention and a feed-forward network.

When the converter is initialized, the definition of multi-head self-attention must first be given, where multi-head self-attention simply means multiple self-attentions.

A single attention is defined as one head, i.e. head_i = Attention(QW_i^Q, KW_i^K, VW_i^V), with Attention(Q, K, V) = softmax(QK^T/√d_k)V, where Q, K, V are the inputs and W_i^Q, W_i^K, W_i^V are parameter matrices.

Multi-head self-attention, i.e. multiple attentions, is then MultiHead(Q, K, V) = Concat(head_1, …, head_h)W^O, where W^O is a parameter matrix.

To combine the feature vectors of the multiple bounding boxes using the Transformer's self-attention mechanism, the feature vectors of the bounding boxes are first spliced to obtain F = [f_1; f_2; …; f_n] (F is obtained from the multiple f_i), and multi-head attention then gives MultiHead(F, F, F) = B (i.e. inputting F yields B), where the square brackets denote the set obtained by stacking the feature vectors, and B is the output feature.

A feed-forward network follows: FFN(B) = max(0, BW_1 + b_1)W_2 + b_2 = D (D is obtained from B), where b_1 and b_2 are row vectors. During computation, broadcast operations are required (broadcasting, as in numpy): when feature vectors or matrices of inconsistent dimensions are operated on, their dimensions are expanded automatically. In this way, one attention plus one feed-forward network forms one layer of the Transformer; in practice the Transformer can have multiple layers, each composed of one attention and one feed-forward network.
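The attention and feed-forward computations can be sketched end-to-end in numpy. This is a toy, randomly initialized illustration of one converter layer; the sizes, the scaled softmax, and the random weights are assumptions for the example, and the residual connections and layer normalization used by practical Transformers are omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    # Row-wise softmax, numerically stabilised.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    return softmax(Q @ K.T / np.sqrt(K.shape[-1])) @ V

def multi_head(F, heads, d_model, d_k):
    # head_i = Attention(F W_i^Q, F W_i^K, F W_i^V); heads are concatenated, mixed by W^O.
    outs = []
    for _ in range(heads):
        Wq = rng.standard_normal((d_model, d_k))
        Wk = rng.standard_normal((d_model, d_k))
        Wv = rng.standard_normal((d_model, d_k))
        outs.append(attention(F @ Wq, F @ Wk, F @ Wv))
    Wo = rng.standard_normal((heads * d_k, d_model))
    return np.concatenate(outs, axis=-1) @ Wo

def ffn(B, W1, b1, W2, b2):
    # Feed-forward network: max(0, B W1 + b1) W2 + b2 (b1, b2 broadcast over rows).
    return np.maximum(0.0, B @ W1 + b1) @ W2 + b2

n, d_model, d_k, heads = 4, 16, 4, 2       # n bounding boxes, toy sizes
F = rng.standard_normal((n, d_model))      # rows are the feature vectors f_i
B = multi_head(F, heads, d_model, d_k)     # MultiHead(F, F, F) = B
D = ffn(B, rng.standard_normal((d_model, 32)), np.zeros(32),
        rng.standard_normal((32, d_model)), np.zeros(d_model))
print(D.shape)    # (4, 16): one global-context row per bounding box
```

Each row of D mixes information from all boxes, which is why the output is described as containing global information.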
That is, supposing the converter has five layers: the input of the first layer's attention is F, and its result serves as the input of the first feed-forward network; the output of the first feed-forward network serves as the input of the second layer's attention, whose output is the input of the second layer's feed-forward network; and so on, until the feed-forward network of the fifth layer outputs the result.
Further, the converter outputs a feature vector containing global information for each main object in the picture (that is, one for every main object in the image). It is then judged whether an action association exists between each person and each object, using two-class logistic regression: the features output by the converter are extracted and spliced together to obtain [d_i, d_j], which is input into the classifier p_{i,j} = sigmoid(W_a[d_i, d_j] + b_a), where the square brackets denote the features concatenated by row, W_a is a linear transformation matrix, b_a is a bias term, and D = [d_1; d_2; …; d_n]; the classification decides whether an action association exists between person i and object j.
For person-object pairs that do have an action association, the action types involved are classified using the feature vectors containing global information. For example, suppose the main objects include one person and one object; for the detected person i and object j, the features output by the converter are extracted and spliced together to obtain [d_i, d_j], which is input into the classifier p_{i,j,r} = sigmoid(W_r[d_i, d_j] + b_r), where the square brackets denote the features concatenated by row, W_r is a linear transformation matrix, b_r is a bias term, and D = [d_1; d_2; …; d_n]; the classifier determines the actions in which person i and object j are involved. Because there may be many kinds of action between a person and an object, logistic regression is used to predict the actions, with cross-entropy as the loss function; the total loss is

L = -(1/N) Σ_{i,j,r} [y_{i,j,r} log p_{i,j,r} + (1 - y_{i,j,r}) log(1 - p_{i,j,r})],

where N is the total number of (i, j, r) triples traversed and y_{i,j,r} is the corresponding label. For example, for a person and a school bag, the classification decides whether the person is holding the school bag or carrying it on the back, i.e. whether the action is "hold" or "carry". Unlike an LSTM (long short-term memory network), which can only process the feature vectors of the bounding boxes sequentially, so that the operation distances between bounding boxes differ, the application uses a converter based on the attention mechanism, so that the connection distance between any two bounding boxes is the same.

Further, the loss function is optimized using the gradient-descent-based optimization algorithm Adam, and training yields the action classification model.
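The two classifier heads described above can be sketched with toy sizes as follows; the cross-entropy here is written as the standard binary cross-entropy per (i, j, r) term, the usual choice for sigmoid outputs. The dimensions, random weights, labels, and the three example action types are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

d = 16                                   # toy length of the global feature per box
d_person = rng.standard_normal(d)        # d_i: person feature from the converter
d_object = rng.standard_normal(d)        # d_j: object feature from the converter

pair = np.concatenate([d_person, d_object])          # [d_i, d_j], concatenated by row

# Association head: p_ij = sigmoid(W_a [d_i, d_j] + b_a), two-class logistic regression.
W_a, b_a = rng.standard_normal((1, 2 * d)), 0.0
p_assoc = sigmoid(W_a @ pair + b_a)

# Action head, one score per action type r: p_ijr = sigmoid(W_r [d_i, d_j] + b_r).
num_actions = 3                                      # e.g. "hold", "carry", "ride"
W_r, b_r = rng.standard_normal((num_actions, 2 * d)), np.zeros(num_actions)
p_act = sigmoid(W_r @ pair + b_r)

# Binary cross-entropy averaged over the (i, j, r) terms (one pair here).
y = np.array([1.0, 0.0, 0.0])                        # hypothetical labels
loss = -np.mean(y * np.log(p_act) + (1 - y) * np.log(1 - p_act))
print(p_assoc.shape, p_act.shape)    # (1,) (3,)
```

In training, the loss would be summed over every person-object pair and action type and minimized with Adam, as described above.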
It should be noted that "splicing" in this application means merging multiple parameters. For example, if the feature vector is a_i, splicing the location information after it gives the feature vector containing location information f_i = [a_i, x_i, y_i, w_i, h_i]; that is, a_i and the location information are merged into one set (the contents of the square brackets, a_i, x_i, y_i, w_i, h_i). This set may of course be an array, a matrix, or a set in another form.
In the embodiments of the present invention, for an image to be detected: multiple first main objects in the image are detected based on a preset target detection model, and the bounding box of each first main object is obtained, where the first main objects include people and objects; the fixed-length feature vector of the image region inside each bounding box and the location information of each bounding box are then obtained; using the fixed-length feature vector and the location information, the feature vector containing location information of each bounding box is generated; and the feature vector containing location information of each bounding box is input into the preset action classification model to determine the action types between the first main objects. Compared with the prior art, which can only detect the relationship between a single paired person and object in an image, the application can thus detect the relationships among all main objects in the image, in particular the interactions between all persons and objects.
Fig. 3 is a schematic structural diagram of an action detection device provided by another embodiment of the application. As shown in Fig. 3, the device of this embodiment may comprise:

a target detection module 301, configured to detect multiple first main objects in an image to be detected based on a preset target detection model, and obtain the bounding box of each first main object, where the first main objects include people and objects;

an acquisition module 302, configured to obtain the fixed-length feature vector of the image region inside each bounding box, and the location information of each bounding box;

a generation module 303, configured to use the fixed-length feature vector and the location information to generate the feature vector containing location information of each bounding box;

an action classification module 304, configured to input the feature vector containing location information of each bounding box into a preset action classification model, and determine the action types between the first main objects.
In a preferred embodiment of the present invention, the acquisition module is specifically configured to call a trained convolutional neural network model to perform feature extraction on the image inside each bounding box, obtaining each fixed-length feature vector.

In a preferred embodiment of the present invention, the generation module is specifically configured to splice the fixed-length feature vector with the location information, obtaining the feature vector containing location information of each bounding box.

In a preferred embodiment of the present invention, the preset action classifier is provided with a converter;

the action classification module comprises:

an input submodule, configured to input the feature vector containing location information of each bounding box into the converter, to obtain the feature vector containing global information of the image to be detected;

an output submodule, configured to determine the action types between the first main objects based on the feature vector containing global information.

In a preferred embodiment of the present invention, the converter is provided with N layers, each layer consisting of one attention model and one feed-forward network, where N is a positive integer; the output of the Nth-layer feed-forward network serves as the feature vector containing global information of the image to be detected.

In a preferred embodiment of the present invention, the location information of a bounding box comprises the centre coordinates, length, and width of the bounding box.

In a preferred embodiment of the present invention, the feature vectors containing location information include the feature vectors of people and the feature vectors of objects; the preset action classifier is provided with an action predictor; and the action predictor discriminates each person's feature vector against each object's feature vector pairwise, to obtain the action types between people and objects.
The action detection device of this embodiment can perform the motion detection method shown in the above embodiment of the application; the implementation principles are similar and are not repeated here.
Another embodiment of the present application provides an electronic device comprising a memory and a processor. At least one program is stored in the memory and, when executed by the processor, achieves the following compared with the prior art: for an image to be detected, multiple first main objects in the image are detected based on a preset target detection model, and the bounding box of each first main object is obtained, where the first main objects include people and objects; a fixed-length feature vector of the image region within each bounding box and the location information of each bounding box are then obtained; the fixed-length feature vector and the location information are then used to generate, for each bounding box, a feature vector containing location information; and the feature vectors containing location information of the bounding boxes are input into a preset action classification model to determine the action types between the first main objects. In this way, whereas the prior art can only detect the relationship between a single paired person and object in an image, the present application can detect the relationships among all main objects in the image, in particular the interaction relationships between all people and objects.
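The feature construction summarized above (a fixed-length appearance feature per bounding box, extended with that box's location information) can be sketched as follows. The dimensions are invented for illustration, and the location encoding as center coordinates plus the box's length and width follows the preferred embodiment described earlier; this is a sketch, not the claimed implementation.

```python
import numpy as np

def box_location(box):
    """Location information of a bounding box: center coordinates, width and height."""
    x1, y1, x2, y2 = box
    return np.array([(x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1], dtype=float)

def with_location(roi_feature, box):
    """Concatenate the fixed-length appearance feature with the box location."""
    return np.concatenate([roi_feature, box_location(box)])

# Toy example: a 6-dim fixed-length feature for one detected main object.
feat = np.ones(6)
vec = with_location(feat, (10, 20, 30, 60))
print(vec.shape)   # (10,): the fixed-length feature plus 4 location values
```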
An alternative embodiment provides an electronic device. As shown in Fig. 4, the electronic device 4000 includes a processor 4001 and a memory 4003, where the processor 4001 is connected to the memory 4003, for example via a bus 4002. Optionally, the electronic device 4000 may further include a transceiver 4004. It should be noted that in practical applications the number of transceivers 4004 is not limited to one, and the structure of the electronic device 4000 does not limit the embodiments of the present application.
The processor 4001 may be a CPU, a general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various illustrative logic blocks, modules and circuits described in this disclosure. The processor 4001 may also be a combination that realizes computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
The bus 4002 may include a path for transferring information between the above components. The bus 4002 may be a PCI bus, an EISA bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration the bus is represented by only one thick line in Fig. 4, but this does not mean there is only one bus or only one type of bus.
The memory 4003 may be a ROM or another type of static storage device capable of storing static information and instructions, a RAM or another type of dynamic storage device capable of storing information and instructions, an EEPROM, a CD-ROM or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
The memory 4003 is used to store the application code for executing the solution of the present application, and execution is controlled by the processor 4001. The processor 4001 executes the application code stored in the memory 4003 to implement the content shown in any of the foregoing method embodiments.
The electronic device includes, but is not limited to, mobile terminals such as mobile phones, laptops, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable media players) and vehicle-mounted terminals (e.g. vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers.
Another embodiment of the present application provides a computer-readable storage medium on which a computer program is stored; when the program runs on a computer, the computer executes the corresponding content of the foregoing method embodiments. Compared with the prior art: for an image to be detected, multiple first main objects in the image are detected based on a preset target detection model, and the bounding box of each first main object is obtained, where the first main objects include people and objects; a fixed-length feature vector of the image region within each bounding box and the location information of each bounding box are then obtained; the fixed-length feature vector and the location information are used to generate, for each bounding box, a feature vector containing location information; and the feature vectors containing location information of the bounding boxes are input into a preset action classification model to determine the action types between the first main objects. In this way, whereas the prior art can only detect the relationship between a single paired person and object in an image, the present application can detect the relationships among all main objects in the image, in particular the interaction relationships between all people and objects.
It should be understood that although the steps in the flowcharts of the drawings are shown in sequence as indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated otherwise herein, there is no strict restriction on the order of execution, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts may comprise multiple sub-steps or stages that are not necessarily completed at the same moment but may be executed at different times, and their execution order is not necessarily sequential; they may be executed in turn or alternately with at least part of the sub-steps or stages of other steps.
The above are only some embodiments of the present invention. It should be noted that a person of ordinary skill in the art may make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.
Claims (10)
1. A motion detection method, characterized by comprising:
detecting multiple first main objects in an image to be detected based on a preset target detection model, and obtaining a bounding box of each first main object, wherein the first main objects include people and objects;
obtaining a fixed-length feature vector of the image region within each bounding box and the location information of each bounding box;
generating a feature vector containing location information for each bounding box by using the fixed-length feature vector and the location information;
inputting the feature vectors containing location information of the bounding boxes into a preset action classification model to determine the action types between the first main objects.
2. The motion detection method according to claim 1, characterized in that the step of obtaining a fixed-length feature vector of the image region within each bounding box specifically comprises:
performing feature extraction on the image within each bounding box using a trained convolutional neural network model to obtain each fixed-length feature vector.
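The claim specifies a trained convolutional neural network as the extractor. Purely to illustrate why the result has a fixed length for boxes of any size, the sketch below substitutes simple grid mean-pooling over the cropped box region for the CNN; the pooling grid and image are assumptions made for illustration only.

```python
import numpy as np

def roi_feature(image, box, grid=2):
    """Crop the bounding-box region and mean-pool it onto a fixed grid, yielding
    a fixed-length feature vector regardless of the box size (a simple stand-in
    for a trained CNN feature extractor)."""
    x1, y1, x2, y2 = box
    region = image[y1:y2, x1:x2]
    h, w = region.shape
    rows = np.array_split(np.arange(h), grid)
    cols = np.array_split(np.arange(w), grid)
    # Mean-pool each grid cell so every box maps to exactly grid*grid values.
    return np.array([region[np.ix_(r, c)].mean() for r in rows for c in cols])

img = np.arange(64, dtype=float).reshape(8, 8)   # toy single-channel image
small = roi_feature(img, (0, 0, 4, 4))
large = roi_feature(img, (0, 0, 8, 8))
print(small.shape, large.shape)   # both (4,): fixed length for any box size
```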
3. The motion detection method according to claim 1, characterized in that generating a feature vector containing location information for each bounding box by using the fixed-length feature vector and the location information comprises:
concatenating the fixed-length feature vector with the location information to obtain the feature vector containing location information of each bounding box.
4. The motion detection method according to claim 1, characterized in that the preset action classifier is provided with a converter; and
the step of inputting the feature vectors containing location information of the bounding boxes into a preset action classification model to determine the action types between the first main objects comprises:
inputting the feature vectors containing location information of the bounding boxes into the converter to obtain feature vectors containing global information for the image to be detected;
determining the action types between the first main objects based on the feature vectors containing global information.
5. The motion detection method according to claim 4, characterized in that the converter is provided with N layers, each layer consisting of one attention model and one feedforward network, where N is a positive integer; and the output of the N-th layer feedforward network is taken as the feature vector containing global information of the image to be detected.
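As a rough illustration of such a converter, the sketch below stacks N layers of one self-attention step plus one feedforward network over the per-box feature vectors. The identity attention projections, residual connections and random weights are assumptions made for brevity, not the claimed design; the point is that each output vector mixes in context from every bounding box, i.e. global information.

```python
import numpy as np

def self_attention(X):
    """Single-head self-attention with identity projections (illustrative only)."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over all boxes
    return weights @ X                               # each vector attends to all others

def feed_forward(X, W1, W2):
    """Position-wise ReLU MLP applied to each box feature independently."""
    return np.maximum(X @ W1, 0) @ W2

def converter(X, n_layers, W1, W2):
    """N layers of attention model + feedforward network; the last layer's output
    serves as the feature vectors containing global information."""
    for _ in range(n_layers):
        X = X + self_attention(X)        # residual connection (an assumption)
        X = X + feed_forward(X, W1, W2)  # residual connection (an assumption)
    return X

rng = np.random.default_rng(0)
boxes, d = 5, 8                          # 5 bounding-box features, 8-dim each
X = rng.normal(size=(boxes, d))
W1 = rng.normal(size=(d, d)) * 0.1
W2 = rng.normal(size=(d, d)) * 0.1
out = converter(X, n_layers=2, W1=W1, W2=W2)
print(out.shape)   # (5, 8): same shape per box, now context-aware
```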
6. The motion detection method according to claim 1, characterized in that the location information of the bounding box includes the center coordinates, length and width of the bounding box.
7. The motion detection method according to claim 1, characterized in that the feature vectors containing location information include feature vectors of people and feature vectors of objects; the preset action classifier is provided with an action predictor; and the action predictor discriminates each pairing of a person feature vector with an object feature vector to obtain the action type between the person and the object.
8. A motion detection apparatus, characterized by comprising:
a target detection module, configured to detect multiple first main objects in an image to be detected based on a preset target detection model, and to obtain a bounding box of each first main object, wherein the first main objects include people and objects;
an obtaining module, configured to obtain a fixed-length feature vector of the image region within each bounding box and the location information of each bounding box;
a generation module, configured to generate a feature vector containing location information for each bounding box by using the fixed-length feature vector and the location information;
an action classification module, configured to input the feature vectors containing location information of the bounding boxes into a preset action classification model to determine the action types between the first main objects.
9. An electronic device, characterized in that it comprises:
a processor, a memory and a bus;
the bus being configured to connect the processor and the memory;
the memory being configured to store operation instructions; and
the processor being configured to execute the motion detection method of any one of claims 1 to 7 by calling the operation instructions.
10. A computer-readable storage medium, characterized in that the computer storage medium is used to store computer instructions which, when run on a computer, cause the computer to execute the motion detection method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910239759.8A CN109977872B (en) | 2019-03-27 | 2019-03-27 | Motion detection method and device, electronic equipment and computer readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109977872A true CN109977872A (en) | 2019-07-05 |
CN109977872B CN109977872B (en) | 2021-09-17 |
Family
ID=67081127
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910239759.8A Active CN109977872B (en) | 2019-03-27 | 2019-03-27 | Motion detection method and device, electronic equipment and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109977872B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105912991A (en) * | 2016-04-05 | 2016-08-31 | 湖南大学 | Behavior identification method based on 3D point cloud and key bone nodes |
CN106529467A (en) * | 2016-11-07 | 2017-03-22 | 南京邮电大学 | Group behavior identification method based on multi-feature fusion |
CN108492273A (en) * | 2018-03-28 | 2018-09-04 | 深圳市唯特视科技有限公司 | A kind of image generating method based on from attention model |
CN108647591A (en) * | 2018-04-25 | 2018-10-12 | 长沙学院 | Activity recognition method and system in a kind of video of view-based access control model-semantic feature |
CN108898067A (en) * | 2018-06-06 | 2018-11-27 | 北京京东尚科信息技术有限公司 | Determine the method, apparatus and computer readable storage medium of people and the object degree of association |
CN109241536A (en) * | 2018-09-21 | 2019-01-18 | 浙江大学 | It is a kind of based on deep learning from the sentence sort method of attention mechanism |
CN109271999A (en) * | 2018-09-06 | 2019-01-25 | 北京京东尚科信息技术有限公司 | Processing method, device and the computer readable storage medium of image |
Non-Patent Citations (2)
Title |
---|
Ming Chen et al.: "TVT: Two-View Transformer Network for Video Captioning", Proceedings of Machine Learning Research *
Yu-Wei Chao et al.: "Learning to Detect Human-Object Interactions", 2018 IEEE Winter Conference on Applications of Computer Vision *
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110378420A (en) * | 2019-07-19 | 2019-10-25 | Oppo广东移动通信有限公司 | A kind of image detecting method, device and computer readable storage medium |
CN110543877A (en) * | 2019-09-04 | 2019-12-06 | 北京迈格威科技有限公司 | Identification recognition method, training method and device of model thereof and electronic system |
CN111753730A (en) * | 2020-06-24 | 2020-10-09 | 国网电子商务有限公司 | Image examination method and device |
CN113632097A (en) * | 2021-03-17 | 2021-11-09 | 商汤国际私人有限公司 | Method, device, equipment and storage medium for predicting relevance between objects |
US11941838B2 (en) | 2021-03-17 | 2024-03-26 | Sensetime International Pte. Ltd. | Methods, apparatuses, devices and storage medium for predicting correlation between objects |
CN113632097B (en) * | 2021-03-17 | 2024-07-19 | 商汤国际私人有限公司 | Method, device, equipment and storage medium for predicting relevance between objects |
CN114120160A (en) * | 2022-01-25 | 2022-03-01 | 成都合能创越软件有限公司 | Object space distinguishing method and device based on fast-RCNN, computer equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN109977872B (en) | 2021-09-17 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||