CN110084313A - A method of generating object detection model - Google Patents
A method of generating an object detection model
- Publication number
- CN110084313A, CN201910369470.8A, CN201910369470A
- Authority
- CN
- China
- Prior art keywords
- detection model
- object detection
- frame
- prediction
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a method of generating an object detection model, comprising: obtaining a training image that contains annotation data, the annotation data being the positions and classes of target objects in the training image; inputting the training image into a pre-trained object detection model for processing, the object detection model comprising a feature extraction module and a prediction module coupled to each other, wherein the feature extraction module comprises deep residual network units and convolution processing units and is adapted to perform convolution on the training image to generate at least one feature map, and the prediction module is adapted to predict the classes and positions of target objects from the at least one feature map; and training the pre-trained object detection model based on the annotation data and the predicted object classes and positions, the trained object detection model being the generated object detection model.
Description
Technical field
The present invention relates to the technical field of computer vision, and in particular to a method of generating an object detection model, an object detection method, a computing device, and a storage medium.
Background art
Object detection is the basis of many computer vision tasks. It serves to locate and identify one or more known targets in an input image, and is commonly applied in fields such as scene content understanding, video surveillance, content-based image retrieval, robot navigation, and augmented reality.
Traditional object detection methods generally comprise three stages. First, candidate box regions are extracted by traversing the whole image with a sliding window to obtain positions where objects may appear. Then, features are extracted from these candidate regions, commonly with methods such as SIFT (scale-invariant feature transform) or HOG (histogram of oriented gradients). Finally, the features are fed into a classifier, commonly an SVM (support vector machine) or AdaBoost (an iterative algorithm). Traditional object detection methods have high time complexity and redundant windows, require hand-designed features, and are not robust to the diverse variations of objects.
Object detection methods based on deep learning have made important progress in recent years. Mainstream approaches fall into two types. The first type divides the detection problem into two stages: region-proposal-based algorithms first generate a set of sparse candidate boxes by heuristic methods, and then classify and regress those candidate boxes. Typical examples include R-CNN (region-based convolutional neural network), SPPNet (spatial pyramid pooling network), and various improved algorithms of the R-CNN series. This approach achieves higher detection accuracy but slower computation. The second type consists of end-to-end one-stage algorithms, which need no region-extraction stage and directly produce the class probabilities and position coordinates of objects: dense sampling is performed uniformly at different positions of the image with different scales and aspect ratios, after which a convolutional neural network extracts features that are directly classified and regressed. Typical examples include YOLO and SSD. This approach is fast but less accurate.
Therefore, an object detection method is needed that can improve the computation speed and accuracy of the model while reducing model size.
Summary of the invention
To this end, the present invention provides a method of generating an object detection model, in an effort to solve, or at least alleviate, at least one of the problems described above.
According to one aspect of the invention, a method of generating an object detection model is provided, the method being adapted to be executed in a computing device and comprising: first, obtaining a training image that contains annotation data, the annotation data being the positions and classes of target objects in the training image; then, inputting the training image into a pre-trained object detection model for processing, the object detection model comprising a feature extraction module and a prediction module coupled to each other, wherein the feature extraction module comprises deep residual network units and convolution processing units and is adapted to perform convolution on the training image to generate at least one feature map, and the prediction module is adapted to predict the classes and positions of target objects from the at least one feature map; and finally, training the pre-trained object detection model based on the annotation data and the predicted object classes and positions, the trained object detection model being the generated object detection model.
Optionally, in the above method, a deep residual network unit comprises multiple mutually coupled convolution processing layers with 3*3 convolution kernels and a skip connection layer, the skip connection layer being adapted to add the feature maps output by two mutually coupled convolution processing layers and output the sum.
Optionally, in the above method, a convolution processing layer comprises a convolutional layer, a batch normalization layer, and an activation layer, wherein the batch normalization layer is merged into the convolutional layer.
Optionally, in the above method, the prediction module comprises a class prediction unit and a position prediction unit; the class prediction unit is adapted to output the class confidence of each object in the image, and the position prediction unit is adapted to output the predicted positions of target objects in the image.
Optionally, in the above method, the annotated position of a target object is the feature-point coordinates of the target object or its ground-truth object box.
Optionally, in the above method, the prediction module further comprises a candidate box generation unit and a candidate box matching unit. The candidate box generation unit is adapted to generate, for each feature map output by the feature extraction module, multiple corresponding candidate boxes according to different sizes and aspect ratios; the candidate box matching unit is adapted to select candidate boxes that match the ground-truth object boxes, so that prediction is based on the matched candidate boxes.
Optionally, in the above method, a localization loss between the annotated ground-truth box positions and the predicted box positions, and a classification confidence loss between the annotated classes and the predicted class confidences, are determined, and the parameters of the object detection model are updated accordingly; training ends when the weighted sum of the localization loss and the classification confidence loss meets a predetermined condition.
Optionally, in the above method, the weighted sum of the localization loss and the classification confidence loss is calculated based on the following formula:

L(x, c, l, g) = \frac{1}{N} \left( L_{conf}(x, c) + \alpha L_{loc}(x, l, g) \right)

where L_{loc} is the localization loss, L_{conf} is the classification confidence loss, N is the number of matched candidate boxes, \alpha is a weighting coefficient, g is the position of a ground-truth object box, l is the position of a predicted object box, x is the annotated class, and c is the class confidence.
Optionally, in the above method, the localization loss is calculated based on the following formula:

L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^{k} \, \mathrm{smooth}_{L1}\!\left( l_i^{m} - \hat{g}_j^{m} \right)

where i is the index of a predicted object box, j is the index of a ground-truth object box, cx and cy are the center of a candidate box, w and h are the width and height of a candidate box, m ranges over these coordinates of the candidate box, \hat{g}_j^{m} is the position deviation between the i-th predicted object box and the j-th ground-truth object box, Pos denotes the positive-sample candidate boxes in the training image, N denotes the number of matched candidate boxes, and x_{ij}^{k} indicates whether the i-th predicted object box matches the j-th ground-truth object box with respect to class k.
Optionally, in the above method, the classification confidence loss is calculated based on the following formula:

L_{conf}(x, c) = - \sum_{i \in Pos}^{N} x_{ij}^{p} \log \hat{c}_i^{p} - \sum_{i \in Neg} \log \hat{c}_i^{0}, \qquad \hat{c}_i^{p} = \frac{\exp(c_i^{p})}{\sum_{p} \exp(c_i^{p})}

where i is the index of a predicted object box, j is the index of a ground-truth object box, N denotes the number of matched candidate boxes, Pos denotes the positive-sample candidate boxes in the training image, Neg denotes the negative-sample candidate boxes in the training image, \hat{c}_i^{p} denotes the softmax ratio for predicted class p, c_i^{p} denotes the class confidence of the i-th predicted object box for class p, and x_{ij}^{p} indicates whether the i-th predicted object box matches the j-th ground-truth object box with respect to class p.
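The two loss terms above can be sketched numerically in NumPy. This is a hedged illustration only, not the patent's implementation: box matching and offset encoding are assumed to have been done already, and the array shapes and toy values are assumptions for demonstration.

```python
import numpy as np

def smooth_l1(d):
    """Smooth-L1: 0.5*d^2 for |d| < 1, |d| - 0.5 otherwise."""
    d = np.abs(d)
    return np.where(d < 1.0, 0.5 * d ** 2, d - 0.5)

def ssd_loss(loc_pred, loc_target, conf_logits, labels, pos_mask, alpha=1.0):
    """Weighted objective L = (L_conf + alpha * L_loc) / N.

    loc_pred/loc_target: (B, 4) box offsets (cx, cy, w, h) per candidate box
    conf_logits: (B, K) class scores; labels: (B,) annotated class indices
    pos_mask: (B,) boolean, True for boxes matched to a ground-truth box
    """
    n = max(pos_mask.sum(), 1)                      # number of matched boxes N
    # localization loss over positive (matched) candidate boxes only
    l_loc = smooth_l1(loc_pred[pos_mask] - loc_target[pos_mask]).sum()
    # softmax cross-entropy confidence loss over all candidate boxes
    z = conf_logits - conf_logits.max(axis=1, keepdims=True)
    log_softmax = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    l_conf = -log_softmax[np.arange(len(labels)), labels].sum()
    return (l_conf + alpha * l_loc) / n

# toy example: 3 candidate boxes, 2 classes, 2 positive matches
loc = np.zeros((3, 4))
logits = np.array([[10.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
labels = np.array([0, 0, 1])
pos = np.array([True, True, False])
loss = ssd_loss(loc, np.zeros((3, 4)), logits, labels, pos)
# near zero, since boxes and classes are predicted perfectly
```

The loss grows as the predicted offsets or class scores drift from the annotations, which is the quantity minimized during training.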
Optionally, in the above method, the pre-trained object detection model is generated based on an image dataset, wherein the image dataset contains at least images of each object class in the training images, and the object classes in the training images include cat face, dog face, human face, and background.
Optionally, in the above method, data augmentation and normalization are performed on the training images.
Optionally, in the above method, the data augmentation includes any one or more of flipping, rotation, color jitter, random cropping, random brightness adjustment, random contrast adjustment, and blurring.
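A few of the listed augmentations can be sketched in NumPy as follows. This is a hedged illustration; the patent does not specify an implementation, and the (H, W, C) image layout, parameter ranges, and normalization constants here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_flip(img):
    """Horizontal flip with probability 0.5."""
    return img[:, ::-1, :] if rng.random() < 0.5 else img

def random_brightness(img, delta=32):
    """Add a uniform brightness offset, keeping values in [0, 255]."""
    return np.clip(img.astype(np.float32) + rng.uniform(-delta, delta), 0, 255)

def random_crop(img, crop_h, crop_w):
    """Crop a random crop_h x crop_w window."""
    h, w, _ = img.shape
    top = rng.integers(0, h - crop_h + 1)
    left = rng.integers(0, w - crop_w + 1)
    return img[top:top + crop_h, left:left + crop_w, :]

def normalize(img, mean=127.5, std=127.5):
    """Scale pixel values to roughly [-1, 1] before feeding the network."""
    return (img.astype(np.float32) - mean) / std

img = np.zeros((8, 10, 3), dtype=np.uint8)
patch = random_crop(img, 4, 5)
```

When cropping or flipping a training image, the annotated box coordinates would of course have to be transformed accordingly; that bookkeeping is omitted here.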
According to a further aspect of the invention, an object detection method is provided: an image to be detected can be input into the object detection model to obtain the position and class of each object box in the image, wherein the object detection model is generated using the method described above.
According to another aspect of the invention, a computing device is provided, comprising: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for executing any of the methods described above.
In accordance with yet another aspect of the invention, a computer-readable storage medium storing one or more programs is provided, the one or more programs including instructions which, when executed by a computing device, cause the computing device to execute any of the methods described above.
According to the scheme of the present invention, the object detection model comprises a feature extraction module and a prediction module coupled to each other, and the convolutional layers of each module use fewer channels, which reduces the size of the model. Further, the object detection model uses deep residual network units, so that low-level features can be fused into the next layer, improving the accuracy and speed of detection. Therefore, the object detection model provided by this scheme can both match the computational efficiency and memory of mobile terminals and satisfy the requirements of object detection precision.
Brief description of the drawings
To the accomplishment of the foregoing and related ends, certain illustrative aspects are described herein in conjunction with the following description and the drawings. These aspects are indicative of the various ways in which the principles disclosed herein may be practiced, and all aspects and their equivalents are intended to fall within the scope of the claimed subject matter. The above and other objects, features, and advantages of the present disclosure will become more apparent by reading the following detailed description in conjunction with the accompanying drawings. Throughout the disclosure, the same reference numerals generally refer to the same components or elements.
Fig. 1 shows a schematic structural diagram of a computing device 100 according to an embodiment of the invention;
Fig. 2 shows a schematic structural diagram of an object detection model 200 according to an embodiment of the invention;
Fig. 3 shows a schematic network structure of a deep residual network unit 300 according to an embodiment of the invention;
Fig. 4 shows a schematic flowchart of a method 400 of generating an object detection model according to an embodiment of the invention;
Fig. 5 shows a schematic diagram of a training image containing annotation data according to an embodiment of the invention;
Fig. 6 shows a schematic diagram of image data augmentation according to an embodiment of the invention.
Detailed description of embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the disclosure are shown in the drawings, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. On the contrary, these embodiments are provided so that the disclosure will be understood more thoroughly and its scope fully conveyed to those skilled in the art.
Object detection aims to mark, with boxes, the positions and classes of objects in an image. SSD-based object detection models perform recognition on feature maps of different levels and can therefore cover more scales. Generally, an SSD object detection model includes a VGG base network and a pyramid network. Because VGG has a deep network structure of 16 or 19 layers, the parameter count of the model is large and cannot meet the requirements of mobile terminals. In order to realize real-time object detection and make the model meet the memory and speed requirements of mobile terminals, this scheme improves the network structure of the SSD object detection model to reduce model size, improve detection accuracy, and increase computation speed, so that real-time object detection on mobile terminals can be achieved.
Fig. 1 is a block diagram of an example computing device 100. In a basic configuration 102, the computing device 100 typically comprises a system memory 106 and one or more processors 104. A memory bus 108 may be used for communication between the processors 104 and the system memory 106.
Depending on the desired configuration, the processor 104 may be any type of processor, including but not limited to a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP), or any combination thereof. The processor 104 may include one or more levels of cache, such as a level-1 cache 110 and a level-2 cache 112, a processor core 114, and registers 116. An example processor core 114 may include an arithmetic logic unit (ALU), a floating-point unit (FPU), a digital signal processing core (DSP core), or any combination thereof. An example memory controller 118 may be used together with the processor 104, or in some implementations the memory controller 118 may be an internal part of the processor 104.
Depending on the desired configuration, the system memory 106 may be any type of memory, including but not limited to volatile memory (such as RAM) or non-volatile memory (such as ROM or flash memory), or any combination thereof. The system memory 106 may include an operating system 120, one or more applications 122, and program data 124. In some embodiments, the applications 122 may be arranged to operate with the program data 124 on the operating system. In some embodiments, the computing device 100 is configured to execute the method 400 of generating an object detection model, and the program data 124 contains instructions for executing the method 400.
The computing device 100 may also include an interface bus 140 that facilitates communication from various interface devices (for example, output devices 142, peripheral interfaces 144, and communication devices 146) to the basic configuration 102 via a bus/interface controller 130. Example output devices 142 include a graphics processing unit 148 and an audio processing unit 150, which may be configured to facilitate communication with various external devices, such as a display or loudspeakers, via one or more A/V ports 152. Example peripheral interfaces 144 may include a serial interface controller 154 and a parallel interface controller 156, which may be configured to facilitate communication via one or more I/O ports 158 with external devices such as input devices (for example, a keyboard, mouse, pen, voice input device, or image input device) or other peripherals (for example, a printer or scanner). An example communication device 146 may include a network controller 160, which may be arranged to facilitate communication with one or more other computing devices 162 over a network communication link via one or more communication ports 164.
A network communication link may be one example of a communication medium. Communication media may typically be embodied as computer-readable instructions, data structures, or program modules in a modulated data signal, such as a carrier wave or other transmission mechanism, and may include any information delivery medium. A "modulated data signal" may be a signal in which one or more of its characteristics are set or changed in such a manner as to encode information in the signal. By way of non-limiting example, communication media may include wired media, such as a wired network or a dedicated-line network, and various wireless media, such as acoustic, radio frequency (RF), microwave, infrared (IR), or other wireless media. The term computer-readable media as used herein may include both storage media and communication media. In some embodiments, one or more programs are stored in a computer-readable medium, and the one or more programs include instructions for executing certain methods.
The computing device 100 may be implemented as part of a small-sized portable (or mobile) electronic device, such as a cellular phone, a digital camera, a personal digital assistant (PDA), a personal media player device, a wireless web-browsing device, a personal head-mounted device, an application-specific device, or a hybrid device that includes any of the above functions. The computing device 100 may also be implemented as a personal computer including both desktop and notebook configurations, or as a server with the above configuration. Embodiments of the present invention are not limited in this respect.
Before model training, the network structure and parameters of the model need to be configured. Fig. 2 shows a schematic structural diagram of an object detection model 200 according to an embodiment of the invention. As shown in Fig. 2, the object detection model 200 includes a feature extraction module 210 and a prediction module 220 coupled to each other. The feature extraction module 210 includes deep residual network units and convolution processing units and is adapted to perform convolution on an input image to generate at least one feature map. The prediction module 220 includes a candidate box generation unit 221, a candidate box matching unit 222, a class prediction unit 223, and a position prediction unit 224. The candidate box generation unit 221 is adapted to generate, for each feature map output by the feature extraction module 210, multiple corresponding candidate boxes according to different sizes and aspect ratios. The candidate box matching unit 222 is adapted to select candidate boxes that match the ground-truth object boxes, so that prediction is based on the matched candidate boxes. The class prediction unit 223 is adapted to output the class confidence of each object in the image, and the position prediction unit 224 is adapted to output the positions of predicted object boxes in the image.
For an object detection model, simply increasing depth may cause the accuracy of the network to decline. The error rises because, the deeper the network, the more obvious the vanishing-gradient phenomenon becomes during back-propagation: gradients cannot be effectively propagated to the front network layers, the parameters of those layers cannot be updated, and training and test performance deteriorate. If the network is designed as H(x) = F(x) + x, the problem can be converted into learning a residual function F(x) = H(x) - x; as long as F(x) = 0, an identity mapping H(x) = x is formed. A deep residual network can add such an identity mapping and pass the current output directly to the next layer, which is equivalent to taking a shortcut that skips the operations of this layer; this direct connection is named a "skip connection". During back-propagation, it likewise passes the gradient of the next layer directly to the previous layer, thereby solving the vanishing-gradient problem of deep networks.
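The identity-mapping behavior H(x) = F(x) + x described above can be illustrated with a toy NumPy sketch. This is a hedged illustration only; the `layer` argument stands in for the unit's two 3*3 convolution processing layers, which are not reproduced here.

```python
import numpy as np

def residual_unit(x, layer):
    """Skip connection: add the layer's output F(x) back to its input x."""
    return layer(x) + x

# When the learned residual F(x) is zero, the unit is an identity mapping H(x) = x.
x = np.array([1.0, -2.0, 3.0])
zero_residual = lambda v: np.zeros_like(v)
assert np.allclose(residual_unit(x, zero_residual), x)

# A nonzero residual only has to model the difference H(x) - x.
halve = lambda v: -0.5 * v      # F(x) = -x/2, so H(x) = x/2
out = residual_unit(x, halve)
```

Because the shortcut is an addition, the gradient of the loss with respect to x always contains a direct term from the skip path, which is what keeps gradients from vanishing in deep stacks of such units.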
According to an embodiment of the invention, the feature extraction module may use ResNet deep residual network units. Table 1 shows part of the network parameters of the feature extraction module 210 according to an embodiment of the invention. The layers numbered Conv_1, layer_19_2_2, layer_19_2_3, layer_19_2_4, and layer_19_2_5 are convolution processing units; conv1, conv2, and Conv2_sum constitute one deep residual network unit, as do conv_3, conv_4, and Conv_4_sum; conv_5, conv_6, and Conv_6_sum; conv_7, conv_8, and Conv_8_sum; and conv_9, conv_10, and Conv_10_sum.
In Table 1, Conv denotes a convolutional layer, BN a batch normalization layer, ReLU an activation layer with the ReLU activation function, and Sum a skip connection layer. kh and kw denote the height and width of the convolution kernel, padding is the padding value, stride is the convolution stride, num_output denotes the number of output channels, and group denotes grouped convolution, with group=1 meaning no grouping.
Table 1: Partial network parameters of the feature extraction module
As can be seen from Table 1, the feature extraction module includes multiple ResNet units and convolution processing units. Each ResNet unit includes two mutually coupled convolution processing layers with 3*3 convolution kernels and a skip connection layer, and each convolution processing layer includes a convolutional layer, a batch normalization layer, and an activation layer. When training a neural network model, batch normalization can speed up network convergence and control over-fitting, and can effectively mitigate the vanishing-gradient and exploding-gradient problems; it is generally placed after the convolutional layer and before the activation layer. Although the BN layer plays a positive role during training, at inference time the extra layer operations in the forward pass affect the performance of the model and occupy additional memory or video memory. Therefore, the batch normalization layer can be merged into the convolutional layer, which improves the computation speed of the model and makes it suitable for real-time object detection on mobile terminals. The activation layer uses the ReLU activation function, although any type of activation function, such as LeakyReLU, tanh, or sigmoid, may also be used without limitation. Convolution kernels of different sizes are used to extract features, and connecting two outputs together can raise the feature dimension. For example, Conv2_sum connects the feature maps output by conv1 and conv2, both of which use 3*3 convolution kernels.
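Merging (folding) a batch normalization layer into the preceding convolutional layer, as described above, can be sketched in NumPy as follows. This is a hedged sketch under assumed shapes; the naive convolution is only there to verify the algebra and is not a production implementation.

```python
import numpy as np

def conv2d(x, W, b):
    """Naive 'valid' convolution. x: (C_in, H, W); W: (C_out, C_in, k, k)."""
    c_out, _, k, _ = W.shape
    h, w = x.shape[1] - k + 1, x.shape[2] - k + 1
    out = np.zeros((c_out, h, w))
    for o in range(c_out):
        for i in range(h):
            for j in range(w):
                out[o, i, j] = np.sum(x[:, i:i + k, j:j + k] * W[o]) + b[o]
    return out

def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    """Per-channel inference-time batch normalization."""
    s = gamma / np.sqrt(var + eps)
    return s[:, None, None] * (y - mean[:, None, None]) + beta[:, None, None]

def fold_bn_into_conv(W, b, gamma, beta, mean, var, eps=1e-5):
    """Return (W', b') such that conv(x, W', b') == BN(conv(x, W, b))."""
    s = gamma / np.sqrt(var + eps)
    return W * s[:, None, None, None], (b - mean) * s + beta

# Folding removes the BN layer at inference time without changing the output.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 6, 6))
W, b = rng.normal(size=(4, 3, 3, 3)), rng.normal(size=4)
gamma, beta = rng.normal(size=4), rng.normal(size=4)
mean, var = rng.normal(size=4), rng.uniform(0.5, 2.0, size=4)
Wf, bf = fold_bn_into_conv(W, b, gamma, beta, mean, var)
assert np.allclose(batch_norm(conv2d(x, W, b), gamma, beta, mean, var),
                   conv2d(x, Wf, bf))
```

The fold works because BN at inference is an affine map per output channel, and an affine map composed with a convolution is again a convolution with rescaled weights and a shifted bias.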
As described above, each processing layer in the feature extraction module 210 can output a corresponding feature map. According to an embodiment of the invention, at least one of these output feature maps is extracted for the prediction module 220 to perform position and class prediction. In one embodiment, as shown in Table 1, the feature maps output by the 6 processing layers numbered conv_8_sum, Conv_1, layer_19_2_2, layer_19_2_3, layer_19_2_4, and layer_19_2_5 are extracted.
The deep residual network unit realizes shortcut connections by means of skip connections, which add no extra parameters or computation to the network yet can greatly increase the training speed of the model and improve the training effect; when the number of layers of the model is increased, the residual structure can well solve the degradation problem. Fig. 3 shows a schematic diagram of a deep residual network unit 300 according to an embodiment of the invention. As shown in Fig. 3, the deep residual network unit includes multiple mutually coupled convolution processing layers with 3*3 convolution kernels and a skip connection layer, the skip connection layer being adapted to add the feature maps of two mutually coupled convolution processing layers. In a residual network, shortcut connections with matched dimensions are drawn as solid lines, and those with mismatched dimensions as dotted lines. When the dimensions do not match, there are two options for the identity mapping: directly increase the dimensions by padding with zeros, or use a projection (such as a 1*1 convolution) to match the dimensions.
The prediction module 220 may include a class prediction unit 223 and a position prediction unit 224. Table 2 and Table 3 respectively show the network parameters of the position prediction unit and the class prediction unit according to an embodiment of the invention. According to an embodiment of the invention, the prediction module 220 further includes a candidate box generation unit 221 and a candidate box matching unit 222, wherein the candidate box generation unit is adapted to generate, for each feature map output by the feature extraction module 210, multiple corresponding candidate boxes according to different sizes and aspect ratios, and the candidate box matching unit is adapted to select candidate boxes that match the ground-truth object boxes, so that prediction is based on the matched candidate boxes.
Table 2: Network parameters of the position prediction unit
Table 3: Partial network parameters of the class prediction unit
Here, an mbox block matches the candidate boxes extracted from the feature maps of the feature extraction module with the ground-truth object boxes. The role of a Concat layer is to splice two or more feature maps along the channel dimension, stitching together feature maps of the same size. Table 4 shows the network parameters of the candidate box generation unit according to an embodiment of the invention, where PriorBox denotes the generated candidate boxes, aspect_ratio denotes the aspect ratios of the generated candidate boxes, min_size is the smallest scale of the generated candidate boxes, and max_size is the largest scale of the generated candidate boxes.
Table 4: Network parameters of the candidate box generation unit
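SSD-style candidate box (prior box) generation for one feature map, using parameters of the kind listed in Table 4 (min_size, max_size, aspect_ratio), can be sketched as follows. This is a hedged illustration; the exact scales, aspect ratios, and coordinate conventions of the patent are not reproduced, and the values below are assumptions.

```python
import numpy as np

def generate_prior_boxes(fmap_h, fmap_w, min_size, max_size, aspect_ratios):
    """Generate (cx, cy, w, h) candidate boxes, normalized to [0, 1],
    for every cell of a fmap_h x fmap_w feature map."""
    boxes = []
    for i in range(fmap_h):
        for j in range(fmap_w):
            cx, cy = (j + 0.5) / fmap_w, (i + 0.5) / fmap_h
            boxes.append([cx, cy, min_size, min_size])   # small square box
            s = np.sqrt(min_size * max_size)             # intermediate scale
            boxes.append([cx, cy, s, s])
            for ar in aspect_ratios:                     # e.g. 2.0 for 2:1 boxes
                w, h = min_size * np.sqrt(ar), min_size / np.sqrt(ar)
                boxes.append([cx, cy, w, h])
                boxes.append([cx, cy, h, w])             # reciprocal ratio
    return np.array(boxes)

priors = generate_prior_boxes(2, 2, min_size=0.2, max_size=0.37,
                              aspect_ratios=[2.0])
# 2x2 cells x (2 square boxes + 2 boxes per aspect ratio) = 16 candidate boxes
```

Coarser feature maps (fewer cells) receive larger min_size/max_size values, which is how the multi-scale feature maps of the model cover objects of different sizes.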
During training, it is first necessary to determine which candidate boxes the ground-truth object boxes in a training picture match; the matched candidate boxes are responsible for predicting the ground-truth boxes. Table 5 shows the network parameters of the candidate box matching unit. Here, a Permute layer rearranges the dimensions of its input according to a given pattern, and a Flatten layer "presses" the input, i.e., turns a multi-dimensional input into a one-dimensional one. The prediction module finally integrates the prediction outputs of the 6 feature maps. order denotes the ordering of the matched candidate boxes, and axis: 1 indicates that the corresponding operation is applied along dimension 1.
Table 5: Network parameters of the candidate box matching unit
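The matching step can be sketched as follows. This is a hedged sketch of the common SSD matching strategy (each ground-truth box claims its best-IoU candidate box, and any further candidate box whose overlap exceeds a threshold is also kept as a positive sample); the patent does not spell out its exact rule.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def match_boxes(priors, gt_boxes, threshold=0.5):
    """Return, for each candidate box, the index of its matched ground-truth
    box, or -1 for negative samples."""
    matches = -np.ones(len(priors), dtype=int)
    overlaps = np.array([[iou(p, g) for g in gt_boxes] for p in priors])
    # each ground-truth box claims its best candidate box
    for j in range(len(gt_boxes)):
        matches[np.argmax(overlaps[:, j])] = j
    # any remaining candidate box with high enough overlap is also positive
    for i in range(len(priors)):
        if matches[i] == -1 and overlaps[i].max() >= threshold:
            matches[i] = int(np.argmax(overlaps[i]))
    return matches

priors = np.array([[0.0, 0.0, 1.0, 1.0],
                   [0.0, 0.0, 0.5, 0.5],
                   [0.6, 0.6, 1.0, 1.0]])
gt = np.array([[0.0, 0.0, 1.0, 1.0]])
matches = match_boxes(priors, gt)   # only the first candidate box is positive
```

The positive (matched) boxes feed the localization loss, while both positive and negative boxes contribute to the classification confidence loss described above.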
After the network structure and parameters of the model are set, the method of generating an object detection model of this scheme can be executed. Fig. 4 shows a schematic flowchart of a method 400 of generating an object detection model according to an embodiment of the invention, wherein the object detection model may include a feature extraction module and a prediction module (for the structure of the model, reference may be made to the description above, which is not repeated here). The method can be executed in the computing device 100; as shown in Fig. 4, the method 400 starts at step S410.
According to some embodiments of the invention, the constructed object detection model may first be pre-trained before step S410 is executed. According to one embodiment of the invention, the model may first be pre-trained on an image data set so as to initialize the parameters of the object detection model, i.e., to generate a pre-trained object detection model. For example, the image data set may be the VOC data set, which contains 20 categories: person; animal (bird, cat, cow, dog, horse, sheep); vehicle (airplane, bicycle, boat, bus, car, motorcycle, train); indoor (bottle, chair, dining table, potted plant, sofa, TV). The background must also be considered when training a model on the VOC data set, so a model with 21 categories is trained. The 4-category object detection model of the invention (cat face, dog face, human face, background) can then be initialized with the larger weight values in the corresponding layers of the pre-trained model. This pre-training accelerates model convergence and improves detection accuracy. The COCO data set provided by Microsoft, which has 3 annotation types (object instances, object keypoints and image captions) and is well suited to object detection, may also be used for pre-training. This scheme places no limitation on the image data set.
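The weight-reuse idea above can be sketched as follows. This is a minimal illustration, assuming the pre-trained 21-class classifier weights are available as plain Python rows; `init_head_from_pretrained` is a hypothetical helper name, and a real implementation would copy framework tensors rather than lists.

```python
def init_head_from_pretrained(pretrained_rows, num_classes=4):
    """Initialize a small classification head (e.g. cat face, dog face,
    human face, background) by reusing the rows with the largest weight
    magnitude from a larger pre-trained classifier."""
    def magnitude(row):
        return sum(abs(w) for w in row)
    # keep the num_classes rows whose weights are largest in magnitude
    return sorted(pretrained_rows, key=magnitude, reverse=True)[:num_classes]

# toy 5-class "pre-trained" weights, reduced to a 2-class head
rows = [[0.1, 0.1], [0.9, 0.8], [0.4, 0.3], [0.0, 0.0], [0.7, 0.6]]
print(init_head_from_pretrained(rows, num_classes=2))
# [[0.9, 0.8], [0.7, 0.6]]
```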
In step S410, a training image containing annotation data is obtained; the annotation data are the positions and categories of the target objects in the training image. The position of the real object box may be annotated directly, or the position of the object box may be computed from annotated feature points. This scheme places no limitation on the annotation method.
Fig. 5 shows a schematic diagram of a training image containing annotation data according to an embodiment of the invention. As shown in Fig. 5, in order to detect the cat, dog and human face in the picture, the box of each object to be detected is first annotated, and the object in each box is then labeled with a category (a background category must also be added during model training). For ease of display, the category of each target object (cat, dog, face) is marked beside its box in Fig. 5. The cat-face category may also be labeled 1, the dog-face category 2, the human-face category 3, and the background category 0. According to another embodiment of the invention, for an image containing a cat face, a dog face and a human face at the same time, the cat-face, dog-face and face feature points may first be annotated, 30 feature points in total (the number of annotated feature points may be adjusted as the case requires), together with the category label of each object: for example, cat face 1, dog face 2, human face 3, background 0. The position of the real object box can then be computed from the annotated feature-point coordinates. For example, the maxima and minima of all feature-point coordinates are obtained, respectively xmin, xmax, ymin, ymax; the coordinates of the object box are then (xmin, ymin, w, h), with w = xmax - xmin and h = ymax - ymin.
According to one embodiment of the invention, the training image may also be pre-processed in the input layer of the model, which may include data enhancement and normalization. In order to detect objects in a variety of natural scenes and guarantee effective training of the model, data expansion or enhancement may be applied to the training images: random rotation, random brightness, contrast adjustment and blurring are applied to the picture to simulate image data from various natural scenes. Fig. 6 shows a schematic diagram of image data enhancement according to an embodiment of the invention; from left to right: rotation, dimming, brightening, contrast enhancement and blurring. Enhancement may also include flipping (horizontal or vertical), scale change (adjusting the image resolution), random cropping (taking image blocks at random from the original image), color jitter (adding slight noise to the original pixel-value distribution), and more complex data expansion methods such as generation with a GAN (generative adversarial network), principal component analysis, and supervised cropping (taking only image blocks with obvious semantic information).
It should be noted that not every data enhancement method can be used at will; vertically flipping a face image, for example, is inappropriate. During data enhancement, the image data and the annotation data must also be expanded synchronously: if an image is flipped or rotated, the corresponding annotated coordinates must be flipped or rotated accordingly. Since the sizes of real images are not fixed, changing the size of an image without updating its annotation information makes that information incorrect, so whenever the image size is modified, a corresponding change must be made to the annotation information. The image corresponding to the annotation information can be cropped according to the original size of the image and the proportions of the annotation information.
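The synchronized expansion of image and annotation can be sketched for a horizontal flip. This is a sketch under the assumption that boxes are stored as (xmin, ymin, w, h) in pixel coordinates; the pixel flip itself would be done by the image library in the same step.

```python
def hflip_boxes(image_width, boxes):
    """Mirror annotation boxes (xmin, ymin, w, h) about the vertical
    center line so they stay aligned with a horizontally flipped image."""
    # the new left edge is the old right edge, measured from the right side
    return [(image_width - (x + w), y, w, h) for (x, y, w, h) in boxes]

print(hflip_boxes(100, [(10, 20, 30, 40)]))  # [(60, 20, 30, 40)]
```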
Then, in step S420, the training image is input into the pre-trained object detection model for processing. The object detection model includes a feature extraction module and a prediction module coupled to each other, wherein the feature extraction module includes multiple depth residual network units and is adapted to perform convolution processing on the training image to generate at least one feature map, and the prediction module is adapted to predict the category and position of the target object from the at least one feature map.
Finally, in step S430, the pre-trained object detection model is trained based on the annotated and predicted object categories and positions, and the trained object detection model is taken as the generated object detection model.
According to one embodiment of the invention, the parameters of the object detection model may be updated based on the localization loss between the annotated real-object-box position and the predicted-object-box position, and on the category confidence loss between the annotated category and the predicted category confidence; training ends when the weighted sum of the localization loss and the category confidence loss meets a predetermined condition. In one implementation of the invention, the localization error may be computed with a smooth L1 loss function, and the confidence error with a softmax loss function.
The weighted sum of the localization loss and the category confidence loss can be calculated based on the following formula:

L(x, c, l, g) = (1/N) (Lconf(x, c) + α Lloc(x, l, g))

wherein Lloc is the localization loss, Lconf is the category confidence loss, N is the number of candidate boxes matched to real object boxes, α is a weight coefficient (which may be set to 1), g is the location parameter of the real object box, l is the location parameter of the predicted object box, x is the annotated category, and c is the category confidence.
The localization loss can be calculated based on the following formula:

Lloc(x, l, g) = Σ_{i ∈ Pos}^{N} Σ_{m ∈ {cx, cy, w, h}} x_ij^k smoothL1(l_i^m - ĝ_j^m)

wherein i is the index of the predicted object box, j is the index of the real object box, cx and cy are the center of the candidate box, w and h are the width and height of the candidate box, m ranges over these dimensions of the candidate box, l_i^m - ĝ_j^m is the position deviation between the i-th predicted object box and the j-th real object box, Pos denotes the positive-sample candidate boxes in the training image, N is the number of matched candidate boxes, and x_ij^k indicates whether the i-th predicted object box matches the j-th real object box with respect to category k (1 for a match, 0 for a mismatch).
Since errors in a deep neural network are multiplied cumulatively during gradient updates, gradient values greater than 1 between network layers cause the gradient to grow exponentially under repeated multiplication; the resulting large weight updates make the network unstable. A smooth loss function is therefore used: when the difference between the predicted value and the true value is less than 1, a mean-squared loss with a smoothing factor of 0.5 is used; when the difference is greater than or equal to 1, the order of the loss is reduced to a linear term, so the derivative in back-propagation no longer grows with the error, which solves the gradient-explosion problem.
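The two-branch loss just described can be sketched for a single scalar deviation:

```python
def smooth_l1(pred, target):
    """Smooth L1 loss: 0.5*d^2 when |d| < 1 (quadratic branch with the
    0.5 smoothing factor), |d| - 0.5 otherwise (linear branch), which
    caps the gradient magnitude at 1 and avoids gradient explosion."""
    d = abs(pred - target)
    return 0.5 * d * d if d < 1 else d - 0.5

print(smooth_l1(0.5, 0.0))  # 0.125  (quadratic branch)
print(smooth_l1(3.0, 0.0))  # 2.5    (linear branch)
```

Note that the two branches agree at |d| = 1 (both give 0.5), so the loss and its derivative are continuous at the switchover point.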
During training, it must first be determined which candidate boxes each real object box in the training picture matches; the matched candidate boxes are then responsible for predicting it. There are two main principles for matching candidate boxes to true boxes. First: for each true box in the picture, the candidate box with the largest intersection-over-union (IoU) with it is found and matched to it. Second: each remaining unmatched candidate box whose IoU with some true box is greater than a threshold (usually 0.5) is also matched to that true box. After the matching step, most candidate boxes are negative samples, which leads to an imbalance between positive and negative samples. To keep the positive and negative samples as balanced as possible, the negative samples can be sampled: they are sorted in descending order of confidence error (the smaller the predicted background confidence, the larger the error), and the samples with the largest errors are chosen as the training negatives, so that the ratio of positive to negative samples is close to 1:3. This gives the model stable training and ensures that it can converge.
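The IoU computation and the hard-negative-mining step above can be sketched as follows; a minimal illustration assuming boxes in (xmin, ymin, w, h) form and per-negative confidence errors already computed (`mine_hard_negatives` is a hypothetical helper name).

```python
def iou(a, b):
    """Intersection-over-union of two boxes in (xmin, ymin, w, h) form."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def mine_hard_negatives(neg_conf_errors, num_pos, ratio=3):
    """Keep the negatives with the largest confidence error so that the
    negative:positive ratio is at most `ratio` (3:1 in the text above).
    Returns the indices of the selected negatives."""
    order = sorted(range(len(neg_conf_errors)),
                   key=lambda i: neg_conf_errors[i], reverse=True)
    return order[:num_pos * ratio]

print(iou((0, 0, 10, 10), (5, 5, 10, 10)))          # 25/175 ≈ 0.1429
print(mine_hard_negatives([0.1, 0.9, 0.5, 0.7], 1))  # [1, 3, 2]
```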
For the category confidence loss, the selection of the positive-sample and negative-sample candidate boxes in the training image must be considered; that is, only candidate boxes whose IoU reaches the threshold are positive samples. The category confidence loss can be calculated based on the following formula:

Lconf(x, c) = - Σ_{i ∈ Pos}^{N} x_ij^p log(ĉ_i^p) - Σ_{i ∈ Neg} log(ĉ_i^0), where ĉ_i^p = exp(c_i^p) / Σ_p exp(c_i^p)

wherein i is the index of the predicted object box, j is the index of the real object box, N is the number of matched candidate boxes, Pos denotes the positive-sample candidate boxes in the training image, Neg denotes the negative-sample candidate boxes in the training image, ĉ_i^p is the softmax ratio for predicted category p, c_i^p is the category confidence of the i-th predicted object box for category p, and x_ij^p indicates whether the i-th predicted object box matches the j-th real object box with respect to category p.
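The softmax confidence loss can be sketched as follows; a minimal illustration in which positives carry their matched category and negatives are scored against the background class (the 1/N normalization is omitted, and the interface names are assumptions).

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def confidence_loss(positives, negatives, background=0):
    """Cross-entropy over softmax confidences: each positive box is
    scored against its matched category p, each negative box against
    the background class."""
    loss = 0.0
    for logits, p in positives:   # (class scores, matched category index)
        loss -= math.log(softmax(logits)[p])
    for logits in negatives:      # negatives should predict background
        loss -= math.log(softmax(logits)[background])
    return loss

# one confidently correct positive: the loss is close to zero
print(confidence_loss([([0.0, 5.0], 1)], []))
```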
Based on the gradient values obtained from the above losses, the parameter values of the model are updated through multiple backward iterations. Training ends when the weighted sum of the losses meets a predetermined condition, for example when the difference between the weighted loss sums of two successive iterations is less than a predetermined threshold, or when a predetermined number of iterations is reached.
After the trained object detection model is obtained according to the method 400, the object detection method can be executed in a terminal. According to one embodiment, an image to be detected (which, in an embodiment of the invention, may contain target objects such as a cat face, a dog face or a human face) is input into the trained object detection model to obtain the position and category of each object box in the image. Specifically, the feature extraction module performs convolution processing on the image to be detected to generate at least one feature map, and the prediction module predicts the category and position of each target object (i.e., each object box) from the feature maps extracted by the feature extraction module. In tests of the application on mobile terminals, the computation speed of this scheme is 20% higher than that of the traditional SSD object detection model, enabling real-time object detection.
According to the scheme of the invention, the network structure of the object detection model is improved: depth residual network units with multiple skip-connection structures are used in the feature extraction module, so low-level features can be fused into higher layers, improving the accuracy and speed of detection. The object detection model provided by this scheme both matches the computing power and memory of mobile terminals and satisfies the accuracy requirements of object detection.
A8. The method of A7, wherein the weighted sum of the localization loss and the category confidence loss is calculated based on the following formula:

L(x, c, l, g) = (1/N) (Lconf(x, c) + α Lloc(x, l, g))

wherein Lloc is the localization loss, Lconf is the category confidence loss, N is the number of matched candidate boxes, α is a weight coefficient, g is the position of the real object box, l is the position of the predicted object box, x is the annotated category, and c is the category confidence.
A9. The method of A8, wherein the localization loss is calculated based on the following formula:

Lloc(x, l, g) = Σ_{i ∈ Pos}^{N} Σ_{m ∈ {cx, cy, w, h}} x_ij^k smoothL1(l_i^m - ĝ_j^m)

wherein i is the index of the predicted object box, j is the index of the real object box, cx and cy are the center of the candidate box, w and h are the width and height of the candidate box, m ranges over these dimensions of the candidate box, l_i^m - ĝ_j^m is the position deviation between the i-th predicted object box and the j-th real object box, Pos denotes the positive-sample candidate boxes in the training image, N is the number of matched candidate boxes, and x_ij^k indicates whether the i-th predicted object box matches the j-th real object box with respect to category k.
A10. The method of A8, wherein the category confidence loss is calculated based on the following formula:

Lconf(x, c) = - Σ_{i ∈ Pos}^{N} x_ij^p log(ĉ_i^p) - Σ_{i ∈ Neg} log(ĉ_i^0), where ĉ_i^p = exp(c_i^p) / Σ_p exp(c_i^p)

wherein i is the index of the predicted object box, j is the index of the real object box, N is the number of matched candidate boxes, Pos denotes the positive-sample candidate boxes in the training image, Neg denotes the negative-sample candidate boxes in the training image, ĉ_i^p is the softmax ratio for predicted category p, c_i^p is the category confidence of the i-th predicted object box for category p, and x_ij^p indicates whether the i-th predicted object box matches the j-th real object box with respect to category p.
A11. The method of A1, wherein the method includes:
generating the pre-trained object detection model based on an image data set, the image data set including at least images of each object category present in the training image, the object categories in the training image including cat face, dog face, human face and background.
A12. The method of A1, wherein the method further includes:
performing data enhancement and normalization on the training image.
A13. The method of A12, wherein the data enhancement includes any one or more of flipping, rotation, color jitter, random cropping, random brightness adjustment, random contrast adjustment and blurring.
It should be appreciated that, in order to streamline the disclosure and aid understanding of one or more of the various inventive aspects, the features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the foregoing description of exemplary embodiments of the invention. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. The claims following the detailed description are therefore hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art should understand that the modules, units or components of the devices in the examples disclosed herein may be arranged in a device as described in the embodiments, or alternatively may be located in one or more devices different from the devices in the examples. The modules in the foregoing examples may be combined into one module or may furthermore be divided into multiple sub-modules.
Those skilled in the art will appreciate that the modules in the devices of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units or components of an embodiment may be combined into one module, unit or component, and they may furthermore be divided into multiple sub-modules, sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.
Furthermore, those skilled in the art will appreciate that, although some embodiments described herein include certain features included in other embodiments but not other features, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any one of the claimed embodiments may be used in any combination.
The various techniques described herein may be implemented in connection with hardware or software, or a combination of both. Thus, the methods and devices of the invention, or certain aspects or portions thereof, may take the form of program code (i.e., instructions) embedded in a tangible medium such as a floppy disk, CD-ROM, hard drive, or any other machine-readable storage medium, wherein, when the program is loaded into a machine such as a computer and executed by the machine, the machine becomes a device for practicing the invention.
Where the program code is executed on programmable computers, the computing device generally includes a processor, a processor-readable storage medium (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. The memory is configured to store the program code; the processor is configured to execute the method of the invention according to the instructions in the program code stored in the memory.
By way of example, and not limitation, computer-readable media include computer storage media and communication media. Computer storage media store information such as computer-readable instructions, data structures, program modules or other data. Communication media generally embody computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and include any information delivery media. Combinations of any of the above are also included within the scope of computer-readable media.
Furthermore, some of the embodiments are described herein as methods, or combinations of method elements, that can be implemented by a processor of a computer system or by other means of carrying out the function. A processor with the necessary instructions for carrying out such a method or method element therefore forms a device for carrying out the method or method element. Furthermore, an element of a device embodiment described herein is an example of a device for carrying out the function performed by the element for the purpose of carrying out the invention.
As used herein, unless otherwise specified, the use of the ordinals "first", "second", "third", etc. to describe a common object merely indicates that different instances of like objects are being referred to, and is not intended to imply that the objects so described must be in a given sequence, whether temporally, spatially, in ranking, or in any other manner.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having the benefit of the above description, will appreciate that other embodiments can be devised within the scope of the invention thus described. Additionally, it should be noted that the language used in this specification has been chosen principally for readability and instructional purposes, and not to delineate or circumscribe the inventive subject matter. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. With respect to the scope of the invention, the disclosure made of the invention is illustrative and not restrictive, the scope of the invention being defined by the appended claims.
Claims (10)
1. A method of generating an object detection model, the method being adapted to be executed in a computing device, comprising:
obtaining a training image containing annotation data, the annotation data being the positions and categories of target objects in the training image;
inputting the training image into a pre-trained object detection model for processing, the object detection model comprising a feature extraction module and a prediction module coupled to each other, wherein
the feature extraction module comprises multiple depth residual network units and a convolution processing unit, and is adapted to perform convolution processing on the training image to generate at least one feature map;
the prediction module is adapted to predict the categories and positions of target objects from the at least one feature map; and
training the pre-trained object detection model based on the annotated and predicted object categories and positions, and taking the trained object detection model as the generated object detection model.
2. The method of claim 1, wherein the depth residual network unit comprises multiple mutually coupled convolution processing layers with convolution kernel size 3*3 and a skip connection layer, the skip connection layer being adapted to add and output the feature maps output by two mutually coupled convolution processing layers.
3. The method of claim 2, wherein the convolution processing layer comprises a convolutional layer, a batch normalization layer and an activation layer, the batch normalization layer being merged into the convolutional layer.
4. The method of claim 1, wherein the prediction module comprises a category prediction unit and a position prediction unit, the category prediction unit being adapted to output the category confidence of each object in the image, and the position prediction unit being adapted to output the predicted position of each target object in the image.
5. The method of claim 1, wherein the annotated position of the target object is the feature-point coordinates of the target object or a real object box.
6. The method of claim 5, wherein the prediction module further comprises a candidate-box generation unit and a candidate-box matching unit, the candidate-box generation unit being adapted to generate multiple corresponding candidate boxes of different sizes and aspect ratios for each feature map output by the feature extraction module, and the candidate-box matching unit being adapted to select the candidate boxes matched to real object boxes, so that prediction is based on the matched candidate boxes.
7. The method of claim 6, wherein the step of training the pre-trained object detection model based on the annotated and predicted object categories and positions comprises:
updating the parameters of the object detection model based on the localization loss between the annotated real-object-box positions and the predicted-object-box positions and on the category confidence loss between the annotated categories and the predicted category confidences, training ending when the weighted sum of the localization loss and the category confidence loss meets a predetermined condition.
8. An object detection method, the method being adapted to be executed in a terminal, comprising:
inputting an image to be detected into an object detection model to obtain the position and category of each object box in the image,
wherein the object detection model is generated using the method of any one of claims 1-7.
9. A computing device, comprising:
a memory;
one or more processors; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for executing any one of the methods of claims 1-8.
10. A computer-readable storage medium storing one or more programs, the one or more programs comprising instructions which, when executed by a computing device, cause the computing device to execute any one of the methods of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910369470.8A CN110084313A (en) | 2019-05-05 | 2019-05-05 | A method of generating object detection model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110084313A true CN110084313A (en) | 2019-08-02 |
Family
ID=67418597
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910369470.8A Pending CN110084313A (en) | 2019-05-05 | 2019-05-05 | A method of generating object detection model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110084313A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108288075A (en) * | 2018-02-02 | 2018-07-17 | 沈阳工业大学 | A kind of lightweight small target detecting method improving SSD |
CN108416394A (en) * | 2018-03-22 | 2018-08-17 | 河南工业大学 | Multi-target detection model building method based on convolutional neural networks |
CN109712117A (en) * | 2018-12-11 | 2019-05-03 | 重庆信息通信研究院 | Lightweight TFT-LCD mould group scratch detection method based on convolutional neural networks |
Non-Patent Citations (2)
Title |
---|
曾钰廷: "Research on object detection and tracking methods based on deep learning", China Master's Theses Full-text Database, Information Science and Technology series * |
毕鹏程 et al.: "Research on lightweight convolutional neural network technology", Computer Engineering and Applications * |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110688893A (en) * | 2019-08-22 | 2020-01-14 | 成都通甲优博科技有限责任公司 | Detection method for wearing safety helmet, model training method and related device |
CN110796185A (en) * | 2019-10-17 | 2020-02-14 | 北京爱数智慧科技有限公司 | Method and device for detecting image annotation result |
CN111160156A (en) * | 2019-12-17 | 2020-05-15 | 北京明略软件***有限公司 | Moving object identification method and device |
CN111046974A (en) * | 2019-12-25 | 2020-04-21 | 珠海格力电器股份有限公司 | Article classification method and device, storage medium and electronic equipment |
CN111179241A (en) * | 2019-12-25 | 2020-05-19 | 成都数之联科技有限公司 | Panel defect detection and classification method and system |
WO2021155792A1 (en) * | 2020-02-03 | 2021-08-12 | 华为技术有限公司 | Processing apparatus, method and storage medium |
CN111428591A (en) * | 2020-03-11 | 2020-07-17 | 天津华来科技有限公司 | AI face image processing method, device, equipment and storage medium |
CN111444828B (en) * | 2020-03-25 | 2023-06-20 | 腾讯科技(深圳)有限公司 | Model training method, target detection method, device and storage medium |
CN111444828A (en) * | 2020-03-25 | 2020-07-24 | 腾讯科技(深圳)有限公司 | Model training method, target detection method, device and storage medium |
CN112950613A (en) * | 2020-05-19 | 2021-06-11 | 惠州高视科技有限公司 | Surface defect detection method and device |
CN111709310A (en) * | 2020-05-26 | 2020-09-25 | 重庆大学 | Gesture tracking and recognition method based on deep learning |
CN111709310B (en) * | 2020-05-26 | 2024-02-02 | 重庆大学 | Gesture tracking and recognition method based on deep learning |
CN112529940A (en) * | 2020-12-17 | 2021-03-19 | 北京深睿博联科技有限责任公司 | Moving target position prediction method and device under fixed camera |
CN112529940B (en) * | 2020-12-17 | 2022-02-11 | 北京深睿博联科技有限责任公司 | Moving target position prediction method and device under fixed camera |
CN113392927A (en) * | 2021-07-01 | 2021-09-14 | 哈尔滨理工大学 | Animal target detection method based on single-order deep neural network |
CN113781416A (en) * | 2021-08-30 | 2021-12-10 | 武汉理工大学 | Conveyer belt tearing detection method and device and electronic equipment |
CN114511041A (en) * | 2022-04-01 | 2022-05-17 | 北京世纪好未来教育科技有限公司 | Model training method, image processing method, device, equipment and storage medium |
CN116524339A (en) * | 2023-07-05 | 2023-08-01 | 宁德时代新能源科技股份有限公司 | Object detection method, apparatus, computer device, storage medium, and program product |
CN116524339B (en) * | 2023-07-05 | 2023-10-13 | 宁德时代新能源科技股份有限公司 | Object detection method, apparatus, computer device, storage medium, and program product |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110084313A (en) | A method of generating object detection model | |
CN110070072A (en) | A method of generating object detection model | |
CN110084253A (en) | A method of generating object detection model | |
Barsoum et al. | Hp-gan: Probabilistic 3d human motion prediction via gan | |
Yi et al. | ASSD: Attentive single shot multibox detector | |
CN111797893B (en) | Neural network training method, image classification system and related equipment | |
Liu et al. | Learning spatio-temporal representations for action recognition: A genetic programming approach | |
Babenko et al. | Robust object tracking with online multiple instance learning | |
CN110378381A (en) | Object detecting method, device and computer storage medium | |
CN110309856A (en) | Image classification method, the training method of neural network and device | |
WO2020015752A1 (en) | Object attribute identification method, apparatus and system, and computing device | |
Seyedhosseini et al. | Semantic image segmentation with contextual hierarchical models | |
JP2020513637A (en) | System and method for data management | |
CN109559300A (en) | Image processing method, electronic equipment and computer readable storage medium | |
CN110096964A (en) | A method of generating image recognition model | |
CN109934173A (en) | Expression recognition method, device and electronic equipment | |
CN110516803A (en) | Traditional computer vision algorithm is embodied as neural network | |
EP3987443A1 (en) | Recurrent multi-task convolutional neural network architecture | |
US20230137337A1 (en) | Enhanced machine learning model for joint detection and multi person pose estimation | |
CN110287857A (en) | A kind of training method of characteristic point detection model | |
CN110276289A (en) | Generate the method and human face characteristic point method for tracing of Matching Model | |
CN109583367A (en) | Image text row detection method and device, storage medium and electronic equipment | |
CN109522970A (en) | Image classification method, apparatus and system | |
CN110084312A (en) | A method of generating object detection model | |
Kang et al. | Yolo-6d+: single shot 6d pose estimation using privileged silhouette information |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190802 |