CN110084313A - A method of generating object detection model - Google Patents

A method of generating object detection model Download PDF

Info

Publication number
CN110084313A
CN110084313A
Authority
CN
China
Prior art keywords
detection model
object detection
frame
prediction
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910369470.8A
Other languages
Chinese (zh)
Inventor
齐子铭
李启东
陈裕潮
张伟
李志阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Meitu Technology Co Ltd
Original Assignee
Xiamen Meitu Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Meitu Technology Co Ltd filed Critical Xiamen Meitu Technology Co Ltd
Priority to CN201910369470.8A priority Critical patent/CN110084313A/en
Publication of CN110084313A publication Critical patent/CN110084313A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method of generating an object detection model, comprising: obtaining a training image containing annotation data, the annotation data being the positions and categories of target objects in the training image; inputting the training image into a pre-trained object detection model for processing, the object detection model comprising a feature extraction module and a prediction module coupled to each other, wherein the feature extraction module comprises deep residual network units and convolution processing units and is adapted to perform convolution processing on the training image to generate at least one feature map, and the prediction module is adapted to predict the categories and positions of target objects from the at least one feature map; and training the pre-trained object detection model based on the annotated and predicted object categories and positions, taking the trained object detection model as the generated object detection model.

Description

A method of generating object detection model
Technical field
The present invention relates to the technical field of computer vision, and more particularly to a method of generating an object detection model, an object detection method, a computing device, and a storage medium.
Background technique
Object detection is the basis of many computer vision tasks. It is used to locate and identify one or more targets in an input image, and is widely applied in fields such as scene content understanding, video surveillance, content-based image retrieval, robot navigation, and augmented reality.
Traditional object detection methods generally have three stages. First, candidate box regions are extracted: a sliding window traverses the whole image to obtain positions where objects are likely to appear. Then, features are extracted from these candidate box regions; common methods include SIFT (scale-invariant feature transform) and HOG (histograms of oriented gradients). Finally, the features are fed into a classifier; common classifiers include SVM (support vector machines) and Adaboost (an iterative algorithm). Traditional object detection methods have high time complexity and redundant windows, require hand-designed features, and are not robust to the diverse variations of objects.
Object detection methods based on deep learning have made important progress in recent years. The mainstream approaches fall into two types. One type, the two-stage algorithms based on region proposals, divides the detection problem into two stages: first, a series of sparse candidate boxes is generated by heuristic methods, and then these candidate boxes are classified and regressed. Typical examples include R-CNN (region-based convolutional neural networks), SPPNet (spatial pyramid pooling network), and various improved algorithms of the R-CNN series. This approach has higher detection accuracy but slower computation. The other type is the end-to-end one-stage algorithms, which need no region-extraction stage and directly generate the class probabilities and position coordinates of objects: positions in the image are densely and uniformly sampled, different scales and aspect ratios may be used when sampling, and after features are extracted with a convolutional neural network, classification and regression are performed directly. Typical examples include YOLO and SSD. This approach detects quickly but with lower accuracy.
Therefore, an object detection method is needed that can improve the computing speed and accuracy of the model while reducing the model size.
Summary of the invention
To this end, the present invention provides a method of generating an object detection model, in an effort to solve, or at least alleviate, at least one of the problems above.
According to one aspect of the invention, a method of generating an object detection model is provided, the method being adapted to be executed in a computing device and comprising the following steps. First, a training image containing annotation data is obtained, the annotation data being the positions and categories of target objects in the training image. Then, the training image is input into a pre-trained object detection model for processing; the object detection model comprises a feature extraction module and a prediction module coupled to each other, wherein the feature extraction module comprises deep residual network units and convolution processing units and is adapted to perform convolution processing on the training image to generate at least one feature map, and the prediction module is adapted to predict the categories and positions of target objects from the at least one feature map. Finally, the pre-trained object detection model is trained based on the annotated and predicted object categories and positions, and the trained object detection model is taken as the generated object detection model.
Optionally, in the above method, the deep residual network unit comprises multiple mutually coupled convolution processing layers with 3*3 kernels and a skip connection layer, the skip connection layer being adapted to add the feature maps output by two coupled convolution processing layers and output the sum.
Optionally, in the above method, the convolution processing layer comprises a convolutional layer, a batch normalization layer, and an activation layer, wherein the batch normalization layer is merged into the convolutional layer.
Optionally, in the above method, the prediction module comprises a category prediction unit and a position prediction unit; the category prediction unit is adapted to output the category confidence of each object in the image, and the position prediction unit is adapted to output the predicted positions of target objects in the image.
Optionally, in the above method, the annotated position of a target object is the feature-point coordinates of the target object or its ground-truth object box.
Optionally, in the above method, the prediction module further comprises a candidate box generation unit and a candidate box matching unit. The candidate box generation unit is adapted to generate, for each feature map output by the feature extraction module, multiple corresponding candidate boxes according to different sizes and aspect ratios; the candidate box matching unit is adapted to select the candidate boxes that match ground-truth object boxes, so that prediction is based on the matched candidate boxes.
Optionally, in the above method, the parameters of the object detection model are updated based on the localization loss between the annotated ground-truth box positions and the predicted box positions and the confidence loss between the annotated categories and the predicted category confidences; when the weighted sum of the localization loss and the confidence loss meets a predetermined condition, training ends.
Optionally, in the above method, the weighted sum of the localization loss and the confidence loss is calculated based on the following formula:
where L_loc is the localization loss, L_conf is the confidence loss, N is the number of matched candidate boxes, α is a weight coefficient, g is the position of the ground-truth object box, l is the position of the predicted object box, x is the annotated category, and c is the category confidence.
Optionally, in the above method, the localization loss is calculated based on the following formula:
where i is the index of a predicted object box, j is the index of a ground-truth object box, cx and cy are the center of a candidate box, w and h are the width and height of the candidate box, m ranges over the candidate box's position parameters, ĝ_j^m is the position deviation between the i-th predicted object box and the j-th ground-truth object box, Pos denotes the positive-sample candidate boxes in the training image, N denotes the number of matched candidate boxes, and x_ij^k indicates whether the i-th predicted object box matches the j-th ground-truth object box with respect to category k.
Optionally, in the above method, the confidence loss is calculated based on the following formula:
where i is the index of a predicted object box, j is the index of a ground-truth object box, N denotes the number of matched candidate boxes, Pos denotes the positive-sample candidate boxes in the training image, Neg denotes the negative-sample candidate boxes in the training image, ĉ_i^p denotes the softmax probability that the prediction is of category p, c_i^p denotes the category confidence of the i-th predicted object box for category p, and x_ij^p indicates whether the i-th predicted object box matches the j-th ground-truth object box with respect to category p.
Optionally, in the above method, the pre-trained object detection model is generated based on an image dataset, wherein the image dataset contains at least images of every object category in the training images, and the object categories in the training images include cat face, dog face, human face, and background.
Optionally, in the above method, data augmentation and normalization are performed on the training images.
Optionally, in the above method, the data augmentation includes any one or more of flipping, rotation, color jitter, random cropping, random brightness adjustment, random contrast adjustment, and blurring.
According to a further aspect of the present invention, an object detection method is provided: an image to be detected is input into an object detection model to obtain the position and category of each object box in the image, wherein the object detection model is generated using the method described above.
According to another aspect of the invention, a computing device is provided, comprising: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing any one of the methods described above.
In accordance with a further aspect of the present invention, a computer-readable storage medium storing one or more programs is provided; the one or more programs include instructions which, when executed by a computing device, cause the computing device to perform any one of the methods described above.
According to the scheme of the present invention, the object detection model comprises a feature extraction module and a prediction module coupled to each other, and the convolutional layers of each module use fewer channels, which reduces the size of the model. Further, the object detection model uses deep residual network units, so that low-level features can be fused into the layer above, improving the accuracy and speed of model detection. Therefore, the object detection model provided by this scheme can match the computational efficiency and memory of a mobile terminal while satisfying the requirements of object detection precision.
Detailed description of the invention
To the accomplishment of the foregoing and related ends, certain illustrative aspects are described herein in conjunction with the following description and drawings. These aspects indicate the various ways in which the principles disclosed herein may be practiced, and all aspects and their equivalents are intended to fall within the scope of the claimed subject matter. The above and other objects, features, and advantages of the present disclosure will become more apparent from the following detailed description read in conjunction with the accompanying drawings. Throughout the disclosure, the same reference numerals generally refer to the same components or elements.
Fig. 1 shows a schematic diagram of the organization of a computing device 100 according to one embodiment of the invention;
Fig. 2 shows a schematic structural diagram of an object detection model 200 according to one embodiment of the invention;
Fig. 3 shows a schematic network structure of a deep residual network unit 300 according to one embodiment of the invention;
Fig. 4 shows a schematic flowchart of a method 400 of generating an object detection model according to one embodiment of the invention;
Fig. 5 shows a schematic diagram of a training image containing annotation data according to one embodiment of the invention;
Fig. 6 shows a schematic diagram of image data augmentation according to one embodiment of the invention.
Specific embodiment
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be realized in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided to enable a more thorough understanding of the present invention and to fully convey the scope of the disclosure to those skilled in the art.
Object detection aims to mark out, with boxes, the positions and categories of objects in an image. An SSD-based object detection model performs recognition on feature maps at different levels and can therefore cover more scales. Generally, an SSD object detection model comprises a VGG base network and a pyramid network. Because VGG has a deep network structure of 16 or 19 layers, the parameter count of the model is large and cannot meet the requirements of mobile terminals. In order to realize real-time object detection and make the model meet the memory and computing-speed requirements of mobile terminals, this scheme improves the network structure of the SSD object detection model so as to reduce the model size, improve detection accuracy, and raise computing speed, satisfying real-time object detection on mobile terminals.
Fig. 1 is a block diagram of an example computing device 100. In a basic configuration 102, the computing device 100 typically comprises a system memory 106 and one or more processors 104. A memory bus 108 may be used for communication between the processors 104 and the system memory 106.
Depending on the desired configuration, the processor 104 may be any type of processor, including but not limited to a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP), or any combination thereof. The processor 104 may include one or more levels of cache, such as a level-1 cache 110 and a level-2 cache 112, a processor core 114, and registers 116. An example processor core 114 may include an arithmetic logic unit (ALU), a floating-point unit (FPU), a digital signal processing core (DSP core), or any combination thereof. An example memory controller 118 may be used together with the processor 104, or in some implementations the memory controller 118 may be an internal part of the processor 104.
Depending on the desired configuration, the system memory 106 may be any type of memory, including but not limited to volatile memory (RAM), non-volatile memory (ROM, flash memory, etc.), or any combination thereof. The system memory 106 may include an operating system 120, one or more applications 122, and program data 124. In some embodiments, the applications 122 may be arranged to operate with the program data 124 on the operating system. In some embodiments, the computing device 100 is configured to perform the method 400 of generating an object detection model, and the program data 124 contains instructions for performing the method 400.
The computing device 100 may also include an interface bus 140 that facilitates communication from various interface devices (for example, output devices 142, peripheral interfaces 144, and communication devices 146) to the basic configuration 102 via a bus/interface controller 130. Example output devices 142 include a graphics processing unit 148 and an audio processing unit 150, which may be configured to facilitate communication with various external devices such as a display or speakers via one or more A/V ports 152. Example peripheral interfaces 144 may include a serial interface controller 154 and a parallel interface controller 156, which may be configured to facilitate communication via one or more I/O ports 158 with external devices such as input devices (for example, a keyboard, mouse, pen, voice input device, or image input device) or other peripherals (for example, a printer or scanner). An example communication device 146 may include a network controller 160, which may be arranged to facilitate communication with one or more other computing devices 162 over a network communication link via one or more communication ports 164.
A network communication link may be one example of a communication medium. A communication medium may typically be embodied as computer-readable instructions, data structures, or program modules in a modulated data signal such as a carrier wave or other transmission mechanism, and may include any information delivery medium. A "modulated data signal" may be a signal in which one or more of its characteristics are set or changed in such a manner as to encode information in the signal. As non-limiting examples, communication media may include wired media such as a wired network or a dedicated-line network, and various wireless media such as acoustic, radio frequency (RF), microwave, infrared (IR), or other wireless media. The term computer-readable medium as used herein may include both storage media and communication media. In some embodiments, one or more programs are stored in a computer-readable medium, the one or more programs including instructions for performing certain methods.
The computing device 100 may be implemented as part of a small-sized portable (or mobile) electronic device, such as a cellular phone, a digital camera, a personal digital assistant (PDA), a personal media player device, a wireless web-browsing device, a personal headset device, an application-specific device, or a hybrid device including any of the above functions. Of course, the computing device 100 may also be implemented as a personal computer including both desktop and notebook configurations, or as a server having the above configuration. Embodiments of the present invention are not limited in this regard.
Before model training, the network structure and parameters of the model need to be configured. Fig. 2 shows a schematic structural diagram of an object detection model 200 according to one embodiment of the invention. As shown in Fig. 2, the object detection model 200 comprises a feature extraction module 210 and a prediction module 220 coupled to each other. The feature extraction module 210 comprises deep residual network units and convolution processing units and is adapted to perform convolution processing on an input image to generate at least one feature map. The prediction module 220 comprises a candidate box generation unit 221, a candidate box matching unit 222, a category prediction unit 223, and a position prediction unit 224. The candidate box generation unit 221 is adapted to generate, for each feature map output by the feature extraction module 210, multiple corresponding candidate boxes according to different sizes and aspect ratios. The candidate box matching unit 222 is adapted to select the candidate boxes that match ground-truth object boxes, so that prediction is based on the matched candidate boxes. The category prediction unit 223 is adapted to output the category confidence of each object in the image, and the position prediction unit 224 is adapted to output the positions of predicted object boxes in the image.
For an object detection model, simply increasing depth can cause the network's accuracy to decline. The reason the error rises is that the deeper the network, the more obvious the vanishing-gradient phenomenon becomes: during back-propagation the gradient cannot be effectively passed to the front network layers, the parameters of the front layers cannot be updated, and training and test performance deteriorate. If the network is designed as H(x) = F(x) + x, the problem can be converted into learning a residual function F(x) = H(x) − x: as long as F(x) = 0, the identity mapping H(x) = x is formed. A deep residual network adds such an identity mapping and transmits the current output directly to the next layer of the network, which is equivalent to taking a shortcut that skips this layer's operations; this direct connection is named a "skip connection". During back-propagation, it likewise passes the gradient of the lower layer directly to the upper layer, thereby solving the vanishing-gradient problem of deep networks.
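For illustration, a minimal PyTorch-style sketch of one such residual unit follows; the channel count and exact layer arrangement are assumptions for illustration, not parameters taken from Table 1:

```python
import torch.nn as nn

class ResidualUnit(nn.Module):
    """Two 3*3 convolution processing layers plus a skip connection:
    output = ReLU(F(x) + x), so learning F(x) = 0 yields the identity H(x) = x."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # the skip connection adds the input back
```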
According to one embodiment of the present invention, the feature extraction module may use ResNet deep residual network units. Table 1 shows partial network parameters of the feature extraction module 210 according to one embodiment of the invention. The layers numbered Conv_1, layer_19_2_2, layer_19_2_3, layer_19_2_4, and layer_19_2_5 are convolution processing units; conv1, conv2, and Conv2_sum constitute one deep residual network unit; conv_3, conv_4, and Conv_4_sum constitute one deep residual network unit; conv_5, conv_6, and Conv_6_sum constitute one deep residual network unit; conv_7, conv_8, and Conv_8_sum constitute one deep residual network unit; and conv_9, conv_10, and Conv_10_sum constitute one deep residual network unit.
In Table 1, Conv is a convolutional layer, BN is a batch normalization layer, and ReLU denotes the activation function of an activation layer. Sum is a skip connection layer. kh and kw denote the height and width of the convolution kernel respectively, padding is the padding value, stride is the convolution stride, num_output denotes the number of outputs (channels) of the layer, and group denotes grouped convolution, with group=1 meaning no grouping.
Table 1: Partial network parameters of the feature extraction module
As can be seen from Table 1, the feature extraction module includes multiple ResNet units and convolution processing units. Each ResNet unit includes two mutually coupled convolution processing layers with 3*3 kernels and a skip connection layer, and each convolution processing layer includes a convolutional layer, a batch normalization layer, and an activation layer. When training a neural network model, the batch normalization layer can speed up network convergence and control overfitting; it is generally placed after the convolutional layer and before the activation layer. Although the BN layer plays a positive role during training, the extra layer operations affect the performance of the model during forward inference and occupy more memory or video memory. Therefore, the batch normalization layer can be merged into the convolutional layer, which improves the computing speed of the model and suits real-time object detection on mobile terminals. The activation layer uses the ReLU activation function; any type of activation function such as leakyReLU, tanh, or sigmoid may also be used, without limitation here. In the lightweight convolution unit, features are extracted with convolution kernels of different sizes, and connecting two outputs together can raise the feature dimension. For example, Conv2_sum connects the feature maps output by conv1 (3*3 kernel) and conv2 (3*3 kernel).
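A sketch of the merge: since BN is an affine transform at inference time, its per-channel constants (scale γ, shift β, running mean μ, and variance σ²) fold into the preceding convolution's weights and bias. The function below is illustrative, assuming NumPy arrays in the usual layout:

```python
import numpy as np

def fold_bn_into_conv(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold BN(conv(x)) into a single convolution conv'(x).
    w: (out_ch, in_ch, kh, kw) conv weights; b: (out_ch,) conv bias."""
    scale = gamma / np.sqrt(var + eps)         # per-output-channel rescaling
    w_folded = w * scale[:, None, None, None]  # rescale each output filter
    b_folded = (b - mean) * scale + beta       # shift and rescale the bias
    return w_folded, b_folded
```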
As described above, each processing layer in the feature extraction module 210 can output a corresponding feature map. According to an embodiment of the present invention, at least one feature map is extracted from these outputs for the prediction module 220 to perform position and category prediction. In one embodiment, as shown in Table 1, the feature maps output by the six processing layers numbered conv_8_sum, Conv_1, layer_19_2_2, layer_19_2_3, layer_19_2_4, and layer_19_2_5 are extracted.
The deep residual network unit realizes shortcut connections by way of skip connections, which adds no extra parameters or computation to the network yet greatly increases the training speed of the model and improves the training effect; and when the number of layers of the model is deepened, the residual structure solves the degradation problem well. Fig. 3 shows a schematic diagram of a deep residual network unit 300 according to one embodiment of the invention. As shown in Fig. 3, the deep residual network unit comprises multiple mutually coupled convolution processing layers with 3*3 kernels and a skip connection layer, the skip connection layer being adapted to add the feature maps of two coupled convolution processing layers. In a residual network, a dimension-matched shortcut is drawn as a solid-line connection, otherwise as a dotted-line connection. When the dimensions do not match, there are two options for the identity mapping: directly increasing the dimension with zero padding, or using a projection (such as a 1*1 convolution) to match the dimension.
The prediction module 220 may include a category prediction unit 223 and a position prediction unit 224. Tables 2 and 3 show the network parameters of the position prediction unit and the category prediction unit, respectively, according to one embodiment of the invention. According to one embodiment of the present invention, the prediction module 220 further includes a candidate box generation unit 221 and a candidate box matching unit 222, where the candidate box generation unit is adapted to generate, for each feature map output by the feature extraction module 210, multiple corresponding candidate boxes according to different sizes and aspect ratios, and the candidate box matching unit is adapted to select the candidate boxes that match ground-truth object boxes, so that prediction is based on the matched candidate boxes.
Table 2: Network parameters of the position prediction unit
Table 3: Partial network parameters of the category prediction unit
Here, the mbox blocks are the candidate boxes, generated from the feature maps extracted in the feature extraction module, that match ground-truth object boxes. The role of a Concat layer is to splice two or more feature maps along the channel dimension, stitching together feature maps of the same size. Table 4 shows the network parameters of the candidate box generation unit according to one embodiment of the invention, where PriorBox denotes the generated candidate boxes, aspect_ratio denotes the aspect ratios of the generated candidate boxes, min_size is the smallest scale of the generated candidate boxes, and max_size is the largest scale of the generated candidate boxes.
Table 4: Network parameters of the candidate box generation unit
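A sketch of SSD-style candidate box generation for one feature map follows; the concrete min_size, max_size, and aspect_ratio values per layer come from Table 4 and are passed in here as assumed arguments in relative [0, 1] coordinates:

```python
import itertools, math

def generate_candidate_boxes(fmap_h, fmap_w, min_size, max_size, aspect_ratios):
    """Return candidate boxes as (cx, cy, w, h) in relative coordinates,
    one group per feature-map cell."""
    boxes = []
    for i, j in itertools.product(range(fmap_h), range(fmap_w)):
        cx, cy = (j + 0.5) / fmap_w, (i + 0.5) / fmap_h
        boxes.append((cx, cy, min_size, min_size))   # square box, smallest scale
        s = math.sqrt(min_size * max_size)
        boxes.append((cx, cy, s, s))                 # square box, larger scale
        for ar in aspect_ratios:                     # e.g. 2 and 3
            r = math.sqrt(ar)
            boxes.append((cx, cy, min_size * r, min_size / r))
            boxes.append((cx, cy, min_size / r, min_size * r))
    return boxes
```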
During training, it must first be determined which candidate boxes the ground-truth object boxes in a training picture match; the matched candidate boxes are responsible for predicting the ground-truth boxes. Table 5 shows the network parameters of the candidate box matching unit. A Permute layer rearranges the dimensions of its input according to a fixed pattern. A Flatten layer "presses" the input, that is, turns a multi-dimensional input one-dimensional. The prediction module finally integrates the prediction outputs of the six feature maps. order denotes the ordering of the matched candidate boxes, and axis: 1 indicates that the corresponding operation is applied starting from axis 1.
Table 5: Network parameters of the candidate box matching unit
After the network structure and parameters of the model are set, the method of generating an object detection model of this scheme can be executed. Fig. 4 shows a schematic flowchart of a method 400 of generating an object detection model according to one embodiment of the invention. The object detection model may include a feature extraction module and a prediction module (for the structure of the model, refer to the description above, which is not repeated here). The method can be executed in the computing device 100. As shown in Fig. 4, the method 400 begins at step S410.
According to some embodiments of the present invention, before step S410 is executed, the constructed object detection model may first be pre-trained. According to one embodiment of the invention, the model may first be pre-trained based on an image dataset so as to initialize the parameters of the object detection model, that is, to generate the pre-trained object detection model. For example, the image dataset may be the VOC dataset, which includes 20 categories: humans; animals (bird, cat, cow, dog, horse, sheep); vehicles (aircraft, bicycle, boat, bus, car, motorcycle, train); and indoor objects (bottle, chair, dining table, potted plant, sofa, TV). The background must also be considered when training the model with the VOC dataset, so a model of 21 categories needs to be trained. For the different layers, the object detection model for the 4 categories of the invention (cat face, dog face, human face, background) can be initialized with the larger weight values in the corresponding layers of the pre-trained model. This pre-training method can speed up model convergence while improving the detection accuracy of the model. The COCO dataset provided by Microsoft can also be used for model pre-training; COCO has 3 annotation types — object instances, object keypoints, and image captions — and is well suited to object detection. This scheme places no limitation on the image dataset.
In step S410, a training image containing annotation data is obtained, the annotation data being the positions and categories of target objects in the training image. The position of a ground-truth object box can be marked directly, or the position of the object box can be calculated from annotated feature points. This scheme places no limitation on the annotation method of the annotation data.
Fig. 5 shows a schematic diagram of a training image containing annotation data according to one embodiment of the invention. As shown in Fig. 5, to detect the cat, dog, and human face in a picture, the box of each object to be detected is first marked in the picture, and then the objects in the boxes are labeled with categories (the background category also needs to be added during model training). For ease of display, the category of each target object — cat, dog, face — is marked beside each object box in Fig. 5. The cat-face category may also be labeled 1, the dog-face category labeled 2, the human-face category labeled 3, and the background category labeled 0. According to another implementation of the present invention, for an image containing a cat face, a dog face, and a human face at the same time, the cat-face feature points, dog-face feature points, and human-face feature points can first be marked — 30 feature points in total (the number of annotated feature points can be adjusted case by case) — along with the category label of each object. For example, cat face is labeled 1, dog face is labeled 2, human face is labeled 3, and background is labeled 0. The position of the ground-truth object box can then be calculated from the annotated feature-point coordinates. For example, obtain the maxima and minima of all feature-point coordinates, namely xmin, xmax, ymin, ymax; the coordinates of the object box are then (xmin, ymin, w, h), with w = xmax − xmin and h = ymax − ymin.
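A minimal sketch of this box-from-feature-points computation:

```python
def box_from_feature_points(points):
    """points: list of annotated (x, y) feature-point coordinates.
    Returns the ground-truth object box (xmin, ymin, w, h)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    xmin, ymin = min(xs), min(ys)
    return (xmin, ymin, max(xs) - xmin, max(ys) - ymin)
```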
According to one embodiment of the present invention, the training image may also be preprocessed at the input layer of the model, which may include data augmentation and normalization. To detect objects under various natural scenes and guarantee effective training of the model, data expansion or augmentation can be performed on the training images. Image data under various natural scenes is simulated by random rotation, random brightness, contrast setting, blurring, and so on. Fig. 6 shows a schematic diagram of image data augmentation according to one embodiment of the invention; from left to right: rotation, dimming, brightening, contrast enhancement, blurring. In addition, augmentation can include flipping (horizontal or vertical), scale change (adjusting image resolution), random cropping (randomly taking image patches from the original image), color jitter (adding slight noise to the original pixel value distribution), and so on; complex data expansion methods also include generation with GANs (generative adversarial networks), principal component analysis, and supervised cropping (taking only image patches with obvious semantic information).
It should be noted that not all data augmentation methods can be used at will; for example, vertically flipping a face image is inappropriate. During data augmentation, the image data and the annotation data must also be expanded in synchrony, as sketched below; for example, when an image is flipped or rotated, the corresponding annotation coordinates are flipped or rotated accordingly. Because the sizes of real images are not fixed, if the size of an image is changed but the annotation information is not, the annotation becomes incorrect; so whenever the image size is modified, a corresponding change is made to the annotation information. The image corresponding to the annotation information can be cropped according to the original size of the image and the proportions of the annotation information.
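As a sketch of this synchronized expansion — assuming the image is a NumPy array and boxes are (xmin, ymin, w, h) in pixels — a horizontal flip mirrors the box coordinates in step with the pixels:

```python
import random
import numpy as np

def random_hflip(image, boxes, p=0.5):
    """Horizontally flip an image with probability p, mirroring its
    box annotations so image data and labels stay synchronized."""
    if random.random() >= p:
        return image, boxes
    h, w = image.shape[:2]
    flipped = np.ascontiguousarray(image[:, ::-1])  # mirror the pixel columns
    new_boxes = [(w - (x + bw), y, bw, bh) for (x, y, bw, bh) in boxes]
    return flipped, new_boxes
```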
Then, in step S420, the training image is input into the pre-trained object detection model for processing, the object detection model comprising a feature extraction module and a prediction module coupled to each other, wherein the feature extraction module comprises multiple deep residual network units and is adapted to perform convolution processing on the training image to generate at least one feature map, and the prediction module is adapted to predict the categories and positions of target objects from the at least one feature map.
Finally, in step S430, the pre-trained object detection model is trained based on the annotated and predicted object categories and positions, and the trained object detection model is taken as the generated object detection model.
According to one embodiment of the present invention, the parameters of the object detection model can be updated based on the localization loss between the annotated ground-truth box positions and the predicted box positions and the confidence loss between the annotated categories and the predicted category confidences; training ends when the weighted sum of the localization loss and the confidence loss meets a predetermined condition. In one implementation of the invention, the localization error can be calculated with a Smooth L1 loss function, and the confidence error can be calculated with a softmax loss function.
The weighted sum of the localization loss and the confidence loss can be calculated based on the following formula:
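(The formula appears as an image in the original publication; reconstructed here from the definitions below, following the standard SSD objective:)

$$ L(x, c, l, g) = \frac{1}{N}\left( L_{conf}(x, c) + \alpha\, L_{loc}(x, l, g) \right) $$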
where L_loc is the localization loss, L_conf is the confidence loss, N is the number of candidate boxes matched with ground-truth object boxes, and α is a weight coefficient, which can be set to 1. g is the position parameter of the ground-truth object box, l is the position parameter of the predicted object box, x is the annotated category, and c is the category confidence.
The localization loss can be calculated based on the following formula:
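(Reconstructed from the definitions below, following the standard SSD formulation:)

$$ L_{loc}(x, l, g) = \sum_{i \in Pos} \sum_{m \in \{cx,\, cy,\, w,\, h\}} x_{ij}^{k}\, \mathrm{smooth}_{L1}\!\left( l_{i}^{m} - \hat{g}_{j}^{m} \right) $$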
where i is the index of a predicted object box, j is the index of a ground-truth object box, cx and cy are the center of a candidate box, w and h are the width and height of the candidate box, m ranges over the candidate box's position parameters, ĝ_j^m is the position deviation between the i-th predicted object box and the j-th ground-truth object box, Pos denotes the positive-sample candidate boxes in the training image, N denotes the number of matched candidate boxes, and x_ij^k indicates whether the i-th predicted object box matches the j-th ground-truth object box with respect to category k: 1 for a match, 0 for no match.
Because the gradients of the error in a deep neural network are cumulatively multiplied during updates, gradient values between network layers greater than 1 will, through repeated multiplication, cause the gradient to grow exponentially, and the resulting large weight updates make the network unstable. The smooth L1 loss function is therefore used: when the predicted value differs from the true value by less than 1, the squared error with a 0.5 smoothing factor is used; when the difference is greater than or equal to 1, the loss is reduced to a linear term, so that the back-propagated derivative no longer depends on the magnitude of the error, which solves the gradient explosion problem.
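In other words, this is the standard Smooth L1 definition:

$$ \mathrm{smooth}_{L1}(x) = \begin{cases} 0.5\, x^{2}, & |x| < 1 \\ |x| - 0.5, & |x| \ge 1 \end{cases} $$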
During training, it must first be determined which candidate boxes the ground-truth object boxes in a training picture match; the matched candidate boxes will be responsible for predicting them. There are two main principles for matching candidate boxes to ground-truth boxes. The first principle: for each ground-truth box in the picture, find the candidate box with the largest intersection-over-union with it; that candidate box is matched. The second principle: for the remaining unmatched candidate boxes, if the intersection-over-union with a ground-truth box is greater than some threshold (usually 0.5), that candidate box also matches the ground-truth box. After the candidate box matching step, most candidate boxes are negative samples, which causes an imbalance between positive and negative samples. To keep the positive and negative samples as balanced as possible, the negative samples can be sampled: when sampling, they are sorted in descending order of confidence error (the smaller the predicted-background confidence, the larger the error), and the samples with the largest errors are chosen as the training negative samples, keeping the positive-to-negative ratio close to 1:3. In this way the model trains stably and is ensured to converge.
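A sketch of the two matching rules and the hard-negative sampling just described; the box format, the IoU helper, and the per-box background-confidence losses are assumptions for illustration:

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two (xmin, ymin, xmax, ymax) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter
    return inter / union if union > 0 else 0.0

def match_and_mine(candidates, gt_boxes, bg_conf_loss, thresh=0.5, neg_ratio=3):
    """Rule 1: each ground-truth box claims its highest-IoU candidate.
    Rule 2: any remaining candidate with IoU > thresh also matches.
    Negatives: sort unmatched candidates by descending background-confidence
    loss and keep about neg_ratio negatives per positive."""
    matched = -np.ones(len(candidates), dtype=int)          # -1 = negative
    for j, gt in enumerate(gt_boxes):
        matched[np.argmax([iou(c, gt) for c in candidates])] = j
    for i, c in enumerate(candidates):
        if matched[i] < 0:
            overlaps = [iou(c, gt) for gt in gt_boxes]
            if max(overlaps) > thresh:
                matched[i] = int(np.argmax(overlaps))
    pos = np.flatnonzero(matched >= 0)
    neg_pool = np.flatnonzero(matched < 0)
    hardest = neg_pool[np.argsort(-np.asarray(bg_conf_loss)[neg_pool])]
    neg = hardest[: neg_ratio * len(pos)]                   # keep pos:neg near 1:3
    return matched, pos, neg
```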
For the confidence loss, the selection of positive-sample and negative-sample candidate boxes in the training image must be considered; that is, only candidate boxes whose intersection-over-union reaches the threshold are positive samples. The confidence loss can be calculated based on the following formula:
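(Reconstructed from the definitions below, following the standard SSD formulation, where ĉ is the softmax of the predicted confidences c:)

$$ L_{conf}(x, c) = -\sum_{i \in Pos} x_{ij}^{p} \log\left( \hat{c}_{i}^{p} \right) - \sum_{i \in Neg} \log\left( \hat{c}_{i}^{0} \right), \qquad \hat{c}_{i}^{p} = \frac{\exp(c_{i}^{p})}{\sum_{q} \exp(c_{i}^{q})} $$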
where i is the index of a predicted object box, j is the index of a ground-truth object box, N denotes the number of matched candidate boxes, Pos denotes the positive-sample candidate boxes in the training image, Neg denotes the negative-sample candidate boxes in the training image, ĉ_i^p denotes the softmax probability that the prediction is of category p, c_i^p denotes the category confidence of the i-th predicted object box for category p, and x_ij^p indicates whether the i-th predicted object box matches the j-th ground-truth object box with respect to category p.
Based on the gradient values obtained from the above losses, the parameter values of the model are updated through multiple backward iterations. Training ends when the weighted sum of the losses meets a predetermined condition, for example when the difference between the weighted loss sums of two successive iterations is less than a predetermined threshold, or when a predetermined number of iterations is reached.
After the trained object detection model is obtained according to the method 400, the object detection method can be executed in a mobile terminal. According to one embodiment, an image to be detected (which, in an embodiment according to the present invention, may contain target objects such as cat faces, dog faces, and human faces) is input into the trained object detection model to obtain the position and category of each object box in the image. Specifically, the feature extraction module performs convolution processing on the image to be detected and generates at least one feature map; the prediction module predicts the categories and positions of the target objects (that is, of each object box) from the at least one feature map extracted by the feature extraction module. In application tests on mobile terminals, compared with the traditional SSD object detection model, the computing speed of this scheme is improved by 20%, and real-time detection of objects can be realized.
According to the solution of the present invention, the network structure of the object detection model is improved by using deep residual network units in the feature extraction module, in which multiple skip connection structures are used so that low-level features can be fused into the layer above, improving the accuracy and speed of model detection. The object detection model provided by this scheme can match the computational efficiency and memory of a mobile terminal while satisfying the requirements of object detection precision.
A8. The method as described in A7, wherein the weighted sum of the localization loss and the confidence loss is calculated based on the following formula:
where L_loc is the localization loss, L_conf is the confidence loss, N is the number of matched candidate boxes, α is a weight coefficient, g is the position of the ground-truth object box, l is the position of the predicted object box, x is the annotated category, and c is the category confidence.
A9. The method as described in A8, wherein the localization loss is calculated based on the following formula:
where i is the index of a predicted object box, j is the index of a ground-truth object box, cx and cy are the center of a candidate box, w and h are the width and height of the candidate box, m ranges over the candidate box's position parameters, ĝ_j^m is the position deviation between the i-th predicted object box and the j-th ground-truth object box, Pos denotes the positive-sample candidate boxes in the training image, N denotes the number of matched candidate boxes, and x_ij^k indicates whether the i-th predicted object box matches the j-th ground-truth object box with respect to category k.
A10. The method as described in A8, wherein the confidence loss is calculated based on the following formula:
where i is the index of a predicted object box, j is the index of a ground-truth object box, N denotes the number of matched candidate boxes, Pos denotes the positive-sample candidate boxes in the training image, Neg denotes the negative-sample candidate boxes in the training image, ĉ_i^p denotes the softmax probability that the prediction is of category p, c_i^p denotes the category confidence of the i-th predicted object box for category p, and x_ij^p indicates whether the i-th predicted object box matches the j-th ground-truth object box with respect to category p.
A11. The method as described in A1, wherein the method comprises:
generating the pre-trained object detection model based on an image dataset, the image dataset containing at least images of every object category in the training images, the object categories in the training images including cat face, dog face, human face, and background.
A12. The method as described in A1, wherein the method further comprises:
performing data augmentation and normalization on the training images.
A13. The method as described in A12, wherein the data augmentation includes any one or more of flipping, rotation, color jitter, random cropping, random brightness adjustment, random contrast adjustment, and blurring.
It should be appreciated that, in order to simplify the disclosure and aid the understanding of one or more of the various inventive aspects, in the above description of exemplary embodiments of the invention, the features of the invention are sometimes grouped together into a single embodiment, figure, or description thereof. However, the disclosed method should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. More precisely, as the following claims reflect, an inventive aspect lies in fewer than all features of a single embodiment disclosed above. Therefore, the claims following the specific embodiments are hereby expressly incorporated into the specific embodiments, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art should understand that the modules, units, or components of the devices in the examples disclosed herein may be arranged in a device as described in the embodiments, or alternatively may be located in one or more devices different from the devices in the examples. The modules in the foregoing examples may be combined into one module or may furthermore be divided into multiple submodules.
Those skilled in the art will understand that the modules in the devices of an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components of an embodiment can be combined into one module, unit, or component, and they can furthermore be divided into multiple submodules, subunits, or subcomponents. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, equivalent, or similar purpose.
In addition, those skilled in the art will appreciate that although some embodiments described herein include certain features included in other embodiments but not other features, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any one of the claimed embodiments may be used in any combination.
The various techniques described herein may be implemented in connection with hardware or software, or a combination thereof. Aspects or portions of the methods and apparatus of the present invention may take the form of program code (instructions) embedded in tangible media, such as floppy disks, CD-ROMs, hard drives, or any other machine-readable storage medium, wherein, when the program is loaded into a machine such as a computer and executed by the machine, the machine becomes an apparatus for practicing the invention.
Where the program code executes on programmable computers, the computing device generally comprises a processor, a processor-readable storage medium (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. The memory is configured to store the program code; the processor is configured to execute the method of the present invention according to the instructions in the program code stored in the memory.
By way of example and not limitation, computer-readable media include computer storage media and communication media. Computer storage media store information such as computer-readable instructions, data structures, program modules, or other data. Communication media generally embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transmission mechanism, and include any information delivery media. Combinations of any of the above are also included within the scope of computer-readable media.
In addition, some of the embodiments are described herein as methods, or combinations of method elements, that can be implemented by a processor of a computer system or by other devices executing the stated functions. Therefore, a processor having the necessary instructions for implementing the method or method elements forms a device for implementing the method or method elements. Furthermore, the elements of the device embodiments described here are examples of devices for implementing the functions performed by those elements for the purpose of carrying out the invention.
As used herein, unless otherwise specified, the use of the ordinals "first", "second", "third", etc. to describe ordinary objects merely represents different instances of similar objects, and is not intended to imply that the objects so described must have a given order in time, space, ranking, or any other manner.
Although the present invention has been described in terms of a limited number of embodiments, benefiting from the above description, it is clear to those skilled in the art that other embodiments can be envisaged within the scope of the invention thus described. Additionally, it should be noted that the language used in this specification has been chosen primarily for readability and instructional purposes, not to explain or limit the subject matter of the invention. Therefore, many modifications and changes will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. As to the scope of the present invention, the disclosure made of the invention is illustrative and not restrictive, and the scope of the invention is defined by the appended claims.

Claims (10)

1. A method of generating an object detection model, the method being adapted to be executed in a computing device, comprising:
obtaining a training image containing annotation data, the annotation data being the positions and categories of target objects in the training image;
inputting the training image into a pre-trained object detection model for processing, the object detection model comprising a feature extraction module and a prediction module coupled to each other, wherein
the feature extraction module comprises multiple deep residual network units and convolution processing units, and is adapted to perform convolution processing on the training image to generate at least one feature map;
the prediction module is adapted to predict the categories and positions of target objects from the at least one feature map; and
training the pre-trained object detection model based on the annotated and predicted object categories and positions, and taking the trained object detection model as the generated object detection model.
2. The method of claim 1, wherein the deep residual network unit comprises multiple mutually coupled convolution processing layers with 3*3 kernels and a skip connection layer, the skip connection layer being adapted to add the feature maps output by two coupled convolution processing layers and output the sum.
3. The method of claim 2, wherein the convolution processing layer comprises a convolutional layer, a batch normalization layer, and an activation layer, and wherein the batch normalization layer is merged into the convolutional layer.
4. The method of claim 1, wherein the prediction module comprises a category prediction unit and a position prediction unit, the category prediction unit being adapted to output the category confidence of each object in the image, and the position prediction unit being adapted to output the predicted positions of target objects in the image.
5. The method of claim 1, wherein the annotated position of a target object is the feature-point coordinates of the target object or its ground-truth object box.
6. method as claimed in claim 5, wherein the prediction module further includes candidate frame generation unit and candidate frame matching Unit, each characteristic pattern that the candidate frame generation unit is suitable for export the characteristic extracting module according to different sizes with Length-width ratio generates corresponding multiple candidate frames, and the candidate frame matching unit is suitable for choosing and the matched candidate of real-world object frame Frame, to be predicted based on matched candidate frame.
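Candidate box generation and matching of the kind claims 5 and 6 describe is typically done by tiling each feature map with boxes of several sizes and aspect ratios and keeping those whose overlap (IoU) with a ground-truth box exceeds a threshold. The concrete scales, ratios and threshold below are assumptions, not values taken from the patent:

```python
import itertools
import torch
from torchvision.ops import box_iou  # pairwise IoU between two sets of boxes

def generate_candidates(fmap_h, fmap_w, stride,
                        scales=(32, 64), ratios=(0.5, 1.0, 2.0)):
    """Tile one feature map with candidate boxes, returned as (x1, y1, x2, y2)."""
    boxes = []
    for y, x in itertools.product(range(fmap_h), range(fmap_w)):
        cx, cy = (x + 0.5) * stride, (y + 0.5) * stride  # cell centre in pixels
        for s, r in itertools.product(scales, ratios):
            w, h = s * r ** 0.5, s / r ** 0.5            # aspect ratio w/h == r
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return torch.tensor(boxes)

def match_candidates(candidates, gt_boxes, iou_threshold=0.5):
    """Indices of the candidate boxes that match some ground-truth object box."""
    ious = box_iou(candidates, gt_boxes)                 # (num_candidates, num_gt)
    return (ious.max(dim=1).values >= iou_threshold).nonzero(as_tuple=True)[0]
```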
7. The method of claim 6, wherein the step of training the pre-trained object detection model based on the annotated and the predicted object classes and positions comprises:
updating the parameters of the object detection model based on a localization loss between the annotated ground-truth object box position and the predicted object box position and on a classification confidence loss between the annotated class and the predicted class confidence, the training ending when a weighted sum of the localization loss and the classification confidence loss satisfies a predetermined condition.
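A common concrete reading of claim 7 uses smooth-L1 for the localization loss and cross-entropy for the classification confidence loss; both choices, and the weights, are assumptions of this sketch, since the claim only requires a weighted sum that satisfies a predetermined condition:

```python
import torch.nn.functional as F

def detection_loss(pred_box, gt_box, pred_logits, gt_class,
                   loc_weight=1.0, cls_weight=1.0):
    """Weighted sum of a localization loss and a classification confidence loss,
    computed over the matched candidate boxes."""
    loc_loss = F.smooth_l1_loss(pred_box, gt_box)        # annotated vs. predicted box
    cls_loss = F.cross_entropy(pred_logits, gt_class)    # annotated class vs. confidence
    return loc_weight * loc_loss + cls_weight * cls_loss
```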
8. An object detection method, the method being adapted to be executed in a terminal, comprising:
inputting an image to be detected into an object detection model to obtain the position and class of each object box in the image,
wherein the object detection model is generated using the method of any one of claims 1 to 7.
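At inference time the generated model is applied directly to a new image; the decoding below (softmax, confidence threshold) is an assumed post-process, since claim 8 only requires that the position and class of each object box be obtained:

```python
import torch

@torch.no_grad()
def detect(model, image, score_threshold=0.5):
    """Assumed output shapes for one image: pred_class (num_boxes, num_classes),
    pred_box (num_boxes, 4)."""
    model.eval()
    pred_class, pred_box = model(image.unsqueeze(0))     # add a batch dimension
    scores, labels = pred_class.softmax(dim=-1).max(dim=-1)
    keep = scores > score_threshold                      # assumed confidence cut-off
    return pred_box[keep], labels[keep], scores[keep]
```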
9. A computing device, comprising:
a memory;
one or more processors; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing any one of the methods of claims 1 to 8.
10. A computer-readable storage medium storing one or more programs, the one or more programs comprising instructions which, when executed by a computing device, cause the computing device to perform any one of the methods of claims 1 to 8.
CN201910369470.8A 2019-05-05 2019-05-05 A method of generating object detection model Pending CN110084313A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910369470.8A CN110084313A (en) 2019-05-05 2019-05-05 A method of generating object detection model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910369470.8A CN110084313A (en) 2019-05-05 2019-05-05 A method of generating object detection model

Publications (1)

Publication Number Publication Date
CN110084313A true CN110084313A (en) 2019-08-02

Family

ID=67418597

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910369470.8A Pending CN110084313A (en) 2019-05-05 2019-05-05 A method of generating object detection model

Country Status (1)

Country Link
CN (1) CN110084313A (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108288075A (en) * 2018-02-02 2018-07-17 沈阳工业大学 A kind of lightweight small target detecting method improving SSD
CN108416394A (en) * 2018-03-22 2018-08-17 河南工业大学 Multi-target detection model building method based on convolutional neural networks
CN109712117A (en) * 2018-12-11 2019-05-03 重庆信息通信研究院 Lightweight TFT-LCD mould group scratch detection method based on convolutional neural networks

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
曾钰廷: "Research on Object Detection and Tracking Methods Based on Deep Learning", China Masters' Theses Full-text Database, Information Science and Technology *
毕鹏程 et al.: "Research on Lightweight Convolutional Neural Network Technology", Computer Engineering and Applications *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110688893A (en) * 2019-08-22 2020-01-14 成都通甲优博科技有限责任公司 Detection method for wearing safety helmet, model training method and related device
CN110796185A (en) * 2019-10-17 2020-02-14 北京爱数智慧科技有限公司 Method and device for detecting image annotation result
CN111160156A (en) * 2019-12-17 2020-05-15 北京明略软件***有限公司 Moving object identification method and device
CN111046974A (en) * 2019-12-25 2020-04-21 珠海格力电器股份有限公司 Article classification method and device, storage medium and electronic equipment
CN111179241A (en) * 2019-12-25 2020-05-19 成都数之联科技有限公司 Panel defect detection and classification method and system
WO2021155792A1 (en) * 2020-02-03 2021-08-12 华为技术有限公司 Processing apparatus, method and storage medium
CN111428591A (en) * 2020-03-11 2020-07-17 天津华来科技有限公司 AI face image processing method, device, equipment and storage medium
CN111444828B (en) * 2020-03-25 2023-06-20 腾讯科技(深圳)有限公司 Model training method, target detection method, device and storage medium
CN111444828A (en) * 2020-03-25 2020-07-24 腾讯科技(深圳)有限公司 Model training method, target detection method, device and storage medium
CN112950613A (en) * 2020-05-19 2021-06-11 惠州高视科技有限公司 Surface defect detection method and device
CN111709310A (en) * 2020-05-26 2020-09-25 重庆大学 Gesture tracking and recognition method based on deep learning
CN111709310B (en) * 2020-05-26 2024-02-02 重庆大学 Gesture tracking and recognition method based on deep learning
CN112529940A (en) * 2020-12-17 2021-03-19 北京深睿博联科技有限责任公司 Moving target position prediction method and device under fixed camera
CN112529940B (en) * 2020-12-17 2022-02-11 北京深睿博联科技有限责任公司 Moving target position prediction method and device under fixed camera
CN113392927A (en) * 2021-07-01 2021-09-14 哈尔滨理工大学 Animal target detection method based on single-order deep neural network
CN113781416A (en) * 2021-08-30 2021-12-10 武汉理工大学 Conveyer belt tearing detection method and device and electronic equipment
CN114511041A (en) * 2022-04-01 2022-05-17 北京世纪好未来教育科技有限公司 Model training method, image processing method, device, equipment and storage medium
CN116524339A (en) * 2023-07-05 2023-08-01 宁德时代新能源科技股份有限公司 Object detection method, apparatus, computer device, storage medium, and program product
CN116524339B (en) * 2023-07-05 2023-10-13 宁德时代新能源科技股份有限公司 Object detection method, apparatus, computer device, storage medium, and program product

Similar Documents

Publication Publication Date Title
CN110084313A (en) A method of generating object detection model
CN110070072A (en) A method of generating object detection model
CN110084253A (en) A method of generating object detection model
Barsoum et al. Hp-gan: Probabilistic 3d human motion prediction via gan
Yi et al. ASSD: Attentive single shot multibox detector
CN111797893B (en) Neural network training method, image classification system and related equipment
Liu et al. Learning spatio-temporal representations for action recognition: A genetic programming approach
Babenko et al. Robust object tracking with online multiple instance learning
CN110378381A (en) Object detecting method, device and computer storage medium
CN110309856A (en) Image classification method, the training method of neural network and device
WO2020015752A1 (en) Object attribute identification method, apparatus and system, and computing device
Seyedhosseini et al. Semantic image segmentation with contextual hierarchical models
JP2020513637A (en) System and method for data management
CN109559300A (en) Image processing method, electronic equipment and computer readable storage medium
CN110096964A (en) A method of generating image recognition model
CN109934173A (en) Expression recognition method, device and electronic equipment
CN110516803A (en) Traditional computer vision algorithm is embodied as neural network
EP3987443A1 (en) Recurrent multi-task convolutional neural network architecture
US20230137337A1 (en) Enhanced machine learning model for joint detection and multi person pose estimation
CN110287857A (en) A kind of training method of characteristic point detection model
CN110276289A (en) Generate the method and human face characteristic point method for tracing of Matching Model
CN109583367A (en) Image text row detection method and device, storage medium and electronic equipment
CN109522970A (en) Image classification method, apparatus and system
CN110084312A (en) A method of generating object detection model
Kang et al. Yolo-6d+: single shot 6d pose estimation using privileged silhouette information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20190802)