CN110503097A - Training method, apparatus, and storage medium for an image processing model - Google Patents
Training method, apparatus, and storage medium for an image processing model
- Publication number
- CN110503097A (application number CN201910798468.2A)
- Authority
- CN
- China
- Prior art keywords
- target object
- network
- region
- image processing
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Abstract
The present invention provides a training method, apparatus, and storage medium for an image processing model. The image processing model includes a backbone network, a region proposal network, and a detection network. The method includes: performing feature extraction on a sample image containing a target object through the backbone network to obtain a feature map of the sample image; performing region selection on the feature map through the region proposal network to determine candidate regions; performing target object detection on the candidate regions through the detection network to obtain classification parameters and location parameters of the target object, where the classification parameters include the classification result of the target object and the location parameters include the bounding box, segmentation mask, and keypoint mask of the target object; determining the value of the target loss function of the image processing model based on the classification parameters and location parameters of the target object; and updating the model parameters based on the value of the target loss function. With the invention, the target object in an image can be accurately located, improving the precision of target detection.
Description
Technical field
The present invention relates to the field of machine learning, and in particular to a training method, apparatus, and storage medium for an image processing model.
Background technique
Machine learning (ML) is a branch of artificial intelligence that generally includes techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, and inductive learning. The purpose of machine learning is to let a machine learn from prior knowledge so that it acquires the logical capability to classify and judge. Machine learning models represented by neural networks are continuously developing and are gradually being applied to target detection in image processing.
In the related art, the training of a neural network model for detecting a target object in an image is based only on the bounding-box information of the target object in the image, so the resulting image processing model has low accuracy when performing target detection.
Summary of the invention
Embodiments of the present invention provide a training method, apparatus, and storage medium for an image processing model, which can accurately locate the target object in an image and improve the precision of target detection.
The technical solutions of the embodiments of the present invention are implemented as follows:
An embodiment of the present invention provides a training method for an image processing model. The image processing model includes a backbone network, a region proposal network, and a detection network, and the method includes:
performing feature extraction on a sample image containing a target object through the backbone network to obtain a feature map of the sample image;
performing region selection on the feature map through the region proposal network to determine candidate regions;
performing target object detection on the candidate regions through the detection network to obtain location parameters and classification parameters of the target object, where the classification parameters include the classification result of the target object and the location parameters include the bounding box, segmentation mask, and keypoint mask of the target object;
determining the value of the target loss function of the image processing model based on the bounding box, segmentation mask, keypoint mask, and classification result of the target object; and
updating the model parameters of the image processing model based on the determined value of the target loss function.
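The steps above can be sketched as a single training-step pipeline. Every function body below is a hypothetical stand-in (dummy arrays and illustrative shapes, not the patent's implementation), intended only to show how the three networks and the loss fit together:

```python
import numpy as np

# Hypothetical stand-ins for the three networks; shapes are illustrative only.
def backbone(image):
    # Feature extraction: here just a dummy channel average as the "feature map".
    return image.mean(axis=-1, keepdims=True)

def region_proposal(feature_map):
    # Region proposal network: one dummy candidate region (x1, y1, x2, y2).
    return [(0, 0, 4, 4)]

def detection_head(feature_map, rois):
    # Detection network: per ROI, (class scores, box, segmentation mask, keypoint mask).
    return [(np.array([0.2, 0.8]), np.array([0., 0., 4., 4.]),
             np.zeros((4, 4)), np.zeros((4, 4)))]

def total_loss(outputs, targets):
    # Placeholder combination of the classification, box, mask, and keypoint terms.
    return sum(float(np.abs(o - t).sum())
               for out, tgt in zip(outputs, targets)
               for o, t in zip(out, tgt))

image = np.ones((8, 8, 3))
features = backbone(image)
rois = region_proposal(features)
outputs = detection_head(features, rois)
targets = [(np.array([0., 1.]), np.array([0., 0., 4., 4.]),
            np.zeros((4, 4)), np.zeros((4, 4)))]
loss = total_loss(outputs, targets)  # would drive a gradient update in practice
```

In a real implementation each stand-in would be a trainable network and the loss value would be back-propagated to update the model parameters.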
An embodiment of the present invention provides a training apparatus for an image processing model, including:
a feature extraction module, configured to perform feature extraction on a sample image containing a target object through the backbone network to obtain a feature map of the sample image;
a region selection module, configured to perform region selection on the feature map through the region proposal network to determine candidate regions;
an object detection module, configured to perform target object detection on the candidate regions through the detection network to obtain location parameters and classification parameters of the target object, where the classification parameters include the classification result of the target object and the location parameters include the bounding box, segmentation mask, and keypoint mask of the target object;
a loss determination module, configured to determine the value of the target loss function of the image processing model based on the bounding box, segmentation mask, keypoint mask, and classification result of the target object; and
a parameter update module, configured to update the model parameters of the image processing model based on the determined value of the target loss function.
In the above scheme, the region selection module is configured to generate, through the region proposal network, multiple initial bounding boxes corresponding to the feature map; scan the multiple initial bounding boxes with a sliding window to determine those initial bounding boxes corresponding to the foreground; and perform bounding-box regression on the initial bounding boxes corresponding to the foreground to determine the candidate regions.
In the above scheme, the apparatus further includes a segmentation module, configured to crop, through the detection network, the feature region corresponding to a candidate region from the feature map to obtain a candidate feature region, and to adjust the features of the candidate feature region to a fixed feature dimension.
In the above scheme, the object detection module is further configured to perform target object detection on each candidate region through the fully connected network included in the detection network, determine the candidate regions containing the target object, and perform bounding-box regression based on the candidate regions containing the target object to obtain the bounding box of the target object.
In the above scheme, the object detection module is further configured to perform, through the convolutional network included in the detection network, semantic segmentation on the candidate regions corresponding to the target object to generate the segmentation mask of the target object.
In the above scheme, the object detection module is further configured to perform, through the fully convolutional network included in the detection network, semantic segmentation of the keypoints of the target object in the candidate regions to generate the keypoint mask of the target object's keypoints.
In the above scheme, the loss determination module is further configured to obtain a first difference between the bounding box and a target bounding box, a second difference between the segmentation mask and a target segmentation mask, a third difference between the keypoint mask and a target keypoint mask, and a fourth difference between the classification result and a target classification result, and to determine the value of the loss function of the image processing model based on the first, second, third, and fourth differences.
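A minimal sketch of how the four differences might be combined into one target loss, assuming cross-entropy for the classification term, smooth-L1 for the box term, and binary cross-entropy for the two mask terms. These are common choices in the R-CNN family; the patent itself does not prescribe the form of the individual terms:

```python
import numpy as np

def smooth_l1(pred, target):
    """Smooth-L1 (Huber) loss, a common choice for box regression."""
    d = np.abs(pred - target)
    return np.where(d < 1.0, 0.5 * d ** 2, d - 0.5).sum()

def bce(pred, target, eps=1e-7):
    """Binary cross-entropy, a common choice for per-pixel mask losses."""
    p = np.clip(pred, eps, 1 - eps)
    return -(target * np.log(p) + (1 - target) * np.log(1 - p)).mean()

def cross_entropy(scores, label, eps=1e-7):
    """Classification loss on normalized class scores."""
    return -np.log(np.clip(scores[label], eps, 1.0))

def target_loss(cls_scores, cls_label, box, box_gt, mask, mask_gt, kp, kp_gt):
    # The four differences: classification, bounding box, mask, keypoint.
    return (cross_entropy(cls_scores, cls_label)
            + smooth_l1(box, box_gt)
            + bce(mask, mask_gt)
            + bce(kp, kp_gt))

# With perfect predictions the loss is (numerically) zero:
loss = target_loss(np.array([0., 1.]), 1, np.zeros(4), np.zeros(4),
                   np.array([[0., 1.]]), np.array([[0., 1.]]),
                   np.zeros((1, 2)), np.zeros((1, 2)))
```

In practice each term is usually given its own weight before summation; equal weights are used here only for brevity.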
In the above scheme, the parameter update module is further configured to, when the value of the loss function exceeds a preset threshold, determine a corresponding error signal based on the loss function of the image processing model, back-propagate the error signal through the image processing model, and update the model parameters of the image processing model during the propagation.
An embodiment of the present invention also provides a training apparatus for an image processing model, including: a memory for storing executable instructions; and a processor that, when executing the executable instructions stored in the memory, implements the training method for an image processing model provided by the embodiments of the present invention.
An embodiment of the present invention provides a storage medium storing executable instructions that, when executed, cause a processor to implement the training method for an image processing model provided by the embodiments of the present invention.
The embodiments of the present invention have the following beneficial effects:
The embodiments of the present invention combine the obtained bounding box, segmentation mask, keypoint mask, and classification result of the target object to determine the value of the target loss function of the image processing model, and then update the model parameters of the image processing model, thereby training the image processing model. Because the segmentation mask and keypoint mask of the target object characterize the location of the target object in the image more accurately, an image processing model trained with the bounding box, segmentation mask, and keypoint mask together can locate the target object in an image more accurately, improving the detection precision of the target object.
Brief description of the drawings
Fig. 1 is a schematic diagram of the principle of R-CNN provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of the principle of Fast R-CNN provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of the principle of Faster R-CNN provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of the architecture of Faster R-CNN provided by an embodiment of the present invention;
Fig. 5 is a schematic diagram of the working principle of an RPN provided by an embodiment of the present invention;
Fig. 6 is a schematic diagram of the architecture of Mask R-CNN provided by an embodiment of the present invention;
Fig. 7 is a schematic diagram of the architecture of a training system for an image processing model provided by an embodiment of the present invention;
Fig. 8 is a schematic diagram of the structure of an electronic device 600 provided by an embodiment of the present invention;
Fig. 9 is a schematic flowchart of a training method for an image processing model provided by an embodiment of the present invention;
Fig. 10 is a schematic diagram of initial bounding boxes in a feature map provided by an embodiment of the present invention;
Fig. 11 is a schematic diagram of the principle of bounding-box regression provided by an embodiment of the present invention;
Fig. 12 is a schematic flowchart of a training method for an image processing model provided by an embodiment of the present invention;
Fig. 13 is a schematic flowchart of a training method for an image processing model provided by an embodiment of the present invention;
Fig. 14 is a schematic diagram of advertisement slot detection provided by an embodiment of the present invention.
Detailed description
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings. The described embodiments are not to be construed as limiting the present invention, and all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present invention.
In the following description, "some embodiments" describes a subset of all possible embodiments; it may refer to the same subset or different subsets of all possible embodiments, which can be combined with each other where no conflict arises.
In the following description, the terms "first" and "second" merely distinguish similar objects and do not denote a particular ordering of the objects. It is understood that, where permitted, "first" and "second" may be interchanged in a specific order or sequence, so that the embodiments of the present invention described herein can be implemented in an order other than that illustrated or described herein.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of the present invention. The terms used herein are intended only to describe the embodiments of the present invention and are not intended to limit the present invention.
Region-based convolutional neural networks (Regions with CNN features, R-CNN) are described first. Fig. 1 is a schematic diagram of the principle of R-CNN provided by an embodiment of the present invention. Referring to Fig. 1, the model input is a picture, on which a predetermined number (e.g., 2000) of regions to be detected are proposed. Feature extraction is performed on these regions one by one (serially) through a convolutional neural network, the extracted features are classified by a support vector machine (SVM) to determine the class of the object, and the size of the target bounding box is adjusted by bounding-box regression. An image processing model based on R-CNN takes a long time for image processing, has low processing efficiency, and requires the modules of different functions in the model to be trained separately.
In the related art, R-CNN has been improved into Fast R-CNN. Fig. 2 is a schematic diagram of the principle of Fast R-CNN provided by an embodiment of the present invention. Referring to Fig. 2, a predetermined number (e.g., 2000) of regions to be detected are determined on the picture, and feature extraction is performed through a convolutional neural network; a region-of-interest pooling layer (ROI Pooling Layer) then extracts the feature corresponding to each ROI from the full-image features, after which classification and bounding-box refinement are performed through fully connected layers (FC Layers, Fully Connected Layers). An image processing model based on Fast R-CNN replaces R-CNN's serial feature extraction with a single neural network that extracts features from the full image, but its processing efficiency is still low.
In the related art, Faster R-CNN has been proposed based on Fast R-CNN. Fig. 3 is a schematic diagram of the principle of Faster R-CNN provided by an embodiment of the present invention. Referring to Fig. 3, shared convolutional layers first extract features from the full image, and the resulting feature map is fed into a region proposal network (RPN, Region Proposal Network). The RPN generates the boxes to be detected (specifying the positions of the ROIs) and performs a first refinement of the ROI bounding boxes. What follows is the Fast R-CNN framework: the ROI Pooling Layer selects the feature corresponding to each ROI on the feature map according to the output of the RPN and sets its dimension to a fixed value; finally, the bounding boxes are classified through fully connected layers, and a second bounding-box refinement is performed.
Next, the framework of Faster R-CNN is described. Fig. 4 is a schematic diagram of the architecture of Faster R-CNN provided by an embodiment of the present invention. Referring to Fig. 4, the structure of Faster R-CNN is divided into three parts: the first part is the shared convolutional layers, i.e., the backbone network (backbone); the second part is the region proposal network (RPN); and the third part is the classification network that classifies the candidate regions.
Here, the working principle of the RPN is described. Fig. 5 is a schematic diagram of the working principle of an RPN provided by an embodiment of the present invention. The RPN relies on a window that slides over the shared feature map and generates a preset number (e.g., 9 kinds) of bounding boxes (anchors) for each position. For the generated bounding boxes, the RPN does two things: first, it judges whether a bounding box corresponds to the foreground or the background, i.e., whether the box covers a target at all; second, it performs coordinate refinement on the bounding boxes belonging to the foreground.
In Faster R-CNN, features are extracted once by the shared convolutional layers (the backbone network); therefore, for each ROI, the corresponding feature needs to be extracted from the shared convolutional layers and fed into the fully connected layers for classification. The ROI pooling layer thus mainly does two things: first, it selects the corresponding feature for each ROI; second, to meet the input requirements of the fully connected layers, it converts the dimension of each ROI's feature to a fixed value.
Based on the above description of Faster R-CNN, Mask R-CNN is described next. Fig. 6 is a schematic diagram of the architecture of Mask R-CNN provided by an embodiment of the present invention. Referring to Fig. 6, Mask R-CNN improves the ROI pooling layer of Faster R-CNN by proposing ROI Align and adds a Mask branch. The effect of ROI Align is mainly to eliminate the floor (quantization) operation of the ROI pooling layer, so that the feature obtained for each ROI is better aligned with the ROI region in the original image. Through the Mask branch, semantic segmentation is performed on the candidate regions and a segmentation mask is output, so that the trained Mask R-CNN model improves detection precision compared with the Faster R-CNN model.
Next, the training system of the image processing model of the embodiment of the present invention is described. Fig. 7 is a schematic diagram of the architecture of a training system for an image processing model provided by an embodiment of the present invention. Referring to Fig. 7, to support an exemplary application, the training system 100 of the image processing model includes terminals (terminal 400-1 and terminal 400-2 are shown as examples). A terminal 400 is connected to a server 200 through a network 300, which can be a wide area network, a local area network, or a combination of the two, using wireless or wired links for data transmission.
A terminal (e.g., terminal 400-1) sends a training request for the image processing model to the server 200; the training request carries a sample image for training the image processing model, and the sample image contains a target object.
Here, in practical applications, the terminal can be various types of user terminals such as a smartphone, tablet computer, or laptop, and can also be a wearable computing device, a personal digital assistant (PDA), a desktop computer, a cellular phone, a media player, a navigation device, a game console, a television, or a combination of any two or more of these or other data processing devices.
The server 200 parses the training request to obtain the sample image;
performs feature extraction on the sample image containing the target object through the backbone network to obtain a feature map of the sample image;
performs region selection on the feature map through the region proposal network to determine candidate regions;
performs target object detection on the candidate regions through the detection network to obtain location parameters and classification parameters of the target object, where the classification parameters include the classification result of the target object and the location parameters include the bounding box, segmentation mask, and keypoint mask of the target object;
determines the value of the target loss function of the image processing model based on the bounding box, segmentation mask, keypoint mask, and classification result of the target object; and
updates the model parameters of the image processing model based on the determined value of the target loss function.
In practical applications, the server 200 can be a separately configured server supporting various services, or can be configured as a server cluster.
The terminal (e.g., terminal 400-1) is also used to send an image processing request carrying a target image to the server 200.
The server 200 is also used to perform target object detection on the target image using the trained image processing model (i.e., the model whose parameters have been updated as above), obtain location parameters including at least the bounding box of the target object, and return them to the terminal (e.g., terminal 400-1).
The terminal (e.g., terminal 400-1) is also used to mark the target object in the user interface with the returned bounding box.
The electronic device implementing the training method for an image processing model of an embodiment of the present invention is described below. In some embodiments, the electronic device can be a terminal or a server. Referring to Fig. 8, Fig. 8 is a schematic diagram of the structure of an electronic device 600 provided by an embodiment of the present invention. The electronic device 600 shown in Fig. 8 includes: a processor 610, a memory 650, a network interface 620, and a user interface 630. The components in the electronic device 600 are coupled through a bus system 640. It can be understood that the bus system 640 is used to realize connection and communication between these components. In addition to a data bus, the bus system 640 also includes a power bus, a control bus, and a status signal bus. For clarity, however, the various buses are all labelled as the bus system 640 in Fig. 8.
The processor 610 can be an integrated circuit chip with signal processing capability, such as a general-purpose processor, a digital signal processor (DSP), another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, where the general-purpose processor can be a microprocessor or any conventional processor.
The user interface 630 includes one or more output devices 631 that enable the presentation of media content, including one or more speakers and/or one or more visual display screens. The user interface 630 also includes one or more input devices 632, including user interface components that facilitate user input, such as a keyboard, mouse, microphone, touch-screen display, camera, and other input buttons and controls.
The memory 650 can be removable, non-removable, or a combination of the two. Exemplary hardware devices include solid-state memory, hard disk drives, optical disc drives, and the like. The memory 650 optionally includes one or more storage devices physically remote from the processor 610.
The memory 650 includes volatile memory or non-volatile memory, and can also include both. The non-volatile memory can be a read-only memory (ROM) and the volatile memory can be a random access memory (RAM). The memory 650 described in the embodiments of the present invention is intended to include any suitable type of memory.
In some embodiments, the memory 650 can store data to support various operations. Examples of these data include programs, modules, and data structures, or subsets or supersets thereof, as exemplified below.
An operating system 651, including system programs for handling various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, and a driver layer, used to realize various basic services and handle hardware-based tasks;
a network communication module 652 for reaching other computing devices via one or more (wired or wireless) network interfaces 620, with exemplary network interfaces 620 including Bluetooth, Wireless Fidelity (WiFi), Universal Serial Bus (USB), and the like;
a presentation module 653 for enabling the presentation of information via one or more output devices 631 (e.g., a display screen or speakers) associated with the user interface 630 (e.g., a user interface for operating peripherals and displaying content and information);
an input processing module 654 for detecting one or more user inputs or interactions from one of the one or more input devices 632 and translating the detected inputs or interactions.
In some embodiments, the training apparatus for an image processing model provided by an embodiment of the present invention can be implemented in software. Fig. 8 shows a training apparatus 655 for an image processing model stored in the memory 650, which can be software in the form of a program, a plug-in, or the like, including the following software modules: a feature extraction module 6551, a region selection module 6552, an object detection module 6553, a loss determination module 6554, and a parameter update module 6555. These modules are logical and can therefore be arbitrarily combined or further split according to the functions they realize. The functions of the modules are described below.
In other embodiments, the training apparatus for an image processing model provided by an embodiment of the present invention can be implemented in hardware. As an example, it can be a processor in the form of a hardware decoding processor programmed to perform the training method for an image processing model provided by the embodiments of the present invention. For example, the processor in the form of a hardware decoding processor can adopt one or more application-specific integrated circuits (ASICs), DSPs, programmable logic devices (PLDs), complex programmable logic devices (CPLDs), field-programmable gate arrays (FPGAs), or other electronic components.
Next, the training method for an image processing model provided by an embodiment of the present invention is described. The image processing model provided by the embodiment of the present invention includes a backbone network, a region proposal network, and a detection network. Fig. 9 is a schematic flowchart of a training method for an image processing model provided by an embodiment of the present invention. In some embodiments, the training method can be implemented by a server or a terminal, or by a server and a terminal in coordination. Taking server implementation as an example, referring to Fig. 9, the training method for an image processing model provided by an embodiment of the present invention includes:
Step 701: The server performs feature extraction on a sample image containing a target object through the backbone network to obtain a feature map of the sample image.
Here, in actual implementation, the backbone network can be a convolutional neural network that extracts features from the full sample image to obtain the feature map of the sample image.
In some embodiments, the backbone network can be pre-trained, for example, by training directly on an image classification dataset such as ImageNet to obtain a backbone network used only for feature extraction.
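The mapping from sample image to feature map depends on the backbone's total stride. A minimal sketch, assuming a typical stride-16 convolutional backbone (the stride value is an assumption for illustration; the patent does not fix one):

```python
def feature_map_size(img_h, img_w, stride=16):
    """Spatial size of the backbone feature map, assuming a convolutional
    backbone whose total downsampling stride is `stride`."""
    return img_h // stride, img_w // stride

# Under this assumption, a 640 x 960 sample image yields a 40 x 60 feature map.
h, w = feature_map_size(640, 960)
```

Each feature-map cell then summarizes a stride-by-stride patch of the input image, which is what lets the region proposal network reason about image positions on the smaller map.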
Step 702: Perform region selection on the feature map through the region proposal network to determine candidate regions.
In some embodiments, the server can perform region selection on the feature map in the following way to determine the candidate regions: generate multiple initial bounding boxes corresponding to the feature map through the region proposal network; scan the multiple initial bounding boxes with a sliding window to determine those initial bounding boxes corresponding to the foreground; and perform bounding-box regression on the initial bounding boxes corresponding to the foreground to determine the candidate regions.
Here, in actual implementation, the number of candidate regions determined can be fixed. Illustratively, Fig. 10 is a schematic diagram of initial bounding boxes in a feature map provided by an embodiment of the present invention. Referring to Fig. 10, the region proposal network uses a window sliding over the feature map to generate, for each position, 9 kinds of initial bounding boxes (anchors) with preset aspect ratios and areas. These 9 kinds of initial bounding boxes cover three areas (128 × 128, 256 × 256, 512 × 512), and each area covers three aspect ratios (1:1, 1:2, 2:1). In this way, when the feature map size is 40 × 60, the region proposal network generates approximately 20,000 initial bounding boxes in total (40 × 60 × 9 = 21,600). After generating the multiple initial bounding boxes, the region proposal network performs two operations: first, judging whether an initial bounding box corresponds to the foreground or the background; second, performing a first refinement, i.e., bounding-box regression, on the initial bounding boxes belonging to the foreground.
Here, the judgment of whether an initial bounding box corresponds to foreground or background is explained. In actual implementation, Intersection over Union (IoU) thresholds can be set: when the IoU between an initial bounding box and a target (ground-truth) bounding box exceeds a first threshold (e.g., 0.7), the initial bounding box is determined to correspond to foreground; when the IoU between an initial bounding box and the target bounding box is below a second threshold (e.g., 0.3), the initial bounding box is determined to correspond to background.
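The anchor generation and the IoU-based foreground/background assignment described above can be sketched as follows (a minimal illustration; the anchor centre, the exact aspect-ratio convention, and the helper names are ours, not from the embodiment):

```python
import numpy as np

def make_anchors(cx, cy):
    """Generate the 9 anchors (3 areas x 3 aspect ratios) centred at
    (cx, cy). Boxes are (x1, y1, x2, y2)."""
    anchors = []
    for side in (128, 256, 512):          # three areas: side * side
        area = side * side
        for ratio in (1.0, 0.5, 2.0):     # h:w ratios 1:1, 1:2, 2:1 (assumed)
            w = np.sqrt(area / ratio)
            h = w * ratio
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def label_anchor(anchor, gt, fg_thresh=0.7, bg_thresh=0.3):
    """Foreground if IoU > 0.7, background if IoU < 0.3, else ignored."""
    v = iou(anchor, gt)
    if v > fg_thresh:
        return "foreground"
    if v < bg_thresh:
        return "background"
    return "ignore"
```

With 9 anchors per position, a 40 × 60 feature map yields 40 × 60 × 9 = 21,600 initial boxes, matching the "about 20,000" figure above.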
Bounding-box regression is explained next. Figure 11 is a schematic diagram of the bounding-box regression provided in an embodiment of the present invention. Referring to Figure 11, a bounding box is represented by a four-dimensional vector (x, y, w, h), denoting its center-point coordinates, width, and height. In Figure 11, the bounding box P (numeral 111) represents an initial bounding box and the bounding box G (numeral 112) represents a target bounding box; bounding-box regression learns a mapping such that, given the input bounding box P, it outputs a regression window Z that is closer to the target bounding box G.
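The exact mapping from P toward G is not spelled out in the embodiment; a common choice, assumed here, is the standard Faster R-CNN parameterization, which regresses normalized centre offsets and log scale factors:

```python
import math

def regression_targets(P, G):
    """Standard R-CNN box-regression targets mapping box P = (x, y, w, h)
    toward target box G; (x, y) is the centre point. This parameterization
    is an assumption, not stated in the embodiment."""
    px, py, pw, ph = P
    gx, gy, gw, gh = G
    return ((gx - px) / pw, (gy - py) / ph,
            math.log(gw / pw), math.log(gh / ph))

def apply_regression(P, t):
    """Apply predicted offsets t to box P, yielding the regressed window Z."""
    px, py, pw, ph = P
    tx, ty, tw, th = t
    return (px + pw * tx, py + ph * ty,
            pw * math.exp(tw), ph * math.exp(th))
```

Applying the exact targets recovers G, i.e. Z = G when the prediction is perfect.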
Step 703: through the detection network, target object detection is performed on the candidate regions to obtain location parameters and classification parameters of the target object. The classification parameters include a classification result for the target object; the location parameters include the bounding box, segmentation mask, and keypoint mask of the target object.
In some embodiments, after the server determines the candidate regions, it also crops, through the detection network, the feature areas corresponding to the candidate regions from the feature map to obtain candidate feature regions, and adjusts the features of each candidate feature region to a fixed-size feature dimension.
The architecture of the detection network is explained. In some embodiments, the detection network includes a candidate-region alignment network (ROI Align), a target detection head (Bbox head), a segmentation head (Mask head), and a keypoint head (Keypoint head).
In actual implementation, the detection network uses the ROI Align technique to obtain fixed-size feature areas from the feature map. For example, to obtain a fixed-size (7 × 7) feature area, the detection network does not use quantization, thereby avoiding quantization error: for instance, given 665 / 32 = 20.78, the detection network uses 20.78 rather than rounding it to 20, and handles such floating-point coordinates by bilinear interpolation.
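The bilinear interpolation used to read the feature map at non-integer coordinates (e.g., 20.78 rather than 20) can be sketched for a single channel as follows (a minimal illustration; the helper name is ours):

```python
import numpy as np

def bilinear_sample(feat, x, y):
    """Sample a 2-D feature map at a floating-point location (x, y)
    without quantizing, as ROI Align does.
    Assumes 0 <= x < W - 1 and 0 <= y < H - 1."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = x0 + 1, y0 + 1
    dx, dy = x - x0, y - y0
    return (feat[y0, x0] * (1 - dx) * (1 - dy) +
            feat[y0, x1] * dx * (1 - dy) +
            feat[y1, x0] * (1 - dx) * dy +
            feat[y1, x1] * dx * dy)
```

At integer coordinates the sample reduces to the stored pixel value, so no information is lost relative to quantized pooling.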
It should be noted that the feature dimension of a candidate feature region can be adjusted to different fixed sizes according to the intended input target. For example, for a candidate feature region to be input to the target detection head, the feature dimension is adjusted to 7 × 7 × 256; for one to be input to the segmentation head, the feature dimension is adjusted to 14 × 14 × 256.
In some embodiments, the server may obtain the bounding box of the target object in a candidate region in the following way: performing target object detection on each candidate region through the fully connected network (i.e., the target detection head) included in the detection network, to determine the candidate regions containing the target object; and performing bounding-box regression based on the candidate regions containing the target object, to obtain the bounding box of the target object.
In some embodiments, the fully connected network (i.e., the target detection head) included in the detection network also outputs the classification result of the target object. In embodiments of the present invention, binary classification can be used for the target object, i.e., whether the region is the target object (for example, whether it is an advertisement position).
Illustratively, the feature of a candidate region input to the target detection head is 7 × 7 × 256. This feature is flattened pixel-wise into a 12544-dimensional vector and passed through a fully connected layer with input dimension 12544 and output dimension 1024, yielding a 1024-dimensional vector. This vector is passed through a fully connected layer with output dimension 1 to obtain the classification result (e.g., whether it is an advertisement position), and in parallel through a fully connected layer with output dimension 4 to obtain the bounding-box regression result, i.e., the offsets of the four variables of the predicted bounding box (center-point abscissa and ordinate, height, and width) relative to the target bounding box.
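The dimensions above can be checked with a shape-level sketch (random untrained weights; biases and activations are omitted, so only the shapes are meaningful — the function name is ours):

```python
import numpy as np

def detection_head(feature, rng):
    """Shape-level sketch of the target detection head: flatten the
    7 x 7 x 256 candidate-region feature to 12544, one fully connected
    layer to 1024, then parallel branches of output dimension 1 and 4."""
    x = feature.reshape(-1)                               # 12544-d vector
    hidden = x @ rng.standard_normal((12544, 1024))       # FC 12544 -> 1024
    cls_score = hidden @ rng.standard_normal((1024, 1))   # is-advertisement score
    box_delta = hidden @ rng.standard_normal((1024, 4))   # (x, y, w, h) offsets
    return cls_score, box_delta

rng = np.random.default_rng(0)
cls_score, box_delta = detection_head(np.zeros((7, 7, 256)), rng)
```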
In some embodiments, the server may obtain the segmentation mask of the target object in a candidate region in the following way: through the convolutional network (the segmentation head) included in the detection network, the server performs semantic segmentation of the target object on each candidate region, generating the segmentation mask of the target object.
Illustratively, the feature of a candidate region input to the segmentation head is 14 × 14 × 256. This feature passes through 4 convolutional layers with output dimension 256, kernel size 3 × 3, and stride 1, yielding a 14 × 14 × 256 feature; then through a deconvolutional layer with output dimension 256, kernel size 3 × 3, and stride 2, which doubles the resolution, yielding a 28 × 28 × 256 feature; and finally through a convolutional layer with output dimension 1 and kernel size 1 × 1, which produces the segmentation mask.
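Tracing the feature shapes through the segmentation head just described (assuming "same" padding for the stride-1 convolutions, so the 3 × 3 kernels preserve the 14 × 14 size — an assumption on our part):

```python
def mask_head_shapes(h=14, w=14):
    """Feature shapes through the segmentation head: four 3x3 stride-1
    convolutions (256 channels), one stride-2 deconvolution (256 channels)
    that doubles the resolution, then a 1x1 convolution to a 1-channel mask."""
    shapes = [(h, w, 256)] * 4        # 3x3, stride 1, 'same' padding assumed
    h, w = h * 2, w * 2               # stride-2 deconvolution: 14 -> 28
    shapes.append((h, w, 256))
    shapes.append((h, w, 1))          # final 1x1 convolution -> segmentation mask
    return shapes
```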
In some embodiments, the server may obtain the keypoint mask of the target object in a candidate region in the following way: through the fully convolutional network (the keypoint head) included in the detection network, the server performs semantic segmentation of the keypoints of the target object on each candidate region, generating the keypoint mask of the keypoints of the target object.
Here, in actual implementation, different keypoints can be preset for different target objects. For example, when the target object is an advertisement position, the keypoints of the target object are the four corner points of the advertisement position.
Illustratively, the feature of a candidate region input to the keypoint head is 14 × 14 × 256. The keypoint head is composed of 8 convolutional layers with kernel size 3 × 3 and output dimension 512, followed by a deconvolutional layer with kernel size 3 × 3, stride 2, and output dimension 1, and a 2× bilinear upsampling layer, producing a 56 × 56 output resolution.
Step 704: based on the bounding box, segmentation mask, keypoint mask, and classification result of the target object, the value of the target loss function of the image processing model is determined.
Here, the target loss function of the image processing model is explained. It is given by:
L = L_det + L_mask + β·L_keypoint;  (1)
where
L_det = L_rpn + L_rcnn;  (2)
Here, L_det represents the loss function of Faster R-CNN and is composed of an RPN part and an R-CNN part. The RPN part contains two loss terms: the classification prediction probability and the bounding-box regression loss. The classification prediction probability uses cross entropy; the RPN performs binary classification, i.e., object versus no object. The bounding-box regression loss uses the smooth_L1 function, predicting the offsets of the object's center point, width, and height relative to the annotated ones; the four variables are regressed in the form (x, y, w, h), where x denotes the center-point abscissa, y the center-point ordinate, w the object width, and h the object height. The R-CNN part (i.e., the detection head) also contains two loss terms: the classification prediction probability and the bounding-box regression loss. The classification prediction probability uses cross entropy; in target object detection, the R-CNN also performs binary classification, distinguishing target object (e.g., the advertisement class) from non-target object (non-advertisement). Its bounding-box regression loss is identical to that of the RPN.
L_mask is the average cross entropy between the class probability of each pixel and its label (the annotated information). Each pixel is also binary-classified in the target object detection task, i.e., whether it belongs to the target object or not (for example, belongs to the advertisement or not), with softmax output. For each object, the classification cross entropy is computed over all points in the candidate region and averaged to obtain L_mask.
L_keypoint is similar to L_mask; it is also the average cross entropy between the class probability of each pixel and its label. The difference is that in the labels for L_keypoint, only the pixels at keypoints are classified as the target object, and all other pixels are classified as non-target. Since L_keypoint and L_mask are otherwise similar, the weight of this loss is adjusted by the coefficient β, which serves to emphasize the keypoints; in actual implementation, β can be set to 5.
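Formula (1) can be sketched numerically as follows, with L_det treated as already computed; a per-pixel binary cross entropy stands in for the softmax-based term described above, and the function names are ours:

```python
import numpy as np

def mean_cross_entropy(p, y):
    """Average cross entropy between per-pixel foreground probabilities p
    and binary labels y, as used for both L_mask and L_keypoint."""
    p = np.clip(p, 1e-7, 1.0 - 1e-7)
    return float(np.mean(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))))

def total_loss(l_det, mask_prob, mask_label, kp_prob, kp_label, beta=5.0):
    """L = L_det + L_mask + beta * L_keypoint, with beta = 5 emphasizing
    the keypoint term; L_det = L_rpn + L_rcnn is computed elsewhere."""
    return (l_det
            + mean_cross_entropy(mask_prob, mask_label)
            + beta * mean_cross_entropy(kp_prob, kp_label))
```

With perfect mask and keypoint predictions, the total loss reduces to L_det; a wrong keypoint prediction is penalized five times as heavily as an equally wrong mask prediction.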
In some embodiments, the server may determine the value of the target loss function of the image processing model as follows: respectively obtaining a first difference between the bounding box and the target bounding box, a second difference between the segmentation mask and the target segmentation mask, a third difference between the keypoint mask and the target keypoint mask, and a fourth difference between the classification result and the target classification result; and determining the value of the loss function of the image processing model based on the first, second, third, and fourth differences.
Here, the target bounding box, target segmentation mask, and target keypoint mask are, respectively, the bounding box, segmentation mask, and keypoint mask of the target object annotated in the sample image.
Step 705: based on the determined value of the target loss function, the model parameters of the image processing model are updated.
In some embodiments, the server may update the model parameters of the image processing model in the following way: when the value of the loss function exceeds a preset threshold, determining a corresponding error signal based on the loss function of the image processing model; back-propagating the error signal in the image processing model, and updating the model parameters of the image processing model during the propagation.
Back-propagation is explained here. Training sample data is input to the input layer of the neural network model, passes through the hidden layers, and finally reaches the output layer, where a result is output; this is the forward-propagation process of the neural network model. Since the output result of the neural network model differs from the actual result, the error between the output result and the actual value is computed and back-propagated from the output layer through the hidden layers to the input layer; during back-propagation, the model parameter values are adjusted according to the error. This process is iterated until convergence.
In some embodiments, after the training of the image processing model is completed by updating its model parameters, the trained image processing model can be used to detect the target object (e.g., an advertisement position) in an image to be recognized. Specifically, the image to be recognized is input to the image processing model; feature extraction is performed on it through the backbone network to obtain its feature map; region selection is performed on the feature map through the region candidate network to determine candidate regions; and target object detection is performed on the candidate regions through the detection network to obtain the location parameters and classification parameters of the target object.
Next, taking an advertisement position as the target object, the training method of the image processing model provided in an embodiment of the present invention is explained. Figure 12 and Figure 13 are flow diagrams of the training method of the image processing model provided in an embodiment of the present invention. In some embodiments, the method can be implemented by a server or a terminal, or by a server and a terminal in coordination. Taking server implementation as an example, and referring to Figure 12 and Figure 13, the training method of the image processing model provided in an embodiment of the present invention includes:
Step 801: the server performs feature extraction, through the backbone network, on a sample image containing an advertisement position, to obtain a feature map of the sample image.
Here, in actual implementation, the sample image is annotated with the following information: the detection/classification result of the advertisement position, i.e., whether each region belongs to the advertisement position; the bounding box of the advertisement position; the segmentation mask of the advertisement position; and the keypoint mask of the advertisement position.
Step 802: through the region candidate network, region selection is performed on the feature map to determine a fixed number of candidate regions.
In actual implementation, the server generates, through the region candidate network, multiple initial bounding boxes corresponding to the feature map; scans the multiple initial bounding boxes with a sliding window to determine which of them correspond to foreground; and performs bounding-box regression on the initial bounding boxes corresponding to foreground, so as to determine the candidate regions.
Step 803: through the detection network, the feature areas corresponding to the candidate regions are cropped from the feature map to obtain candidate feature regions, and the features of each candidate feature region are adjusted to a fixed-size feature dimension.
Step 804: through the detection network, advertisement position detection is performed on the adjusted candidate feature regions, to obtain the classification result of the advertisement position and the location parameters of the advertisement position.
Here, the classification result of the advertisement position characterizes whether the candidate feature region corresponds to an advertisement position, i.e., the classification result is either "is an advertisement position" or "is not an advertisement position".
The location parameters include the bounding box, segmentation mask, and keypoint mask of the advertisement position. Here, the keypoints of the advertisement position are its four corner points.
Step 805: based on the detection result and the location parameters of the advertisement position, the value of the target loss function of the image processing model is determined.
Here, the target loss function of the image processing model is as given in formulas (1) and (2).
Step 806: based on the determined value of the target loss function, the model parameters of the image processing model are updated.
In actual implementation, the server may update the model parameters of the image processing model in the following way: when the value of the loss function exceeds a preset threshold, determining a corresponding error signal based on the loss function of the image processing model; back-propagating the error signal in the image processing model, and updating the model parameters of the image processing model during the propagation.
Next, again taking an advertisement position as the target object, detection of advertisement positions in video is considered; an advertisement may be a poster, a framed advertisement, or a frameless advertisement. Referring to Figure 14, which is a schematic diagram of advertisement position detection provided in an embodiment of the present invention, the purpose of the image processing model is to output the position indicated by numeral 141 in Figure 14. Referring to Figure 12, the image processing model provided in an embodiment of the present invention is improved on the basis of the Mask R-CNN framework. Mask R-CNN is a framework for image instance segmentation; it is based on the target detection framework Faster R-CNN and adds, on top of Faster R-CNN, a branch for pixel-level segmentation of the target.
Mask R-CNN has four parts: a backbone network (backbone), a region proposal network (RPN), a detection head (bbox head), and a segmentation head (mask head). This framework decouples the image instance segmentation problem into a target detection problem and a semantic segmentation problem.
The backbone network is usually a convolutional neural network (CNN) used to extract picture features; in actual implementation, the backbone network is usually a CNN pre-trained on ImageNet, such as ResNet, Inception V3, or DenseNet. The region candidate network (RPN) is usually a small convolutional neural network used to propose regions of interest and judge whether each region contains an object, performing bounding-box regression on the candidate regions predicted to contain objects. The detection head is usually the target detection framework R-CNN (regions with CNN features), which performs further bounding-box regression and object category prediction on the candidate regions, obtained from the region candidate network, that contain objects. The segmentation head is usually a fully convolutional network (FCN), which performs semantic segmentation of the object on those candidate regions.
The framework of the image processing model provided in an embodiment of the present invention improves the Mask R-CNN framework by adding a pixel-level keypoint detection branch (keypoint head). The keypoint head is a fully convolutional network (FCN) that outputs a keypoint mask whose resolution is twice that of the segmentation mask; compared with segmentation, a relatively high resolution is needed for keypoint-level localization accuracy.
The target detection framework for images is usually a two-stage detector: the first stage proposes regions of interest and performs coarse position regression on them, and the second stage performs object classification and further position regression on the coarsely regressed regions of interest. The image processing model framework provided in an embodiment of the present invention contains a branch that performs pixel-level segmentation of the object, implemented by a fully convolutional network; it introduces pixel-level labels and enhances the semantic information and accurate location information of the features, which can significantly improve the accuracy of target detection. It also contains a keypoint detection head, implemented by a fully convolutional network. Combined with the particularity of advertisement objects, detecting the four corner points of an advertisement can further determine the advertisement position, while keypoint detection requires a higher output resolution than segmentation, improving the detection accuracy of the image processing model.
In image processing, a keypoint is essentially a kind of feature: an abstract description of a fixed area or spatial physical relationship that describes a combination or contextual relationship within a certain neighborhood. A keypoint does not merely carry information or represent a position; it also represents the combinational relationship between the context and the surrounding neighborhood. Keypoints in advertisements are relatively easy to model: advertisements are a class of objects with fairly regular shapes, usually quadrilaterals, whose four corner points can be regarded as four keypoints. The position of each keypoint is modeled as an individual one-hot mask, with one channel per keypoint and one feature map per channel, and Mask R-CNN is used to predict the 4 masks. For the 4 keypoints of an instance (an advertisement position), the training target is a one-hot-encoded m × m binary mask in which only one pixel is labeled as foreground.
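The one-hot keypoint targets described above can be built as follows (m = 56 is chosen to match the keypoint head's output resolution; the helper name and corner ordering are ours):

```python
import numpy as np

def keypoint_masks(corners, m=56):
    """Encode the four advertisement corner points as four one-hot m x m
    binary masks: one channel per keypoint, exactly one foreground pixel."""
    masks = np.zeros((4, m, m), dtype=np.uint8)
    for ch, (row, col) in enumerate(corners):
        masks[ch, row, col] = 1
    return masks
```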
Referring to Figure 12, for the target detection head, the feature of each candidate region obtained through the RPN and ROI Align is 7 × 7 × 256. This feature is flattened pixel-wise into a 12544-dimensional vector and passed through a fully connected layer with input dimension 12544 and output dimension 1024, yielding a 1024-dimensional vector. This vector is passed through a fully connected layer with output dimension 1 to obtain the classification result (whether it is an advertisement; there is only one advertisement class), and in parallel through a fully connected layer with output dimension 4 to obtain the bounding-box regression result, i.e., the offsets of the four variables of the predicted bounding box (center-point abscissa and ordinate, height, and width) relative to the labeled bounding box.
For the segmentation head, the feature of each candidate region obtained through the RPN and ROI Align is 14 × 14 × 256. This feature passes through 4 convolutional layers with output dimension 256, kernel size 3 × 3, and stride 1, yielding a 14 × 14 × 256 feature; then through a deconvolutional layer with output dimension 256, kernel size 3 × 3, and stride 2, which doubles the resolution, yielding a 28 × 28 × 256 feature; and finally through a convolutional layer with output dimension 1 and kernel size 1 × 1, which produces the segmentation mask.
For the keypoint head, the feature of each candidate region obtained through the RPN and ROI Align is 14 × 14 × 256. The keypoint head is composed of 8 convolutional layers with kernel size 3 × 3 and output dimension 512, followed by a deconvolutional layer with kernel size 3 × 3, stride 2, and output dimension 1, and a 2× bilinear upsampling layer, producing a 56 × 56 output resolution; compared with the segmentation mask, a relatively high resolution is needed for keypoint-level localization accuracy.
Next, the target loss function used to train the image processing model is explained. The target loss function of the image processing model is:
L = L_det + L_mask + β·L_keypoint
where
L_det = L_rpn + L_rcnn
L_det represents the loss function of Faster R-CNN and is composed of an RPN part and an R-CNN part. The RPN part contains two loss terms: the classification prediction probability and the bounding-box regression loss. The classification prediction probability uses cross entropy; the RPN performs binary classification, i.e., object versus no object. The bounding-box regression loss uses the smooth_L1 function, predicting the offsets of the object's center point, width, and height relative to the labels; the four variables are regressed in the form (x, y, w, h), where x denotes the center-point abscissa, y the center-point ordinate, w the object width, and h the object height. The R-CNN part (i.e., the detection head) also contains two loss terms: the classification prediction probability and the bounding-box regression loss. The classification prediction probability uses cross entropy; in advertisement position prediction, the R-CNN also performs binary classification, distinguishing the advertisement class from the non-advertisement class. Its bounding-box regression loss is the same as in the RPN.
L_mask is the average cross entropy between the class probability of each pixel and its label. Each pixel is also binary-classified in the advertisement position prediction task, i.e., whether the pixel belongs to the advertisement or not, with softmax output. For each object, the classification cross entropy is computed over all points in the candidate region and averaged to obtain L_mask.
L_keypoint is similar to L_mask; it is also the average cross entropy between the class probability of each pixel and its label. The difference is that in the labels for L_keypoint, only the pixels at keypoints are classified as advertisement, and all other pixels are classified as non-advertisement. Since L_keypoint and L_mask are otherwise similar, the weight of this loss is adjusted by the coefficient β, which serves to emphasize the keypoints; in actual implementation, β can be set to 5.
In the training process of the image processing model, model training is implemented by stochastic gradient descent based on the above target loss function of the image processing model.
In practical applications, a video picture in which advertisement positions are to be detected is input to the trained image processing model, forward computation is performed, and the detection result is obtained.
With the above embodiments of the present invention, since the segmentation mask and keypoint mask of a target object can characterize the location information of the target object in the image more accurately, the image processing model trained with the combination of bounding boxes, segmentation masks, and keypoint masks can locate target objects in images more accurately, improving the detection accuracy for target objects.
The software implementation of the training device of the image processing model provided in an embodiment of the present invention is now described. Referring to Fig. 8, the training device of the image processing model provided in an embodiment of the present invention includes:
a feature extraction module, configured to perform feature extraction, through the backbone network, on a sample image containing a target object, to obtain a feature map of the sample image;
a region selection module, configured to perform region selection on the feature map, through the region candidate network, to determine candidate regions;
an object detection module, configured to perform target object detection on the candidate regions, through the detection network, to obtain location parameters of the target object in the candidate regions, the location parameters including the bounding box, segmentation mask, and keypoint mask of the target object;
a loss determining module, configured to determine the value of the target loss function of the image processing model based on the bounding box, segmentation mask, and keypoint mask of the target object; and
a parameter updating module, configured to update the model parameters of the image processing model based on the determined value of the target loss function.
In some embodiments, the region selection module is configured to generate, through the region candidate network, multiple initial bounding boxes corresponding to the feature map; scan the multiple initial bounding boxes with a sliding window to determine which of them correspond to foreground; and perform bounding-box regression on the initial bounding boxes corresponding to foreground, so as to determine the candidate regions.
In some embodiments, the device further includes a segmentation module, configured to crop, through the detection network, the feature areas corresponding to the candidate regions from the feature map to obtain candidate feature regions, and to adjust the features of each candidate feature region to a fixed-size feature dimension.
In some embodiments, the object detection module is further configured to perform target object detection on each candidate region, through the fully connected network included in the detection network, to determine the candidate regions containing the target object; and to perform bounding-box regression based on the candidate regions containing the target object, to obtain the bounding box of the target object.
In some embodiments, the object detection module is further configured to perform, through the convolutional network included in the detection network, semantic segmentation of the target object on each candidate region, generating the segmentation mask of the target object.
In some embodiments, the object detection module is further configured to perform, through the fully convolutional network included in the detection network, semantic segmentation of the keypoints of the target object on each candidate region, generating the keypoint mask of the keypoints of the target object.
In some embodiments, the loss determining module is further configured to respectively obtain a first difference between the bounding box and the target bounding box, a second difference between the segmentation mask and the target segmentation mask, and a third difference between the keypoint mask and the target keypoint mask; and to determine the value of the loss function of the image processing model based on the first difference, the second difference, and the third difference.
In some embodiments, the parameter updating module is further configured to, when the value of the loss function exceeds a preset threshold, determine a corresponding error signal based on the loss function of the image processing model; back-propagate the error signal in the image processing model; and update the model parameters of the image processing model during the propagation.
It should be noted that the above description of the device is similar to the above description of the method and shares the beneficial effects of the method, so it is not repeated here; for technical details not disclosed in the device of the embodiment of the present invention, please refer to the description of the method embodiments of the present invention.
An embodiment of the present invention also provides an electronic device, including: a memory for storing an executable program; and a processor that, when executing the executable program stored in the memory, implements the above training method of the image processing model provided in an embodiment of the present invention.
An embodiment of the present invention also provides a storage medium storing executable instructions which, when executed by a processor, cause the processor to execute the training method of the image processing model provided in an embodiment of the present invention.
All or part of the steps of the above embodiments may be implemented by hardware under the control of program instructions. The foregoing program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The foregoing storage medium includes various media capable of storing program code, such as a removable storage device, a random access memory (RAM), a read-only memory (ROM), a magnetic disk, or an optical disc.
Alternatively, when the above integrated unit of the present invention is implemented in the form of a software function module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present invention, or the part thereof contributing to the related art, may essentially be embodied in the form of a software product. The computer software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the methods described in the embodiments of the present invention. The foregoing storage medium includes various media capable of storing program code, such as a removable storage device, a RAM, a ROM, a magnetic disk, or an optical disc.
The above are merely embodiments of the present invention and are not intended to limit the protection scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and scope of the present invention shall fall within the protection scope of the present invention.
Claims (10)
1. A training method of an image processing model, wherein the image processing model includes a backbone network, a region proposal network, and a detection network, the method comprising:
performing, by the backbone network, feature extraction on a sample image containing a target object to obtain a feature map of the sample image;
performing, by the region proposal network, region selection on the feature map to determine candidate regions;
performing, by the detection network, target object detection on the candidate regions to obtain location parameters and a classification parameter of the target object, the classification parameter including a classification result of the target object, and the location parameters including a bounding box, a segmentation mask, and a key point mask of the target object;
determining a value of a target loss function of the image processing model based on the bounding box, the segmentation mask, the key point mask, and the classification result of the target object; and
updating model parameters of the image processing model based on the determined value of the target loss function.
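As a concrete illustration of the three claimed stages (feature extraction, region selection, detection), the following is a minimal NumPy sketch in which each sub-network is replaced by a trivial stand-in; the helper names `backbone`, `region_proposal`, and `detect` are hypothetical, and the claim does not prescribe any particular architecture.

```python
import numpy as np

# Hypothetical stand-ins for the three claimed sub-networks; the claim does
# not fix their architectures, so each stage is reduced to a trivial operation.

def backbone(sample_image):
    # Feature extraction: a 2x2 average pooling of the sample image.
    h, w = sample_image.shape
    return sample_image.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def region_proposal(feature_map, threshold=0.5):
    # Region selection: keep feature-map cells whose activation is high.
    return np.argwhere(feature_map > threshold)

def detect(feature_map, candidates):
    # Target object detection: a bounding box enclosing the candidate cells
    # (location parameter) and a dummy class label (classification parameter).
    ys, xs = candidates[:, 0], candidates[:, 1]
    box = (int(ys.min()), int(xs.min()), int(ys.max()), int(xs.max()))
    return box, "target"

image = np.zeros((4, 4))
image[2:, 2:] = 1.0                        # a bright "object" in one corner
features = backbone(image)                 # step 1: feature map
candidates = region_proposal(features)     # step 2: candidate regions
box, label = detect(features, candidates)  # step 3: location + classification
```

On this toy input the only foreground cell is the pooled corner, so `box` collapses to a single cell; a real model would add the mask and key point heads of the later claims on top of this skeleton.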
2. The method according to claim 1, wherein performing, by the region proposal network, region selection on the feature map to determine the candidate regions comprises:
generating, by the region proposal network, a plurality of initial bounding boxes corresponding to the feature map;
scanning the plurality of initial bounding boxes with a sliding window to determine the initial bounding boxes corresponding to the foreground; and
performing bounding-box regression on the initial bounding boxes corresponding to the foreground to determine the candidate regions.
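The foreground selection and bounding-box regression of claim 2 can be sketched as follows; the `(y, x, h, w)` anchor format and the score threshold are assumptions, and in practice the foreground scores would come from the region proposal network itself rather than being supplied by hand.

```python
import numpy as np

def foreground_anchors(scores, anchors, threshold=0.5):
    # Sliding-window scan reduced to a vectorised comparison: an initial
    # bounding box (anchor) is kept as foreground when its score is high.
    keep = scores > threshold
    return anchors[keep]

def regress(anchors, deltas):
    # Bounding-box regression: shift each surviving anchor by predicted
    # (dy, dx, dh, dw) offsets to obtain the candidate regions.
    y, x, h, w = anchors.T
    dy, dx, dh, dw = deltas.T
    return np.stack([y + dy * h, x + dx * w,
                     h * np.exp(dh), w * np.exp(dw)], axis=1)

anchors = np.array([[10.0, 10.0, 4.0, 4.0], [50.0, 50.0, 4.0, 4.0]])
scores = np.array([0.9, 0.1])          # only the first anchor is foreground
fg = foreground_anchors(scores, anchors)
candidates = regress(fg, np.zeros((len(fg), 4)))  # zero deltas: unchanged
```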
3. The method according to claim 1, further comprising:
intercepting, by the detection network, the feature area corresponding to each candidate region from the feature map to obtain a candidate feature region; and
adjusting the features of the candidate feature region to a fixed spatial size.
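The fixed-size adjustment of claim 3 corresponds, in detectors of this family, to an RoI pooling or RoIAlign step; a minimal nearest-neighbour version, built around a hypothetical `crop_and_resize` helper, might look like:

```python
import numpy as np

def crop_and_resize(feature_map, region, size=(2, 2)):
    # Intercept the candidate's feature area from the feature map...
    y0, x0, y1, x1 = region
    crop = feature_map[y0:y1, x0:x1]
    # ...then adjust it to a fixed spatial size by nearest-neighbour
    # sampling (the claim only requires *some* fixed-size adjustment).
    rows = np.linspace(0, crop.shape[0] - 1, size[0]).round().astype(int)
    cols = np.linspace(0, crop.shape[1] - 1, size[1]).round().astype(int)
    return crop[np.ix_(rows, cols)]

fm = np.arange(36).reshape(6, 6).astype(float)
fixed = crop_and_resize(fm, (0, 0, 4, 4))
```

Whatever the candidate region's size, the output has the fixed shape expected by the downstream fully connected head.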
4. The method according to claim 1, wherein performing, by the detection network, target object detection on the candidate regions to obtain the location parameters of the target object comprises:
performing, by a fully connected network included in the detection network, target object detection on each candidate region to determine the candidate regions containing the target object; and
performing bounding-box regression based on the candidate regions containing the target object to obtain the bounding box corresponding to the target object.
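A toy version of the fully connected detection head of claim 4; the random weights stand in for trained parameters and the 8-dimensional RoI feature vectors are an arbitrary assumption, so only the shapes of the outputs are meaningful here.

```python
import numpy as np

rng = np.random.default_rng(0)
W_cls = rng.normal(size=(8, 2))   # hypothetical fully connected weights:
W_box = rng.normal(size=(8, 4))   # classification head and regression head

def detect_head(roi_features):
    # Fully connected classification: which candidate regions contain
    # the target object (argmax over two logits per region).
    logits = roi_features @ W_cls
    contains = logits[:, 1] > logits[:, 0]
    # Bounding-box regression only on the regions containing the target.
    deltas = roi_features[contains] @ W_box
    return contains, deltas

rois = rng.normal(size=(3, 8))    # three candidate regions, 8-dim features
contains, deltas = detect_head(rois)
```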
5. The method according to claim 1, wherein performing, by the detection network, target object detection on the candidate regions to obtain the location parameters of the target object comprises:
performing, by a convolutional network included in the detection network, semantic segmentation of the target object on each candidate region to generate the segmentation mask corresponding to the target object.
6. The method according to claim 1, wherein performing, by the detection network, target object detection on the candidate regions to obtain the location parameters of the target object comprises:
performing, by a fully convolutional network included in the detection network, semantic segmentation of the key points of the target object on each candidate region to generate the key point mask of the key points of the target object.
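For the key point head of claim 6, a common realisation (for example in Mask R-CNN, cited in this application) is one heat map per key point whose maximum response defines a one-hot key point mask; the sketch below assumes that convention.

```python
import numpy as np

def keypoint_mask(heatmap):
    # The fully convolutional head predicts one heat map per key point;
    # the key point mask is one-hot at the maximum response, i.e. the
    # "semantic segmentation" covers a single pixel per key point.
    mask = np.zeros_like(heatmap, dtype=np.uint8)
    idx = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    mask[idx] = 1
    return mask

heat = np.array([[0.1, 0.2],
                 [0.9, 0.3]])     # strongest response at row 1, column 0
mask = keypoint_mask(heat)
```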
7. The method according to claim 1, wherein determining the value of the target loss function of the image processing model based on the bounding box, the segmentation mask, the key point mask, and the classification result of the target object comprises:
obtaining a first difference between the bounding box and a target bounding box, a second difference between the segmentation mask and a target segmentation mask, a third difference between the key point mask and a target key point mask, and a fourth difference between the classification result and a target classification result; and
determining the value of the loss function of the image processing model based on the first difference, the second difference, the third difference, and the fourth difference.
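The four differences of claim 7 can be combined into a single scalar loss; the particular distance functions below (absolute box error, mean mask error, 0/1 classification error) are illustrative choices, since the claim does not fix them.

```python
import numpy as np

def model_loss(box, box_t, seg, seg_t, kp, kp_t, cls, cls_t):
    # First difference: gap between predicted and target bounding box.
    d1 = np.abs(np.asarray(box) - np.asarray(box_t)).sum()
    # Second / third differences: per-pixel mask errors (cross-entropy is
    # common; a plain mean absolute difference keeps the sketch minimal).
    d2 = np.abs(seg - seg_t).mean()
    d3 = np.abs(kp - kp_t).mean()
    # Fourth difference: 0/1 classification disagreement.
    d4 = float(cls != cls_t)
    return d1 + d2 + d3 + d4

loss = model_loss([0, 0, 4, 4], [0, 0, 4, 4],
                  np.ones((2, 2)), np.ones((2, 2)),
                  np.zeros((2, 2)), np.zeros((2, 2)),
                  "cat", "cat")
```

When every prediction matches its target the loss is zero, and each mismatched term raises it, which is what the parameter update of claim 8 acts on.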
8. The method according to claim 1, wherein updating the model parameters of the image processing model based on the determined value of the target loss function comprises:
when the value of the loss function exceeds a preset threshold, determining a corresponding error signal based on the loss function of the image processing model; and
back-propagating the error signal through the image processing model, and updating the model parameters of the image processing model during the propagation.
9. A training device of an image processing model, the device comprising:
a feature extraction module configured to perform, by the backbone network, feature extraction on a sample image containing a target object to obtain a feature map of the sample image;
a region selection module configured to perform, by the region proposal network, region selection on the feature map to determine candidate regions;
an object detection module configured to perform, by the detection network, target object detection on the candidate regions to obtain location parameters and a classification parameter of the target object, the classification parameter including a classification result of the target object, and the location parameters including a bounding box, a segmentation mask, and a key point mask of the target object;
a loss determining module configured to determine a value of a target loss function of the image processing model based on the bounding box, the segmentation mask, the key point mask, and the classification result of the target object; and
a parameter updating module configured to update model parameters of the image processing model based on the determined value of the target loss function.
10. A storage medium storing executable instructions for causing a processor, when executing the instructions, to implement the training method of the image processing model according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910798468.2A CN110503097A (en) | 2019-08-27 | 2019-08-27 | Training method, device and the storage medium of image processing model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910798468.2A CN110503097A (en) | 2019-08-27 | 2019-08-27 | Training method, device and the storage medium of image processing model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110503097A true CN110503097A (en) | 2019-11-26 |
Family
ID=68589972
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910798468.2A Pending CN110503097A (en) | 2019-08-27 | 2019-08-27 | Training method, device and the storage medium of image processing model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110503097A (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1453747A (en) * | 2002-04-25 | 2003-11-05 | Microsoft Corporation | Cluster |
US6842538B2 (en) * | 2001-03-23 | 2005-01-11 | Shih-Jong J. Lee | Automatic detection of alignment or registration marks |
CN1630439A (en) * | 2004-05-06 | 2005-06-22 | AU Optronics Corporation | Separate type mask device for manufacturing OLED display |
CN1941850A (en) * | 2005-09-29 | 2007-04-04 | Institute of Automation, Chinese Academy of Sciences | Pedestrian tracking method based on principal-axis matching under multiple video cameras |
CN104835150A (en) * | 2015-04-23 | 2015-08-12 | Shenzhen University | Learning-based fundus blood vessel geometric key point image processing method and apparatus |
CN106203423A (en) * | 2016-06-26 | 2016-12-07 | Guangdong University of Foreign Studies | Weak-structure-aware visual target tracking method integrating context detection |
CN106887008A (en) * | 2017-01-04 | 2017-06-23 | Nubia Technology Co., Ltd. | Method, device and terminal for realizing interactive image segmentation |
EP3493105A1 (en) * | 2017-12-03 | 2019-06-05 | Facebook, Inc. | Optimizations for dynamic object instance detection, segmentation, and structure mapping |
Non-Patent Citations (5)
Title |
---|
KAIMING HE et al.: "Mask R-CNN", arXiv:1703.06870v3 *
是否龙磊磊真的一无所有: "CNN-based object detection methods (RCNN, Fast-RCNN, Faster-RCNN, Mask-RCNN, YOLO, SSD) for pedestrian detection and object tracking with convolutional neural networks", https://blog.csdn.net/qq_32998593/article/details/80558449 *
ZENG Xing et al.: "Implementation of an embedded human sitting-posture detection *** based on depth images", Computer Measurement & Control *
纸上得来终觉浅: "The RPN region algorithm in object detection", https://blog.csdn.net/qq_32172681/article/details/99104310 *
苦尽甘来定不负生而善之: "RPN", https://www.cnblogs.com/pacino12134/p/11409620.html *
Cited By (77)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111027621A (en) * | 2019-12-09 | 2020-04-17 | 上海扩博智能技术有限公司 | Training method, system, equipment and storage medium of image recognition model |
CN111127502A (en) * | 2019-12-10 | 2020-05-08 | 北京地平线机器人技术研发有限公司 | Method and device for generating instance mask and electronic equipment |
CN111127502B (en) * | 2019-12-10 | 2023-08-29 | 北京地平线机器人技术研发有限公司 | Method and device for generating instance mask and electronic equipment |
CN111160434B (en) * | 2019-12-19 | 2024-06-07 | 中国平安人寿保险股份有限公司 | Training method and device for target detection model and computer readable storage medium |
CN111160434A (en) * | 2019-12-19 | 2020-05-15 | 中国平安人寿保险股份有限公司 | Training method and device of target detection model and computer readable storage medium |
CN113096134A (en) * | 2020-01-09 | 2021-07-09 | 舜宇光学(浙江)研究院有限公司 | Real-time instance segmentation method based on single-stage network, system and electronic equipment thereof |
CN113139546A (en) * | 2020-01-19 | 2021-07-20 | 北京达佳互联信息技术有限公司 | Training method of image segmentation model, and image segmentation method and device |
CN111340092A (en) * | 2020-02-21 | 2020-06-26 | 浙江大华技术股份有限公司 | Target association processing method and device |
CN111340092B (en) * | 2020-02-21 | 2023-09-22 | 浙江大华技术股份有限公司 | Target association processing method and device |
CN111341438A (en) * | 2020-02-25 | 2020-06-26 | 中国科学技术大学 | Image processing apparatus, electronic device, and medium |
CN111008622B (en) * | 2020-03-11 | 2020-06-12 | 腾讯科技(深圳)有限公司 | Image object detection method and device and computer readable storage medium |
CN111008622A (en) * | 2020-03-11 | 2020-04-14 | 腾讯科技(深圳)有限公司 | Image object detection method and device and computer readable storage medium |
CN111428875A (en) * | 2020-03-11 | 2020-07-17 | 北京三快在线科技有限公司 | Image recognition method and device and corresponding model training method and device |
CN111401376B (en) * | 2020-03-12 | 2023-06-30 | 腾讯科技(深圳)有限公司 | Target detection method, target detection device, electronic equipment and storage medium |
CN111401376A (en) * | 2020-03-12 | 2020-07-10 | 腾讯科技(深圳)有限公司 | Target detection method, target detection device, electronic equipment and storage medium |
CN111488911A (en) * | 2020-03-15 | 2020-08-04 | 北京理工大学 | Image entity extraction method based on Mask R-CNN and GAN |
CN113496158A (en) * | 2020-03-20 | 2021-10-12 | 中移(上海)信息通信科技有限公司 | Object detection model optimization method, device, equipment and storage medium |
CN111462060A (en) * | 2020-03-24 | 2020-07-28 | 湖南大学 | Method and device for detecting standard section image in fetal ultrasonic image |
CN113449538A (en) * | 2020-03-24 | 2021-09-28 | 顺丰科技有限公司 | Visual model training method, device, equipment and storage medium |
CN113468908B (en) * | 2020-03-30 | 2024-05-10 | 北京四维图新科技股份有限公司 | Target identification method and device |
CN113468908A (en) * | 2020-03-30 | 2021-10-01 | 北京四维图新科技股份有限公司 | Target identification method and device |
CN111462094A (en) * | 2020-04-03 | 2020-07-28 | 联觉(深圳)科技有限公司 | PCBA component detection method and device and computer readable storage medium |
CN111709471A (en) * | 2020-06-12 | 2020-09-25 | 腾讯科技(深圳)有限公司 | Object detection model training method and object detection method and device |
CN113807147B (en) * | 2020-06-15 | 2024-05-21 | 北京达佳互联信息技术有限公司 | Target detection and network training method and device thereof |
CN113807147A (en) * | 2020-06-15 | 2021-12-17 | 北京达佳互联信息技术有限公司 | Target detection and network training method and device |
CN113822302A (en) * | 2020-06-18 | 2021-12-21 | 北京金山数字娱乐科技有限公司 | Training method and device for target detection model |
CN111932545A (en) * | 2020-07-14 | 2020-11-13 | 浙江大华技术股份有限公司 | Image processing method, target counting method and related device thereof |
CN111860522B (en) * | 2020-07-23 | 2024-02-02 | 中国平安人寿保险股份有限公司 | Identity card picture processing method, device, terminal and storage medium |
CN111860522A (en) * | 2020-07-23 | 2020-10-30 | 中国平安人寿保险股份有限公司 | Identity card picture processing method and device, terminal and storage medium |
CN111860413A (en) * | 2020-07-29 | 2020-10-30 | Oppo广东移动通信有限公司 | Target object detection method and device, electronic equipment and storage medium |
CN111754532B (en) * | 2020-08-12 | 2023-07-11 | 腾讯科技(深圳)有限公司 | Image segmentation model searching method, device, computer equipment and storage medium |
CN111754532A (en) * | 2020-08-12 | 2020-10-09 | 腾讯科技(深圳)有限公司 | Image segmentation model searching method and device, computer equipment and storage medium |
CN111985488A (en) * | 2020-09-01 | 2020-11-24 | 江苏方天电力技术有限公司 | Target detection segmentation method and system based on offline Gaussian model |
CN111985488B (en) * | 2020-09-01 | 2022-06-10 | 江苏方天电力技术有限公司 | Target detection segmentation method and system based on offline Gaussian model |
WO2022048151A1 (en) * | 2020-09-02 | 2022-03-10 | 北京迈格威科技有限公司 | Semantic segmentation model training method and apparatus, and image semantic segmentation method and apparatus |
CN112232346A (en) * | 2020-09-02 | 2021-01-15 | 北京迈格威科技有限公司 | Semantic segmentation model training method and device and image semantic segmentation method and device |
CN111932530B (en) * | 2020-09-18 | 2024-02-23 | 北京百度网讯科技有限公司 | Three-dimensional object detection method, device, equipment and readable storage medium |
CN111932530A (en) * | 2020-09-18 | 2020-11-13 | 北京百度网讯科技有限公司 | Three-dimensional object detection method, device and equipment and readable storage medium |
CN112200115A (en) * | 2020-10-21 | 2021-01-08 | 平安国际智慧城市科技股份有限公司 | Face recognition training method, recognition method, device, equipment and storage medium |
CN112200115B (en) * | 2020-10-21 | 2024-04-19 | 平安国际智慧城市科技股份有限公司 | Face recognition training method, recognition method, device, equipment and storage medium |
EP4181067A4 (en) * | 2020-11-30 | 2023-12-27 | Samsung Electronics Co., Ltd. | Device and method for ai encoding and ai decoding of image |
CN112434715A (en) * | 2020-12-10 | 2021-03-02 | 腾讯科技(深圳)有限公司 | Target identification method and device based on artificial intelligence and storage medium |
CN112613560A (en) * | 2020-12-24 | 2021-04-06 | 哈尔滨市科佳通用机电股份有限公司 | Method for identifying front opening and closing damage fault of railway bullet train head cover based on Faster R-CNN |
CN112581567B (en) * | 2020-12-25 | 2024-05-28 | 腾讯科技(深圳)有限公司 | Image processing method, device, electronic equipment and computer readable storage medium |
CN112581567A (en) * | 2020-12-25 | 2021-03-30 | 腾讯科技(深圳)有限公司 | Image processing method, image processing device, electronic equipment and computer readable storage medium |
CN113591893A (en) * | 2021-01-26 | 2021-11-02 | 腾讯医疗健康(深圳)有限公司 | Image processing method and device based on artificial intelligence and computer equipment |
US12002254B2 (en) | 2021-02-26 | 2024-06-04 | Boe Technology Group Co., Ltd. | Method and apparatus of training object detection network and object detection method and apparatus |
WO2022178833A1 (en) * | 2021-02-26 | 2022-09-01 | 京东方科技集团股份有限公司 | Target detection network training method, target detection method, and apparatus |
CN112967200A (en) * | 2021-03-05 | 2021-06-15 | 北京字跳网络技术有限公司 | Image processing method, apparatus, electronic device, medium, and computer program product |
CN112949510A (en) * | 2021-03-08 | 2021-06-11 | 香港理工大学深圳研究院 | Human detection method based on fast R-CNN thermal infrared image |
WO2022213718A1 (en) * | 2021-04-07 | 2022-10-13 | 北京百度网讯科技有限公司 | Sample image increment method, image detection model training method, and image detection method |
CN113139441A (en) * | 2021-04-07 | 2021-07-20 | 青岛以萨数据技术有限公司 | Image processing method and system |
CN113421275A (en) * | 2021-05-13 | 2021-09-21 | 影石创新科技股份有限公司 | Image processing method, image processing device, computer equipment and storage medium |
CN113111872A (en) * | 2021-06-16 | 2021-07-13 | 智道网联科技(北京)有限公司 | Training method and device of image recognition model, electronic equipment and storage medium |
CN113111872B (en) * | 2021-06-16 | 2022-04-05 | 智道网联科技(北京)有限公司 | Training method and device of image recognition model, electronic equipment and storage medium |
CN113673505A (en) * | 2021-06-29 | 2021-11-19 | 北京旷视科技有限公司 | Example segmentation model training method, device and system and storage medium |
CN113313720A (en) * | 2021-06-30 | 2021-08-27 | 上海商汤科技开发有限公司 | Object segmentation method and device |
CN113313720B (en) * | 2021-06-30 | 2024-03-29 | 上海商汤科技开发有限公司 | Object segmentation method and device |
CN113470124A (en) * | 2021-06-30 | 2021-10-01 | 北京达佳互联信息技术有限公司 | Training method and device of special effect model and special effect generation method and device |
CN113470124B (en) * | 2021-06-30 | 2023-09-22 | 北京达佳互联信息技术有限公司 | Training method and device for special effect model, and special effect generation method and device |
CN113695256A (en) * | 2021-08-18 | 2021-11-26 | 国网江苏省电力有限公司电力科学研究院 | Power grid foreign matter detection and identification method and device |
CN113695256B (en) * | 2021-08-18 | 2023-05-23 | 国网江苏省电力有限公司电力科学研究院 | Power grid foreign matter detection and identification method and device |
CN113628208A (en) * | 2021-08-30 | 2021-11-09 | 北京中星天视科技有限公司 | Ship detection method, device, electronic equipment and computer readable medium |
CN113628208B (en) * | 2021-08-30 | 2024-02-06 | 北京中星天视科技有限公司 | Ship detection method, device, electronic equipment and computer readable medium |
CN113837205A (en) * | 2021-09-28 | 2021-12-24 | 北京有竹居网络技术有限公司 | Method, apparatus, device and medium for image feature representation generation |
CN114708569A (en) * | 2022-02-22 | 2022-07-05 | 广州文远知行科技有限公司 | Road curve detection method, device, equipment and storage medium |
CN114419337A (en) * | 2022-03-25 | 2022-04-29 | 阿里巴巴达摩院(杭州)科技有限公司 | Image detection method, three-dimensional modeling method, image analysis method and device |
CN115115825A (en) * | 2022-05-27 | 2022-09-27 | 腾讯科技(深圳)有限公司 | Method and device for detecting object in image, computer equipment and storage medium |
CN115115825B (en) * | 2022-05-27 | 2024-05-03 | 腾讯科技(深圳)有限公司 | Method, device, computer equipment and storage medium for detecting object in image |
CN115049899B (en) * | 2022-08-16 | 2022-11-11 | 粤港澳大湾区数字经济研究院(福田) | Model training method, reference expression generation method and related equipment |
CN115049899A (en) * | 2022-08-16 | 2022-09-13 | 粤港澳大湾区数字经济研究院(福田) | Model training method, reference expression generation method and related equipment |
WO2024050207A1 (en) * | 2022-08-27 | 2024-03-07 | Qualcomm Incorporated | Online adaptation of segmentation machine learning systems |
CN116563665A (en) * | 2023-04-25 | 2023-08-08 | 北京百度网讯科技有限公司 | Training method of target detection model, target detection method, device and equipment |
CN116152758A (en) * | 2023-04-25 | 2023-05-23 | 松立控股集团股份有限公司 | Intelligent real-time accident detection and vehicle tracking method |
CN116523903A (en) * | 2023-06-25 | 2023-08-01 | 天津师范大学 | Multi-mode fracture injury detection and identification method and system |
CN117523320B (en) * | 2024-01-03 | 2024-05-24 | 深圳金三立视频科技股份有限公司 | Image classification model training method and terminal based on key points |
CN117523320A (en) * | 2024-01-03 | 2024-02-06 | 深圳金三立视频科技股份有限公司 | Image classification model training method and terminal based on key points |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110503097A (en) | Training method, device and the storage medium of image processing model | |
CN110163640B (en) | Method for implanting advertisement in video and computer equipment | |
CN111670457B (en) | Optimization of dynamic object instance detection, segmentation and structure mapping | |
US20190172223A1 (en) | Optimizations for Dynamic Object Instance Detection, Segmentation, and Structure Mapping | |
CN110390644A (en) | Adding greater realism to computer-generated images by smoothing jagged edges | |
CN108520229A (en) | Image detecting method, device, electronic equipment and computer-readable medium | |
CN108229355A (en) | Activity recognition method and apparatus, electronic equipment, computer storage media, program | |
CN108229303A (en) | Detection and recognition method, network training method for detection and recognition, and device, equipment, and medium | |
CN110321952A (en) | Training method for an image classification model and related device | |
CN109165645A (en) | Image processing method, apparatus, and related device | |
EP3493106A1 (en) | Optimizations for dynamic object instance detection, segmentation, and structure mapping | |
CN111274981B (en) | Target detection network construction method and device and target detection method | |
CN108280451A (en) | Semantic segmentation and network training method and device, equipment, medium, program | |
WO2019108250A1 (en) | Optimizations for dynamic object instance detection, segmentation, and structure mapping | |
JPWO2020240808A1 (en) | Learning device, classification device, learning method, classification method, learning program, and classification program | |
US20200320165A1 (en) | Techniques for generating templates from reference single page graphic images | |
CN112101344B (en) | Video text tracking method and device | |
CN112200041A (en) | Video motion recognition method and device, storage medium and electronic equipment | |
Han et al. | TSR-VFD: Generating temporal super-resolution for unsteady vector field data | |
CN109389660A (en) | Image generating method and device | |
US8370115B2 (en) | Systems and methods of improved boolean forms | |
Li et al. | Weakly supervised segmentation loss based on graph cuts and superpixel algorithm | |
Zhao et al. | Modified object detection method based on YOLO | |
Absetan et al. | Integration of deep learned and handcrafted features for image retargeting quality assessment | |
US20230072445A1 (en) | Self-supervised video representation learning by exploring spatiotemporal continuity |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |