CN114219757A - Vehicle intelligent loss assessment method based on improved Mask R-CNN - Google Patents
- Publication number: CN114219757A (application CN202111311347.4A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06T7/0008 — Industrial image inspection checking presence/absence
- G06F18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/2431 — Classification techniques relating to multiple classes
- G06N3/045 — Combinations of networks
- G06N3/084 — Backpropagation, e.g. using gradient descent
- G06T7/73 — Determining position or orientation of objects or cameras using feature-based methods
- G06T2207/20081 — Training; Learning
- G06T2207/20084 — Artificial neural networks [ANN]
- G06T2207/30252 — Vehicle exterior; Vicinity of vehicle
Abstract
The invention discloses an intelligent vehicle loss assessment method based on an improved Mask R-CNN, comprising the following steps: S1, label the damage types and parts in vehicle damage pictures, build an annotated data set in COCO format, and divide it into a training set and a test set; S2, construct a multi-detection model, namely an improved Mask R-CNN, where the improvements comprise replacing the 3 × 3 convolutions of the feature extraction network with DCNv2, replacing the interpolation upsampling method with the CARAFE sampling method, adding a part-classification branch after the RPN network, and replacing the fully connected head used for bounding box regression in the detection head with a convolution head; S3, feed the training set into the multi-detection model for training to obtain a weight file; S4, detect damaged-vehicle pictures based on the weight file to obtain the final damage assessment picture. The invention outputs the damage type and the part type synchronously with a single model, which is efficient and concise, and the improvements raise the accuracy and recall of the model.
Description
Technical Field
The invention relates to the technical field of target detection and example segmentation, in particular to an intelligent loss assessment method for a vehicle based on improved Mask R-CNN.
Background
The traditional automobile damage assessment process is cumbersome, the processing cycle is long, assessment results vary with the professional skill of the assessors, and labor accounts for a large share of the cost, so the automobile insurance industry has an urgent need for intelligent damage assessment. Thanks to growing computing power, larger data sets and mature learning algorithms, deep learning is increasingly applied to practical problems. Intelligent vehicle damage assessment applies deep learning based image recognition and detection to quickly judge the damaged parts, damage types and damage degree of a vehicle from appearance pictures uploaded by the owner or an assessor. Using artificial intelligence in place of human eyes and judgment makes determining the damaged parts, types and degrees more convenient and accurate, and simplifies the whole damage assessment process.
Most existing intelligent vehicle damage assessment methods involve two detection-and-segmentation models: one detects the damage, the other detects the part where the damage is located. Whether the two models are connected in series or in parallel, each arrangement has drawbacks: the serial arrangement makes detection inefficient, while the parallel arrangement is prone to errors when the damage detection result and the part detection result are combined. In addition, in the complex scenes of vehicle damage identification, the varied, multi-scale characteristics of damage and parts pose great challenges to existing detection and segmentation models.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an intelligent vehicle damage assessment method based on an improved Mask R-CNN, which outputs damage detection and part identification synchronously, effectively improves model accuracy and recall, and reduces false and missed detections of vehicle damage.
In order to achieve the purpose, the technical scheme provided by the invention is as follows: an intelligent vehicle loss assessment method based on improved Mask R-CNN comprises the following steps:
s1, after the damage type and the part marking are carried out on the vehicle damage picture, a marking data set in a coco format is manufactured, and a training set and a testing set are divided;
s2, constructing a multi-detection model, namely an improved Mask R-CNN; the improvements to Mask R-CNN comprise replacing the 3 × 3 convolutions of the feature extraction network with DCNv2, replacing the interpolation upsampling method with the CARAFE sampling method, adding a part-classification branch after the RPN network, and replacing the fully connected head used for bounding box regression in the detection head with a convolution head;
s3, sending the training set into a multi-detection model for training to obtain a weight file;
and S4, detecting the damaged vehicle picture based on the obtained weight file to obtain a final damage assessment picture.
Further, in step S1, annotation software is used to mark each damage in the vehicle damage picture, including its contour, damage type, and the type of the part where the damage is located; after labeling, a json file containing the annotation information is obtained, the json files and the corresponding original image files are divided into a training set and a test set at a ratio of 9:1, and both are converted into the COCO data set format.
Further, in step S2, the 3 × 3 convolutions of the feature extraction network part of the model are changed to DCNv2, specifically: the 3 × 3 convolutions of the sub-modules in stage3, stage4 and stage5 of the feature extraction network ResNet50 are replaced with DCNv2; DCNv2 is an improved version of DCN which, on top of DCN, not only adds an offset to each sampling position but also assigns each sampling position a different weight, further enhancing the ability to model geometric transformations;
the interpolation upsampling method is replaced by the CARAFE sampling method, so that the error caused by the upsampling operation in the model can be effectively reducedThe difference is specifically: for a feature map with a shape of H W C, H being the length of the feature map, W being the width of the feature map, and C being the number of channels of the feature map, first, it is compressed to C by a convolution with 1W 1m,Cm<C, reducing the calculation amount of the subsequent steps, and then utilizing a kencoder*kencoderPredicting the upsampled kernel, k, by the convolutional layer ofencoderFor the convolution kernel size, the number of input channels is CmThe number of output channels is sigma2*kup 2σ is the up-sampling magnification, kupFor up-sampling kernel size, channel dimensions are expanded in the spatial dimension to obtain a shape of σ H σ W kup 2The upsampling kernel is normalized by utilizing softmax, each pixel point in the output characteristic diagram is mapped back to the input characteristic diagram, and k taking the k as the center is taken outup*kupPerforming dot product on the predicted upsampling kernel of the pixel point to obtain an output value, wherein different channels at the same position share the same upsampling kernel;
adding a branch of part classification behind the RPN, and paralleling the branch of damage classification with the branch of part classification, specifically: after passing through the RPN, the characteristic diagram is classified by two full-connection layer outputs, the two full-connection layers form a full-connection head, damage and parts are classified into two categories, and the output of two branches is needed, so that a branch is added behind the full-connection layer so as to simultaneously output the damage category and the category of the parts where the damage is located, and the branch of the damage category shares parameters and the full-connection head with the branch of the part category;
The fully connected head used for bounding box regression in the detection head is replaced with a convolution head, specifically: in the original Mask R-CNN model, regression and classification of the detection box share a fully connected head; the detection box regression is removed from the fully connected head, and a convolution head consisting of 3 residual modules and 2 Non-local modules is added after the RPN network for detection box regression.
Further, the step S3 includes the steps of:
s301, adjusting the input picture to a specified size: the maximum side is set to 1333 pixels, the minimum side is randomly selected from (640, 672, 704, 736, 768, 800), and the image is randomly flipped horizontally with a probability of 0.5;
s302, the multi-detection model loads the training set;
s303, setting training times and learning rate parameters to start training the model;
s304, calculating Loss by the model, performing back propagation to update the weight, and storing the final model weight after training to obtain a weight file; wherein Loss is expressed as:
Loss = L_damages + L_components + L_bbox + L_mask

where L_damages denotes the damage classification loss, L_components the part classification loss, L_bbox the detection box regression loss, and L_mask the pixel segmentation mask loss; i is the index of an anchor box; p_i* indicates whether the i-th anchor box contains a target: p_i* equals 1 when the i-th anchor box contains a target and 0 when the i-th anchor box is background; p_i denotes the predicted probability that the i-th anchor box contains a target; v_i is the parameterized vector of the center coordinates, width and height predicted for the i-th anchor box, and v_i* is the parameterized vector of the i-th ground truth bounding box; n_c is the number of damage classes, n_ij is the number of pixels belonging to class i but predicted as class j, n_ii is the number of pixels belonging to and predicted as class i, i.e. the number of pixels of class i predicted correctly, and n_i is the total number of detection boxes predicted for class-i damage; N_damages, N_components and N_bbox are constants used for normalization.
Further, the step S4 includes the steps of:
s401, the multi-detection model loads the test set, i.e. the vehicle damage pictures to be detected, and detects them based on the weight file obtained from model training;
s402, filtering out duplicate detection boxes using the non-maximum suppression (NMS) algorithm;
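For reference, the greedy NMS step named above can be sketched as follows. This is a minimal NumPy version with an assumed IoU threshold of 0.5, not the patent's actual implementation:

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression over [x1, y1, x2, y2] boxes."""
    order = scores.argsort()[::-1]          # indices by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # IoU of the top-scoring box with the remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_threshold]  # drop overlapping boxes
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], float)
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores, iou_threshold=0.5)
```

Here the second box heavily overlaps the first and is suppressed, while the distant third box survives.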
s403, the model detection head outputs the damage class and score, the part class, the detection box and the damage mask; detection boxes whose damage classification score is below a set score threshold are then filtered out, and the remaining detection boxes, together with their damage classes and scores, part classes and damage masks, are overlaid on the original picture to obtain the final damage assessment picture;
s404, evaluating the detection result and computing the corresponding evaluation indices: Precision, Recall, F1-score and mean average precision (mAP).
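The first three indices follow directly from the counts of true positives, false positives and false negatives; a minimal sketch, with purely illustrative counts rather than results from the patent:

```python
# Precision, recall and F1 from detection counts (tp/fp/fn illustrative).
def prf1(tp, fp, fn):
    precision = tp / (tp + fp)          # fraction of detections that are correct
    recall = tp / (tp + fn)             # fraction of ground truths found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

p, r, f1 = prf1(tp=80, fp=20, fn=20)
```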
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. By adding the part detection branch, the number of models is reduced, the execution efficiency of the algorithm is improved, and damage detection and identification of the part where the damage is located are realized simultaneously. With only one model, the damage class, score, part class, damage detection box and damage mask are output at the same time, which is efficient and concise.
2. The invention uses the DCNv2 convolution method; introducing offsets is equivalent to augmenting the data set, so the recognition performance of the model improves even when the annotated data set is limited.
3. The invention reduces the error in the sampling process by adopting the CARAFE sampling method.
4. According to the invention, the full-connection head part for detecting frame regression is replaced by the convolution head part, so that the positioning accuracy is further improved.
Drawings
FIG. 1 is an overall flow diagram of the method of the present invention.
Fig. 2 is a structural diagram of a multi-detection model.
Fig. 3 is a structural diagram of a convolution header.
FIG. 4 is a block diagram of Non-local modules.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
As shown in fig. 1, the vehicle intelligent damage assessment method based on the improved Mask R-CNN provided in this embodiment includes the following specific implementation steps:
s1, after the damage type labeling and the part labeling are carried out on the vehicle damage picture, a coco-format labeling data set is manufactured, and a training set and a testing set are divided, wherein the coco-format labeling data set specifically comprises the following steps:
finding out all the damages in the vehicle picture, using the polygon marking in the labelme marking tool to circle the outline of the damages, selecting the corresponding damage type in the type part, and selecting the corresponding part type in the group _ id part. The original group _ id is different examples in the same class, and is filled as the id of the part (the range is 0-31, corresponding to 32 parts). And after the labeling is finished, generating a label file in a json format corresponding to the picture.
All json files and the corresponding original image files are divided into a training set and a test set at a ratio of 9:1. All json files in the training set are then merged into one COCO-format json file, which differs slightly from a standard COCO file: the group_id in the original json files is converted into the part category, component_id, in the COCO json file, while the original category_id is used as the id of the damage.
The test set is processed in the same way;
because the data set is slightly different from the common data set, the model is also modified properly when the data set is loaded, so that the model can be loaded with not only the damage types but also the part types.
S2, constructing a multi-detection model which is an improved Mask R-CNN.
As shown in fig. 2, the multi-detection model is composed of a feature extraction network, a feature fusion network, a region proposal network (RPN), a region-of-interest alignment (RoI Align) module, a multi-branch classification sub-network, a detection box regression sub-network, and a mask sub-network. The feature extraction network ResNet50 extracts features of different scales (C1, C2, C3, C4 and C5) from the input image through 5 stages; these are laterally connected into the feature fusion network FPN, which fuses the different scales into (P5, P4, P3 and P2), each layer followed by a 3 × 3 convolution to eliminate the aliasing effect caused by upsampling. The feature map then enters the RPN to extract candidate region boxes (proposals). After the RoI Align operation, the feature maps are transformed to the same 7 × 7 size with 256 channels. The damage classes and component classes are then output through a fully connected (fc) head of size 1 × 1024 followed by 1 × (number of classes) classification layers. In parallel, the 7 × 7 × 256 feature maps pass through a convolution (conv) head and a fully connected layer to generate the detection box. The mask branch is processed by 4 convolution layers and two deconvolution layers to finally obtain the damage mask of size 28 × 28 × (number of classes).
The 3 × 3 convolutions of the feature extraction network are replaced with DCNv2, enhancing the modeling of geometric transformations; the CARAFE sampling method is adopted to reduce errors in the sampling process; a classification branch is added after the RPN to realize joint detection of damage and parts; and the fully connected head used for regression is replaced with a convolution head, reducing localization error.
Compared with the method for detecting the damage and the part by using two models respectively, the method has the advantages that the detection results of the part where the damage and the part are located are directly output through one model, an additional combination process is not needed, and the method is efficient and concise.
The 3 × 3 convolution of the feature extraction network part is replaced by DCNv2, which is specifically as follows:
the feature extraction network ResNet50 is composed of 50 convolutional layers of 18 sub-modules, which can be divided into 5 stages. Each submodule consists of one 1 × 1 convolution, 13 × 3 convolution, one 1 × 1 convolution and one residual concatenation. The convolution of 3 × 3 of the neutron modules in stage3, stage4 and stage5 was all changed to DCNv 2.
DCNv2 is an improved version of DCN. DCN is a deformable convolution that enhances the feature extraction capability of the network by adding an offset to the convolutional layer. The offsets are learned under the guidance of supervision, so that when extracting features the network can focus more attention on positions related to the training targets and better cover targets of different sizes and shapes. On top of DCN, DCNv2 not only adds an offset to each sampling position but also assigns each sampling position a different weight, further enhancing the ability to model geometric transformations.
After the feature extraction network ResNet50 has extracted features of different scales, feature maps of different sizes carrying high-level semantic information are constructed through a top-down network structure with lateral connections (FPN). The top-down pathway consists mainly of upsampling operations; the original model uses nearest-neighbor interpolation, in which each missing pixel of the enlarged picture simply copies the color of its nearest original pixel, and this copying of neighboring pixels produces clearly visible jagged artifacts. Replacing interpolation upsampling with CARAFE effectively reduces the error introduced by the upsampling operation.
The CARAFE sampling method specifically comprises the following steps:
For a feature map of shape H × W × C, where H is the height, W the width and C the number of channels, the map is first compressed to C_m channels (C_m < C) by a 1 × 1 convolution to reduce the computation of the subsequent steps. A convolutional layer with kernel size k_encoder × k_encoder, C_m input channels and σ²·k_up² output channels then predicts the upsampling kernels, where σ is the upsampling ratio and k_up the upsampling kernel size. The channel dimension is unfolded into the spatial dimension to obtain upsampling kernels of shape σH × σW × k_up², which are normalized with softmax. Each pixel in the output feature map is mapped back to the input feature map, the k_up × k_up region centered on it is taken out, and its dot product with that pixel's predicted upsampling kernel gives the output value; all channels at the same position share the same upsampling kernel.
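The reassembly step above can be sketched as follows. This is a toy NumPy version with assumed σ = 2 and k_up = 3; for simplicity the softmax-normalized kernels are random here, whereas CARAFE predicts them with the small convolutional encoder described in the text:

```python
import numpy as np

def carafe_reassemble(x, kernels, sigma=2, k_up=3):
    """x: (H, W, C) feature map; kernels: (sigma*H, sigma*W, k_up*k_up),
    softmax-normalized per position. Returns (sigma*H, sigma*W, C)."""
    H, W, C = x.shape
    r = k_up // 2
    pad = np.pad(x, ((r, r), (r, r), (0, 0)), mode="edge")
    out = np.zeros((sigma * H, sigma * W, C))
    for i in range(sigma * H):
        for j in range(sigma * W):
            si, sj = i // sigma, j // sigma        # map back to the input map
            patch = pad[si:si + k_up, sj:sj + k_up, :]   # k_up x k_up x C region
            w = kernels[i, j].reshape(k_up, k_up, 1)     # shared by all channels
            out[i, j] = (patch * w).sum(axis=(0, 1))     # dot product -> output
    return out

H, W, C, sigma, k_up = 4, 4, 2, 2, 3
x = np.random.rand(H, W, C)
logits = np.random.rand(sigma * H, sigma * W, k_up * k_up)
kernels = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)  # softmax
y = carafe_reassemble(x, kernels, sigma, k_up)
```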
After the RPN network, a branch of part classification is added, specifically as follows:
after passing through the RPN network, the feature map is processed through a Roi Align operation to obtain a feature map with a fixed size of 7 × 256. And then the damage classification result can be output through the two full connection layers. In order to output the part classification result of the damage at the same time, a part branch is added, the part branch and the damaged classification branch share a full-connection head, and weight is shared.
The specific process of the RPN network is as follows:
each pixel point firstly generates 9 anchor frames with different length-width ratios, the characteristic graph is changed into 256 × 16 after being convoluted by 3 × 3, then an 18 × 16 characteristic graph and a 36 × 16 characteristic graph are respectively obtained after being convoluted by two times of 1 × 1, namely 16 × 9 results, each result comprises 2 scores and 4 coordinates, the two scores are the scores of the foreground and the background respectively, the anchor frame with the foreground score being more than 0.5 is selected and reserved as a positive sample according to the scores, and the anchor frame with the background score being more than 0.5 is selected and reserved as a negative sample.
The RoI Align operation resizes the feature map to a fixed size, mapping the original image onto the feature map and then onto the fixed-size feature map. When the feature map is mapped to a fixed size, the proportional calculation may yield fractional coordinates, but pixels have no fractional positions. Rounding directly would introduce an error that is greatly amplified when fed back to the original image. RoI Align instead treats a fractional coordinate, which does not fall on a real pixel, as a virtual point and computes its value by bilinear interpolation from the neighboring pixels, thereby avoiding the error caused by direct rounding.
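The bilinear interpolation at the heart of RoI Align can be sketched for a single fractional coordinate (a minimal sketch, not the full pooling operation):

```python
# Sample a 2-D feature map at fractional (x, y) by blending the four
# neighbouring pixels, as RoI Align does instead of rounding.
def bilinear_sample(fmap, x, y):
    x0, y0 = int(x), int(y)
    x1, y1 = x0 + 1, y0 + 1
    dx, dy = x - x0, y - y0
    return (fmap[y0][x0] * (1 - dx) * (1 - dy) +
            fmap[y0][x1] * dx * (1 - dy) +
            fmap[y1][x0] * (1 - dx) * dy +
            fmap[y1][x1] * dx * dy)

fmap = [[0.0, 1.0], [2.0, 3.0]]
v = bilinear_sample(fmap, 0.5, 0.5)   # centre of the 2x2 grid
```

At the exact centre of the 2 × 2 grid the result is the mean of the four corners.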
The full-connection header used for regression in the detection header is replaced by a convolution header, and the method specifically comprises the following steps:
in the Mask R-CNN model, the regression and classification of the detection frame share a fully connected header, and the detection header comprises the regression and classification of the detection frame and outputs a Mask. Research shows that the fully connected header is suitable for classification tasks, and the convolution header is more suitable for regression tasks. Therefore, the regression of the detection frame is removed from the full-connection header, and a convolution header consisting of 3 residual blocks and 2 Non-local blocks is added from the back of the RPN network for regression of the detection frame. After the Mask branch takes the front detection frame, the object is divided on the basis of the front detection frame, firstly, each frame generates a feature map of 14 × 256 through convolution, after multiple times of convolution, the feature map is changed into 28 × 256 through deconvolution operation, and finally, a Mask of 28 × classification number is output. The detection head needs to output the contents of the damage category and the score, the part category where the damage exists, the detection frame and the damage mask.
As shown in fig. 3, the convolution head is composed of 3 residual (Residual) modules and 2 Non-local neural network (Non-local) modules. In principle a deeper network is better, but a deeper network also takes longer to detect; to balance speed and accuracy, 3 residual modules and 2 Non-local modules are chosen. The first residual module changes the dimension of the feature map and consists of a 1 × 1 convolution, a 3 × 3 convolution and a residual connection with a 1 × 1 convolution. The last two residual modules are identical to the sub-modules in ResNet, consisting of a 1 × 1 convolution, a 3 × 3 convolution, a 1 × 1 convolution and a residual connection, with the dimensions unchanged.
As shown in FIG. 4, the Non-local module is as follows:
The input feature map x has dimensions N × H × W × C. Three 1 × 1 convolutions with C/2 output channels produce the feature maps k, q and v, each of dimension N × H × W × C/2; reducing the channels to C/2 improves the efficiency of the subsequent computation. k and q are reshaped and matrix-multiplied to give an N × HW × HW output, which is processed with softmax and then matrix-multiplied with v (the third branch) to give an N × HW × C/2 output. This is reshaped to N × H × W × C/2 and passed through a 1 × 1 convolutional layer with C output channels, and the result is residually added to the original input x, keeping the output consistent with the input dimensions.
The Non-local module serves as a simple, efficient and general component for capturing long-range dependencies in a neural network. Unlike convolution with its limited receptive field, Non-local can take a weighted sum of the features at all positions in the feature map as the response at a given position, and is not limited to neighboring points. It achieves very good performance with a small number of layers.
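The Non-local computation can be sketched in NumPy as below (batch dimension dropped for brevity; the 1 × 1 convolutions become plain channel-wise matrix multiplications, and the random weights are stand-ins for learned parameters):

```python
import numpy as np

def non_local(x, wk, wq, wv, wo):
    """Simplified Non-local block on an (H, W, C) map.
    wk/wq/wv: (C, C//2) weights of the three 1x1 convs; wo: (C//2, C)."""
    H, W, C = x.shape
    flat = x.reshape(H * W, C)
    k, q, v = flat @ wk, flat @ wq, flat @ wv     # each (HW, C/2)
    attn = q @ k.T                                 # (HW, HW) pairwise similarity
    attn = np.exp(attn - attn.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)        # softmax over all positions
    out = (attn @ v) @ wo                          # aggregate, back to C channels
    return x + out.reshape(H, W, C)                # residual connection

H, W, C = 4, 4, 8
rng = np.random.default_rng(0)
x = rng.standard_normal((H, W, C))
wk, wq, wv = (rng.standard_normal((C, C // 2)) * 0.1 for _ in range(3))
wo = rng.standard_normal((C // 2, C)) * 0.1
y = non_local(x, wk, wq, wv, wo)
```

Note how every output position attends to all H × W positions, unlike a convolution's local window.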
And S3, sending the training set into a multi-detection model for training to obtain a weight file.
The training set is first augmented. Data annotation is a very tedious task, and model training often relies on large amounts of data. Data augmentation expands the effective size of the data set, so a limited data set can be put to greater use. The augmentation operations used in this model include:
resize: within a batch, the input size is fixed, so the pictures in a batch usually need to be resized to the same size. max_size is set to 1333; the original min_size is 800, but min_size is instead selected randomly from (640, 672, 704, 736, 768, 800) in order to act as a form of data-set expansion. Experiments show that this effectively improves the detection performance of the model;
horizontal_flip: the pictures are randomly horizontally flipped with a probability of 0.5.
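The resize-and-flip scheme above can be sketched as follows; this is an illustrative helper (the function name and signature are not from the patent) that only computes the target size and flip decision:

```python
import random

def augment(width, height,
            min_sizes=(640, 672, 704, 736, 768, 800), max_size=1333):
    """Sketch of the resize rule above: the short side is scaled to a randomly
    chosen min_size, but the long side is capped at max_size; the picture is
    also horizontally flipped with probability 0.5."""
    min_size = random.choice(min_sizes)
    scale = min_size / min(width, height)
    if max(width, height) * scale > max_size:   # long side would overflow the cap
        scale = max_size / max(width, height)
    flip = random.random() < 0.5                # horizontal flip with p = 0.5
    return round(width * scale), round(height * scale), flip
```

For a 1920 × 1080 image with min_size 800, the long side would become 1422 pixels, so the cap takes over and the image is scaled so its long side is exactly 1333.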
After parameters such as the number of training iterations and the learning rate are set, training of the model begins, specifically as follows:
Before training, the number of iterations is set to 90000, which experiments showed to be optimal. batch_size is set to 4 and can be adjusted according to GPU memory. The initial learning rate lr follows lr × batch_size = 0.01. To achieve a better training effect, a warm_up method is adopted for adjusting the learning rate, and the learning rate is multiplied by 0.1 at 60000 iterations and again by 0.1 at 80000 iterations.
The warm_up method is a learning-rate warm-up strategy. When training starts, the model weights are randomly initialized, and a large learning rate can make training unstable and cause oscillation. Warm-up therefore uses a smaller learning rate at the start of training, and switches to the preset learning rate once the model is relatively stable, which leads to better convergence.
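The warm-up plus step-decay schedule can be sketched as below. The warm-up length (500 steps) and warm-up factor are illustrative assumptions, not values stated in the patent; the 60000/80000 decay points and the ×0.1 factor follow the description above:

```python
def learning_rate(step, base_lr=0.01, warmup_steps=500, warmup_factor=0.001,
                  decay_steps=(60000, 80000), gamma=0.1):
    """Warm-up then step decay. warmup_steps and warmup_factor are
    illustrative; decay at 60k and 80k iterations follows the text."""
    if step < warmup_steps:
        # linearly ramp from base_lr * warmup_factor up to base_lr
        alpha = step / warmup_steps
        return base_lr * (warmup_factor * (1 - alpha) + alpha)
    lr = base_lr
    for d in decay_steps:
        if step >= d:
            lr *= gamma  # multiply by 0.1 at each decay point
    return lr
```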
To speed up model convergence and prevent overfitting, the ResNet50 pre-trained model parameters are used as the initial weights of the feature extraction network. Data are then loaded from the training set and training begins with the set parameters.
During training, the loss between the output of the detection heads and the labels is computed:
The classification of damage and of parts uses a cross-entropy loss function; detection-box regression uses the smooth_L1 loss function; the mask uses a binary cross-entropy loss function. The losses are summed, gradients are back-propagated, and the model weights are updated.
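The three loss types and their sum can be sketched with NumPy for a single example; these are textbook definitions of the named losses, not the patent's code, and the function names are illustrative:

```python
import numpy as np

def cross_entropy(probs, label):
    """Multi-class cross-entropy for one sample (damage / part classification)."""
    return -np.log(probs[label])

def smooth_l1(pred, target):
    """smooth_L1 loss for detection-box regression."""
    d = np.abs(pred - target)
    return np.where(d < 1, 0.5 * d ** 2, d - 0.5).sum()

def binary_cross_entropy(p, t):
    """Per-pixel binary cross-entropy for the mask branch."""
    return -(t * np.log(p) + (1 - t) * np.log(1 - p)).mean()

def total_loss(damage_probs, damage_label, part_probs, part_label,
               box_pred, box_target, mask_pred, mask_target):
    # Loss = L_damages + L_components + L_bbox + L_mask
    return (cross_entropy(damage_probs, damage_label)
            + cross_entropy(part_probs, part_label)
            + smooth_l1(box_pred, box_target)
            + binary_cross_entropy(mask_pred, mask_target))
```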
And S4, detecting the damaged vehicle picture based on the weight file obtained by model training.
The images to be detected in the test set are adjusted to the same size, with min_size set to 800 pixels and max_size to 1333 pixels. The threshold of the non-maximum suppression algorithm (NMS) is set to 0.5. The test set is then detected based on the final weight file obtained from model training, and the damage type and score, the type of part where the damage is located, the detection box and the damage mask are output. The damage-score threshold is set to 0.3: predicted objects with a damage score below 0.3 are removed, and only those above 0.3 are kept as the final detection result. The damage results are overlaid on the original image to output a damage-assessment image, and a corresponding json file is generated at the same time. The detection results are also saved as pth files so that the mean average precision (mAP) over all categories can be computed for both box and mask.
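The post-processing in this step (greedy NMS at IoU 0.5, then a 0.3 score cut) can be sketched as follows; the detection-record layout is an illustrative assumption:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def postprocess(dets, nms_thresh=0.5, score_thresh=0.3):
    """dets: list of dicts with 'box' and 'score' keys. Greedy NMS removes
    duplicate boxes, then low-scoring detections are dropped, as in step S4."""
    dets = sorted(dets, key=lambda d: d["score"], reverse=True)
    keep = []
    for d in dets:
        # keep a box only if it does not overlap a higher-scoring kept box
        if all(iou(d["box"], k["box"]) <= nms_thresh for k in keep):
            keep.append(d)
    return [d for d in keep if d["score"] >= score_thresh]
```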
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.
Claims (5)
1. An intelligent vehicle loss assessment method based on improved Mask R-CNN is characterized by comprising the following steps:
s1, after the damage type and the part marking are carried out on the vehicle damage picture, a marking data set in a coco format is manufactured, and a training set and a testing set are divided;
s2, constructing a multi-detection model which is an improved Mask R-CNN; the improvement of the Mask R-CNN comprises the steps of replacing 3 x 3 convolution of a feature extraction network part with DCnv2, replacing an interpolation upsampling method with a CARAFE sampling method, adding a branch of part classification behind an RPN network, and replacing a full-connection header for frame regression in a detection header with a convolution header;
s3, sending the training set into a multi-detection model for training to obtain a weight file;
and S4, detecting the damaged vehicle picture based on the obtained weight file to obtain a final damage assessment picture.
2. The vehicle intelligent damage assessment method based on the improved Mask R-CNN as claimed in claim 1, wherein in step S1, the damages existing in the vehicle damage picture are marked out by using marking software, including the contour, the damage type, and the type of the part where the damage is located; and after labeling, acquiring a json file containing labeling information, dividing the json file and the corresponding original image file into a training set and a test set according to the ratio of 9:1, and converting the training set and the test set into a coco data set format.
3. The vehicle intelligent damage assessment method based on improved Mask R-CNN as claimed in claim 1, wherein in step S2, the 3 x 3 convolution of the feature extraction network part of the model is changed to DCnv2, specifically: 3 x 3 convolution of submodules in stage3, stage4 and stage5 in a feature extraction network ResNet50 is replaced by DCnv2, DCnv2 is an improved version of DCN, and DCnv2 not only adds offset to each sampling position but also adds different weights to each sampling position on the basis of DCN, so that the modeling capability of geometric transformation is further enhanced;
the interpolation upsampling method is replaced by the CARAFE sampling method, so that errors caused by upsampling operation in the model can be effectively reduced;
adding a part-classification branch behind the RPN, in parallel with the damage-classification branch, specifically: after passing through the RPN, the feature map is output through two fully-connected layers for classification, and the two fully-connected layers form a fully-connected head; since damage and parts are two separate classification tasks requiring two branch outputs, a branch is added behind the fully-connected layers so that the damage category and the category of the part where the damage is located are output simultaneously, the damage-classification branch sharing the parameters of the fully-connected head with the part-classification branch;
replacing the fully-connected head used for box regression in the detection head with a convolution head, specifically: in the original Mask R-CNN model, detection-box regression and classification share one fully-connected head; detection-box regression is removed from the fully-connected head, and a convolution head consisting of 3 residual modules and 2 Non-local modules is added behind the RPN network for detection-box regression.
4. The improved Mask R-CNN-based intelligent damage assessment method according to claim 1, wherein the step S3 comprises the following steps:
s301, adjusting the input picture to a specified size, setting the maximum size to 1333 pixels, randomly selecting the minimum size from (640, 672, 704, 736, 768, 800), and randomly horizontally flipping the image with a probability of 0.5;
s302, loading a training set by a plurality of detection models;
s303, setting training times and learning rate parameters to start training the model;
s304, calculating the Loss by the model, performing back propagation to update the weights, and saving the final model weights after training to obtain a weight file; wherein Loss is expressed as:

Loss = L_damages + L_components + L_bbox + L_mask

where L_damages denotes the damage classification loss, L_components the part classification loss, L_bbox the detection-box regression loss, and L_mask the pixel segmentation mask loss; i is the index of the anchor box; p_i* indicates whether the ith anchor box contains a target: p_i* equals 1 when the ith anchor box contains a target, and 0 when the ith anchor box is background; p_i denotes the predicted probability that the ith anchor box contains a target; v_i denotes the parameterized vector of the center-point coordinates and the width and height predicted for the ith anchor box, and v_i* is the parameterized vector of the ith labeled bounding box; n_c is the number of damage classes, n_ij is the number of pixels belonging to class i but predicted as class j, n_ii is the number of pixels belonging to and predicted as class i (i.e., the number of pixels correctly predicted for class i), and n_i is the total number of detection boxes of predicted class-i damage; N_damages, N_components and N_bbox are constants used for normalization.
5. The improved Mask R-CNN-based intelligent damage assessment method according to claim 1, wherein the step S4 comprises the following steps:
s401, loading a test set, namely a vehicle damage picture to be detected, by multiple detection models, and detecting the test set based on a weight file obtained by model training;
s402, filtering out repeated detection frames by using a non-maximum suppression algorithm NMS;
s403, the model detection heads output the damage classification and score, the part classification, the detection box and the damage mask; detection boxes whose damage classification score falls below the set score threshold are filtered out, and the remaining detection boxes, together with their damage classification and score, part classification and damage mask, are overlaid on the original picture to obtain the final damage-assessment picture;
s404, evaluating the detection result and calculating the corresponding evaluation indexes: precision (Precision), recall (Recall), F1 value (F1-score), and mean average precision (mAP).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111311347.4A CN114219757B (en) | 2021-11-08 | 2021-11-08 | Intelligent damage assessment method for vehicle based on improved Mask R-CNN |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114219757A true CN114219757A (en) | 2022-03-22 |
CN114219757B CN114219757B (en) | 2024-05-10 |
Family
ID=80696552
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111311347.4A Active CN114219757B (en) | 2021-11-08 | 2021-11-08 | Intelligent damage assessment method for vehicle based on improved Mask R-CNN |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114219757B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112802005A (en) * | 2021-02-07 | 2021-05-14 | 安徽工业大学 | Automobile surface scratch detection method based on improved Mask RCNN |
CN113205026A (en) * | 2021-04-26 | 2021-08-03 | 武汉大学 | Improved vehicle type recognition method based on fast RCNN deep learning network |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117671330A (en) * | 2023-11-14 | 2024-03-08 | 平安科技(上海)有限公司 | Vehicle damage assessment method, device, computer equipment and storage medium |
CN117671330B (en) * | 2023-11-14 | 2024-06-21 | 平安科技(上海)有限公司 | Vehicle damage assessment method, device, computer equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN114219757B (en) | 2024-05-10 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||