CN114219757A - Vehicle intelligent loss assessment method based on improved Mask R-CNN - Google Patents

Vehicle intelligent loss assessment method based on improved Mask R-CNN Download PDF

Info

Publication number
CN114219757A
Authority
CN
China
Prior art keywords
damage
detection
model
mask
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111311347.4A
Other languages
Chinese (zh)
Other versions
CN114219757B (en
Inventor
袁华
陈雨欣
董守斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202111311347.4A priority Critical patent/CN114219757B/en
Publication of CN114219757A publication Critical patent/CN114219757A/en
Application granted granted Critical
Publication of CN114219757B publication Critical patent/CN114219757B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0004Industrial image inspection
    • G06T7/0008Industrial image inspection checking presence/absence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/2431Multiple classes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30248Vehicle exterior or interior
    • G06T2207/30252Vehicle exterior; Vicinity of vehicle

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an intelligent vehicle loss assessment method based on an improved Mask R-CNN, which comprises the following steps: S1, labelling the damage types and parts in vehicle damage pictures, producing an annotated data set in coco format, and dividing it into a training set and a test set; S2, constructing a multi-detection model, namely an improved Mask R-CNN, where the improvements comprise replacing the 3 × 3 convolutions of the feature extraction network with DCNv2, replacing the interpolation upsampling method with the CARAFE sampling method, adding a part classification branch after the RPN network, and replacing the fully connected head used for bounding box regression in the detection head with a convolution head; S3, feeding the training set into the multi-detection model for training to obtain a weight file; and S4, detecting damaged vehicle pictures based on the weight file to obtain final damage assessment pictures. The invention outputs the damage type and the part type synchronously with a single model, which is efficient and concise, and the improvements raise the precision and recall of the model.

Description

Vehicle intelligent loss assessment method based on improved Mask R-CNN
Technical Field
The invention relates to the technical field of target detection and example segmentation, in particular to an intelligent loss assessment method for a vehicle based on improved Mask R-CNN.
Background
The traditional automobile damage assessment process is cumbersome and slow, assessment results can differ with the professional competence of the assessor, and labour accounts for a large share of the cost, so the vehicle insurance industry has an urgent need for intelligent damage assessment. With growing computing power, more data and mature learning algorithms, deep learning is increasingly applied to practical problems. Intelligent vehicle damage assessment uses deep learning based image recognition and detection to quickly judge the damaged part, damage type and damage degree of a vehicle from appearance pictures uploaded by the vehicle owner or the assessor. Replacing human eyes and judgement with artificial intelligence makes it more convenient and accurate to determine the damaged parts, types and degrees, and simplifies the whole damage assessment process.
Most existing intelligent vehicle damage assessment methods involve two detection-and-segmentation models: one detects the damage and the other detects the part where the damage is located. Whether the two models are connected in series or in parallel, each arrangement has drawbacks: the serial arrangement makes detection inefficient, while the parallel arrangement is prone to errors when the damage detection results are combined with the part detection results. In addition, in the complex scene of vehicle damage identification, the varied and multi-scale characteristics of the damage and the parts pose a great challenge to existing detection and segmentation models.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an intelligent vehicle damage assessment method based on an improved Mask R-CNN, which outputs damage detection and part identification synchronously, improves model precision and recall, and reduces false detections and missed detections of vehicle damage.
In order to achieve the purpose, the technical scheme provided by the invention is as follows: an intelligent vehicle loss assessment method based on improved Mask R-CNN comprises the following steps:
S1, labelling the damage types and parts in the vehicle damage pictures, producing an annotated data set in coco format, and dividing it into a training set and a test set;
S2, constructing a multi-detection model, namely an improved Mask R-CNN; the improvements to Mask R-CNN comprise replacing the 3 × 3 convolutions of the feature extraction network with DCNv2, replacing the interpolation upsampling method with the CARAFE sampling method, adding a part classification branch after the RPN network, and replacing the fully connected head used for box regression in the detection head with a convolution head;
S3, feeding the training set into the multi-detection model for training to obtain a weight file;
and S4, detecting the damaged vehicle pictures based on the obtained weight file to obtain the final damage assessment pictures.
Further, in step S1, the damage in each vehicle damage picture is annotated with labelling software, including its contour, damage type, and the type of the part where the damage is located; after labelling, a json file containing the annotation information is obtained, the json files and the corresponding original image files are divided into a training set and a test set at a ratio of 9:1, and both are converted into the coco data set format.
Further, in step S2, the 3 × 3 convolution of the feature extraction network part of the model is changed to DCNv2, specifically: the 3 × 3 convolutions of the sub-modules in stage3, stage4 and stage5 of the feature extraction network ResNet50 are replaced by DCNv2; DCNv2 is an improved version of DCN which, on the basis of the offsets of DCN, also assigns a different weight to each sampling position, further enhancing the modelling capability for geometric transformations;
the interpolation upsampling method is replaced by the CARAFE sampling method, which effectively reduces the error introduced by the upsampling operation in the model, specifically: for a feature map of shape H × W × C, where H is the height of the feature map, W its width and C its number of channels, the feature map is first compressed to C_m channels (C_m < C) by a 1 × 1 convolution to reduce the computation of the subsequent steps; a convolutional layer with kernel size k_encoder × k_encoder, C_m input channels and σ²·k_up² output channels then predicts the upsampling kernels, where k_encoder is the convolution kernel size, σ is the upsampling ratio and k_up is the upsampling kernel size; the channel dimension is unfolded into the spatial dimension to obtain upsampling kernels of shape σH × σW × k_up², which are normalized with softmax; each pixel of the output feature map is mapped back to the input feature map, the k_up × k_up region centred on it is taken, and its dot product with the predicted upsampling kernel of that pixel gives the output value, with different channels at the same position sharing the same upsampling kernel;
a part classification branch is added after the RPN network, in parallel with the damage classification branch, specifically: after the RPN network, the feature map passes through two fully connected layers, which form a fully connected head and output the classification; since damage and parts are two separate classifications and two outputs are required, a branch is added after the fully connected layers so that the damage category and the category of the part where the damage is located are output simultaneously, the damage classification branch sharing its parameters and the fully connected head with the part classification branch;
the fully connected head used for box regression in the detection head is replaced with a convolution head, specifically: in the original Mask R-CNN model the regression and classification of the detection box share a fully connected head; the detection box regression is removed from the fully connected head, and a convolution head consisting of 3 residual modules and 2 Non-local modules is added after the RPN network for regression of the detection box.
Further, the step S3 includes the steps of:
S301, adjusting the input picture to a specified size, with the maximum size set to 800 pixels, the minimum size randomly selected from (640, 672, 704, 736, 768, 800), and the image randomly horizontally flipped with a probability of 0.5;
S302, loading the training set into the multi-detection model;
S303, setting the number of training iterations and the learning rate parameters and starting to train the model;
S304, calculating the loss Loss, back-propagating to update the weights, and saving the final model weights after training to obtain the weight file; wherein Loss is expressed as:
Loss = L_damages + L_components + L_bbox + L_mask
(The expressions for the individual terms are given as equation images in the published document; per the description, L_damages and L_components are cross-entropy classification losses, L_bbox is a smooth_L1 regression loss, and L_mask is a binary cross-entropy mask loss.)
In the formula, L_damages denotes the damage classification loss, L_components the part classification loss, L_bbox the detection box regression loss, and L_mask the pixel segmentation mask loss; i is the index of an anchor box; p_i* indicates whether the ith anchor box contains a target, equalling 1 when it contains a target and 0 when the ith anchor box is background; p_i is the predicted probability that the ith anchor box contains a target; v_i is the parameterized vector of the centre coordinates, width and height predicted for the ith anchor box, and v_i* is the parameterized vector of the ith ground-truth box; n_c is the number of damage classes; n_ij is the number of pixels belonging to class i but predicted as class j; n_ii is the number of pixels belonging to class i and predicted as class i, i.e. the number of correctly predicted pixels of class i; n_i is the total number of detection boxes predicted as class-i damage; N_damages, N_components and N_bbox are constants used for normalization.
Further, the step S4 includes the steps of:
S401, loading the test set, i.e. the vehicle damage pictures to be detected, into the multi-detection model, and detecting the test set based on the weight file obtained from model training;
S402, filtering out duplicate detection boxes using the non-maximum suppression algorithm (NMS);
S403, outputting the damage class and score, part class, detection box and damage mask from the model detection head; according to the set score threshold, detection boxes whose damage classification score is below the threshold are filtered out again, and the remaining detection boxes, with their corresponding damage class and score, part class and damage mask, are overlaid on the original picture to obtain the final damage assessment picture;
S404, evaluating the detection results and calculating the corresponding evaluation indices: Precision, Recall, F1-score and mean average precision (mAP).
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. By adding the part detection branch, the number of models is reduced, the execution efficiency of the algorithm is improved, and damage detection and identification of the part where the damage is located are realized simultaneously. With only one model, the damage category and score, the part category, the damage detection box and the damage mask are output at the same time, which is efficient and concise.
2. The invention uses the DCNv2 convolution method; the introduction of offsets acts like an augmentation of the data set, improving the recognition effect of the model when the annotated data set is limited.
3. The invention reduces errors in the upsampling process by adopting the CARAFE sampling method.
4. The invention replaces the fully connected head used for detection box regression with a convolution head, further improving localization accuracy.
Drawings
FIG. 1 is an overall flow diagram of the method of the present invention.
Fig. 2 is a structural diagram of a multi-detection model.
Fig. 3 is a structural diagram of a convolution header.
FIG. 4 is a block diagram of Non-local modules.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
As shown in fig. 1, the vehicle intelligent damage assessment method based on the improved Mask R-CNN provided in this embodiment includes the following specific implementation steps:
S1, labelling the damage types and parts in the vehicle damage pictures, producing a coco-format annotated data set, and dividing it into a training set and a test set, specifically:
All damage in a vehicle picture is found and its contour is outlined with the polygon tool of the labelme annotation tool; the corresponding damage type is selected in the label field, and the corresponding part type is entered in the group_id field. The group_id, originally intended to distinguish instances of the same class, is here filled with the id of the part (range 0-31, corresponding to 32 parts). After labelling, a json-format annotation file corresponding to the picture is generated.
All json files and the corresponding original image files are divided into a training set and a test set at a ratio of 9:1. All json files in the training set are then merged into a coco-format json file, which differs slightly from a common coco-format file: the group_id in the original json files is converted into a component category, namely component_id, in the coco-format json file, while the original category_id is kept as the id of the damage.
The test set is processed in the same way.
Because this data set differs slightly from a common coco data set, the data-loading code of the model is also modified appropriately so that it loads not only the damage types but also the part types.
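For illustration, a minimal Python sketch of this annotation-file handling is given below. The directory layout, field handling and helper names are assumptions for the sketch and may differ from the concrete implementation used in the embodiment; bounding boxes are derived here directly from the polygon extents.

```python
import json
import random
from pathlib import Path

def labelme_to_coco(json_files, out_file, damage_classes):
    """Merge labelme annotation files into one coco-style json.

    The shape label gives the damage category (category_id); the labelme
    group_id is reused as the part id (component_id, 0-31)."""
    images, annotations, ann_id = [], [], 0
    for img_id, jf in enumerate(json_files):
        data = json.loads(Path(jf).read_text(encoding="utf-8"))
        images.append({"id": img_id, "file_name": data["imagePath"],
                       "height": data["imageHeight"], "width": data["imageWidth"]})
        for shape in data["shapes"]:
            poly = [c for pt in shape["points"] for c in pt]      # flatten polygon
            xs, ys = poly[0::2], poly[1::2]
            x0, y0 = min(xs), min(ys)
            w, h = max(xs) - x0, max(ys) - y0
            annotations.append({
                "id": ann_id, "image_id": img_id,
                "category_id": damage_classes.index(shape["label"]),  # damage type
                "component_id": shape["group_id"],                    # part type (0-31)
                "segmentation": [poly], "bbox": [x0, y0, w, h],
                "area": w * h, "iscrowd": 0})
            ann_id += 1
    coco = {"images": images, "annotations": annotations,
            "categories": [{"id": i, "name": n} for i, n in enumerate(damage_classes)]}
    Path(out_file).write_text(json.dumps(coco), encoding="utf-8")

# 9:1 train/test split of the labelme files, then one coco file per split
files = sorted(Path("annotations").glob("*.json"))
random.shuffle(files)
split = int(0.9 * len(files))
labelme_to_coco(files[:split], "train_coco.json", damage_classes=["scratch", "dent", "crack"])
labelme_to_coco(files[split:], "test_coco.json", damage_classes=["scratch", "dent", "crack"])
```

The damage class list above is purely illustrative; the actual class set is defined by the annotated data.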
S2, constructing a multi-detection model which is an improved Mask R-CNN.
As shown in fig. 2, the multi-detection model consists of a feature extraction network, a feature fusion network, a region proposal network (RPN network), a region-of-interest matching (Roi Align) module, a multi-branch classification sub-network, a detection box regression sub-network and a mask sub-network. The feature extraction network ResNet50 extracts features of different scales (C1, C2, C3, C4, C5) from the input image through 5 stages; these are laterally connected into the feature fusion network FPN, which fuses the features of different scales to obtain (P5, P4, P3, P2), each layer being followed by a 3 × 3 convolution to eliminate the aliasing effect caused by upsampling. The feature map then enters the RPN network to extract candidate region boxes (proposals). After the Roi Align operation, the feature maps are transformed to the same size of 7 × 7 with 256 channels. The damage (damages) class and the component (components) class are then output through a fully connected (fc) head of 1 × 1024 followed by a 1 × number-of-classes classification layer per branch. At the same time, the 7 × 7 × 256 feature maps pass through a convolution (conv) head and a fully connected layer to generate the detection box (box). The mask branch is processed by 4 convolution layers and two deconvolution layers to finally obtain the damage mask of size 28 × 28 × number of classes.
The 3 × 3 convolutions of the feature extraction network are replaced by DCNv2, enhancing the modelling capability for geometric transformations; the CARAFE sampling method is adopted to reduce the error in the upsampling process; a classification branch is added after the RPN network to realize multi-detection of damage and parts; and the fully connected head used for regression is replaced with a convolution head, reducing the localization error.
Compared with methods that use two separate models to detect the damage and the part, the invention outputs the detection results of the damage and of the part where it is located directly from one model, without an additional merging process, which is efficient and concise.
The 3 × 3 convolution of the feature extraction network part is replaced by DCNv2, which is specifically as follows:
the feature extraction network ResNet50 is composed of 50 convolutional layers of 18 sub-modules, which can be divided into 5 stages. Each submodule consists of one 1 × 1 convolution, 13 × 3 convolution, one 1 × 1 convolution and one residual concatenation. The convolution of 3 × 3 of the neutron modules in stage3, stage4 and stage5 was all changed to DCNv 2.
DCNv2 is an improved version of DCN. DCN is a deformable convolution that enhances the feature extraction capability of the network by adding an offset to the convolution layer. The offsets are learned under the guidance of the supervision information, so that the network focuses more attention on positions related to the training targets when extracting features and better covers targets of different sizes and shapes. On the basis of DCN, DCNv2 not only adds an offset to each sampling position but also assigns a different weight to each sampling position, further enhancing the modelling capability for geometric transformations.
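A minimal PyTorch-style sketch of such a modulated deformable 3 × 3 convolution is given below, built on torchvision's DeformConv2d rather than the exact implementation used in the embodiment; the module name, channel sizes and zero-initialisation choice are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DCNv2Block(nn.Module):
    """3x3 modulated deformable convolution (DCNv2-style): a plain conv predicts
    per-location offsets and modulation weights, which are fed to DeformConv2d."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # 2 offsets (x, y) + 1 modulation scalar per kernel sample -> 3 * 3*3 = 27 channels
        self.offset_mask = nn.Conv2d(in_ch, 27, kernel_size=3, stride=stride, padding=1)
        nn.init.zeros_(self.offset_mask.weight)      # start as an ordinary 3x3 conv
        nn.init.zeros_(self.offset_mask.bias)
        self.dcn = DeformConv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1)

    def forward(self, x):
        out = self.offset_mask(x)
        offset, mask = out[:, :18], out[:, 18:]      # 18 offset channels, 9 mask channels
        mask = torch.sigmoid(mask)                   # per-sample modulation weights
        return self.dcn(x, offset, mask)

# e.g. standing in for the 3x3 conv inside a ResNet bottleneck
x = torch.randn(1, 256, 56, 56)
print(DCNv2Block(256, 256)(x).shape)  # torch.Size([1, 256, 56, 56])
```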
After the feature extraction network ResNet50 extracts features of different scales, feature maps of different sizes carrying high-level semantic information are constructed through a top-down network structure (FPN) with lateral connections. The top-down pathway consists mainly of upsampling operations; the upsampling method adopted in the original model is nearest-neighbour interpolation, in which, when the picture is enlarged, missing pixels are filled directly with the colour of the nearest original pixel, and this copying of neighbouring pixels produces clearly visible jagged edges. Replacing the interpolation upsampling method with CARAFE effectively reduces the error introduced by the upsampling operation in the model.
The CARAFE sampling method specifically comprises the following steps:
For a feature map of shape H × W × C, where H is the height of the feature map, W its width and C its number of channels, the feature map is first compressed to C_m channels (C_m < C) by a 1 × 1 convolution to reduce the computation of the subsequent steps. A convolutional layer with kernel size k_encoder × k_encoder, C_m input channels and σ²·k_up² output channels then predicts the upsampling kernels, where k_encoder is the convolution kernel size, σ is the upsampling ratio and k_up is the upsampling kernel size. The channel dimension is unfolded into the spatial dimension to obtain upsampling kernels of shape σH × σW × k_up², which are normalized with softmax. Each pixel of the output feature map is mapped back to the input feature map, the k_up × k_up region centred on it is taken, and its dot product with the predicted upsampling kernel of that pixel gives the output value; different channels at the same position share the same upsampling kernel.
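A compact PyTorch-style sketch of this CARAFE-type upsampling, written directly from the description above, is shown below; the default values of C_m, k_encoder, k_up and σ are illustrative assumptions rather than the parameters of the embodiment.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CARAFE(nn.Module):
    """Content-aware upsampling: predict per-pixel kernels, then reassemble features."""
    def __init__(self, c, c_m=64, k_encoder=3, k_up=5, sigma=2):
        super().__init__()
        self.k_up, self.sigma = k_up, sigma
        self.compress = nn.Conv2d(c, c_m, 1)                      # channel compression C -> C_m
        self.encoder = nn.Conv2d(c_m, sigma ** 2 * k_up ** 2,     # predict sigma^2 * k_up^2 weights
                                 k_encoder, padding=k_encoder // 2)

    def forward(self, x):
        n, c, h, w = x.shape
        kernels = self.encoder(self.compress(x))                  # (N, s^2*k_up^2, H, W)
        kernels = F.pixel_shuffle(kernels, self.sigma)            # (N, k_up^2, sH, sW)
        kernels = F.softmax(kernels, dim=1)                       # normalise each upsampling kernel
        # gather the k_up x k_up neighbourhood of every input pixel ...
        patches = F.unfold(x, self.k_up, padding=self.k_up // 2)  # (N, C*k_up^2, H*W)
        patches = patches.view(n, c * self.k_up ** 2, h, w)
        # ... and map every output pixel back to its source input pixel
        patches = F.interpolate(patches, scale_factor=self.sigma, mode="nearest")
        patches = patches.view(n, c, self.k_up ** 2, self.sigma * h, self.sigma * w)
        return (patches * kernels.unsqueeze(1)).sum(dim=2)        # weighted sum = dot product

x = torch.randn(1, 256, 32, 32)
print(CARAFE(256)(x).shape)  # torch.Size([1, 256, 64, 64])
```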
After the RPN network, a branch of part classification is added, specifically as follows:
After the RPN network, the feature map is processed by the Roi Align operation to obtain a fixed-size feature map of 7 × 7 × 256. The damage classification result can then be output through two fully connected layers. In order to output the part classification result of the damage at the same time, a part branch is added; the part branch and the damage classification branch share the fully connected head and its weights.
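The sketch below illustrates such a shared fully connected head with parallel damage and component branches; the numbers of damage and part classes and the head dimensions are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class MultiClassificationHead(nn.Module):
    """Shared fc head (2 x 1024) with parallel damage / component classification branches."""
    def __init__(self, in_ch=256, roi_size=7, n_damage=6, n_component=32):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_ch * roi_size * roi_size, 1024), nn.ReLU(inplace=True),
            nn.Linear(1024, 1024), nn.ReLU(inplace=True))
        self.damage_cls = nn.Linear(1024, n_damage + 1)     # damage classes + background
        self.component_cls = nn.Linear(1024, n_component)   # part classes

    def forward(self, roi_feats):                           # roi_feats: (num_rois, 256, 7, 7)
        f = self.shared(roi_feats)
        return self.damage_cls(f), self.component_cls(f)

rois = torch.randn(10, 256, 7, 7)
damage_logits, component_logits = MultiClassificationHead()(rois)
print(damage_logits.shape, component_logits.shape)  # (10, 7) (10, 32)
```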
The specific process of the RPN network is as follows:
Nine anchor boxes with different aspect ratios are first generated at each pixel position. After a 3 × 3 convolution the feature map becomes 256 × 16 × 16, and two 1 × 1 convolutions then produce an 18 × 16 × 16 feature map and a 36 × 16 × 16 feature map respectively, i.e. 16 × 16 × 9 results, each containing 2 scores and 4 coordinates. The two scores are the foreground and background scores; anchor boxes whose foreground score is greater than 0.5 are kept as positive samples, and anchor boxes whose background score is greater than 0.5 are kept as negative samples.
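A short sketch of such an RPN head is given below; it only shows the convolutional structure producing the 18-channel score map and the 36-channel coordinate map, with the channel counts and spatial size taken from the description and the module name assumed.

```python
import torch
import torch.nn as nn

class RPNHead(nn.Module):
    """RPN head: 3x3 conv, then 1x1 convs for 2x9 objectness scores and 4x9 box deltas."""
    def __init__(self, in_ch=256, num_anchors=9):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, 256, 3, padding=1)
        self.cls = nn.Conv2d(256, 2 * num_anchors, 1)   # 18 channels: fg/bg score per anchor
        self.reg = nn.Conv2d(256, 4 * num_anchors, 1)   # 36 channels: 4 coordinates per anchor

    def forward(self, feat):
        t = torch.relu(self.conv(feat))
        return self.cls(t), self.reg(t)

feat = torch.randn(1, 256, 16, 16)
scores, deltas = RPNHead()(feat)
print(scores.shape, deltas.shape)  # (1, 18, 16, 16) (1, 36, 16, 16)
```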
The Roi Align operation readjusts the feature map to a fixed size, corresponding the original image to the feature map and mapping it to a fixed-size feature map after the transformation. When the feature map is mapped to a fixed size, the proportional calculation may yield fractional coordinates, while pixel positions are integers. Direct rounding introduces an error which is greatly amplified when fed back to the original image. Roi Align instead treats a fractional coordinate, which does not fall on a real pixel, as a virtual point and computes its value by bilinear interpolation from the neighbouring pixels, thereby avoiding the error caused by direct rounding.
The fully connected head used for regression in the detection head is replaced by a convolution head, specifically as follows:
In the Mask R-CNN model, the regression and classification of the detection box share a fully connected head; the detection head performs detection box regression and classification and outputs a mask. Research has shown that a fully connected head is well suited to classification tasks, while a convolution head is better suited to regression tasks. Therefore the detection box regression is removed from the fully connected head, and a convolution head consisting of 3 residual modules and 2 Non-local modules is added after the RPN network for detection box regression. The mask branch takes the preceding detection boxes and segments the object within them: each box first produces a 14 × 14 × 256 feature map by convolution; after several convolutions a deconvolution operation enlarges it to 28 × 28 × 256, and finally a mask of size 28 × 28 × number of classes is output. The detection head thus outputs the damage category and score, the category of the part where the damage lies, the detection box and the damage mask.
As shown in fig. 3, the convolution head is composed of 3 residual (Residual) modules and 2 Non-local neural network (Non-local) modules. In principle a deeper network performs better, but a deeper network also takes longer to run; to balance accuracy and speed, 3 residual modules and 2 Non-local modules are chosen. The first residual module changes the dimension of the feature map and consists of a 1 × 1 convolution, a 3 × 3 convolution and a residual connection with a 1 × 1 convolution. The last two residual modules are identical to the sub-modules in ResNet, consisting of a 1 × 1 convolution, a 3 × 3 convolution, a 1 × 1 convolution and a residual connection, with the dimensions unchanged.
As shown in FIG. 4, the Non-local module is as follows:
The input feature map x has dimensions N × H × W × C. Three convolutions with kernel size 1 × 1 and C/2 output channels produce feature maps k, q and v, each of dimensions N × H × W × C/2; the reduction to C/2 channels improves the efficiency of the subsequent computation. k and q are then matrix-multiplied to obtain an output of dimensions N × HW × HW, which is processed with softmax and matrix-multiplied with the output v of the third branch to obtain an output of dimensions N × HW × C/2. This output is reshaped to N × H × W × C/2 and passed through a 1 × 1 convolution layer with C output channels; a residual connection then adds the original N × H × W × C input element-wise, so that the final output of the Non-local module keeps the original input dimensions.
The Non-local module is a simple, efficient and general component for capturing long-range dependencies in a neural network. Unlike convolution, whose receptive field is limited, Non-local can take a weighted sum of the features at all positions in the feature map as the response at a given position, and is not restricted to neighbouring points. Very good performance is obtained with a small number of layers.
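A minimal PyTorch-style sketch of the Non-local block described above is given below; it assumes the embedded-Gaussian formulation with C/2 projection channels, and the module name is illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NonLocalBlock(nn.Module):
    """Non-local block: 1x1 projections, attention over all positions, residual add."""
    def __init__(self, c):
        super().__init__()
        self.q = nn.Conv2d(c, c // 2, 1)
        self.k = nn.Conv2d(c, c // 2, 1)
        self.v = nn.Conv2d(c, c // 2, 1)
        self.out = nn.Conv2d(c // 2, c, 1)      # restore the channel dimension

    def forward(self, x):
        n, c, h, w = x.shape
        q = self.q(x).view(n, c // 2, h * w).permute(0, 2, 1)   # (N, HW, C/2)
        k = self.k(x).view(n, c // 2, h * w)                    # (N, C/2, HW)
        v = self.v(x).view(n, c // 2, h * w).permute(0, 2, 1)   # (N, HW, C/2)
        attn = F.softmax(torch.bmm(q, k), dim=-1)               # (N, HW, HW) pairwise weights
        y = torch.bmm(attn, v).permute(0, 2, 1).view(n, c // 2, h, w)
        return x + self.out(y)                                  # residual connection

x = torch.randn(2, 256, 7, 7)
print(NonLocalBlock(256)(x).shape)  # torch.Size([2, 256, 7, 7])
```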
S3, feeding the training set into the multi-detection model for training to obtain the weight file.
The training set is first augmented. Data annotation is a very tedious task, and model training often relies on large amounts of data; data augmentation expands the data, allowing a limited data set to play a greater role. The data augmentation operations used in this model include:
resize: within a batch the input size is fixed, so the pictures in a batch usually need to be resized to the same size. max_size is set to 1333 and the original min_size to 800; to act as a data-set expansion, min_size is instead randomly selected from (640, 672, 704, 736, 768, 800). Experiments show that this effectively improves the detection performance of the model;
horizontal_flip: the pictures are randomly horizontally flipped with a probability of 0.5.
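A small Python sketch of these two augmentation operations on the image alone is given below; it is PIL-based and purely illustrative, and in practice the bounding boxes and masks must be resized and flipped with the same parameters.

```python
import random
from PIL import Image

MIN_SIZES = (640, 672, 704, 736, 768, 800)
MAX_SIZE = 1333

def resize_and_flip(img: Image.Image) -> Image.Image:
    """Resize so the short side equals a randomly chosen min_size (capped so the
    long side does not exceed max_size), then horizontally flip with p = 0.5."""
    min_size = random.choice(MIN_SIZES)
    w, h = img.size
    scale = min(min_size / min(w, h), MAX_SIZE / max(w, h))
    img = img.resize((round(w * scale), round(h * scale)), Image.BILINEAR)
    if random.random() < 0.5:
        img = img.transpose(Image.FLIP_LEFT_RIGHT)
    return img
```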
After parameters such as the number of training iterations and the learning rate are set, training of the model is started, specifically as follows:
Before training, the number of training iterations is set to 90000, which experiments show to be optimal. batch_size is set to 4 and can be adjusted according to the graphics card memory. The initial learning rate lr is set following lr × batch_size = 0.01. To achieve a better training effect, a warm_up method is adopted for the learning rate, and the learning rate is multiplied by 0.1 at 60000 iterations and again by 0.1 at 80000 iterations.
The warm_up method is a learning rate warm-up strategy. At the start of training the model weights are randomly initialized, and a large learning rate can make training unstable and cause oscillation. A smaller learning rate is therefore used at the start of training, and once the model is relatively stable the preset learning rate is used, giving better model convergence.
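The sketch below illustrates such a warm-up plus step-decay schedule. The base learning rate of 0.0025 follows lr × batch_size = 0.01 with batch_size 4 as stated above, while the warm-up length and starting factor are assumptions not given in the description.

```python
def learning_rate(step, base_lr=0.0025, warmup_steps=500, warmup_factor=0.001):
    """Linear warm-up, then x0.1 decay at 60000 and 80000 iterations (90000 total)."""
    if step < warmup_steps:                       # ramp up from a small learning rate
        alpha = step / warmup_steps
        return base_lr * (warmup_factor * (1 - alpha) + alpha)
    if step < 60000:
        return base_lr
    if step < 80000:
        return base_lr * 0.1
    return base_lr * 0.01
```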
To speed up model convergence and prevent overfitting, the ResNet50 pre-trained model parameters are used as the initial weights of the feature extraction network. Data are loaded from the training set and training is started with the set parameters.
During training, the loss between the output of the detection head and the labels is calculated:
The classification of the damage and of the parts uses a cross-entropy loss function; the detection box regression uses the smooth_L1 loss function; the mask uses a binary cross-entropy loss function. After the losses are summed, the gradient is back-propagated and the model weights are updated.
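A minimal PyTorch-style sketch of this combined loss is given below; the per-term normalization constants, positive/negative sampling and weighting used in the actual training pipeline are omitted here for brevity.

```python
import torch.nn.functional as F

def multi_task_loss(damage_logits, damage_targets,
                    component_logits, component_targets,
                    box_preds, box_targets,
                    mask_logits, mask_targets):
    """Loss = L_damages + L_components + L_bbox + L_mask, with the loss types named above."""
    l_damage = F.cross_entropy(damage_logits, damage_targets)              # damage classification
    l_component = F.cross_entropy(component_logits, component_targets)     # part classification
    l_bbox = F.smooth_l1_loss(box_preds, box_targets)                      # detection-box regression
    l_mask = F.binary_cross_entropy_with_logits(mask_logits, mask_targets) # mask branch
    return l_damage + l_component + l_bbox + l_mask
```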
S4, detecting the damaged vehicle pictures based on the weight file obtained from model training.
The images to be detected in the test set are adjusted to the same size, with min_size set to 800 pixels and max_size to 1333 pixels. The NMS (non-maximum suppression) threshold is set to 0.5, the test set is detected based on the final weight file obtained from model training, and the damage types and scores, the types of the parts where the damage is located, the detection boxes and the damage masks are output. The damage score threshold is set to 0.3; predicted objects with a damage score below 0.3 are removed and only those above 0.3 are kept, giving the final detection results. The damage results are overlaid on the original image to output the damage assessment picture, and a corresponding json file is generated at the same time. The detection results are also saved as pth files so that the mean average precision (mAP) of all categories can be calculated for the boxes and the masks respectively.
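The sketch below illustrates this post-processing step with the thresholds stated above; it applies class-agnostic NMS via torchvision for simplicity, whereas the actual pipeline may suppress boxes per class.

```python
from torchvision.ops import nms

def postprocess(boxes, scores, labels, masks, iou_thr=0.5, score_thr=0.3):
    """Suppress duplicate boxes (IoU threshold 0.5), then drop detections
    whose damage score is below 0.3, as in the inference step described above."""
    keep = nms(boxes, scores, iou_thr)            # filter overlapping detections
    boxes, scores = boxes[keep], scores[keep]
    labels, masks = labels[keep], masks[keep]
    keep = scores > score_thr                     # score threshold on damage confidence
    return boxes[keep], scores[keep], labels[keep], masks[keep]
```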
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to them; any other change, modification, substitution, combination or simplification which does not depart from the spirit and principle of the present invention shall be regarded as an equivalent replacement and is included in the scope of protection of the present invention.

Claims (5)

1. An intelligent vehicle loss assessment method based on improved Mask R-CNN is characterized by comprising the following steps:
S1, labelling the damage types and parts in the vehicle damage pictures, producing an annotated data set in coco format, and dividing it into a training set and a test set;
S2, constructing a multi-detection model, namely an improved Mask R-CNN; the improvements to Mask R-CNN comprise replacing the 3 × 3 convolutions of the feature extraction network with DCNv2, replacing the interpolation upsampling method with the CARAFE sampling method, adding a part classification branch after the RPN network, and replacing the fully connected head used for box regression in the detection head with a convolution head;
S3, feeding the training set into the multi-detection model for training to obtain a weight file;
and S4, detecting the damaged vehicle pictures based on the obtained weight file to obtain the final damage assessment pictures.
2. The vehicle intelligent damage assessment method based on the improved Mask R-CNN as claimed in claim 1, wherein in step S1, the damages existing in the vehicle damage picture are marked out by using marking software, including the contour, the damage type, and the type of the part where the damage is located; and after labeling, acquiring a json file containing labeling information, dividing the json file and the corresponding original image file into a training set and a test set according to the ratio of 9:1, and converting the training set and the test set into a coco data set format.
3. The vehicle intelligent damage assessment method based on improved Mask R-CNN as claimed in claim 1, wherein in step S2, the 3 × 3 convolution of the feature extraction network part of the model is changed to DCNv2, specifically: the 3 × 3 convolutions of the sub-modules in stage3, stage4 and stage5 of the feature extraction network ResNet50 are replaced by DCNv2; DCNv2 is an improved version of DCN which, on the basis of the offsets of DCN, also assigns a different weight to each sampling position, further enhancing the modelling capability for geometric transformations;
the interpolation upsampling method is replaced by the CARAFE sampling method, so that errors caused by upsampling operation in the model can be effectively reduced;
a part classification branch is added after the RPN network, in parallel with the damage classification branch, specifically: after the RPN network, the feature map passes through two fully connected layers, which form a fully connected head and output the classification; since damage and parts are two separate classifications and two outputs are required, a branch is added after the fully connected layers so that the damage category and the category of the part where the damage is located are output simultaneously, the damage classification branch sharing its parameters and the fully connected head with the part classification branch;
the fully connected head used for box regression in the detection head is replaced with a convolution head, specifically: in the original Mask R-CNN model the regression and classification of the detection box share a fully connected head; the detection box regression is removed from the fully connected head, and a convolution head consisting of 3 residual modules and 2 Non-local modules is added after the RPN network for regression of the detection box.
4. The improved Mask R-CNN-based intelligent damage assessment method according to claim 1, wherein the step S3 comprises the following steps:
S301, adjusting the input picture to a specified size, with the maximum size set to 800 pixels, the minimum size randomly selected from (640, 672, 704, 736, 768, 800), and the image randomly horizontally flipped with a probability of 0.5;
S302, loading the training set into the multi-detection model;
S303, setting the number of training iterations and the learning rate parameters and starting to train the model;
S304, calculating the loss Loss, back-propagating to update the weights, and saving the final model weights after training to obtain the weight file; wherein Loss is expressed as:
Loss = L_damages + L_components + L_bbox + L_mask
(The expressions for the individual terms are given as equation images in the published document: cross-entropy classification losses for L_damages and L_components, a smooth_L1 regression loss for L_bbox, and a binary cross-entropy mask loss for L_mask.)
In the formula, L_damages denotes the damage classification loss, L_components the part classification loss, L_bbox the detection box regression loss, and L_mask the pixel segmentation mask loss; i is the index of an anchor box; p_i* indicates whether the ith anchor box contains a target, equalling 1 when it contains a target and 0 when the ith anchor box is background; p_i is the predicted probability that the ith anchor box contains a target; v_i is the parameterized vector of the centre coordinates, width and height predicted for the ith anchor box, and v_i* is the parameterized vector of the ith ground-truth box; n_c is the number of damage classes; n_ij is the number of pixels belonging to class i but predicted as class j; n_ii is the number of pixels belonging to class i and predicted as class i, i.e. the number of correctly predicted pixels of class i; n_i is the total number of detection boxes predicted as class-i damage; N_damages, N_components and N_bbox are constants used for normalization.
5. The improved Mask R-CNN-based intelligent damage assessment method according to claim 1, wherein the step S4 comprises the following steps:
S401, loading the test set, i.e. the vehicle damage pictures to be detected, into the multi-detection model, and detecting the test set based on the weight file obtained from model training;
S402, filtering out duplicate detection boxes using the non-maximum suppression algorithm NMS;
S403, outputting the damage class and score, part class, detection box and damage mask from the model detection head; according to the set score threshold, detection boxes whose damage classification score is below the threshold are filtered out again, and the remaining detection boxes, with their corresponding damage class and score, part class and damage mask, are overlaid on the original picture to obtain the final damage assessment picture;
S404, evaluating the detection results and calculating the corresponding evaluation indices: Precision, Recall, F1-score and mean average precision (mAP).
CN202111311347.4A 2021-11-08 2021-11-08 Intelligent damage assessment method for vehicle based on improved Mask R-CNN Active CN114219757B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111311347.4A CN114219757B (en) 2021-11-08 2021-11-08 Intelligent damage assessment method for vehicle based on improved Mask R-CNN

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111311347.4A CN114219757B (en) 2021-11-08 2021-11-08 Intelligent damage assessment method for vehicle based on improved Mask R-CNN

Publications (2)

Publication Number Publication Date
CN114219757A true CN114219757A (en) 2022-03-22
CN114219757B CN114219757B (en) 2024-05-10

Family

ID=80696552

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111311347.4A Active CN114219757B (en) 2021-11-08 2021-11-08 Intelligent damage assessment method for vehicle based on improved Mask R-CNN

Country Status (1)

Country Link
CN (1) CN114219757B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117671330A (en) * 2023-11-14 2024-03-08 平安科技(上海)有限公司 Vehicle damage assessment method, device, computer equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112802005A (en) * 2021-02-07 2021-05-14 安徽工业大学 Automobile surface scratch detection method based on improved Mask RCNN
CN113205026A (en) * 2021-04-26 2021-08-03 武汉大学 Improved vehicle type recognition method based on fast RCNN deep learning network

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112802005A (en) * 2021-02-07 2021-05-14 安徽工业大学 Automobile surface scratch detection method based on improved Mask RCNN
CN113205026A (en) * 2021-04-26 2021-08-03 武汉大学 Improved vehicle type recognition method based on fast RCNN deep learning network

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117671330A (en) * 2023-11-14 2024-03-08 平安科技(上海)有限公司 Vehicle damage assessment method, device, computer equipment and storage medium
CN117671330B (en) * 2023-11-14 2024-06-21 平安科技(上海)有限公司 Vehicle damage assessment method, device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN114219757B (en) 2024-05-10

Similar Documents

Publication Publication Date Title
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
CN110176027B (en) Video target tracking method, device, equipment and storage medium
CN111640125B (en) Aerial photography graph building detection and segmentation method and device based on Mask R-CNN
US11315253B2 (en) Computer vision system and method
CN108416266B (en) Method for rapidly identifying video behaviors by extracting moving object through optical flow
CN109583483B (en) Target detection method and system based on convolutional neural network
CN112132156B (en) Image saliency target detection method and system based on multi-depth feature fusion
CN109712165B (en) Similar foreground image set segmentation method based on convolutional neural network
CN113469073A (en) SAR image ship detection method and system based on lightweight deep learning
CN113052006B (en) Image target detection method, system and readable storage medium based on convolutional neural network
CN111612008A (en) Image segmentation method based on convolution network
CN115131797B (en) Scene text detection method based on feature enhancement pyramid network
CN111768415A (en) Image instance segmentation method without quantization pooling
CN112927209B (en) CNN-based significance detection system and method
CN117253154B (en) Container weak and small serial number target detection and identification method based on deep learning
CN115631344B (en) Target detection method based on feature self-adaptive aggregation
CN113449691A (en) Human shape recognition system and method based on non-local attention mechanism
CN113591719A (en) Method and device for detecting text with any shape in natural scene and training method
CN115294356A (en) Target detection method based on wide area receptive field space attention
CN114998756A (en) Yolov 5-based remote sensing image detection method and device and storage medium
CN116863194A (en) Foot ulcer image classification method, system, equipment and medium
CN111832508B (en) DIE _ GA-based low-illumination target detection method
CN114219757B (en) Intelligent damage assessment method for vehicle based on improved Mask R-CNN
CN112446292B (en) 2D image salient object detection method and system
GB2623387A (en) Learnable image transformation training methods and systems in graphics rendering

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant