CN116703835A - Intelligent reinforcement detection method and system based on convolutional neural network and binocular vision - Google Patents
- Publication number
- CN116703835A (application CN202310578442.3A)
- Authority
- CN
- China
- Prior art keywords
- mask
- reinforcement
- module
- neural network
- convolutional neural
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T7/0004 — Industrial image inspection
- G06N3/045 — Combinations of networks
- G06N3/0464 — Convolutional networks [CNN, ConvNet]
- G06N3/096 — Transfer learning
- G06T7/11 — Region-based segmentation
- G06T7/66 — Analysis of geometric attributes of image moments or centre of gravity
- G06T7/68 — Analysis of geometric attributes of symmetry
- G06V10/25 — Determination of region of interest [ROI] or a volume of interest [VOI]
- G06V10/267 — Segmentation of patterns by performing operations on regions, e.g. growing, shrinking or watersheds
- G06V10/28 — Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
- G06V10/75 — Organisation of the matching processes, e.g. simultaneous or sequential comparisons; coarse-fine approaches
- G06V10/82 — Image or video recognition or understanding using neural networks
- Y02P90/30 — Computing systems specially adapted for manufacturing
Abstract
The application discloses an intelligent reinforcement detection method and system based on a convolutional neural network and binocular vision. The method comprises the following steps: S1: acquiring RGB and depth image data of the steel bars with a depth camera; S2: inputting the RGB images into a convolutional neural network for steel bar recognition to obtain predicted bounding boxes and masks of the bars; S3: performing reinforcement detection with binocular vision technology based on the recognition result, and outputting a visualized reinforcement quality acceptance result. Using a convolutional neural network for steel bar recognition improves the accuracy of steel bar detection and segmentation; combined with binocular vision technology, intelligent reinforcement detection runs in real time and outputs visualized results, assisting staff in quality inspection and acceptance of concealed reinforcement work, greatly improving work efficiency and reducing labor cost.
Description
Technical Field
The application relates to the technical field of deep learning and computer vision, and in particular to an intelligent reinforcement detection method and system based on a convolutional neural network and binocular vision.
Background
Reinforced concrete is the most widely used structural form in existing engineering structures because its materials are readily available, it is highly mouldable, its stresses are distributed rationally, its construction process is simple, and its cost is low. During structural design, the bearing capacity of members is guaranteed by controlling the diameter and spacing of the reinforcing bars, and concealed reinforcement work must be accepted before concrete pouring, i.e., checking whether the binding specification, number, and spacing of the bars meet the design requirements. Traditional reinforcement detection relies mainly on manual measurement, which limits detection precision and range and poses safety hazards on construction sites. Against a background of labor shortage and an ageing workforce, traditional reinforcement detection needs an intelligent transformation.
Steel bar detection methods based on digital image processing are easily affected by illumination, background, occlusion, and other factors; their precision cannot meet the requirements of actual engineering, and their real-time performance is poor. With the development of laser scanning technology and equipment, high-precision measurement using three-dimensional point clouds has been widely applied in civil engineering, but the high price of laser scanning equipment and the complexity of data acquisition and computation limit its practical application.
In recent years, owing to their strong feature learning capability, object detection and instance segmentation algorithms based on convolutional neural networks have been widely applied, with good results in tasks such as prefabricated-part recognition, rebar tie-point localization, and rebar cross-section counting.
The patent specification with publication number CN113269718A discloses a deep-learning-based crack detection method for precast concrete members: crack image data are collected, the samples are preprocessed and manually annotated; the annotated samples are augmented and divided into training, validation, and test sets; a convolutional neural network model is built, then trained, validated, and tested to obtain the final model; and the model is used to detect crack images and produce detection results.
The patent specification with publication number CN115222652A discloses a method for recognizing, counting, and locating the centers of the end faces of bundled steel bars, comprising the following steps: S1, images of the bar end faces are captured and processed into images to be recognized; S2, a first preset algorithm applies data enhancement to the images; S3, a second preset algorithm with a lightweight convolutional neural network forms final detection boxes in the images and counts them; S4, a counting result is generated.
Disclosure of Invention
To address the shortcomings of existing steel bar detection technology, the application provides an intelligent reinforcement detection method based on a convolutional neural network and binocular vision. It uses an improved Mask R-CNN instance segmentation model to improve the accuracy of steel bar recognition and combines binocular vision technology to output visualized reinforcement detection results, assisting staff in quality inspection and acceptance of concealed reinforcement work, greatly improving work efficiency and reducing labor cost.
An intelligent reinforcement detection method based on convolutional neural network and binocular vision comprises the following steps:
s1: acquiring RGB image and depth image data of the steel bar by using a depth camera;
s2: inputting RGB images of the reinforcing steel bars into a convolutional neural network for reinforcing steel bar identification, and obtaining a prediction boundary frame and a mask of the reinforcing steel bars;
s3: based on the reinforcement recognition result, reinforcement detection is performed by utilizing a binocular vision technology, and a visual reinforcement quality acceptance result is output.
In a preferred embodiment, in step S1, depth image data of the steel bar is obtained by the following steps:
s1.1: performing stereo matching on left and right eye images of a depth camera to obtain a parallax image;
s1.2: according to the relation between depth and parallax, converting the parallax image into a depth image, wherein the depth z is calculated as:

z = f·b / d = f·b / (x_l − x_r)

where f is the focal length of the depth camera, b is the baseline length of the depth camera, d is the disparity between the left and right images, x_l is the abscissa of the projection point in the left camera, and x_r is the abscissa of the projection point in the right camera.
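The disparity-to-depth conversion above can be sketched in a few lines of NumPy. The focal length and baseline values below are illustrative assumptions, not parameters from the patent:

```python
import numpy as np

def disparity_to_depth(disparity, f, b):
    """Convert a disparity map (in pixels) to a depth map via z = f * b / d.

    f: focal length in pixels, b: stereo baseline in metres.
    Pixels with zero or negative disparity (no stereo match) get depth 0.
    """
    depth = np.zeros_like(disparity, dtype=np.float64)
    valid = disparity > 0
    depth[valid] = f * b / disparity[valid]
    return depth

# Toy 2x2 disparity map with assumed f = 600 px and b = 0.05 m
z = disparity_to_depth(np.array([[30.0, 15.0], [0.0, 60.0]]), f=600.0, b=0.05)
# first pixel: z = 600 * 0.05 / 30 = 1.0 m
```

Larger disparities map to smaller depths, and unmatched pixels are left at zero rather than producing a division error.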
In a preferred embodiment, the step S2 specifically includes the steps of:
s2.1: the method comprises the steps of collecting original pictures of reinforcing steel bars by using camera equipment, manufacturing reinforcing steel bar mask labels by using a manual labeling method, dividing a data set into a training set and a testing set, and amplifying the data set through data enhancement;
s2.2: pre-training the improved Mask R-CNN model on the public dataset COCO2017, and initializing the network parameters based on the transfer learning principle;
s2.3: training the improved Mask R-CNN model established in the step S2.2 through the data set in the step S2.1, and constructing a reinforcement instance segmentation model;
s2.4: and inputting the RGB image of the steel bar acquired by the depth camera into a steel bar example segmentation model to acquire a prediction boundary frame and a mask of the steel bar.
In step S2.1, the data enhancement comprises geometric transformations (random translation, rotation, mirroring, and affine transformation) and pixel transformations (random brightness adjustment, contrast adjustment, HSV adjustment, Gaussian noise, and salt-and-pepper noise).
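A minimal NumPy sketch of three of the listed operations, applied to a grayscale image. The function names and parameter values are illustrative, not from the patent:

```python
import numpy as np

def mirror(img):
    """Horizontal mirror (one of the geometric transformations)."""
    return img[:, ::-1]

def adjust_brightness(img, factor):
    """Scale intensities by `factor` and clip to [0, 255] (pixel transformation)."""
    return np.clip(img.astype(np.float64) * factor, 0, 255).astype(np.uint8)

def salt_pepper_noise(img, amount, rng):
    """Set a random fraction `amount` of pixels to 0 or 255 (salt-and-pepper)."""
    out = img.copy()
    noisy = rng.random(img.shape) < amount
    salt = rng.random(img.shape) < 0.5
    out[noisy & salt] = 255
    out[noisy & ~salt] = 0
    return out

# Chain the operations on one synthetic 4x4 image
rng = np.random.default_rng(0)
img = np.full((4, 4), 128, dtype=np.uint8)
aug = salt_pepper_noise(adjust_brightness(mirror(img), 1.1), 0.25, rng)
```

In practice a library such as Albumentations or torchvision would supply these transforms (including HSV adjustment and affine warps); the sketch only shows the principle of enlarging the dataset by randomized perturbations.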
Further preferably, in step S2.2, the improved Mask R-CNN model includes an optimized feature extraction module, an RPN module, an ROI alignment module, and an output branch;
the optimized feature extraction module is formed by adding a bottom-up propagation path and a CA-SA module, which combines a channel attention (CA) module and a spatial attention (SA) module, to the residual-network-based feature pyramid structure ResNet-FPN; in the CA module, a feature map of height H, width W, and C channels is passed through a global average pooling layer that compresses the spatial dimensions W and H to 1, the resulting 1×1×C feature map undergoes a convolution operation, and softmax processing makes the channel weights sum to 1; the output is then the attention weight of each channel, which is multiplied channel-wise with the input feature map to obtain the output feature map; in the SA module, the feature map undergoes 1×1 convolution and softmax processing that compresses the channel dimension to 1, and the SA module obtains a weight matrix over the H×W plane that corresponds to the spatial attention weight of each pixel and represents the importance of spatial position information; applying this weight matrix to the input feature map amplifies important features and suppresses background information, achieving feature screening and enhancement; the bottom-up propagation path specifically refers to: starting from the {P2, P3, P4, P5} feature maps obtained from ResNet-FPN, the feature information of P2 is passed to N2, N2 undergoes a 3×3 convolution that downsamples its height and width to the size of P3, the result is added element-wise to P3 and sent to a CA-SA module to obtain N3, and N4 and N5 are extracted analogously from the P4 and P5 feature maps, yielding the {N2, N3, N4, N5} feature maps;
the image passes through the optimized feature extraction module to generate feature maps; the RPN module then generates strongly prior anchor boxes for each point on the feature map and obtains each anchor box's classification score and bounding-box regression through 1×1 convolution, screening out a set of high-quality candidate boxes that are input to the ROI alignment module; the ROI alignment module transforms the feature maps generated by the optimized feature extraction module and the candidate boxes screened by the RPN module to the same dimensions to meet the input requirements of the subsequent fully convolutional network; finally, the features obtained by the ROI alignment module are fed into fully connected layers, and the classification branch, bounding-box regression branch, and mask branch respectively output the predicted class score, bounding-box regression, and pixel mask of the object, completing the whole detection and segmentation task.
Further preferably, the step S2.3 specifically includes the steps of:
s2.3.1: setting the evaluation index of the improved Mask R-CNN model to the mAP defined by the COCO2017 dataset, i.e., the average precision (AP) averaged over intersection-over-union (IoU) thresholds from 0.50 to 0.95 in steps of 0.05, where AP is the area under the precision-recall curve:

AP = ∫₀¹ P(R) dR

where the precision P is the proportion of correct predictions among all predicted rebar targets, and the recall R is the proportion of actual rebar targets that are correctly predicted;
s2.3.2: feeding the rebar pictures and rebar mask labels in the training set into the improved Mask R-CNN model as input and output respectively to obtain the weight parameters of the rebar instance segmentation model; using the obtained weights to predict bounding boxes and masks for the rebar pictures in the test set, computing the loss and mAP against the ground-truth mask labels, and adjusting the weights; at the end of training, the weights corresponding to the maximum mAP are retained for subsequent rebar recognition.
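AP at a single IoU threshold can be sketched as follows. This is a simplified step integration of the precision-recall curve; COCO additionally interpolates precision at 101 recall points and averages the result over the ten IoU thresholds, which is omitted here:

```python
import numpy as np

def average_precision(scores, is_tp, num_gt):
    """AP as the area under the precision-recall curve.

    scores: confidence of each detection; is_tp: 1 if the detection matches
    a ground-truth rebar at the chosen IoU threshold, else 0; num_gt: number
    of actual rebar targets.
    """
    order = np.argsort(scores)[::-1]                 # rank by confidence
    hits = np.asarray(is_tp, dtype=float)[order]
    tp = np.cumsum(hits)
    fp = np.cumsum(1.0 - hits)
    precision = tp / (tp + fp)
    recall = tp / num_gt
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precision, recall):              # sum P * delta-R steps
        ap += p * (r - prev_r)
        prev_r = r
    return ap

perfect = average_precision([0.9, 0.8], [1, 1], num_gt=2)   # both correct
half = average_precision([0.9, 0.8], [1, 0], num_gt=2)      # one false positive
```

Two correct detections covering both ground truths give AP = 1.0; a false positive in second place halves it, matching the precision/recall definitions in step S2.3.1.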
In a preferred embodiment, the step S3 specifically includes the steps of:
s3.1: based on the rebar recognition result, binarizing each mask image, extracting the pixel coordinates of each mask's edges with an edge detection algorithm, and then computing the pixel coordinates of each mask's centerline using the medial axis transform;
s3.2: calculating the normal vector of each mask centerline with a k-nearest-neighbour algorithm; masks whose centerline normal has a horizontal component smaller than its vertical component are classified as the up-down direction, and masks whose centerline normal has a horizontal component greater than or equal to its vertical component are classified as the left-right direction; the centerline coordinates are then used to automatically order the rebar masks from top to bottom and from left to right;
s3.3: extending along the normal vector of each mask centerline pixel to the two edges of the mask, extracting paired edge pixels by linear interpolation, and using them to calculate the bar diameter; similarly, extending the normal vector of each centerline pixel to the adjacent mask centerline and extracting paired centerline pixels for calculating the bar spacing;
s3.4: aligning the RGB image acquired by the depth camera with the depth image to obtain the depth of each pixel in the RGB image, and converting the paired edge and centerline pixels extracted in step S3.3 from the pixel coordinate system to the camera coordinate system using the camera intrinsic matrix;
s3.5: substituting the camera coordinates of the paired rebar edge and centerline pixels obtained in step S3.4 into the spatial distance formula, calculating the actual diameter and spacing of the bars, and outputting a visualized rebar quality acceptance result.
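The orientation classification and ordering in step S3.2 can be sketched as follows. As a simplification, the dominant direction of each centerline is taken from its coordinate spread rather than from the k-nearest-neighbour normal estimate in the text, and the function name is illustrative:

```python
import numpy as np

def classify_and_sort(centerlines):
    """centerlines: list of (N_i, 2) arrays of (row, col) centerline pixels.

    Centerlines that are wider than tall are treated as left-right bars and
    sorted top to bottom; the rest are up-down bars sorted left to right.
    """
    left_right, up_down = [], []
    for line in centerlines:
        spread = line.max(axis=0) - line.min(axis=0)   # (row span, col span)
        if spread[1] >= spread[0]:
            left_right.append(line)
        else:
            up_down.append(line)
    left_right.sort(key=lambda l: l[:, 0].mean())      # top to bottom
    up_down.sort(key=lambda l: l[:, 1].mean())         # left to right
    return left_right, up_down

lines = [np.array([[5, 0], [5, 9]]),    # horizontal bar at row 5
         np.array([[0, 0], [0, 9]]),    # horizontal bar at row 0
         np.array([[0, 3], [9, 3]])]    # vertical bar at column 3
lr, ud = classify_and_sort(lines)
```

The two horizontal centerlines come back ordered with the row-0 bar first, matching the top-to-bottom, left-to-right ordering required for spacing checks.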
Further preferably, in step S3.4, the camera intrinsic matrix is:

M = [ f_x   0    u_0 ]
    [ 0    f_y   v_0 ]
    [ 0     0     1  ]

where M is the camera intrinsic matrix, (u_0, v_0) are the pixel coordinates of the RGB image centre, d_x and d_y are the physical size of a single pixel along the x and y axes, f is the focal length of the depth camera, and f_x = f/d_x and f_y = f/d_y are the focal lengths expressed in pixels along the x and y axes of the imaging plane.
Further preferably, in step S3.5, the spatial distance formula is:

D = √[(x_2 − x_1)² + (y_2 − y_1)² + (z_2 − z_1)²]

where D is the distance between any two points in space, (x_1, y_1, z_1) are the coordinates of point 1, and (x_2, y_2, z_2) are the coordinates of point 2.
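Steps S3.4 and S3.5 combine the pinhole back-projection with the spatial distance formula. A minimal sketch, with assumed intrinsics (f_x = f_y = 600 px, principal point (320, 240)) used purely for illustration:

```python
import numpy as np

def pixel_to_camera(u, v, z, fx, fy, u0, v0):
    """Back-project pixel (u, v) with depth z into camera coordinates using
    the pinhole model: x = (u - u0) * z / fx, y = (v - v0) * z / fy."""
    return np.array([(u - u0) * z / fx, (v - v0) * z / fy, z])

def spatial_distance(p1, p2):
    """Euclidean distance between two 3-D points."""
    return float(np.linalg.norm(p2 - p1))

# A point on the optical axis and one 600 px to its right, both at 1 m depth
a = pixel_to_camera(320, 240, 1.0, 600, 600, 320, 240)   # -> (0, 0, 1)
b = pixel_to_camera(920, 240, 1.0, 600, 600, 320, 240)   # -> (1, 0, 1)
```

Applied to the paired edge pixels this distance gives the bar diameter, and applied to paired centerline pixels it gives the bar spacing.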
The application also provides an intelligent reinforcement detection system based on the convolutional neural network and the binocular vision, and the system can execute the intelligent reinforcement detection method based on the convolutional neural network and the binocular vision.
Compared with the prior art, the application has the beneficial effects that:
(1) The application improves Mask R-CNN by adding a bottom-up propagation path to the feature extraction module and embedding a CA-SA module that combines channel attention and spatial attention, strengthening the fusion of shallow and deep feature information; the channel attention mechanism gives larger weight coefficients to channels with high target response, while the spatial attention mechanism makes target pixels the focus of feature extraction, improving the accuracy of rebar detection and segmentation.
(2) The application combines binocular vision technology so that intelligent reinforcement detection runs in real time and outputs visualized results, assisting staff in quality inspection and acceptance of concealed reinforcement work, greatly improving work efficiency and reducing labor cost.
Drawings
FIG. 1 is a flow chart of an intelligent reinforcement detection method based on convolutional neural network and binocular vision;
fig. 2 is image data of a steel bar acquired by a depth camera according to an embodiment, wherein (a) is an RGB diagram of the steel bar and (b) is a depth diagram of the steel bar;
FIG. 3 is a schematic diagram of a Mask R-CNN network architecture;
FIG. 4 is a diagram of a CA module architecture for a channel attention mechanism of an embodiment;
FIG. 5 is a diagram of a spatial attention mechanism SA module according to one embodiment;
FIG. 6 is a bottom-up attention mechanism path block diagram of an embodiment;
fig. 7 is the output of the intelligent reinforcement detection method in the embodiment, wherein (a) is the rebar prediction result based on the improved Mask R-CNN and (b) is the visualization of rebar quality detection, intuitively showing the bar diameters and spacing.
Detailed Description
The application will be further elucidated with reference to the drawings and to specific embodiments. It is to be understood that these examples are illustrative of the present application and are not intended to limit the scope of the present application.
As shown in fig. 1, an intelligent reinforcement detection method based on convolutional neural network and binocular vision includes the steps:
s1: the RGB image and the depth image data of the steel bar are acquired by using a depth camera, and the depth image data of the steel bar is acquired specifically through the following steps:
s1.1: performing stereo matching on left and right eye images of a depth camera to obtain a parallax image;
s1.2: according to the relation between depth and parallax, converting the parallax image into a depth image, wherein the depth z is calculated as:

z = f·b / d = f·b / (x_l − x_r)

where f is the focal length of the depth camera, b is the baseline length of the depth camera, d is the disparity between the left and right images, x_l is the abscissa of the projection point in the left camera, and x_r is the abscissa of the projection point in the right camera.
By way of example, fig. 2 (a) is an RGB diagram of a rebar and fig. 2 (b) is a depth diagram of a rebar.
S2: the RGB image of the reinforcing steel bar is input into a convolutional neural network for reinforcing steel bar identification, and a prediction boundary frame and a mask of the reinforcing steel bar are obtained, which specifically comprises the following steps:
s2.1: producing the rebar dataset; original pictures of the steel bars are collected with camera equipment, rebar mask labels are produced by manual annotation, the data are randomly divided into a training set and a test set at a ratio of 7:3, and the dataset is enlarged through data enhancement. The data enhancement comprises geometric transformations (random translation, rotation, mirroring, and affine transformation) and pixel transformations (random brightness adjustment, contrast adjustment, HSV adjustment, Gaussian noise, and salt-and-pepper noise).
S2.2: pre-training on an improved Mask R-CNN model using a public dataset COCO2017, initializing network parameters based on the principles of transfer learning.
The network structure of Mask R-CNN is shown in fig. 3; the model comprises a feature extraction module, an RPN module, an ROI alignment module, and output branches. The feature extraction module is the residual-network-based feature pyramid structure ResNet-FPN, and the image passes through it to generate feature maps; the RPN module then generates strongly prior anchor boxes for each point on the feature map and obtains each anchor box's classification score and bounding-box regression through 1×1 convolution, screening out a set of high-quality candidate boxes that are input to the ROI alignment module; the ROI alignment module transforms the feature maps generated by the feature extraction module and the candidate boxes screened by the RPN module to the same dimensions to meet the input requirements of the subsequent fully convolutional network; finally, the features obtained by the ROI alignment module are fed into fully connected layers, and the classification branch, bounding-box regression branch, and mask branch respectively output the predicted class score, bounding-box regression, and pixel mask of the object, completing the whole detection and segmentation task.
In this embodiment, the feature extraction module of the Mask R-CNN network structure is optimized to form the improved Mask R-CNN model. The optimized feature extraction module adds, to the residual-network-based feature pyramid structure ResNet-FPN, a bottom-up propagation path and a CA-SA module formed by a channel attention (CA) module and a spatial attention (SA) mechanism module. In the CA module, a feature map of height H, width W and channel number C is input into a global average pooling layer so that the spatial dimensions W and H are compressed to 1; a convolution operation is then performed on the resulting 1×1×C feature map, and softmax processing makes the channel weights sum to 1. The output at this point is the attention mechanism weight of each channel, and multiplying the input feature map channel-wise by these weights gives the output feature map. In the SA module, the feature map is subjected to a 1×1 convolution and softmax processing so that the channel dimension is compressed to 1; the SA module thus learns, on the two-dimensional plane, a weight matrix of size H×W corresponding to the spatial attention weight of each pixel point, which represents the importance of spatial position information. Applying this weight matrix to the input feature map amplifies important features and weakens background information, realizing feature screening and enhancement. The structure of the bottom-up propagation path is shown in fig. 6: based on the {P2, P3, P4, P5} feature maps obtained by ResNet-FPN, the feature information of P2 is transferred to N2; N2 is subjected to a 3×3 convolution to downsample its height and width to the size of P3, then added to P3 element by element and sent to the CA-SA module to obtain N3; by analogy, N4 and N5 are further extracted on the P4 and P5 feature maps, obtaining the {N2, N3, N4, N5} feature maps.
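The channel and spatial attention computations described above can be sketched with plain numpy; this is a simplified illustration of the weighting arithmetic (the 1×1 convolutions are stood in for by learned weight vectors `w`, which are assumptions of this sketch, not the embodiment's trained layers):

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax; output sums to 1."""
    e = np.exp(x - x.max())
    return e / e.sum()

def channel_attention(fmap, w):
    """CA sketch: global average pool -> 1x1 conv (here a per-channel weight
    w) -> softmax over channels -> rescale each input channel."""
    pooled = fmap.mean(axis=(1, 2))           # (C,): W and H compressed to 1
    attn = softmax(w * pooled)                # channel weights, sum to 1
    return fmap * attn[:, None, None]

def spatial_attention(fmap, w):
    """SA sketch: 1x1 conv collapses channels -> softmax over H*W gives a
    per-pixel weight matrix applied to every channel."""
    C, H, W = fmap.shape
    squeezed = np.tensordot(w, fmap, axes=1)  # (H, W): channel dim collapsed
    attn = softmax(squeezed.ravel()).reshape(H, W)
    return fmap * attn[None, :, :]

# Toy C=3, H=4, W=4 feature map passed through CA then SA
fmap = np.arange(48.0).reshape(3, 4, 4)
out = spatial_attention(channel_attention(fmap, np.ones(3)), np.ones(3))
```

In the actual CA-SA module both attentions would be trainable convolution layers; the sketch only shows how the softmax weights rescale the feature map without changing its shape.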
S2.3: training the improved Mask R-CNN model established in the step S2.2 through the data set in the step S2.1, and constructing a reinforcement instance segmentation model, wherein the training batch size is set to be 4, the initial learning rate is set to be 0.0005, and the training round (Epoch) is set to be 50, and the method specifically comprises the following steps:
S2.3.1: setting the evaluation index of the improved Mask R-CNN model to mAP as defined for the COCO2017 data set, namely the mean of the average precisions (AP) over the different intersection-over-union thresholds (0.50:0.05:0.95), where the AP is the area under the precision-recall curve:

AP = ∫₀¹ P(R) dR
wherein P (precision) represents the proportion of all predicted rebar targets that are predicted correctly, and R (recall) represents the proportion of all actually correct rebar targets that are predicted as positive samples;
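For illustration, the AP integral above can be evaluated numerically from a set of precision-recall points; the following sketch uses the common monotone-envelope integration (the exact interpolation used by the COCO toolkit differs in detail, so this is an approximation, not the embodiment's evaluator):

```python
import numpy as np

def average_precision(recall, precision):
    """AP sketch: make precision monotonically non-increasing in recall,
    then integrate P over R (area under the precision-recall curve)."""
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    # Envelope: replace each precision by the max precision at higher recall
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum rectangle areas where recall actually changes
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

# Two detections: P=1.0 at R=0.5, P=0.5 at R=1.0 -> AP = 0.75
ap = average_precision(np.array([0.5, 1.0]), np.array([1.0, 0.5]))
```

mAP then averages this AP over the IoU thresholds 0.50, 0.55, ..., 0.95.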
S2.3.2: the reinforcement pictures and reinforcement mask labels in the training set are passed into the improved Mask R-CNN model as input and output, respectively, to obtain the weight parameters of the reinforcement instance segmentation model; bounding-box and mask prediction is carried out on the reinforcement pictures in the test set with the obtained weight parameters, the loss and the mAP index are calculated against the reinforcement mask label ground truth, and the weight parameters are adjusted; when training ends, the weight parameters corresponding to the maximum mAP are retained and used for subsequent reinforcement recognition.
S2.4: the RGB image of the steel bar acquired by the depth camera is input into the steel bar example segmentation model, and the prediction boundary box and the mask of the steel bar are obtained, and the result is shown in fig. 7 (a).
S3: based on the reinforcement recognition result, the reinforcement detection is carried out by utilizing a binocular vision technology, and a visual reinforcement quality acceptance result is output, and the method specifically comprises the following steps:
S3.1: based on the reinforcement recognition result, binarizing each mask image, extracting the pixel coordinates of each mask edge with an edge detection algorithm, and further calculating the pixel coordinates of each mask center line by the medial axis transform, wherein the upper left corner of the image is the pixel coordinate origin (0, 0) and rightward and downward are the positive directions of the u axis and v axis;
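The edge-extraction part of this step can be sketched with a minimal morphological operation: the edge of a binary mask is the mask minus its erosion. This numpy sketch stands in for a full edge-detection algorithm, and the medial axis transform for the center line would in practice use a skeletonization routine (e.g. `skimage.morphology.medial_axis`, assumed available, is one option):

```python
import numpy as np

def mask_edge(mask):
    """Edge pixels of a binary mask: mask minus its 4-neighbourhood erosion.

    Returns (u, v) coordinates with the origin (0, 0) at the top-left,
    u increasing rightward and v increasing downward, as in step S3.1.
    """
    m = mask.astype(bool)
    core = m.copy()
    core[1:, :]  &= m[:-1, :]    # require the upper neighbour
    core[:-1, :] &= m[1:, :]     # require the lower neighbour
    core[:, 1:]  &= m[:, :-1]    # require the left neighbour
    core[:, :-1] &= m[:, 1:]     # require the right neighbour
    edge = m & ~core             # mask pixels that lost a neighbour
    v, u = np.nonzero(edge)
    return np.stack([u, v], axis=1)

# Toy mask: a filled 6x4 rectangle
mask = np.zeros((10, 10), dtype=np.uint8)
mask[2:8, 3:7] = 1
pts = mask_edge(mask)
```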
S3.2: calculating the normal vector of each mask center line with a k-nearest-neighbor algorithm; masks whose center line normal vector has a horizontal component smaller than its vertical component are classified as the up-down direction, and masks whose horizontal component is greater than or equal to the vertical component are classified as the left-right direction; the center line coordinates are then used to automatically order the reinforcement masks from top to bottom and from left to right: the average of the (u, v) coordinates of all pixel points in each center line is calculated, the up-down masks are sorted from top to bottom in ascending order of the average v coordinate, and the left-right masks are sorted from left to right in ascending order of the average u coordinate;
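The classification-and-ordering logic above can be sketched as follows; for brevity the per-point k-nearest-neighbor normal estimation is replaced by a single rough line direction per center line (an assumption of this sketch, not the embodiment's method):

```python
import numpy as np

def order_masks(centerlines):
    """Classify each center line by the dominant component of its normal,
    then order up-down masks top-to-bottom by mean v and left-right masks
    left-to-right by mean u. Each centerline is an (N, 2) array of (u, v)."""
    updown, leftright = [], []
    for i, cl in enumerate(centerlines):
        d = cl.max(axis=0) - cl.min(axis=0)   # rough line direction (du, dv)
        normal = np.array([-d[1], d[0]])      # perpendicular to the line
        if abs(normal[0]) < abs(normal[1]):   # horizontal comp. < vertical
            updown.append(i)
        else:
            leftright.append(i)
    updown.sort(key=lambda i: centerlines[i][:, 1].mean())     # mean v
    leftright.sort(key=lambda i: centerlines[i][:, 0].mean())  # mean u
    return updown, leftright

# Two horizontal bars (at v=8 and v=2) and one vertical bar (at u=5)
cl_a = np.array([[u, 8] for u in range(10)])
cl_b = np.array([[5, v] for v in range(10)])
cl_c = np.array([[u, 2] for u in range(10)])
updown, leftright = order_masks([cl_a, cl_b, cl_c])
```

A horizontal bar has a vertical normal, so it falls into the up-down group and is ordered by its v position, matching the convention in the text.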
S3.3: paired edge pixel points are extracted by linear interpolation along the normal vectors of the center line pixel points in each mask to calculate the steel bar diameters; specifically, 20 pixel points are selected uniformly on each center line and extended along their normal vectors to the edges on both sides, the paired edge pixel points are extracted by linear interpolation, and the diameter of each steel bar is represented by the average length of the 20 paired-point connecting lines; similarly, the normal vectors of the center line pixel points are extended to the adjacent mask center line, and the paired center line pixel points are extracted for calculating the spacing between steel bars;
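The diameter measurement of this step can be sketched by marching from sampled center line points along the normal in both directions until leaving the mask; the fixed 0.5-pixel step stands in for the sub-pixel linear interpolation described above, so the result is approximate:

```python
import numpy as np

def bar_diameter(mask, centerline, normals, n_samples=20):
    """Pixel-space diameter sketch: from n_samples center line points, step
    along +/- normal until leaving the mask; the diameter is the mean
    distance between the two exit points of each sample."""
    H, W = mask.shape
    idx = np.linspace(0, len(centerline) - 1, n_samples).astype(int)
    widths = []
    for i in idx:
        p = centerline[i].astype(float)
        n = normals[i] / np.linalg.norm(normals[i])
        ends = []
        for sign in (+1.0, -1.0):
            t = 0.0
            while True:
                q = p + sign * t * n
                u, v = int(round(q[0])), int(round(q[1]))
                if not (0 <= u < W and 0 <= v < H and mask[v, u]):
                    break                     # first point outside the mask
                t += 0.5
            ends.append(q)
        widths.append(np.linalg.norm(ends[0] - ends[1]))
    return float(np.mean(widths))

# Toy horizontal bar, 5 pixels thick (rows 3..7), with vertical normals
mask = np.zeros((12, 20), dtype=bool)
mask[3:8, :] = True
centerline = np.array([[u, 5] for u in range(2, 18)])
normals = np.array([[0.0, 1.0]] * len(centerline))
diameter_px = bar_diameter(mask, centerline, normals)
```

Bar spacing would reuse the same marching, but from one mask's center line to the adjacent mask's center line.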
s3.4: aligning the RGB image of the steel bar acquired by the depth camera with the depth image, and acquiring the depth information of each pixel point in the RGB image; converting the paired pixel points of the edge and the center line extracted in the step S3.3 from a pixel coordinate system to a camera coordinate system by using a camera internal reference matrix; the specific formula of the camera internal reference matrix is as follows:
M = [ f_x  0  u_0 ; 0  f_y  v_0 ; 0  0  1 ], with f_x = f/d_x, f_y = f/d_y

wherein M represents the camera internal reference (intrinsic) matrix, (u_0, v_0) is the coordinate of the center point of the RGB image in the pixel coordinate system, d_x and d_y are the physical lengths of a single pixel along the x axis and y axis, f is the focal length of the depth camera, and f_x and f_y are the numbers of pixels per focal length f along the x axis and y axis of the imaging plane;
S3.5: substituting the camera coordinates of the paired edge and center line pixel points of the steel bars obtained in step S3.4 into the spatial distance formula, calculating the actual diameter and spacing of the steel bars, and outputting a visualized steel bar quality acceptance result, as shown in fig. 7(b); the spatial distance formula is specifically:
D = √((x_2 − x_1)² + (y_2 − y_1)² + (z_2 − z_1)²)

wherein D is the distance between any two points in space, x_1, y_1, z_1 are the x, y, z coordinates of point 1, and x_2, y_2, z_2 are the x, y, z coordinates of point 2.
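Steps S3.4 and S3.5 together amount to back-projecting each pixel with its depth into camera coordinates and taking the Euclidean distance between paired points. A minimal sketch, with purely illustrative intrinsic parameters (fx, fy, u0, v0 are assumptions, not calibrated values):

```python
import numpy as np

def pixel_to_camera(u, v, z, fx, fy, u0, v0):
    """Back-project pixel (u, v) with depth z into camera coordinates using
    the intrinsics: x = (u - u0) * z / fx, y = (v - v0) * z / fy."""
    return np.array([(u - u0) * z / fx, (v - v0) * z / fy, z])

def spatial_distance(p1, p2):
    """Euclidean distance D between two 3-D camera-coordinate points."""
    return float(np.linalg.norm(np.asarray(p2, float) - np.asarray(p1, float)))

# Hypothetical intrinsics and a pair of edge pixels across a bar at depth 1 m
fx = fy = 600.0
u0, v0 = 320.0, 240.0
p1 = pixel_to_camera(310.0, 240.0, 1.0, fx, fy, u0, v0)
p2 = pixel_to_camera(330.0, 240.0, 1.0, fx, fy, u0, v0)
diameter_m = spatial_distance(p1, p2)   # 20 px at 1 m with fx=600
```

With these assumed numbers the measured diameter is 20 · 1.0 / 600 ≈ 0.0333 m; the same computation applied to paired center line points yields the bar spacing.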
Further, it is to be understood that various changes and modifications of the present application may be made by those skilled in the art after reading the above description of the application, and that such equivalents are intended to fall within the scope of the application as defined in the appended claims.
Claims (9)
1. An intelligent reinforcement detection method based on convolutional neural network and binocular vision is characterized by comprising the following steps:
s1: acquiring RGB image and depth image data of the steel bar by using a depth camera;
s2: inputting RGB images of the reinforcing steel bars into a convolutional neural network for reinforcing steel bar identification, and obtaining a prediction boundary frame and a mask of the reinforcing steel bars;
s3: based on the reinforcement recognition result, reinforcement detection is performed by utilizing a binocular vision technology, and a visual reinforcement quality acceptance result is output.
2. The intelligent reinforcement detection method based on convolutional neural network and binocular vision according to claim 1, wherein in step S1, depth image data of the reinforcement is obtained through the following steps:
s1.1: performing stereo matching on left and right eye images of a depth camera to obtain a parallax image;
s1.2: according to the relation between depth and parallax, converting the parallax image into a depth image, wherein the calculation formula of the depth z is as follows:
z = f·b/d, with d = x_l − x_r

wherein f is the focal length of the depth camera, b is the baseline length of the depth camera, d is the parallax of the left and right eye images, x_l is the abscissa of the projection point of the left-eye camera, and x_r is the abscissa of the projection point of the right-eye camera.
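By way of illustration, the depth relation of claim 2 is a one-line computation; the numeric values below are hypothetical examples, not calibration data from the disclosure:

```python
def depth_from_disparity(f, b, x_l, x_r):
    """Depth from the stereo relation z = f * b / d, with parallax
    d = x_l - x_r (pixel abscissas of the left and right projections)."""
    d = x_l - x_r
    if d <= 0:
        raise ValueError("parallax must be positive for a finite depth")
    return f * b / d

# Hypothetical: focal length 700 px, baseline 0.06 m, parallax 35 px
z = depth_from_disparity(700.0, 0.06, 400.0, 365.0)  # -> 1.2 m
```

Applying this per pixel of the disparity map produced by stereo matching (step S1.1) yields the depth image of step S1.2.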
3. The intelligent reinforcement detection method based on convolutional neural network and binocular vision according to claim 1, wherein the step S2 specifically comprises the steps of:
s2.1: the method comprises the steps of collecting original pictures of reinforcing steel bars by using camera equipment, manufacturing reinforcing steel bar mask labels by using a manual labeling method, dividing a data set into a training set and a testing set, and amplifying the data set through data enhancement;
s2.2: pre-training on an improved Mask R-CNN model by using a public data set COCO2017, and initializing network parameters based on a migration learning principle;
s2.3: training the improved Mask R-CNN model established in the step S2.2 through the data set in the step S2.1, and constructing a reinforcement instance segmentation model;
s2.4: and inputting the RGB image of the steel bar acquired by the depth camera into a steel bar example segmentation model to acquire a prediction boundary frame and a mask of the steel bar.
4. The intelligent reinforcement detection method based on convolutional neural network and binocular vision according to claim 3, wherein in step S2.2, the improved Mask R-CNN model comprises an optimized feature extraction module, an RPN module, an ROI alignment module and an output branch;
the optimized feature extraction module adds, to the residual-network-based feature pyramid structure ResNet-FPN, a bottom-up propagation path and a CA-SA module formed by a channel attention (CA) module and a spatial attention (SA) mechanism module; in the CA module, a feature map of height H, width W and channel number C is input into a global average pooling layer so that the spatial dimensions W and H are compressed to 1, a convolution operation is then performed on the resulting 1×1×C feature map, and softmax processing makes the channel weights sum to 1; the output at this point is the attention mechanism weight of each channel, and multiplying the input feature map channel-wise by these weights gives the output feature map; in the SA module, the feature map is subjected to a 1×1 convolution and softmax processing so that the channel dimension is compressed to 1, and the SA module acquires, on the two-dimensional plane, a weight matrix of size H×W corresponding to the spatial attention weight of each pixel point, which represents the importance of spatial position information; applying this weight matrix to the input feature map amplifies important features and weakens background information, realizing feature screening and enhancement; the bottom-up propagation path specifically refers to: based on the {P2, P3, P4, P5} feature maps obtained by ResNet-FPN, the feature information of P2 is transferred to N2, N2 is subjected to a 3×3 convolution to downsample its height and width to the size of P3, then added to P3 element by element and sent to the CA-SA module to obtain N3, and N4 and N5 are further extracted on the P4 and P5 feature maps, obtaining the {N2, N3, N4, N5} feature maps;
the image passes through the optimized feature extraction module to generate a feature map; the RPN module generates strong-prior anchor boxes for each point on the feature map and obtains the classification score and bounding-box regression quantity of each anchor box through a 1×1 convolution, thereby screening out a group of better candidate boxes that are input into the ROI alignment module; the ROI alignment module transforms the feature map generated by the optimized feature extraction module and the candidate boxes screened by the RPN module to the same dimension to meet the requirement of the subsequent fully convolutional network on the input features; finally, the features obtained by the ROI alignment module are input into a fully connected layer, and the prediction category score, bounding-box regression quantity and pixel mask of the object are respectively output at the classification branch, the bounding-box regression branch and the mask branch, completing the whole detection and segmentation task.
5. The intelligent reinforcement detection method based on convolutional neural network and binocular vision according to claim 3, wherein the step S2.3 specifically comprises the steps of:
S2.3.1: setting the evaluation index of the improved Mask R-CNN model to mAP as defined for the COCO2017 data set, namely the mean of the average precisions (AP) over the different intersection-over-union thresholds (0.50:0.05:0.95), where the AP is the area under the precision-recall curve:

AP = ∫₀¹ P(R) dR
wherein P (precision) represents the proportion of all predicted rebar targets that are predicted correctly, and R (recall) represents the proportion of all actually correct rebar targets that are predicted as positive samples;
S2.3.2: the reinforcement pictures and reinforcement mask labels in the training set are passed into the improved Mask R-CNN model as input and output, respectively, to obtain the weight parameters of the reinforcement instance segmentation model; bounding-box and mask prediction is carried out on the reinforcement pictures in the test set with the obtained weight parameters, the loss and the mAP index are calculated against the reinforcement mask label ground truth, and the weight parameters are adjusted; when training ends, the weight parameters corresponding to the maximum mAP are retained and used for subsequent reinforcement recognition.
6. The intelligent reinforcement detection method based on convolutional neural network and binocular vision according to claim 1, wherein the step S3 specifically comprises the steps of:
S3.1: based on the reinforcement recognition result, binarizing each mask image, extracting the pixel coordinates of each mask edge with an edge detection algorithm, and further calculating the pixel coordinates of each mask center line by the medial axis transform;
S3.2: calculating the normal vector of each mask center line with a k-nearest-neighbor algorithm; masks whose center line normal vector has a horizontal component smaller than its vertical component are classified as the up-down direction, and masks whose horizontal component is greater than or equal to the vertical component are classified as the left-right direction; the center line coordinates are then used to automatically order the reinforcement masks from top to bottom and from left to right;
S3.3: extending the normal vectors of the center line pixel points in each mask to the edges on both sides, extracting the paired edge pixel points by linear interpolation, and calculating the steel bar diameters; similarly, extending the normal vectors of the center line pixel points to the adjacent mask center line, and extracting the paired center line pixel points for calculating the spacing between steel bars;
s3.4: aligning the RGB image of the steel bar acquired by the depth camera with the depth image, and acquiring the depth information of each pixel point in the RGB image; converting the paired pixel points of the edge and the center line extracted in the step S3.3 from a pixel coordinate system to a camera coordinate system by using a camera internal reference matrix;
s3.5: substituting the camera coordinates of the paired pixel points of the edge and the central line of the steel bar obtained in the step S3.4 into a space distance formula, calculating the actual diameter and the actual distance of the steel bar, and outputting a visualized steel bar quality acceptance result.
7. The intelligent reinforcement detection method based on convolutional neural network and binocular vision according to claim 6, wherein in step S3.4, the specific formula of the camera internal reference matrix is:
M = [ f_x  0  u_0 ; 0  f_y  v_0 ; 0  0  1 ], with f_x = f/d_x, f_y = f/d_y

wherein M represents the camera internal reference (intrinsic) matrix, (u_0, v_0) is the coordinate of the center point of the RGB image in the pixel coordinate system, d_x and d_y are the physical lengths of a single pixel along the x axis and y axis, f is the focal length of the depth camera, and f_x and f_y are the numbers of pixels per focal length f along the x axis and y axis of the imaging plane.
8. The intelligent reinforcement detection method based on convolutional neural network and binocular vision according to claim 6, wherein in step S3.5, the spatial distance formula is specifically:
D = √((x_2 − x_1)² + (y_2 − y_1)² + (z_2 − z_1)²)

wherein D is the distance between any two points in space, x_1, y_1, z_1 are the x, y, z coordinates of point 1, and x_2, y_2, z_2 are the x, y, z coordinates of point 2.
9. An intelligent reinforcement detection system based on convolutional neural network and binocular vision, which is characterized in that the system can execute the intelligent reinforcement detection method based on convolutional neural network and binocular vision as set forth in any one of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310578442.3A CN116703835A (en) | 2023-05-22 | 2023-05-22 | Intelligent reinforcement detection method and system based on convolutional neural network and binocular vision |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116703835A true CN116703835A (en) | 2023-09-05 |
Family
ID=87840174
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||