CN117745708A - Deep learning algorithm-based wood floor surface flaw detection method - Google Patents

Deep learning algorithm-based wood floor surface flaw detection method

Info

Publication number
CN117745708A
CN117745708A (application CN202311867866.8A)
Authority
CN
China
Prior art keywords
image
boundary
floor
value
wood floor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311867866.8A
Other languages
Chinese (zh)
Inventor
鲁培龙
段艺霖
刘建
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Xinge Intelligent Technology Co ltd
Original Assignee
Shanghai Xinge Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Xinge Intelligent Technology Co ltd filed Critical Shanghai Xinge Intelligent Technology Co ltd
Priority to CN202311867866.8A priority Critical patent/CN117745708A/en
Publication of CN117745708A publication Critical patent/CN117745708A/en
Pending legal-status Critical Current

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00 — Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30 — Computing systems specially adapted for manufacturing

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a method for detecting surface flaws of a wood floor based on a deep learning algorithm, comprising the following steps: step 1: capturing a high-resolution image of the wood floor with an industrial camera as the image input; step 2: extracting the effective floor area from the original image acquired in step 1 by locating the upper, lower, left and right boundaries of the floor region and discarding the invalid area; step 3: dividing the extracted large target image evenly into several uniform small target images according to the model specification of the incoming product; step 4: performing defect detection on the segmented small target images; step 5: aggregating the recognition results of each small target floor image; step 6: outputting the defect detection and recognition results for all of the cut floor images.

Description

Deep learning algorithm-based wood floor surface flaw detection method
[ field of technology ]
The invention belongs to the field of inspection, and in particular relates to a method for detecting surface flaws of a wood floor based on a deep learning algorithm.
[ background art ]
Defect detection of wood flooring is a key quality-control link in the production and manufacturing of wood floors. The prevailing approach is manual visual inspection, which has several shortcomings: low detection speed, detection results strongly influenced by the inspector's experience, high labor cost, and low detection accuracy. With the rapid development of computer technology and artificial intelligence, algorithms based on machine-vision image processing can greatly alleviate the problems of low accuracy, inconsistent judgment standards, and low detection speed; applying this technology to floor inspection solves these existing problems and pain points well.
Some defect-detection techniques based on machine-vision image processing have also been proposed; they perform two-stage detection combining image classification and image segmentation. This combination suffers because segmenting the entire image takes a long time, and surface defects are small relative to the full high-pixel image, so neither the classification accuracy nor the segmentation accuracy is high; chaining the two algorithms compounds their errors, increasing the false-detection rate and the miss rate and reducing overall accuracy.
[ invention ]
The invention aims to provide a method for detecting surface flaws of a wood floor based on a deep learning algorithm, in order to solve the high false-detection rate and high miss rate caused by combining two algorithms in the prior art.
In order to achieve the above object, the deep learning algorithm-based wood floor surface flaw detection method comprises the following steps:
step 1: capturing a high-resolution image of the wood floor with an industrial camera as the image input;
step 2: extracting the effective floor area from the original image acquired in step 1 by locating the upper, lower, left and right boundaries of the floor region and discarding the invalid area;
step 3: dividing the extracted large target image evenly into several uniform small target images according to the model specification of the incoming product;
step 4: performing defect detection on the segmented small target images;
step 5: aggregating the recognition results of each small target floor image;
step 6: outputting the defect detection and recognition results for all of the cut floor images.
According to the above main features, step 2 specifically comprises the following steps:
step 2.1: performing color-space conversion on the acquired image to convert it into a grayscale image;
step 2.2: taking horizontal and vertical derivatives of the grayscale image to obtain a horizontal gradient map and a vertical gradient map;
step 2.3: converting the gradient maps obtained in step 2.2 into uint8 through a linear transformation;
step 2.4: projecting the gradient maps of step 2.3 horizontally and vertically to obtain the average values along the horizontal and vertical axes, respectively;
step 2.5: selecting preset lengths along the horizontal and vertical axes of step 2.4 as the preprocessing value ranges of the upper, lower, left and right boundaries, and setting a user-defined value k1 that lies between the image edge and the effective boundary to be extracted;
step 2.6: determining the value ranges of the upper and lower boundaries from the upper and lower bounds of the preprocessing range and the k1 threshold, implemented as follows: first let k0 = max(0, x - k1), where x is the upper or lower bound of the preprocessing range and max prevents the range from crossing the image border, keeping it positive and inside the image; then aaa10 = aaa[k0 : min(x + k1, a1 - 1)], where aaa is the horizontal-projection average obtained in step 2.4 and aaa10 is the value range of the upper and lower boundaries within the preprocessing range;
step 2.7: using the argmax function on the value ranges determined in step 2.6 to obtain the index of the maximum value within each range, i.e. the y-direction pixel coordinates of the upper and lower boundaries of the effective area to be extracted, and restoring those coordinates to the original image to obtain the boundary values of the effective area in the original image;
step 2.8: extracting the boundary values of step 2.7 and outputting the boundary abscissas and ordinates to extract the effective area.
According to the above main features, the defect-detection step 4 specifically comprises the following steps:
step 4.1: acquiring high-resolution images of wood floors with an industrial camera to build a floor training image library; roughly 600 images of defective floors must be collected on-site;
step 4.2: manually labeling each image in the library acquired in step 4.1 to form a training label library;
step 4.3: training a defect recognition model on the training data set created in steps 4.1 and 4.2 to establish the floor recognition model;
step 4.4: acquiring a new image with the industrial camera, running inference with the trained model, and, if defects exist, marking their positions and category information.
According to the above main features, step 4.3 specifically comprises the following steps:
4.3.1: the sample-library images acquired in step 4.1 have a resolution of 2448×2048; the ratio of positive to negative samples in the data set is close to 1:1, and the total number of positive and negative samples is M, where M ≥ 600;
step 4.3.2: before entering the encoder-decoder, the labeled images undergo a resizing operation that finally scales them to 640×640; the resized small images are fed into an encoder that extracts depth features of the input image and outputs feature maps for subsequent classification or regression tasks;
step 4.3.3: the images of step 4.3.2 are processed by the encoder to generate semantic vectors, each of which undergoes two operations: first, the semantic vector is flattened by a fully connected operation into a logits vector representing the score of each category, and a softmax conversion of the logits yields the probability distribution over categories for multi-class classification; second, the semantic vector undergoes a fully connected pooling operation to generate a dense vector whose dimension matches that of the coordinates, since the final objective is to regress position coordinates; the loss between the regressed predicted coordinates and the true coordinates is computed, and a standard back-propagation algorithm with a gradient-descent optimizer minimizes the loss function;
step 4.3.4: the probability distribution vector Q obtained in step 4.3.3 by softmax conversion of the logits and the true distribution P obtained by one-hot encoding of the image labels are used to compute a loss; a cross-entropy loss function is selected and minimized with an Adam optimizer; in addition, the loss between the predicted position coordinates W regressed in step 4.3.3 and the true coordinates S is computed using the mean squared error loss (Mean Squared Error, MSE), also minimized with an Adam optimizer;
step 4.3.5: through the standard back-propagation algorithm and gradient-descent optimizer, the neural network adjusts its weights and biases to minimize the classification loss and regression loss of the floor detection model; after 200 epochs, both losses reach their minimum, the back-propagated gradients no longer decrease, the model converges, and the model parameters are saved as a .pt file, completing the training.
According to the above main features, step 4.4 specifically consists of acquiring a new floor image with an industrial camera and calling the trained model (comprising the optimized weights and biases of the neural network); forward propagation through the model generates semantic vectors, from which the regression prediction of position coordinates and the probability prediction of the category distribution are made; the category of highest probability and its predicted position coordinates are output as the inference result for defect targets on the new floor image.
Compared with the prior art, the one-stage algorithm used in this scheme significantly shortens the detection time, and the small-target detection scheme enlarges the defect features relative to the whole image, markedly improving recognition accuracy.
[ description of the drawings ]
Fig. 1 is a schematic flow chart of a method for detecting surface flaws of a wood floor based on a deep learning algorithm.
[ detailed description ]
Referring to fig. 1, which shows the overall flow of the deep learning algorithm-based wood floor surface flaw detection method of the present invention, the method comprises the following steps:
step 1: capturing a high-resolution image of the wood floor with an industrial camera as the image input;
step 2: extracting the effective floor area from the original image acquired in step 1 by locating the upper, lower, left and right boundaries of the floor region and discarding the invalid area;
step 3: dividing the extracted large target image evenly into several uniform small target images (such as 4 or 6 equal parts) according to the model specification of the incoming product, as illustrated by the sketch after this list;
step 4: performing defect detection on the segmented small target images;
step 5: aggregating the recognition results of each small target floor image;
step 6: outputting the defect detection and recognition results for all of the cut floor images.
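As an illustration of step 3, the following is a minimal NumPy sketch of splitting the extracted floor region into equal parts; the function name and the choice of slicing along the width are assumptions, since the patent only states that the image is divided evenly per product model:

    import numpy as np

    def split_into_parts(region, n_parts):
        """Split the extracted effective region into n_parts equal slices
        along its width (e.g. n_parts = 4 or 6 per the embodiment)."""
        width = region.shape[1]
        step = width // n_parts
        return [region[:, i * step:(i + 1) * step] for i in range(n_parts)]

    # e.g. small_images = split_into_parts(effective_region, 6)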
The effective boundary of the image in step 2 is extracted through the following steps:
step 2.1: performing color-space conversion on the acquired image (image resolution 2448 (width) × 2048 (height)) to convert it into a grayscale image;
step 2.2: taking horizontal and vertical derivatives of the grayscale image to obtain a horizontal gradient map and a vertical gradient map; this image-processing operation can be performed with the opencv library in python, e.g. edge detection with the Sobel operator: img1 = cv2.Sobel(gray, ddepth=cv2.CV_64F, dx=0, dy=1, ksize=3); the parameters mean the following:
1. gray: this is assumed to be a grayscale image (i.e. a two-dimensional array) to which the Sobel operator is applied. If the original image is a color image, it is typically converted to grayscale before edge detection.
2. ddepth=cv2.CV_64F: this parameter specifies the desired depth of the output image. The Sobel operator involves derivative operations; cv2.CV_64F makes the result a 64-bit floating-point image, which can represent negative values.
3. dx=0 and dy=1: these parameters set the derivative order in the x and y directions, respectively. Here dx=0, dy=1 means only the first derivative in the y direction (the vertical gradient) is computed; swapping them computes the derivative in the x direction (the horizontal gradient). The vertical gradient map measures the image gradient in the y (vertical) direction: values are large where the gradient is large (i.e. where features change sharply, such as bright-dark junctions and boundaries), from which candidate upper and lower boundaries can be inferred. Likewise, the horizontal gradient map measures the gradient in the x (horizontal) direction, from which candidate left and right boundaries can be inferred.
4. ksize=3: this parameter sets the size of the Sobel kernel (i.e. the filter), a small matrix used for convolution. This embodiment uses a 3×3 kernel for the Sobel operation.
After the above call, the edge-detection gradient map is stored in the variable img1.
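For concreteness, a minimal runnable sketch of steps 2.1–2.2 as just described; the file name "floor.png" and the variable img1_h for the horizontal gradient are assumptions not in the patent text:

    import cv2

    # Step 2.1: load the acquired image and convert it to grayscale
    # ("floor.png" is a hypothetical file name for illustration).
    img = cv2.imread("floor.png")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Step 2.2: vertical gradient map (dy=1) -- large values near the
    # upper/lower boundaries; img1 as in the embodiment.
    img1 = cv2.Sobel(gray, ddepth=cv2.CV_64F, dx=0, dy=1, ksize=3)
    # Horizontal gradient map (dx=1) for the left/right boundaries.
    img1_h = cv2.Sobel(gray, ddepth=cv2.CV_64F, dx=1, dy=0, ksize=3)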
Step 2.3: converting the gradient map obtained in step 2.2 into uint8, in particular img2=cv2.convertscaleabs (img 1), by linear transformation, negative values may be generated due to the Sobel operation, and values greater than 255 are also generated, and the source image is uint8, i.e. 8-bit unsigned number, so that the Sobel creates an insufficient number of image bits, and there is a truncation, the convertScaleAbs function ensures that these negative values are scaled and converted into appropriate positive values, ranging between [0,255], ensuring that the image can be displayed normally instead of a gray window. The final result of the edge detection is stored in variable img2, which is an 8-bit image suitable for further processing or display;
step 2.4: projecting the gradient map of step 2.3 horizontally and vertically to obtain the averages along the horizontal and vertical axes, specifically: aaa = np.mean(img2, axis=1), where axis=1 is the parameter of np.mean selecting the axis over which the average is computed; axis=1 averages along the second axis of the array (the columns), i.e. it operates on each row. aaa then contains one average per row of img2; rows where the gradient is large (where features change sharply, such as the boundary of the foreground to be extracted) have larger row averages;
step 2.5: the preset lengths along the horizontal and vertical axes of step 2.4 (the preset length is the length of the approximate range containing the effective area) are selected as the preprocessing value ranges of the upper, lower, left and right boundaries. In this embodiment a range of 5%–95% is chosen; this range is user-defined according to the proportion of the actual boundary length in the image and can be adapted to the specific situation. Taking the upper and lower boundaries as an example: the image height in this embodiment is 2048, so the horizontal-axis average obtained in step 2.4 has length 2048, i.e. there are 2048 row averages, each representing the mean of its row; the larger a row average, the more likely that row is an upper or lower boundary. Using a drawing tool, the upper and lower boundaries of the effective area to be extracted lie roughly near [139, 1921] (y-direction pixel coordinates), so taking 5%–95% of the image height as the preprocessing range, i.e. [102, 1945] (also y-direction pixel coordinates), is reasonable. A user-defined value k1 is then set, lying between the image edge (the source image's upper boundary y=0 and lower boundary y=2048) and the upper and lower boundaries of the effective region to be extracted; in addition, k1 must exceed the minimum distance between the preprocessing bound and the boundary of the effective region to be extracted, which guarantees that the lower boundary is not lost and that the boundary of the region to be extracted lies within the preprocessing range. Its purpose is to provide a variable threshold for the value ranges of the upper and lower boundaries entering the next calculation; it can be customized to the actual situation.
Step 2.6: the preprocessing range of the upper and lower boundaries determined in step 2.5 is [102, 1945], and the user-defined variable threshold k1 is 70; the value ranges of the upper and lower boundaries are determined from the preprocessing bounds and the k1 threshold as follows:
1. k0 = max(0, x - k1), where x is the upper or lower bound of the preprocessing range; max prevents the range from crossing the image border, keeping it positive and inside the image. For the upper bound: k0 = max(0, 102 - 70) = 32; for the lower bound: k0 = max(0, 1945 - 70) = 1875.
2. aaa10 = aaa[k0 : min(x + k1, a1 - 1)], where aaa is the horizontal-projection average obtained in step 2.4, with shape (2048,), and a1 is the axis length, a1 = len(aaa) = 2048, i.e. the image height. The value range of the upper boundary is therefore [32, 102 + 70] = [32, 172], and that of the lower boundary is [1875, 1945 + 70] = [1875, 2015].
Step 2.7: using the value ranges of the upper and lower boundaries determined in step 2.6, the index of the maximum value within each range is obtained with the argmax function; this index is the y-direction pixel coordinate of the upper or lower boundary of the effective area to be extracted, and restoring it to the original image gives the boundary value of the effective area in the original image. Taking the upper boundary as an example: argmax is evaluated over the upper-boundary interval; np.argmax() returns the index of the maximum value within the current range, which in this embodiment is the index of the maximum gradient within the value range of the upper boundary, i.e. the y-direction pixel coordinate determined to be the upper boundary of the effective area. With a = np.argmax(aaa10) + k0, the index of the maximum of aaa10 is the coordinate of the largest gradient in the range; this gradient maximum marks the boundary, giving the y-direction pixel coordinate of the upper boundary, and the offset k0 restores it to the original image, yielding the upper-boundary coordinate in the original image. The left and right boundaries follow the same principle.
Step 2.8: the boundary values of step 2.7 are extracted, and the boundary abscissas and ordinates are output to extract the effective area.
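A consolidated sketch of steps 2.3–2.7, continuing from img1 above; the helper function locate_boundary is a name introduced here for illustration, with the embodiment's example values 102, 1945 and k1 = 70:

    import cv2
    import numpy as np

    # Step 2.3: linearly scale the Sobel result into uint8 [0, 255].
    img2 = cv2.convertScaleAbs(img1)

    # Step 2.4: lateral projection -- one average per image row.
    aaa = np.mean(img2, axis=1)
    a1 = len(aaa)  # 2048 in this embodiment (image height)

    def locate_boundary(x, k1=70):
        """Steps 2.5-2.7: search +/- k1 rows around the preprocessing bound x
        and return the row of maximum gradient, restored to image coordinates."""
        k0 = max(0, x - k1)                   # clamp so the window stays inside
        aaa10 = aaa[k0:min(x + k1, a1 - 1)]   # value range of this boundary
        return int(np.argmax(aaa10)) + k0     # offset back to the original image

    top = locate_boundary(102)      # upper boundary, expected near y = 139
    bottom = locate_boundary(1945)  # lower boundary, expected near y = 1921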
The defect detection of step 4 is specifically implemented through the following steps:
step 4.1: acquiring high-resolution images of wood floors with an industrial camera to build a floor training image library; roughly 600 images of defective floors must be collected on-site;
step 4.2: manually labeling each image in the library acquired in step 4.1 to form a training label library;
step 4.3: training a defect recognition model on the training data set created in steps 4.1 and 4.2 to establish the floor defect recognition model;
step 4.4: acquiring a new image with the industrial camera, running inference with the trained floor defect recognition model, and, if defects exist, marking their positions and category information.
Step 4.3 is specifically implemented through the following steps:
4.3.1: the sample-library images acquired in step 4.1 have a resolution of 2448×2048; the ratio of positive to negative samples in the data set is close to 1:1, and the total number of positive and negative samples is M, where M ≥ 600;
Positive samples generally refer to samples that contain an object of interest. In object detection, this means an area or bounding box in the image contains the target to be detected. For example, in the floor-defect detection task, any image area containing a floor defect (pits, cracks, black spots, dirt, scratch marks) can be marked as a positive sample.
Negative samples generally refer to samples that do not contain an object of interest. In object detection, this means no desired target exists within an area or bounding box in the image. Continuing the floor-defect example, any image area containing no floor defect can be marked as a negative sample.
Step 4.3.2: before entering the encoder-decoder, the labeled images undergo a resizing operation that finally scales them to 640×640; the resized small images are fed into the encoder, which extracts depth features of the input image and outputs feature maps for subsequent classification or regression tasks. A one-line example of the resize is shown below.
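For step 4.3.2, the resize itself is a one-liner in OpenCV; the interpolation choice and the stand-in array are assumptions for illustration:

    import cv2
    import numpy as np

    labeled_image = np.zeros((2048, 2448, 3), dtype=np.uint8)  # stand-in sample
    small = cv2.resize(labeled_image, (640, 640), interpolation=cv2.INTER_AREA)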
The working principle of the encoder is as follows:
1. convolution layer (Convolutional Layer): convolution is the core operation of CNN. The convolution layer performs local feature extraction by sliding a filter (also called a convolution kernel) over the input data. This captures the spatial hierarchy in the input data. The convolution operation helps preserve spatial locality, reduces the number of parameters, and extracts key features.
2. Activation function (Activation Function): a nonlinear activation function, such as ReLU (Rectified Linear Unit), is applied to the output of the convolutional layer. The activation function introduces nonlinear characteristics to help the model learn complex mapping relationships.
3. Pooling Layer (Pooling Layer): the pooling layer is used for reducing the size of the feature map, reducing the calculation amount and having a certain invariance to translational and rotational changes. Common pooling operations include maximum pooling and average pooling.
4. Batch normalization (Batch Normalization): batch normalization is used to accelerate training and improve model stability. It normalizes the input over each mini-batch of data, reducing internal covariate shift and helping to speed up convergence.
5. Fully connected layer (Fully Connected Layer): the last part of a CNN encoder is typically the fully connected layer. It flattens the output of the convolutional layers and connects to a dense layer that generates the final feature representation; it is typically used to map high-level features to output categories.
6. Dropout: to prevent overfitting, dropout layers are typically added between fully connected layers. Dropout randomly discards a portion of neurons during the training process, forcing the model to learn features more robustly.
The image passes through the encoder, which extracts local features in the image feature space and outputs them as feature vectors for subsequent classification or regression operations.
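The patent does not specify a concrete network, but the six components above can be illustrated with a minimal PyTorch sketch; all channel counts, the pooling placement, and the 256-dimensional output are assumptions:

    import torch
    import torch.nn as nn

    class Encoder(nn.Module):
        """Toy encoder with the six components described above; all layer
        sizes are illustrative assumptions, not the patent's model."""
        def __init__(self, feature_dim=256):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=3, padding=1),   # 1. convolution
                nn.BatchNorm2d(32),                           # 4. batch normalization
                nn.ReLU(),                                    # 2. activation
                nn.MaxPool2d(2),                              # 3. pooling
                nn.Conv2d(32, 64, kernel_size=3, padding=1),
                nn.BatchNorm2d(64),
                nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),                      # collapse spatial dims
            )
            self.fc = nn.Sequential(
                nn.Flatten(),
                nn.Dropout(0.5),                              # 6. dropout
                nn.Linear(64, feature_dim),                   # 5. fully connected
            )

        def forward(self, x):
            return self.fc(self.features(x))   # semantic feature vector

    # e.g. feats = Encoder()(torch.randn(1, 3, 640, 640))  -> shape (1, 256)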
Step 4.3.3: the images of step 4.3.2 are processed by the encoder to generate semantic vectors, each of which undergoes two operations. First, the semantic vector is flattened by an FC (fully connected) operation into a logits vector representing the score of each category, and a softmax conversion of the logits yields the probability distribution over categories for multi-class classification. Second, the semantic vector undergoes an FM (fully connected pooling) operation to generate a dense vector whose dimension matches that of the coordinates, since the final objective is to regress position coordinates; the loss between the regressed predicted coordinates and the true coordinates is computed, and a standard back-propagation algorithm with a gradient-descent optimizer minimizes the loss function. Through back-propagation, the network adjusts its weights and biases so that the model output approaches the real position coordinates. The softmax conversion feeds the raw output vector into the softmax function to obtain the probability distribution over categories; softmax guarantees the properties of a probability distribution, i.e. each class probability lies between 0 and 1 and all probabilities sum to 1.
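A hedged sketch of the two operations described in step 4.3.3, continuing the toy encoder above; the five class names and the 4-dimensional box encoding are assumptions based on the defect types listed earlier:

    import torch
    import torch.nn as nn

    num_classes = 5      # e.g. pit, crack, black spot, dirt, scratch mark
    feature_dim = 256    # matches the toy encoder output above

    cls_head = nn.Linear(feature_dim, num_classes)  # FC -> logits per category
    reg_head = nn.Linear(feature_dim, 4)            # dense vector -> box coords

    feats = torch.randn(1, feature_dim)             # stand-in semantic vector
    logits = cls_head(feats)                        # scores per category
    probs = torch.softmax(logits, dim=-1)           # probability distribution
    coords = reg_head(feats)                        # regressed position coords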
Step 4.3.4: the probability distribution vector Q obtained in step 4.3.3 by softmax conversion of the logits and the true distribution P obtained by one-hot encoding of the image labels are used to compute a loss; the cross-entropy loss function (Cross-Entropy Loss, also called Log Loss) is selected because it measures the difference between two probability distributions well, and it is minimized with an Adam optimizer. In addition, the loss between the predicted position coordinates W regressed in step 4.3.3 and the true coordinates S is computed using the mean squared error loss (Mean Squared Error, MSE), also minimized with an Adam optimizer;
step 4.3.5: through the standard back-propagation algorithm and gradient-descent optimizer, the neural network adjusts its weights and biases to minimize the classification loss (Cross-Entropy Loss) and regression loss (Mean Squared Error, MSE) of the floor detection model; after 200 epochs, both losses reach their minimum, the back-propagated gradients no longer decrease, the model converges, and the model parameters are saved as a .pt file, completing the training;
wherein the cross-entropy loss function is:

$$L_{CE} = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log(p_i) + (1 - y_i)\log(1 - p_i)\right]$$

wherein:
n is the number of samples;
y_i is the true label, taking the value 0 or 1;
p_i is the model's predicted probability that the sample belongs to class 1.

The mean square error loss function is:

$$L_{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

wherein y_i is the actual position coordinate, \hat{y}_i is the model's predicted value, and n is the dimension of the coordinates.
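Continuing the illustrative heads above, one training step with cross-entropy, MSE, and the Adam optimizer might look like the following sketch (the labels and target box are stand-ins):

    import torch
    import torch.nn as nn

    ce_loss = nn.CrossEntropyLoss()    # classification loss on raw logits
    mse_loss = nn.MSELoss()            # regression loss on coordinates
    optimizer = torch.optim.Adam(
        list(cls_head.parameters()) + list(reg_head.parameters()))

    labels = torch.tensor([2])         # stand-in true class index
    target_box = torch.rand(1, 4)      # stand-in true coordinates

    loss = ce_loss(logits, labels) + mse_loss(coords, target_box)
    optimizer.zero_grad()
    loss.backward()                    # standard back-propagation
    optimizer.step()                   # gradient-descent parameter update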
In step 4.4, specifically, a new floor image is acquired with an industrial camera and the trained model (comprising the optimized weights and biases of the neural network) is called; forward propagation through the model generates semantic vectors, from which the regression prediction of position coordinates and the probability prediction of the category distribution are made; the category of highest probability and its predicted position coordinates are output as the inference result for defect targets on the new floor image. A hedged sketch of this flow follows.
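A minimal sketch of the inference flow under the same toy assumptions (the file name floor_model.pt is hypothetical; the patent only states that the saved .pt parameters are loaded and run by forward propagation):

    import torch

    new_image = torch.randn(1, 3, 640, 640)   # stand-in preprocessed floor image

    encoder = Encoder()                                    # toy encoder above
    encoder.load_state_dict(torch.load("floor_model.pt"))  # hypothetical file
    encoder.eval()

    with torch.no_grad():
        feats = encoder(new_image)                 # forward propagation
        probs = torch.softmax(cls_head(feats), dim=-1)
        coords = reg_head(feats)                   # predicted defect position

    pred_class = int(probs.argmax(dim=-1))         # most probable defect category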
Compared with the prior art, the one-stage algorithm used in this scheme significantly shortens the detection time, and the small-target detection scheme enlarges the defect features relative to the whole image, markedly improving recognition accuracy.
It will be understood that equivalents and modifications will occur to those skilled in the art in light of the present invention and its spirit, and all such modifications and substitutions are intended to fall within the scope of the present invention as defined in the appended claims.

Claims (10)

1. A method for detecting surface flaws of a wood floor based on a deep learning algorithm, characterized by comprising the following steps:
step 1: capturing a high-resolution image of the wood floor with an industrial camera as the image input;
step 2: extracting the effective floor area from the original image acquired in step 1 by locating the upper, lower, left and right boundaries of the floor region and discarding the invalid area;
step 3: dividing the extracted large target image evenly into several uniform small target images according to the model specification of the incoming product;
step 4: performing defect detection on the segmented small target images;
step 5: aggregating the recognition results of each small target floor image;
step 6: outputting the defect detection and recognition results for all of the cut floor images.
2. The method for detecting surface flaws of a wood floor based on a deep learning algorithm as claimed in claim 1, wherein step 2 specifically comprises the steps of:
step 2.1: performing color-space conversion on the acquired image to convert it into a grayscale image;
step 2.2: taking horizontal and vertical derivatives of the grayscale image to obtain a horizontal gradient map and a vertical gradient map;
step 2.3: converting the gradient maps obtained in step 2.2 into uint8 through a linear transformation;
step 2.4: projecting the gradient maps of step 2.3 horizontally and vertically to obtain the average values along the horizontal and vertical axes, respectively;
step 2.5: selecting preset lengths along the horizontal and vertical axes of step 2.4 as the preprocessing value ranges of the upper, lower, left and right boundaries, and setting a user-defined value k1 that lies between the image edge and the effective boundary to be extracted;
step 2.6: determining the value ranges of the upper and lower boundaries from the upper and lower bounds of the preprocessing range and the k1 threshold, implemented as follows: first let k0 = max(0, x - k1), where x is the upper or lower bound of the preprocessing range and max prevents the range from crossing the image border, keeping it positive and inside the image; then aaa10 = aaa[k0 : min(x + k1, a1 - 1)], where aaa is the horizontal-projection average obtained in step 2.4 and aaa10 is the value range of the upper and lower boundaries within the preprocessing range;
step 2.7: using the argmax function on the value ranges determined in step 2.6 to obtain the index of the maximum value within each range, i.e. the y-direction pixel coordinates of the upper and lower boundaries of the effective area to be extracted, and restoring those coordinates to the original image to obtain the boundary values of the effective area in the original image;
step 2.8: extracting the boundary values of step 2.7 and outputting the boundary abscissas and ordinates to extract the effective area.
3. The method for detecting surface flaws of a wood floor based on a deep learning algorithm as claimed in claim 2, wherein: in step 2.2, the horizontal and vertical derivatives of the grayscale image yielding the horizontal and vertical gradient maps are computed by edge detection with the Sobel operator: img1 = cv2.Sobel(gray, ddepth=cv2.CV_64F, dx=0, dy=1, ksize=3); where the parameters mean: 1. gray is assumed to be a grayscale image; 2. ddepth=cv2.CV_64F specifies the output depth; 3. dx=0 means only the derivative in the y direction is computed, and dy=1 specifies the first derivative; 4. ksize=3 means the Sobel operation uses a 3×3 kernel.
4. The method for detecting surface flaws of a wood floor based on a deep learning algorithm as claimed in claim 3, wherein: step 2.3 converts the gradient map obtained in step 2.2 into uint8 through a linear transformation, specifically: img2 = cv2.convertScaleAbs(img1).
5. The method for detecting surface flaws of a wood floor based on a deep learning algorithm as claimed in claim 4, wherein: in step 2.4, the gradient map of step 2.3 is projected horizontally and vertically to obtain the averages along the horizontal and vertical axes, specifically: aaa = np.mean(img2, axis=1), where axis=1 means the average is computed along the second axis of the array, i.e. per row.
6. The method for detecting surface flaws of a wood floor based on a deep learning algorithm as claimed in claim 5, wherein: in step 2.5, the preset lengths along the horizontal and vertical axes of step 2.4 are selected as the preprocessing value ranges of the upper, lower, left and right boundaries, the preset length being 5%–95%.
7. The method for detecting surface flaws of a wood floor based on a deep learning algorithm as claimed in claim 6, wherein: in step 2.5, the user-defined value of k1 is 70.
8. The method for detecting surface flaws of a wood floor based on a deep learning algorithm as claimed in claim 7, wherein the defect-detection step 4 specifically comprises the following steps:
step 4.1: acquiring high-resolution images of wood floors with an industrial camera to build a floor training image library; roughly 600 images of defective floors must be collected on-site;
step 4.2: manually labeling each image in the library acquired in step 4.1 to form a training label library;
step 4.3: training a defect recognition model on the training data set created in steps 4.1 and 4.2 to establish the floor defect recognition model;
step 4.4: acquiring a new image with the industrial camera, running inference with the trained model, and, if defects exist, marking their positions and category information.
9. The method for detecting surface flaws of a wood floor based on a deep learning algorithm as claimed in claim 8, wherein step 4.3 specifically comprises the following steps:
4.3.1: the sample-library images acquired in step 4.1 have a resolution of 2448×2048; the ratio of positive to negative samples in the data set is close to 1:1, and the total number of positive and negative samples is M, where M ≥ 600;
step 4.3.2: before entering the encoder-decoder, the labeled images undergo a resizing operation that finally scales them to 640×640; the resized small images are fed into an encoder that extracts depth features of the input image and outputs feature maps for subsequent classification or regression tasks;
step 4.3.3: the images of step 4.3.2 are processed by the encoder to generate semantic vectors, each of which undergoes two operations: first, the semantic vector is flattened by a fully connected operation into a logits vector representing the score of each category, and a softmax conversion of the logits yields the probability distribution over categories for multi-class classification; second, the semantic vector undergoes a fully connected pooling operation to generate a dense vector whose dimension matches that of the coordinates, since the final objective is to regress position coordinates; the loss between the regressed predicted coordinates and the true coordinates is computed, and a standard back-propagation algorithm with a gradient-descent optimizer minimizes the loss function;
step 4.3.4: the probability distribution vector Q obtained in step 4.3.3 by softmax conversion of the logits and the true distribution P obtained by one-hot encoding of the image labels are used to compute a loss; a cross-entropy loss function is selected and minimized with an Adam optimizer; in addition, the loss between the predicted position coordinates W regressed in step 4.3.3 and the true coordinates S is computed using the mean squared error loss, also minimized with an Adam optimizer;
step 4.3.5: through the standard back-propagation algorithm and gradient-descent optimizer, the neural network adjusts its weights and biases to minimize the classification loss and regression loss of the floor detection model; after 200 epochs, both losses reach their minimum, the back-propagated gradients no longer decrease, the model converges, and the model parameters are saved as a .pt file, completing the training.
10. The method for detecting surface flaws of a wood floor based on a deep learning algorithm as claimed in claim 8, wherein: step 4.4 specifically consists of acquiring a new floor image with an industrial camera and calling the trained model (comprising the optimized weights and biases of the neural network); forward propagation through the model generates semantic vectors, from which the regression prediction of position coordinates and the probability prediction of the category distribution are made; the category of highest probability and its predicted position coordinates are output as the inference result for defect targets on the new floor image.
CN202311867866.8A 2023-12-29 2023-12-29 Deep learning algorithm-based wood floor surface flaw detection method Pending CN117745708A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311867866.8A CN117745708A (en) 2023-12-29 2023-12-29 Deep learning algorithm-based wood floor surface flaw detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311867866.8A CN117745708A (en) 2023-12-29 2023-12-29 Deep learning algorithm-based wood floor surface flaw detection method

Publications (1)

Publication Number Publication Date
CN117745708A 2024-03-22

Family

ID=90277728

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311867866.8A Pending CN117745708A (en) 2023-12-29 2023-12-29 Deep learning algorithm-based wood floor surface flaw detection method

Country Status (1)

Country Link
CN (1) CN117745708A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118212573A (en) * 2024-05-20 2024-06-18 康博达建设科技集团有限公司 Identification and extraction method for basketball floor flaws


Similar Documents

Publication Publication Date Title
CN108961235B (en) Defective insulator identification method based on YOLOv3 network and particle filter algorithm
CN108960245B (en) Tire mold character detection and recognition method, device, equipment and storage medium
CN111223088B (en) Casting surface defect identification method based on deep convolutional neural network
CN108562589B (en) Method for detecting surface defects of magnetic circuit material
CN110070008B (en) Bridge disease identification method adopting unmanned aerial vehicle image
CN112085024A (en) Tank surface character recognition method
CN113160192A (en) Visual sense-based snow pressing vehicle appearance defect detection method and device under complex background
CN111950453A (en) Optional-shape text recognition method based on selective attention mechanism
CN110927171A (en) Bearing roller chamfer surface defect detection method based on machine vision
CN113436169A (en) Industrial equipment surface crack detection method and system based on semi-supervised semantic segmentation
CN110598698B (en) Natural scene text detection method and system based on adaptive regional suggestion network
CN117745708A (en) Deep learning algorithm-based wood floor surface flaw detection method
CN111368637B (en) Transfer robot target identification method based on multi-mask convolutional neural network
CN113989604B (en) Tire DOT information identification method based on end-to-end deep learning
CN110909657A (en) Method for identifying apparent tunnel disease image
CN115272204A (en) Bearing surface scratch detection method based on machine vision
CN113393426A (en) Method for detecting surface defects of rolled steel plate
CN116309292A (en) Intelligent weld defect identification method based on visual conversion layer and instance segmentation
CN111178405A (en) Similar object identification method fusing multiple neural networks
CN112926694A (en) Method for automatically identifying pigs in image based on improved neural network
CN116797602A (en) Surface defect identification method and device for industrial product detection
CN111612802A (en) Re-optimization training method based on existing image semantic segmentation model and application
CN111738264A (en) Intelligent acquisition method for data of display panel of machine room equipment
CN113362296B (en) Tunnel crack extraction method and system
CN115410184A (en) Target detection license plate recognition method based on deep neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination