CN115170456A - Detection method and related equipment - Google Patents

Detection method and related equipment

Info

Publication number
CN115170456A
Authority
CN
China
Prior art keywords
product
scale
feature
feature map
detected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110300822.1A
Other languages
Chinese (zh)
Inventor
庄琰
蔡佳
周磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202110300822.1A
Publication of CN115170456A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a detection method and related equipment, relating to the field of artificial intelligence and in particular to the field of computer vision. The method comprises the following steps: first, a first multi-scale feature map of a product to be detected is acquired, the first multi-scale feature map comprising multi-dimensional features of the product to be detected in different scale spaces; then, according to the first multi-scale feature map and a reconstructed feature map, an abnormal value of each pixel point in the first multi-scale feature map is determined, the abnormal value of each pixel point representing the degree of product abnormality at the corresponding position of the product to be detected; finally, the detection result of the product to be detected is determined according to the abnormal value of each pixel point in the first multi-scale feature map. The method and the device can thus determine the specific defect position of the product to be detected.

Description

Detection method and related equipment
Technical Field
The embodiment of the invention relates to the field of computer vision, in particular to a detection method and related equipment.
Background
Computer vision is an integral part of various intelligent/autonomous systems in application fields such as manufacturing, inspection, document analysis, medical diagnosis and the military. It studies how to use cameras/video cameras and computers to acquire the data and information we need about a photographed object. Descriptively speaking, a computer is given eyes (a camera/video camera) and a brain (an algorithm) so that it can recognize, track and measure targets in place of human eyes, thereby enabling the computer to perceive its environment. Because perception can be viewed as extracting information from sensory signals, computer vision can also be viewed as the science of how to make an artificial system "perceive" from images or multi-dimensional data. In general, computer vision uses various imaging systems instead of visual organs to obtain input information, and then uses a computer instead of the brain to process and interpret that information. The ultimate research goal of computer vision is to give a computer the ability to observe and understand the world visually, like a human, and to adapt to its environment autonomously.
Image recognition, a practical application of deep learning algorithms, is a technique that uses a computer to process, analyse and understand images in order to recognize targets and objects of various patterns. The traditional image recognition process comprises four steps: image acquisition, image preprocessing, feature extraction and image recognition.
Product quality is one of the most important production indicators in manufacturing, and to ensure product quality, defect detection has become an indispensable procedure in the production process. Product defect detection not only meets the requirements on product quality but also controls production cost: detecting defective workpieces early on the production line avoids the waste caused by processing them in subsequent steps. With increasing industrial modernization, the traditional manual defect detection mode has become increasingly unsuitable for production because of its low efficiency, poor detection rate and high cost. Various algorithms that detect visual defects of products from images have therefore appeared; however, the existing single-class classification techniques cannot locate the specific defect position of a product, so an improvement on the above technical problem is urgently needed.
Disclosure of Invention
The application provides a detection method and related equipment, which can determine the specific defect position of a product and is beneficial to further analyzing the defects of the product by workers.
In a first aspect, a detection method is provided, which includes the following steps:
acquiring a first multi-scale feature map of a product to be detected, wherein the first multi-scale feature map comprises multi-dimensional features of the product to be detected in different scale spaces; determining an abnormal value of each pixel point in the first multi-scale feature map according to the first multi-scale feature map and the reconstruction feature map, wherein the abnormal value of each pixel point is used for representing the product abnormal degree of the pixel point at the corresponding position of the product to be detected; the reconstructed feature map comprises multi-dimensional features of t second multi-scale feature maps of normal products; and determining the detection result of the product to be detected according to the abnormal value of each pixel point in the first multi-scale characteristic diagram.
In the detection method in the embodiment of the application, a first multi-scale feature map of a product to be detected is obtained, and the first multi-scale feature map comprises multi-dimensional features (the multi-dimensional features can be understood as multi-channel features) of the product to be detected in different scale spaces; then, according to the first multi-scale feature map and the reconstructed feature map, an abnormal value of each pixel point in the first multi-scale feature map can be determined, and the abnormal value of each pixel point is used for representing the product abnormal degree of the pixel point at the corresponding position of the product to be detected; therefore, the detection result of the product to be detected can be determined according to the abnormal value of each pixel point in the first multi-scale feature map. According to the abnormal value of each pixel point of the first multi-scale characteristic diagram, the detection result and the specific defect position of the product to be detected can be determined, and workers can further analyze the defects of the product to be detected.
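The per-pixel abnormal value can be illustrated with a minimal sketch (not the patent's actual implementation; the function name `pixel_outliers` and the use of Euclidean distance are assumptions for illustration):

```python
import numpy as np

def pixel_outliers(feature_map, recon_map):
    """Per-pixel abnormal value: Euclidean distance between each pixel's
    multi-scale feature vector and the corresponding reconstructed
    feature centre. Both inputs have shape (C, H, W); returns (H, W)."""
    diff = feature_map - recon_map            # (C, H, W)
    return np.sqrt((diff ** 2).sum(axis=0))   # (H, W)

# Toy example: 3-channel 2x2 maps with one "abnormal" pixel at (0, 0).
f = np.zeros((3, 2, 2))
f[:, 0, 0] = 1.0
r = np.zeros((3, 2, 2))
scores = pixel_outliers(f, r)
```

Here only the pixel whose features deviate from the reconstructed feature map receives a non-zero abnormal value.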
In some possible embodiments, determining a detection result of the product to be detected according to the abnormal value of each pixel point in the first multi-scale feature map includes the following steps: and if the abnormal value of any pixel point in the first multi-scale characteristic diagram is larger than the preset abnormal value, determining that the product to be detected is a defective product.
In the detection method of the embodiment of the application, when the detection result of the product to be detected is determined according to the abnormal value of each pixel point of the first multi-scale feature map, the product to be detected can be determined to be a defective product if the abnormal value of any pixel point in the first multi-scale feature map is greater than the preset abnormal value.
In some possible embodiments, determining a detection result of the product to be detected according to the abnormal value of each pixel point in the first multi-scale feature map includes the following steps: acquiring the number of pixel points of which the abnormal values are larger than the preset abnormal values in the first multi-scale characteristic graph; and if the number is larger than the preset number, determining that the product to be detected is a defective product.
In the detection method of the embodiment of the application, when the detection result of the product to be detected is determined according to the abnormal value of each pixel point of the first multi-scale feature map, the number of the pixel points of which the abnormal values are greater than the preset abnormal value in the first multi-scale feature map may be obtained first, and when the number is determined to be greater than the preset number, the product to be detected may be determined to be a defective product.
In particular, in the detection method according to the embodiment of the present application, when determining the detection result of the product to be detected according to the abnormal value of each pixel point of the first multi-scale feature map, other determination methods may exist, and such methods should also fall within the protection scope of the present application.
In some possible embodiments, the detection method further comprises:
and when the product to be detected is a defective product, determining the pixel points with the abnormal values larger than the preset abnormal values in the first multi-scale characteristic diagram in the mapping area of the product to be detected as the defective area of the product to be detected.
In the detection method of the embodiment of the application, when the product to be detected is determined to be a defective product, since the first multi-scale feature map has a certain mapping relationship with the product to be detected, the mapping region, in the product to be detected, of the pixel points whose abnormal values are greater than the preset abnormal value is the defective region of the product to be detected.
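The mapping back from feature-map pixels to image regions can be sketched as below, assuming (hypothetically) a uniform spatial scaling between the feature map and the product picture; `defect_region` is an illustrative name:

```python
import numpy as np

def defect_region(scores, thr, img_h, img_w):
    """Map feature-map pixels whose abnormal value exceeds thr back to
    the product picture, returning image-space boxes
    (top, left, bottom, right)."""
    fh, fw = scores.shape
    sy, sx = img_h / fh, img_w / fw        # assumed uniform scaling
    regions = []
    for (r, c) in zip(*np.where(scores > thr)):
        regions.append((int(r * sy), int(c * sx),
                        int((r + 1) * sy), int((c + 1) * sx)))
    return regions

# One abnormal pixel at feature position (0, 1) in a 2x2 map,
# mapped into a 4x4 product picture.
scores = np.zeros((2, 2))
scores[0, 1] = 1.0
boxes = defect_region(scores, thr=0.5, img_h=4, img_w=4)
```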
In some possible embodiments, obtaining the first multi-scale feature map of the product to be tested includes the following steps:
acquiring a plurality of first characteristic diagrams of a product to be detected, wherein the sizes of the first characteristic diagrams are different; carrying out size unified processing on the multiple first feature maps according to a preset size to obtain multiple second feature maps, wherein the multiple second feature maps have the same size; splicing the plurality of second feature maps along the dimension of the channel to obtain a third multi-scale feature map; different weights are applied to different channels of the third multi-scale feature map based on an attention mechanism to obtain the first multi-scale feature map.
In the detection method of the embodiment of the application, a method for obtaining a first multi-scale feature map is provided, wherein for a plurality of first feature maps with different sizes of a product to be detected, size unified processing is performed first, then splicing is performed along channel dimensions to obtain a third multi-scale feature map, and different weights are applied to different channels of the third multi-scale feature map based on an attention mechanism, so that the first multi-scale feature map can be obtained.
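The size unification, channel-dimension splicing and per-channel weighting described above can be sketched roughly as follows; nearest-neighbour resizing and fixed channel weights stand in for the actual interpolation and learned channel attention, which the patent does not pin down here:

```python
import numpy as np

def nearest_resize(x, out_h, out_w):
    """Nearest-neighbour resize of a (C, H, W) map, standing in for the
    'size unification' step."""
    c, h, w = x.shape
    ri = np.arange(out_h) * h // out_h
    ci = np.arange(out_w) * w // out_w
    return x[:, ri][:, :, ci]

def build_multiscale(feature_maps, out_hw, channel_weights):
    """Unify sizes, splice along the channel dimension (the 'third
    multi-scale feature map'), then apply per-channel weights (a stand-in
    for channel attention) to obtain the first multi-scale feature map."""
    resized = [nearest_resize(f, *out_hw) for f in feature_maps]
    stacked = np.concatenate(resized, axis=0)
    return stacked * channel_weights[:, None, None]

# Two first feature maps of different sizes: 2 channels at 2x2, 1 at 4x4.
f1 = np.ones((2, 2, 2))
f2 = np.ones((1, 4, 4))
m = build_multiscale([f1, f2], (4, 4), np.array([1.0, 2.0, 3.0]))
```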
In some possible embodiments, obtaining the first multi-scale feature map of the product to be tested includes the following steps:
acquiring a plurality of third feature maps of a product to be detected, wherein the third feature maps have different sizes and the same channel number; carrying out size unified processing on the plurality of third feature maps according to a preset size to obtain a plurality of fourth feature maps, wherein the sizes of the plurality of fourth feature maps are the same; and applying different weights to the fourth feature maps based on the attention mechanism so as to obtain the first multi-scale feature map through weighted fusion.
In the detection method according to the embodiment of the present application, another method for obtaining a first multi-scale feature map is provided, in which, for a plurality of third feature maps with different sizes and the same number of channels of a product to be detected, size-unification processing is performed to obtain a plurality of fourth feature maps, and different weights are assigned to the plurality of fourth feature maps based on an attention mechanism, so that the plurality of fourth feature maps are subjected to weighting fusion to obtain the first multi-scale feature map.
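The weighted-fusion variant (same channel count, one scalar attention weight per map) can be sketched as below; the scalar weights are illustrative stand-ins for the attention mechanism:

```python
import numpy as np

def weighted_fusion(feature_maps, weights):
    """Weighted fusion of size-unified maps that share a channel count:
    each fourth feature map gets a weight and the results are summed,
    yielding one fused multi-scale map of the same shape."""
    fused = np.zeros_like(feature_maps[0])
    for f, w in zip(feature_maps, weights):
        fused += w * f
    return fused

a = np.ones((2, 3, 3))
b = 2.0 * np.ones((2, 3, 3))
fused = weighted_fusion([a, b], [0.25, 0.5])
```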
In some possible embodiments, obtaining the first multi-scale feature map of the product to be tested includes the following steps:
acquiring a plurality of fifth characteristic diagrams of a product to be detected, wherein the sizes of the fifth characteristic diagrams are different; carrying out size unified processing on the plurality of fifth feature maps according to a preset size to obtain a plurality of sixth feature maps, wherein the sizes of the plurality of sixth feature maps are the same; respectively applying different weights to different channels of the sixth feature maps based on an attention mechanism to obtain seventh feature maps; and applying different weights to the seventh feature maps based on the attention mechanism, and splicing the seventh feature maps along the channel dimension to obtain a first multi-scale feature map.
In the detection method of the embodiment of the application, another method for obtaining a first multi-scale feature map is provided, wherein for a plurality of fifth feature maps with different sizes of a product to be detected, size unified processing is performed to obtain a plurality of sixth feature maps; respectively applying different weights to different channels of the sixth feature maps based on an attention mechanism to obtain seventh feature maps; and then, applying different weights to the multiple seventh feature maps based on the attention mechanism, and splicing the multiple seventh feature maps along the channel dimension to obtain a first multi-scale feature map.
In some possible embodiments, determining an abnormal value of each pixel point in the first multi-scale feature map according to the first multi-scale feature map and the reconstructed feature map includes the following steps:
respectively carrying out pixel-by-pixel decomposition on the first multi-scale feature map and the reconstruction feature map to obtain M first position features and M reconstruction feature centers; and determining an abnormal value of a corresponding pixel point in the first multi-scale feature map according to the first position feature and the center of the reconstruction feature of the same pixel point position.
In the detection method of the embodiment of the application, when the abnormal value of each pixel point in the first multi-scale feature map is determined, the first multi-scale feature map and the reconstructed feature map may be decomposed pixel by pixel to obtain M first position features of the first multi-scale feature map and M reconstructed feature centers of the reconstructed feature map; then, according to the first position feature and the reconstructed feature center at the same pixel point position, the abnormal value of the pixel point at the corresponding position in the first multi-scale feature map can be calculated. In some possible embodiments, the abnormal value is the Euclidean or cosine distance between the first position feature of the pixel point and the corresponding reconstructed feature center.
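For the cosine-distance option mentioned above, a minimal sketch (the name `cosine_outlier` and the epsilon guard are illustrative assumptions):

```python
import numpy as np

def cosine_outlier(feat, centre, eps=1e-12):
    """Cosine-distance abnormal value between a pixel's first position
    feature and its reconstructed feature centre: 0 for parallel vectors,
    1 for orthogonal ones."""
    num = float(np.dot(feat, centre))
    den = float(np.linalg.norm(feat) * np.linalg.norm(centre)) + eps
    return 1.0 - num / den

same = cosine_outlier(np.array([1.0, 0.0]), np.array([2.0, 0.0]))  # parallel
orth = cosine_outlier(np.array([1.0, 0.0]), np.array([0.0, 1.0]))  # orthogonal
```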
In some possible embodiments, the detection method further comprises the steps of:
acquiring a reconstruction characteristic diagram, wherein the acquiring of the reconstruction characteristic diagram comprises the following steps: acquiring t second multi-scale feature maps of the normal product, wherein the second multi-scale feature maps contain multi-dimensional features of the normal product in different scale spaces; respectively carrying out pixel-by-pixel decomposition on the t second multi-scale feature maps to obtain t multi-scale feature vector sets, wherein the multi-scale feature vector sets comprise multi-scale feature vectors corresponding to all pixel points of the second multi-scale feature maps, and the element number of the multi-scale feature vector sets is the pixel point number M of the second multi-scale feature maps; respectively carrying out feature reconstruction on t multi-scale feature vectors with the same pixel point position in the t multi-scale feature vector sets to obtain a reconstructed feature center of a corresponding pixel point; and (4) obtaining a reconstruction characteristic diagram according to M reconstruction characteristic centers.
In the detection method of the embodiment of the application, a method for obtaining a reconstructed feature map is provided, wherein t second multi-scale feature maps of a normal product are obtained first; respectively carrying out pixel-by-pixel decomposition on the t second multi-scale feature maps to obtain t multi-scale feature vector sets, wherein the element number of each multi-scale feature vector set is M; then, performing feature reconstruction on t multi-scale feature vectors with the same pixel point position in the t multi-scale feature vector sets to obtain M reconstructed feature centers; and finally, obtaining a reconstruction feature map by aggregation according to the M reconstruction feature centers.
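The feature-reconstruction step can be sketched with the per-pixel mean of the t normal-product feature vectors; the patent does not commit to a specific reconstruction, so the mean here is one simple assumed choice:

```python
import numpy as np

def reconstruct_feature_map(normal_maps):
    """Build a reconstructed feature map from t second multi-scale maps
    of normal products: decompose pixel by pixel and take the per-pixel
    mean of the t multi-scale feature vectors as the reconstructed
    feature centre, then aggregate the M centres back into a map."""
    stack = np.stack(normal_maps, axis=0)   # (t, C, H, W)
    return stack.mean(axis=0)               # (C, H, W): M centres

m1 = np.zeros((2, 2, 2))
m2 = 2.0 * np.ones((2, 2, 2))
centres = reconstruct_feature_map([m1, m2])
```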
In some possible embodiments, the detection method further comprises the steps of:
outputting a thermodynamic diagram (heat map) corresponding to the product to be detected according to the abnormal values; the thermodynamic diagram is marked with dark and light colors: the lighter the color of a pixel point in the thermodynamic diagram, the smaller the abnormal value corresponding to that pixel point, and the darker the color, the larger the corresponding abnormal value.
In the detection method of the embodiment of the application, because the first multi-scale feature map has a certain mapping relation with the product to be detected, the thermodynamic diagram corresponding to the product to be detected can be output according to the abnormal value of each pixel point in the first multi-scale feature map so as to visually display the defect detection result of the product to be detected to a user, and the user can know the defect condition of the product to be detected by looking up the thermodynamic diagram, including whether the product has a defect, the specific defect position of the product and the like.
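Before rendering such a thermodynamic diagram, the abnormal values would typically be normalised so that larger values map to darker colors; a rough sketch (min-max normalisation is an assumption, not specified by the patent):

```python
import numpy as np

def heatmap_values(scores):
    """Normalise per-pixel abnormal values to [0, 1] so that larger
    abnormal values map to darker colors in the rendered heat map."""
    lo, hi = scores.min(), scores.max()
    if hi == lo:
        return np.zeros_like(scores)
    return (scores - lo) / (hi - lo)

s = np.array([[0.0, 2.0],
              [1.0, 2.0]])
h = heatmap_values(s)
```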
In a second aspect, there is provided a detection apparatus comprising means for performing the method of the first aspect.
In a third aspect, there is provided a detection apparatus comprising a processor and a memory, wherein the processor is connected to the memory, wherein the memory is used for storing program codes, and the processor is used for calling the program codes to execute the detection method according to the first aspect.
In a fourth aspect, a detection system is provided, which includes a camera and the detection device of the third aspect, where the camera is configured to obtain a picture of a product to be detected, and send the picture of the product to be detected to the detection device, and the detection device further performs a step of obtaining a first multi-scale feature map according to the picture of the product to be detected.
In a fifth aspect, a computer-readable storage medium is provided, the computer-readable storage medium storing a computer program, the computer program being executed by a processor to implement the detection method according to the first aspect.
In a sixth aspect, there is provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the method of the first aspect described above.
In a seventh aspect, a chip is provided, where the chip includes a processor and a data interface, and the processor reads instructions stored on a memory through the data interface to execute the method in the first aspect.
Optionally, as an implementation manner, the chip may further include a memory, where instructions are stored in the memory, and the processor is configured to execute the instructions stored in the memory, and when the instructions are executed, the processor is configured to execute the method in the first aspect.
In an eighth aspect, an electronic device is provided, which includes the detection apparatus in the second aspect.
Drawings
The drawings used in the embodiments of the present application are described below.
FIG. 1 is a schematic diagram of a defect detection scenario provided by an embodiment of the present application;
fig. 2a and 2b are schematic diagrams of product pictures provided by an embodiment of the present application;
fig. 3a and 3b are schematic diagrams of thermodynamic diagrams provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a system architecture provided in an embodiment of the present application;
fig. 5 is a schematic diagram of a chip hardware structure according to an embodiment of the present disclosure;
FIG. 6 is a flow chart of a detection method provided by an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a defect detection model provided in an embodiment of the present application;
fig. 8 is a specific flowchart of a detection method provided in an embodiment of the present application;
FIG. 9 is a flow chart of a channel attention algorithm provided by an embodiment of the present application;
FIG. 10 is a schematic structural diagram of a detection apparatus according to an embodiment of the present disclosure;
FIG. 11 is a schematic structural diagram of a training apparatus for a defect detection model according to an embodiment of the present disclosure;
fig. 12 is a schematic structural diagram of a detection apparatus according to an embodiment of the present application.
Detailed Description
The technical solution in the present application will be described below with reference to the accompanying drawings.
The detection method provided by the embodiment of the application can be applied to a scene in which whether a product has defects or not needs to be identified, wherein the product refers to various tangible products, including hard products (such as various product parts, for example, screws, tiles and the like), flexible products (such as printed products, knitted products, leather products and the like) and the like. The following takes flexible products as an example, and briefly introduces a defect detection scene of printed matters and leather products.
Referring to fig. 1, fig. 1 is a schematic diagram of a defect detection scenario provided in an embodiment of the present application. After a flexible product is produced, when it is necessary to detect whether the product has a defect, an industrial camera may be used to photograph the flexible product to obtain a product picture, as shown in fig. 2a and 2b, which are schematic diagrams of product pictures provided in the embodiment of the present application; the product in fig. 2a is a printed product and the product in fig. 2b is a leather product. Next, the product picture is input into a hardware computing platform, which is a device for executing the detection method according to the embodiment of the present application. After the hardware computing platform has processed the product picture, not only the detection result of the product (i.e. whether the product is defective) but also a thermodynamic diagram corresponding to the product can be obtained, as shown in fig. 3a and 3b, which are schematic thermodynamic diagrams provided by the embodiment of the present application; fig. 3a is the thermodynamic diagram corresponding to the product of fig. 2a, and fig. 3b is the thermodynamic diagram corresponding to the product of fig. 2b. Colors of different shades in the thermodynamic diagram represent the degree of abnormality at different positions of the workpiece, and the specific defect positions of the product can be determined directly by looking at its thermodynamic diagram.
In particular, when the hardware computing platform processes a product picture, it first obtains a multi-scale feature map of the picture; using this multi-scale feature map and a reconstructed feature map obtained in advance for the product being produced, the abnormal value of each pixel point in the multi-scale feature map can be determined, the abnormal value representing the degree of product abnormality at the corresponding position of the product. The defect detection result of the product is then determined from the obtained abnormal values. According to the abnormal value of each pixel point in the first multi-scale feature map, the detection result and the specific defect position of the product can be determined, so that workers can further analyse the defects of the product to be detected. Moreover, the thermodynamic diagram corresponding to the product picture can be obtained from the abnormal values, and workers can quickly locate the defect positions of the product according to the thermodynamic diagram.
Since the embodiments of the present application relate to the application of a large number of neural networks, for the convenience of understanding, the related terms and related concepts such as neural networks related to the embodiments of the present application will be described below.
(1) Neural network
A neural network may be composed of neural units. A neural unit may be an operation unit that takes inputs x_s and an intercept of 1, and its output may be:

h_{W,b}(x) = f(W^T x) = f( sum_{s=1..n} W_s * x_s + b )

where s = 1, 2, … n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit. f is the activation function of the neural unit, which introduces a nonlinear characteristic into the neural network to convert the input signal of the neural unit into an output signal. The output signal of the activation function may serve as the input of the next convolutional layer, and the activation function may be a sigmoid function. A neural network is a network formed by joining many such single neural units together, i.e. the output of one neural unit may be the input of another. The input of each neural unit may be connected to the local receptive field of the previous layer to extract its features; the local receptive field may be a region composed of several neural units.
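A single neural unit with a sigmoid activation can be written in a few lines (an illustrative sketch of the formula above, not code from the patent):

```python
import math

def neuron_output(xs, ws, b):
    """Output of one neural unit: f(sum_s W_s * x_s + b) with a
    sigmoid activation f."""
    z = sum(w * x for w, x in zip(ws, xs)) + b
    return 1.0 / (1.0 + math.exp(-z))

mid = neuron_output([1.0, 2.0], [0.0, 0.0], 0.0)   # z = 0, so f(0) = 0.5
high = neuron_output([1.0], [5.0], 0.0)            # large positive z
```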
(2) Deep neural network
A deep neural network (DNN), also known as a multi-layer neural network, can be understood as a neural network with many hidden layers, where "many" has no particular metric. Dividing a DNN by the position of its layers, the layers can be divided into three categories: input layer, hidden layers and output layer. Typically, the first layer is the input layer, the last layer is the output layer, and all layers in between are hidden layers. The layers are fully connected, that is, any neuron of the i-th layer is connected to every neuron of the (i+1)-th layer. Although a DNN looks complex, the work of each layer is not: it is simply the following linear relational expression:

y = α(W x + b)

where x is the input vector, y is the output vector, b is the offset (bias) vector, W is the weight matrix (also called the coefficients), and α() is the activation function. Each layer simply performs this operation on the input vector x to obtain the output vector y. Because a DNN has many layers, the coefficients W and offset vectors b are numerous. These parameters are defined in the DNN as follows, taking the coefficient W as an example: in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as W^3_{24}, where the superscript 3 is the layer in which the coefficient W lies, and the subscripts correspond to the output index 2 of the third layer and the input index 4 of the second layer. In summary: the coefficient from the k-th neuron of the (L−1)-th layer to the j-th neuron of the L-th layer is defined as W^L_{jk}. Note that the input layer has no W parameters. In a deep neural network, more hidden layers allow the network to depict more complex situations in the real world. In theory, a model with more parameters has higher complexity and a larger "capacity", which means it can accomplish more complex learning tasks. Training the deep neural network is the process of learning the weight matrices, and its final goal is to obtain the weight matrices of all layers of the trained deep neural network (formed by the vectors W of many layers).
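The per-layer expression y = α(W x + b) can be sketched directly; ReLU is used here as an example activation α (an assumption for illustration):

```python
import numpy as np

def dense_layer(x, W, b):
    """One fully connected layer: y = alpha(W x + b), with ReLU as the
    example activation alpha."""
    return np.maximum(0.0, W @ x + b)

W = np.array([[1.0, -1.0],
              [0.0,  1.0]])
x = np.array([1.0, 2.0])
y = dense_layer(x, W, np.zeros(2))   # W x = [-1, 2], ReLU -> [0, 2]
```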
(3) Convolutional neural network
A convolutional neural network (CNN) is a deep neural network with a convolutional structure. A convolutional neural network includes a feature extractor consisting of convolutional layers and sub-sampling layers. The feature extractor may be viewed as a filter, and the convolution process may be viewed as convolving an input image or a convolved feature plane (feature map) with a trainable filter. A convolutional layer is a layer of neurons in a convolutional neural network that performs convolution processing on the input signal. In a convolutional layer, a neuron may be connected to only a portion of the neurons of the neighbouring layer. A convolutional layer usually contains several feature planes, and each feature plane may be composed of a number of neural units arranged in a rectangle. Neural units of the same feature plane share weights, and the shared weights are the convolution kernel. Sharing weights may be understood as meaning that the way image information is extracted is independent of location. The underlying principle is that the statistics of one part of an image are the same as those of other parts, so image information learned in one part can also be used in another part, and the same learned image information can be used at all positions on the image. In the same convolutional layer, multiple convolution kernels may be used to extract different image information; generally, the greater the number of convolution kernels, the richer the image information reflected by the convolution operation.
The convolution kernel can be initialized in the form of a matrix of random size, and can be learned to obtain reasonable weights in the training process of the convolutional neural network. In addition, sharing weights brings the direct benefit of reducing connections between layers of the convolutional neural network, while reducing the risk of overfitting.
(4) Loss function
In the process of training a deep neural network, because the output of the network is expected to be as close as possible to the value that is truly desired, the weight vector of each layer can be updated according to the difference between the current predicted value and the truly desired target value (of course, an initialization process usually precedes the first update, i.e., parameters are preset for each layer of the deep neural network). For example, if the predicted value of the network is too high, the weight vectors are adjusted to lower it, and the adjustment continues until the deep neural network can predict the truly desired target value or a value very close to it. Therefore, it is necessary to define in advance "how to compare the difference between the predicted value and the target value"; this is the purpose of loss functions or objective functions, which are important equations for measuring the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) indicates a larger difference, so training the deep neural network becomes the process of reducing the loss as much as possible.
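The loss-driven adjustment described above can be sketched in a minimal form. The following is a hedged illustration only: the patent does not specify a loss function, so a mean-squared-error loss and a one-parameter "network" (prediction = w·x) are invented here to show how the weight is repeatedly adjusted to reduce the loss.

```python
# Hypothetical sketch: MSE loss and iterative weight adjustment.
# The function names and the single-weight model are invented for illustration.

def mse_loss(predicted, target):
    """Measure the difference between predicted and target values."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)

w = 2.0                              # preset (initialized) parameter
xs = [1.0, 2.0, 3.0]
targets = [3.0, 6.0, 9.0]            # truly desired values (ideal w is 3)

for _ in range(100):
    preds = [w * x for x in xs]
    # Analytical gradient of the MSE loss with respect to w.
    grad = sum(2 * (p - t) * x for p, t, x in zip(preds, targets, xs)) / len(xs)
    w -= 0.05 * grad                 # adjust the weight to reduce the loss

final_loss = mse_loss([w * x for x in xs], targets)
```

After the updates, the weight has converged near the ideal value and the loss is close to zero, which is exactly the "reduce the loss as much as possible" behavior the paragraph describes.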
(5) Back propagation algorithm
During training, a convolutional neural network may use the back propagation (BP) algorithm to correct the parameter values of the initial super-resolution model, so that the reconstruction error loss of the super-resolution model becomes smaller and smaller. Specifically, an error loss arises as the input signal is propagated forward to the output, and the parameters of the initial super-resolution model are updated by propagating the error loss information backward, so that the error loss converges. The back propagation algorithm is a backward pass dominated by the error loss, and aims to obtain optimal parameters of the super-resolution model, such as the weight matrices.
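The backward propagation of the error can be made concrete with a tiny two-layer chain-rule sketch. This is a hedged illustration with invented values and no real library: the forward pass computes y = w2·(w1·x), and the error loss at the output is propagated backward through each layer to update both weights.

```python
# Minimal two-layer back propagation sketch (hypothetical values).
# Forward: x -> h = w1*x -> y = w2*h ; loss = (y - target)^2.

x, target = 1.5, 6.0
w1, w2 = 1.0, 1.0
lr = 0.01

for _ in range(500):
    h = w1 * x                      # forward pass, layer 1
    y = w2 * h                      # forward pass, layer 2
    dloss_dy = 2 * (y - target)     # error signal at the output
    # Chain rule: propagate the error loss backward through each layer.
    dloss_dw2 = dloss_dy * h
    dloss_dw1 = dloss_dy * w2 * x
    w2 -= lr * dloss_dw2
    w1 -= lr * dloss_dw1

converged_error = abs(w1 * x * w2 - target)
```

The error loss converges toward zero as both weights are corrected, which is the "back propagation motion with error loss as a dominant factor" described above.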
(6) Pixel value
The pixel value of an image may be a red-green-blue (RGB) color value, and the pixel value may be a long integer representing a color. For example, a pixel value may be expressed as 256×Red + 100×Green + 76×Blue, where Blue represents the blue component, Green the green component, and Red the red component. In each color component, the smaller the numerical value, the lower the luminance, and the larger the numerical value, the higher the luminance. For a grayscale image, the pixel values may be grayscale values.
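One widely used way to represent an RGB color as a single integer is 24-bit packing with red in the high byte. Note this is a hedged illustration of the general idea of encoding a color as one integer; the packing scheme below is the common convention, not necessarily the exact coefficients in the patent's example.

```python
# Hedged sketch: packing three 0-255 RGB components into one integer
# using the common 24-bit layout (red high byte, blue low byte).

def pack_rgb(red, green, blue):
    """Pack three 0-255 color components into a single integer."""
    return (red << 16) | (green << 8) | blue

def unpack_rgb(pixel):
    """Recover the (red, green, blue) components from a packed pixel."""
    return (pixel >> 16) & 0xFF, (pixel >> 8) & 0xFF, pixel & 0xFF

white = pack_rgb(255, 255, 255)   # maximum luminance in every component
```

A larger value in any component raises the luminance of that component, as the paragraph notes.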
(7) Image features
Image features are used to make an image understandable by a computer, and generally have scale invariance and rotation invariance. After the features are detected, they can be extracted from the image; this process may require a large amount of image processing by the computer, and the result is called a feature description or feature vector. Common image features include color features, texture features, shape features, and spatial relationship features. In addition, a feature of an image can be understood as another form of expression obtained by subjecting the original data to certain mathematical operations or rules. In this way, the image data (mainly two-dimensional images) is expressed in another form, i.e., a feature map, through certain mathematical operations or rules.
Features may further be divided into global features and local features according to the region from which they are extracted. A global feature is a feature extracted from the entire image; global features are widely applied in the field of image retrieval, e.g., the image color histogram. A local feature is a feature extracted from a local region of the image (this local region is often a pixel in the image together with its surrounding neighborhood).
(8) Convolutional layer
A convolutional layer may comprise a plurality of convolution operators, also called convolution kernels, whose role in image processing is equivalent to a filter that extracts specific information from the input image matrix. A convolution operator is essentially a weight matrix, which is usually predefined. During the convolution operation on an image, the weight matrix is usually slid over the input image pixel by pixel (or two pixels by two pixels, depending on the value of the stride), so as to complete the task of extracting a specific feature from the image. The size of the weight matrix should be related to the size of the image. It should be noted that the depth dimension of the weight matrix is the same as the depth dimension of the input image, and the weight matrix extends to the entire depth of the input image during the convolution operation. Thus, convolving with a single weight matrix produces a convolution output with a single depth dimension, but in most cases a single weight matrix is not used; instead, a plurality of weight matrices of the same size (rows × columns), i.e., a plurality of matrices of the same type, are applied. In the field of image processing, a dimension can also be understood as a scale space. The outputs of the weight matrices are stacked to form the depth dimension of the convolved image, where that dimension is determined by the "plurality" described above; the convolution output at this point is a feature map, and may be a multi-dimensional feature map. Different weight matrices may be used to extract different features from the image, e.g., one weight matrix to extract image edge information, another to extract a particular color of the image, and yet another to blur unwanted noise in the image.
When the plurality of weight matrices have the same size (rows × columns), the feature maps they extract also have the same size, and these extracted feature maps of the same size are combined to form the output of the convolution operation. Briefly, when feature extraction is performed on an image using N convolution kernels of the same size, a set of feature maps obtained by combining the N feature maps (all of the same size) is produced; equivalently, a feature map with N channels (or N dimensions) is obtained, where each channel corresponds to one feature map and the number of channels equals the number of convolution kernels.
When an image is processed using multiple sets of convolution kernels (each set of convolution kernels includes N convolution kernels, with different sets of convolution kernels being of different sizes), multiple feature maps of different sizes can be obtained, each feature map having N channels.
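The "N kernels → N channels" relationship can be demonstrated directly. The following is a hedged numpy sketch with random placeholder kernels (not trained weights) and illustrative shapes: applying four 3×3 kernels to one single-channel image and stacking the per-kernel outputs yields a feature map with four channels.

```python
import numpy as np

# Hedged sketch: N convolution kernels of the same size produce a
# feature map with N channels, one channel per kernel.

def conv2d_valid(image, kernel):
    """Plain 'valid' 2-D convolution of a single-channel image."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
kernels = rng.standard_normal((4, 3, 3))      # N = 4 kernels, same size

# Stack the N per-kernel outputs along the channel axis.
feature_map = np.stack([conv2d_valid(image, k) for k in kernels], axis=0)
```

Here `feature_map` has shape (4, 6, 6): four channels because four kernels were used, matching the text. Using another set of kernels of a different size would produce a second feature map of a different spatial size, again with one channel per kernel.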
When the convolutional neural network has a plurality of convolutional layers, the initial convolutional layer often extracts more general features, and the general features can also be called as low-level features; as the depth of the convolutional neural network increases, the features extracted by the convolutional layer further back become more complex, such as features with high-level semantics, and the features with higher semantics are more suitable for the problem to be solved.
(9) Pooling layer
During image processing, the only purpose of the pooling layer is to reduce the spatial size of the image. The pooling layer may include an average pooling operator and/or a maximum pooling operator for sampling the input image to smaller sized images. The average pooling operator may calculate pixel values in the image over a certain range to produce an average as a result of the average pooling. The max pooling operator may take the pixel with the largest value in a particular range as a result of the max pooling. In addition, just as the size of the weight matrix used in the convolutional layer should be related to the image size, the operators in the pooling layer should also be related to the image size. The size of the image output after the processing by the pooling layer may be smaller than the size of the image input to the pooling layer, and each pixel point in the image output by the pooling layer represents an average value or a maximum value of a corresponding sub-region of the image input to the pooling layer.
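The two pooling operators described above can be sketched concretely. This is a hedged illustration with an invented 4×4 input: 2×2 pooling with stride 2 makes each output pixel the maximum (or average) of one 2×2 sub-region, halving the spatial size.

```python
import numpy as np

# Hedged sketch: 2x2 max pooling and average pooling with stride 2.

def pool2x2(image, op):
    """Each output pixel summarizes one 2x2 sub-region of the input."""
    h, w = image.shape
    out = np.zeros((h // 2, w // 2))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            block = image[2 * i:2 * i + 2, 2 * j:2 * j + 2]
            out[i, j] = op(block)
    return out

image = np.array([[1., 2., 5., 6.],
                  [3., 4., 7., 8.],
                  [0., 0., 1., 1.],
                  [0., 4., 1., 1.]])
max_pooled = pool2x2(image, np.max)    # [[4, 8], [4, 1]]
avg_pooled = pool2x2(image, np.mean)   # [[2.5, 6.5], [1.0, 1.0]]
```

Both outputs are 2×2, smaller than the 4×4 input, as the paragraph states.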
The system architecture provided by the embodiments of the present application is described below.
Referring to fig. 4, a system architecture 100 is provided in accordance with an embodiment of the present invention. As shown in the system architecture 100, the data acquisition device 160 is configured to acquire training data, which in the embodiment of the present application includes multiple pictures of normal (non-defective) products; and stores the training data into the database 130, and the training device 120 trains based on the training data maintained in the database 130 to obtain the defect detection model 101, mainly to obtain model parameters for feature extraction and to reconstruct a feature map. In the following, how the training device 120 obtains the defect detection model 101 based on the training data will be described in more detail in the second embodiment, where the defect detection model 101 can be used to implement the detection method provided in the embodiment of the present application, that is, after performing relevant preprocessing on the picture of the product to be detected, the picture is input into the defect detection model 101, and then the detection result and/or the thermodynamic diagram of the product to be detected can be obtained. It should be noted that, in practical applications, the training data maintained in the database 130 may not necessarily all come from the acquisition of the data acquisition device 160, and may also be received from other devices, such as a camera. 
It should be noted that, the training device 120 does not necessarily perform the training of the defect detection model 101 based on the training data maintained by the database 130, and it is also possible to obtain the training data from the cloud or other places for performing the model training, for example, the training device 120 performs the model training directly according to multiple normal product pictures collected by the industrial camera, and the above description should not be taken as a limitation to the embodiment of the present application.
The defect detection model 101 obtained by the training device 120 may be applied to different systems or devices, for example the execution device 110 shown in fig. 4. The execution device 110 may be a terminal, such as a mobile phone, a tablet computer, a notebook computer, a desktop computer, an AR/VR device, or a vehicle-mounted terminal, and may also be a server or a cloud. In fig. 4, the execution device 110 is configured with an I/O interface 112 for data interaction with external devices; a user may input data to the I/O interface 112 through a client device 140, where the input data may include a picture of the product to be detected in the embodiment of the present application, so that the defect detection model 101 performs defect detection on the product to be detected. In this case, the client device 140 may be an image acquisition device, such as a camera.
Optionally, the system further includes a preprocessing module 113, where the preprocessing module 113 is configured to perform preprocessing according to input data (such as the picture of the product to be tested) received by the I/O interface 112, and the preprocessed data enters the calculating module 111. In this embodiment, the preprocessing module 113 may be configured to perform at least one of filtering, enhancing, denoising, and the like on the to-be-detected product picture to obtain the to-be-detected product picture meeting the requirement.
In the process that the execution device 110 preprocesses the input data or in the process that the calculation module 111 of the execution device 110 executes the calculation or other related processes, the execution device 110 may call data, codes or the like in the data storage system 150 for corresponding processes, or store data, instructions or the like obtained by corresponding processes in the data storage system 150.
Finally, the I/O interface 112 returns the processing result, such as the detection result and/or the thermodynamic diagram of the product to be detected obtained by the calculation module 111, to the client device 140 to provide it to the user; here the client device 140 may be a display.
In the case shown in fig. 4, the user may manually give the input data, which may be operated through an interface provided by the I/O interface 112. Alternatively, the client device 140 may automatically send the input data to the I/O interface 112, and if the client device 140 is required to automatically send the input data to obtain authorization from the user, the user may set the corresponding permissions in the client device 140. The user can view the result output by the execution device 110 at the client device 140, and the specific presentation form can be display, sound, action, and the like. The client device 140 may also be used as a data collection terminal, which collects the input data of the I/O interface 112 and the output result of the I/O interface 112 shown in fig. 4 as new sample data and stores the new sample data in the database 130. Of course, the input data of the I/O interface 112 and the output result of the I/O interface 112 shown in fig. 4 may be directly stored in the database 130 as new sample data by the I/O interface 112 without being collected by the client device 140.
It should be noted that fig. 4 is only a schematic diagram of a system architecture provided by an embodiment of the present invention, and the position relationship between the devices, modules, and the like shown in fig. 4 does not constitute any limitation, for example, in fig. 4, the data storage system 150 is an external memory with respect to the execution device 110, and in other cases, the data storage system 150 may also be disposed in the execution device 110. In addition, the training device 120 and the performing device 110 may be the same device.
A hardware structure of a chip provided in an embodiment of the present application is described below.
Fig. 5 is a hardware structure of a chip according to an embodiment of the present invention, where the chip includes a neural network processor 50. The chip may be provided in the execution device 110 as shown in fig. 4 to complete the calculation work of the calculation module 111. The chip may also be disposed in the training apparatus 120 as shown in fig. 4 to complete the training work of the training apparatus 120 and output the defect detection model 101.
The neural network processor 50 is mounted as a coprocessor on a Host CPU (Host CPU), and tasks are assigned by the Host CPU. The core part of the neural network processor 50 is an arithmetic circuit 503, and the controller 504 controls the arithmetic circuit 503 to extract data in the memory (the weight memory 502 or the input memory 501) and perform arithmetic.
In some implementations, the arithmetic circuit 503 internally includes a plurality of processing units (PEs). In some implementations, the operational circuitry 503 is a two-dimensional systolic array. The arithmetic circuit 503 may also be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuitry 503 is a general-purpose matrix processor.
For example, assume that there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to the matrix B from the weight memory 502 and buffers it in each PE in the arithmetic circuit 503. The arithmetic circuit 503 takes the data of the matrix a from the input memory 501 and performs matrix arithmetic with the matrix B, and partial results or final results of the obtained matrix are stored in the accumulator 508.
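The accumulate-as-you-go matrix multiply performed by the arithmetic circuit can be sketched in software. This is a hedged illustration only: the accumulator variable below plays the role of the hardware accumulator 508 that stores partial results before the final result is produced.

```python
# Hedged sketch: matrix multiply C = A x B with explicit accumulation
# of partial results, mirroring the arithmetic-circuit description.

def matmul_accumulate(A, B):
    rows, inner, cols = len(A), len(B), len(B[0])
    C = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0                       # plays the role of the accumulator
            for k in range(inner):
                acc += A[i][k] * B[k][j]  # partial result accumulates
            C[i][j] = acc                 # final result stored
    return C

C = matmul_accumulate([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

Here C evaluates to [[19, 22], [43, 50]]; in the hardware described, matrix B would be fetched from the weight memory 502 and matrix A from the input memory 501.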
The vector calculation unit 507 may further process the output of the operation circuit, such as vector multiplication, vector addition, exponential operation, logarithmic operation, magnitude comparison, and the like. For example, the vector calculation unit 507 may be used for network calculation of non-convolution/non-FC layers in a neural network, such as Pooling (Pooling), batch Normalization (Batch Normalization), local Response Normalization (Local Response Normalization), and the like.
In some implementations, vector calculation unit 507 stores the vector of processed outputs to unified memory 506. For example, the vector calculation unit 507 may apply a non-linear function to the output of the arithmetic circuit 503, such as a vector of accumulated values, to generate the activation value. In some implementations, the vector calculation unit 507 generates normalized values, combined values, or both. In some implementations, the vector of processed outputs can be used as activation inputs to the arithmetic circuitry 503, for example for use in subsequent layers in a neural network.
The unified memory 506 is used to store input data as well as output data.
A direct memory access controller (DMAC) 505 is used to transfer input data in the external memory to the input memory 501 and/or the unified memory 506, to store the weight data from the external memory in the weight memory 502, and to store data from the unified memory 506 in the external memory.
A bus interface unit (BIU) 510 is used to implement interaction among the host CPU, the DMAC, and the instruction fetch memory 509 over a bus.
An instruction fetch memory 509 (instruction fetch buffer) connected to the controller 504 for storing instructions used by the controller 504;
the controller 504 is configured to call the instructions cached in the instruction fetch memory 509 to control the working process of the operation accelerator.
Generally, the unified Memory 506, the input Memory 501, the weight Memory 502, and the instruction fetch Memory 509 are On-Chip memories, the external Memory is a Memory outside the NPU, and the external Memory may be a Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM), a High Bandwidth Memory (HBM), or other readable and writable memories.
The program algorithm in fig. 4 may be performed by the main CPU and the neural network processor 50 in cooperation.
A detection method provided in the embodiments of the present application is described below.
Example one
Fig. 6 is a detection method 600 according to an embodiment of the present invention, which includes the following steps:
step 601, obtaining a first multi-scale feature map of a product to be detected, wherein the first multi-scale feature map comprises multi-dimensional features of the product to be detected in different scale spaces;
specifically, the multi-scale feature map in the present application refers to a feature map of multi-dimensional features fused with different scale spaces. Optionally, a to-be-detected product picture of the to-be-detected product may be obtained first, and then a feature map of the to-be-detected product picture in different scale spaces is obtained, where the feature map is a multi-dimensional feature map, that is, a multi-channel feature map; and then fusing the feature maps of the different scale spaces to obtain a first multi-scale feature map, wherein the first multi-scale feature map comprises information contained in the feature maps of the different scale spaces. The number of the scale spaces is more than two, and the specific numerical value can be set according to the actual situation.
Step 602, determining an abnormal value of each pixel point in the first multi-scale feature map according to the first multi-scale feature map and the reconstructed feature map, wherein the abnormal value of each pixel point is used for representing the product abnormal degree of the pixel point at the corresponding position of the product to be detected; the reconstructed feature map comprises multi-dimensional features of t second multi-scale feature maps of normal products; the reconstruction characteristic graph comprises M reconstruction characteristic centers, the M reconstruction characteristic centers are obtained by performing characteristic reconstruction on t second multi-scale characteristic graphs obtained based on t normal product pictures, and M is equal to the number of pixel points in the second multi-scale characteristic graphs;
specifically, the reconstructed feature map is obtained by processing in advance according to a normal picture of the product to be detected, and in fact, the reconstructed feature map is obtained in a training stage of the defect detection model, and the details refer to the description of the second embodiment. Similarly, the reconstructed feature map refers to a feature map of the multi-dimensional features into which t second multi-scale feature maps are fused. In addition, the number of pixel points of the first multi-scale feature map is also M, and M abnormal values of the first multi-scale feature map can be determined according to the first multi-scale feature map and the reconstructed feature map.
Step 603, determining the detection result of the product to be detected according to the abnormal value of each pixel point in the first multi-scale feature map.
Specifically, whether the product to be detected has a defect can be determined according to the M abnormal values, i.e., the detection result of the product to be detected is either that the product is defective or that the product is not defective. In addition, because a mapping relationship exists between each pixel point in the first multi-scale feature map and the product to be detected, the position of a defect in the product to be detected can be determined using the abnormal value of each pixel point and this mapping relationship, so that defect localization is realized and workers are helped to further analyze the defects of the product to be detected.
In the detection method of the embodiment of the application, the abnormal value of each pixel point of the first multi-scale feature map of the picture of the product to be detected, obtained through a multi-scale fusion algorithm, is determined (the first multi-scale feature map contains multi-dimensional features at different scales: small-scale features represent features of a large image area, and large-scale features represent features of a small image area), so that small-size defects of the product to be detected can be effectively detected and the product detection effect is improved. In addition, because the reconstructed feature map is obtained by processing t pictures of normal products, the detection method of the embodiment of the present invention belongs to the class of one-class detection algorithms: training of the defect detection model can be completed using only pictures of normal products, and product defect detection is then realized, so the method is easy to implement.
In some possible embodiments, the detection method 600 may further include:
and acquiring a picture of a product to be detected of the product to be detected, and preprocessing the picture of the product to be detected. The preprocessing can include at least one of filtering, enhancing, denoising and the like, so as to obtain a to-be-detected product picture meeting the requirement. The preprocessing aims to remove useless information in the picture of the product to be detected, and the efficiency of detecting the product defects is improved.
Referring to fig. 7, fig. 7 is a schematic structural diagram of a defect detection model provided in an embodiment of the present application; the defect detection model provided by the embodiment of the application can comprise: the system comprises a feature extraction module 701, a feature selection module 702 and a defect detection module 703, and the detailed description process of the defect detection model refers to the following detailed description of fig. 8.
In some possible embodiments, in step 601, the defect detection model performs feature extraction on the picture of the product to be detected to obtain a plurality of feature maps of different scales (it should be understood that a "feature map" here is a feature map with more than one channel, and "plurality" means more than two). Specifically, referring to fig. 7, the feature extraction module 701 in the defect detection model performs feature extraction on the picture of the product to be detected; as an example, the feature extraction module 701 may be implemented by a feature extraction neural network. With further reference to fig. 8, fig. 8 is a specific flowchart of a detection method provided in the embodiments of the present application. Illustratively, taking a printed matter as the product to be detected, the input image is the to-be-detected printed matter image 801. The feature extraction module 802 includes more than two feature extraction layers (e.g., feature extraction layer 1, feature extraction layer 2, feature extraction layer 3, … feature extraction layer n), and each feature extraction layer obtains a feature map of the to-be-detected printed matter image 801 at one scale. For example, feature extraction layer 1 yields one feature map 806 of the image 801 (the number of channels of feature map 806 is 6), feature extraction layer 2 then yields one feature map 807 (the number of channels of feature map 807 is 3), and feature extraction layer 3 yields one feature map 808 (the number of channels of feature map 808 is 2); by analogy, n feature maps of the to-be-detected image 801 can be obtained.
The feature selection module 803 of the defect detection model fuses the n feature maps to obtain a first multi-scale feature map 809. The defect detection module 804 of the defect detection model may determine the defect result of the to-be-detected printed matter by using the first multi-scale feature map 809 and the reconstructed feature map corresponding to the printed matter (the reconstructed feature map is obtained by using normal image processing of the printed matter). In addition, the defect detection model can also output a thermodynamic diagram 805 of the to-be-detected printed matter, and a user can know whether the to-be-detected printed matter is defective or not and the specific position of the defect by looking at the thermodynamic diagram 805.
In some possible embodiments, in obtaining the first multi-scale feature map, step 601 includes the following steps:
step A1, a plurality of first characteristic diagrams of a product to be detected are obtained, and the sizes of the first characteristic diagrams are different.
Specifically, a picture of a product to be detected may be obtained first, and then feature extraction in different scale spaces is performed on the picture of the product to be detected, so as to obtain a plurality of first feature maps of the product to be detected, where the plurality of first feature maps have different sizes, and the number of channels of the plurality of first feature maps may be the same or different, and is not particularly limited.
And step A2, performing size unified processing on the multiple first feature maps according to a preset size to obtain multiple second feature maps, wherein the multiple second feature maps have the same size.
Specifically, in this embodiment, the first feature maps are scaled to obtain second feature maps of the preset size; at this point, the sizes of the plurality of second feature maps are all the preset size, and the specific value of the preset size may be set according to actual needs. Feature map scaling may be implemented using existing processing methods.
And step A3, splicing the plurality of second characteristic graphs along the channel dimension to obtain a third multi-scale characteristic graph.
Specifically, the multiple second feature maps are vertically spliced along the channel dimension to obtain a third multi-scale feature map, and the channel number of the third multi-scale feature map is the sum of the channel numbers of the multiple second feature maps. The order of the second feature map is not particularly limited.
And step A4, applying different weights to different channels of the third multi-scale feature map based on the attention mechanism to obtain a first multi-scale feature map.
Specifically, the attention mechanism is a mechanism in deep learning that simulates human attention; generally, weights are learned from some information so as to enhance the information useful for a task and suppress the information that is not useful. In this embodiment, the feature maps of the channels in the third multi-scale feature map that are useful for defect detection are enhanced, and the feature maps of the channels that contribute little to defect detection are suppressed; that is, different weights are assigned to different channels, where the sum of the weights of all channels is 1, each channel weight may be any value between 0 and 1, and the weight of a channel may be 0 but not 1. In this way, the feature maps of different sizes are fully utilized for defect detection, thereby achieving small-size defect detection.
In the detection method of the embodiment of the application, a method for obtaining a first multi-scale feature map is provided, wherein for a plurality of first feature maps with different sizes of a to-be-detected product picture, size unified processing is performed first, then splicing is performed along channel dimensions to obtain a third multi-scale feature map, different weights are applied to different channels of the third multi-scale feature map based on an attention mechanism, and the first multi-scale feature map can be obtained.
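Steps A1–A3 can be sketched directly: feature maps of different sizes are rescaled to one preset size and then spliced along the channel dimension. This is a hedged numpy illustration; the channel counts and sizes are invented (borrowing the 6/3/2-channel example from fig. 8), and nearest-neighbor sampling stands in for whatever scaling method an implementation actually uses.

```python
import numpy as np

# Hedged sketch of steps A1-A3: unify feature map sizes, then splice
# along the channel dimension to form the third multi-scale feature map.

def resize_nearest(fmap, size):
    """Resize a (channels, h, w) feature map by nearest-neighbor sampling."""
    c, h, w = fmap.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return fmap[:, rows][:, :, cols]

# Step A1: several first feature maps of different spatial sizes.
fmap_a = np.ones((6, 32, 32))      # 6 channels
fmap_b = np.ones((3, 16, 16))      # 3 channels
fmap_c = np.ones((2, 8, 8))        # 2 channels

preset = 16                        # step A2: scale all maps to a preset size
unified = [resize_nearest(f, preset) for f in (fmap_a, fmap_b, fmap_c)]

# Step A3: splice along the channel dimension.
third_multiscale = np.concatenate(unified, axis=0)
```

The result has 6 + 3 + 2 = 11 channels at the preset 16×16 size: the channel count of the spliced map is the sum of the channel counts of the inputs, as step A3 states.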
Further, the assignment of weights to different channels of the third multi-scale feature map based on an attention mechanism may be implemented by an attention algorithm such as a channel attention algorithm. Referring to fig. 9, fig. 9 is a flowchart of a channel attention algorithm provided in this embodiment of the present application. The model of the channel attention algorithm includes a GP (global pooling) layer 902, an FC (fully connected) layer 903, an FC (fully connected) layer 904, an S (softmax) layer 905, and a weighting layer 906. The global pooling layer takes the maximum value from the feature map of each channel and outputs it, thereby reducing the size of the feature map. Assuming that the size of the input third multi-scale feature map 901 is (32, 32, 64), a feature map of size (1, 1, 64) is obtained after the global pooling layer. The fully connected layers implement dimensional changes: FC 903 maps a high-dimensional feature map to a low-dimensional one, for example from size (1, 1, 64) to size (1, 1, 32), and FC 904 maps a low-dimensional feature map back to a high-dimensional one, for example from size (1, 1, 32) to size (1, 1, 64). The softmax layer assigns a weight to each channel of the feature map; the weights of all channels sum to 1, and the weight of a single channel may be 0 but cannot be 1. Finally, the weighting layer 906 multiplies each channel of the third multi-scale feature map by its corresponding weight, associating the feature map of each channel with its weight, to obtain the first multi-scale feature map 907. In fig. 9, the shades of the feature maps of different channels in the first multi-scale feature map 907 represent different weights.
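A rough NumPy sketch of this flow follows, with random matrices standing in for the learned FC weights and sizes following the (32, 32, 64) example above:

```python
import numpy as np

def channel_attention(x):
    """Sketch of the channel-attention flow of fig. 9: global pooling (GP 902),
    a squeeze FC (903), an expand FC (904), softmax over channels (905), then
    per-channel weighting (906). FC weights here are random placeholders; in
    the model they are learned."""
    h, w, c = x.shape                      # e.g. (32, 32, 64)
    rng = np.random.default_rng(0)
    w1 = rng.standard_normal((c, c // 2))  # FC 903: 64 -> 32 (squeeze)
    w2 = rng.standard_normal((c // 2, c))  # FC 904: 32 -> 64 (expand)

    pooled = x.max(axis=(0, 1))            # GP 902: per-channel max -> (c,)
    z = np.maximum(pooled @ w1, 0.0)       # squeeze to low dimension, ReLU
    logits = z @ w2                        # expand back to c channels
    weights = np.exp(logits - logits.max())
    weights = weights / weights.sum()      # softmax 905: channel weights sum to 1
    return x * weights, weights            # weighting layer 906

x = np.random.default_rng(1).random((32, 32, 64))
y, w = channel_attention(x)
```

The weighted map `y` keeps the input shape; only the relative contribution of each channel changes.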
In other possible embodiments, in obtaining the first multi-scale feature map, step 601 includes the following steps:
Step B1, acquiring a plurality of third feature maps of the product to be detected, wherein the plurality of third feature maps are different in size and the same in number of channels;
specifically, a picture of the product to be detected may be obtained first, and then feature extraction in different scale spaces is performed on the picture of the product to be detected to obtain a plurality of first feature maps, where the plurality of first feature maps have different sizes, and the number of channels of the plurality of first feature maps may be the same or different, and is not particularly limited. When the number of channels of the plurality of first characteristic diagrams is the same, the first characteristic diagram is the third characteristic diagram. When the number of channels of the first feature maps is different, the number of channels of the first feature maps needs to be changed, and the number of channels of the first feature maps is unified to obtain a plurality of third feature maps with the same number of channels. The unified channel number can be a preset channel number, and the specific numerical value can be set according to actual conditions. More specifically, the first profile may be input into a 1x1 convolutional layer to output a third profile for a predetermined number of channels.
Step B2, carrying out size unified processing on the plurality of third feature maps according to a preset size to obtain a plurality of fourth feature maps, wherein the sizes of the plurality of fourth feature maps are the same;
Specifically, the third feature maps are scaled to obtain fourth feature maps of a preset size, where the specific value of the preset size may be set according to the actual situation.
Step B3, applying different weights to the plurality of fourth feature maps based on the attention mechanism so as to obtain the first multi-scale feature map through weighted fusion.
Specifically, weights are assigned to the plurality of fourth feature maps based on the attention mechanism, where the feature maps of different channels within one fourth feature map share the same weight. Pixel points at the same position in the plurality of fourth feature maps are then weighted and fused into one pixel value, yielding the first multi-scale feature map; at this point, the number of channels of the first multi-scale feature map is the preset number of channels. Assuming that the plurality of fourth feature maps are F1, F2 and F3, and the weights set based on the attention mechanism are a1, a2 and a3, respectively, the first multi-scale feature map F = a1 × F1 + a2 × F2 + a3 × F3.
In the detection method according to the embodiment of the present application, another method for obtaining the first multi-scale feature map is provided: for a plurality of third feature maps of the picture of the product to be detected that differ in size but share the same number of channels, size unification is performed first to obtain a plurality of fourth feature maps, and different weights are then assigned to the plurality of fourth feature maps based on an attention mechanism, so that the plurality of fourth feature maps are weighted and fused into the first multi-scale feature map.
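The weighted fusion of steps B2 and B3 can be sketched as follows; softmax-normalized scalar weights stand in for the attention-derived weights a1, a2, a3, and the map sizes are illustrative:

```python
import numpy as np

def fuse(maps, logits):
    """Weighted fusion F = a1*F1 + a2*F2 + a3*F3: each resized fourth
    feature map gets one scalar weight (softmax-normalized here, an
    illustrative stand-in for attention-derived weights), and the maps
    are summed pixel by pixel."""
    a = np.exp(logits - np.max(logits))
    a = a / a.sum()                        # scalar weight per map, sums to 1
    return sum(w * m for w, m in zip(a, maps)), a

f1 = np.full((8, 8, 64), 1.0)
f2 = np.full((8, 8, 64), 2.0)
f3 = np.full((8, 8, 64), 3.0)
fused, a = fuse([f1, f2, f3], np.array([0.0, 0.0, 0.0]))
```

With equal logits each weight is 1/3, so every fused pixel value is the mean (2.0) of the three maps, and the channel count stays at the preset 64.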
In still other possible embodiments, in obtaining the first multi-scale feature map, step 601 includes the following steps:
step C1, acquiring a plurality of fifth feature maps of the to-be-detected product picture, wherein the sizes of the fifth feature maps are different;
specifically, a to-be-detected product picture may be obtained first, and then feature extraction in different scale spaces is performed on the to-be-detected product picture to obtain a plurality of fifth feature maps, where the sizes of the plurality of fifth feature maps are different, and the number of channels of the plurality of fifth feature maps may be the same or different, and is not particularly limited.
Step C2, carrying out size unified processing on the plurality of fifth feature maps according to a preset size to obtain a plurality of sixth feature maps, wherein the sizes of the plurality of sixth feature maps are the same;
Specifically, each fifth feature map is converted into a sixth feature map of the preset size, so that all sixth feature maps share the same size.
Step C3, applying different weights to different channels of the multiple sixth feature maps respectively based on the attention mechanism to obtain multiple seventh feature maps;
Specifically, a seventh feature map is obtained by applying different weights to different channels of each sixth feature map based on the attention mechanism, so that a plurality of seventh feature maps are obtained. The purpose of this processing is to preferentially select the data of channels that are useful for defect detection and to suppress the data of channels that hinder it.
Step C4, applying different weights to the multiple seventh feature maps based on the attention mechanism, and splicing the multiple seventh feature maps along the channel dimension to obtain a first multi-scale feature map.
Specifically, different weights are applied to the plurality of seventh feature maps based on the attention mechanism, in order to preferentially select, among the plurality of seventh feature maps, the data of those that are useful for defect detection and to suppress the data of those that hinder it. Finally, the plurality of seventh feature maps (now carrying their different weights) are spliced along the channel dimension to obtain the first multi-scale feature map. The number of channels of the first multi-scale feature map is then the sum of the numbers of channels of the seventh feature maps.
In the detection method of the embodiment of the application, another method for obtaining the first multi-scale feature map is provided: for a plurality of fifth feature maps of different sizes of the picture of the product to be detected, size unification is performed to obtain a plurality of sixth feature maps; different weights are applied to different channels of the sixth feature maps based on an attention mechanism to obtain seventh feature maps; and different weights are then applied to the seventh feature maps based on the attention mechanism, after which the seventh feature maps are spliced along the channel dimension to obtain the first multi-scale feature map.
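A compact sketch of steps C3 and C4, with random logits standing in for both learned attention branches and illustrative map sizes:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

# sixth feature maps: same preset size and same channel count (illustrative)
sixth = [rng.random((8, 8, 32)) for _ in range(3)]

# step C3: per-map channel weights (random logits stand in for the learned
# attention branch) give the seventh feature maps
seventh = [m * softmax(rng.standard_normal(m.shape[-1])) for m in sixth]

# step C4: one scalar weight per seventh map, then splicing along channels
map_w = softmax(rng.standard_normal(len(seventh)))
first_multi_scale = np.concatenate(
    [w * m for w, m in zip(map_w, seventh)], axis=-1)
```

The resulting channel count is the sum of the seventh maps' channel counts (3 × 32 = 96 here), as the text states.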
In some possible embodiments, the detection method 600 further comprises the steps of:
acquiring a reconstructed feature map, wherein the acquiring of the reconstructed feature map comprises the following steps:
acquiring t second multi-scale feature maps of normal products, wherein the second multi-scale feature maps contain multi-dimensional features of the normal products in different scale spaces; respectively performing pixel-by-pixel decomposition on the t second multi-scale feature maps to obtain t multi-scale feature vector sets, wherein each multi-scale feature vector set includes the multi-scale feature vectors corresponding to all pixel points of one second multi-scale feature map, and the number of elements of each multi-scale feature vector set is the number M of pixel points of the second multi-scale feature map; respectively performing feature reconstruction on the t multi-scale feature vectors at the same pixel point position across the t multi-scale feature vector sets to obtain the reconstruction feature center of the corresponding pixel point; and obtaining the reconstructed feature map according to the M reconstruction feature centers.
Specifically, the method for acquiring the second multi-scale feature maps is the same as the method for acquiring the first multi-scale feature map described above, and is therefore not repeated. The process of obtaining the reconstructed feature map is the training process of the defect detection model: t normal product pictures may first be obtained, all of which are pictures of normal products and thus belong to normal sample pictures, although the products in the t pictures may differ slightly in structure. The t normal product pictures form the training set of the defect detection model; for the specific training process, refer to the second embodiment. The reconstructed feature map obtained through training summarizes the features of the t normal product pictures, and single-class defect detection can be performed according to it. The first embodiment may in turn be understood as the application stage of the defect detection model (e.g., the stage performed by the execution device 110 shown in fig. 4). In particular, different products have different reconstructed feature maps: for a product to be detected, normal pictures of that product can be used to obtain the corresponding reconstructed feature map, so that defect detection is performed on the product to be detected according to that reconstructed feature map.
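A minimal sketch of the reconstructed feature map under the simplest possible reconstruction rule, the pixel-wise mean over the t maps, which the second embodiment uses as the initialization of Fc (data and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
t, H, W, C = 5, 8, 8, 16

# t second multi-scale feature maps of normal products (illustrative data)
second_maps = rng.random((t, H, W, C))

# pixel-by-pixel decomposition: at each of the M = H*W positions there are
# t multi-scale feature vectors of size (1, C); the simplest reconstruction
# center (used as the initial Fc in the training embodiment) is their mean
reconstructed = second_maps.mean(axis=0)          # reconstructed map (H, W, C)
M = H * W
centers = reconstructed.reshape(M, C)             # M reconstruction centers
```

Aggregating the M per-position centers back into an (H, W, C) array is exactly the reconstructed feature map the text describes.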
In some possible embodiments, step 602 includes the steps of:
respectively performing pixel-by-pixel decomposition on the first multi-scale feature map and the reconstructed feature map to obtain M first position features and M reconstruction feature centers; and determining the abnormal value of the corresponding pixel point in the first multi-scale feature map according to the first position feature and the reconstruction feature center at the same pixel point position.
In the detection method of the embodiment of the application, when the abnormal value of each pixel point in the first multi-scale feature map is determined, the first multi-scale feature map and the reconstructed feature map may be decomposed pixel by pixel to obtain the M first position features of the first multi-scale feature map and the M reconstruction feature centers of the reconstructed feature map; the abnormal value of the pixel point at each position in the first multi-scale feature map can then be calculated from the first position feature and the reconstruction feature center at that position. In some possible embodiments, the abnormal value is the Euclidean distance or the cosine distance between the first position feature of the pixel point and the corresponding reconstruction feature center; other quantities that characterize the degree of feature difference between the first position feature and the corresponding reconstruction feature center also qualify as abnormal values and fall within the scope of the present application.
Further, referring to fig. 8, in the embodiment of the present application, it is assumed that the size of the first multi-scale feature map 809 is (H, W, C), where H and W are the feature map height and width and C is the number of channels. After pixel-by-pixel decomposition of the first multi-scale feature map 809, H × W (where M = H × W) first position features 810 are obtained, and H × W abnormal values are obtained by calculating an abnormal value for each first position feature 810. In particular, an abnormal value map 812 of resolution H × W can be built from the H × W abnormal values and used in the subsequent determination of the detection result of the product to be detected. By analyzing the abnormal value of each pixel point on the abnormal value map 812, the position of the defect in the image can be obtained, which helps workers further analyze the defect of the product.
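Computing the abnormal value map from the first multi-scale feature map and the reconstructed feature map, using the Euclidean-distance variant described above, can be sketched as:

```python
import numpy as np

def anomaly_map(first, recon):
    """Per-pixel abnormal value of step 602: decompose both maps into
    M = H*W position features of size (1, C) and take the Euclidean
    distance between each first position feature and its reconstruction
    center, giving an (H, W) abnormal value map."""
    assert first.shape == recon.shape              # both (H, W, C)
    return np.linalg.norm(first - recon, axis=-1)  # (H, W)

recon = np.zeros((8, 8, 4))                        # toy reconstruction centers
first = np.zeros((8, 8, 4))
first[2, 3] = [3.0, 4.0, 0.0, 0.0]                 # one defective position
d = anomaly_map(first, recon)
```

The single deviating position yields abnormal value 5.0 (the 3-4-5 triangle), while all normal positions stay at 0, so the defect location is directly readable from the map.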
In some possible embodiments, step 603 comprises the steps of:
if the abnormal value of any pixel point in the first multi-scale feature map is greater than the preset abnormal value, determining that the product to be detected is a defective product.
Specifically, in this embodiment, one possible way of determining the detection result of the product to be detected is: if the abnormal value of any pixel point in the first multi-scale feature map is greater than the preset abnormal value, the product to be detected is determined to be defective and belongs to the defective products. The specific value of the preset abnormal value may be set according to the actual situation, for example as an empirical value, or as the maximum of all abnormal values observed on the training set of the defect detection model. For example, referring to fig. 8, the black area in the first multi-scale feature map 809 is a defective area; when the abnormal value of the first position feature 811 is determined to be greater than the preset abnormal value, the product to be detected can be determined to be a defective product.
In other possible embodiments, step 603 includes the steps of:
acquiring the number of pixel points whose abnormal values in the first multi-scale feature map are greater than the preset abnormal value; and if the number is greater than a preset number, determining that the product to be detected is a defective product.
Specifically, the specific value of the preset number may be set according to actual needs and is not particularly limited. In the detection method of the embodiment of the application, when the detection result of the product to be detected is determined according to the abnormal value of each pixel point of the first multi-scale feature map, the number of pixel points whose abnormal values are greater than the preset abnormal value may be obtained first; when this number is determined to be greater than the preset number, the product to be detected is determined to be a defective product.
In still other possible embodiments, step 603 includes the steps of:
if the maximum abnormal value in the first multi-scale feature map is greater than the preset abnormal value, determining that the product to be detected is a defective product.
Specifically, when determining the detection result of the product to be detected according to the abnormal values, the maximum abnormal value over all pixel points of the first multi-scale feature map of the product to be detected may be determined first; if this maximum abnormal value is greater than the preset abnormal value, the product to be detected is determined to be a defective product.
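The three judgment rules of step 603 (any pixel above the threshold, more than a preset number of pixels above it, or the maximum above it) can be sketched together on an abnormal value map; the threshold and data below are illustrative:

```python
import numpy as np

def is_defective(d, theta, min_count=1):
    """The three judgment variants of step 603 on an abnormal value map d:
    (1) any pixel above theta; (2) more than min_count pixels above theta;
    (3) the maximum abnormal value above theta. Rules (1) and (3) agree."""
    over = d > theta
    rule_any = over.any()
    rule_count = over.sum() > min_count
    rule_max = d.max() > theta
    return rule_any, rule_count, rule_max

d = np.array([[0.1, 0.2],
              [0.9, 0.3]])                 # toy 2x2 abnormal value map
r_any, r_cnt, r_max = is_defective(d, theta=0.5, min_count=1)
```

With one pixel (0.9) above the threshold, rules (1) and (3) flag a defect, while rule (2) does not because the count of 1 is not greater than the preset number 1; the count-based rule is the stricter variant.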
In particular, when determining the detection result of the product to be detected according to the abnormal value of each pixel point of the first multi-scale feature map, other determination methods are also possible, and such methods fall within the protection scope of the present application.
In some possible embodiments, the detection method 600 further comprises:
when the product to be detected is a defective product, determining the mapping region, on the product to be detected, of the pixel points whose abnormal values in the first multi-scale feature map are greater than the preset abnormal value as the defective region of the product to be detected.
Specifically, when the product to be detected is determined to be a defective product, since the first multi-scale feature map and the product to be detected have a certain mapping relationship, the mapping region, on the product to be detected, of the pixel points whose abnormal values in the first multi-scale feature map are greater than the preset abnormal value is the defective region of the product to be detected. In practice, the defective region of the product to be detected can be determined according to the pixel point mapping relationship between the first multi-scale feature map and the picture of the product to be detected.
In some possible embodiments, the detection method 600 further comprises the steps of:
outputting a heat map corresponding to the product to be detected according to the abnormal values; the heat map is marked by shades of color: the lighter the color of a pixel point in the heat map, the smaller the corresponding abnormal value; the darker the color, the larger the corresponding abnormal value.
Specifically, because the first multi-scale feature map and the picture of the product to be detected have a certain mapping relationship, a heat map corresponding to the picture of the product to be detected can be output according to the abnormal value of each pixel point in the first multi-scale feature map. The heat map visually displays the defect detection result to the user, who can learn from it whether the product has a defect, the specific defect position, and so on. Referring to fig. 8, in a specific application, the abnormal values corresponding to all pixel points of the picture of the product to be detected may be determined according to the pixel point mapping relationship between the abnormal value map 812 and the picture (each pixel point in the first multi-scale feature map maps to a certain number of pixel points in the picture), and the heat map 805 may then be generated from these abnormal values, so that the magnitude of each abnormal value is represented by the shade of color.
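A minimal sketch of heat-map generation, assuming each feature-map pixel maps to a square block of image pixels (a nearest-neighbor stand-in for the actual mapping) and normalizing abnormal values to [0, 1]:

```python
import numpy as np

def to_heatmap(d, scale):
    """Sketch of heat-map generation: each abnormal-value-map pixel maps
    to a scale x scale block of image pixels (nearest-neighbor stand-in
    for the true pixel mapping), and values are normalized to [0, 1] so
    larger abnormal values render as darker/hotter colors."""
    up = np.kron(d, np.ones((scale, scale)))       # upscale to image size
    lo, hi = up.min(), up.max()
    return (up - lo) / (hi - lo + 1e-12)

d = np.array([[0.0, 1.0],
              [2.0, 4.0]])                          # toy 2x2 abnormal value map
heat = to_heatmap(d, scale=4)
```

The 2×2 map becomes an 8×8 intensity image; feeding `heat` into any colormap gives the shaded rendering the text describes, with the highest abnormal value mapped to the darkest color.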
The detection method of the present application is a defect detection method based on single-class detection: using only pictures of normal workpieces, feature maps of multiple scales are fused, and an SVDD algorithm is applied to each position feature in the fused feature map. In this way, not only can the defect detection result of the product to be detected be obtained, but the defect position can also be located, and the detection of small defects is improved.
The method 600 may be specifically executed by the execution device 110 shown in fig. 4, the picture of the product to be tested in the method 600 may be input data given by the client device 140 shown in fig. 4, the preprocessing module 113 in the execution device 110 may be used to execute a preprocessing step on the picture of the product to be tested, and the computing module 111 in the execution device 110 may be used to execute the method 600.
Optionally, the method 600 may be processed by a CPU, or may be processed by the CPU and a GPU together, or may use, instead of the GPU, other processors suitable for neural network computation, which is not limited in this application.
Example two:
referring to fig. 4, the training process of the defect detection model may be executed in the training device 120, or may be executed in advance by other functional modules before the training device 120, that is, the training data received or obtained from the database 130 is preprocessed, such as filtered and denoised, to obtain preprocessed training data, which is used as the input of the training device 120.
Optionally, the training process of the defect detection model may be processed by the CPU, or may be processed by the CPU and the GPU together, or may use other processors suitable for neural network computation instead of the GPU, which is not limited in this application.
The following describes the training phase of the defect detection model:
S1: collecting pictures of normal samples of the product to be detected as a training set St, which comprises Nt normal sample pictures, where Nt is not less than 1 and is typically greater than 10.
S2: inputting each picture of the training set St into a pre-trained neural network (for example, WideResNet50, EfficientNet, etc.). Taking the extraction of feature maps at three scales as an example, different feature extraction layers in the pre-trained neural network output feature maps of different scales: Ft = (fta, ftb, ftc).
S3: scaling the feature maps (fta, ftb, ftc) of different scales to the same scale, then splicing them along the channel dimension to obtain a combined multi-scale feature map, and applying weights to the different channels of this multi-scale feature map with a channel optimization algorithm to obtain a fused second multi-scale feature map Ft of size (H, W, C). A second multi-scale feature map Ft is obtained for each picture of the training set, giving Nt second multi-scale feature maps in total.
S4: splitting each second multi-scale feature map by position into a plurality of 1-dimensional features, each carrying its position information; applying a feature reconstruction algorithm to the features of each position to obtain a reconstruction feature center; and aggregating all the reconstruction feature centers to obtain the reconstructed feature map. A feature center is a central representation of a plurality of data features, that is, a new feature found by distance calculation under a defined rule such that the sum of the distances from the new feature to each of the features is minimal, where the distance may be the Euclidean distance, the cosine distance, or another measure of feature difference.
Specifically, the initialized reconstructed feature map Fc is the mean of all the second multi-scale feature maps Ft of the training set and has size (H, W, C). Fc and Ft are each decomposed into H × W multi-scale feature vectors fc and ft of size (1, C), each representing the feature of the corresponding position. For the multi-scale feature vectors ft and fc at each pixel point position, the Euclidean distance l is calculated; the Euclidean distances at all positions are summed to obtain the loss function L, and Fc is optimized with an SVDD (support vector data description) algorithm to obtain the optimized reconstructed feature map Fc' of size (H, W, C).
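The embodiment does not spell out the SVDD optimization itself; the sketch below is an illustrative stand-in that minimizes the same loss L (the sum of per-position Euclidean distances from Fc to each Ft) with a Weiszfeld-style fixed-point iteration, starting from the mean initialization described above:

```python
import numpy as np

rng = np.random.default_rng(0)
t, H, W, C = 4, 4, 4, 8
Ft = rng.random((t, H, W, C))        # Nt second multi-scale maps (illustrative)

Fc = Ft.mean(axis=0)                 # initialization: mean of all Ft, size (H, W, C)
for _ in range(50):                  # Weiszfeld-style update of each position's center
    diff = Fc[None] - Ft             # (t, H, W, C)
    d = np.linalg.norm(diff, axis=-1, keepdims=True) + 1e-12
    w = 1.0 / d                      # inverse-distance weights per map and position
    Fc = (w * Ft).sum(axis=0) / w.sum(axis=0)

loss = np.linalg.norm(Fc[None] - Ft, axis=-1).sum()           # optimized loss L
init_loss = np.linalg.norm(Ft.mean(axis=0)[None] - Ft, axis=-1).sum()
```

Each iteration is guaranteed not to increase L, so the optimized Fc' fits the training features at least as well as the mean initialization; in the actual model this role is played by the SVDD optimization named in the text.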
It is worth noting that, for step S3, the present embodiment also provides an alternative step:
Assuming that the numbers of channels of the feature maps of different scales obtained in step S2 differ, the feature maps (fta, ftb, ftc) first pass through their respective 1 × 1 convolutional layers to obtain feature maps with the same number of channels but different scales; these are then scaled to the same scale to obtain f'ta, f'tb and f'tc, and the fused weighted feature map Ft is calculated with learnable parameters a, b and c as Ft = a · f'ta + b · f'tb + c · f'tc, of size (H, W, C); this is the second multi-scale feature map Ft.
In addition, for step S3, steps C2 to C4 described above may also be adopted instead.
After steps S1 to S4 are performed, the training of the defect detection model is complete, and the trained model can be used to detect pictures of the product to be detected. In addition, in order to check the training result, the trained defect detection model may be tested; the testing stage of the model specifically includes the following steps:
S1: constructing a test set Se, which comprises some normal sample pictures and some defect sample pictures.
S2: inputting each picture of the test set Se into the defect detection model, which outputs feature maps of different scales: Fe = (fea, feb, fec).
S3: scaling the feature maps (fea, feb, fec) of different scales to the same scale, splicing them along the channel dimension to obtain a combined multi-scale feature map, and applying weights to the different channels with a channel optimization algorithm to obtain the fused first multi-scale feature map Fe of size (H, W, C).
S4: decomposing the reconstructed feature map Fc' obtained by training and the first multi-scale feature map Fe into H × W feature vectors fc and fe of size (1, C), each representing the feature of the corresponding position. For the features fe and fc at each pixel point position, the Euclidean distance l is calculated as the abnormal value, giving an abnormal value map D of size (H, W, 1).
S5: finding the maximum abnormal value dmax in the abnormal value map D of the previous step and judging whether dmax is greater than the preset abnormal value θ: if dmax > θ, the sample is a defective sample; otherwise, it is a normal sample. The defect detection model can also output the abnormal values as a heat map of abnormal regions, in which the highlighted areas are the abnormal regions of the product. During testing, the test results of the model can be checked manually to determine the accuracy of the defect detection model; when the accuracy meets the requirement, the tested model can be used for actual product defect detection.
Based on the above-mentioned detection method embodiment, referring to fig. 10, fig. 10 is a schematic structural diagram of a detection apparatus provided in the embodiment of the present application; an embodiment of the present application further provides a detection apparatus, including:
the acquiring module 131 is configured to acquire a first multi-scale feature map of a product to be detected, where the first multi-scale feature map includes multi-dimensional features of the product to be detected in different scale spaces;
the determining module 132 is configured to determine an abnormal value of each pixel point in the first multi-scale feature map according to the first multi-scale feature map and the reconstructed feature map, where the abnormal value of each pixel point represents the degree of product abnormality at the corresponding position of the product to be detected; the reconstructed feature map comprises the features of t second multi-scale feature maps of normal products and contains M reconstruction feature centers, the M reconstruction feature centers being obtained by performing feature reconstruction on the t second multi-scale feature maps obtained from t normal product pictures, where M is equal to the number of pixel points in a second multi-scale feature map; and to determine the detection result of the product to be detected according to the abnormal value of each pixel point in the first multi-scale feature map.
The detection device in the embodiment of the application can locate the defect positions of a product and help workers analyze defective products.
In some possible embodiments, the determining module 132 is specifically configured to: if the abnormal value of any pixel point in the first multi-scale feature map is greater than the preset abnormal value, determine that the product to be detected is a defective product.
In some possible embodiments, the determining module 132 is specifically configured to: acquire the number of pixel points whose abnormal values in the first multi-scale feature map are greater than the preset abnormal value; and if the number is greater than the preset number, determine that the product to be detected is a defective product.
In some possible embodiments, the determining module 132 is further configured to: when the product to be detected is a defective product, determine the mapping region, on the product to be detected, of the pixel points whose abnormal values in the first multi-scale feature map are greater than the preset abnormal value as the defective region of the product to be detected.
In some possible embodiments, the obtaining module 131 is specifically configured to:
acquiring a plurality of first feature maps of the product to be detected, wherein the sizes of the first feature maps are different; performing size unification on the plurality of first feature maps according to a preset size to obtain a plurality of second feature maps of the same size; splicing the plurality of second feature maps along the channel dimension to obtain a third multi-scale feature map; and applying different weights to different channels of the third multi-scale feature map based on an attention mechanism to obtain the first multi-scale feature map.
In some possible embodiments, the obtaining module 131 is specifically configured to:
acquiring a plurality of third feature maps of a product to be detected, wherein the third feature maps have different sizes and the same channel number; carrying out size unified processing on the plurality of third feature maps according to a preset size to obtain a plurality of fourth feature maps, wherein the sizes of the plurality of fourth feature maps are the same; and applying different weights to the plurality of fourth feature maps based on the attention mechanism so as to obtain a first multi-scale feature map through weighted fusion.
In some possible embodiments, the obtaining module 131 is specifically configured to:
acquiring a plurality of fifth feature maps of the product to be detected, wherein the sizes of the fifth feature maps are different; performing size unification on the plurality of fifth feature maps according to a preset size to obtain a plurality of sixth feature maps of the same size; respectively applying different weights to different channels of the sixth feature maps based on an attention mechanism to obtain seventh feature maps; and applying different weights to the seventh feature maps based on the attention mechanism, then splicing them along the channel dimension to obtain the first multi-scale feature map.
In some possible embodiments, the determining module 132 is specifically configured to:
respectively performing pixel-by-pixel decomposition on the first multi-scale feature map and the reconstructed feature map to obtain M first position features and M reconstruction feature centers; and determining the abnormal value of the corresponding pixel point in the first multi-scale feature map according to the first position feature and the reconstruction feature center at the same pixel point position.
In some possible embodiments, the abnormal value comprises the Euclidean distance or the cosine distance between the first position feature of the pixel point and the corresponding reconstruction feature center.
In some possible embodiments, the obtaining module 131 is further configured to obtain a reconstructed feature map;
the obtaining module 131 is specifically configured to: acquire t second multi-scale feature maps of the normal product, wherein the second multi-scale feature maps contain multi-dimensional features of the normal product in different scale spaces; respectively carry out pixel-by-pixel decomposition on the t second multi-scale feature maps to obtain t multi-scale feature vector sets, wherein each multi-scale feature vector set comprises a multi-scale feature vector corresponding to each pixel point of the second multi-scale feature map, and the number of elements of the multi-scale feature vector set is the number M of pixel points of the second multi-scale feature map; respectively carry out feature reconstruction on the t multi-scale feature vectors at the same pixel point position in the t multi-scale feature vector sets to obtain the reconstructed feature center of the corresponding pixel point; and aggregate the M reconstructed feature centers to obtain the reconstructed feature map.
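The reconstruction step can be illustrated as follows. The patent leaves the reconstruction operator open; the per-pixel mean of the t vectors used here is one simple choice, not necessarily the one intended, and the function name is hypothetical.

```python
import numpy as np

def reconstruct_feature_map(second_maps):
    """Build the reconstructed feature map from t second multi-scale feature
    maps of normal products, each of shape (C, H, W). Stacking and averaging
    over the t axis reconstructs, for every one of the M pixel positions, a
    single feature center from the t multi-scale vectors at that position."""
    stack = np.stack(second_maps)      # (t, C, H, W): t multi-scale vector sets
    return stack.mean(axis=0)          # M reconstructed feature centers, as (C, H, W)
```

Aggregating the centers back into a (C, H, W) array keeps the reconstructed feature map pixel-aligned with the first multi-scale feature map, which the pixel-wise distance computation relies on.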
In some possible embodiments, the detection device further comprises:
the output module is configured to output a heat map corresponding to the product to be detected according to the abnormal values; the heat map is marked by color shade: the lighter the color of a pixel point in the heat map, the smaller the abnormal value corresponding to the pixel point; the darker the color of a pixel point in the heat map, the larger the abnormal value corresponding to the pixel point.
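The anomaly heat map described above amounts to mapping abnormal values to color intensity. A minimal single-channel sketch (min-max normalization is an assumption; any monotone mapping would satisfy the description):

```python
import numpy as np

def heat_map(outliers):
    """Render per-pixel abnormal values as an 8-bit intensity image:
    0 (lightest) for the smallest abnormal value in the map,
    255 (darkest) for the largest."""
    lo, hi = float(outliers.min()), float(outliers.max())
    if hi == lo:                       # constant map: nothing to highlight
        return np.zeros(outliers.shape, dtype=np.uint8)
    norm = (outliers - lo) / (hi - lo)
    return np.rint(norm * 255).astype(np.uint8)
```

A colormap (e.g. light-to-dark red) applied to this intensity image yields the light/dark marking described, with defective regions standing out as the darkest pixels.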
For the specific implementation process and the beneficial effects of the detection device, reference may be made to the description of the detection method embodiment, which is not repeated here.
Fig. 11 is a schematic structural diagram of a training apparatus for a defect detection model according to an embodiment of the present application. The training apparatus 141 of the defect detection model shown in fig. 11 (the apparatus 141 may be a computer device) includes a memory 142, a processor 143, a communication interface 144, and a bus 145. The memory 142, the processor 143, and the communication interface 144 are communicatively connected to each other via a bus 145.
The memory 142 may be a Read-Only Memory (ROM), a static storage device, a dynamic storage device, or a Random Access Memory (RAM). The memory 142 may store a program; when the program stored in the memory 142 is executed by the processor 143, the processor 143 and the communication interface 144 are configured to perform the steps of the training process for the defect detection model of the embodiment of the present application.
The processor 143 may be a general-purpose Central Processing Unit (CPU), a microprocessor, an Application-Specific Integrated Circuit (ASIC), a Graphics Processing Unit (GPU), or one or more integrated circuits, and is configured to execute related programs to perform the steps of the training process of the defect detection model according to the method embodiment.
The processor 143 may also be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the training process of the defect detection model of the present application may be implemented by integrated logic circuits of hardware or by instructions in the form of software in the processor 143. The processor 143 may also be a general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components, and may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as a RAM, a flash memory, a ROM, a PROM or an EPROM, or a register. The storage medium is located in the memory 142; the processor 143 reads the information in the memory 142 and, in combination with its hardware, performs the training process of the defect detection model of the method embodiment of the present application.
Communication interface 144 enables communication between apparatus 141 and other devices or communication networks using transceiver means such as, but not limited to, a transceiver. For example, training data may be acquired through communication interface 144.
Bus 145 may include a path that transfers information between various components of device 141 (e.g., memory 142, processor 143, communication interface 144).
It is understood that the apparatus 141 corresponds to the training device 120. Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
Based on the above-mentioned detection method embodiment, referring to fig. 12, fig. 12 is a schematic structural diagram of a detection apparatus provided in the embodiment of the present application; the embodiment of the present application further provides a detection device, and the detection device 151 shown in fig. 12 (the device 151 may specifically be a computer device) includes a memory 152, a processor 153, a communication interface 154, and a bus 155. The memory 152, the processor 153 and the communication interface 154 are connected to each other through a bus 155.
The Memory 152 may be a Read Only Memory (ROM), a static Memory device, a dynamic Memory device, or a Random Access Memory (RAM). The memory 152 may store a program, and when the program stored in the memory 152 is executed by the processor 153, the processor 153 and the communication interface 154 are used to perform the steps of the detection method of the embodiment of the present application.
The processor 153 may be a general-purpose Central Processing Unit (CPU), a microprocessor, an Application-Specific Integrated Circuit (ASIC), a Graphics Processing Unit (GPU), or one or more integrated circuits, and is configured to execute related programs to implement the functions to be performed by the units in the detection apparatus of the embodiment of the present application, or to execute the detection method of the method embodiment of the present application.
The processor 153 may also be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the detection method of the present application may be implemented by integrated logic circuits of hardware in the processor 153 or by instructions in the form of software. The processor 153 may also be a general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components, and may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as a RAM, a flash memory, a ROM, a PROM or an EPROM, or a register. The storage medium is located in the memory 152; the processor 153 reads the information in the memory 152 and, in combination with its hardware, completes the functions to be performed by the units included in the detection apparatus of the embodiment of the present application, or executes the detection method of the method embodiment of the present application.
Communication interface 154 enables communication between device 151 and other devices or communication networks using transceiver means, such as, but not limited to, a transceiver. For example, a picture of the product under test may be obtained through communication interface 154.
Bus 155 may include a path to transfer information between components of device 151 (e.g., memory 152, processor 153, communication interface 154).
It should be understood that the acquiring module 131 and the determining module 132 in the detecting device may correspond to the processor 153.
It should be noted that although the apparatus 141 and the device 151 shown in fig. 11 and 12 only show memories, processors, and communication interfaces, in a specific implementation process, those skilled in the art should understand that the apparatus 141 and the device 151 also include other devices necessary for realizing normal operation. Also, those skilled in the art will appreciate that apparatus 141 and device 151 may also include hardware devices for performing other additional functions, according to particular needs. Furthermore, those skilled in the art will appreciate that apparatus 141 and device 151 may also include only those components necessary to implement embodiments of the present application, and need not include all of the components shown in fig. 11 or 12.
It is to be understood that the device 151 corresponds to the execution device 110 in fig. 4. Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The embodiment of the present application further provides an electronic device, which includes the detection device provided in the above embodiment. In addition, the embodiment of the present application further provides a detection system, which includes a camera and the detection device provided in the above embodiment; the camera is configured to acquire a picture of the product to be detected and send the picture to the detection device, and the detection device performs the step of obtaining the first multi-scale feature map according to the picture of the product to be detected. Finally, the embodiment of the present application further provides a computer program product containing instructions which, when run on a computer, cause the computer to execute the above detection method.
It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, or the portions thereof that substantially contribute to the prior art, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disc.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (26)

1. A method of detection, comprising the steps of:
acquiring a first multi-scale feature map of a product to be detected, wherein the first multi-scale feature map comprises multi-dimensional features of the product to be detected in different scale spaces;
determining an abnormal value of each pixel point in the first multi-scale feature map according to the first multi-scale feature map and the reconstruction feature map, wherein the abnormal value of each pixel point is used for representing the product abnormal degree of the pixel point at the corresponding position of the product to be detected; the reconstructed feature map comprises multi-dimensional features of t second multi-scale feature maps of normal products;
and determining the detection result of the product to be detected according to the abnormal value of each pixel point in the first multi-scale feature map.
2. The method as claimed in claim 1, wherein the step of determining the detection result of the product to be detected according to the abnormal value of each pixel point in the first multi-scale feature map comprises the following steps:
and if the abnormal value of any pixel point in the first multi-scale feature map is larger than a preset abnormal value, determining that the product to be detected is a defective product.
3. The method as claimed in claim 1, wherein the determining the detection result of the product to be detected according to the abnormal value of each pixel point in the first multi-scale feature map comprises the following steps:
acquiring the number of pixel points of which the abnormal values are larger than a preset abnormal value in the first multi-scale feature map;
and if the number is larger than the preset number, determining that the product to be detected is a defective product.
4. A method according to claim 2 or 3, characterized in that the method further comprises:
and when the product to be detected is a defective product, determining the mapping area, on the product to be detected, of the pixel points of which the abnormal values are larger than the preset abnormal value in the first multi-scale feature map as the defective area of the product to be detected.
5. The method according to any one of claims 1 to 4, wherein the obtaining of the first multi-scale feature map of the product to be tested comprises the following steps:
acquiring a plurality of first feature maps of the product to be detected, wherein the sizes of the first feature maps are different;
performing size unified processing on the multiple first feature maps according to a preset size to obtain multiple second feature maps, wherein the multiple second feature maps have the same size;
splicing the plurality of second feature maps along the dimension of the channel to obtain a third multi-scale feature map;
applying different weights to different channels of the third multi-scale feature map based on an attention mechanism to obtain the first multi-scale feature map.
6. The method according to any one of claims 1 to 4, wherein the obtaining of the first multi-scale feature map of the product to be tested comprises the following steps:
acquiring a plurality of third feature maps of the product to be detected, wherein the third feature maps have different sizes and the same number of channels;
performing size unified processing on the multiple third feature maps according to a preset size to obtain multiple fourth feature maps, wherein the multiple fourth feature maps have the same size;
and applying different weights to the plurality of fourth feature maps based on an attention mechanism so as to obtain the first multi-scale feature map through weighted fusion.
7. The method according to any one of claims 1 to 4, wherein the obtaining of the first multi-scale feature map of the product to be tested comprises the following steps:
acquiring a plurality of fifth feature maps of the product to be detected, wherein the sizes of the plurality of fifth feature maps are different;
performing size unified processing on the fifth feature maps according to a preset size to obtain sixth feature maps, wherein the sixth feature maps have the same size;
respectively applying different weights to different channels of the sixth feature maps based on an attention mechanism to obtain seventh feature maps;
applying different weights to the seventh feature maps based on an attention mechanism, and splicing the seventh feature maps along a channel dimension to obtain the first multi-scale feature map.
8. The method according to any one of claims 1 to 7, wherein the step of determining the abnormal value of each pixel point in the first multi-scale feature map according to the first multi-scale feature map and the reconstructed feature map comprises the following steps:
respectively carrying out pixel-by-pixel decomposition on the first multi-scale feature map and the reconstruction feature map to obtain M first position features and M reconstruction feature centers;
and determining an abnormal value of a corresponding pixel point in the first multi-scale feature map according to the first position feature and the reconstructed feature center of the same pixel point position.
9. The method of claim 8, wherein the outliers comprise Euclidean or cosine distances between the first location feature of a pixel and the center of the corresponding reconstructed feature.
10. The method according to any one of claims 1 to 9, characterized in that the method further comprises the steps of:
acquiring the reconstruction characteristic diagram, wherein the acquiring of the reconstruction characteristic diagram comprises the following steps:
acquiring t second multi-scale feature maps of the normal product, wherein the second multi-scale feature maps contain multi-dimensional features of the normal product in different scale spaces;
respectively carrying out pixel-by-pixel decomposition on the t second multi-scale feature maps to obtain t multi-scale feature vector sets, wherein the multi-scale feature vector sets comprise multi-scale feature vectors corresponding to each pixel point of the second multi-scale feature maps, and the element number of the multi-scale feature vector sets is the pixel point number M of the second multi-scale feature maps;
respectively carrying out feature reconstruction on t multi-scale feature vectors with the same pixel point position in the t multi-scale feature vector sets to obtain a reconstructed feature center of a corresponding pixel point;
and aggregating according to the M reconstructed feature centers to obtain the reconstructed feature map.
11. The method according to any one of claims 1 to 10, characterized in that it further comprises the steps of:
outputting a heat map corresponding to the product to be detected according to the abnormal value; the heat map is marked by color shade: the lighter the color of a pixel point in the heat map, the smaller the abnormal value corresponding to the pixel point; the darker the color of a pixel point in the heat map, the larger the abnormal value corresponding to the pixel point.
12. A detection device, comprising:
the system comprises an acquisition module, a processing module and a display module, wherein the acquisition module is used for acquiring a first multi-scale feature map of a product to be detected, and the first multi-scale feature map comprises multi-dimensional features of the product to be detected in different scale spaces;
the determining module is used for determining an abnormal value of each pixel point in the first multi-scale feature map according to the first multi-scale feature map and the reconstruction feature map, wherein the abnormal value of each pixel point is used for representing the product abnormal degree of the pixel point at the corresponding position of the product to be detected; wherein the reconstructed feature map comprises features of t second multi-scale feature maps of normal products; and determining the detection result of the product to be detected according to the abnormal value of each pixel point in the first multi-scale feature map.
13. The apparatus of claim 12, wherein the determining module is specifically configured to:
and if the abnormal value of any pixel point in the first multi-scale feature map is larger than a preset abnormal value, determining that the product to be detected is a defective product.
14. The apparatus of claim 12, wherein the determining module is specifically configured to:
acquiring the number of pixel points of which the abnormal values are larger than preset abnormal values in the first multi-scale feature map;
and if the number is larger than the preset number, determining that the product to be detected is a defective product.
15. The apparatus of claim 13 or 14, wherein the determining module is further configured to:
and when the product to be detected is a defective product, determining the mapping area, on the product to be detected, of the pixel points of which the abnormal values are larger than the preset abnormal value in the first multi-scale feature map as the defective area of the product to be detected.
16. The apparatus according to any one of claims 12 to 15, wherein the obtaining module is specifically configured to:
acquiring a plurality of first feature maps of the product to be detected, wherein the sizes of the first feature maps are different;
performing size unified processing on the multiple first feature maps according to a preset size to obtain multiple second feature maps, wherein the multiple second feature maps have the same size;
splicing the plurality of second feature maps along the dimension of the channel to obtain a third multi-scale feature map;
applying different weights to different channels of the third multi-scale feature map based on an attention mechanism to obtain the first multi-scale feature map.
17. The apparatus according to any one of claims 12 to 15, wherein the obtaining module is specifically configured to:
acquiring a plurality of third feature maps of the product to be detected, wherein the third feature maps have different sizes and the same number of channels;
performing size unified processing on the plurality of third feature maps according to a preset size to obtain a plurality of fourth feature maps, wherein the sizes of the plurality of fourth feature maps are the same;
applying different weights to the fourth feature maps based on an attention mechanism to obtain the first multi-scale feature map through weighted fusion.
18. The apparatus according to any one of claims 12 to 15, wherein the obtaining module is specifically configured to:
acquiring a plurality of fifth feature maps of the product to be detected, wherein the sizes of the fifth feature maps are different;
performing size unified processing on the fifth feature maps according to a preset size to obtain sixth feature maps, wherein the sixth feature maps have the same size;
respectively applying different weights to different channels of the sixth feature maps based on an attention mechanism to obtain seventh feature maps;
applying different weights to the seventh feature maps based on an attention mechanism, and splicing the seventh feature maps along the channel dimension to obtain the first multi-scale feature map.
19. The apparatus according to any one of claims 12 to 18, wherein the determining module is specifically configured to:
respectively carrying out pixel-by-pixel decomposition on the first multi-scale feature map and the reconstruction feature map to obtain M first position features and M reconstruction feature centers;
and determining an abnormal value of a corresponding pixel point in the first multi-scale feature map according to the first position feature and the reconstructed feature center of the same pixel point position.
20. The apparatus of claim 19, wherein the outlier comprises a euclidean distance or a cosine distance between a first location feature of a pixel and a center of a corresponding reconstructed feature.
21. The apparatus according to any one of claims 12 to 20, wherein the obtaining module is further configured to obtain the reconstructed feature map;
the acquisition module is specifically configured to:
acquiring t second multi-scale feature maps of the normal product, wherein the second multi-scale feature maps contain multi-dimensional features of the normal product in different scale spaces;
respectively carrying out pixel-by-pixel decomposition on the t second multi-scale feature maps to obtain t multi-scale feature vector sets, wherein the multi-scale feature vector sets comprise multi-scale feature vectors corresponding to each pixel point of the second multi-scale feature maps, and the element number of the multi-scale feature vector sets is the pixel point number M of the second multi-scale feature maps;
respectively carrying out feature reconstruction on t multi-scale feature vectors with the same pixel point position in the t multi-scale feature vector sets to obtain a reconstructed feature center of a corresponding pixel point;
and aggregating according to the M reconstructed feature centers to obtain the reconstructed feature map.
22. The apparatus of any one of claims 12 to 21, further comprising:
the output module is configured to output a heat map corresponding to the product to be detected according to the abnormal value; the heat map is marked by color shade: the lighter the color of a pixel point in the heat map, the smaller the abnormal value corresponding to the pixel point; the darker the color of a pixel point in the heat map, the larger the abnormal value corresponding to the pixel point.
23. A detection device comprising a processor and a memory, wherein the processor is coupled to the memory, wherein the memory is configured to store program code and wherein the processor is configured to invoke the program code to perform the detection method of any of claims 1 to 11.
24. A detection system, comprising a camera and the detection apparatus of claim 23, wherein the camera is configured to obtain a picture of a product to be detected and send the picture of the product to be detected to the detection apparatus, and the detection apparatus further performs the step of obtaining the first multi-scale feature map according to the picture of the product to be detected.
25. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which is executed by a processor to implement the detection method according to any one of claims 1 to 11.
26. A computer program product comprising instructions for causing a computer to perform the detection method of any one of claims 1 to 11 when the computer program product is run on a computer.
CN202110300822.1A 2021-03-19 2021-03-19 Detection method and related equipment Pending CN115170456A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110300822.1A CN115170456A (en) 2021-03-19 2021-03-19 Detection method and related equipment

Publications (1)

Publication Number Publication Date
CN115170456A true CN115170456A (en) 2022-10-11

Family

ID=83475641

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110300822.1A Pending CN115170456A (en) 2021-03-19 2021-03-19 Detection method and related equipment

Country Status (1)

Country Link
CN (1) CN115170456A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116223515A (en) * 2023-05-05 2023-06-06 成都中航华测科技有限公司 Conductive pattern defect detection method for circuit board test process

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination