CN117218467A - Model training method and related device - Google Patents

Model training method and related device

Info

Publication number
CN117218467A
Authority
CN
China
Prior art keywords
enhancement
feature
vector
image block
depth
Prior art date
Legal status
Pending
Application number
CN202310121168.7A
Other languages
Chinese (zh)
Inventor
王昌安
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202310121168.7A
Publication of CN117218467A

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30: Computing systems specially adapted for manufacturing

Landscapes

  • Image Analysis (AREA)

Abstract

The application relates to the technical field of artificial intelligence and provides a model training method and a related device for improving the quality inspection accuracy of industrial products. The method comprises: performing feature extraction on each image block in a training sample through a target model to obtain corresponding depth features; performing feature enhancement on each depth feature based on feature enhancement information to obtain corresponding target enhancement features; determining the detection sub-result of each image block based on the obtained target enhancement features; and obtaining the detection result of the training sample based on the determined detection sub-results and adjusting the model parameters accordingly.

Description

Model training method and related device
Technical Field
The application relates to the technical field of artificial intelligence, and provides a model training method and a related device.
Background
With the development of industry and the rise of artificial intelligence (Artificial Intelligence, AI), quality inspection of industrial products using AI technology has gradually become the main quality inspection mode in manufacturing.
In the related art, AI quality inspection generally adopts the following modes:
mode one: detection methods based on target detection. The method needs to mark the areas containing quality defects in the training samples, and then target detection algorithms such as regional convolutional neural network (Region-based Convolutional Neural Networks, R-CNN), fast R-CNN, cascade R-CNN and the like are used for detecting the defective areas contained in each training sample.
However, because industrial products have a large surface area, the scanned image is usually a large image tens of thousands of pixels wide. Locating and annotating defect regions in such an image is inefficient, and rare defects or defects with a small area are easily missed during annotation, which leads to a poor model training effect and low detection accuracy.
Mode two: detection methods based on weakly supervised target localization. Such methods only annotate whether a defect is present in each training sample, and then use detection algorithms such as Class Activation Mapping (CAM) to determine whether defects exist in the image to be identified.
However, such methods have difficulty processing large images, which limits the expressive power of the model, and it is difficult to accurately locate the defective region in a large image, so the localization accuracy of the defect is low.
Disclosure of Invention
The embodiment of the application provides a model training method and a related device, which are used for improving the quality inspection accuracy of industrial products.
In a first aspect, an embodiment of the present application provides a model training method, where the method includes:
extracting features of each image block in a training sample through a target model to obtain corresponding depth features; wherein each training sample comprises image blocks and a real label, the real label representing whether a defect exists in the corresponding scanned image;
respectively carrying out feature enhancement on each depth feature based on the obtained feature enhancement information of each depth feature through the target model to obtain corresponding target enhancement features, and determining respective detection sub-results of each image block based on each obtained target enhancement feature;
based on the determined detection sub-results, obtaining a detection result of the training sample, determining model loss by combining the corresponding real label, and adjusting model parameters of the target model based on the model loss.
In a second aspect, an embodiment of the present application provides a model training apparatus, including:
the feature extraction unit is used for extracting features of each image block in a training sample through the target model to obtain corresponding depth features; wherein each training sample comprises image blocks and a real label, the real label representing whether a defect exists in the corresponding scanned image;
the feature enhancement unit is used for carrying out feature enhancement on each depth feature based on the obtained feature enhancement information of each depth feature through the target model to obtain corresponding target enhancement features, and determining each detection sub-result of each image block based on each obtained target enhancement feature;
and the parameter adjustment unit is used for obtaining the detection result of the training sample based on the determined detection sub-results, determining model loss by combining the corresponding real label, and adjusting the model parameters of the target model based on the model loss.
As a possible implementation manner, the feature enhancing unit is specifically configured to perform at least one of the following operations:
respectively carrying out self-attention enhancement on each depth feature based on the self-attention information of each obtained depth feature through the target model, and taking the obtained corresponding first enhancement feature as a target enhancement feature;
respectively carrying out context enhancement on each depth feature based on the obtained context information of each depth feature through the target model, and taking the obtained corresponding second enhancement feature as a target enhancement feature;
and respectively carrying out self-attention enhancement on each depth feature based on the obtained self-attention information of each depth feature through the target model to obtain corresponding first enhancement features, respectively carrying out context enhancement on each first enhancement feature based on the obtained context information of each first enhancement feature to obtain corresponding second enhancement features, and taking the obtained second enhancement features as target enhancement features.
As a possible implementation manner, the feature enhancing unit is specifically configured to:
for each image block in the image blocks, the following operations are respectively executed:
determining the weighting coefficient corresponding to each other image block based on the vector similarity between the query vector corresponding to the image block and the key vector corresponding to each other image block;
and respectively weighting the value vectors corresponding to the at least one other image block based on the weighting coefficients corresponding to the at least one other image block, so as to obtain the context information corresponding to the image block.
As a possible implementation manner, the feature enhancing unit is specifically configured to, when obtaining the respective context information of each first enhancement feature based on the similarity between the query vector of each first enhancement vector and the key vector of at least one other first enhancement vector and in combination with the value vector of the at least one other first enhancement vector:
for each image block in the image blocks, the following operations are respectively executed:
determining the weighting coefficient corresponding to each other image block based on the vector similarity between the query vector corresponding to the image block and the key vector corresponding to each other image block;
and respectively weighting the value vectors corresponding to the at least one other image block based on the weighting coefficients corresponding to the at least one other image block, so as to obtain the context information corresponding to the image block.
As a possible implementation manner, when determining the weighting coefficient corresponding to each of the at least one other image block based on the similarity between the query vector corresponding to one image block and the key vector corresponding to each of the at least one other image block, the feature enhancement unit is specifically configured to:
respectively calculating the similarity between the query vector corresponding to one image block and the key vector corresponding to at least one other image block;
and carrying out normalization processing on each calculated similarity to obtain the weighting coefficient corresponding to each other image block.
As a possible implementation manner, when the feature mapping is performed on each first enhancement vector to obtain the query vector, the key vector, and the value vector corresponding to each first enhancement vector, the feature enhancement unit is specifically configured to:
based on the first enhancement vectors, combining query weights to obtain query vectors corresponding to the first enhancement vectors;
based on the first enhancement vectors, combining key weights to obtain key vectors corresponding to the first enhancement vectors;
and based on the first enhancement vectors, combining the value weights to obtain the value vectors corresponding to the first enhancement vectors, wherein the query weights, the key weights and the value weights are model parameters.
As a possible implementation manner, each depth feature is a multi-channel feature, and when performing self-attention enhancement on each depth feature based on the obtained self-attention information of each depth feature to obtain a corresponding first enhancement feature, the feature enhancement unit is specifically configured to:
determining channel weights corresponding to the multiple channels respectively based on the channel characteristics corresponding to the multiple channels respectively in the depth characteristics;
and weighting the characteristics corresponding to each of the multiple channels based on the channel weights corresponding to each of the multiple channels to obtain corresponding first enhancement characteristics.
As a possible implementation manner, when the detection result of the training sample is obtained based on the determined detection sub-results, the parameter adjustment unit is specifically configured to:
if at least one detection sub-result representing that the image block has a defect exists in the detection sub-results, determining that the detection result of the training sample is: the scanned image corresponding to the training sample has defects;
if the detection sub-results representing that the image block has defects do not exist in the detection sub-results, determining that the detection result of the training sample is: and the scanned image corresponding to the training sample has no defect.
As a possible implementation manner, the feature extraction unit is further configured to, before feature extraction is performed on each image block contained in the extracted training sample to obtain corresponding depth features:
when the number of image blocks exceeds a set number threshold, divide the image blocks into image block groups according to a set number of groups and a set number of image blocks per group;
the feature enhancement unit is specifically configured to, when performing context enhancement on each first enhancement feature based on the obtained context information of each first enhancement feature to obtain the corresponding second enhancement feature:
for each of the image block groups, the following operations are performed:
and determining the context information of the first enhancement features corresponding to the image blocks in one image block group, and carrying out context enhancement on the first enhancement features corresponding to the image blocks in the one image block group to obtain corresponding second enhancement features.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor and a memory, where the memory stores a computer program that, when executed by the processor, causes the processor to perform the steps of any of the methods of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer readable storage medium comprising a computer program for causing an electronic device to perform the steps of any of the methods of the first aspect described above, when the computer program is run on the electronic device.
In a fifth aspect, embodiments of the present application provide a computer program product comprising a computer program stored in a computer readable storage medium, from which a processor of an electronic device reads and executes the computer program, causing the electronic device to perform the steps of any of the methods of the first aspect described above.
In the embodiment of the application, during model training, feature extraction is performed on each image block in each training sample to obtain corresponding depth features; feature enhancement is then performed on each depth feature based on the obtained feature enhancement information to obtain corresponding target enhancement features; the detection sub-result of each image block is determined based on the obtained target enhancement features; and the detection result of the training sample is obtained from the determined detection sub-results and used, together with the corresponding real label, to determine the model loss and adjust the model parameters of the target model.
Because model training is performed in a multi-instance learning manner, only whether a defect exists in a scanned image needs to be determined, and every defect region in the scanned image does not need to be annotated, so the constraints on annotation cost and model expressive power are relaxed. Processing the image as image blocks in the multi-instance learning manner greatly reduces the computing pressure and the video memory requirements of the graphics card and improves learning efficiency. In addition, feature enhancement strengthens the effective information contained in the depth features, which improves the detection accuracy of the image blocks and thus the overall detection precision.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application. The objectives and other advantages of the application will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a schematic diagram of an application scenario provided in an embodiment of the present application;
FIG. 2 is a schematic diagram of a line scan camera capturing scan images according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of a model training method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an image segmentation method according to an embodiment of the present application;
FIG. 5 is a logic diagram of a model training process provided in an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a quality inspection model according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a residual unit according to an embodiment of the present application;
FIG. 8 is a logic diagram of a self-attention enhancement provided in an embodiment of the present application;
FIG. 9 is a schematic diagram of a query vector, a key vector, and a value vector according to an embodiment of the present application;
FIG. 10 is a logic diagram of a context enhancement provided in an embodiment of the present application;
FIG. 11 is a schematic diagram of an image block set according to an embodiment of the present application;
FIG. 12 is a logic diagram of another quality inspection model training process provided in an embodiment of the present application;
FIG. 13 is a schematic flow chart of a quality inspection model application method provided in an embodiment of the present application;
FIG. 14 is a schematic structural diagram of a model training device according to an embodiment of the present application;
FIG. 15 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the technical solutions of the present application, but not all embodiments. All other embodiments, based on the embodiments described in the present document, which can be obtained by a person skilled in the art without any creative effort, are within the scope of protection of the technical solutions of the present application.
Embodiments of the present application relate to artificial intelligence and machine learning techniques, designed primarily based on machine learning in artificial intelligence.
Artificial intelligence (Artificial Intelligence, AI) is the theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use the knowledge to obtain optimal results. In other words, artificial intelligence is an integrated technology of computer science that attempts to understand the essence of intelligence and to produce a new intelligent machine that can react in a way similar to human intelligence. Artificial intelligence is the study of the design principles and implementation methods of various intelligent machines, enabling the machines to sense, reason and make decisions.
Artificial intelligence is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Artificial intelligence infrastructure technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, machine learning/deep learning, automatic driving, intelligent transportation and other directions.
Machine learning is a multi-domain interdisciplinary, involving multiple disciplines such as probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and the like. It is specially studied how a computer simulates or implements learning behavior of a human to acquire new knowledge or skills, and reorganizes existing knowledge structures to continuously improve own performance. Machine learning is the core of artificial intelligence, a fundamental approach to letting computers have intelligence, which is applied throughout various areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, confidence networks, reinforcement learning, transfer learning, induction learning, teaching learning, and the like.
An artificial neural network (Artificial Neural Network, ANN) abstracts the neural network of the human brain from the viewpoint of information processing, builds a simple model, and forms different networks according to different connection modes. A neural network is a computational model consisting of a large number of interconnected nodes (or neurons); each node represents a specific output function, called an activation function, and the connection between every two nodes carries a weight for the signal passing through it, which is equivalent to the memory of the artificial neural network. The output of the network differs according to its connection mode, weight values and activation functions, and the network itself is usually an approximation of some algorithm or function, or an expression of a logical strategy.
With the research and progress of artificial intelligence technology, artificial intelligence is being researched and applied in many fields, such as smart home, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, drones, robots, smart medical care, smart customer service, the Internet of Vehicles and intelligent transportation. It is believed that with the development of technology, artificial intelligence technology will be applied in more fields and become increasingly important.
The scheme provided by the embodiments of the application relates to machine learning in artificial intelligence. In the embodiments of the application, a quality inspection model is obtained from training data using machine learning, and during the actual quality inspection of industrial products the scanned images are detected with the learned quality inspection model to determine whether the industrial products have defects. Specifically, the embodiments of the application involve a training part and an application part. In the training part, the quality inspection model is trained with machine learning based on the training samples and the training method provided by the embodiments of the application, and the model parameters are continuously adjusted by an optimization algorithm until the model converges. In the application part, the quality inspection model trained in the training part is used to detect the scanned image of each industrial product, to determine whether the industrial product has a defect and, further, in which region the defect lies. In addition, it should be noted that the model training process in the embodiments of the application may be offline training or online training, which is not limited here; offline training is taken as an example for illustration.
In the related art, AI quality inspection generally adopts the following modes:
mode one: detection methods based on target detection. The method needs to mark the areas containing quality defects in the training samples, and then target detection algorithms such as R-CNN, fast R-CNN, cascade R-CNN and the like are used for detecting the defective areas contained in each training sample.
However, because industrial products have a large surface area, the scanned image is usually a large image tens of thousands of pixels wide. Locating and annotating defect regions in such an image is inefficient, and rare defects or defects with a small area are easily missed during annotation, which leads to a poor model training effect and low detection accuracy.
Mode two: detection methods based on weakly supervised target localization. Such methods only annotate whether a defect is present in each training sample, and then use detection algorithms such as CAM to determine whether defects exist in the image to be identified.
However, such methods have difficulty processing large images, which limits the expressive power of the model, and it is difficult to accurately locate the defective region in a large image, so the localization accuracy of the defect is low.
In the embodiment of the application, during model training, feature extraction is performed on each image block in each training sample to obtain corresponding depth features; feature enhancement is then performed on each depth feature based on the obtained feature enhancement information to obtain corresponding target enhancement features; the detection sub-result of each image block is determined based on the obtained target enhancement features; and the detection result of the training sample is obtained from the determined detection sub-results and used, together with the corresponding real label, to determine the model loss and adjust the model parameters of the target model.
Based on the above description, the application performs model training in a multi-instance learning manner, so only whether a defect exists in a scanned image needs to be determined and all defect regions in the scanned image do not need to be annotated; the constraints on annotation cost and model expressive power are therefore relaxed. Image processing is performed on image blocks in the multi-instance learning manner, which greatly reduces the computing pressure and the video memory requirements of the graphics card and improves learning efficiency. In addition, feature enhancement strengthens the effective information contained in the depth features, improving the detection accuracy of the image blocks and hence the detection precision. Some application scenarios applicable to the technical solution of the embodiments of the application are briefly described below. It should be noted that these scenarios are only used to illustrate the embodiments of the application and are not limiting; in the specific implementation process, the technical solution provided by the embodiments of the application can be flexibly applied according to actual needs.
Referring to fig. 1, which is a schematic diagram of an application scenario provided in an embodiment of the present application, the scenario may include an acquisition device 101 and a computing device 102.
The acquisition device 101 is used to acquire a scanned image of an industrial product. The acquisition device 101 may be, but is not limited to, an area scan camera or a line scan camera. An area scan camera exposes a complete pixel matrix when acquiring an image, whereas a line scan camera builds the final two-dimensional image pixel line by pixel line, so line scan cameras can be used for efficient imaging scanning in practical applications.
Referring to fig. 2, when constructing a line scan image, relative motion must be maintained between the line scan camera and the industrial product, typically along a conveyor or a rotational axis. When the industrial product moves into the scanning area of the line scan camera, the camera collects a new pixel line; each pixel line can contain 1024 to 16386 pixels. The line scan camera stores each pixel line through software on a vision processor or an image acquisition card, and then combines the stored pixel lines into the final two-dimensional image. The image acquisition process of a line scan camera is therefore suited to imaging discrete parts moving rapidly on a conveyor belt, or to imaging oversized objects. A line scan camera may also be used, for example, to capture defect imaging on parts with a large surface area.
The computing device 102 is configured to determine whether the industrial product has a defect based on the scanned image acquired by the acquisition device 101. Computing device 102 includes, but is not limited to, a terminal device, a server, etc., computing-capable electronic device.
The terminal device may be a device owned by the user, such as a mobile phone, a tablet computer, a notebook computer, a desktop computer, an intelligent television, an intelligent vehicle-mounted device, an intelligent wearable device, and the like.
The server may be an independent physical server, a server cluster or a distributed system formed by multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), big data and artificial intelligence platforms, but is not limited thereto.
The acquisition device 101 and the computing device 102 may be in direct or indirect communication connection via one or more networks. The network may be a wired network or a wireless network, for example a mobile cellular network or a Wireless-Fidelity (WIFI) network, or another possible network, which the embodiments of the present application do not limit.
It should be noted that, in the embodiment of the present application, the number of acquisition devices 101 may be one or more, and similarly the number of computing devices 102 may be one or more; that is, the number of acquisition devices 101 or computing devices 102 is not limited.
In one embodiment, taking a quality inspection model as an example, the model training process and the model application process may be performed by a server, so as to quickly implement quality inspection by using computing resources of the server, or may be performed by a terminal device, or may be performed by the server and the terminal device together, which is not limited thereto. For example, the server acquires a training sample set, and adopts the training sample set to perform iterative training on the quality inspection model to be trained to obtain a trained quality inspection model.
In one possible application scenario, the related data (such as feature vectors) and model parameters involved in the embodiments of the present application may be stored using cloud storage technology. Cloud storage is a new concept extended and developed from the concept of cloud computing; a distributed cloud storage system refers to a storage system that, through functions such as cluster application, grid technology and distributed storage file systems, integrates a large number of storage devices (also called storage nodes) of different types in a network via application software or application interfaces to work cooperatively and jointly provide data storage and service access functions externally.
Of course, the method provided by the embodiment of the present application is not limited to the application scenario shown in fig. 1, but may be used in other possible application scenarios, and the embodiment of the present application is not limited. The functions that can be implemented by each device in the application scenario shown in fig. 1 will be described together in the following method embodiments, which are not described in detail herein.
The method flows provided in the embodiments of the present application may be executed by a server or a terminal device, or may be executed by both the server and the terminal device, and the description is mainly given here by way of example of the execution of the server.
Referring to fig. 3, a flow chart of a model training method provided in an embodiment of the present application is shown. The method is applied to a server, and the target model is taken as a quality inspection model for illustration. The specific flow of the model training method is as follows:
s301, acquiring a training sample set, wherein each training sample comprises: image blocks and real labels, each of which represents: corresponding to whether a defect exists in the scanned image.
Among them, industrial products include, but are not limited to, electronic components, machines, instruments, textiles, and the like.
In the embodiment of the application, because the scale of a line scan image is large, the line scan image is divided into image blocks in order to reduce the amount of data to be processed and relieve the computing pressure.
When segmenting the image, as one possible implementation, the line scan image can be segmented without overlap. However, a defect region in the line scan image may span several image blocks; non-overlapping segmentation makes the block boundaries discontinuous, so no single image block contains the complete defect region, which affects the accuracy of the detection result of the image blocks and thus the detection accuracy of the whole line scan image. Therefore, in the embodiment of the application, the scanned image may be uniformly segmented into image blocks with overlapping regions according to a set image block size, where the size of the overlapping region in each image block is a set area size.
For example, as shown in fig. 4, the image block size is set to 256×256 pixels and the scanned image is 25000×16000 pixels; the scanned image is uniformly segmented into 256×256 image blocks with an overlapping region of 64 pixels, and the resulting image blocks include image block 1 to image block 9.
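As an illustration of the overlapping segmentation described above, the following is a minimal sketch in Python; the function name, the use of NumPy arrays, and the handling of border blocks are illustrative assumptions rather than details specified by the application.

```python
import numpy as np

def split_into_blocks(scan: np.ndarray, block: int = 256, overlap: int = 64):
    """Uniformly segment a large scanned image into overlapping image blocks.

    Adjacent blocks share `overlap` pixels, so a defect lying on a block
    boundary still appears complete in at least one block. Border blocks may
    be smaller than `block` and could be padded if required.
    """
    stride = block - overlap
    h, w = scan.shape[:2]
    blocks = []
    for top in range(0, max(h - overlap, 1), stride):
        for left in range(0, max(w - overlap, 1), stride):
            blocks.append(((top, left), scan[top:top + block, left:left + block]))
    return blocks
```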
In the embodiments of the present application, each scanned image may be referred to as a bag (packet), and each image block as an instance. In multi-instance learning (MIL), taking binary classification as an example, a bag contains multiple instances; if all the instances are labeled negative, the bag is not defective, otherwise the bag is defective.
Assuming that X = {x1, x2, …, xn} represents a scanned image, where x1, x2, …, xn are the image blocks, Y is the real label of the scanned image, and yi is the real label corresponding to image block xi (xi being any one of x1, x2, …, xn), then Y can be obtained using formula (1):
Y = max(y1, y2, …, yn)    (1)
where a value of +1 for yi indicates that image block xi contains a defect and a value of -1 indicates that it does not, and a value of +1 for Y indicates that the scanned image contains a defect while a value of -1 indicates that it does not. When any yi is +1, Y is +1; only when all of y1, y2, …, yn are -1 is Y equal to -1. That is, the scanned image is determined to be defective when it contains at least one defective image block, and non-defective when it contains no defective image block.
For example, still referring to fig. 4, assume the image blocks obtained from scanned image 1 are image block 1, image block 2, …, image block 9, and that image block 5 contains a defect; then scanned image 1 is determined to be defective.
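A minimal sketch of formula (1), assuming instance labels take the values +1 and -1 as above; the function name is illustrative.

```python
def bag_label(instance_labels):
    """Formula (1): the scanned image (bag) is defective (+1) if any image
    block (instance) is defective, otherwise it is non-defective (-1)."""
    return max(instance_labels)

# Example matching fig. 4: image block 5 of scanned image 1 contains a defect.
labels = [-1, -1, -1, -1, +1, -1, -1, -1, -1]
assert bag_label(labels) == +1
```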
In the embodiment of the application, after the training data preparation is completed, the constructed model can be trained by utilizing the training data.
In one embodiment, during iterative training, all training samples are divided into a specified number of batches and training is performed on each batch in turn. Specifically, according to the structure of the quality inspection model, hyperparameters such as the batch, the number of iterations (epoch) and the learning rate are set, training is started, and the trained quality inspection model is finally obtained.
For example, the batch is set to 128, the epoch to 1000 and the learning rate to 0.0001; that is, training is iterated 1000 times, and in each iteration the training samples are divided into 128 batches for learning. Of course, these training parameters are just one possible example and may be adjusted as needed in practice.
For ease of description, the processing of one training sample during each iteration will be described below as an example.
S302, performing iterative training on the quality inspection model by adopting a training sample set to obtain a trained quality inspection model.
In the embodiment of the application, for each training sample in the training sample set, depth feature extraction, self-attention enhancement and mutual-attention enhancement are performed in turn, and the detection result is then determined based on the enhanced features. Mutual-attention enhancement may also be referred to as context enhancement.
Referring to fig. 5, for a scanned image, feature extraction is performed on each image block through the quality inspection model to obtain corresponding depth features. Then, self-attention enhancement is performed on each depth feature based on its self-attention information to obtain corresponding first enhancement features, and context enhancement is performed on each first enhancement feature based on its context information to obtain corresponding second enhancement features. Further, the detection sub-result of each image block is determined based on the obtained second enhancement features, and the detection result of the training sample is obtained based on the determined detection sub-results.
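The following is a high-level sketch of the forward pass shown in fig. 5, written with PyTorch for illustration; the module names, the tensor shapes, the pooling step, and taking the bag-level result as the maximum over the per-block sub-results are assumptions consistent with the description above, not the patent's own implementation.

```python
import torch
import torch.nn as nn

class QualityInspectionModel(nn.Module):
    """Per-block depth features -> self-attention enhancement -> context
    (mutual-attention) enhancement -> per-block detection sub-results ->
    bag-level detection result."""

    def __init__(self, backbone: nn.Module, self_attn: nn.Module,
                 mutual_attn: nn.Module, feat_dim: int = 2048):
        super().__init__()
        self.backbone = backbone        # feature extraction module, e.g. a ResNet50 trunk
        self.self_attn = self_attn      # self-attention (channel attention) module
        self.mutual_attn = mutual_attn  # mutual-attention (context) module
        self.classifier = nn.Linear(feat_dim, 2)   # per-block detection sub-result

    def forward(self, blocks: torch.Tensor):
        # blocks: (N, 3, 256, 256), the N image blocks of one scanned image
        depth = self.backbone(blocks)            # (N, C, h, w) depth features
        first = self.self_attn(depth)            # first enhancement features
        pooled = first.flatten(2).mean(dim=-1)   # (N, C) per-block vectors
        second = self.mutual_attn(pooled)        # second enhancement features
        sub_results = self.classifier(second)    # (N, 2) per-block logits
        # The scanned image is defective if any image block is predicted defective.
        bag_logits, _ = sub_results.max(dim=0)
        return sub_results, bag_logits
```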
Specifically, referring to fig. 3, in each iteration, the following operations are performed:
S3021, extracting features of each image block contained in the extracted training sample to obtain corresponding depth features.
Referring to fig. 6, in the embodiment of the present application, the quality inspection model may include a feature extraction module, and may further include one or both of a self-attention module and a mutual-attention module. The feature extraction module is used to extract features from an image block to obtain corresponding depth features; the self-attention module is used to perform self-attention enhancement on each depth feature to obtain corresponding first enhancement features; and the mutual-attention module is used to perform context enhancement on each first enhancement feature based on its context information to obtain corresponding second enhancement features. The detection sub-result of each image block is then determined based on the obtained second enhancement features.
The depth features may be extracted high resolution feature maps with strong semantic information. In an embodiment of the application, the feature extraction module may employ, but is not limited to, a residual neural network (Residual neural Network, resNet), such as ResNet50. It should be noted that the feature extraction module may also use other types of networks, which are not limited thereto.
Taking ResNet50 as an example, ResNet50 adds residual learning to a traditional convolutional neural network, which alleviates the problems of gradient vanishing and accuracy degradation in deep networks, so that the network can be made deeper while accuracy is ensured and speed is controlled. ResNet50 is a large vision neural network built on the basis of residual networks.
ResNet50 consists of 5 blocks, and each block consists of residual units connected in series. In each residual unit, channel down-sampling is performed first, feature transformation is then performed with a 3×3 convolution, the original channel size is restored by channel up-sampling, and the output features are finally obtained through a residual connection with the input. Referring to fig. 7, assume the input is x; the weight layers output F(x), which passes through a nonlinear function (ReLU), and the final output is H(x) = F(x) + x. A nonlinear transformation can still be applied to F(x) + x, and the network is thus converted to learning the residual function F(x) = H(x) - x.
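A minimal sketch of the bottleneck residual unit described above (channel down-sampling, 3×3 convolution, channel up-sampling, residual connection), assuming PyTorch; the class name and layer sizes are illustrative.

```python
import torch.nn as nn

class BottleneckResidualUnit(nn.Module):
    """Computes H(x) = F(x) + x, where F is 1x1 (reduce) -> 3x3 -> 1x1 (restore)."""

    def __init__(self, channels: int, reduced: int):
        super().__init__()
        self.reduce = nn.Conv2d(channels, reduced, kernel_size=1)    # channel down-sampling
        self.transform = nn.Conv2d(reduced, reduced, kernel_size=3, padding=1)
        self.restore = nn.Conv2d(reduced, channels, kernel_size=1)   # channel up-sampling
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        f = self.relu(self.reduce(x))
        f = self.relu(self.transform(f))
        f = self.restore(f)
        return self.relu(f + x)   # residual connection with the input
```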
In the embodiment of the application, each image block can be respectively input into the feature extraction module to perform feature extraction, so as to obtain corresponding depth features.
S3022, performing feature enhancement on each depth feature based on the obtained feature enhancement information of each depth feature to obtain corresponding target enhancement features, and determining the detection sub-result of each image block based on the obtained target enhancement features.
In the embodiment of the application, in order to improve the detection accuracy of the image blocks and thus the overall detection precision, self-attention enhancement may be performed on the depth features to strengthen the relatively important channels in each depth feature, context enhancement may be performed on the depth features to acquire global context information at the image level, or both self-attention enhancement and context enhancement may be performed on the depth features.
When only self-attention enhancement is performed on the depth features, the self-attention module performs self-attention enhancement on each depth feature based on the obtained self-attention information of each depth feature, and the obtained corresponding first enhancement feature is taken as the target enhancement feature.
When only context enhancement is performed on the depth features, the mutual-attention module performs context enhancement on each depth feature based on the obtained context information of each depth feature, and the obtained corresponding second enhancement feature is taken as the target enhancement feature.
When both self-attention enhancement and context enhancement are performed on the depth features, the self-attention module performs self-attention enhancement on each depth feature based on the obtained self-attention information to obtain the corresponding first enhancement features, the mutual-attention module then performs context enhancement on each first enhancement feature based on the obtained context information to obtain the corresponding second enhancement features, and the obtained second enhancement features are taken as the target enhancement features. It should be noted that context enhancement may also be performed before self-attention enhancement, which is not limited and not repeated here.
In the following, only the case where both self-attention enhancement and context enhancement are performed on the depth features is described as an example.
In the embodiment of the application, after the depth features are obtained, each depth feature can be input into the self-attention module for self-attention enhancement to obtain the corresponding self-attention information, and the corresponding first enhancement feature is then obtained based on the self-attention information combined with the depth feature.
Specifically, each depth feature is a multi-channel feature. When performing self-attention enhancement on each depth feature based on the obtained self-attention information of each depth feature to obtain the corresponding first enhancement feature, the following manner may be adopted, but is not limited to:
and determining the channel weight corresponding to each of the multiple channels based on the channel characteristics corresponding to each of the multiple channels in each depth characteristic, and then carrying out weighting processing on the characteristics corresponding to each of the multiple channels based on the channel weight corresponding to each of the multiple channels to obtain corresponding first enhancement characteristics.
The self-attention information is used for representing the correlation between the channel characteristics in the corresponding depth characteristics, the self-attention information can contain the channel weights corresponding to the channel characteristics, and the relative importance degree of the channel characteristics can be measured through the channel weights. The channel weights corresponding to the channel features may also be understood as channel weights corresponding to the channels.
Based on the channel characteristics corresponding to the multiple channels in the depth characteristics, when the channel weights corresponding to the multiple channels are determined, the depth characteristics can be respectively input into the self-attention module to obtain self-attention information corresponding to the depth characteristics, and each self-attention information contains the channel weights corresponding to the channels in the corresponding depth characteristics.
The self-attention module is used for carrying out channel attention enhancement on each depth feature so as to strengthen relatively important channel features in the depth features and highlight the most discriminative part in the current image block. The self-attention module may be implemented by SE-Net, ECA-Net, self-attention (self-attention), and the like, which are not limited thereto.
One possible channel attention network comprises a pooling layer and a weight determining layer, where the pooling layer is used to perform global pooling on the depth feature and the weight determining layer is used to determine the channel weight of each channel in the depth feature.
The weight determining layer may include, but is not limited to, a first fully connected layer, a second fully connected layer and a sigmoid layer. Take the example where the weight determining layer comprises a first fully connected layer, a ReLU layer, a second fully connected layer and a sigmoid layer, and assume the depth feature has dimensions H×W×C, where H, W and C denote the height, width and number of channels respectively (H and W may be equal). The H×W×C depth feature is input to the pooling layer to obtain a 1×1×C feature. The 1×1×C feature is input to the first fully connected layer to obtain a 1×1×C/r feature, realizing channel down-sampling, where r may be, but is not limited to, 4. The 1×1×C/r feature then passes through the ReLU layer, which may also be called a nonlinear layer. Next, the 1×1×C/r feature is input to the second fully connected layer to obtain a 1×1×C feature, i.e. the un-normalized weighting coefficients. Finally, the 1×1×C output of the second fully connected layer is input to the sigmoid layer, which normalizes the weighting coefficients to the range 0 to 1 to obtain the channel weights. The feature values of the different channels are then modulated by multiplying the weights with the original instance feature (i.e. the depth feature) to obtain the corresponding first enhancement feature.
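The channel attention computation described above can be sketched as follows, assuming PyTorch; this is an SE-style illustration with r = 4, and the class name and the use of linear layers are assumptions rather than the patent's own implementation.

```python
import torch
import torch.nn as nn

class ChannelSelfAttention(nn.Module):
    """Global pooling -> FC (C -> C/r) -> ReLU -> FC (C/r -> C) -> sigmoid,
    then channel-wise re-weighting of the original depth feature."""

    def __init__(self, channels: int, r: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)            # HxWxC -> 1x1xC
        self.fc1 = nn.Linear(channels, channels // r)  # channel down-sampling
        self.fc2 = nn.Linear(channels // r, channels)  # un-normalized channel weights
        self.relu = nn.ReLU(inplace=True)

    def forward(self, depth_feature: torch.Tensor):
        n, c, _, _ = depth_feature.shape
        squeezed = self.pool(depth_feature).view(n, c)
        weights = torch.sigmoid(self.fc2(self.relu(self.fc1(squeezed))))  # in [0, 1]
        # Modulate each channel of the depth feature to obtain the first enhancement feature.
        return depth_feature * weights.view(n, c, 1, 1)
```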
After the channel weights are obtained, the channel weights can be used to adjust the corresponding depth features to obtain the corresponding first enhancement features. Specifically, the obtained self-attention information may be multiplied by the corresponding depth features, respectively, to obtain corresponding first enhancement features.
For example, referring to fig. 8, the depth feature 1 corresponding to image block 1 is input into the self-attention module to obtain the self-attention information a1 corresponding to image block 1, and the first enhancement feature 1 corresponding to image block 1 is obtained based on a1 combined with depth feature 1. Similarly, the depth feature 2 corresponding to image block 2 is input into the self-attention module to obtain the self-attention information a2, and the first enhancement feature 2 is obtained based on a2 combined with depth feature 2; and so on, until the depth feature 9 corresponding to image block 9 is input into the self-attention module to obtain the self-attention information a9, and the first enhancement feature 9 is obtained based on a9 combined with depth feature 9.
With the above implementation, self-attention enhancement can comprehensively evaluate the relative importance of each channel according to the features of the whole image block and modulate the feature values of the different channels accordingly. This removes redundant features, realizes adaptive enhancement of the features of each image block, focuses on the most discriminative part of the current image block, and prevents the image block features from containing too much background information, so that a more compact representation is obtained and the final defect detection rate is improved. In industrial quality inspection scenes in particular, the defect region usually occupies a very small area of the scanned image and most of the image is normal background, so adaptive feature enhancement through a self-attention mechanism can greatly improve the defect detection rate.
After each first enhancement feature is obtained, the context enhancement can be performed on each first enhancement feature based on the respective context information of each obtained first enhancement feature, so as to obtain a corresponding second enhancement feature, and the respective detection sub-result of each image block can be determined based on each obtained second enhancement feature.
In the embodiment of the application, each piece of context information comprises contextual feature representations of the other first enhancement features, i.e. the first enhancement features (also referred to below as first enhancement vectors) other than the corresponding one. Each contextual feature representation captures the components of the corresponding other first enhancement feature that are correlated with the current first enhancement feature; that is, each piece of context information characterizes the correlation between the corresponding first enhancement feature and the other first enhancement features. The other first enhancement features may include all or part of the first enhancement features except the corresponding one.
Specifically, the respective context information of each first enhancement feature may be obtained by:
performing feature mapping on each first enhancement vector to obtain a query vector, a key vector and a value vector which are respectively corresponding to each first enhancement vector, wherein the query vector, the key vector and the value vector are used for representing feature representation of the corresponding first enhancement vector in different feature spaces;
based on the similarity between the query vector of each first enhancement vector in the first enhancement vectors and the key vector of at least one other first enhancement vector, the context information of each first enhancement feature is obtained by combining the value vector of at least one other first enhancement vector.
In the embodiment of the present application, when feature mapping is performed on each first enhancement vector to obtain the query vector, the key vector and the value vector corresponding to each first enhancement vector, the following manner may be adopted, but is not limited to:
based on each first enhancement vector, combining query weights to obtain query vectors corresponding to each first enhancement vector;
based on each first enhancement vector, combining key weights to obtain key vectors corresponding to each first enhancement vector;
and combining the value weights based on the first enhancement vectors to obtain the value vectors corresponding to the first enhancement vectors.
The query weight, the key weight and the value weight are model parameters and can be obtained through model training.
For example, referring to fig. 9, Q, K and V denote the query vector, key vector and value vector, and Wq, Wk and Wv denote the query weight, key weight and value weight, respectively. Based on the first enhancement vector 1, the query vector Q1 is obtained in combination with the query weight, the key vector K1 is obtained in combination with the key weight, and the value vector V1 is obtained in combination with the value weight. Similarly, based on the first enhancement vector 2, the query vector Q2, the key vector K2 and the value vector V2 are obtained in combination with the query weight, key weight and value weight respectively, and based on the first enhancement vector 3, the query vector Q3, the key vector K3 and the value vector V3 are obtained in the same way.
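A minimal sketch of the feature mapping described above, assuming PyTorch; modelling the query, key and value weights as bias-free linear layers is an illustrative choice.

```python
import torch
import torch.nn as nn

class QKVProjection(nn.Module):
    """Maps each first enhancement vector into the query, key and value spaces.
    The query, key and value weights are model parameters learned in training."""

    def __init__(self, dim: int):
        super().__init__()
        self.wq = nn.Linear(dim, dim, bias=False)   # query weight Wq
        self.wk = nn.Linear(dim, dim, bias=False)   # key weight Wk
        self.wv = nn.Linear(dim, dim, bias=False)   # value weight Wv

    def forward(self, first_enhanced: torch.Tensor):
        # first_enhanced: (N, dim), one first enhancement vector per image block
        return self.wq(first_enhanced), self.wk(first_enhanced), self.wv(first_enhanced)
```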
After the query vector, the key vector and the value vector corresponding to each first enhancement vector are obtained, the context information of each first enhancement feature can be determined based on them. Specifically, for each of the image blocks, the following operations are performed respectively:
Determining the weighting coefficient corresponding to each other image block based on the vector similarity between the query vector corresponding to one image block and the key vector corresponding to each other image block;
and respectively carrying out weighting processing on the value vectors corresponding to the at least one other image block based on the weighting coefficients corresponding to the at least one other image block, so as to obtain the context information corresponding to the image block.
The at least one other image block may refer to all but one image block in each image block, or may refer to a part of each image block except one image block, which is not limited thereto. It should be noted that in the embodiment of the present application, the vector similarity may be cosine similarity, but is not limited thereto.
In some embodiments, for each image block, the obtained context information may be added pixel by pixel to the corresponding first enhancement feature, resulting in the corresponding second enhancement feature. Specifically, the pieces of context information obtained from the other image blocks may be added to the corresponding first enhancement feature pixel by pixel one after another, or they may first be summed and the accumulated context information then added to the corresponding first enhancement feature pixel by pixel to obtain the corresponding second enhancement feature; this is not limited.
For example, referring to fig. 10, query (Q), key (K) and value (V) are used to represent the query vectors, key vectors and value vectors, respectively, and the image blocks include image block 1 to image block 9. For image block 1, first, the weighting coefficient b2 corresponding to image block 2 is determined based on the vector similarity between the query vector Q1 corresponding to image block 1 and the key vector K2 corresponding to image block 2, and the value vector V2 corresponding to image block 2 is weighted based on b2 to obtain the context information contributed by image block 2. Next, the weighting coefficient b3 corresponding to image block 3 is determined based on the vector similarity between Q1 and the key vector K3 corresponding to image block 3, and the value vector V3 corresponding to image block 3 is weighted based on b3 to obtain the context information contributed by image block 3. The same operations are performed for image block 4 to image block 9, and the obtained pieces of context information are added pixel by pixel to the first enhancement feature corresponding to image block 1 to obtain the second enhancement feature of image block 1.
Specifically, when determining the weighting coefficients corresponding to the at least one other image block, the similarity between the query vector corresponding to one image block and the key vector corresponding to each of the at least one other image block may be calculated respectively, and then each calculated similarity may be normalized to obtain the weighting coefficient corresponding to each of the at least one other image block.
Wherein the normalization process may be implemented using, but is not limited to, softmax.
For example, still referring to fig. 10, the vector similarity between the query vector Q1 corresponding to image block 1 and the key vector K2 corresponding to image block 2 is calculated, and this vector similarity is then normalized by softmax to obtain the weighting coefficient b2 corresponding to image block 2.
Through this implementation, the features of the image blocks are integrated through a mutual attention mechanism among the image blocks, and the feature of each image block is further enhanced, so that the feature of a single image block can reflect context information over a larger range, thereby improving the detection accuracy.
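As a minimal sketch of this context enhancement (assuming PyTorch, cosine similarity as the vector similarity, and summation of the per-block contributions before the pixel-by-pixel addition — each of which is only one of the options described above):

```python
import torch
import torch.nn.functional as F

def context_enhance(first_feats: torch.Tensor, q: torch.Tensor,
                    k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # first_feats, q, k, v: (num_image_blocks, dim).
    # For every image block, the weighting coefficients are obtained by
    # normalizing (softmax) the cosine similarity between its query vector and
    # the key vectors of the other image blocks; the value vectors are weighted
    # and summed to form the context information, which is then added
    # element-wise to the first enhancement feature.
    sim = F.cosine_similarity(q.unsqueeze(1), k.unsqueeze(0), dim=-1)   # (N, N)
    sim = sim.masked_fill(torch.eye(len(q), dtype=torch.bool), float("-inf"))
    weights = torch.softmax(sim, dim=-1)   # weighting coefficients b_ij
    context = weights @ v                  # accumulated context information
    return first_feats + context           # second enhancement features
```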
S3023, based on the determined detection sub-results, obtaining detection results of the training samples, and determining model loss by combining the corresponding real labels.
Specifically, based on the determined detection sub-results, there are, but not limited to, the following cases:
case one: if at least one detection sub-result representing that the image block has defects exists in each detection sub-result, determining that the detection result of the training sample is: the scanned image corresponding to the training sample has defects.
And a second case: if the detection sub-results representing that the image block has defects do not exist in the detection sub-results, determining that the detection result of the training sample is: the scanned image corresponding to the training sample has no defect.
It should be noted that, in the embodiment of the present application, the presence of a defect in the scanned image may be understood as a defect in the corresponding industrial product, and the presence of a defect in the image block may be understood as a defect in the corresponding area of the industrial product.
For example, assuming that the detection sub-result of image block 1 characterizes the defect in image block 1, the detection result of the training sample is: the scanned image corresponding to the training sample has defects.
For another example, assuming that the detection sub-results of image block 1 to image block 9 all represent that there is no defect in the corresponding image block, the detection result of the training sample is: the scanned image corresponding to the training sample has no defect.
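The two cases amount to a one-line aggregation rule; the helper name below is illustrative only:

```python
def aggregate_detection(sub_results: list) -> bool:
    # The training sample (scanned image) is judged defective if at least one
    # image block's detection sub-result indicates a defect (case one);
    # otherwise it is judged defect-free (case two).
    return any(sub_results)

# e.g. image blocks 1 to 9 all defect-free -> the scanned image is defect-free
print(aggregate_detection([False] * 9))           # False
print(aggregate_detection([True] + [False] * 8))  # True
```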
When determining the model loss based on the detection result of the training sample in combination with the corresponding real label, the model loss may employ cross entropy loss, large-Margin Softmax Loss, and the like, but is not limited thereto.
Specifically, taking the cross entropy loss as an example only, the model loss L may be calculated in the following manner, but is not limited to:

L = -(1/N) · Σ_{i=1..N} [ y_i · log(p_i) + (1 − y_i) · log(1 − p_i) ]

where p_i indicates the confidence that the i-th image block contains a defect, i.e., the detection sub-result of the i-th image block, y_i represents the real label of the i-th image block, and N is the number of image blocks.

p_i can be calculated by the following formula (3):

p_i = softmax(f_i)   formula (3)

where f_i is the feature obtained by applying a linear transformation to the second enhancement feature corresponding to the i-th image block, and softmax, also called the normalized exponential function, is used to calculate the confidence that the i-th image block contains a defect.
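For illustration only, the block-level loss can be sketched as follows; the two-class classifier interface, the label encoding (1 = defect) and the use of PyTorch are assumptions consistent with the definitions above, not details prescribed by the embodiment:

```python
import torch
import torch.nn.functional as F

def block_level_loss(second_feats: torch.Tensor, labels: torch.Tensor,
                     classifier: torch.nn.Module) -> torch.Tensor:
    # second_feats: (N, dim) second enhancement features of the N image blocks.
    # labels:       (N,) real labels, 1 = defect, 0 = no defect.
    # classifier:   linear layer producing the two-class logits f_i.
    logits = classifier(second_feats)           # f_i, shape (N, 2)
    p = torch.softmax(logits, dim=-1)[:, 1]     # p_i = softmax(f_i), defect confidence
    # binary cross entropy averaged over the N image blocks
    return F.binary_cross_entropy(p, labels.float())
```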
S3024, judging whether the quality inspection model meets the convergence condition.
In the embodiment of the present application, the convergence condition may include at least one of the following conditions:
(1) The total loss value is not greater than a preset loss value threshold.
(2) The iteration number reaches a preset number upper limit value.
S3025, if the determination result in S3024 is no, the quality inspection model is parameter-adjusted based on the model loss.

Specifically, if the above conditions are met, the quality inspection model is determined to meet the convergence condition and the training is ended; if the quality inspection model is determined not to meet the convergence condition, the model parameters need to be adjusted continuously, and the adjusted quality inspection model is used to enter the next training process, namely, the step S3031 is executed.
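A minimal sketch of the convergence check follows; the numeric thresholds are placeholders rather than values taken from the embodiment:

```python
def converged(model_loss: float, iteration: int,
              loss_threshold: float = 0.01, max_iterations: int = 10000) -> bool:
    # Training stops when the loss is not greater than the preset loss value
    # threshold or the iteration count reaches the preset upper limit.
    return model_loss <= loss_threshold or iteration >= max_iterations
```

A training loop would call this after each parameter adjustment, e.g. `while not converged(loss, iteration): ...`, and only then move on to the evaluation step described below.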
In the embodiment of the application, the quality inspection model can be evaluated by using the AUC (area under the ROC curve); when the AUC value meets a certain condition, the trained quality inspection model is output, otherwise the quality inspection model can be retrained. The AUC is an index of the performance of the quality inspection model, and the higher its value, the better the performance of the model.
In some embodiments, the image blocks included in the scanned image may be grouped in consideration of the scanned image possibly being too large.
Specifically, when the number of image blocks exceeds a set threshold, each image block group is divided from each image block according to the set number of groups and the number of image blocks included in the group.
For example, referring to fig. 11, the number threshold is set to 1000, the number of groups is set to 8, and the number of image blocks contained in a group is set to 16. When the number of image blocks exceeds 1000, the image block groups are divided from the image blocks according to the set number of groups and the number of image blocks contained in a group, and the obtained image block groups include image block group 1 to image block group 8, where each image block group contains 16 image blocks.
In some embodiments, after the image block groups are obtained, features are extracted from each image block included in each image block group to obtain the corresponding depth features, self-attention enhancement is performed on each depth feature based on the self-attention information of each obtained depth feature to obtain the corresponding first enhancement features, and then, within each image block group, the context information of the first enhancement feature corresponding to each image block in the group is determined and context enhancement is performed on these first enhancement features to obtain the corresponding second enhancement features.
For example, referring to fig. 12, image block group 1 includes image block 1, image block 2 and image block 3, image block group 2 includes image block 4, image block 5 and image block 6, and image block group 3 includes image block 7, image block 8 and image block 9. For image block group 1, features are first extracted from image block 1, image block 2 and image block 3 respectively to obtain their depth features, self-attention enhancement is performed on each depth feature based on the self-attention information of each obtained depth feature to obtain the first enhancement features corresponding to image block 1, image block 2 and image block 3, and then the context information of these first enhancement features is determined and context enhancement is performed on them to obtain the second enhancement features corresponding to image block 1, image block 2 and image block 3. For image block group 2, the same processing is performed on image block 4, image block 5 and image block 6 to obtain their second enhancement features; similarly, for image block group 3, the second enhancement features corresponding to image block 7, image block 8 and image block 9 are obtained. Further, the respective detection sub-result of each image block may be determined based on each obtained second enhancement feature, the detection result may be obtained based on the determined detection sub-results, and the model loss may be determined in combination with the corresponding real label.
Through this implementation, for line-scan images of extremely large scale, the processing speed can be significantly improved while the detection accuracy of the line-scan image is maintained.
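As an illustrative sketch of the grouping step, the parameter values follow the fig. 11 example, while the helper name and the partitioning into consecutive runs of blocks are assumptions made for this sketch:

```python
def split_into_groups(image_blocks: list, count_threshold: int = 1000,
                      blocks_per_group: int = 16) -> list:
    # When the number of image blocks exceeds the set threshold, divide them
    # into groups containing the set number of image blocks; self-attention and
    # context enhancement are then applied within each group independently.
    if len(image_blocks) <= count_threshold:
        return [image_blocks]
    return [image_blocks[i:i + blocks_per_group]
            for i in range(0, len(image_blocks), blocks_per_group)]
```

Because the cost of the mutual attention among image blocks grows with the number of blocks attended over, restricting the context enhancement to a group keeps the per-group cost bounded, which is consistent with the speed benefit noted above.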
In the embodiment of the application, after model training is finished, quality detection can be performed based on the quality inspection model obtained by training.
Referring to fig. 13, a schematic flow chart of an industrial product quality inspection method based on a quality inspection model according to an embodiment of the present application is shown, and the specific flow chart is as follows:
s1301, acquiring a scanning image to be detected of the industrial product to be detected, and dividing the scanning image to be detected to obtain each image block. See S301 specifically, and will not be described in detail here.
For example, the scanned image is divided to obtain image block groups 1 to 8, wherein image blocks 1, … …, 16 are included in image block group 1, and image blocks 17, … …, 32, and the like are included in image block group 2.
And S1302, respectively extracting the characteristics of each image block to obtain corresponding depth characteristics, and respectively carrying out self-attention enhancement on each depth characteristic based on the self-attention information of each obtained depth characteristic to obtain corresponding first enhancement characteristics. See in particular S3021, which will not be described in detail herein.
And S1303, respectively carrying out context enhancement on each first enhancement feature based on the obtained context information of each first enhancement feature to obtain a corresponding second enhancement feature. See in particular S3022, which will not be described in detail herein.
S1304, based on the obtained second enhancement features, determining respective detection sub-results of each image block, and based on the determined detection sub-results, obtaining detection results of the scanning image to be detected. See in particular S3023, which will not be described in detail herein.
It should be noted that, in the embodiment of the present application, the defect area containing a defect in the scanned image to be detected may also be determined from the detection sub-results; specifically, an image block whose detection sub-result indicates a defect is taken as a defect area.
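The inference flow S1301 to S1304 can be summarized with the following sketch; every method on `model` below is a hypothetical stand-in for one of the steps described above and is not a real API:

```python
def inspect(scan_image, model):
    # Hypothetical interface over a trained quality inspection model.
    blocks = model.split_into_blocks(scan_image)                 # S1301: divide scanned image
    depth_feats = [model.extract(b) for b in blocks]             # S1302: depth features
    first_feats = [model.self_attend(f) for f in depth_feats]    # S1302: self-attention enhancement
    second_feats = model.context_enhance(first_feats)            # S1303: context enhancement
    sub_results = [model.classify(f) for f in second_feats]      # S1304: detection sub-results
    return {
        "defective": any(sub_results),                                 # detection result
        "defect_areas": [i for i, r in enumerate(sub_results) if r],  # defect areas
    }
```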
Based on the same inventive concept, the embodiment of the application provides a model training device. As shown in fig. 14, which is a schematic structural diagram of the model training apparatus 1400, may include:
the feature extraction unit 1401 is configured to perform feature extraction on each image block in each training sample through the target model, so as to obtain a corresponding depth feature; wherein each training sample comprises: image blocks and real labels, each of which represents: whether a defect exists in the corresponding scanned image or not;
A feature enhancement unit 1402, configured to perform feature enhancement on each depth feature based on the obtained feature enhancement information of each depth feature through the target model, obtain a corresponding target enhancement feature, and determine each detection sub-result of each image block based on each obtained target enhancement feature;
a parameter adjustment unit 1403, configured to obtain a detection result of the training sample based on each determined detection sub-result, determine a model loss in combination with a corresponding real label, and perform model parameter adjustment on the target model based on the model loss.
As a possible implementation manner, the feature enhancing unit 1402 is specifically configured to perform at least one of the following operations:
respectively carrying out self-attention enhancement on each depth feature based on the self-attention information of each obtained depth feature through the target model, and taking the obtained corresponding first enhancement feature as a target enhancement feature;
respectively carrying out context enhancement on each depth feature based on the obtained context information of each depth feature through the target model, and taking the obtained corresponding second enhancement feature as a target enhancement feature;
And respectively carrying out self-attention enhancement on each depth feature based on the obtained self-attention information of each depth feature through the target model to obtain corresponding first enhancement features, respectively carrying out context enhancement on each first enhancement feature based on the obtained context information of each first enhancement feature to obtain corresponding second enhancement features, and taking the obtained second enhancement features as target enhancement features.
As a possible implementation manner, the feature enhancing unit 1402 is further configured to:
performing feature mapping on each first enhancement vector to obtain a query vector, a key vector and a value vector which are respectively corresponding to each first enhancement vector, wherein the query vector, the key vector and the value vector are used for representing feature representations of the corresponding first enhancement vector in different feature spaces;
and obtaining the respective context information of each first enhancement feature by combining the value vector of at least one other first enhancement vector based on the similarity between the query vector of each first enhancement vector in each first enhancement vector and the key vector of at least one other first enhancement vector.
As a possible implementation manner, when obtaining the respective context information of each first enhancement feature based on the similarity between the query vector of each first enhancement vector and the key vector of at least one other first enhancement vector, in combination with the value vector of the at least one other first enhancement vector, the feature enhancement unit 1402 is specifically configured to:
For each image block in the image blocks, the following operations are respectively executed:
determining the weighting coefficient corresponding to each other image block based on the vector similarity between the query vector corresponding to the image block and the key vector corresponding to each other image block;
and respectively carrying out weighting processing on the value vectors corresponding to the at least one other image block based on the weighting coefficients corresponding to the at least one other image block, so as to obtain the context information corresponding to the image block.
As a possible implementation manner, when determining the weighting coefficient corresponding to each of the at least one other image block based on the similarity between the query vector corresponding to one image block and the key vector corresponding to each of the at least one other image block, the feature enhancement unit 1402 is specifically configured to:
respectively calculating the similarity between the query vector corresponding to one image block and the key vector corresponding to at least one other image block;
and carrying out normalization processing on each calculated similarity to obtain the weighting coefficient corresponding to each other image block.
As a possible implementation manner, when the feature mapping is performed on each first enhancement vector to obtain the query vector, the key vector, and the value vector corresponding to each first enhancement vector, the feature enhancement unit 1402 is specifically configured to:
Based on the first enhancement vectors, combining query weights to obtain query vectors corresponding to the first enhancement vectors;
based on the first enhancement vectors, combining key weights to obtain key vectors corresponding to the first enhancement vectors;
and based on the first enhancement vectors, combining the value weights to obtain the value vectors corresponding to the first enhancement vectors, wherein the query weights, the key weights and the value weights are model parameters.
As a possible implementation manner, each depth feature is a multi-channel feature; when performing self-attention enhancement on each depth feature based on the obtained self-attention information of each depth feature to obtain the corresponding first enhancement feature (see the sketch after the following operations), the feature enhancement unit 1402 is specifically configured to:
determining channel weights corresponding to the multiple channels respectively based on the channel characteristics corresponding to the multiple channels respectively in the depth characteristics;
and weighting the characteristics corresponding to each of the multiple channels based on the channel weights corresponding to each of the multiple channels to obtain corresponding first enhancement characteristics.
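For illustration only, one common way to realize such channel-weighted self-attention enhancement is a squeeze-and-excitation-style block; the class name, the mean-pooled channel descriptor and the reduction ratio are assumptions made for this sketch, not details taken from the embodiment:

```python
import torch
import torch.nn as nn

class ChannelSelfAttention(nn.Module):
    # Derives a weight for every channel of a multi-channel depth feature from
    # the per-channel features and re-weights the channels with it.
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, depth_feature: torch.Tensor) -> torch.Tensor:
        # depth_feature: (C, H, W) multi-channel depth feature of one image block
        channel_desc = depth_feature.mean(dim=(1, 2))        # per-channel descriptor
        channel_weights = self.fc(channel_desc)              # channel weights, shape (C,)
        return depth_feature * channel_weights[:, None, None]  # first enhancement feature
```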
As a possible implementation manner, when the detection result of the training sample is obtained based on the determined detection sub-results, the parameter adjustment unit 1403 is specifically configured to:
If at least one detection sub-result representing that the image block has a defect exists in the detection sub-results, determining that the detection result of the training sample is: the scanned image corresponding to the training sample has defects;
if the detection sub-results representing that the image block has defects do not exist in the detection sub-results, determining that the detection result of the training sample is: and the scanned image corresponding to the training sample has no defect.
As a possible implementation manner, the feature extraction unit 1401 is further configured to, before feature extraction is performed on each image block included in the extracted training sample to obtain the corresponding depth features:
dividing each image block group from each image block according to the set group number and the image block number contained in the group when the number of each image block exceeds the set number threshold;
the training unit 1402 is specifically configured to, when the obtained context information of each first enhancement feature is based on the obtained context information of each first enhancement feature, perform context enhancement on each first enhancement feature, and obtain a corresponding second enhancement feature:
for each of the image block groups, the following operations are performed:
And determining the context information of the first enhancement features corresponding to the image blocks in one image block group, and carrying out context enhancement on the first enhancement features corresponding to the image blocks in the one image block group to obtain corresponding second enhancement features.
For convenience of description, the above parts are described as being divided into modules (or units) by function. Of course, when implementing the present application, the functions of each module (or unit) may be implemented in one or more pieces of software or hardware.
The specific manner in which the respective units perform operations in the apparatus of the above embodiment has been described in detail in the embodiment of the method, and will not be described in detail here.
Those skilled in the art will appreciate that the various aspects of the application may be implemented as a system, method, or program product. Accordingly, aspects of the application may be embodied in the following forms, namely: an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.), or an embodiment combining hardware and software aspects, which may be referred to herein as a "circuit," "module," or "system."
Based on the same inventive concept, the embodiment of the application also provides electronic equipment. In one embodiment, the electronic device may be a server or a terminal device. Referring to fig. 15, which is a schematic structural diagram of one possible electronic device provided in an embodiment of the present application, in fig. 15, an electronic device 1500 includes: a processor 1510, and a memory 1520.
The memory 1520 stores a computer program executable by the processor 1510; by executing the instructions stored in the memory 1520, the processor 1510 can perform the steps of the quality inspection model training method described above.
Memory 1520 may be a volatile memory, such as a random-access memory (RAM); the memory 1520 may also be a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD) or a solid state drive (SSD); or memory 1520 may be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. Memory 1520 may also be a combination of the above.
The processor 1510 may include one or more central processing units (central processing unit, CPU) or digital processing units, or the like. Processor 1510 is configured to implement the quality inspection model training method described above when executing the computer program stored in memory 1520.
In some embodiments, processor 1510 and memory 1520 may be implemented on the same chip, or they may be implemented separately on separate chips in some embodiments.
The particular connection medium between the processor 1510 and the memory 1520 is not limited in this embodiment. In the embodiment of the present application, the processor 1510 and the memory 1520 are connected by a bus, which is depicted by a bold line in fig. 15; the connection manner between other components is only schematically illustrated and is not limited thereto. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of description, only one thick line is depicted in fig. 15, but this does not mean that there is only one bus or only one type of bus.
Based on the same inventive concept, an embodiment of the present application provides a computer readable storage medium comprising a computer program for causing an electronic device to perform the steps of the quality inspection model training method described above, when the computer program is run on the electronic device. In some possible embodiments, aspects of the quality inspection model training method provided by the present application may also be implemented in the form of a program product comprising a computer program for causing an electronic device to perform the steps of the quality inspection model training method described above when the program product is run on the electronic device, e.g. the electronic device may perform the steps as shown in fig. 3.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, a RAM, a ROM, an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (Compact Disk Read Only Memory, CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The program product of embodiments of the present application may take the form of a CD-ROM and comprise a computer program and may run on an electronic device. However, the program product of the present application is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a computer program for use by or in connection with a command execution system, apparatus, or device.
The readable signal medium may comprise a data signal propagated in baseband or as part of a carrier wave in which a readable computer program is embodied. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a computer program for use by or in connection with a command execution system, apparatus, or device.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (15)

1. A method of model training, the method comprising:
extracting features of each image block in each training sample through a target model to obtain corresponding depth features; wherein each training sample comprises: image blocks and real labels, each of which represents: whether a defect exists in the corresponding scanned image or not;
respectively carrying out feature enhancement on each depth feature based on the obtained feature enhancement information of each depth feature through the target model to obtain corresponding target enhancement features, and determining respective detection sub-results of each image block based on each obtained target enhancement feature;
Based on the determined detection sub-results, obtaining a detection result of the training sample, determining model loss by combining the corresponding real label, and adjusting model parameters of the target model based on the model loss.
2. The method according to claim 1, wherein the performing, by the target model, feature enhancement on each depth feature based on the obtained feature enhancement information of each depth feature, respectively, to obtain a corresponding target enhancement feature, adopts at least one of the following operations:
respectively carrying out self-attention enhancement on each depth feature based on the self-attention information of each obtained depth feature through the target model, and taking the obtained corresponding first enhancement feature as a target enhancement feature;
respectively carrying out context enhancement on each depth feature based on the obtained context information of each depth feature through the target model, and taking the obtained corresponding second enhancement feature as a target enhancement feature;
and respectively carrying out self-attention enhancement on each depth feature based on the obtained self-attention information of each depth feature through the target model to obtain corresponding first enhancement features, respectively carrying out context enhancement on each first enhancement feature based on the obtained context information of each first enhancement feature to obtain corresponding second enhancement features, and taking the obtained second enhancement features as target enhancement features.
3. The method of claim 2, wherein the respective context information for each of the first enhancement features is obtained by:
performing feature mapping on each first enhancement vector to obtain a query vector, a key vector and a value vector which are respectively corresponding to each first enhancement vector, wherein the query vector, the key vector and the value vector are used for representing feature representations of the corresponding first enhancement vector in different feature spaces;
and obtaining the respective context information of each first enhancement feature by combining the value vector of at least one other first enhancement vector based on the similarity between the query vector of each first enhancement vector in each first enhancement vector and the key vector of at least one other first enhancement vector.
4. The method of claim 3, wherein the obtaining the respective context information for each first enhancement feature based on the similarity between the query vector for each first enhancement vector and the key vector for at least one other first enhancement vector in combination with the value vector for the at least one other first enhancement vector comprises:
for each image block in the image blocks, the following operations are respectively executed:
Determining the weighting coefficient corresponding to each other image block based on the vector similarity between the query vector corresponding to the image block and the key vector corresponding to each other image block;
and respectively carrying out weighting processing on the value vectors corresponding to the at least one other image block based on the weighting coefficients corresponding to the at least one other image block, so as to obtain the context information corresponding to the image block.
5. The method of claim 4, wherein determining the weighting coefficients for each of the at least one other image block based on the similarity between the query vector for the one image block and the key vector for each of the at least one other image block, comprises:
respectively calculating the similarity between the query vector corresponding to one image block and the key vector corresponding to at least one other image block;
and carrying out normalization processing on each calculated similarity to obtain the weighting coefficient corresponding to each other image block.
6. The method of claim 3, wherein the performing feature mapping on the first enhancement vectors to obtain the query vector, the key vector, and the value vector corresponding to the first enhancement vectors respectively includes:
Based on the first enhancement vectors, combining query weights to obtain query vectors corresponding to the first enhancement vectors;
based on the first enhancement vectors, combining key weights to obtain key vectors corresponding to the first enhancement vectors;
and based on the first enhancement vectors, combining the value weights to obtain the value vectors corresponding to the first enhancement vectors, wherein the query weights, the key weights and the value weights are model parameters.
7. The method of any of claims 2-6, wherein each depth feature is a multi-channel feature, the self-attention enhancing each depth feature based on the obtained self-attention information of each depth feature, respectively, to obtain a corresponding first enhanced feature, comprising:
determining channel weights corresponding to the multiple channels respectively based on the channel characteristics corresponding to the multiple channels respectively in the depth characteristics;
and weighting the channel characteristics corresponding to the multiple channels based on the channel weights corresponding to the multiple channels respectively to obtain corresponding first enhancement characteristics.
8. The method of any one of claims 1-6, wherein the obtaining a test result of the training sample based on each determined test sub-result comprises:
If at least one detection sub-result representing that the image block has a defect exists in the detection sub-results, determining that the detection result of the training sample is: the scanned image corresponding to the training sample has defects;
if the detection sub-results representing that the image block has defects do not exist in the detection sub-results, determining that the detection result of the training sample is: and the scanned image corresponding to the training sample has no defect.
9. The method according to any one of claims 2-6, wherein before the feature extraction is performed on each image block included in the extracted training sample to obtain the corresponding depth features, the method further comprises:
dividing each image block group from each image block according to the set group number and the image block number contained in the group when the number of each image block exceeds the set number threshold;
based on the obtained context information of each first enhancement feature, respectively performing context enhancement on each first enhancement feature to obtain a corresponding second enhancement feature, including:
for each of the image block groups, the following operations are performed:
and determining the context information of the first enhancement features corresponding to the image blocks in one image block group, and carrying out context enhancement on the first enhancement features corresponding to the image blocks in the one image block group to obtain corresponding second enhancement features.
10. A model training device, comprising:
the feature extraction unit is used for extracting features of each image block in each training sample through the target model to obtain corresponding depth features; wherein each training sample comprises: image blocks and real labels, each of which represents: whether a defect exists in the corresponding scanned image or not;
the feature enhancement unit is used for carrying out feature enhancement on each depth feature based on the obtained feature enhancement information of each depth feature through the target model to obtain corresponding target enhancement features, and determining each detection sub-result of each image block based on each obtained target enhancement feature;
and the parameter adjustment unit is used for obtaining the detection result of the training sample based on the determined detection sub-results, determining model loss by combining the corresponding real label, and adjusting the model parameters of the target model based on the model loss.
11. The apparatus of claim 10, wherein the feature enhancement unit is specifically configured to perform at least one of:
Respectively carrying out self-attention enhancement on each depth feature based on the self-attention information of each obtained depth feature through the target model, and taking the obtained corresponding first enhancement feature as a target enhancement feature;
respectively carrying out context enhancement on each depth feature based on the obtained context information of each depth feature through the target model, and taking the obtained corresponding second enhancement feature as a target enhancement feature;
and respectively carrying out self-attention enhancement on each depth feature based on the obtained self-attention information of each depth feature through the target model to obtain corresponding first enhancement features, respectively carrying out context enhancement on each first enhancement feature based on the obtained context information of each first enhancement feature to obtain corresponding second enhancement features, and taking the obtained second enhancement features as target enhancement features.
12. The apparatus of claim 11, wherein the feature enhancement unit is specifically configured to:
performing feature mapping on each first enhancement vector to obtain a query vector, a key vector and a value vector which are respectively corresponding to each first enhancement vector, wherein the query vector, the key vector and the value vector are used for representing feature representations of the corresponding first enhancement vector in different feature spaces;
And obtaining the respective context information of each first enhancement feature by combining the value vector of at least one other first enhancement vector based on the similarity between the query vector of each first enhancement vector in each first enhancement vector and the key vector of at least one other first enhancement vector.
13. An electronic device comprising a processor and a memory, wherein the memory stores a computer program which, when executed by the processor, causes the processor to perform the steps of the method of any of claims 1 to 9.
14. A computer readable storage medium, characterized in that it comprises a computer program for causing an electronic device to perform the steps of the method according to any one of claims 1-9 when said computer program is run on the electronic device.
15. A computer program product, characterized in that it comprises a computer program stored in a computer readable storage medium, from which computer readable storage medium a processor of an electronic device reads and executes the computer program, causing the electronic device to perform the steps of the method according to any one of claims 1-9.