CN116977262A - Object quality detection method, quality detection model construction method and device

Object quality detection method, quality detection model construction method and device

Info

Publication number
CN116977262A
Authority
CN
China
Prior art keywords
detection model
loss
detection
product image
training
Prior art date
Legal status
Pending
Application number
CN202310340960.1A
Other languages
Chinese (zh)
Inventor
张博深
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202310340960.1A
Publication of CN116977262A
Legal status: Pending


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/0002: Inspection of images, e.g. flaw detection
    • G06T 7/0004: Industrial image inspection
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/084: Backpropagation, e.g. using gradient descent
    • G06N 3/09: Supervised learning
    • G06N 3/092: Reinforcement learning
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/20: Special algorithmic details
    • G06T 2207/20081: Training; Learning
    • G06T 2207/20084: Artificial neural networks [ANN]
    • G06T 2207/30: Subject of image; Context of image processing
    • G06T 2207/30168: Image quality inspection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to an object quality detection method, a quality detection model construction method, an apparatus, a device, a storage medium, and a program product, and involves artificial intelligence. The method comprises: performing quality detection processing on the product image to be detected corresponding to an object quality detection request according to a trained quality detection model to obtain defect confidence data corresponding to the product image to be detected. During model construction, prediction processing is performed on each enhanced product image sample according to each first detection model and a second detection model to obtain first prediction results and a second prediction result; a supervision loss is determined according to the second prediction result and the supervision data corresponding to the first prediction results; a reward parameter is determined according to the supervision loss and the reinforcement loss in the reinforcement training of each first detection model; a target detection model is determined from the first detection models according to the reward parameter; and knowledge distillation training is performed on the second detection model according to the target detection model to obtain the quality detection model. The method improves the quality detection accuracy of products.

Description

Object quality detection method, quality detection model construction method and device
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to an object quality detection method, a method and an apparatus for constructing a quality detection model, a computer device, a storage medium, and a computer program product.
Background
With the development of artificial intelligence technology and the increasing quality requirements for different products in the production and manufacturing process, industrial defect quality inspection technology has emerged. Industrial defect quality inspection mainly involves photographing the surface of an industrial product to obtain a product surface picture, and performing feature extraction, defect identification, quality detection and other processing on the picture to obtain a corresponding detection and identification result, so as to determine whether the product has defects and prevent defective products from reaching the market.
Conventionally, a convolutional neural network model is trained so that features are extracted from the product surface picture according to the convolutional neural network model, and the extracted features are classified into two categories, defective and non-defective, so as to classify the quality of the product.
However, in practical applications, owing to the diversity of products, the defect images collected for different products do not fit a simple binary classification. For example, an image containing more pronounced defects cannot simply be grouped together with images of a lower defect degree, and the defects in some images are so slight that the images could essentially be classified as defect-free; in other words, product images encountered in practice cannot be fully described by simple binary labels. Moreover, simple binary labels require manual pre-labeling, and different annotators may label the same defect image differently depending on its defect degree, so the pre-labeled labels contain noise and erroneous data. If a model is trained with such noisy and erroneous data, the resulting model also carries the noise, which may degrade its recognition, classification and other performance, and in turn the accuracy of the quality detection results obtained with the model is low.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a method of constructing a quality inspection model, a method of inspecting an object quality, an apparatus, a computer device, a computer-readable storage medium, and a computer program product, which are capable of improving quality inspection accuracy of a product.
In a first aspect, the present application provides a method for object quality detection. The method comprises the following steps:
receiving an object quality detection request, and acquiring a product image to be detected corresponding to the object quality detection request;
performing quality detection processing on the product image to be detected according to the trained quality detection model to obtain defect confidence coefficient data corresponding to the product image to be detected;
the trained quality detection model is obtained by performing knowledge distillation training on a second detection model according to a target detection model; the target detection model is determined from first detection models according to a reward parameter, and the reward parameter is determined according to a supervision loss in the training process of the second detection model and a reinforcement loss in the reinforcement training process of each first detection model on the enhanced product image samples; the supervision loss is determined according to supervision data corresponding to first prediction results and a second prediction result, the first prediction results are obtained by performing prediction processing on each enhanced product image sample according to a plurality of trained first detection models, and the second prediction result is obtained by performing prediction processing on each enhanced product image sample according to the second detection model.
In a second aspect, the present application provides a method for constructing a quality inspection model. The method comprises the following steps:
obtaining enhanced product image samples, and performing prediction processing on each enhanced product image sample according to a plurality of trained first detection models and a second detection model to obtain a first prediction result corresponding to each first detection model and a second prediction result corresponding to the second detection model;
determining supervision data corresponding to each first prediction result, and determining supervision loss in the training process of the second detection model according to the supervision data and the second prediction result;
performing reinforcement training on each first detection model according to each enhanced product image sample, and determining the reinforcement loss in the reinforcement training process;
determining a reward parameter based on the supervision loss and the reinforcement loss;
and determining a target detection model from each first detection model according to the reward parameters, and carrying out knowledge distillation training on the second detection model according to the target detection model to obtain a trained quality detection model.
In a third aspect, the present application further provides an object quality detection apparatus. The device comprises:
The to-be-detected product image obtaining module is used for receiving an object quality detection request and obtaining a to-be-detected product image corresponding to the object quality detection request;
the defect confidence coefficient data obtaining module is used for carrying out quality detection processing on the product image to be detected according to the trained quality detection model to obtain defect confidence coefficient data corresponding to the product image to be detected;
the trained quality detection model is obtained by performing knowledge distillation training on a second detection model according to a target detection model; the target detection model is determined from first detection models according to a reward parameter, and the reward parameter is determined according to a supervision loss in the training process of the second detection model and a reinforcement loss in the reinforcement training process of each first detection model on the enhanced product image samples; the supervision loss is determined according to supervision data corresponding to first prediction results and a second prediction result, the first prediction results are obtained by performing prediction processing on each enhanced product image sample according to a plurality of trained first detection models, and the second prediction result is obtained by performing prediction processing on each enhanced product image sample according to the second detection model.
In a fourth aspect, the application further provides a device for constructing the quality detection model. The device comprises:
the prediction result obtaining module is used for obtaining enhanced product image samples, and performing prediction processing on the enhanced product image samples according to a plurality of trained first detection models and a second detection model to obtain first prediction results corresponding to the first detection models and a second prediction result corresponding to the second detection model;
the supervision loss determining module is used for determining supervision data corresponding to each first prediction result and determining the supervision loss in the training process of the second detection model according to the supervision data and the second prediction result;
the reinforcement loss determining module is used for performing reinforcement training on each first detection model according to each enhanced product image sample and determining the reinforcement loss in the reinforcement training process;
a reward parameter determination module for determining a reward parameter based on the supervision loss and the reinforcement loss;
and the quality detection model obtaining module is used for determining a target detection model from the first detection models according to the reward parameters, and carrying out knowledge distillation training on the second detection model according to the target detection model to obtain a trained quality detection model.
In a fifth aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor which when executing the computer program performs the steps of:
receiving an object quality detection request, and acquiring a product image to be detected corresponding to the object quality detection request;
performing quality detection processing on the product image to be detected according to the trained quality detection model to obtain defect confidence coefficient data corresponding to the product image to be detected;
the trained quality detection model is obtained by performing knowledge distillation training on a second detection model according to a target detection model; the target detection model is determined from first detection models according to a reward parameter, and the reward parameter is determined according to a supervision loss in the training process of the second detection model and a reinforcement loss in the reinforcement training process of each first detection model on the enhanced product image samples; the supervision loss is determined according to supervision data corresponding to first prediction results and a second prediction result, the first prediction results are obtained by performing prediction processing on each enhanced product image sample according to a plurality of trained first detection models, and the second prediction result is obtained by performing prediction processing on each enhanced product image sample according to the second detection model.
In a sixth aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor which when executing the computer program performs the steps of:
obtaining enhanced product image samples, and performing prediction processing on each enhanced product image sample according to a plurality of trained first detection models and a second detection model to obtain a first prediction result corresponding to each first detection model and a second prediction result corresponding to the second detection model;
determining supervision data corresponding to each first prediction result, and determining supervision loss in the training process of the second detection model according to the supervision data and the second prediction result;
performing reinforcement training on each first detection model according to each enhanced product image sample, and determining the reinforcement loss in the reinforcement training process;
determining a reward parameter based on the supervision loss and the reinforcement loss;
and determining a target detection model from each first detection model according to the reward parameters, and carrying out knowledge distillation training on the second detection model according to the target detection model to obtain a trained quality detection model.
In a seventh aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
receiving an object quality detection request, and acquiring a product image to be detected corresponding to the object quality detection request;
performing quality detection processing on the product image to be detected according to the trained quality detection model to obtain defect confidence coefficient data corresponding to the product image to be detected;
the trained quality detection model is obtained by performing knowledge distillation training on a second detection model according to a target detection model; the target detection model is determined from first detection models according to a reward parameter, and the reward parameter is determined according to a supervision loss in the training process of the second detection model and a reinforcement loss in the reinforcement training process of each first detection model on the enhanced product image samples; the supervision loss is determined according to supervision data corresponding to first prediction results and a second prediction result, the first prediction results are obtained by performing prediction processing on each enhanced product image sample according to a plurality of trained first detection models, and the second prediction result is obtained by performing prediction processing on each enhanced product image sample according to the second detection model.
In an eighth aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
obtaining enhanced product image samples, and performing prediction processing on each enhanced product image sample according to a plurality of trained first detection models and a second detection model to obtain a first prediction result corresponding to each first detection model and a second prediction result corresponding to the second detection model;
determining supervision data corresponding to each first prediction result, and determining supervision loss in the training process of the second detection model according to the supervision data and the second prediction result;
performing reinforcement training on each first detection model according to each enhanced product image sample, and determining the reinforcement loss in the reinforcement training process;
determining a reward parameter based on the supervision loss and the reinforcement loss;
and determining a target detection model from each first detection model according to the reward parameters, and carrying out knowledge distillation training on the second detection model according to the target detection model to obtain a trained quality detection model.
In a ninth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the steps of:
receiving an object quality detection request, and acquiring a product image to be detected corresponding to the object quality detection request;
performing quality detection processing on the product image to be detected according to the trained quality detection model to obtain defect confidence coefficient data corresponding to the product image to be detected;
the trained quality detection model is obtained by performing knowledge distillation training on a second detection model according to a target detection model; the target detection model is determined from first detection models according to a reward parameter, and the reward parameter is determined according to a supervision loss in the training process of the second detection model and a reinforcement loss in the reinforcement training process of each first detection model on the enhanced product image samples; the supervision loss is determined according to supervision data corresponding to first prediction results and a second prediction result, the first prediction results are obtained by performing prediction processing on each enhanced product image sample according to a plurality of trained first detection models, and the second prediction result is obtained by performing prediction processing on each enhanced product image sample according to the second detection model.
In a tenth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the steps of:
obtaining enhanced product image samples, and performing prediction processing on each enhanced product image sample according to a plurality of trained first detection models and a second detection model to obtain a first prediction result corresponding to each first detection model and a second prediction result corresponding to the second detection model;
determining supervision data corresponding to each first prediction result, and determining supervision loss in the training process of the second detection model according to the supervision data and the second prediction result;
performing reinforcement training on each first detection model according to each enhanced product image sample, and determining the reinforcement loss in the reinforcement training process;
determining a reward parameter based on the supervision loss and the reinforcement loss;
and determining a target detection model from each first detection model according to the reward parameters, and carrying out knowledge distillation training on the second detection model according to the target detection model to obtain a trained quality detection model.
In the object quality detection method, the construction method, the apparatus, the computer device, the storage medium and the program product, the product image to be detected corresponding to an object quality detection request is obtained by receiving the request, and quality detection processing is then performed on the image according to the trained quality detection model to obtain the defect confidence data corresponding to the image. The trained quality detection model is obtained by performing knowledge distillation training on the second detection model according to the target detection model, and the target detection model is determined from the first detection models according to the reward parameter, so that reinforcement training is used to further weigh and select among the first detection models and determine the most suitable target detection model; the reward parameter is determined according to the supervision loss in the training process of the second detection model and the reinforcement loss in the reinforcement training process of each first detection model on the enhanced product image samples, and the supervision loss is determined according to the supervision data corresponding to the first prediction results and the second prediction result. By performing knowledge distillation training on the second detection model according to the target detection model, the second detection model can learn the knowledge of the first detection models, the dependence on pre-labeled labels in the product image samples and the noise and error data they introduce are reduced, a quality detection model with higher accuracy is obtained, and the accuracy of product quality detection using the quality detection model is improved.
Drawings
FIG. 1 is an application environment diagram of an object quality detection method and a method for constructing a quality detection model in one embodiment;
FIG. 2 is a flow chart of a method of object quality detection in one embodiment;
FIG. 3 is a schematic representation of defect images of different severity in one embodiment;
FIG. 4 is a schematic diagram of a process for obtaining defect confidence data corresponding to an image of a product to be inspected in one embodiment;
FIG. 5 is a schematic diagram of defect confidence data corresponding to an image of a product to be inspected in one embodiment;
FIG. 6 is a flow diagram of obtaining a trained quality inspection model in one embodiment;
FIG. 7 is a schematic diagram of a process for obtaining a trained first detection model in one embodiment;
FIG. 8 is a schematic diagram of a reinforcement learning algorithm during reinforcement training in one embodiment;
FIG. 9 is a flow chart of a method for detecting object quality in another embodiment;
FIG. 10 is a flow chart of a method of constructing a quality inspection model in one embodiment;
FIG. 11 is a schematic diagram of a process for obtaining a quality inspection model in one embodiment;
FIG. 12 is a block diagram of an object quality detection apparatus in one embodiment;
FIG. 13 is a block diagram showing a construction apparatus of a quality inspection model in one embodiment;
Fig. 14 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The object quality detection method and the quality detection model construction method provided by the embodiment of the application relate to an artificial intelligence technology, and can be applied to various scenes such as cloud technology, artificial intelligence, intelligent traffic, network media, auxiliary driving and the like. Among these, artificial intelligence (Artificial Intelligence, AI) is the theory, method, technique and application system that uses a digital computer or a digital computer-controlled machine to simulate, extend and extend human intelligence, sense the environment, acquire knowledge and use knowledge to obtain optimal results. In other words, artificial intelligence is an integrated technology of computer science that attempts to understand the essence of intelligence and to produce a new intelligent machine that can react in a similar way to human intelligence. Artificial intelligence, i.e. research on design principles and implementation methods of various intelligent machines, enables the machines to have functions of sensing, reasoning and decision. The artificial intelligence technology is a comprehensive subject, and relates to the technology with wide fields, namely the technology with a hardware level and the technology with a software level. Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.
Computer vision (CV) technology is a science that studies how to make machines "see"; more specifically, it uses cameras and computers instead of human eyes to perform machine vision tasks such as recognition, detection and measurement on a target, and further performs graphics processing so that the computer produces an image better suited for human observation or for transmission to an instrument for detection. As a scientific discipline, computer vision studies related theories and technologies and attempts to build artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision techniques typically include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D techniques, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric recognition techniques such as face recognition and fingerprint recognition. Machine learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It studies how a computer simulates or implements human learning behavior to acquire new knowledge or skills and reorganizes existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied throughout the various fields of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
With research and advancement of artificial intelligence technology, research and application of artificial intelligence technology is being developed in various fields, such as common smart home, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned, automatic driving, unmanned aerial vehicles, robots, smart medical treatment, smart customer service, etc., and it is believed that with the development of technology, artificial intelligence technology will be applied in more fields and with increasing importance value.
Cloud technology refers to a hosting technology that unifies hardware, software, network and other resources in a wide area network or a local area network to realize the computation, storage, processing and sharing of data. It is a general term for the network technology, information technology, integration technology, management platform technology, application technology and the like applied on the basis of the cloud computing business model; it can form a resource pool and be used on demand, flexibly and conveniently. Background services of technical network systems, such as video websites, picture websites and other portal websites, require a large amount of computing and storage resources. With the advanced development and application of the internet industry, each object may have its own identification mark, which needs to be transmitted to the background system for logical processing; data of different levels will be processed separately, and all kinds of industry data need strong system backing, which is usually realized through cloud computing. Artificial intelligence cloud service, also commonly called AIaaS (AI as a Service), is currently the mainstream service mode of artificial intelligence platforms. Specifically, an AIaaS platform splits several common AI services and provides independent or packaged services in the cloud. This service mode is similar to an AI-themed mall: all developers can access one or more artificial intelligence services provided by the platform through API interfaces, and some senior developers can also use the AI framework and AI infrastructure provided by the platform to deploy, operate and maintain their own proprietary cloud artificial intelligence services.
The object quality detection method and the quality detection model construction method provided by the embodiment of the application relate to the technologies of computer vision technology, machine learning, cloud technology and the like in the artificial intelligence technology, and can be applied to an application environment shown in fig. 1. Wherein the terminal 102 communicates with the server 104 via a network. The data storage system may store data that the server 104 needs to process. The data storage system may be integrated on the server 104 or may be located on a cloud or other network server. The terminal 102 may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, internet of things devices, portable wearable devices, aircrafts, etc., and the internet of things devices may be smart speakers, smart car devices, etc. The portable wearable device may be a smart watch, smart bracelet, headset, or the like. The server 104 may be an independent physical server, or may be a server cluster formed by a plurality of physical servers, or may be a cloud server that provides cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, and basic cloud computing services such as big data and artificial intelligence platforms, where the terminal 102 and the server 104 may be directly or indirectly connected through wired or wireless communication modes, which is not limited in the embodiment of the present application.
Further, both the terminal 102 and the server 104 may be separately configured to execute the object quality detection method and the quality detection model construction method provided in the embodiments of the present application, and the terminal 102 and the server 104 may also cooperatively execute the object quality detection method and the quality detection model construction method provided in the embodiments of the present application. For example, taking the terminal 102 and the server 104 cooperatively execute the method for detecting the object quality provided in the embodiment of the present application as an example, the server 104 receives the request for detecting the object quality and obtains the image of the product to be detected corresponding to the request for detecting the object quality. The object quality detection request may be triggered based on the terminal 102, the terminal 102 may send the triggered object quality request to the server 104, and the product image to be detected may be stored in cloud storage of the server 104, or in a data storage system, or in local storage of the terminal 102, and may be obtained from the server 104, or the data storage system, or the terminal 102 when the object quality detection process is required. Further, the server 104 performs quality detection processing on the product image to be detected according to the trained quality detection model, so as to obtain defect confidence data corresponding to the product image to be detected, and further may feed the obtained defect confidence data back to the terminal 102, or store the defect confidence data in cloud storage data or a storage system of the server 104.
The trained quality detection model is obtained by the server 104 by performing knowledge distillation training on the second detection model according to the target detection model; the target detection model is determined by the server 104 from the first detection models according to the reward parameter, and the reward parameter is determined by the server 104 according to the supervision loss in the training process of the second detection model and the reinforcement loss in the reinforcement training process of the first detection models on the enhanced product image samples. The supervision loss is determined by the server 104 according to the supervision data corresponding to the first prediction results and the second prediction result, where the first prediction results are obtained by performing prediction processing on each enhanced product image sample according to the plurality of trained first detection models, and similarly, the second prediction result is obtained by performing prediction processing on each enhanced product image sample according to the second detection model.
Similarly, taking the method for constructing the quality detection model provided by the embodiment of the present application as an example, where the terminal 102 and the server 104 cooperatively execute the method, the server 104 obtains the enhanced product image samples, and performs prediction processing on each enhanced product image sample according to a plurality of trained first detection models and second detection models, so as to obtain a first prediction result corresponding to each first detection model and a second prediction result corresponding to the second detection model. The enhanced product image sample may be stored in a cloud storage of the server 104, or in a data storage system, or in a local storage of the terminal 102, and may be obtained from the server 104, or the data storage system, or the terminal 102 when the quality detection model needs to be built. Further, the server 104 determines the supervision loss in the training process of the second detection model according to the supervision data and the second prediction result by determining the supervision data corresponding to each first prediction result, and similarly, the server 104 also needs to perform the reinforcement training on each first detection model according to each enhanced product image sample, and determine the reinforcement loss in the reinforcement training process, so as to determine the reward parameter based on the supervision loss and the reinforcement loss. Finally, the server 104 determines a target detection model from the first detection models according to the reward parameters, and performs knowledge distillation training on the second detection model according to the target detection model, thereby obtaining a trained quality detection model.
In one embodiment, as shown in fig. 2, there is provided an object quality detection method, which is described by taking an example that the method is applied to the server in fig. 1, and includes the following steps:
step S202, receiving an object quality detection request, and acquiring a product image to be detected corresponding to the object quality detection request.
Specifically, in the quality detection process of industrial or electronic products, an inspector can trigger an object quality detection request on a terminal, and the terminal sends the triggered object quality detection request to a server. After receiving the object quality detection request, the server parses the request to obtain the product image to be detected corresponding to the object quality detection request.
The product image to be detected may be a product image of an industrial or electronic product, such as an image of the whole surface of the product, an image of the surface of a certain component, or an image of the joint between different components. By performing quality detection on such product images, the corresponding defect confidence data are obtained, and whether each product image has a quality defect can be determined according to the defect confidence data.
In one embodiment, as shown in fig. 3, defect images with different severity levels are provided. Referring to fig. 3, part (a) is an OK image (i.e., a product image without quality defects), part (b) is a slight defect image (the defect is so slight that the image could essentially be classified as defect-free), and part (c) is a severe defect image.
When a traditional binary-classification deep learning model is used to detect product images, the images can only be simply divided into defective and non-defective images, and images of different severity cannot be subdivided: an image with a low defect degree is easily classified directly as a defective image, even though its defect may be so slight that it could essentially be treated as defect-free. A trained quality detection model is therefore needed to perform quality detection on product images of different severity levels, obtain their respective defect confidence data, and judge from the defect confidence data whether each image is a defect image.
Step S204, performing quality detection processing on the product image to be detected according to the trained quality detection model to obtain defect confidence data corresponding to the product image to be detected. The trained quality detection model is obtained by performing knowledge distillation training on a second detection model according to a target detection model; the target detection model is determined from first detection models according to a reward parameter; the reward parameter is determined according to a supervision loss in the training process of the second detection model and a reinforcement loss in the reinforcement training process of each first detection model on the enhanced product image samples; the supervision loss is determined according to supervision data corresponding to first prediction results and a second prediction result; the first prediction results are obtained by performing prediction processing on each enhanced product image sample according to a plurality of trained first detection models, and the second prediction result is obtained by performing prediction processing on each enhanced product image sample according to the second detection model.
Specifically, quality detection processing is performed on each product image to be detected according to the trained quality detection model, and the defect confidence data corresponding to the product image to be detected is obtained through the quality detection model. The defect confidence data is used to determine whether the product image to be detected is a defect image: the defect confidence data is compared with a preset defect confidence threshold; if the defect confidence data is greater than the preset defect confidence threshold, the corresponding product image to be detected is a defect image, and if the defect confidence data is not greater than the preset defect confidence threshold, the corresponding product image to be detected is a defect-free image.
The preset defect confidence threshold may be set according to actual requirements and may take different values in the range (0, 1), for example 0.5, 0.8 or 0.9; it is not limited to any particular value. For example, if the preset defect confidence threshold is set to 0.5, defect confidence data greater than 0.5 indicates that the corresponding product image to be detected is a defect image, and defect confidence data not greater than 0.5 indicates that it is a defect-free image.
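As an illustration only (the function and variable names below are assumptions of this example and are not part of the disclosure), the comparison between the defect confidence data and the preset defect confidence threshold can be sketched in Python as follows:

    # Illustrative sketch: classify a product image to be detected by comparing
    # its defect confidence data with a preset defect confidence threshold.
    def classify_by_confidence(defect_confidence: float, threshold: float = 0.5) -> str:
        # Greater than the preset threshold: treated as a defect image;
        # otherwise: treated as a defect-free (normal) image.
        return "defect" if defect_confidence > threshold else "defect-free"

    print(classify_by_confidence(0.95))  # defect
    print(classify_by_confidence(0.05))  # defect-free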
In one embodiment, the quality detection model used for quality detection is obtained by performing knowledge distillation training on the second detection model according to a target detection model, wherein the target detection model is determined from the first detection models according to a reward parameter, and the reward parameter is determined according to a supervision loss in the training process of the second detection model and a reinforcement loss in the reinforcement training process of each first detection model on the enhanced product image samples. The supervision loss is determined according to supervision data corresponding to the first prediction results and the second prediction result; the first prediction results are obtained by performing prediction processing on each enhanced product image sample according to a plurality of trained first detection models, and the second prediction result is obtained by performing prediction processing on each enhanced product image sample according to the second detection model.
The network hierarchical structures of the first detection model and the second detection model are the same, the number of hierarchical nodes of the first detection model is larger than that of hierarchical nodes of the same hierarchy in the second detection model, and then knowledge of the target detection model determined from each first detection model is transmitted to the second detection model when knowledge distillation training is carried out.
Specifically, the first prediction results can be obtained by performing prediction processing on each enhanced product image sample according to the plurality of trained first detection models, and the second prediction result can be obtained by performing prediction processing on each enhanced product image sample according to the second detection model. Further, by determining the supervision data corresponding to each first prediction result, the supervision loss in the training process of the second detection model can be determined according to the supervision data and the second prediction result.
Further, the first detection models are subjected to reinforcement training according to the enhanced product image samples, and the reinforcement loss in the reinforcement training process is obtained. According to the determined supervision loss and reinforcement loss, the reward parameter used for screening the target detection model can be determined, so that the target detection model is determined from the first detection models according to the reward parameter, knowledge distillation training is performed on the second detection model according to the target detection model, and finally the trained quality detection model is obtained.
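Purely for illustration, the following minimal Python sketch (using PyTorch; the function name, the use of KL divergence for the supervision loss, the use of cross-entropy for the reinforcement loss, and the rule combining them into the reward parameter are all assumptions of this sketch rather than details fixed by this description) shows how a target detection model could be screened from the first detection models:

    import torch
    import torch.nn.functional as F

    def select_target_model(first_models, second_model, enhanced_images, labels):
        # Score each trained first (teacher) detection model with a reward
        # parameter and return the one with the best reward as the target model.
        rewards = []
        with torch.no_grad():
            student_log_probs = F.log_softmax(second_model(enhanced_images), dim=1)
            for teacher in first_models:
                teacher_logits = teacher(enhanced_images)
                # Supervision data taken from this teacher's prediction; the
                # supervision loss of the second model is measured against it.
                supervision = F.softmax(teacher_logits, dim=1)
                supervision_loss = F.kl_div(student_log_probs, supervision,
                                            reduction="batchmean")
                # Reinforcement loss of this teacher on the enhanced samples
                # (assumed here to be its cross-entropy loss).
                reinforcement_loss = F.cross_entropy(teacher_logits, labels)
                # Assumed reward rule: smaller combined loss, larger reward.
                rewards.append(-(supervision_loss + reinforcement_loss).item())
        best = max(range(len(first_models)), key=lambda k: rewards[k])
        return first_models[best], rewards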
In one embodiment, as shown in fig. 4, an example of the process of obtaining the defect confidence data corresponding to a product image to be detected is provided. Referring to fig. 4, the product image to be detected is input into the trained quality detection model, the quality detection model extracts features of the image and performs probability prediction, and the defect confidence data corresponding to the image is output. Further, the defect confidence data is compared with a preset defect confidence threshold (fig. 4 shows an example in which the threshold is 0.5), and the product images to be detected are classified according to the comparison result.
If the defect confidence coefficient data is greater than the preset defect confidence coefficient threshold value 0.5, the corresponding product image to be detected is indicated to belong to the defect image, and if the defect confidence coefficient data is not greater than the preset defect confidence coefficient threshold value 0.5, the corresponding product image to be detected is indicated to belong to the defect-free image, namely to the normal image, so that the accurate classification of the product image to be detected according to the defect confidence coefficient data is realized.
Further, as shown in fig. 5, an example of defect confidence data corresponding to product images to be detected is provided. Referring to fig. 5, image (a) is an OK image (i.e., a normal image without defects), and its defect confidence data is 0.05, which is less than the defect confidence threshold of 0.5; image (b) is a defect image, and its defect confidence data is 0.95, which is greater than the defect confidence threshold of 0.5.
In the object quality detection method, the product image to be detected corresponding to an object quality detection request is obtained by receiving the request, and quality detection processing is then performed on the image according to the trained quality detection model to obtain the corresponding defect confidence data. The trained quality detection model is obtained by performing knowledge distillation training on the second detection model according to the target detection model, and the target detection model is determined from the first detection models according to the reward parameter, so that reinforcement training is used to further weigh and select among the first detection models and determine the most suitable target detection model; the reward parameter is determined according to the supervision loss in the training process of the second detection model and the reinforcement loss in the reinforcement training process of each first detection model on the enhanced product image samples, and the supervision loss is determined according to the supervision data corresponding to the first prediction results and the second prediction result. By performing knowledge distillation training on the second detection model according to the target detection model, the second detection model can learn the knowledge of the first detection models, the dependence on pre-labeled labels in the product image samples and the noise and error data they introduce are reduced, a quality detection model with higher accuracy is obtained, and the accuracy of product quality detection using the quality detection model is improved.
In one embodiment, as shown in fig. 6, the manner of obtaining the trained quality detection model specifically includes the following steps:
step S602, obtaining enhanced product image samples, and carrying out prediction processing on each enhanced product image sample according to a plurality of trained first detection models and second detection models to obtain a first prediction result corresponding to each first detection model and a second prediction result corresponding to the second detection model.
Specifically, by acquiring product image samples and performing data enhancement processing on each product image sample, enhanced product image samples subjected to data enhancement processing under different visual angles are obtained, so that randomness of the product image samples for model training is increased, and error data carried by each product image sample in the training process is reduced. The data enhancement processing includes different processing modes such as rotation, translation, scaling, clipping and Gaussian noise addition on the product image.
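For example (an illustrative sketch only; the concrete transform parameters and the helper name add_gaussian_noise are assumptions of this sketch), such enhancement processing could be implemented with standard image transforms:

    import torch
    from torchvision import transforms

    def add_gaussian_noise(img_tensor, std=0.05):
        # Additive Gaussian noise on a tensor image with values in [0, 1].
        return (img_tensor + std * torch.randn_like(img_tensor)).clamp(0.0, 1.0)

    # Rotation, translation, scaling, cropping and Gaussian noise addition,
    # producing enhanced product image samples from a product image sample;
    # the label y of the sample is kept unchanged.
    enhance = transforms.Compose([
        transforms.RandomRotation(degrees=15),
        transforms.RandomAffine(degrees=0, translate=(0.1, 0.1), scale=(0.9, 1.1)),
        transforms.RandomResizedCrop(size=224, scale=(0.8, 1.0)),
        transforms.ToTensor(),
        transforms.Lambda(add_gaussian_noise),
    ])
    # enhanced_sample = enhance(product_image)  # product_image is a PIL image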
Further, according to the plurality of trained first detection models, the first prediction results corresponding to the first detection models can be obtained by performing prediction processing on the enhanced product image samples, and similarly, the second prediction results corresponding to the second detection models can be obtained by performing prediction processing on the enhanced product image samples according to the second detection models.
The network hierarchical structures of the first detection model and the second detection model are the same, and the number of nodes at each level of the first detection model is larger than the number of nodes at the same level of the second detection model; that is, the first detection model can be understood as a teacher model used for knowledge distillation training of the second detection model, and the second detection model can be understood as a student model that learns knowledge from the teacher models. The first detection model and the second detection model may specifically be deep learning models or neural network models, i.e., the quality detection model finally used for quality detection may be obtained by performing multi-level training on a deep learning model or neural network model.
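For instance (an illustrative sketch only; the layer types and widths below are assumptions, not the architecture of this disclosure), a first detection model and a second detection model can share the same network hierarchy while differing in the number of nodes per level:

    import torch.nn as nn

    def make_detector(width: int) -> nn.Sequential:
        # Same network hierarchy for teacher and student; only the number of
        # nodes (channels) per level differs.
        return nn.Sequential(
            nn.Conv2d(3, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, 2 * width, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(2 * width, 2),  # two outputs: defect / defect-free
        )

    teacher = make_detector(width=64)   # first detection model (larger)
    student = make_detector(width=16)   # second detection model (smaller)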
In one embodiment, before obtaining the enhanced product image samples and performing prediction processing on each enhanced product image sample according to a plurality of trained first detection models and second detection models to obtain a first prediction result corresponding to each first detection model and a second prediction result corresponding to the second detection model, the method further comprises the step of obtaining the trained first detection models. The method for obtaining the trained first detection model comprises the following steps:
Acquiring product image samples, and training an initial detection model according to each product image sample; determining training loss in the initial detection model training process; and if the training loss meets the training ending condition of the initial detection model, obtaining a trained first detection model.
Specifically, product images corresponding to industrial products or electronic products are obtained, and each product image is labeled in advance, namely a labeling label comprising a defect label and a normal label is added to the product image, so that a plurality of product image samples are obtained, and the initial detection model is trained according to each product image sample carrying the labeling label, so that a trained first detection model is obtained.
Further, in the training process, the training loss of training the initial detection model is determined, and whether the training loss meets the training ending condition of the initial detection model is judged. And if the training loss is determined to meet the training ending condition of the initial detection model, obtaining a trained first detection model. The training ending condition of the initial detection model may specifically be that the training loss reaches a preset loss threshold, or that the training iteration number of the initial detection model reaches a preset number.
Specifically, the training loss l for training the initial detection model is obtained through the determination and calculation of the following formula (1):
l=CE(p,y) (1);
where p denotes the probability prediction result output by the initial detection model after feature extraction on the product image sample, y denotes the pre-labeled label (a normal label or a defect label) carried by the product image sample, and CE(·) denotes the cross-entropy loss function; that is, the training loss l of the initial detection model is obtained by calculating the cross-entropy loss between the probability prediction result p and the label y carried by the product image sample.
In one embodiment, other loss functions, such as the KL loss (i.e., KL divergence loss), may be used when calculating the training loss; the loss calculation method is not limited to the cross entropy loss function.
Further, the probability prediction result p output by the initial detection model after feature extraction of the product image sample is determined by the following formula (2):
p = f(x; θ) (2);
wherein f(·; θ) represents the initial detection model, x represents the product image sample, and θ represents the weight parameter of the initial detection model during training; the trained model weight θ is obtained by iteratively updating the model parameters through gradient descent using the training loss l.
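The following sketch illustrates formulas (1) and (2) under the assumption of a PyTorch-style training loop; the optimizer settings, loss threshold, and iteration budget are placeholder values, not values prescribed by this embodiment:

```python
import torch
import torch.nn.functional as F

def train_first_detection_model(model, loader, lr=0.01, loss_threshold=0.05, max_iters=10_000):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for it, (x, y) in enumerate(loader):      # x: product image sample, y: labeling label (0 normal, 1 defect)
        logits = model(x)                     # p = f(x; theta), formula (2)
        loss = F.cross_entropy(logits, y)     # l = CE(p, y), formula (1)
        optimizer.zero_grad()
        loss.backward()                       # gradient descent iteration on theta using l
        optimizer.step()
        # training-end condition: loss reaches a preset threshold or the iteration budget is used up
        if loss.item() < loss_threshold or it + 1 >= max_iters:
            break
    return model
```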
In one embodiment, as shown in fig. 7, a process example of obtaining a trained first detection model is provided. Referring to fig. 7, an initial detection model is trained using a plurality of product image samples, the training loss in the training process is determined according to the output result of the model (i.e., the probability prediction result) and the label corresponding to the product image sample, it is then judged whether the training loss meets the training end condition of the initial detection model, and the trained first detection model is obtained when the training loss is determined to meet the training end condition of the initial detection model.
Further, a plurality of initial detection models are trained according to different product image sample sets, so that a plurality of trained first detection models, for example K trained first detection models, and the model weights of the first detection models are obtained; the model weights of the first detection models are randomly initialized so that the resulting model weights differ from one another.
A structure with a larger number of model parameters is adopted as the teacher model (i.e., the first detection model), and the strong learning and fitting capability of the large model is used so that the student model (i.e., the second detection model) can learn the knowledge of the teacher model. In the subsequent training process of the second detection model, the teacher model provides a supervision signal in addition to the pre-labeled labels, which overcomes the influence of noise labels in the labeling. The teacher model is not directly deployed in the final deployment stage; that is, what is actually deployed is the quality detection model obtained by performing multi-level training on the second detection model, so the problem of increased time consumption caused by training and deploying the large model is avoided.
In one embodiment, for the product image samples (x, y), enhancement processing such as rotation, translation, scaling, clipping, gaussian noise addition and the like is performed to obtain enhanced product image samples (x', y), prediction processing is performed on each enhanced product image sample by using the trained K first detection models to obtain a first prediction result corresponding to each first detection model, and prediction processing is performed on each enhanced product image sample by using the second detection model to obtain a second prediction result corresponding to the second detection model.
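A possible form of the enhancement processing described above, assuming torchvision is available, is sketched below; the parameter ranges and noise scale are illustrative only:

```python
import torch
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomRotation(degrees=15),                           # rotation
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1),
                            scale=(0.9, 1.1)),                       # translation and scaling
    transforms.RandomResizedCrop(size=224, scale=(0.8, 1.0)),        # cropping
    transforms.ToTensor(),
    transforms.Lambda(lambda t: t + 0.01 * torch.randn_like(t)),     # additive Gaussian noise
])
# x_aug = augment(pil_image)   # the labeling label y of the sample is left unchanged
```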
Specifically, the first prediction result corresponding to each first detection model is obtained by the following formula (3):
p_k = f_{T_k}(x′; θ_{T_k}), k = 1, 2, ..., K (3);
wherein K represents the number of first detection models, i.e., the first detection models T_1, T_2, ..., T_K; p_1 represents the first prediction result of the first detection model T_1, p_2 represents the first prediction result of the first detection model T_2, and p_K represents the first prediction result of the first detection model T_K; x′ represents the enhanced product image sample, and θ_{T_1}, θ_{T_2}, ..., θ_{T_K} respectively represent the model parameters of the first detection models.
Likewise, the second prediction result corresponding to the second detection model is obtained by the following formula (4):
p_S = f_S(x′; θ_S) (4);
wherein f_S(·; θ_S) represents the second detection model, θ_S is the model weight of the second detection model to be optimized during training, x′ represents the enhanced product image sample, and p_S represents the second prediction result corresponding to the second detection model.
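The two prediction steps of formulas (3) and (4) could be sketched as follows, assuming PyTorch models for the teachers and the student; the softmax turns the output logits into probability predictions, the helper names are illustrative, and the trained teachers are kept frozen in this sketch:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def teacher_predictions(teachers, x_aug):
    # p_k = f_{T_k}(x'; theta_{T_k}) for k = 1..K
    return [F.softmax(t(x_aug), dim=1) for t in teachers]

def student_prediction(student, x_aug):
    # p_S = f_S(x'; theta_S); theta_S remains trainable, so gradients are kept
    return F.softmax(student(x_aug), dim=1)
```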
Step S604, determining supervision data corresponding to each first prediction result, and determining a supervision loss in the training process of the second detection model according to the supervision data and the second prediction result.
Specifically, the weighting parameters associated with each first detection model are initialized and normalized to obtain the processed weighting coefficients, and then the weighting processing is sequentially performed on each first prediction result based on the weighting coefficients to obtain the supervision data in the training process of the second detection model.
In order to utilize the prediction results of the plurality of trained first detection models during the training of the second detection model, a set of weighting parameters {W_1, W_2, ..., W_K} is designed to weight the first prediction results of the first detection models and thereby further guide the second detection model.
Specifically, the weighting parameters {W_1, W_2, ..., W_K} are initialized, and are then normalized by a softmax function according to the following formula (5) to obtain the weighting coefficients W′_i:
W′_i = softmax(W_1, W_2, ..., W_K), i ∈ 1, 2, ..., K (5);
wherein i represents the i-th weighting coefficient and K represents that there are K weighting coefficients, i.e., K weighting coefficients corresponding one-to-one to the K first detection models. Normalizing the learnable parameters {W_1, W_2, ..., W_K} with the softmax function to obtain the weighting coefficients W′_i can be understood as producing a form of probability, and the sum of the weighting coefficients is 1, i.e., W′_1 + W′_2 + ... + W′_K = 1.
Further, the first prediction results are weighted in turn by the weighting coefficients W′_i so as to construct the supervision data used in the training process of the second detection model. The supervision data p_merge is specifically constructed by the following formula (6):
p_merge = W′_1·p_1 + W′_2·p_2 + ... + W′_K·p_K (6);
wherein p_1 represents the first prediction result of the first detection model T_1, p_2 represents the first prediction result of the first detection model T_2, p_K represents the first prediction result of the first detection model T_K, and W′_1, W′_2, ..., W′_K represent the weighting coefficients corresponding to the respective first detection models.
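Formulas (5) and (6) amount to a softmax over K learnable scalars followed by a weighted sum of the teacher predictions; a hedged sketch, with an illustrative K, is given below:

```python
import torch
import torch.nn.functional as F

K = 3                                            # illustrative number of first detection models
W = torch.nn.Parameter(torch.zeros(K))           # learnable weighting parameters {W_1, ..., W_K}

def merge_teacher_predictions(teacher_probs, W):
    W_prime = F.softmax(W, dim=0)                # formula (5): coefficients sum to 1
    stacked = torch.stack(teacher_probs, dim=0)  # shape (K, batch, num_classes)
    p_merge = (W_prime.view(-1, 1, 1) * stacked).sum(dim=0)   # formula (6)
    return p_merge, W_prime
```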
Further, after the supervision data corresponding to each first prediction result is determined, the supervision loss in the training process of the second detection model is determined according to the supervision data and the second prediction result. The supervision loss L_kd in the training process of the second detection model is determined by the following formula (7):
L_kd = CE(p_S, p_merge) (7);
wherein p_S represents the second prediction result corresponding to the second detection model, p_merge represents the supervision data in the training process of the second detection model, and CE(·) represents the cross entropy loss function; that is, the supervision loss L_kd in the training process of the second detection model is obtained by calculating the cross entropy loss value between the second prediction result p_S and the supervision data p_merge.
In one embodiment, when calculating the supervision loss L_kd, other loss functions such as the KL loss (i.e., KL divergence loss) may also be used; the loss calculation method is not limited to the cross entropy loss function.
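A sketch of formula (7), together with the KL-divergence alternative mentioned above, might look like the following; the soft-target cross entropy is written out explicitly, and the function name is illustrative:

```python
import torch
import torch.nn.functional as F

def supervision_loss(student_logits, p_merge, use_kl=False):
    log_p_s = F.log_softmax(student_logits, dim=1)
    if use_kl:
        # alternative: KL(p_merge || p_S), batch-averaged
        return F.kl_div(log_p_s, p_merge, reduction="batchmean")
    # formula (7): L_kd = CE(p_S, p_merge), soft-target cross entropy
    return -(p_merge * log_p_s).sum(dim=1).mean()
```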
Step S606, performing reinforcement training on each first detection model according to each reinforcement product image sample, and determining reinforcement loss in the reinforcement training process.
Specifically, according to each enhanced product image sample, performing enhanced training on each first detection model to obtain model weight data corresponding to each first detection model, and determining target model weight data meeting the screening conditions of the enhanced training from the model weight data. And after determining the weight data of the target model meeting the screening conditions of the reinforcement training, determining a reinforcement detection model matched with the weight data of the target model, and carrying out prediction processing on each reinforcement product image sample according to the reinforcement detection model, thereby obtaining reinforcement prediction results corresponding to each reinforcement detection model, and determining reinforcement loss according to the reinforcement detection results and the second prediction results.
The purpose of performing reinforcement training on each first detection model according to each enhanced product image sample is to use the discrete search capability of reinforcement learning to determine, for different enhanced product image samples, the most suitable reinforcement detection model from the K first detection models, and to use that model to supervise the second detection model during its training, thereby enhancing the selection capability and feature expression capability in the knowledge distillation process.
Further, model weight data corresponding to each first detection model after reinforcement training is determined, specifically W″ = [W″_1, W″_2, ..., W″_i, ..., W″_K], wherein W″_i represents the model weight data corresponding to the i-th first detection model and W″_K represents the model weight data corresponding to the K-th first detection model. The value of W″_i lies in (0, 1); the larger W″_i is, the higher the score of the corresponding first detection model. The screening condition of the reinforcement training can therefore be understood as the need to screen out the first detection model with the largest W″_i, that is, the largest target model weight data is selected by screening, and the first detection model matched with the target model weight data is determined as the reinforcement detection model.
Specifically, the model index ID of the first detection model matched with the target weight data is determined according to the following formula (8):
ID=argmax(W″) (8);
where ID represents the model index of the first detection model that matches the target model weight data, and argmax(W″) selects the largest target model weight data from W″.
Further, after the reinforcement detection model matched with the target model weight data is determined, prediction processing is performed on each enhanced product image sample according to the reinforcement detection model, so that the reinforcement prediction result corresponding to the reinforcement detection model is obtained, and the reinforcement loss is determined according to the reinforcement prediction result and the second prediction result.
Further, the reinforcement loss L_reinf is determined by the following formula (9):
L_reinf = CE(p_S, p_ID) (9);
wherein p_S represents the second prediction result corresponding to the second detection model, and p_ID represents the reinforcement prediction result obtained by performing prediction processing on each enhanced product image sample according to the determined reinforcement detection model. CE(·) represents the cross entropy loss function; that is, the reinforcement loss L_reinf is obtained by calculating the cross entropy loss value between the second prediction result p_S and the reinforcement prediction result p_ID.
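Formulas (8) and (9) can be read as a hard argmax selection over the scores W″ followed by a cross entropy between the student prediction and the selected teacher prediction; the sketch below assumes PyTorch tensors and an illustrative K, and is not the exact implementation of this embodiment:

```python
import torch
import torch.nn.functional as F

W_dd = torch.nn.Parameter(torch.zeros(3))        # W'' = [W''_1, ..., W''_K], illustrative K = 3

def reinforcement_loss(student_logits, teacher_probs, W_dd):
    model_id = int(torch.argmax(W_dd))           # formula (8): ID = argmax(W'')
    p_id = teacher_probs[model_id].detach()      # prediction of the selected reinforcement detection model
    log_p_s = F.log_softmax(student_logits, dim=1)
    l_reinf = -(p_id * log_p_s).sum(dim=1).mean()   # formula (9): L_reinf = CE(p_S, p_ID)
    return l_reinf, p_id, model_id
```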
In one embodiment, when calculating the reinforcement loss L_reinf, other loss functions such as the KL loss (i.e., KL divergence loss) may also be used; the loss calculation method is not limited to the cross entropy loss function.
In one embodiment, reinforcement learning may be understood as an optimization algorithm in the training process, as shown in fig. 8, and an example of a reinforcement learning algorithm in the reinforcement training process is provided, and referring to fig. 8, it is known that in the reinforcement learning algorithm, specifically, according to the current state s, the environment is changed by the action policy (i.e. action a) of the Agent, and a reward parameter r is fed back. The reward parameter r is a score value used for measuring the current state, and the higher the score is, the more correct the Agent's behavior is, i.e. through reinforcement learning, the Agent can learn a behavior strategy that maximizes the reward function r.
Specifically, for the first detection models in the embodiment of the present application, the Agent is the model weight data corresponding to each first detection model after reinforcement training, including W″_1, W″_2, ..., W″_i, ..., W″_K. The behavior strategy of the Agent can be understood as the first detection model specifically selected in the training process and the enhanced image sample applied to that first detection model. When determining the reward parameter r, the fusion loss in the training process of the second detection model is applied; specifically, the negative of the fusion loss in the training process of the second detection model is determined as the reward parameter r.
The reward parameter r is determined by the following formula (10):
r = -L_all (10);
wherein L_all represents the fusion loss in the training process of the second detection model, which is specifically determined by the label loss, the supervision loss, the reinforcement loss, and the difference loss in the training process of the second detection model.
Step S608, determining the reward parameter based on the supervision loss and the reinforcement loss.
The reward parameter is the negative of the fusion loss in the training process of the second detection model, and the fusion loss in the training process of the second detection model is specifically determined by the label loss, the supervision loss, the reinforcement loss, and the difference loss in the training process of the second detection model.
Specifically, based on the second prediction result and the labeling label carried by the enhanced product image sample, determining label loss in the training process of the second detection model, and determining difference loss according to the enhanced prediction result and the supervision data.
The label loss L_sup in the training process of the second detection model is calculated by the following formula (11):
L_sup = CE(p_S, y) (11);
wherein p_S is the second prediction result corresponding to the second detection model, y represents the labeling label (including a normal label and a defect label) carried by the enhanced product image sample, and CE(·) represents the cross entropy loss function; that is, the label loss L_sup in the training process of the second detection model is obtained by calculating the cross entropy loss value between the second prediction result p_S and the labeling label y carried by the enhanced product image sample.
In one embodiment, when calculating the label loss L_sup, other loss functions such as the KL loss (i.e., KL divergence loss) may also be used; the loss calculation method is not limited to the cross entropy loss function.
Similarly, the difference loss L_diff is calculated by the following formula (12):
L_diff = -KL(p_merge || p_ID) (12);
wherein KL(·) represents the KL divergence, p_merge represents the supervision data, and p_ID represents the reinforcement prediction result; -KL(p_merge || p_ID) can be understood as calculating the divergence difference data between the supervision data p_merge and the reinforcement prediction result p_ID, i.e., between the reinforcement learning branch and the softmax function weighting branch in the training process. By setting the difference loss L_diff, the reinforcement learning branch and the softmax function weighting branch are prevented from converging to the same minimum; instead, the difference between the supervision data p_merge and the reinforcement prediction result p_ID is made larger, avoiding p_merge == p_ID and reducing the impact on the model accuracy of the second detection model during training.
Further, the fusion loss is determined according to the label loss, the supervision loss, the first weight corresponding to the supervision loss, the reinforcement loss, the second weight corresponding to the reinforcement loss, the difference loss, and the third weight corresponding to the difference loss; finally, the reward parameter in the reinforcement training process can be determined based on the fusion loss. The fusion loss L_all in the model training process is calculated by the following formula (13):
L_all = L_sup + β_1·L_kd + β_2·L_reinf + β_3·L_diff (13);
wherein L_sup represents the label loss in the training process of the second detection model, L_kd represents the supervision loss in the training process of the second detection model, β_1 represents the first weight corresponding to the supervision loss, L_reinf represents the reinforcement loss, β_2 represents the second weight corresponding to the reinforcement loss, L_diff represents the difference loss, and β_3 represents the third weight corresponding to the difference loss. The values of β_1, β_2, and β_3 can be adjusted and set according to actual application requirements to control the proportion of each loss, and are not limited to specific values.
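Formulas (10)-(13) combine the four losses into the fusion loss and take its negative as the reward; a sketch under the same PyTorch assumption follows, with placeholder values for β_1, β_2, and β_3:

```python
import torch
import torch.nn.functional as F

def fusion_loss_and_reward(student_logits, y, p_merge, p_id, l_kd, l_reinf,
                           beta1=1.0, beta2=1.0, beta3=0.1):
    l_sup = F.cross_entropy(student_logits, y)                         # formula (11)
    l_diff = -F.kl_div(p_id.clamp_min(1e-8).log(), p_merge,            # formula (12): -KL(p_merge || p_ID)
                       reduction="batchmean")
    l_all = l_sup + beta1 * l_kd + beta2 * l_reinf + beta3 * l_diff    # formula (13)
    reward = -l_all.detach()                                           # formula (10): r = -L_all
    return l_all, reward
```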
In one embodiment, by performing gradient calculation on the fusion loss L_all and back-propagating, the model weight θ_S of the second detection model, the weighting coefficients {W′_1, W′_2, ..., W′_i, ..., W′_K}, and the model weight data {W″_1, W″_2, ..., W″_i, ..., W″_K} corresponding to each first detection model after reinforcement training can be updated, so that the different weights or parameters are optimized jointly.
The weighting coefficients {W′_1, W′_2, ..., W′_i, ..., W′_K} and the model weight data {W″_1, W″_2, ..., W″_i, ..., W″_K} corresponding to each first detection model after reinforcement training are continuously updated during training; that is, different reinforcement detection models and target detection models are determined according to different reinforcement training data (i.e., different enhanced product image samples), and the sizes of W′_i and W″_i also directly reflect the contribution degree of the corresponding i-th teacher model, so that the most appropriate reinforcement detection model and target detection model are selected. By utilizing the difference in fitting capability of different first detection models on different enhanced product image samples, the model weight of each first detection model can be adaptively adjusted, so that the second detection model is supervised according to the determined target detection model and the influence of noise labels in the labeling labels on the second detection model during training is reduced.
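The joint optimization described above can be sketched as a single optimizer that holds θ_S, the weighting parameters W, and the reinforcement scores W″ together; the optimizer choice and learning rate are illustrative assumptions:

```python
import torch

def make_joint_optimizer(student, W, W_dd, lr=0.01):
    # theta_S, the weighting parameters W and the reinforcement scores W'' share one optimizer
    return torch.optim.SGD(list(student.parameters()) + [W, W_dd], lr=lr, momentum=0.9)

def joint_step(optimizer, l_all):
    optimizer.zero_grad()
    l_all.backward()     # gradients of the fusion loss L_all reach all three parameter groups
    optimizer.step()
```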
Step S610, determining a target detection model from the first detection models according to the reward parameters, and performing knowledge distillation training on the second detection model according to the target detection model to obtain a trained quality detection model.
Specifically, the maximum rewarding parameter is determined from the rewarding parameters, the target detection model corresponding to the maximum rewarding parameter is determined from the first detection models, and further knowledge distillation training is carried out on the second detection model according to the enhanced product image samples and the target detection models, so that a trained quality detection model is obtained.
Further, the target detection model corresponding to the maximum reward parameter may be understood as the first detection model with the highest score value determined in the reinforcement training process, that is, by using the first detection model with the highest score value, the second detection model is supervised and knowledge distillation trained, and when the training end condition of the second detection model is satisfied, a trained quality detection model is obtained.
The training ending condition of the second detection model may specifically be that the fusion loss in the model training process reaches a preset fusion loss threshold, or that the training iteration number of the second detection model reaches a preset training number, and when the training ending condition of the second detection model is reached, determining the second detection model at the end of training as a trained quality detection model.
In this embodiment, by acquiring the enhanced product image samples, prediction processing is performed on each enhanced product image sample according to a plurality of trained first detection models and second detection models, so as to obtain a first prediction result and a second prediction result. Further, by determining the supervision data corresponding to each first prediction result, the supervision data is utilized to realize the supervision training of the second prediction result of the second detection model, so that the supervision loss in the process of training the second detection model can be determined according to the supervision data and the second prediction result. Meanwhile, according to each enhanced product image sample, strengthening training is carried out on each first detection model, strengthening loss in the strengthening training process is determined, and rewarding parameters are determined based on obtained supervision loss and strengthening loss, so that the most suitable target detection model is determined from each first detection model according to rewarding parameters, the purposes of further weighting and selecting each first detection model and determining the most suitable target detection model by utilizing strengthening training are achieved, knowledge distillation training is carried out on the second detection model according to the target detection model, the second detection model can learn knowledge in the first detection model, dependence on labels marked in the product image sample in advance and noise error data caused by the labels marked in advance are reduced, and therefore a quality detection model with higher model accuracy is obtained, and quality detection accuracy of products by utilizing the quality detection model is improved.
In one embodiment, as shown in fig. 9, there is provided an object quality detection method, which specifically includes the following steps:
step S901, obtaining product image samples, and training an initial detection model according to each product image sample.
Step S902, determining the training loss in the training process of the initial detection model, and if the training loss meets the training ending condition of the initial detection model, obtaining a trained first detection model.
Step S903, obtaining enhanced product image samples, and performing prediction processing on each enhanced product image sample according to the plurality of trained first detection models and the second detection models, so as to obtain a first prediction result corresponding to each first detection model and a second prediction result corresponding to the second detection model.
Step S904, performing an initialization process and a normalization process on the weighting parameters associated with each first detection model, and obtaining each weighting coefficient after the processing.
Step S905, based on each weighting coefficient, weighting processing is sequentially performed on each first prediction result, so as to obtain supervision data in the training process of the second detection model.
Step S906, according to the supervision data and the second prediction result, determining the supervision loss in the training process of the second detection model.
Step S907, performing reinforcement training on each first detection model according to each reinforcement product image sample, and obtaining model weight data corresponding to each first detection model.
Step S908, determining target model weight data satisfying the training screening condition from the model weight data.
Step S909, determining a strengthening detection model matched with the target weight data, and carrying out prediction processing on each strengthening product image sample according to the strengthening detection model to obtain strengthening prediction results corresponding to each strengthening detection model.
Step S910, determining the strengthening loss according to the strengthening detection result and the second prediction result.
Step S911, determining label loss in the training process of the second detection model based on the second prediction result and the labeling label carried by the enhanced product image sample.
Step S912, determining a difference loss according to the reinforcement prediction result and the supervision data.
Step S913, determining a fusion loss according to the label loss, the supervision loss, the first weight corresponding to the supervision loss, the reinforcement loss, the second weight corresponding to the reinforcement loss, the difference loss and the third weight corresponding to the difference loss.
Step S914, determining rewarding parameters in the strengthening training process based on the fusion loss.
Step S915, determining a maximum rewarding parameter from the rewarding parameters, and determining a target detection model corresponding to the maximum rewarding parameter from the first detection models.
Step S916, performing knowledge distillation training on the second detection model according to the image samples of the enhanced products and the target detection model to obtain a trained quality detection model.
Step S917, receiving the object quality detection request, and acquiring a product image to be detected corresponding to the object quality detection request.
And step S918, carrying out quality detection processing on the product image to be detected according to the trained quality detection model, and obtaining defect confidence coefficient data corresponding to the product image to be detected.
In the object quality detection method, the object quality detection request is received to obtain the image of the product to be detected corresponding to the object quality detection request, and then the quality detection processing is carried out on the image of the product to be detected according to the trained quality detection model to obtain the defect confidence coefficient data corresponding to the image of the product to be detected. The trained quality detection model is obtained by carrying out knowledge distillation training on the second detection model according to the target detection model, the target detection model is obtained by determining from each first detection model according to reward parameters, the purpose of further weighting and selecting each first detection model by utilizing reinforcement training is achieved, the most suitable target detection model is determined, the reward parameters are obtained by determining the supervision loss in the training process of the second detection model and the reinforcement loss in the reinforcement training process of each first detection model according to the reinforced product image sample, and the supervision loss is obtained by determining according to the supervision data corresponding to the first prediction result and the second prediction result. The second detection model is subjected to knowledge distillation training according to the target model, so that the second detection model can learn knowledge in the first detection model, dependence on a pre-labeled label in a product image sample and noise error data brought by the pre-labeled label are reduced, a quality detection model with higher model accuracy is obtained, and quality detection accuracy of products by using the quality detection model is improved.
In one embodiment, as shown in fig. 10, a method for constructing a quality detection model is provided, and the method is applied to the server in fig. 1 for illustration, and specifically includes the following steps:
step S1002, obtaining enhanced product image samples, and performing prediction processing on each enhanced product image sample according to a plurality of trained first detection models and second detection models, so as to obtain a first prediction result corresponding to each first detection model and a second prediction result corresponding to the second detection model.
Specifically, the enhanced product image samples after the data enhancement processing under different visual angles are obtained by acquiring product image samples and performing data enhancement processing on each product image sample, wherein the data enhancement processing includes different processing modes such as rotation, translation, scaling, clipping, gaussian noise addition and the like on the product image.
Further, according to the plurality of trained first detection models, the first prediction results corresponding to the first detection models can be obtained by performing prediction processing on the enhanced product image samples, and similarly, the second prediction results corresponding to the second detection models can be obtained by performing prediction processing on the enhanced product image samples according to the second detection models.
In one embodiment, the means for obtaining a trained first detection model comprises:
acquiring product image samples, and training an initial detection model according to each product image sample; determining training loss in the initial detection model training process; and if the training loss meets the training ending condition of the initial detection model, obtaining a trained first detection model.
Specifically, product images corresponding to industrial products or electronic products are obtained, and each product image is labeled in advance, namely a labeling label comprising a defect label and a normal label is added to the product image, so that a plurality of product image samples are obtained, and the initial detection model is trained according to each product image sample carrying the labeling label, so that a trained first detection model is obtained.
Step S1004, determining supervision data corresponding to each first prediction result, and determining a supervision loss in the training process of the second detection model according to the supervision data and the second prediction result.
Specifically, the weighting parameters associated with each first detection model are initialized and normalized to obtain the processed weighting coefficients, and then the weighting processing is sequentially performed on each first prediction result based on the weighting coefficients to obtain the supervision data in the training process of the second detection model, and after the supervision data corresponding to each first prediction result is determined, the supervision loss in the training process of the second detection model is further determined according to the supervision data and the second prediction result.
Step S1006, performing reinforcement training on each first detection model according to each reinforcement product image sample, and determining reinforcement loss in the reinforcement training process.
Specifically, according to each enhanced product image sample, performing enhanced training on each first detection model, obtaining model weight data corresponding to the enhanced trained first detection model, and determining target model weight data meeting the screening conditions of the enhanced training from the model weight data. And after determining the weight data of the target model meeting the screening conditions of the reinforcement training, determining a reinforcement detection model matched with the weight data of the target model, and carrying out prediction processing on each reinforcement product image sample according to the reinforcement detection model, thereby obtaining reinforcement prediction results corresponding to each reinforcement detection model, and determining reinforcement loss according to the reinforcement detection results and the second prediction results.
Step S1008, determining a reward parameter based on the supervising and strengthening losses.
The reward parameter is the negative of the fusion loss in the training process of the second detection model, and the fusion loss in the training process of the second detection model is specifically determined by the label loss, the supervision loss, the reinforcement loss, and the difference loss in the training process of the second detection model.
Specifically, based on the second prediction result and the labeling label carried by the enhanced product image sample, determining label loss in the training process of the second detection model, and determining difference loss according to the enhanced prediction result and the supervision data. Further, according to the label loss, the supervision loss, the first weight corresponding to the supervision loss, the reinforcement loss, the second weight corresponding to the reinforcement loss, the difference loss and the third weight corresponding to the difference loss, determining fusion loss, and finally determining the rewarding parameter in the reinforcement training process based on the fusion loss.
And step S1010, determining a target detection model from the first detection models according to the rewarding parameters, and carrying out knowledge distillation training on the second detection model according to the target detection model to obtain a trained quality detection model.
Specifically, the maximum rewarding parameter is determined from the rewarding parameters, the target detection model corresponding to the maximum rewarding parameter is determined from the first detection models, and further knowledge distillation training is carried out on the second detection model according to the enhanced product image samples and the target detection models, so that a trained quality detection model is obtained.
In one embodiment, as shown in fig. 11, there is provided an example of a process for obtaining a quality inspection model, and as can be seen with reference to fig. 11, the process for obtaining the quality inspection model specifically includes the following stages:
1. Primary training stage
The primary training stage is used for training a plurality of initial detection models, and K first detection models are obtained through training.
Specifically, product images corresponding to industrial products or electronic products are obtained, and each product image is labeled in advance, namely labeling labels including defect labels and normal labels are added to the product images, so that a plurality of product image samples are obtained, and accordingly initial detection models are respectively trained according to different product image samples carrying the labeling labels, and K trained first detection models are obtained.
As can be seen from fig. 11, the product image samples are (x, y); the initial detection models are trained with the product image samples to obtain K first detection models (including T_1, T_2, ..., T_K) and the model weights of each first detection model.
2. Secondary training phase
In the secondary training stage, the first prediction results of the trained first detection models for the enhanced product image samples (including P_1, P_2, ..., P_K) and the second prediction result of the second detection model for the enhanced product image samples (i.e., P_S) need to be obtained. A set of weighting parameters {W_1, W_2, ..., W_K} is designed in the training process of the second detection model; the weighting parameters are initialized and normalized to obtain the weighting coefficients {W′_1, W′_2, ..., W′_K}, and the first prediction results of the first detection models are weighted according to the weighting coefficients {W′_1, W′_2, ..., W′_K} so as to construct the supervision data in the training process of the second detection model. The supervision loss in the training process of the second detection model is then determined according to the supervision data and the second prediction result.
Specifically, for the product image samples (x, y), enhancement processing such as rotation, translation, scaling, clipping, gaussian noise addition and the like is performed to obtain enhanced product image samples (x', y), prediction processing is performed on each enhanced product image sample by using the trained K first detection models to obtain a first prediction result corresponding to each first detection model, and prediction processing is performed on each enhanced product image sample by using the second detection model to obtain a second prediction result corresponding to the second detection model.
Similarly, in the secondary training stage, it is also necessary to perform reinforcement training on each first detection model according to each reinforcement product image sample, and determine reinforcement loss in the reinforcement training process.
And performing reinforcement training on each first detection model according to each reinforcement product image sample to obtain model weight data corresponding to each first detection model, and determining target model weight data meeting reinforcement training screening conditions from the model weight data. And after determining the weight data of the target model meeting the screening conditions of the reinforcement training, determining a reinforcement detection model matched with the weight data of the target model, and carrying out prediction processing on each reinforcement product image sample according to the reinforcement detection model, thereby obtaining reinforcement prediction results corresponding to each reinforcement detection model, and determining reinforcement loss according to the reinforcement detection results and the second prediction results.
In one embodiment, reinforcement learning may be understood as an optimization algorithm in the training process, in which the environment is changed by the behavior policy of the Agent (i.e., action a) according to the current state s, and a reward parameter r is fed back. The reward parameter r is a score value used for measuring the current state, and the higher the score is, the more correct the Agent's behavior is, i.e. through reinforcement learning, the Agent can learn a behavior strategy that maximizes the reward function r.
Specifically, for the first detection models in the embodiment of the present application, the Agent is the model weight data corresponding to each first detection model after reinforcement training, including W″_1, W″_2, ..., W″_i, ..., W″_K. The behavior strategy of the Agent can be understood as the first detection model specifically selected in the training process and the enhanced image sample applied to that first detection model. When determining the reward parameter r, the fusion loss in the training process of the second detection model is applied; specifically, the negative of the fusion loss in the training process of the second detection model is determined as the reward parameter r.
In the second training stage, after the supervision loss and the reinforcement loss are determined, the label loss in the training process of the second detection model is determined based on the second prediction result and the labeling label carried by the enhanced product image sample, and the difference loss is determined according to the reinforcement prediction result and the supervision data, so that the fusion loss can be determined according to the label loss, the supervision loss, the first weight corresponding to the supervision loss, the reinforcement loss, the second weight corresponding to the reinforcement loss, the difference loss and the third weight corresponding to the difference loss, and finally the reward parameter in the reinforcement training process can be determined based on the fusion loss.
Further, the maximum rewarding parameter is determined from the rewarding parameters, the target detection model corresponding to the maximum rewarding parameter is determined from the first detection models, and further knowledge distillation training is conducted on the second detection model according to the enhanced product image samples and the target detection models, so that a trained quality detection model is obtained.
3. Test phase
Specifically, in the testing stage, feature extraction and probability prediction are performed on each input test product image by using a trained quality detection model, corresponding predicted defect confidence coefficient data are obtained, the test defect confidence coefficient data and a preset defect confidence coefficient threshold value are compared, and the category to which the test product image belongs is obtained according to the comparison result.
The test product images are also pre-labeled, that is, labeling labels including defect labels and normal labels are added to the test product images in advance: the test product images carrying defect labels are classified as defect images, and the test product images carrying normal labels are classified as normal images. The test defect confidence coefficient data is compared with the preset defect confidence coefficient threshold, and the comparison result is either that the test defect confidence coefficient data is larger than the preset defect confidence coefficient threshold or that it is smaller than the preset defect confidence coefficient threshold. When the test defect confidence coefficient data is larger than the preset defect confidence coefficient threshold, the test result is that the test product image belongs to a defect product; when the test defect confidence coefficient data is smaller than the preset defect confidence coefficient threshold, the test result is that the test product image belongs to a normal product.
Further, according to the label tag added in advance and the category to which the test product image determined by the prediction result belongs, the accuracy of the test result is judged, when the accuracy of the test result reaches a preset accuracy threshold, the quality detection model obtained by current training is indicated to accord with the actual application requirement, and the quality detection links of actual industrial products, electronic products and the like can be put into, so that the quality detection accuracy of the products and the classification accuracy of the products with defects are improved.
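The test-stage comparison described above might be sketched as follows, assuming a PyTorch model with a two-class (normal/defect) output; the confidence threshold and accuracy threshold are placeholders for the preset values mentioned in the text:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def evaluate(quality_model, test_loader, conf_threshold=0.5, acc_threshold=0.95):
    correct, total = 0, 0
    for x, y in test_loader:                            # y: 1 = defect label, 0 = normal label
        probs = F.softmax(quality_model(x), dim=1)
        defect_conf = probs[:, 1]                       # test defect confidence coefficient data
        pred = (defect_conf > conf_threshold).long()    # above the threshold -> defect product
        correct += (pred == y).sum().item()
        total += y.numel()
    accuracy = correct / max(total, 1)
    return accuracy, accuracy >= acc_threshold          # True -> meets the application requirement
```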
In the method for constructing the quality detection model, the first prediction result and the second prediction result are obtained by obtaining the enhanced product image samples and performing prediction processing on each enhanced product image sample according to the plurality of trained first detection models and the second detection models. Further, by determining the supervision data corresponding to each first prediction result, the supervision data is utilized to realize the supervision training of the second prediction result of the second detection model, so that the supervision loss in the process of training the second detection model can be determined according to the supervision data and the second prediction result. Meanwhile, according to each enhanced product image sample, strengthening training is carried out on each first detection model, strengthening loss in the strengthening training process is determined, and rewarding parameters are determined based on obtained supervision loss and strengthening loss, so that the most suitable target detection model is determined from each first detection model according to rewarding parameters, the purposes of further weighting and selecting each first detection model and determining the most suitable target detection model by utilizing strengthening training are achieved, knowledge distillation training is carried out on the second detection model according to the target detection model, the second detection model can learn knowledge in the first detection model, dependence on labels marked in the product image sample in advance and noise error data caused by the labels marked in advance are reduced, and therefore a quality detection model with higher model accuracy is obtained, and quality detection accuracy of products by utilizing the quality detection model is improved.
In one embodiment, a method for constructing a quality detection model is provided, comprising the following steps:
and acquiring product image samples, and training an initial detection model according to each product image sample.
Determining the training loss in the training process of the initial detection model, and obtaining a trained first detection model if the training loss meets the training ending condition of the initial detection model.
And obtaining enhanced product image samples, and carrying out prediction processing on each enhanced product image sample according to a plurality of trained first detection models and second detection models to obtain a first prediction result corresponding to each first detection model and a second prediction result corresponding to the second detection model.
And carrying out initialization processing and normalization processing on the weighting parameters associated with each first detection model to obtain each processed weighting coefficient.
And based on each weighting coefficient, sequentially carrying out weighting processing on each first prediction result to obtain supervision data in the training process of the second detection model.
And determining the supervision loss in the training process of the second detection model according to the supervision data and the second prediction result.
And performing reinforcement training on each first detection model according to each reinforcement product image sample to obtain model weight data corresponding to each first detection model.
And determining target model weight data meeting the screening conditions of the intensive training from the model weight data.
And determining a strengthening detection model matched with the target weight data, and carrying out prediction processing on each strengthening product image sample according to the strengthening detection model to obtain strengthening prediction results corresponding to each strengthening detection model.
And determining the strengthening loss according to the strengthening detection result and the second prediction result.
And determining label loss in the training process of the second detection model based on the second prediction result and the labeling labels carried by the enhanced product image samples.
And determining the difference loss according to the reinforced prediction result and the supervision data.
Determining a fusion loss according to the label loss, the supervision loss, the first weight corresponding to the supervision loss, the reinforcement loss, the second weight corresponding to the reinforcement loss, the difference loss and the third weight corresponding to the difference loss.
Based on the fusion loss, a reward parameter in the reinforcement training process is determined.
And determining a maximum rewarding parameter from the rewarding parameters, and determining a target detection model corresponding to the maximum rewarding parameter from the first detection models.
And carrying out knowledge distillation training on the second detection model according to the image samples of the enhanced products and the target detection model to obtain a trained quality detection model.
In the method for constructing the quality detection model, the first prediction result and the second prediction result are obtained by obtaining the enhanced product image samples and performing prediction processing on each enhanced product image sample according to the plurality of trained first detection models and the second detection models. Further, by determining the supervision data corresponding to each first prediction result, the supervision data is utilized to realize the supervision training of the second prediction result of the second detection model, so that the supervision loss in the process of training the second detection model can be determined according to the supervision data and the second prediction result. Meanwhile, according to each enhanced product image sample, strengthening training is carried out on each first detection model, strengthening loss in the strengthening training process is determined, and rewarding parameters are determined based on obtained supervision loss and strengthening loss, so that the most suitable target detection model is determined from each first detection model according to rewarding parameters, the purposes of further weighting and selecting each first detection model and determining the most suitable target detection model by utilizing strengthening training are achieved, knowledge distillation training is carried out on the second detection model according to the target detection model, the second detection model can learn knowledge in the first detection model, dependence on labels marked in the product image sample in advance and noise error data caused by the labels marked in advance are reduced, and therefore a quality detection model with higher model accuracy is obtained, and quality detection accuracy of products by utilizing the quality detection model is improved.
It should be understood that, although the steps in the flowcharts related to the above embodiments are sequentially shown as indicated by arrows, these steps are not necessarily sequentially performed in the order indicated by the arrows. The steps are not strictly limited to the order of execution unless explicitly recited herein, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts described in the above embodiments may include a plurality of steps or a plurality of stages, which are not necessarily performed at the same time, but may be performed at different times, and the order of the steps or stages is not necessarily performed sequentially, but may be performed alternately or alternately with at least some of the other steps or stages.
Based on the same inventive concept, the embodiment of the application also provides an object quality detection device for realizing the above related object quality detection method and a quality detection model construction device for realizing the quality detection model construction method. The implementation scheme of the device for solving the problem is similar to the implementation scheme described in the above method, so the specific limitation in the embodiments of the device for constructing one or more object quality detection devices and quality detection models provided below can be referred to the above limitation of the object quality detection method and the quality detection model, which are not described herein.
In one embodiment, as shown in fig. 12, there is provided an object quality detecting apparatus including: a product image to be inspected acquisition module 1202, and a defect confidence data acquisition module 1204, wherein:
the to-be-detected product image obtaining module 1202 is configured to receive the object quality detection request, and obtain a to-be-detected product image corresponding to the object quality detection request.
The defect confidence coefficient data obtaining module 1204 is configured to perform quality detection processing on the product image to be detected according to the trained quality detection model, so as to obtain defect confidence coefficient data corresponding to the product image to be detected. The trained quality detection model is obtained by performing knowledge distillation training on the second detection model according to the target detection model; the target detection model is determined from each first detection model according to rewarding parameters, and the rewarding parameters are determined according to supervision loss in the training process of the second detection model and strengthening loss in the strengthening training process of each first detection model according to the image sample of the enhanced product; the supervision loss is determined according to supervision data corresponding to a first prediction result obtained by performing prediction processing on each enhanced product image sample according to a plurality of trained first detection models, and a second prediction result obtained by performing prediction processing on each enhanced product image sample according to a second detection model.
In the object quality detection device, the object quality detection request is received to obtain the image of the product to be detected corresponding to the object quality detection request, and then the quality detection processing is carried out on the image of the product to be detected according to the trained quality detection model to obtain the defect confidence coefficient data corresponding to the image of the product to be detected. The trained quality detection model is obtained by carrying out knowledge distillation training on the second detection model according to the target detection model, the target detection model is obtained by determining from each first detection model according to reward parameters, the purpose of further weighting and selecting each first detection model by utilizing reinforcement training is achieved, the most suitable target detection model is determined, the reward parameters are obtained by determining the supervision loss in the training process of the second detection model and the reinforcement loss in the reinforcement training process of each first detection model according to the reinforced product image sample, and the supervision loss is obtained by determining according to the supervision data corresponding to the first prediction result and the second prediction result. The second detection model is subjected to knowledge distillation training according to the target model, so that the second detection model can learn knowledge in the first detection model, dependence on a pre-labeled label in a product image sample and noise error data brought by the pre-labeled label are reduced, a quality detection model with higher model accuracy is obtained, and quality detection accuracy of products by using the quality detection model is improved.
In one embodiment, there is provided an object quality detection apparatus, further comprising a quality detection model training module, comprising:
the prediction result obtaining module is used for obtaining the enhanced product image samples, and carrying out prediction processing on the enhanced product image samples according to the plurality of trained first detection models and the second detection models to obtain first prediction results corresponding to the first detection models and second prediction results corresponding to the second detection models;
the monitoring loss determining module is used for determining monitoring data corresponding to each first prediction result and determining monitoring loss in the training process of the second detection model according to the monitoring data and the second prediction result;
the strengthening loss determining module is used for carrying out strengthening training on each first detection model according to each strengthening product image sample and determining strengthening loss in the strengthening training process;
a reward parameter determination module for determining a reward parameter based on the supervised and enhanced losses;
and the quality detection model obtaining module is used for determining a target detection model from the first detection models according to the rewarding parameters, and carrying out knowledge distillation training on the second detection model according to the target detection model to obtain a trained quality detection model.
In one embodiment, the reinforcement loss determination module is further configured to:
performing reinforcement training on each first detection model according to each enhanced product image sample, to obtain model weight data corresponding to each first detection model; determining, from the model weight data, target model weight data that meets the screening condition of the reinforcement training; determining the reinforcement detection model matched with the target model weight data, and performing prediction processing on each enhanced product image sample according to the reinforcement detection model, to obtain a reinforcement prediction result corresponding to each reinforcement detection model; and determining the reinforcement loss according to the reinforcement prediction results and the second prediction result.
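A hedged sketch of this step follows. It assumes each first detection model's parameters serve as its "model weight data", treats reinforcement training as a few extra optimisation steps on the enhanced samples, and uses "lowest post-training loss" as an illustrative screening condition; the patent does not fix these choices.

```python
import copy
import torch
import torch.nn.functional as F

def reinforcement_loss(first_models, student, enhanced_loader, steps=1, lr=1e-4):
    candidates = []
    for model in first_models:
        tuned = copy.deepcopy(model)                      # reinforcement-train a copy
        opt = torch.optim.SGD(tuned.parameters(), lr=lr)
        last = None
        for _ in range(steps):
            for images, labels in enhanced_loader:
                loss = F.cross_entropy(tuned(images), labels)
                opt.zero_grad(); loss.backward(); opt.step()
                last = loss.item()
        candidates.append((last, tuned))                  # (score, model weight data)
    # screening condition (assumed): keep the reinforcement-trained model with the lowest loss
    _, reinforcement_model = min(candidates, key=lambda c: c[0])
    total, batches = 0.0, 0
    for images, _ in enhanced_loader:
        with torch.no_grad():
            reinforcement_pred = torch.softmax(reinforcement_model(images), dim=1)
            second_pred = torch.softmax(student(images), dim=1)
        total += F.mse_loss(reinforcement_pred, second_pred).item(); batches += 1
    return total / max(batches, 1)
```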
In one embodiment, the reward parameter determination module is further configured to:
determining a label loss in the training process of the second detection model based on the second prediction result and the annotation labels carried by the enhanced product image samples; determining a difference loss according to the reinforcement prediction results and the supervision data; determining a fusion loss according to the label loss, the supervision loss and its first weight, the reinforcement loss and its second weight, and the difference loss and its third weight; and determining the reward parameter in the reinforcement training process based on the fusion loss.
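A minimal sketch of the reward computation, assuming scalar loss values and hypothetical weights `w1`–`w3`; the patent does not fix the exact fusion formula, so a weighted sum with a negated reward is used here as one plausible reading.

```python
def reward_parameter(label_loss, supervision_loss, reinforcement_loss, difference_loss,
                     w1=1.0, w2=1.0, w3=1.0):
    # fusion loss: label loss plus the three weighted loss terms
    fusion_loss = (label_loss
                   + w1 * supervision_loss
                   + w2 * reinforcement_loss
                   + w3 * difference_loss)
    return -fusion_loss  # smaller fused loss -> larger reward


# example usage (illustrative numbers only)
reward = reward_parameter(0.40, 0.32, 0.25, 0.10, w1=0.5, w2=0.3, w3=0.2)
```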
In one embodiment, the supervision loss determination module is further configured to:
initialize and normalize the weighting parameters associated with each first detection model to obtain the processed weighting coefficients; and, based on the weighting coefficients, perform weighting processing on the first prediction results in turn to obtain the supervision data used in the training process of the second detection model.
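A sketch of this supervision-data step: per-teacher weighting parameters are initialised, normalised to sum to one, and used to average the first prediction results. Uniform initialisation is an assumption here; learnable parameters normalised with a softmax would serve equally well.

```python
from typing import List
import torch

def supervision_data(first_predictions: List[torch.Tensor]) -> torch.Tensor:
    weights = torch.ones(len(first_predictions))           # initialisation (assumed uniform)
    weights = weights / weights.sum()                       # normalisation
    stacked = torch.stack(first_predictions)                # (num_models, batch, num_classes)
    return (weights.view(-1, 1, 1) * stacked).sum(dim=0)    # weighted supervision data
```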
In one embodiment, the quality detection model obtaining module is further configured to:
determining the maximum reward parameter among the reward parameters, and determining the target detection model corresponding to the maximum reward parameter from the first detection models; and performing knowledge distillation training on the second detection model according to each enhanced product image sample and the target detection model, to obtain the trained quality detection model.
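A hedged distillation sketch of this step: the first detection model with the largest reward parameter is taken as the target (teacher), and the second detection model (student) is trained on the enhanced samples with a KL term against the teacher plus the ordinary label term. The temperature `T` and the specific loss combination are assumptions, not fixed by the patent.

```python
import torch
import torch.nn.functional as F

def distill(first_models, rewards, student, enhanced_loader, epochs=3, lr=1e-3, T=2.0):
    # target detection model = first detection model with the maximum reward parameter
    target = first_models[max(range(len(rewards)), key=rewards.__getitem__)]
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(epochs):
        for images, labels in enhanced_loader:
            with torch.no_grad():
                soft = torch.softmax(target(images) / T, dim=1)    # teacher soft targets
            logits = student(images)
            kd = F.kl_div(F.log_softmax(logits / T, dim=1), soft,
                          reduction="batchmean") * T * T
            loss = kd + F.cross_entropy(logits, labels)            # distillation + label loss
            opt.zero_grad(); loss.backward(); opt.step()
    return student
```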
In one embodiment, the object quality detection apparatus further includes a first detection model obtaining module configured to:
acquire product image samples, and train an initial detection model according to each product image sample; determine the training loss in the training process of the initial detection model; and, if the training loss meets the training end condition of the initial detection model, obtain a trained first detection model.
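A sketch of obtaining one trained first detection model: standard supervised training on the product image samples, stopping once the training loss meets an assumed end condition (here, an average-loss threshold).

```python
import torch
import torch.nn.functional as F

def train_first_model(model, sample_loader, max_epochs=50, lr=1e-3, loss_threshold=0.05):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(max_epochs):
        epoch_loss, batches = 0.0, 0
        for images, labels in sample_loader:
            loss = F.cross_entropy(model(images), labels)   # training loss
            opt.zero_grad(); loss.backward(); opt.step()
            epoch_loss += loss.item(); batches += 1
        if epoch_loss / max(batches, 1) <= loss_threshold:  # training-end condition (assumed)
            break
    return model
```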
In one embodiment, as shown in fig. 13, there is provided a device for constructing a quality detection model, including: a prediction result obtaining module 1302, a supervision loss determination module 1304, a reinforcement loss determination module 1306, a reward parameter determination module 1308, and a quality detection model obtaining module 1310, wherein:
the prediction result obtaining module 1302 is configured to obtain enhanced product image samples, and perform prediction processing on each enhanced product image sample according to the plurality of trained first detection models and the second detection models, so as to obtain a first prediction result corresponding to each first detection model and a second prediction result corresponding to the second detection model.
The supervision loss determination module 1304 is configured to determine supervision data corresponding to each first prediction result, and determine a supervision loss during training of the second detection model according to the supervision data and the second prediction result.
The reinforcement loss determination module 1306 is configured to perform reinforcement training on each first detection model according to each enhanced product image sample, and to determine the reinforcement loss in the reinforcement training process.
The reward parameter determination module 1308 is configured to determine the reward parameters based on the supervision loss and the reinforcement loss.
The quality detection model obtaining module 1310 is configured to determine a target detection model from the first detection models according to the reward parameters, and to perform knowledge distillation training on the second detection model according to the target detection model, so as to obtain the trained quality detection model.
With the above device for constructing a quality detection model, enhanced product image samples are obtained and prediction processing is performed on each enhanced product image sample according to the plurality of trained first detection models and the second detection model, so as to obtain the first prediction results and the second prediction result. By determining the supervision data corresponding to each first prediction result, the supervision data is used to supervise the second prediction result of the second detection model, so that the supervision loss in the training process of the second detection model can be determined according to the supervision data and the second prediction result. Meanwhile, reinforcement training is performed on each first detection model according to each enhanced product image sample, the reinforcement loss in the reinforcement training process is determined, and the reward parameters are determined based on the obtained supervision loss and reinforcement loss, so that the most suitable target detection model is determined from the first detection models according to the reward parameters; reinforcement training is thus used to further weight and screen the first detection models. Knowledge distillation training is then performed on the second detection model according to the target detection model, so that the second detection model can learn the knowledge contained in the first detection models, which reduces the dependence on labels pre-annotated in the product image samples and the noise introduced by those labels, yields a quality detection model with higher accuracy, and thereby improves the accuracy of product quality detection performed with that model.
The respective modules in the above object quality detection apparatus and quality detection model constructing apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, a processor of the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can invoke them and execute the operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a server, and whose internal structure may be as shown in fig. 14. The computer device includes a processor, a memory, an input/output interface (I/O) and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used for storing data such as product images to be detected, quality detection models, defect confidence data, target detection models, first detection models, second detection models, reward parameters, supervision losses, reinforcement losses, first prediction results, second prediction results, supervision data and enhanced product image samples. The input/output interface of the computer device is used to exchange information between the processor and an external device. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements an object quality detection method and a method of constructing a quality detection model.
It will be appreciated by those skilled in the art that the structure shown in fig. 14 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In an embodiment, there is also provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of the method embodiments described above.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
It should be noted that the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) involved in the present application are information and data authorized by the user or fully authorized by all parties, and the collection, use and processing of the related data must comply with the relevant laws, regulations and standards of the relevant countries and regions.
Those skilled in the art will appreciate that all or part of the processes in the above method embodiments may be implemented by instructing the relevant hardware through a computer program, which may be stored in a non-transitory computer-readable storage medium and which, when executed, may include the processes of the above method embodiments. Any reference to memory, database or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. The volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. Non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processors referred to in the embodiments provided herein may be, but are not limited to, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, data processing logic units based on quantum computing, and the like.
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of these technical features are described; however, as long as there is no contradiction in a combination of technical features, it should be considered to be within the scope of this specification.
The foregoing examples represent only a few embodiments of the application and are described in relative detail, but they should not be construed as limiting the scope of the application. It should be noted that several variations and modifications can be made by those skilled in the art without departing from the concept of the application, and these all fall within the scope of protection of the application. Therefore, the scope of protection of the application shall be subject to the appended claims.

Claims (13)

1. A method of object quality detection, the method comprising:
receiving an object quality detection request, and acquiring a product image to be detected corresponding to the object quality detection request;
performing quality detection processing on the product image to be detected according to the trained quality detection model to obtain defect confidence data corresponding to the product image to be detected;
The trained quality detection model is obtained by performing knowledge distillation training on a second detection model according to a target detection model; the target detection model is determined from the first detection models according to reward parameters, and the reward parameters are determined according to a supervision loss in the training process of the second detection model and a reinforcement loss in the process of performing reinforcement training on the first detection models according to enhanced product image samples; the supervision loss is determined according to supervision data corresponding to first prediction results and according to a second prediction result, the first prediction results being obtained by performing prediction processing on each enhanced product image sample according to a plurality of trained first detection models, and the second prediction result being obtained by performing prediction processing on each enhanced product image sample according to the second detection model.
2. The method of claim 1, wherein the means for obtaining a trained quality inspection model comprises:
obtaining enhanced product image samples, and performing prediction processing on each enhanced product image sample according to a plurality of trained first detection models and a second detection model to obtain a first prediction result corresponding to each first detection model and a second prediction result corresponding to the second detection model;
Determining supervision data corresponding to each first prediction result, and determining supervision loss in the training process of the second detection model according to the supervision data and the second prediction result;
performing reinforcement training on each first detection model according to each enhanced product image sample, and determining a reinforcement loss in the reinforcement training process;
determining a reward parameter based on the supervision loss and the reinforcement loss;
and determining a target detection model from each first detection model according to the reward parameters, and carrying out knowledge distillation training on the second detection model according to the target detection model to obtain a trained quality detection model.
3. The method of claim 2, wherein the performing reinforcement training on each of the first detection models according to each of the enhanced product image samples and determining the reinforcement loss in the reinforcement training process comprises:

performing reinforcement training on each first detection model according to each enhanced product image sample, to obtain model weight data corresponding to each first detection model after reinforcement training;

determining, from the model weight data, target model weight data meeting a screening condition of the reinforcement training;

determining a reinforcement detection model matched with the target model weight data, and performing prediction processing on each enhanced product image sample according to the reinforcement detection model, to obtain a reinforcement prediction result corresponding to each reinforcement detection model;

and determining the reinforcement loss according to the reinforcement prediction results and the second prediction result.
4. A method according to claim 3, wherein said determining a reward parameter based on said supervision loss and said reinforcement loss comprises:
determining a label loss in the training process of the second detection model based on the second prediction result and annotation labels carried by the enhanced product image samples;

determining a difference loss according to the reinforcement prediction results and the supervision data;

determining a fusion loss according to the label loss, the supervision loss and a first weight corresponding to the supervision loss, the reinforcement loss and a second weight corresponding to the reinforcement loss, and the difference loss and a third weight corresponding to the difference loss;

and determining the reward parameter in the reinforcement training process based on the fusion loss.
5. The method of any one of claims 2 to 4, wherein determining the supervision data corresponding to each of the first predictors comprises:
Carrying out initialization processing and normalization processing on the weighting parameters associated with each first detection model to obtain each processed weighting coefficient;
and based on the weighting coefficients, sequentially carrying out weighting processing on the first prediction results to obtain supervision data in the training process of the second detection model.
6. The method according to any one of claims 2 to 4, wherein determining a target detection model from each of the first detection models according to the reward parameter, and performing knowledge distillation training on the second detection model according to the target detection model, to obtain a trained quality detection model, includes:
determining a maximum reward parameter from the reward parameters, and determining a target detection model corresponding to the maximum reward parameter from the first detection models;
and carrying out knowledge distillation training on the second detection model according to each enhanced product image sample and the target detection model to obtain a trained quality detection model.
7. The method according to any one of claims 1 to 4, wherein the means for obtaining the trained first detection model comprises:
Acquiring product image samples, and training an initial detection model according to each product image sample;
determining a training loss in the initial detection model training process;
and if the training loss meets the training ending condition of the initial detection model, obtaining a trained first detection model.
8. A method of constructing a quality detection model, the method comprising:
obtaining enhanced product image samples, and performing prediction processing on each enhanced product image sample according to a plurality of trained first detection models and a second detection model to obtain a first prediction result corresponding to each first detection model and a second prediction result corresponding to the second detection model;
determining supervision data corresponding to each first prediction result, and determining supervision loss in the training process of the second detection model according to the supervision data and the second prediction result;
performing reinforcement training on each first detection model according to each enhanced product image sample, and determining a reinforcement loss in the reinforcement training process;
determining a reward parameter based on the supervision loss and the reinforcement loss;
And determining a target detection model from each first detection model according to the reward parameters, and carrying out knowledge distillation training on the second detection model according to the target detection model to obtain a trained quality detection model.
9. An object quality detection apparatus, the apparatus comprising:
the to-be-detected product image obtaining module is used for receiving an object quality detection request and obtaining a to-be-detected product image corresponding to the object quality detection request;
the defect confidence data obtaining module is used for performing quality detection processing on the product image to be detected according to the trained quality detection model to obtain defect confidence data corresponding to the product image to be detected;
the trained quality detection model is obtained by performing knowledge distillation training on a second detection model according to a target detection model; the target detection model is determined from the first detection models according to reward parameters, and the reward parameters are determined according to a supervision loss in the training process of the second detection model and a reinforcement loss in the process of performing reinforcement training on the first detection models according to enhanced product image samples; the supervision loss is determined according to supervision data corresponding to first prediction results and according to a second prediction result, the first prediction results being obtained by performing prediction processing on each enhanced product image sample according to a plurality of trained first detection models, and the second prediction result being obtained by performing prediction processing on each enhanced product image sample according to the second detection model.
10. A device for constructing a quality detection model, the device comprising:
the prediction result obtaining module is used for obtaining enhanced product image samples, and performing prediction processing on each enhanced product image sample according to a plurality of trained first detection models and a second detection model, to obtain a first prediction result corresponding to each first detection model and a second prediction result corresponding to the second detection model;
the supervision loss determination module is used for determining supervision data corresponding to each first prediction result, and determining a supervision loss in the training process of the second detection model according to the supervision data and the second prediction result;
the reinforcement loss determination module is used for performing reinforcement training on each first detection model according to each enhanced product image sample, and determining a reinforcement loss in the reinforcement training process;
a reward parameter determination module for determining a reward parameter based on the supervision loss and the reinforcement loss;
and the quality detection model obtaining module is used for determining a target detection model from the first detection models according to the reward parameters, and carrying out knowledge distillation training on the second detection model according to the target detection model to obtain a trained quality detection model.
11. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 8 when the computer program is executed.
12. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 8.
13. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 8.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination