CN117992898B - Training method of anomaly detection model, object anomaly detection method and device

Info

Publication number: CN117992898B
Application number: CN202410405801.XA
Authority: CN (China)
Prior art keywords: sample, image, target object, text, target
Legal status: Active (application granted)
Other versions: CN117992898A (Chinese)
Inventors: 汪铖杰, 吴运声, 马利庄, 樊珂, 甘振业, 张江宁, 高斌斌, 彭瑾龙, 刘永, 吴永坚, 黄小明
Current Assignee: Tencent Technology Shenzhen Co Ltd
Original Assignee: Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd; priority to CN202410405801.XA

Classifications

    • Y: General tagging of new technological developments; general tagging of cross-sectional technologies spanning over several sections of the IPC; technical subjects covered by former USPC cross-reference art collections [XRACs] and digests
    • Y02: Technologies or applications for mitigation or adaptation against climate change
    • Y02P: Climate change mitigation technologies in the production or processing of goods
    • Y02P 90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30: Computing systems specially adapted for manufacturing

Landscapes

  • Image Analysis (AREA)

Abstract

The application provides a training method of an anomaly detection model, an object anomaly detection method and an object anomaly detection device, and belongs to the technical field of artificial intelligence. The method comprises the following steps: acquiring a plurality of groups of sample pairs based on a plurality of first sample images, wherein each group of sample pairs comprises a sample block and a sample text, the sample block comprises a part of a target object, and the sample text is used for describing the abnormal condition of the part in the sample block; for each group of sample pairs, respectively performing feature extraction on the sample block and the sample text in the sample pair through the anomaly detection model to obtain image features of the sample block and text features of the sample text, and determining the similarity between the image features and the text features, wherein the anomaly detection model is used for performing anomaly detection on the part of the target object in an input block; and iteratively training the anomaly detection model based on the respective similarities of the plurality of groups of sample pairs and a preset similarity. Performing anomaly detection based on an anomaly detection model trained with the method can improve the accuracy of anomaly detection.

Description

Training method of anomaly detection model, object anomaly detection method and device
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a training method of an anomaly detection model, an object anomaly detection method and an object anomaly detection device.
Background
Industrial anomaly detection is important in practical production. When anomaly detection is performed on industrially produced articles, the distribution characteristics of normal articles are generally established first, and whether an article is normal is then judged by the similarity between the features of the article to be detected and the distribution characteristics. Before the distribution characteristics of normal articles can be established, a large number of normal articles need to be collected; that is, abnormal articles among a large number of articles need to be detected and then removed.
In the related art, abnormal articles are detected by a clustering method. The basic assumption of this method is that the proportion of abnormal articles in actual production is low. The articles are therefore clustered based on similarity, and after clustering is completed the abnormal articles fall into small, isolated clusters, so articles in such clusters are very likely to be abnormal and are removed. However, the proportion of abnormal articles in actual production may not be low, which results in some abnormal articles being clustered into larger clusters and confused with normal articles, so this detection method is inaccurate.
Disclosure of Invention
The embodiment of the application provides a training method of an anomaly detection model, an object anomaly detection method and an object anomaly detection device. Performing anomaly detection based on an anomaly detection model trained with the method can improve the accuracy of anomaly detection. The technical scheme is as follows.
In one aspect, a training method of an anomaly detection model is provided, the method comprising:
acquiring a plurality of groups of sample pairs based on a plurality of first sample images, wherein the plurality of first sample images comprise target objects, each group of sample pairs comprises a sample block and a sample text, the sample block comprises a local part of the target objects, and the sample text is used for describing abnormal conditions of the local part in the sample block;
For each group of sample pairs, respectively extracting features of a sample block and a sample text in the sample pairs through an anomaly detection model to obtain image features of the sample block and text features of the sample text, and determining similarity between the image features and the text features, wherein the anomaly detection model is used for carrying out anomaly detection on part of the target object in the input block;
And iteratively training the anomaly detection model based on the respective similarity and the preset similarity of the plurality of groups of samples.
In another aspect, there is provided an object abnormality detection method including:
acquiring a plurality of tiles of an image, the image comprising a target object, each tile comprising a portion of the target object;
For each image block, determining a target text corresponding to the image block through an anomaly detection model, wherein the anomaly detection model is obtained through the training method, and the target text is used for describing the local anomaly condition of the target object in the image block;
And determining the abnormal information of the target object in the image based on the target texts corresponding to the multiple image blocks, wherein the abnormal information is used for describing the abnormal condition of the target object.
In another aspect, there is provided a training apparatus of an abnormality detection model, the apparatus including:
An acquisition module, configured to acquire a plurality of groups of sample pairs based on a plurality of first sample images, where the plurality of first sample images each include a target object, each group of sample pairs includes a sample block and a sample text, the sample block includes a part of the target object, and the sample text is used to describe an abnormal condition of the part in the sample block;
The extraction module is used for extracting characteristics of a sample block and a sample text in each group of sample pairs through an abnormality detection model, so as to obtain image characteristics of the sample block and text characteristics of the sample text, and determining similarity between the image characteristics and the text characteristics, wherein the abnormality detection model is used for carrying out abnormality detection on part of the target object in the input block;
and the training module is used for iteratively training the abnormal detection model based on the respective similarity and the preset similarity of the plurality of groups of samples.
In some embodiments, the acquiring module is configured to:
For each first sample image, respectively dividing the first sample image based on sliding windows of a plurality of sizes to obtain a block set for each size, wherein the plurality of sample blocks included in the block set of each size are all of that size;
and obtaining the plurality of groups of sample pairs based on the sample blocks in the block sets of the plurality of first sample images and the sample text corresponding to each sample block.
In some embodiments, each sample tile corresponds to a plurality of sample texts that describe the local anomaly condition in the sample tile in different texts, respectively; the extraction module is used for:
and respectively extracting features of a plurality of sample texts in the sample pair through the anomaly detection model to obtain initial text features respectively corresponding to the plurality of sample texts, and determining the average value of the plurality of initial text features to obtain the text features.
In some embodiments, the apparatus further comprises:
The filling module is used for filling a plurality of text templates based on the local part in each sample block and the local abnormal information in the sample block to obtain a plurality of sample texts, the abnormal information is used for describing the local abnormal condition, and the text templates are different.
In some embodiments, the acquiring module is further configured to acquire a plurality of second sample images, each of the plurality of second sample images including the target object;
the device also comprises a segmentation module, which is used for segmenting the target object from the second sample image for each second sample image to obtain the first sample image.
In some embodiments, the acquiring module is further configured to acquire a plurality of third sample images, each of the plurality of third sample images including the target object;
The device further comprises a calibration module, which is used for calibrating the position of the target object in the third sample image based on the position of the target object in the template image for each third sample image, so as to obtain the first sample image, wherein the position of the target object in the first sample image is matched with the position of the target object in the template image.
In another aspect, there is provided an object abnormality detection apparatus including:
An acquisition module to acquire a plurality of tiles of an image, the image comprising a target object, each tile comprising a portion of the target object;
The determining module is used for determining a target text corresponding to each block according to an abnormality detection model, wherein the abnormality detection model is obtained through the training method, and the target text is used for describing local abnormal conditions of the target object in the block;
the determining module is further configured to determine, based on target texts corresponding to the multiple tiles, anomaly information of a target object in the image, where the anomaly information is used to describe an anomaly condition of the target object.
In some embodiments, the determining module is configured to:
Determining that an abnormality exists in the target object in the image and determining that the abnormality exists in the local part of the target object under the condition that the target text corresponding to at least one of the plurality of tiles indicates that the abnormality exists in the local part of the target object;
and under the condition that the target text corresponding to each of the plurality of tiles indicates that no abnormality exists in the part of the target object, determining that no abnormality exists in the target object in the image.
In some embodiments, the determining module is configured to:
For each image block, extracting image features of the image block through the anomaly detection model, determining similarity between the image features and a plurality of preset text features, and determining target text features with the similarity meeting preset requirements from the preset text features, wherein the preset text features respectively correspond to preset texts, and the target texts are preset texts corresponding to the target text features.
In some embodiments, the acquiring module is configured to:
And respectively dividing the image based on sliding windows of a plurality of sizes to obtain a block set for each size, wherein the block set of each size comprises a plurality of blocks of that size.
In some embodiments, each tile includes a plurality of pixel points, and the determining module is configured to:
for each tile of each size, assigning a similarity between an image feature of the tile and a corresponding target text feature to a plurality of pixel points in the tile if the target text corresponding to the tile indicates that there is an abnormality in a portion of the target object in the tile;
for each pixel point, obtaining an abnormal value of the pixel point based on the similarity of the pixel point under the multiple sizes, wherein the abnormal value is used for indicating the probability of abnormality of the pixel point;
And determining the abnormal information of the target object in the image based on the abnormal value of each pixel point in the image, wherein the abnormal information of the target object comprises at least one of the position and the abnormal area of the pixel point where the target object is abnormal.
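As a rough illustration of the pixel-level aggregation described above, the sketch below accumulates per-tile similarities into a per-pixel anomaly value across several window sizes. The mean-over-sizes fusion rule, the threshold, and every function and variable name are assumptions made for illustration, not the claimed implementation.

```python
# Illustrative sketch only; tile layout, the aggregation rule and all names are assumed.
import numpy as np

def pixel_anomaly_map(image_hw, tiles, num_sizes):
    """tiles: iterable of (y, x, k, size_idx, is_abnormal, similarity) entries,
    one per tile of every window size; image_hw: (H, W)."""
    h, w = image_hw
    acc = np.zeros((num_sizes, h, w), dtype=np.float32)      # per-size similarity per pixel
    for y, x, k, s, is_abnormal, sim in tiles:
        if is_abnormal:                                        # tile text indicates a defect
            acc[s, y:y + k, x:x + k] = np.maximum(acc[s, y:y + k, x:x + k], sim)
    anomaly = acc.mean(axis=0)                                 # fuse the window sizes per pixel
    mask = anomaly > 0.5                                       # assumed decision threshold
    area = int(mask.sum())
    ys, xs = np.nonzero(mask)
    position = (ys.min(), xs.min(), ys.max(), xs.max()) if area else None
    return anomaly, area, position
```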
In some embodiments, there are a plurality of images, and the determining module is further configured to determine, based on the abnormality information of the target object in each of the plurality of images, a plurality of target images among the plurality of images, where no abnormality exists in the target object in a target image; and determine a non-abnormal feature of the target object based on the plurality of target images, wherein the non-abnormal feature is a feature of the target object when no abnormality exists;
The apparatus further includes a first detection module configured to perform anomaly detection on an image including the target object based on the non-abnormal feature.
In some embodiments, there are a plurality of images, and the determining module is further configured to determine a plurality of target tiles among the plurality of tiles based on the target texts corresponding to the plurality of tiles, where no anomaly exists in the part of the target object in a target tile; and determine a local non-abnormal feature of the target object based on the plurality of target tiles of the plurality of images, wherein the non-abnormal feature is a feature of a part of the target object in which no abnormality exists;
The apparatus also includes a second detection module configured to perform anomaly detection on a tile including a part of the target object based on the non-abnormal feature.
In another aspect, a computer device is provided, where the computer device includes a processor and a memory, where the memory is configured to store at least one program, where the at least one program is loaded and executed by the processor to implement a training method of an anomaly detection model or an object anomaly detection method in an embodiment of the present application.
In another aspect, a computer readable storage medium is provided, where at least one section of program is stored, where the at least one section of program is loaded and executed by a processor to implement a training method of an anomaly detection model or an anomaly detection method of an object in an embodiment of the present application.
In another aspect, a computer program product is provided, the computer program product including at least one program stored in a computer-readable storage medium, the at least one program being read from the computer-readable storage medium by a processor of a computer device, the processor executing the at least one program, so that the computer device performs the training method of the abnormality detection model or the object abnormality detection method according to any one of the above-described implementations.
The embodiment of the application provides a training method of an anomaly detection model, in which the anomaly detection model is trained based on sample blocks and sample texts that describe the local anomaly condition of the target object in each sample block. The image features and text features are extracted by the anomaly detection model, and paired image features and text features should have a high similarity, so the anomaly detection model is trained based on the similarity between the image features and the text features and a preset similarity. In this way the anomaly detection model learns the rule that paired image features and text features have a high similarity; that is, the anomaly detection model trained by the method can accurately extract features. Thus, through the anomaly detection model, for any block comprising a part of the target object, the text feature with a high similarity can be determined based on the extracted image feature, and the text corresponding to that text feature is the text describing the local anomaly condition of the target object in the block. Therefore, performing anomaly detection based on the anomaly detection model trained by the method can improve the accuracy of anomaly detection.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic illustration of an implementation environment provided by an embodiment of the present application;
FIG. 2 is a flowchart of a training method of an anomaly detection model according to an embodiment of the present application;
FIG. 3 is a flowchart of another training method of an anomaly detection model provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of an image calibration provided by an embodiment of the present application;
FIG. 5 is a flowchart of an object anomaly detection method according to an embodiment of the present application;
FIG. 6 is a flowchart of another object anomaly detection method provided by an embodiment of the present application;
FIG. 7 is a schematic flow chart of industrial anomaly detection according to an embodiment of the present application;
FIG. 8 is a flow chart of a training phase in industrial inspection provided by an embodiment of the present application;
FIG. 9 is a flow chart of a test phase in an industrial inspection provided by an embodiment of the present application;
FIG. 10 is a block diagram of a training apparatus for an anomaly detection model provided by an embodiment of the present application;
FIG. 11 is a block diagram of an object anomaly detection device provided by an embodiment of the present application;
Fig. 12 is a block diagram of a terminal according to an embodiment of the present application;
fig. 13 is a block diagram of a server according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
The terms "first," "second," and the like in this disclosure are used for distinguishing between similar elements or items having substantially the same function and effect. It should be understood that there is no logical or chronological dependency among the terms "first," "second," and "nth," and no limitation on the number or the order of execution. The term "at least one" in the present application means one or more, and "a plurality of" means two or more.
It should be noted that, the information (including but not limited to user equipment information, user personal information, etc.), data (including but not limited to data for analysis, stored data, presented data, etc.), and signals related to the present application are all authorized by the user or are fully authorized by the parties, and the collection, use, and processing of the related data is required to comply with the relevant laws and regulations and standards of the relevant countries and regions. For example, the sample images referred to in the present application are all acquired with sufficient authorization.
The following describes the terms of art to which the present application relates:
Artificial Intelligence (AI) is the theory, method, technique, and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend, and expand human intelligence, perceive the environment, obtain knowledge, and use the knowledge to obtain optimal results. In other words, artificial intelligence is an integrated technology of computer science that attempts to understand the essence of intelligence and to produce a new intelligent machine that can react in a way similar to human intelligence. Artificial intelligence, i.e. research on the design principles and implementation methods of various intelligent machines, enables the machines to have the functions of perception, reasoning and decision-making. Artificial intelligence technology is a comprehensive subject and relates to a wide range of fields, covering both hardware-level and software-level technologies. Artificial intelligence infrastructure technologies generally include, for example, sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, pre-training model technologies, operation/interaction systems, mechatronics, and the like. The pre-training model, also called a large model or a basic model, can be widely applied to downstream tasks in all major directions of artificial intelligence after fine-tuning. Artificial intelligence software technologies mainly include directions such as computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
Machine Learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specially studies how a computer simulates or implements human learning behavior to acquire new knowledge or skills, and reorganizes existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied throughout the various areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and teaching learning. The pre-training model is the latest development of deep learning and integrates the above techniques.
The pre-training model (PTM), also called a base model or a large model, refers to a deep neural network (DNN) with a large number of parameters. The model is trained on massive unlabeled data, and the function approximation capability of the large-parameter DNN is utilized so that the PTM can extract common features from the data. The PTM is then adapted to downstream tasks through techniques such as fine tuning, parameter-efficient fine-tuning (PEFT), and prompt-tuning. Therefore, the pre-training model can achieve ideal effects in small-sample (few-shot) or zero-sample (zero-shot) scenarios. PTMs can be classified according to the data modality they process into language models (ELMO, BERT, GPT), visual models (Swin-Transformer, ViT, V-MOE), speech models (VALL-E), multi-modal models (ViBERT, CLIP, Flamingo, Gato), and the like, wherein a multi-modal model refers to a model that builds a representation of the features of two or more data modalities. The pre-trained model is an important tool for outputting Artificial Intelligence Generated Content (AIGC), and can also be used as a general interface for connecting a plurality of specific task models.
The following describes an implementation environment according to the present application:
the training method of the anomaly detection model provided by the embodiment of the application can be executed by computer equipment, and the computer equipment can be provided as a server or a terminal. An implementation environment schematic diagram of the training method of the anomaly detection model provided by the embodiment of the application is described below.
Referring to fig. 1, fig. 1 is a schematic diagram of an implementation environment of a training method of an anomaly detection model according to an embodiment of the present application, where the implementation environment includes a terminal 101 and a server 102. The terminal 101 and the server 102 can be directly or indirectly connected through wired or wireless communication, and the present application is not limited herein. In some embodiments, the server 102 is configured to train an anomaly detection model, where the trained anomaly detection model is used to determine text corresponding to a tile that is used to describe anomalies in portions of a target object that the tile includes. The terminal 101 has installed thereon a target application for abnormality detection of an object. In some embodiments, the trained anomaly detection model is embedded on the terminal 101, and the terminal 101 performs anomaly detection of the object through the anomaly detection model. In other embodiments, the terminal 101 performs abnormality detection of the object through an abnormality detection model on the server 102.
In some embodiments, the terminal 101 may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart voice interaction device, a smart home appliance, a vehicle-mounted terminal, an aircraft, a VR (Virtual Reality) device, an AR (Augmented Reality) device, and the like. In some embodiments, the server 102 is a stand-alone server, a server cluster or a distributed system formed by a plurality of servers, and can also be a cloud server providing cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network content delivery network), and basic cloud computing services such as big data and artificial intelligence platforms. In some embodiments, the server 102 primarily takes on computing work and the terminal 101 takes on secondary computing work; or server 102 takes on secondary computing services and terminal 101 takes on primary computing work; or the server 102 and the terminal 101 perform cooperative computing by adopting a distributed computing architecture.
Referring to fig. 2, fig. 2 is a flowchart of a training method of an anomaly detection model according to an embodiment of the present application, and the method includes the following steps.
201. The computer device obtains a plurality of groups of sample pairs based on a plurality of first sample images, each of the plurality of first sample images including a target object, each group of sample pairs including a sample tile including a portion of the target object and sample text describing an anomaly of the portion in the sample tile.
In the embodiment of the application, the target object can be any object to be detected for abnormality, for example, the target object can be various parts produced in industry. The plurality of sample tiles are obtained by dividing the first sample image, and one first sample image can be divided into the plurality of sample tiles.
The plurality of first sample images all comprise target objects, the states of the target objects in the first sample images can be the same or different, and the different states comprise different sizes, different shooting angles, different brightness and the like. The plurality of first sample images includes an image in which the target object is not abnormal and an image in which the target object is abnormal.
In the embodiment of the present application, an abnormal condition refers to whether or not an abnormality exists in the part. If the target object is a part, a local abnormality refers to a local defect of the part, such as an unsmooth surface, a scratch on the surface, or an insufficient size, all of which are abnormal conditions. Further, in the case where an abnormality exists in the part in the sample block, the sample text is also used to describe the category of the abnormality, such as surface roughness, surface scratches, or undersize.
For each set of sample pairs, in the event that there is no abnormality in a portion of the target object included in the sample tile in the sample pair, then the sample text in the sample pair describes that there is no abnormality in the portion of the sample tile. In the case that there is an abnormality in a part of the target object included in the sample tile in the sample pair, the sample text in the sample pair describes the abnormality in the part of the sample tile, or the sample text in the sample pair describes the kind of abnormality in the part of the sample tile.
It should be noted that, since the target object has a plurality of different parts, such as a head-tail part, a middle part, a part head, a part tail, and the like, optionally, the sample text further describes the names of the parts included in the sample block, so that the trained anomaly detection model can further determine which part is the part included in the block.
202. For each group of sample pairs, the computer equipment respectively performs feature extraction on sample blocks and sample texts in the sample pairs through an anomaly detection model to obtain image features of the sample blocks and text features of the sample texts, determines similarity between the image features and the text features, and the anomaly detection model is used for performing anomaly detection on parts of target objects in the input blocks.
In an embodiment of the present application, the anomaly detection model includes an image encoder and a text encoder. The image encoder is used for encoding the image to obtain image characteristics. The text encoder is used for encoding the text to obtain text characteristics.
In an embodiment of the application, the image features are vectors or matrices for representing the sample tiles. Text features are vectors or matrices that are used to represent sample text. The similarity may be cosine similarity.
203. The computer device iteratively trains the anomaly detection model based on respective similarities and preset similarities of the plurality of sets of samples.
In the embodiment of the application, the preset similarity can be set and changed according to the requirement. The training target of the anomaly detection model is to enable the feature similarity of the sample block and the sample text in the sample pair to reach the preset similarity. In the embodiment of the application, the computer equipment determines the loss value based on the similarity of the sample pair and the preset similarity, and iteratively adjusts the model parameters of the abnormal detection model based on the loss value. Model parameters of the anomaly detection model include parameters of the image encoder and parameters of the text encoder.
In the embodiment of the application, the computer equipment carries out iterative training on the abnormality detection model based on the respective similarity and the preset similarity of a plurality of groups of samples until the preset requirement is met. The reaching of the preset requirement may be that the loss value reaches convergence, or that the loss value reaches a preset threshold, or that the iteration number reaches a preset number of times, or that the similarity reaches a preset similarity, which is not limited herein.
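A minimal sketch of one training iteration as described in this step, assuming PyTorch encoders and a mean-squared-error loss between the pairwise cosine similarity and the preset similarity; the encoder interfaces, the loss form and all names below are illustrative assumptions rather than the claimed implementation.

```python
# Sketch of one training iteration under the description above; assumptions are noted inline.
import torch
import torch.nn.functional as F

def train_step(image_encoder, text_encoder, optimizer, tiles, texts, preset_sim=1.0):
    img_feat = F.normalize(image_encoder(tiles), dim=-1)       # image features of sample blocks
    txt_feat = F.normalize(text_encoder(texts), dim=-1)        # text features of sample texts
    sim = (img_feat * txt_feat).sum(dim=-1)                    # cosine similarity per sample pair
    loss = F.mse_loss(sim, torch.full_like(sim, preset_sim))   # distance to the preset similarity
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```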
In the embodiment of the application, the trained anomaly detection model is used for extracting the image feature of an input block, determining, from a plurality of preset text features, a target text feature whose similarity with the image feature meets a preset requirement, and taking the text corresponding to the target text feature as the text corresponding to the block, where the text is used to describe the local anomaly condition of the target object in the block. Optionally, the plurality of preset text features are obtained by performing feature extraction on a plurality of preset texts through the trained anomaly detection model.
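The inference use described above could look roughly like the following sketch, in which a tile's image feature is compared against pre-computed features of preset texts and the most similar preset text is returned; the argmax selection rule and all names are assumptions.

```python
# Hedged inference sketch: closest preset text wins; names and selection rule are assumed.
import torch
import torch.nn.functional as F

@torch.no_grad()
def match_text(image_encoder, tile, preset_texts, preset_text_feats):
    """preset_text_feats: (num_texts, d) normalized features of the preset texts."""
    img_feat = F.normalize(image_encoder(tile.unsqueeze(0)), dim=-1)   # (1, d)
    sims = img_feat @ preset_text_feats.T                              # (1, num_texts)
    best = sims.argmax(dim=-1).item()
    return preset_texts[best], sims[0, best].item()
```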
The embodiment of the application provides a training method of an anomaly detection model. The anomaly detection model is trained based on sample blocks and sample texts that describe the local anomaly condition of the target object in each sample block. Because the anomaly detection model extracts image features and text features, and paired image features and text features should have a high similarity, the anomaly detection model is trained based on the similarity between the image features and the text features and a preset similarity, so that the model learns the rule that paired image features and text features have a high similarity; that is, the anomaly detection model trained by the method can accurately extract features. Therefore, for any block comprising a part of the target object, a text feature with a high similarity can be determined based on the extracted image feature, and the text corresponding to that text feature is the text describing the local anomaly condition of the target object in the block; thus, the anomaly detection model obtained by the training method has high accuracy when performing anomaly detection on the part of the target object in a block. In addition, the method is trained on blocks, and each block comprises a part of the target object, so the anomaly detection model can specifically detect which parts of the target object are abnormal, which improves the precision of anomaly detection. Therefore, performing anomaly detection based on the anomaly detection model trained by the method can improve both the accuracy and the precision of anomaly detection.
Fig. 2 is a basic flow of the training method of the anomaly detection model, and the training method of the anomaly detection model is further described below based on fig. 3. Referring to fig. 3, fig. 3 is a flowchart of a training method of an anomaly detection model according to an embodiment of the present application, and the method includes the following steps.
301. The computer device acquires a plurality of second sample images, each of the plurality of second sample images including a target object, and for each of the second sample images, segments the target object from the second sample images to obtain a plurality of first sample images, the first sample images including the target object.
In the embodiment of the application, the second sample image not only comprises the target object but also comprises a background, and other interference elements are included in the background. Therefore, the target object is segmented from the second sample image, the distinction between the target object and the background in the second sample image is realized, and the background in the second sample image is removed, so that the second sample image only retains the target object, and the first sample image is obtained.
Optionally, the computer device performs image segmentation by means of a SAM (Segment Anything Model) model. The computer equipment inputs the second sample image into the SAM model, the target object in the second sample image is segmented through the SAM model to distinguish the foreground from the background, and then the background is removed, so that noise interference caused by the background is eliminated. Further, the computer device sets the pixel value of the segmented background to 0, that is, sets the background to black; accordingly, the target object in the first sample image is the foreground, and the background other than the target object in the first sample image is black.
In this embodiment, the target object is segmented from the second sample image, so as to remove the background in the second sample image, so that interference caused by the background is avoided, and further, subsequent image processing is facilitated.
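For illustration, the background-removal step described above can be sketched as follows, assuming a foreground mask has already been produced by a segmentation model such as SAM (obtaining the mask itself is not shown); the function name and array layout are assumptions.

```python
# Illustrative only: zero out background pixels given a foreground mask.
import numpy as np

def remove_background(second_sample_image: np.ndarray, foreground_mask: np.ndarray) -> np.ndarray:
    """second_sample_image: (H, W, 3) uint8; foreground_mask: (H, W) bool, True on the target object."""
    first_sample_image = second_sample_image.copy()
    first_sample_image[~foreground_mask] = 0   # background pixel values set to 0 (black)
    return first_sample_image
```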
In some embodiments, the computer device also calibrates the position of the target object in the sample image. Wherein the computer device obtains a plurality of third sample images, each of the plurality of third sample images including a target object; and for each third sample image, calibrating the position of the target object in the third sample image based on the position of the target object in the template image to obtain a first sample image, wherein the position of the target object in the first sample image is matched with the position of the target object in the template image.
The matching of the position of the target object in the first sample image and the position of the target object in the template image means that the two positions are identical or the coordinate difference between the two positions is within a preset range.
Optionally, the computer device inputs the third sample image into an image calibration module, and the image calibration module calibrates the position of the target object in the third sample image by the position of the target object in the template image, so as to obtain the first sample image. The image calibration module is used for calibrating an input image so as to accurately extract the characteristics of the area where the target object is located in the image.
Optionally, the image calibration module performs calibration by using a minimum error iterative method, which defines parameters to be estimated by constructing an error function, and optimizes the parameters based on the current estimated value by using an iterative algorithm to gradually reduce the error function. The image calibration module firstly carries out edge detection on an input image and a target object in a template image, and obtains respective contour maps of the input image and the template image. The contour map is then subjected to a non-maximum suppression (NMS) operation to convert the contour map into a two-dimensional set of contour points. Or carrying out partition processing on the contour map to obtain a two-dimensional contour point set of each region in the image so as to carry out partition processing on the image. The average error of the two-dimensional contour point set between the input image and the template image is the error function. The error function is shown in the following formula (1).
E = (1/N) Σ_{i=1}^{N} ‖p_i − q_i‖    (1)

where E represents the error function, that is, the average error, N represents the number of contour points in the template image, p_i represents the coordinates of the i-th contour point in the input image, q_i represents the coordinates of the i-th contour point in the template image, and Σ represents summation.
After the error function is set, an iterative optimization method is adopted to calibrate the input image. First, the currently estimated transformation matrix, i.e. the initial transformation matrix, is applied to the set of contour points to be registered. And then, establishing a position matching relation between the input image and the contour point set of the template image by using a nearest neighbor searching method, and eliminating matching pairs with overlarge errors, namely eliminating contour points with difficult position matching. Finally, the transformation matrix is estimated again by using a RANSAC (Random Sample Consensus, random sample consensus algorithm) algorithm, and returns to the first step until the error function is converged to obtain a target transformation matrix, and the coordinates of each point in the input image are transformed based on the target transformation matrix to obtain a calibrated image.
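A compressed sketch of the iterative alignment loop described above, under the assumption that an affine transformation model is sufficient; cv2.estimateAffinePartial2D with RANSAC stands in for the re-estimation step, the outlier-rejection rule is assumed, and all names are illustrative.

```python
# Sketch of nearest-neighbour matching plus RANSAC re-estimation; assumptions noted above.
import numpy as np
import cv2
from scipy.spatial import cKDTree

def calibrate(input_pts, template_pts, iters=20):
    """input_pts, template_pts: (N, 2) float32 contour point sets."""
    M = np.eye(2, 3, dtype=np.float32)                     # initial transformation matrix
    tree = cKDTree(template_pts)
    for _ in range(iters):
        warped = input_pts @ M[:, :2].T + M[:, 2]          # apply current estimate
        dist, idx = tree.query(warped)                     # nearest-neighbour position matching
        keep = dist < np.median(dist) * 2                  # drop matching pairs with large error
        M_new, _ = cv2.estimateAffinePartial2D(
            input_pts[keep], template_pts[idx[keep]], method=cv2.RANSAC)
        if M_new is None:
            break
        M = M_new.astype(np.float32)
    return M  # target transformation matrix applied to the input image coordinates
```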
In other embodiments, the computer device may also employ an unsupervised image alignment study with a photometric loss function to perform image calibration, not specifically limited herein.
In this embodiment, the position of the target object in the sample image is calibrated, so that the area where the target object is located in the sample image can be conveniently extracted for processing, for example, the target object can be conveniently segmented from the image, the area where the target object is located in the image can be conveniently determined, and the characteristics of the target object in the image can be conveniently extracted.
For example, referring to fig. 4, fig. 4 is a schematic diagram of an image calibration according to an embodiment of the present application. The method comprises the steps of determining a target transformation matrix between a template image and an input image, and processing the input image based on the target transformation matrix to minimize the position difference between the input image and a target object in the template image, so as to obtain a calibrated image. The orientation and angle of the target object in the calibrated image are the same as the orientation and angle of the target object in the template image.
In some embodiments, the computer device calibrates the position of the target object prior to segmenting the second sample image, thereby facilitating image segmentation. Or after the computer equipment divides the second sample image, calibrating the position of the target object, and at the moment, the position of the target object is calibrated conveniently because the background interference is eliminated.
302. For each first sample image, the computer equipment respectively segments the first sample image based on sliding windows with a plurality of sizes to obtain a plurality of block sets with respective sizes, wherein a plurality of sample blocks included in each block set with the size are all with the size, and the sample blocks include parts of a target object.
In the embodiment of the present application, the size of each sliding window may be set and changed as needed, which is not specifically limited herein. For example, a first sample image x has a resolution of h × w, and a given encoder f performs image segmentation. The sliding windows of multiple sizes are denoted as {W_k}, where each sliding window W_k^{(i,j)} is a binary mask used to represent the local region of a k × k kernel around the pixel point (i, j); i, j, h, w and k are non-negative integers. The sample block corresponding to each sliding window is denoted as x_k^{(i,j)}. Segmentation is performed according to the sliding windows of multiple sizes, and the effective area of the first sample image, that is, one sample block, is defined as x_k^{(i,j)} = x ⊙ W_k^{(i,j)}, where ⊙ represents the element-wise product. The kernel size k corresponds to the amount of surrounding context used for each position in the image, which controls the balance between local detail and global information in the segmentation.
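The multi-size sliding-window split written out above (x_k = x ⊙ W_k) can be sketched as follows; the window sizes, the stride and all names are assumptions chosen for illustration.

```python
# Sketch of extracting sample blocks of several sizes from one first sample image.
import numpy as np

def split_into_tiles(first_sample_image: np.ndarray, kernel_sizes=(32, 48, 64), stride=16):
    """Returns {k: list of (i, j, tile)} with one block set per window size k."""
    h, w = first_sample_image.shape[:2]
    tile_sets = {}
    for k in kernel_sizes:
        tiles = []
        for i in range(0, h - k + 1, stride):
            for j in range(0, w - k + 1, stride):
                tiles.append((i, j, first_sample_image[i:i + k, j:j + k]))  # x ⊙ W_k^(i,j)
        tile_sets[k] = tiles
    return tile_sets
```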
In some embodiments, the image encoder in the anomaly detection model is capable of automatically segmenting the input image. In other embodiments, the anomaly detection model is not capable of segmenting the input image, and the computer device segments the image before inputting the anomaly detection model.
In some embodiments, the computer device performs image segmentation by the ViT patch (Vision Transformer patch) method, which includes sliding windows of multiple sizes, e.g. a 2 × 2 small-size sliding window corresponding to a 32 × 32 resolution image, and a 3 × 3 medium-size sliding window corresponding to a 48 × 48 resolution image. In other embodiments, the computer device performs image segmentation by the ViT token method to capture the context of the image.
In the embodiment of the application, the first sample image is respectively segmented through the sliding windows with a plurality of sizes, so that each first sample image corresponds to a plurality of sample image blocks with a plurality of sizes, the sample image blocks with different sizes comprise the part of the target object from small to large, the defect of the target object from small to large in the sample image can be captured through the sample image blocks with the plurality of sizes, the diversity of the samples is improved, the abnormality detection model is trained based on the samples, and the generalization capability and the accuracy of the abnormality detection model can be improved.
303. The computer equipment obtains a plurality of groups of sample pairs based on sample blocks in respective block sets of a plurality of first sample images and sample texts corresponding to each sample block, wherein each group of sample pairs comprises one sample block and one sample text, and the sample text is used for describing local abnormal conditions in the sample blocks.
In an embodiment of the present application, each first sample image is divided into a plurality of sample tiles, and accordingly, each first sample image corresponds to a plurality of groups of sample pairs.
In some embodiments, the same abnormal situation may be described by different texts, so one sample block may correspond to a plurality of sample texts, the plurality of sample texts describe local abnormal situations in the sample blocks respectively in different texts, and further one sample block may form a plurality of groups of sample pairs, the sample blocks in the plurality of groups of sample pairs are identical, and the sample texts are different.
In some embodiments, each sample tile corresponds to a plurality of sample texts that each describe a local anomaly in the sample tile in a different text. The process of extracting the characteristics of the sample text in the sample pair through the anomaly detection model by the computer equipment to obtain the text characteristics of the sample text comprises the following steps: and the computer equipment respectively performs feature extraction on a plurality of sample texts in the sample pair through the anomaly detection model to obtain initial text features respectively corresponding to the plurality of sample texts, and determines the average value of the plurality of initial text features to obtain the text features.
In the embodiment of the application, the plurality of initial text features are a plurality of vectors with the same dimension, the text features are also vectors, and the dimension of the text features is the same as the dimension of the initial text features. Determining the mean of the plurality of initial text features refers to determining the mean of the plurality of vectors to obtain the text features. Or the plurality of initial text features are a plurality of matrixes with the same dimension, the text features are also matrixes, and the dimension of the text features is the same as that of the initial text features. Determining the mean of the plurality of initial text features refers to determining the mean of the plurality of matrices to obtain the text features.
Wherein, a plurality of sample texts respectively describe local abnormal conditions in the sample block in different texts. The plurality of sample text includes, but is not limited to, at least one of: each sample text includes at least one word that is different from the words in the other sample text; the plurality of sample texts include different numbers of words; the order of words in the plurality of sample texts differs from one another.
For example, sample text used to describe the local absence of anomalies may be "perfect" or "defect free. The sample text used to describe the local presence of anomalies may be "damaged" or "defective".
In the embodiment of the application, as the same abnormal condition can be described through different texts, the text characteristics of a plurality of texts describing the same abnormal condition are averaged, so that the obtained text characteristics are more accurate and more effective.
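The averaging of initial text features described above can be sketched in a few lines; the encoder interface and all names are assumptions.

```python
# Minimal sketch: average the initial text features of several sample texts
# that describe the same local anomaly condition.
import torch
import torch.nn.functional as F

@torch.no_grad()
def text_feature(text_encoder, sample_texts):
    initial_feats = F.normalize(text_encoder(sample_texts), dim=-1)  # (num_texts, d)
    return initial_feats.mean(dim=0)                                 # averaged text feature
```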
In some embodiments, the computer device obtains a plurality of sample text by the following steps. The computer equipment fills a plurality of text templates based on local and local abnormality information in the sample blocks for each sample block to obtain a plurality of sample texts, wherein the abnormality information is used for describing local abnormality conditions, and the text templates are different.
In the embodiment of the present application, a plurality of text templates are preset, where the plurality of text templates includes but is not limited to at least one of the following cases: each text template includes at least one word that is different from the words in the other text templates; the plurality of text templates includes different numbers of words; the order of words differs between words in the plurality of text templates.
Wherein the text template may be populated based on local and local anomaly information, and the text template may be populated based on one of the local and anomaly information. For example, the text template may be "[ c ] a clip photograph", "[ c ] a photograph", c being at least one of the abnormality information of the part to be filled and the name of the part.
In the embodiment, the text templates are universal text templates under various abnormal conditions, so that a plurality of sample texts are obtained based on the plurality of text templates, and the text templates are filled only based on the abnormal conditions, so that the convenience for obtaining the sample texts is improved.
In other embodiments, a text template for describing different abnormal conditions is also provided, and then only the local names of the target objects need to be filled in the text template to obtain sample text. For example, the text templates may be "perfect o", "non-defective o", or "damaged o", or "defective o", where o represents the name of the part to be filled, such as "the head of a screw". Alternatively, o may also represent the name of the target object to be filled, such as a "screw".
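The template-filling step discussed above can be illustrated with the sketch below; the exact template strings and the fill() helper are assumptions, not the patented wording.

```python
# Illustrative prompt construction from text templates; all strings and names are assumed.
TEMPLATES = ["a photo of a {c}", "a cropped photo of a {c}"]
STATES = {"normal": ["perfect {o}", "{o} without defect"],
          "abnormal": ["damaged {o}", "{o} with defect"]}

def fill(part_name: str, state: str) -> list[str]:
    sample_texts = []
    for s in STATES[state]:
        c = s.format(o=part_name)              # e.g. "damaged head of a screw"
        sample_texts += [t.format(c=c) for t in TEMPLATES]
    return sample_texts
```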
In the embodiment of the present application, the process of acquiring a plurality of pairs of samples based on a plurality of first sample images is implemented through steps 302 to 303 described above. In the embodiment, the first sample image is segmented based on the sliding windows with multiple sizes, so that multiple image blocks with multiple sizes of the first sample image are obtained, the diversity of samples is improved, defects of the target object from small to large can be captured, the defect of the target object from small to large can be identified by the anomaly detection model, and the accuracy of the anomaly detection model is improved.
It should be noted that steps 302-303 are just one alternative implementation of the process, and the computer device may implement the process in other alternative implementations; for example, the first sample image is divided through a sliding window with one size to obtain a plurality of blocks of the first sample image, and then a sample pair is obtained based on the sample block and the sample text.
304. For each group of sample pairs, the computer equipment respectively performs feature extraction on sample blocks and sample texts in the sample pairs through an anomaly detection model to obtain image features of the sample blocks and text features of the sample texts, determines similarity between the image features and the text features, and the anomaly detection model is used for performing anomaly detection on parts of target objects in the input blocks.
In the embodiment of the application, the characteristic extraction of the sample block and the sample text can obtain the representation of the sample block and the sample text with better expressivity, so that the subsequent model is convenient for carrying out abnormal judgment.
In some embodiments, a trained feature extraction network is used for the preliminary feature extraction, such as ResNet network. The feature extraction network performs pre-training on a large-scale data set, and can better extract semantic features of images and texts. In the subsequent learning process, the parameters of the network remain unchanged, i.e. the network is only responsible for feature extraction and does not update the parameters in the subsequent training process. Optionally, at least one full-connection layer is connected to the back of the network, and the full-connection layer is used for extracting the features output by the network again to obtain corresponding image features or text features, that is, only the parameters of the full-connection layer need to be adjusted in the training process, so that the training efficiency can be improved.
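A sketch of the frozen pre-trained backbone with a trainable fully connected head described above; ResNet-50, the pretrained weights and the feature dimensions are assumptions used only to illustrate the idea that the backbone stays fixed while the projection layer is trained.

```python
# Frozen backbone + trainable fully connected projection; choices above are assumed.
import torch.nn as nn
from torchvision.models import resnet50

class ImageBranch(nn.Module):
    def __init__(self, out_dim=512):
        super().__init__()
        backbone = resnet50(weights="IMAGENET1K_V1")
        self.backbone = nn.Sequential(*list(backbone.children())[:-1])  # drop the classifier
        for p in self.backbone.parameters():
            p.requires_grad = False          # backbone only extracts features, never updated
        self.fc = nn.Linear(2048, out_dim)   # only this layer is adjusted during training

    def forward(self, x):
        feat = self.backbone(x).flatten(1)
        return self.fc(feat)
```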
305. The computer device iteratively trains the anomaly detection model based on respective similarities and preset similarities of the plurality of sets of samples.
In the embodiment of the present application, the similarity of each group of sample pairs refers to the similarity between the features of the sample block and the sample text in the sample pair.
In any iteration process, the model parameters of the anomaly detection model can be adjusted through the similarity of at least one group of sample pairs and the preset similarity. If loss values between the similarity of the at least one group of sample pairs and the preset similarity are determined, the model parameters are adjusted based on the average value of the loss values. In the next iteration process, at least one group of samples of the next batch are respectively input into the adjusted abnormality detection model to obtain respective similarity of at least one group of samples of the next batch, and then model parameters of the abnormality detection model are adjusted through the similarity of the at least one group of samples and the preset similarity. Repeating the iterative process until reaching the preset requirement.
In other embodiments, where the pair of samples is a positive pair of samples, the computer device may also construct a negative pair of samples, and train in conjunction with the positive pair of samples and the negative pair of samples. The negative sample pair includes a sample tile and negative sample text, the negative sample text not being used to describe local anomalies in the sample tile. Further, the negative sample text is any sample text in positive sample pairs except the positive sample pair in which the sample block is located, so that the acquisition efficiency of the sample text is improved.
In the embodiment of the application, during training of the anomaly detection model, image features and text features are extracted by the image encoder and the text encoder in the anomaly detection model respectively; a contrastive learning loss pulls paired image-text features closer in the feature space and pushes unpaired image-text features apart, thereby realizing pre-training on image-text pairs. Accordingly, in the use stage of the anomaly detection model, the anomaly condition of the target object is described by different preset texts, such as "a defective object" and "a non-defective object". The text features of the different preset texts and the image features of the input image are extracted by the anomaly detection model, and the similarity between the extracted image features and each preset text feature is compared, so as to judge which text the image is closer to and thereby realize anomaly detection.
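When the negative pairs are formed from the other texts in the same batch, the contrastive objective can be written as a symmetric cross-entropy over the pairwise similarity matrix; the sketch below assumes L2-normalized features and an illustrative temperature value.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(image_feats, text_feats, temperature: float = 0.07):
    """Pull paired image/text features together and push unpaired ones apart."""
    img = F.normalize(image_feats, dim=-1)
    txt = F.normalize(text_feats, dim=-1)
    logits = img @ txt.t() / temperature                     # (B, B) similarity matrix
    targets = torch.arange(img.size(0), device=img.device)   # diagonal entries are the positives
    loss_img = F.cross_entropy(logits, targets)              # image-to-text direction
    loss_txt = F.cross_entropy(logits.t(), targets)          # text-to-image direction
    return (loss_img + loss_txt) / 2
```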
The embodiment of the application provides a training method of an anomaly detection model. The anomaly detection model is trained based on sample tiles and sample texts that describe the anomaly condition of the local part of the target object in each sample tile. Because the anomaly detection model extracts image features and text features, and paired image features and text features should have high similarity, the model is trained based on the similarity between the image features and the text features and the preset similarity, so that it learns the rule that paired image features and text features are highly similar. The trained anomaly detection model can therefore extract features accurately: for any tile that includes a local part of the target object, the text feature with high similarity can be determined based on the extracted image feature, and the text corresponding to that text feature describes the anomaly condition of the local part of the target object in the tile. Consequently, the anomaly detection model obtained by this training method detects anomalies of the local part of the target object in a tile with high accuracy. In addition, the method trains on tiles, each of which includes a local part of the target object, so the anomaly detection model can pinpoint which local parts of the target object are abnormal, that is, the precision of anomaly detection is improved. Therefore, performing anomaly detection based on the anomaly detection model trained by this method improves both the accuracy and the precision of anomaly detection.
The anomaly detection model is trained by the embodiments of fig. 2 and 3 described above, and anomaly detection is then performed based on the model trained in fig. 2 or 3. Referring to fig. 5, fig. 5 is a flowchart of a method for detecting an abnormality of an object according to an embodiment of the present application, which includes the following steps.
501. The computer device acquires a plurality of tiles of an image, the image including a target object, each tile including a portion of the target object.
In the embodiment of the application, the target object may be any object on which anomaly detection is to be performed, such as various parts produced in industry. The plurality of tiles are obtained by dividing the image.
502. For each block, the computer equipment determines a target text corresponding to the block through an abnormality detection model, wherein the target text is used for describing the local abnormality of a target object in the block.
In the embodiment of the application, the abnormality detection model is used for detecting abnormality of a local part of the target object in the inputted image block. The computer equipment inputs the image block into an anomaly detection model, extracts image characteristics of the image block through the anomaly detection model, determines text characteristics with similarity to the image characteristics meeting preset requirements, and further takes a text corresponding to the text characteristics as a target text.
503. The computer equipment determines the abnormal information of the target object in the image based on the target texts corresponding to the multiple image blocks, wherein the abnormal information is used for describing the abnormal condition of the target object.
In some embodiments, the anomaly information of the target object includes the target texts corresponding to the plurality of tiles respectively, so that the anomaly condition of each local part of the target object is obtained; or the anomaly information is used to describe whether the target object as a whole is abnormal. For example, the anomaly information may be "an abnormality exists in the target object" or "no abnormality exists in the target object".
The embodiment of the application provides an object anomaly detection method that performs detection based on an anomaly detection model. The anomaly detection model is trained based on sample tiles and sample texts that describe the anomaly condition of the local part of the target object in each sample tile. Because the anomaly detection model extracts image features and text features, and paired image features and text features should have high similarity, the model is trained based on the similarity between the image features and the text features and the preset similarity, so that it learns the rule that paired image features and text features are highly similar, and the trained model can extract features accurately. Therefore, for any tile that includes a local part of the target object, the text corresponding to the matched text feature describes the anomaly condition of that local part, so the anomaly detection model detects local anomalies of the target object in a tile with high accuracy. Furthermore, the overall anomaly condition of the target object in the image can be obtained based on the anomaly conditions corresponding to the respective tiles, so the overall accuracy of anomaly detection is improved.
The embodiment of fig. 5 is a basic process of object abnormality detection based on an abnormality detection model, and the object abnormality detection method is further described below based on the embodiment of fig. 6. Referring to fig. 6, fig. 6 is a flowchart of a method for detecting an abnormality of an object according to an embodiment of the present application, the method includes the following steps.
601. The computer device acquires a plurality of tiles of an image, the image including a target object, each tile including a portion of the target object.
In some embodiments, a computer device acquires an initial image, the initial image including a target object; the target object is segmented from the initial image to obtain the image. Or the computer equipment calibrates the position of the target object in the initial image based on the position of the target object in the template image to obtain the image. Or the computer equipment divides the target object from the initial image to obtain a first image, and then calibrates the position of the target object in the first image based on the position of the target object in the template image to obtain the image. Or the computer equipment calibrates the position of the target object in the initial image based on the position of the target object in the template image to obtain a second image, and then segments the target object from the second image to obtain the image.
The specific process of dividing and calibrating the image by the computer device is the same as the process of dividing and calibrating the image in step 301, and will not be described herein.
In some embodiments, a process for a computer device to acquire a plurality of tiles of an image includes the steps of: the computer equipment divides the image based on sliding windows with a plurality of sizes respectively to obtain a plurality of block sets with respective sizes, and a plurality of blocks included in each block set with the size are all of the sizes.
The process of dividing the image by the computer device is the same as the process of dividing the first sample image in step 302, and will not be described herein.
602. For each tile, the computer device extracts image features of the tile through the anomaly detection model, determines the similarity between the image features and a plurality of preset text features, determines, from the plurality of preset text features, a target text feature whose similarity meets a preset requirement, and obtains a target text based on the target text feature. The plurality of preset text features correspond to preset texts respectively, the target text is the preset text corresponding to the target text feature, and the target text is used to describe the anomaly condition of the local part of the target object in the tile.
Wherein, the plurality of preset text features respectively correspond to preset texts; the computer equipment inputs the preset texts into an abnormality detection model respectively, and the characteristic extraction is carried out on the preset texts through the abnormality detection model respectively to obtain the respective preset text characteristics of the preset texts. In the embodiment, a plurality of preset text features are determined in advance through the abnormality detection model, so that a large number of image blocks can be detected by repeatedly utilizing the preset text features, the efficiency is improved, and the resources are saved.
In the embodiment of the application, the fact that the similarity between the target text feature and the image feature meets the preset requirement means that the similarity between the target text feature and the image feature reaches the target similarity, or the similarity between the target text feature and the image feature is the largest of the similarities of a plurality of preset text features.
In some embodiments, there are two preset texts, a normal text and an abnormal text, and the corresponding two preset text features are a normal text feature and an abnormal text feature. The normal text describes that no abnormality exists in the part within a tile, and the abnormal text describes that an abnormality exists in the part within a tile. If the similarity between the image features and the normal text feature is higher, no abnormality exists in the part within the tile, and the tile is a normal tile, namely a target tile. If the similarity between the image features and the abnormal text feature is higher, the part within the tile is abnormal, and the tile is an abnormal tile.
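The two-text comparison above can be sketched as follows, assuming the normal and abnormal text features have been precomputed by the anomaly detection model and cosine similarity is used; the function and variable names are assumptions.

```python
import torch
import torch.nn.functional as F

def classify_tile(tile_feat: torch.Tensor,
                  normal_text_feat: torch.Tensor,
                  abnormal_text_feat: torch.Tensor):
    """Return whether the tile is abnormal and the similarity to the matched text."""
    sim_normal = F.cosine_similarity(tile_feat, normal_text_feat, dim=-1)
    sim_abnormal = F.cosine_similarity(tile_feat, abnormal_text_feat, dim=-1)
    is_abnormal = bool(sim_abnormal > sim_normal)       # the closer text wins
    return is_abnormal, (sim_abnormal if is_abnormal else sim_normal)
```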
In the embodiment of the present application, the above step 602 implements, for each tile, the process of determining the target text corresponding to the tile through the anomaly detection model. In this embodiment, the trained anomaly detection model can accurately extract the image features of tiles and the text features of texts, and the text feature of the text that describes the anomaly condition of the part in a tile has high similarity to the tile's image feature. A plurality of text features are therefore preset, the similarity between each of them and the image feature is compared, and the text corresponding to the text feature whose similarity meets the preset requirement is the text describing the anomaly condition of the part in the tile. In other words, the anomaly detection model quickly and accurately detects anomalies of the local part of the target object in a tile.
It should be noted that, step 602 is only one alternative implementation manner of implementing the process, and the computer device may implement the process in other alternative implementations, which are not described herein.
603. The computer equipment determines the abnormal information of the target object in the image based on the target texts corresponding to the multiple image blocks, wherein the abnormal information is used for describing the abnormal condition of the target object.
In some embodiments, the process of the computer device determining the anomaly information of the target object in the image based on the target texts corresponding to the plurality of tiles includes the following steps: when the target text corresponding to at least one of the plurality of tiles indicates that an abnormality exists in a local part of the target object, the computer device determines that the target object in the image is abnormal and determines which local parts of the target object are abnormal; when the target texts corresponding to all of the plurality of tiles indicate that no abnormality exists in the local parts of the target object, the computer device determines that the target object in the image is not abnormal.
If the target text corresponding to at least one tile indicates that a local part of the target object is abnormal, the target object is abnormal in at least one local part, and therefore the target object itself is abnormal. On this basis, since each tile includes a local part of the target object, the abnormal local parts of the target object are determined from the at least one abnormal tile.
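A minimal sketch of this aggregation rule, assuming each tile has already been mapped to a target text marked as normal or abnormal; the dictionary layout is an assumption for illustration.

```python
def summarize_object(tile_results):
    """tile_results: list of (tile_id, is_abnormal, target_text) for one image.

    The target object is abnormal as soon as any tile is abnormal; the abnormal
    tiles identify which local parts of the object are affected.
    """
    abnormal = [(tid, text) for tid, is_abn, text in tile_results if is_abn]
    return {"object_abnormal": bool(abnormal), "abnormal_parts": abnormal}
```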
In some embodiments, the target text further indicates the anomaly type, so the anomaly type of the abnormal local part of the target object can be determined based on the target text; accordingly, the anomaly information includes the abnormal local part and its anomaly type. In some embodiments, the area of the target object where the anomaly exists can also be determined based on the proportion of the at least one abnormal tile among the plurality of tiles, and accordingly the anomaly information includes the abnormal area of the target object.
In other embodiments, the anomaly information includes target texts corresponding to a plurality of tiles, so that it is known which tiles have anomalies locally or which tiles have no anomalies locally through the target texts of the tiles in the anomaly information.
In the embodiment of the application, the overall abnormal condition of the target object is determined based on the target text of each of the plurality of image blocks, so that the summary of the abnormal information of the plurality of image blocks is realized, the readability of the abnormal detection result is improved, and the user experience is further improved.
In some embodiments, the computer device divides by a sliding window of a plurality of sizes to obtain a plurality of tiles of a plurality of sizes, each tile including a plurality of pixels. The process of determining the abnormal information of the target object in the image by the computer equipment based on the target texts respectively corresponding to the plurality of image blocks comprises the following steps: for each image block of each size, the computer equipment assigns the similarity between the image characteristics of the image block and the corresponding target text characteristics to a plurality of pixel points in the image block under the condition that the target text corresponding to the image block indicates that the local part of the target object in the image block is abnormal; for each pixel point, obtaining an abnormal value of the pixel point based on the similarity of the pixel point under a plurality of sizes, wherein the abnormal value is used for indicating the probability of abnormality of the pixel point; based on the abnormal value of each pixel point in the image, determining the abnormal information of the target object in the image, wherein the abnormal information of the target object comprises at least one of the position and the abnormal area of the pixel point where the target object is abnormal.
Optionally, the process of the computer device obtaining the outlier of a pixel point based on the similarities of the pixel point under the plurality of sizes includes the following steps: the computer device determines the mean of the similarities of the pixel point under the plurality of sizes as the outlier of the pixel point, where the mean may be a harmonic mean or an arithmetic mean; or the computer device determines the sum of the similarities of the pixel point under the plurality of sizes as the outlier of the pixel point.
Optionally, the process of determining the anomaly information of the target object in the image by the computer device based on the anomaly value of each pixel point in the image includes the following steps: the computer equipment determines a plurality of target pixel points, of which the abnormal value is larger than an abnormal threshold value, in the plurality of pixel points of the image, and determines the pixel points positioned on the target object in the plurality of target pixel points, so that the position of the pixel point, of which the target object is abnormal, is obtained. Further, after obtaining the pixels with the abnormality on the target object, the computer device determines the ratio between the number of the pixels and the number of the pixels on the target object, so as to obtain the abnormal area of the target object.
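The pixel-level fusion described above can be sketched as follows, assuming tiles of several window sizes with known top-left positions, an arithmetic mean across sizes, and an illustrative anomaly threshold; names and data layout are assumptions.

```python
import numpy as np

def pixel_anomaly_map(image_shape, results_per_size, threshold: float = 0.5):
    """results_per_size: {window_size: [((top, left), similarity, is_abnormal), ...]}.

    Each abnormal tile assigns its image-text similarity to its pixels; the
    per-size maps are averaged so every pixel gets one fused anomaly value.
    """
    h, w = image_shape[:2]
    maps = []
    for size, results in results_per_size.items():
        score = np.zeros((h, w), dtype=np.float32)
        for (top, left), sim, is_abnormal in results:
            if is_abnormal:
                score[top:top + size, left:left + size] = sim
        maps.append(score)
    anomaly_map = np.mean(maps, axis=0)            # fuse the sizes per pixel
    abnormal_mask = anomaly_map > threshold        # pixels with likely anomalies
    return anomaly_map, abnormal_mask
```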
In other embodiments, the computer device determines a mean of similarities between image features of the tile and a plurality of preset text features, and assigns the mean to a plurality of pixels in the tile. Or the computer equipment determines a first text feature for describing a plurality of texts with local anomalies in a plurality of preset text features, determines a mean value of similarity between the image features of the image block and the plurality of first text features, and assigns the mean value to a plurality of pixel points in the image block. Or the computer device may further determine a plurality of second text features of the plurality of preset text features having a similarity to the image feature greater than a similarity threshold, determine a mean value of the similarity between the image feature of the tile and the plurality of second text features, and assign the mean value to a plurality of pixels in the tile.
In this embodiment, for each pixel, by fusing the similarity of the pixel under multiple sizes, the abnormal value of each pixel is more accurate, that is, the probability that the pixel indicated by the abnormal value is abnormal is more accurate, and then the abnormal information of the target object in the image is determined by the abnormal values of the multiple pixels, so that the abnormal information is more accurate.
In some embodiments, the anomaly detection model provided by the embodiment of the present application is used for screening anomaly samples, and the remaining normal samples are used as training data for constructing an anomaly detection network, where the anomaly detection network is used for anomaly detection of an object. An abnormal sample refers to a sample image including a target object having an abnormality, and a normal sample refers to a sample image including a target object having no abnormality. Wherein, abnormal samples are detected from a large number of sample images through an abnormal detection model, and an abnormal detection network is constructed based on the remaining normal samples. Specifically, a non-abnormal feature of the target object is constructed based on the normal sample, that is, the feature distribution of the target object in the normal sample, and then the abnormal detection network detects the abnormal object based on the non-abnormal feature, that is, the object with a large difference from the non-abnormal feature is detected, and the object is the abnormal object. The construction and use of the anomaly detection network is described in the following steps, which are optionally implemented.
604. The computer device determines a plurality of target images in the plurality of images based on the abnormality information of the target objects in the plurality of images, and the target objects in the target images are not abnormal.
In the embodiment of the application, the anomaly information is used for describing the anomaly condition of the target object, so that whether the target object in the image is abnormal or not can be determined based on the anomaly information, and the image of which the anomaly information indicates that the target object is not abnormal is taken as the target image.
605. The computer device determines, based on the plurality of target images, a non-abnormal feature of the target object, the non-abnormal feature being a feature possessed by the target object in which no abnormality exists.
The computer device performs feature extraction on the plurality of target images respectively to obtain the image features of the plurality of target images, and obtains the non-abnormal features of the target object based on the image features of the plurality of target images.
Alternatively, the computer device performs feature extraction through the anomaly detection model, or through another feature extraction network, such as a ResNet.
Wherein the non-abnormal features include image features of each of the plurality of target images. Or the computer equipment clusters the image features of the target images to obtain a plurality of class clusters, the average value of the image features in each class cluster is used as the image feature corresponding to the class cluster, and the non-abnormal features comprise the image features corresponding to the class clusters. Or the computer equipment determines the average value of the image characteristics of a plurality of target images to obtain non-abnormal characteristics.
Thus, the non-abnormal features can represent the feature distribution of the target object when no abnormality exists. Optionally, the non-abnormal features are stored in a memory bank, which represents, in a discretized manner, the feature distribution of the target object in the absence of anomalies.
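A minimal sketch of building such a memory bank from the features of normal images, assuming k-means clustering with scikit-learn; the number of clusters is an illustrative assumption, and keeping the raw features or their overall mean are the simpler alternatives mentioned above.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_memory_bank(normal_feats: np.ndarray, n_clusters: int = 16) -> np.ndarray:
    """Summarize the (N, D) features of normal images into non-abnormal features.

    The cluster centers give a discretized representation of the feature
    distribution of the target object when no abnormality exists.
    """
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    kmeans.fit(normal_feats)
    return kmeans.cluster_centers_
```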
606. The computer device performs anomaly detection on an image including the target object based on the non-anomaly characteristic.
In some embodiments, the computer device determines that the target object in the image is abnormal if a distance between an image feature of the image including the target object and a non-abnormal feature is greater than a distance threshold; the computer device determines that the target object in the image is not abnormal if a distance between an image feature of the image including the target object and the non-abnormal feature is not greater than a distance threshold. The distance may be a cosine distance.
In some embodiments, if the non-abnormal feature includes an image feature of each of the plurality of target images or an image feature of each of the plurality of clusters, determining a target image feature closest to a distance between image features of the image from the plurality of image features included in the non-abnormal feature, and if the distance is not greater than a distance threshold, determining that the target object in the image is not abnormal; and determining that the target object in the image is abnormal when the distance is greater than the distance threshold.
Wherein, the distance threshold value can be set and changed according to the requirement. The distance threshold may be determined based on the yield requirements for the article in actual production. Alternatively, a small amount of verification data (an image of a target object in which no abnormality exists) may be collected, and these verification data are input into the abnormality detection network, and an abnormality score (i.e., distance) of each verification data is calculated. The anomaly scores are then ranked from small to large. Assuming that the yield in the actual demand is m%, taking the abnormal score of the m% position in the ordered abnormal scores as a distance threshold.
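The distance test and the yield-based threshold selection can be sketched as follows, assuming cosine distance to the nearest memory-bank entry as the anomaly score; the function names and the percentage handling are assumptions for illustration.

```python
import numpy as np

def anomaly_score(feat: np.ndarray, memory_bank: np.ndarray) -> float:
    """Cosine distance from an image feature to its nearest non-abnormal feature."""
    f = feat / np.linalg.norm(feat)
    bank = memory_bank / np.linalg.norm(memory_bank, axis=1, keepdims=True)
    return float(np.min(1.0 - bank @ f))

def threshold_from_yield(validation_scores, yield_percent: float) -> float:
    """Take the score at the m% position of the ascending validation scores."""
    scores = np.sort(np.asarray(validation_scores))
    idx = min(int(len(scores) * yield_percent / 100.0), len(scores) - 1)
    return float(scores[idx])

# Usage: an object is flagged as abnormal when its score exceeds the threshold.
# is_abnormal = anomaly_score(feat, bank) > threshold_from_yield(val_scores, 95)
```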
In the embodiment of the application, the anomaly information of each image is determined through the anomaly detection model. Since the anomaly detection model can accurately detect images in which the target object is abnormal, the abnormal images among the plurality of images can be detected and removed, and the normal images without target-object anomalies are retained, which improves the accuracy of anomaly detection. Accordingly, the non-abnormal features of the normal object established based on these normal images have high accuracy. Performing anomaly detection based on the non-abnormal features can then detect images that do not match the non-abnormal features, and the target objects included in those images are the abnormal objects, which further improves the accuracy of anomaly detection.
The above embodiment has been described by taking, as an example, the case where the detection object of the anomaly detection network is an image including the entire target object. In other embodiments, the detection object of the anomaly detection network is a tile including a local part of the target object; an abnormal sample then refers to a tile sample containing a local anomaly, and a normal sample refers to a tile sample containing no local anomaly. Abnormal samples are screened out of a large number of tiles through the anomaly detection model, and the anomaly detection network is constructed based on the remaining normal samples. Specifically, the local non-abnormal features of the target object, i.e. the feature distribution of the local parts of the target object in the normal samples, are constructed based on the normal samples; the anomaly detection network then detects abnormal local parts of the target object based on these non-abnormal features, i.e. a local part whose features differ greatly from the non-abnormal features is abnormal. After an abnormal local part is detected, the abnormal target object is detected.
The computer equipment determines a plurality of target image blocks in the image blocks based on target texts corresponding to the image blocks respectively, wherein no abnormality exists in part of a target object in the target image blocks; determining a local non-abnormal characteristic of the target object based on a plurality of target tiles of each of the plurality of images, wherein the non-abnormal characteristic is a characteristic of the target object which is not abnormal; abnormality detection is performed on a tile that includes a portion of the target object based on the non-abnormal features.
The target text is used for describing the abnormal situation of the local part of the target object, so that whether the local part in the block is abnormal or not can be determined based on the target text, and the block with the local part free of the abnormality is taken as the target block.
Optionally, the computer device performs feature extraction on the plurality of target tiles respectively to obtain their image features, and obtains the local non-abnormal features based on these image features. The computer device may perform feature extraction through the anomaly detection model; since the image features of a target tile have already been extracted when determining its target text, the image features extracted in step 602 can be reused directly. Alternatively, the computer device performs feature extraction through another feature extraction network, such as a ResNet-50.
Wherein the local non-abnormal features include image features of each of the plurality of target tiles. Or the computer equipment clusters the image features of the target tiles to obtain a plurality of class clusters, the average value of the image features in each class cluster is used as the image feature corresponding to the class cluster, and the non-abnormal features comprise the image features corresponding to the class clusters. Or the computer equipment determines the average value of the block characteristics of the plurality of target blocks to obtain the non-abnormal characteristics.
In some embodiments, the computer device performs unified processing on a plurality of tiles including different portions of the target object to obtain local non-abnormal features. In other embodiments, the computer device determines, for each local portion of the target object, a non-anomalous characteristic of the local portion based on a plurality of target tiles including the local portion, thereby deriving respective non-anomalous characteristics of the plurality of local portions of the target object. Accordingly, the computer device performs anomaly detection for the tiles that include each local based on the non-anomaly characteristic of that local.
The process of abnormality detection of the image including the local image block by the computer device based on the local non-abnormal feature is the same as the process of abnormality detection of the image including the target object based on the non-abnormal feature of the target object, and will not be described herein.
In the embodiment of the application, the anomaly information of the tiles is determined through the anomaly detection model. The anomaly detection model can accurately detect tiles in which a local part of the target object is abnormal, so the abnormal tiles among the plurality of tiles can be detected and the normal tiles without local anomalies are retained, realizing accurate detection of abnormal tiles; accordingly, the non-abnormal features of the normal local parts established based on the normal tiles have high accuracy. Performing anomaly detection based on these non-abnormal features can then accurately detect tiles that do not match the non-abnormal features, and the local parts included in those tiles are the abnormal local parts, which improves the accuracy of anomaly detection.
When the anomaly detection network constructs the non-anomaly characteristic of the target object, the default training data are normal samples, and the normal samples refer to images of the target object without anomalies. Therefore, the image after calibration is firstly denoised, that is, the abnormal sample is detected and removed, and then the rest sample can be used as training data to train the abnormal detection network. In the embodiment of the application, the method provided by the embodiment of the application can accurately detect the abnormal sample, so that the training data used by the abnormal detection network are all normal samples, the accuracy of the trained abnormal detection network is improved, and the accuracy of abnormality detection based on the abnormal detection network is high.
In the embodiment of the present application, anomaly detection is described by taking the anomaly detection model obtained by the above training method as an example. In other embodiments, the computer device may also directly use a base model that has already been trained on a large-scale data set, such as a CLIP (Contrastive Language-Image Pre-Training) model, or an improved model based on CLIP, such as the APRIL-GAN model or the AnomalyCLIP model, which are not particularly limited herein. By pre-training on a large-scale multi-modal data set, such a large model has good zero-shot transfer capability; using it for denoising, i.e. detecting and removing abnormal samples, avoids a series of problems caused by clustering-based methods. Therefore, the method provided by the embodiment of the application realizes a fully automatic anomaly detection framework based on a large base model, exploits the general capability of the large model without relying on specific assumptions or pre-training on a large number of normal articles, realizes a more reliable denoising process, effectively improves the denoising performance of the whole framework, and thus realizes a more efficient, stable, and general fully automatic industrial anomaly detection process.
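For the case where a pre-trained base model such as CLIP is used directly, the following sketch shows zero-shot normal/abnormal scoring with the Hugging Face CLIP implementation; the checkpoint name and prompt texts are illustrative assumptions rather than values fixed by the embodiment.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def zero_shot_anomaly(tile: Image.Image) -> dict:
    """Score one tile against 'non-defective' and 'defective' prompts."""
    texts = ["a photo of a non-defective object", "a photo of a defective object"]
    inputs = processor(text=texts, images=tile, return_tensors="pt", padding=True)
    with torch.no_grad():
        probs = model(**inputs).logits_per_image.softmax(dim=-1)[0]
    return {"normal": probs[0].item(), "abnormal": probs[1].item()}
```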
The embodiment of the application provides a way to construct the anomaly detection network without updating parameters, so the time cost of the training stage is extremely low, which further improves efficiency. In other embodiments, the computer device may also construct the anomaly detection network through PatchCore (an image processing algorithm based on a convolutional neural network), UniAD (an anomaly detection algorithm), SimpleNet (an anomaly detection algorithm), and the like, which are not described in detail herein.
The method provided by the embodiment of the application can be applied to automatic industrial quality detection, can accurately detect the abnormal defects in the industrial parts, greatly reduces the labor cost of industrial quality detection, and is more efficient, stable and universal based on denoising of the abnormal detection module.
For example, referring to fig. 7, fig. 7 is a schematic flow chart of industrial anomaly detection according to an embodiment of the present application. The method provided by the embodiment of the application is applied to a full-automatic industrial anomaly detection process, and the process comprises a training stage and a testing stage. In the training phase, the anomaly detection model is trained based on the factory produced parts. First, a sample image including a part is aligned with a template image, the alignment being based on the position of a target object in the template image, the position of the target object in the sample image being calibrated. And then training based on the sample images to obtain an abnormality detection model. Denoising through the anomaly detection model, and constructing an anomaly detection network based on the residual normal samples, wherein the anomaly detection network comprises non-anomaly characteristics of the parts. In the testing stage and the actual abnormality detection process, the parts produced by the factory are subjected to abnormality detection through an abnormality detection network so as to detect abnormal parts with large differences from non-abnormal characteristics.
For example, referring to fig. 8, fig. 8 is a flow chart of a training phase in industrial inspection provided by an embodiment of the present application. The computer equipment acquires the input automatically acquired images, performs denoising based on the abnormal detection model after image alignment, namely, eliminates abnormal samples, and further constructs an abnormal detection network based on the residual normal samples. For another example, referring to fig. 9, fig. 9 is a flow chart of a testing phase in an industrial inspection according to an embodiment of the present application. The computer equipment acquires an input test image, performs abnormality detection through an abnormality detection network after image alignment, and outputs an abnormality detection result.
In the above embodiments, the actual anomaly detection is performed through the anomaly detection network as an example, and the computer device may also perform the actual anomaly detection directly based on the anomaly detection model. Or respectively carrying out actual anomaly detection through the anomaly detection network and the anomaly detection model, and synthesizing detection results of the anomaly detection network and the anomaly detection model to obtain a target detection result. Taking the object detected by either one of the two as the object with abnormality; or an object detected by both of them is used as an object having an abnormality.
The embodiment of the application provides an object anomaly detection method that performs detection based on an anomaly detection model, where the anomaly detection model is trained based on sample tiles and sample texts that describe the local anomaly condition of the target object in each sample tile. The trained anomaly detection model can accurately extract the image features of tiles and the text features of texts, and the text feature of the text describing the local anomaly condition in a tile has high similarity to the tile's image features. A plurality of text features are therefore preset, the similarity between each of them and the image features is compared, and the text corresponding to the text feature whose similarity meets the preset requirement is the text describing the local anomaly condition in the tile; in other words, the anomaly detection model quickly and accurately detects anomalies of the local parts of the target object in the tiles. On this basis, the overall anomaly condition of the target object in the image can be obtained from the anomaly conditions corresponding to the respective tiles, which improves the convenience and accuracy of anomaly detection.
Fig. 10 is a block diagram of a training apparatus for an anomaly detection model according to an embodiment of the present application. Referring to fig. 10, the apparatus includes:
An obtaining module 1001, configured to obtain a plurality of groups of sample pairs based on a plurality of first sample images, where the plurality of first sample images include a target object, each group of sample pairs includes a sample block and a sample text, the sample block includes a part of the target object, and the sample text is used to describe an abnormal situation of the part in the sample block;
The extraction module 1002 is configured to perform feature extraction on a sample block and a sample text in each group of sample pairs through an anomaly detection model, to obtain image features of the sample block and text features of the sample text, determine similarity between the image features and the text features, and perform anomaly detection on a part of a target object in the input block through the anomaly detection model;
the training module 1003 is configured to iteratively train the anomaly detection model based on respective similarities and preset similarities of the plurality of sets of samples.
In some embodiments, the obtaining module 1001 is configured to:
For each first sample image, respectively dividing the first sample image based on sliding windows with a plurality of sizes to obtain a plurality of block sets with respective sizes, wherein a plurality of sample blocks included in each block set with the size are all of the sizes;
And obtaining a plurality of groups of sample pairs based on the sample blocks in the block sets of the first sample images and the sample text corresponding to each sample block.
In some embodiments, each sample tile corresponds to a plurality of sample texts, and the plurality of sample texts describe local abnormal conditions in the sample tiles respectively in different texts; an extraction module 1002, configured to:
and respectively extracting features of a plurality of sample texts in the sample pair through an anomaly detection model to obtain initial text features respectively corresponding to the plurality of sample texts, and determining the average value of the plurality of initial text features to obtain the text features.
In some embodiments, the apparatus further comprises:
the filling module is used for filling, for each sample tile, a plurality of text templates based on the local part in the sample tile and the anomaly information of that local part, to obtain a plurality of sample texts, where the anomaly information is used to describe the anomaly condition of the local part and the plurality of text templates are different from each other.
In some embodiments, the obtaining module 1001 is further configured to obtain a plurality of second sample images, each of the plurality of second sample images including the target object;
the apparatus further comprises a segmentation module for segmenting the target object from the second sample image for each second sample image, resulting in a first sample image.
In some embodiments, the obtaining module 1001 is further configured to obtain a plurality of third sample images, each of the plurality of third sample images including the target object;
the device further comprises a calibration module, which is used for calibrating the position of the target object in the third sample image based on the position of the target object in the template image for each third sample image, so as to obtain a first sample image, wherein the position of the target object in the first sample image is matched with the position of the target object in the template image.
The embodiment of the application provides a training device of an anomaly detection model. The anomaly detection model is trained based on sample tiles and sample texts that describe the anomaly condition of the local part of the target object in each sample tile. Because the anomaly detection model extracts image features and text features, and paired image features and text features should have high similarity, the model is trained based on the similarity between the image features and the text features and the preset similarity, so that it learns the rule that paired image features and text features are highly similar, and the trained model can extract features accurately. Therefore, for any tile that includes a local part of the target object, the text feature with high similarity can be determined based on the extracted image feature, and the text corresponding to that text feature describes the anomaly condition of the local part of the target object in the tile; consequently, the anomaly detection model obtained by training with this device detects local anomalies of the target object in a tile with high accuracy. Moreover, the device trains on tiles, each of which includes a local part of the target object, so the anomaly detection model can pinpoint which local parts of the target object are abnormal, that is, the precision of anomaly detection is improved. Therefore, performing anomaly detection based on the anomaly detection model obtained by training improves both the accuracy and the precision of anomaly detection.
Fig. 11 is a block diagram of an object abnormality detection apparatus provided according to an embodiment of the present application. Referring to fig. 11, the apparatus includes:
An acquisition module 1101 for acquiring a plurality of tiles of an image, the image comprising a target object, each tile comprising a portion of the target object;
The determining module 1102 is configured to determine, for each tile, a target text corresponding to the tile through an anomaly detection model, where the anomaly detection model is obtained through the training method, and the target text is used to describe a local anomaly condition of a target object in the tile;
The determining module 1102 is further configured to determine, based on the target texts corresponding to the multiple tiles, anomaly information of the target object in the image, where the anomaly information is used to describe an anomaly condition of the target object.
In some embodiments, the determining module 1102 is configured to:
Determining that the target object in the image is abnormal and determining that the target object is abnormal in part under the condition that the target text corresponding to at least one of the plurality of tiles indicates that the abnormality exists in part of the target object;
and under the condition that the target text corresponding to each of the plurality of tiles indicates that no abnormality exists in the local part of the target object, determining that no abnormality exists in the target object in the image.
In some embodiments, the determining module 1102 is configured to:
For each image block, extracting image characteristics of the image block through an anomaly detection model, determining similarity between the image characteristics and a plurality of preset text characteristics, determining target text characteristics with the similarity meeting preset requirements from the plurality of preset text characteristics, wherein the plurality of preset text characteristics respectively correspond to preset texts, and the target texts are preset texts corresponding to the target text characteristics.
In some embodiments, the obtaining module 1101 is configured to:
The image is divided based on sliding windows with a plurality of sizes, so that a plurality of block sets with respective sizes are obtained, and the blocks included in each block set with the sizes are all of the sizes.
In some embodiments, each tile includes a plurality of pixels, a determination module 1102 for:
For each tile of each size, assigning similarity between image features of the tile and corresponding target text features to a plurality of pixel points in the tile when target text corresponding to the tile indicates that an abnormality exists in a part of a target object in the tile;
for each pixel point, obtaining an abnormal value of the pixel point based on the similarity of the pixel point under a plurality of sizes, wherein the abnormal value is used for indicating the probability of abnormality of the pixel point;
based on the abnormal value of each pixel point in the image, determining the abnormal information of the target object in the image, wherein the abnormal information of the target object comprises at least one of the position and the abnormal area of the pixel point where the target object is abnormal.
In some embodiments, the number of images is multiple, and the determining module 1102 is further configured to determine, based on the anomaly information of each of the target objects in the plurality of images, a plurality of target images in the plurality of images, where no anomaly exists in the target object in the target image; determining non-abnormal characteristics of the target object based on the plurality of target images, wherein the non-abnormal characteristics are characteristics of the target object without abnormality;
The apparatus further includes a first detection module for performing anomaly detection on an image including the target object based on the non-anomaly characteristic.
In some embodiments, the image is multiple, and the determining module 1102 is further configured to determine, based on target texts corresponding to the multiple tiles respectively, multiple target tiles in the multiple tiles, where no abnormality exists in a part of the target object in the target tiles; determining a local non-abnormal characteristic of the target object based on a plurality of target tiles of each of the plurality of images, wherein the non-abnormal characteristic is a characteristic of the target object which is not abnormal;
the apparatus further includes a second detection module for anomaly detection of a local tile including the target object based on the non-anomaly characteristic.
The embodiment of the application provides an object anomaly detection device that performs detection based on an anomaly detection model, where the anomaly detection model is trained based on sample tiles and sample texts that describe the local anomaly condition of the target object in each sample tile. Because the anomaly detection model extracts image features and text features, and paired image features and text features should have high similarity, the model is trained based on the similarity between the image features and the text features and the preset similarity, so that it learns the rule that paired image features and text features are highly similar, and the trained model can extract features accurately. Therefore, for any tile that includes a local part of the target object, the text corresponding to the matched text feature describes the local anomaly condition of the target object in the tile, so the anomaly detection model obtained by this device detects local anomalies of the target object in a tile with high accuracy. On this basis, the overall anomaly condition of the target object can be obtained from the anomaly conditions corresponding to the respective tiles, which improves the accuracy of anomaly detection.
In the embodiment of the application, the computer equipment can be a terminal or a server, and when the computer equipment is the terminal, the terminal is used as an execution main body to implement the technical scheme provided by the embodiment of the application; when the computer equipment is a server, the server is used as an execution main body to implement the technical scheme provided by the embodiment of the application; or the technical scheme provided by the application is implemented through interaction between the terminal and the server, and the embodiment of the application is not limited to the embodiment.
Fig. 12 shows a block diagram of a terminal 1200 according to an exemplary embodiment of the present application.
In general, the terminal 1200 includes: a processor 1201 and a memory 1202.
The processor 1201 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 1201 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), or a PLA (Programmable Logic Array). The processor 1201 may also include a main processor and a coprocessor; the main processor, also referred to as a CPU (Central Processing Unit), is a processor for processing data in the awake state, and the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, the processor 1201 may be integrated with a GPU (Graphics Processing Unit) for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 1201 may also include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 1202 may include one or more computer-readable storage media, which may be non-transitory. Memory 1202 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 1202 is used to store at least one program code for execution by processor 1201 to implement the training method or object anomaly detection method of the anomaly detection model provided by the method embodiments of the present application.
In some embodiments, the terminal 1200 may further optionally include: a peripheral interface 1203, and at least one peripheral. The processor 1201, the memory 1202, and the peripheral interface 1203 may be connected by a bus or signal lines. The individual peripheral devices may be connected to the peripheral device interface 1203 via buses, signal lines, or a circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 1204, a display 1205, a camera assembly 1206, audio circuitry 1207, and a power supply 1208.
The peripheral interface 1203 may be used to connect at least one peripheral device associated with an I/O (Input/Output) to the processor 1201 and the memory 1202. In some embodiments, the processor 1201, the memory 1202, and the peripheral interface 1203 are integrated on the same chip or circuit board; in some other embodiments, either or both of the processor 1201, the memory 1202, and the peripheral interface 1203 may be implemented on separate chips or circuit boards, which is not limited in this embodiment.
The radio frequency circuit 1204 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 1204 communicates with a communication network and other communication devices via electromagnetic signals. The radio frequency circuit 1204 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 1204 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 1204 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocol includes, but is not limited to: the World Wide Web, metropolitan area networks, intranets, mobile communication networks of each generation (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 1204 may further include NFC (Near Field Communication) related circuits, which is not limited by the present application.
The display 1205 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display 1205 is a touch display, the display 1205 also has the ability to collect touch signals at or above the surface of the display 1205. The touch signal may be input as a control signal to the processor 1201 for processing. At this time, the display 1205 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, the display 1205 may be one and disposed on a front panel of the terminal 1200; in other embodiments, the display 1205 may be at least two, respectively disposed on different surfaces of the terminal 1200 or in a folded design; in other embodiments, the display 1205 may be a flexible display disposed on a curved surface or a folded surface of the terminal 1200. Even more, the display 1205 may be arranged in an irregular pattern that is not rectangular, i.e., a shaped screen. The display 1205 can be made of an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode), or other materials.
The camera assembly 1206 is used to capture images or video. Optionally, camera assembly 1206 includes a front camera and a rear camera. Typically, the front camera is disposed on the front panel of the terminal and the rear camera is disposed on the rear surface of the terminal. In some embodiments, the at least two rear cameras are any one of a main camera, a depth camera, a wide-angle camera and a tele camera, so as to realize that the main camera and the depth camera are fused to realize a background blurring function, and the main camera and the wide-angle camera are fused to realize a panoramic shooting and Virtual Reality (VR) shooting function or other fusion shooting functions. In some embodiments, camera assembly 1206 may also include a flash. The flash lamp can be a single-color temperature flash lamp or a double-color temperature flash lamp. The dual-color temperature flash lamp refers to a combination of a warm light flash lamp and a cold light flash lamp, and can be used for light compensation under different color temperatures.
The audio circuitry 1207 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 1201 for processing, or inputting the electric signals to the radio frequency circuit 1204 for voice communication. For purposes of stereo acquisition or noise reduction, a plurality of microphones may be respectively disposed at different portions of the terminal 1200. The microphone may also be an array microphone or an omni-directional pickup microphone. The speaker is used to convert electrical signals from the processor 1201 or the radio frequency circuit 1204 into sound waves. The speaker may be a conventional thin film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, not only the electric signal can be converted into a sound wave audible to humans, but also the electric signal can be converted into a sound wave inaudible to humans for ranging and other purposes. In some embodiments, the audio circuitry 1207 may also include a headphone jack.
The power supply 1208 is used to power the various components in the terminal 1200. The power supply 1208 may use alternating current, direct current, a disposable battery, or a rechargeable battery. When the power supply 1208 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. A wired rechargeable battery is charged through a wired line, and a wireless rechargeable battery is charged through a wireless coil. The rechargeable battery may also support fast-charge technology.
In some embodiments, terminal 1200 also includes one or more sensors 1209. The one or more sensors 1209 include, but are not limited to: acceleration sensor 1210, gyro sensor 1211, pressure sensor 1212, optical sensor 1213, and proximity sensor 1214.
The acceleration sensor 1210 may detect the magnitudes of acceleration on the three coordinate axes of a coordinate system established with the terminal 1200. For example, the acceleration sensor 1210 may be used to detect the components of gravitational acceleration along the three coordinate axes. The processor 1201 may control the display 1205 to display the user interface in a landscape view or a portrait view based on the gravitational acceleration signal acquired by the acceleration sensor 1210. The acceleration sensor 1210 may also be used to collect motion data for games or of the user.
The gyro sensor 1211 may detect the body orientation and rotation angle of the terminal 1200, and may cooperate with the acceleration sensor 1210 to collect the user's 3D motion of the terminal 1200. Based on the data collected by the gyro sensor 1211, the processor 1201 can implement the following functions: motion sensing (e.g., changing the UI according to a tilting operation by the user), image stabilization during shooting, game control, and inertial navigation.
The pressure sensor 1212 may be disposed at a side frame of the terminal 1200 and/or at a lower layer of the display 1205. When the pressure sensor 1212 is disposed at a side frame of the terminal 1200, it may detect the user's grip signal on the terminal 1200, and the processor 1201 performs left- or right-hand recognition or shortcut operations according to the grip signal collected by the pressure sensor 1212. When the pressure sensor 1212 is disposed at the lower layer of the display 1205, the processor 1201 controls the operability controls on the UI according to the user's pressure operation on the display 1205. The operability controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The optical sensor 1213 is used to collect the ambient light intensity. In one embodiment, the processor 1201 may control the display brightness of the display 1205 based on the ambient light intensity collected by the optical sensor 1213. Specifically, when the ambient light intensity is high, the display brightness of the display 1205 is increased; when the ambient light intensity is low, the display brightness of the display 1205 is decreased. In another embodiment, the processor 1201 may also dynamically adjust the shooting parameters of the camera assembly 1206 based on the ambient light intensity collected by the optical sensor 1213.
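As an illustrative aside, the ambient-light rule above (raise the display brightness when the collected light intensity is high, lower it when it is low) can be sketched as a simple mapping; the lux breakpoints and brightness range below are invented for the example and are not taken from this application.

```python
def brightness_from_ambient_light(lux, min_level=0.2, max_level=1.0, dark_lux=50, bright_lux=1000):
    # Invented breakpoints: below dark_lux use the minimum brightness, above bright_lux the maximum.
    if lux <= dark_lux:
        return min_level
    if lux >= bright_lux:
        return max_level
    # Linear interpolation between the dark and bright breakpoints.
    ratio = (lux - dark_lux) / (bright_lux - dark_lux)
    return min_level + ratio * (max_level - min_level)

print(brightness_from_ambient_light(300))  # mid-range ambient light -> mid-range brightness
```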
The proximity sensor 1214, also referred to as a distance sensor, is typically provided on the front panel of the terminal 1200. The proximity sensor 1214 is used to collect the distance between the user and the front surface of the terminal 1200. In one embodiment, when the proximity sensor 1214 detects that the distance between the user and the front surface of the terminal 1200 gradually decreases, the processor 1201 controls the display 1205 to switch from the screen-on state to the screen-off state; when the proximity sensor 1214 detects that the distance gradually increases, the processor 1201 controls the display 1205 to switch from the screen-off state to the screen-on state.
It will be appreciated by those skilled in the art that the structure shown in Fig. 12 is not limiting of the terminal 1200, which may include more or fewer components than shown, combine certain components, or adopt a different arrangement of components.
Fig. 13 is a schematic structural diagram of a server according to an embodiment of the present application. The server 1300 may vary considerably in configuration or performance, and may include one or more processors (Central Processing Unit, CPU) 1301 and one or more memories 1302, where the memory 1302 is used to store executable program code and the processor 1301 is configured to execute the executable program code to implement the training method of the anomaly detection model or the object anomaly detection method provided by the foregoing method embodiments. Of course, the server may also have a wired or wireless network interface, a keyboard, an input/output interface, and other components for implementing the functions of the device, which are not described herein.
The embodiment of the application also provides a computer readable storage medium, wherein at least one section of program is stored in the computer readable storage medium, and the at least one section of program is loaded and executed by a processor so as to realize the training method of the anomaly detection model or the object anomaly detection method in any implementation mode.
The embodiment of the application also provides a computer program product, which includes at least one section of program stored in a computer-readable storage medium. A processor of the computer device reads the at least one section of program from the computer-readable storage medium and executes it, so that the computer device performs the training method of the anomaly detection model or the object anomaly detection method in any of the above implementations.
In some embodiments, a computer program product according to embodiments of the present application may be deployed to be executed on one computer device or on multiple computer devices at one site or on multiple computer devices distributed across multiple sites and interconnected by a communication network, where the multiple computer devices distributed across multiple sites and interconnected by a communication network may constitute a blockchain system.
Any combination of the above optional solutions may be adopted to form an optional embodiment of the present application, which is not repeated herein. The foregoing is merely illustrative of the present application and is not intended to limit it; any modifications, equivalent replacements, improvements, etc., made within the spirit and principles of the present application shall fall within the protection scope of the present application.

Claims (27)

1. A method of training an anomaly detection model, the method comprising:
acquiring a plurality of second sample images, each of the plurality of second sample images including a target object;
for each second sample image, segmenting the target object from the second sample image to obtain a first sample image;
acquiring a plurality of groups of sample pairs based on a plurality of first sample images, wherein the plurality of first sample images each include the target object, each group of sample pairs comprises a sample block and a sample text, the sample block comprises a local part of the target object, and the sample text is used for describing the abnormal situation of the local part in the sample block;
for each group of sample pairs, respectively extracting features of the sample block and the sample text in the sample pair through an anomaly detection model to obtain an image feature of the sample block and a text feature of the sample text, and determining a similarity between the image feature and the text feature, wherein the anomaly detection model is used for performing anomaly detection on the part of the target object in an input block; and
iteratively training the anomaly detection model based on the respective similarities of the plurality of groups of sample pairs and a preset similarity.
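As an illustrative sketch of the training procedure in claim 1 (not the application's actual implementation): assuming the anomaly detection model is a CLIP-style dual encoder and that the preset similarity for a matched sample-block/sample-text pair is 1.0, one training step can look like the following. The encoder architectures are placeholders rather than the patent's networks.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualEncoder(nn.Module):
    def __init__(self, embed_dim=128):
        super().__init__()
        # Placeholder encoders: a tiny CNN for sample blocks, an embedding bag for token ids.
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, embed_dim))
        self.text_encoder = nn.EmbeddingBag(num_embeddings=30000, embedding_dim=embed_dim)

    def forward(self, blocks, token_ids):
        img_feat = F.normalize(self.image_encoder(blocks), dim=-1)
        txt_feat = F.normalize(self.text_encoder(token_ids), dim=-1)
        return img_feat, txt_feat

def train_step(model, optimizer, blocks, token_ids, preset_similarity):
    # blocks: (B, 3, H, W) sample blocks; token_ids: (B, L) tokenized sample texts;
    # preset_similarity: (B,) target similarity for each sample pair.
    img_feat, txt_feat = model(blocks, token_ids)
    similarity = F.cosine_similarity(img_feat, txt_feat, dim=-1)
    loss = F.mse_loss(similarity, preset_similarity)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

model = DualEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
blocks = torch.randn(8, 3, 64, 64)
token_ids = torch.randint(0, 30000, (8, 12))
preset_similarity = torch.ones(8)  # assumed target for matched pairs
loss = train_step(model, optimizer, blocks, token_ids, preset_similarity)
```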
2. The method of claim 1, wherein the acquiring a plurality of groups of sample pairs based on the plurality of first sample images comprises:
for each first sample image, respectively dividing the first sample image based on sliding windows of a plurality of sizes to obtain a block set for each of the plurality of sizes, wherein the plurality of sample blocks included in the block set of each size are all of that size;
and obtaining the plurality of groups of sample pairs based on the sample blocks in the block sets of the plurality of first sample images and the sample text corresponding to each sample block.
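A minimal sketch of the multi-size sliding-window split in claim 2; the window sizes and the non-overlapping stride (stride equal to the window size) are assumptions, since the claim does not fix them.

```python
import numpy as np

def split_into_blocks(image, sizes=(32, 64, 128)):
    """Return {size: list of (size, size, C) blocks} for a (H, W, C) image."""
    h, w = image.shape[:2]
    block_sets = {}
    for s in sizes:
        blocks = []
        for top in range(0, h - s + 1, s):
            for left in range(0, w - s + 1, s):
                blocks.append(image[top:top + s, left:left + s])
        block_sets[s] = blocks
    return block_sets

sample_image = np.zeros((256, 256, 3), dtype=np.uint8)
sets = split_into_blocks(sample_image)  # 64 blocks of size 32, 16 of size 64, 4 of size 128
```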
3. The method of claim 1, wherein each sample block corresponds to a plurality of sample texts, each of which describes the abnormal situation of the local part in the sample block in a different wording; and
the extracting features of the sample text in the sample pair through the anomaly detection model to obtain the text feature of the sample text comprises:
respectively extracting features of the plurality of sample texts in the sample pair through the anomaly detection model to obtain initial text features respectively corresponding to the plurality of sample texts, and determining an average value of the plurality of initial text features to obtain the text feature.
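A short sketch of the text-feature averaging in claim 3, assuming a text encoder that returns one feature vector per sample text; the stand-in encoder in the usage lines is hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def text_feature_from_variants(text_encoder, token_id_batch):
    # token_id_batch: (num_variants, L) token ids of the differently worded sample texts.
    initial_feats = F.normalize(text_encoder(token_id_batch), dim=-1)  # (num_variants, D) initial text features
    return initial_feats.mean(dim=0)                                   # (D,) averaged text feature

# Usage with a stand-in text encoder:
encoder = nn.EmbeddingBag(30000, 128)
feature = text_feature_from_variants(encoder, torch.randint(0, 30000, (4, 12)))
```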
4. A method according to claim 3, characterized in that the method further comprises:
for each sample block, filling a plurality of text templates based on the local part in the sample block and abnormal information of the local part to obtain the plurality of sample texts, wherein the abnormal information is used for describing the abnormal situation of the local part, and the plurality of text templates are different from one another.
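A sketch of the template filling in claim 4; the template wordings, part name, and anomaly description below are invented examples rather than the application's templates.

```python
# Invented example templates; each is filled with the local part and its abnormal information.
TEMPLATES = [
    "a photo of the {part} of the object, which {anomaly}",
    "the {part} region of the target object; condition: {anomaly}",
    "an image block showing the {part}, and it {anomaly}",
]

def build_sample_texts(part, anomaly):
    return [t.format(part=part, anomaly=anomaly) for t in TEMPLATES]

texts = build_sample_texts("screw hole", "has a visible crack")
```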
5. The method according to claim 1, wherein the method further comprises:
acquiring a plurality of third sample images, each of which includes the target object; and
for each third sample image, calibrating the position of the target object in the third sample image based on the position of the target object in a template image to obtain the first sample image, wherein the position of the target object in the first sample image matches the position of the target object in the template image.
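One possible realization of the position calibration in claim 5, sketched under the assumption that a mask of the target object is available for both the third sample image and the template image; the claim does not specify the alignment algorithm, and a centroid-based translation is used here purely for illustration.

```python
import numpy as np

def calibrate_to_template(sample_image, sample_mask, template_mask):
    """Shift sample_image so its object centroid matches the template's object centroid."""
    ys, xs = np.nonzero(sample_mask)
    ty, tx = np.nonzero(template_mask)
    dy = int(round(ty.mean() - ys.mean()))
    dx = int(round(tx.mean() - xs.mean()))
    # Integer-pixel translation; regions shifted in from outside the image are zero-filled.
    calibrated = np.zeros_like(sample_image)
    h, w = sample_image.shape[:2]
    src_y = slice(max(0, -dy), min(h, h - dy))
    src_x = slice(max(0, -dx), min(w, w - dx))
    dst_y = slice(max(0, dy), min(h, h + dy))
    dst_x = slice(max(0, dx), min(w, w + dx))
    calibrated[dst_y, dst_x] = sample_image[src_y, src_x]
    return calibrated
```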
6. An object anomaly detection method, characterized in that the method comprises:
acquiring a plurality of tiles of an image, the image comprising a target object, each tile comprising a portion of the target object;
for each tile, determining a target text corresponding to the tile through an anomaly detection model, wherein the anomaly detection model is obtained through the training method of any one of claims 1 to 5, and the target text is used for describing the abnormal situation of the part of the target object in the tile; and
determining abnormal information of the target object in the image based on the target texts corresponding to the plurality of tiles, wherein the abnormal information is used for describing the abnormal condition of the target object.
7. The method according to claim 6, wherein determining the anomaly information of the target object in the image based on the target text corresponding to each of the plurality of tiles includes:
in a case where the target text corresponding to at least one of the plurality of tiles indicates that an abnormality exists in the part of the target object, determining that an abnormality exists in the target object in the image and determining the part of the target object in which the abnormality exists; and
in a case where the target texts corresponding to all of the plurality of tiles indicate that no abnormality exists in the part of the target object, determining that no abnormality exists in the target object in the image.
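A sketch of the decision rule in claims 6 and 7: the image-level judgment follows directly from the tile-level target texts. The dedicated "normal" text below is an invented convention, not wording from the claims.

```python
def aggregate_anomaly_info(tile_target_texts, normal_text="no anomaly in this part"):
    """tile_target_texts: list of target texts, one per tile of the image."""
    abnormal_tiles = [(i, t) for i, t in enumerate(tile_target_texts) if t != normal_text]
    object_is_abnormal = len(abnormal_tiles) > 0   # at least one abnormal tile -> abnormal object
    return object_is_abnormal, abnormal_tiles

is_abnormal, info = aggregate_anomaly_info(["no anomaly in this part", "the part has a scratch"])
```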
8. The method according to claim 6, wherein for each tile, determining, by an anomaly detection model, a target text corresponding to the tile includes:
for each tile, extracting the image feature of the tile through the anomaly detection model, determining similarities between the image feature and a plurality of preset text features, and determining, from the plurality of preset text features, a target text feature whose similarity meets a preset requirement, wherein the plurality of preset text features respectively correspond to preset texts, and the target text is the preset text corresponding to the target text feature.
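A sketch of the target-text selection in claim 8, reading "similarity meeting the preset requirement" as the highest cosine similarity; that reading, and the assumption that the features are L2-normalized, are not stated in the claim.

```python
import torch
import torch.nn.functional as F

def pick_target_text(image_feature, preset_text_features, preset_texts):
    # image_feature: (D,); preset_text_features: (num_texts, D); both assumed L2-normalized.
    sims = F.cosine_similarity(image_feature.unsqueeze(0), preset_text_features, dim=-1)
    best = int(sims.argmax())  # "meets the preset requirement" read here as the maximum similarity
    return preset_texts[best], float(sims[best])
```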
9. The method of detecting according to claim 8, wherein the capturing a plurality of tiles of an image comprises:
respectively dividing the image based on sliding windows of a plurality of sizes to obtain a tile set for each of the plurality of sizes, wherein the tile set of each size comprises a plurality of tiles of that size.
10. The method according to claim 9, wherein each tile includes a plurality of pixels, and the determining the anomaly information of the target object in the image based on the target text corresponding to each of the plurality of tiles includes:
for each tile of each size, in a case where the target text corresponding to the tile indicates that an abnormality exists in the part of the target object in the tile, assigning the similarity between the image feature of the tile and the corresponding target text feature to the plurality of pixel points in the tile;
for each pixel point, obtaining an abnormal value of the pixel point based on the similarities of the pixel point under the plurality of sizes, wherein the abnormal value is used for indicating the probability that the pixel point is abnormal; and
determining the abnormal information of the target object in the image based on the abnormal values of the pixel points in the image, wherein the abnormal information of the target object comprises at least one of the positions of the pixel points at which the target object is abnormal and the abnormal area.
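A sketch of the pixel-level fusion in claim 10: each abnormal tile's similarity is assigned to its pixel points, the per-size maps are fused into per-pixel abnormal values (averaging is an assumption), and thresholding gives the abnormal positions and area.

```python
import numpy as np

def pixel_anomaly_map(image_hw, abnormal_tiles_per_size, threshold=0.5):
    """abnormal_tiles_per_size: {size: [(top, left, similarity), ...]} for abnormal tiles."""
    h, w = image_hw
    per_size_maps = []
    for size, tiles in abnormal_tiles_per_size.items():
        score = np.zeros((h, w), dtype=np.float32)
        for top, left, sim in tiles:
            score[top:top + size, left:left + size] = sim   # assign the tile's similarity to its pixels
        per_size_maps.append(score)
    anomaly_values = np.mean(per_size_maps, axis=0)          # fuse across sizes (assumed rule)
    abnormal_mask = anomaly_values > threshold
    positions = np.argwhere(abnormal_mask)                   # abnormal pixel coordinates
    area = int(abnormal_mask.sum())                          # abnormal area in pixels
    return anomaly_values, positions, area
```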
11. The method of detecting according to claim 6, wherein there are a plurality of the images, and the method further comprises:
determining a plurality of target images among the plurality of images based on the abnormal information of the target object in each of the plurality of images, wherein no abnormality exists in the target object in each target image;
determining a non-abnormal feature of the target object based on the plurality of target images, wherein the non-abnormal feature is a feature of the target object when no abnormality exists; and
performing anomaly detection on an image comprising the target object based on the non-abnormal feature.
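A sketch of claim 11's use of non-abnormal features, assuming image-level features from the model's image branch and a cosine-similarity threshold; both choices are illustrative rather than taken from the claim.

```python
import torch
import torch.nn.functional as F

def build_reference_bank(image_encoder, normal_images):
    # normal_images: iterable of (3, H, W) tensors whose target object was judged normal.
    feats = [F.normalize(image_encoder(img.unsqueeze(0)), dim=-1) for img in normal_images]
    return torch.cat(feats, dim=0)                           # (N, D) non-abnormal reference features

def detect_with_bank(image_encoder, image, reference_bank, threshold=0.8):
    feat = F.normalize(image_encoder(image.unsqueeze(0)), dim=-1)          # (1, D)
    best_match = F.cosine_similarity(feat, reference_bank, dim=-1).max()   # closest non-abnormal feature
    return bool(best_match < threshold)                      # True -> anomaly suspected
```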
12. The method of detecting according to claim 6, wherein there are a plurality of the images, and the method further comprises:
determining a plurality of target tiles among the plurality of tiles based on the target texts respectively corresponding to the plurality of tiles, wherein no abnormality exists in the part of the target object in each target tile;
determining a local non-abnormal feature of the target object based on the plurality of target tiles of the plurality of images, wherein the local non-abnormal feature is a feature of a part of the target object in which no abnormality exists; and
performing anomaly detection on a tile comprising a part of the target object based on the local non-abnormal feature.
13. A training device for an anomaly detection model, the device comprising:
an acquisition module, configured to acquire a plurality of second sample images, each of the plurality of second sample images including a target object, and, for each second sample image, segment the target object from the second sample image to obtain a first sample image;
The acquisition module is further configured to acquire a plurality of groups of sample pairs based on a plurality of first sample images, where the plurality of first sample images each include the target object, each group of sample pairs includes a sample block and a sample text, the sample block includes a part of the target object, and the sample text is used to describe an abnormal situation of the part in the sample block;
The extraction module is used for extracting characteristics of a sample block and a sample text in each group of sample pairs through an abnormality detection model, so as to obtain image characteristics of the sample block and text characteristics of the sample text, and determining similarity between the image characteristics and the text characteristics, wherein the abnormality detection model is used for carrying out abnormality detection on part of the target object in the input block;
and the training module is used for iteratively training the abnormal detection model based on the respective similarity and the preset similarity of the plurality of groups of samples.
14. The apparatus of claim 13, wherein the acquisition module is configured to:
for each first sample image, respectively divide the first sample image based on sliding windows of a plurality of sizes to obtain a block set for each of the plurality of sizes, wherein the plurality of sample blocks included in the block set of each size are all of that size;
and obtaining the plurality of groups of sample pairs based on the sample blocks in the block sets of the plurality of first sample images and the sample text corresponding to each sample block.
15. The apparatus of claim 13, wherein each sample block corresponds to a plurality of sample texts, each of which describes the abnormal situation of the local part in the sample block in a different wording; and the extraction module is configured to:
respectively extract features of the plurality of sample texts in the sample pair through the anomaly detection model to obtain initial text features respectively corresponding to the plurality of sample texts, and determine an average value of the plurality of initial text features to obtain the text feature.
16. The apparatus of claim 15, wherein the apparatus further comprises:
a filling module, configured to, for each sample block, fill a plurality of text templates based on the local part in the sample block and abnormal information of the local part to obtain the plurality of sample texts, wherein the abnormal information is used for describing the abnormal situation of the local part, and the plurality of text templates are different from one another.
17. The apparatus of claim 13, wherein the acquisition module is further configured to acquire a plurality of third sample images, each of the plurality of third sample images including the target object;
The device further comprises a calibration module, which is used for calibrating the position of the target object in the third sample image based on the position of the target object in the template image for each third sample image, so as to obtain the first sample image, wherein the position of the target object in the first sample image is matched with the position of the target object in the template image.
18. An object anomaly detection apparatus, characterized in that the apparatus comprises:
an acquisition module, configured to acquire a plurality of tiles of an image, the image comprising a target object, and each tile comprising a part of the target object;
a determining module, configured to determine, for each tile, a target text corresponding to the tile through an anomaly detection model, wherein the anomaly detection model is obtained by the training method of any one of claims 1 to 5, and the target text is used for describing the abnormal situation of the part of the target object in the tile;
the determining module is further configured to determine, based on target texts corresponding to the multiple tiles, anomaly information of a target object in the image, where the anomaly information is used to describe an anomaly condition of the target object.
19. The apparatus of claim 18, wherein the determining module is configured to:
determine, in a case where the target text corresponding to at least one of the plurality of tiles indicates that an abnormality exists in the part of the target object, that an abnormality exists in the target object in the image, and determine the part of the target object in which the abnormality exists; and
determine, in a case where the target texts corresponding to all of the plurality of tiles indicate that no abnormality exists in the part of the target object, that no abnormality exists in the target object in the image.
20. The apparatus of claim 18, wherein the determining module is configured to:
for each tile, extract the image feature of the tile through the anomaly detection model, determine similarities between the image feature and a plurality of preset text features, and determine, from the plurality of preset text features, a target text feature whose similarity meets a preset requirement, wherein the plurality of preset text features respectively correspond to preset texts, and the target text is the preset text corresponding to the target text feature.
21. The apparatus of claim 20, wherein the acquisition module is configured to:
respectively divide the image based on sliding windows of a plurality of sizes to obtain a tile set for each of the plurality of sizes, wherein the tile set of each size comprises a plurality of tiles of that size.
22. The apparatus of claim 21, wherein each tile comprises a plurality of pixel points, and the determining module is configured to:
for each tile of each size, in a case where the target text corresponding to the tile indicates that an abnormality exists in the part of the target object in the tile, assign the similarity between the image feature of the tile and the corresponding target text feature to the plurality of pixel points in the tile;
for each pixel point, obtain an abnormal value of the pixel point based on the similarities of the pixel point under the plurality of sizes, wherein the abnormal value is used for indicating the probability that the pixel point is abnormal; and
determine the abnormal information of the target object in the image based on the abnormal values of the pixel points in the image, wherein the abnormal information of the target object comprises at least one of the positions of the pixel points at which the target object is abnormal and the abnormal area.
23. The apparatus of claim 18, wherein there are a plurality of the images, and the determining module is further configured to: determine a plurality of target images among the plurality of images based on the abnormal information of the target object in each of the plurality of images, wherein no abnormality exists in the target object in each target image; and determine a non-abnormal feature of the target object based on the plurality of target images, wherein the non-abnormal feature is a feature of the target object when no abnormality exists;
wherein the apparatus further comprises a first detection module, configured to perform anomaly detection on an image comprising the target object based on the non-abnormal feature.
24. The apparatus of claim 18, wherein there are a plurality of the images, and the determining module is further configured to: determine a plurality of target tiles among the plurality of tiles based on the target texts respectively corresponding to the plurality of tiles, wherein no abnormality exists in the part of the target object in each target tile; and determine a local non-abnormal feature of the target object based on the plurality of target tiles of the plurality of images, wherein the local non-abnormal feature is a feature of a part of the target object in which no abnormality exists;
wherein the apparatus further comprises a second detection module, configured to perform anomaly detection on a tile comprising a part of the target object based on the local non-abnormal feature.
25. A computer device, characterized in that the computer device comprises a processor and a memory, the memory being used to store at least one program, and the at least one program being loaded and executed by the processor to perform the training method of the anomaly detection model according to any one of claims 1 to 5 or the object anomaly detection method according to any one of claims 6 to 12.
26. A computer-readable storage medium, characterized in that the computer-readable storage medium stores at least one program, the at least one program being used to perform the training method of the anomaly detection model according to any one of claims 1 to 5 or the object anomaly detection method according to any one of claims 6 to 12.
27. A computer program product, characterized in that the computer program product comprises at least one program stored in a computer-readable storage medium, a processor of a computer device reads the at least one program from the computer-readable storage medium, and the processor executes the at least one program so that the computer device performs the training method of the anomaly detection model according to any one of claims 1 to 5 or the object anomaly detection method according to any one of claims 6 to 12.
CN202410405801.XA 2024-04-07 2024-04-07 Training method of anomaly detection model, object anomaly detection method and device Active CN117992898B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410405801.XA CN117992898B (en) 2024-04-07 2024-04-07 Training method of anomaly detection model, object anomaly detection method and device

Publications (2)

Publication Number Publication Date
CN117992898A CN117992898A (en) 2024-05-07
CN117992898B true CN117992898B (en) 2024-06-14

Family

ID=90890840

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410405801.XA Active CN117992898B (en) 2024-04-07 2024-04-07 Training method of anomaly detection model, object anomaly detection method and device

Country Status (1)

Country Link
CN (1) CN117992898B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117011859A (en) * 2022-11-11 2023-11-07 腾讯科技(深圳)有限公司 Picture processing method and related device

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7657100B2 (en) * 2005-05-09 2010-02-02 Like.Com System and method for enabling image recognition and searching of images
US11151325B2 (en) * 2019-03-22 2021-10-19 Servicenow, Inc. Determining semantic similarity of texts based on sub-sections thereof
US20230281959A1 (en) * 2020-03-25 2023-09-07 Yissum Research Development Company Of The Hebrew University Of Jerusalem Ltd. Deep learning-based anomaly detection in images
CN111860674B (en) * 2020-07-28 2023-09-19 平安科技(深圳)有限公司 Sample category identification method, sample category identification device, computer equipment and storage medium
CN113159095B (en) * 2021-01-30 2024-04-30 华为技术有限公司 Model training method, image retrieval method and device
EP4266195A1 (en) * 2022-04-19 2023-10-25 Microsoft Technology Licensing, LLC Training of text and image models

Also Published As

Publication number Publication date
CN117992898A (en) 2024-05-07

Similar Documents

Publication Publication Date Title
CN109299315B (en) Multimedia resource classification method and device, computer equipment and storage medium
CN111091132B (en) Image recognition method and device based on artificial intelligence, computer equipment and medium
CN110555839A (en) Defect detection and identification method and device, computer equipment and storage medium
CN111476783B (en) Image processing method, device and equipment based on artificial intelligence and storage medium
CN110059685A (en) Word area detection method, apparatus and storage medium
CN111325699B (en) Image restoration method and training method of image restoration model
CN110490179B (en) License plate recognition method and device and storage medium
CN111192262A (en) Product defect classification method, device, equipment and medium based on artificial intelligence
CN110807361A (en) Human body recognition method and device, computer equipment and storage medium
CN114332530A (en) Image classification method and device, computer equipment and storage medium
CN112749613B (en) Video data processing method, device, computer equipment and storage medium
CN110991457B (en) Two-dimensional code processing method and device, electronic equipment and storage medium
CN111738365B (en) Image classification model training method and device, computer equipment and storage medium
CN114511864B (en) Text information extraction method, target model acquisition method, device and equipment
CN114283299A (en) Image clustering method and device, computer equipment and storage medium
CN114677350A (en) Connection point extraction method and device, computer equipment and storage medium
CN113761195A (en) Text classification method and device, computer equipment and computer readable storage medium
CN113516665A (en) Training method of image segmentation model, image segmentation method, device and equipment
CN113763931A (en) Waveform feature extraction method and device, computer equipment and storage medium
CN110232417B (en) Image recognition method and device, computer equipment and computer readable storage medium
CN112818979A (en) Text recognition method, device, equipment and storage medium
CN112053360A (en) Image segmentation method and device, computer equipment and storage medium
CN117992898B (en) Training method of anomaly detection model, object anomaly detection method and device
CN113569822B (en) Image segmentation method and device, computer equipment and storage medium
CN113343709B (en) Method for training intention recognition model, method, device and equipment for intention recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant