CN116977815A - Data processing method, device, intelligent equipment, storage medium and product - Google Patents


Info

Publication number
CN116977815A
Authority
CN
China
Prior art keywords
image
model
parameters
channel
enhancement
Prior art date
Legal status (assumption, not a legal conclusion; Google has not performed a legal analysis)
Pending
Application number
CN202310202755.9A
Other languages
Chinese (zh)
Inventor
林志文
鄢科
Current Assignee (may be inaccurate; Google has not performed a legal analysis)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (assumption, not a legal conclusion; Google has not performed a legal analysis)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202310202755.9A
Publication of CN116977815A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; locating or processing of specific regions to guide the detection or recognition
    • G06V10/26 Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V10/70 Arrangements using pattern recognition or machine learning
    • G06V10/764 Arrangements using classification, e.g. of video objects
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA], independent component analysis [ICA] or self-organising maps [SOM]; blind source separation
    • G06V10/774 Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V10/82 Arrangements using neural networks
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection


Abstract

The embodiment of the application provides a data processing method, an apparatus, an intelligent device, a storage medium, and a product. The method includes: acquiring a first image to be processed, and calling a target-domain adversarial augmentation model to process the first image to obtain image augmentation parameters of the first image; transforming the first image with the image augmentation parameters to obtain a second image, and calling an initial image recognition model to process the second image to obtain an image recognition result of the second image; determining target difference data corresponding to the first image according to the image augmentation parameters of the first image, the image recognition result of the second image, and the image tag; and updating the model parameters of the target-domain adversarial augmentation model and of the initial image recognition model according to the target difference data, and determining a target image recognition model from the initial image recognition model after the model parameter update. By adopting the application, the image recognition capability of the image recognition model can be improved.

Description

Data processing method, device, intelligent equipment, storage medium and product
Technical Field
The present application relates to the field of computer application technologies, and in particular, to a data processing method, apparatus, intelligent device, storage medium, and product.
Background
With the rapid development of computer and information technology, image recognition models are widely used in various fields, for example to recognize the category to which an image belongs, the position of a target in an image, or the content contained in a target.
Data augmentation randomly alters the form of input data to generate a large amount of varied data, alleviating over-fitting during model training. At present, however, a single set of image augmentation parameters is configured for the entire training sample set used to train an image recognition model. If the configured augmentation strength is too low, the images may change too little, the data diversity available to the image recognition model in the training stage does not increase, and the generalization capability of the model suffers; if the strength is too high, the images become unrecognizable, data noise increases, and the recognition accuracy of the model suffers.
Disclosure of Invention
The embodiment of the application provides a data processing method, an apparatus, an intelligent device, a storage medium, and a product, which can determine unique image augmentation parameters according to the characteristics of each image, ensuring that an image is changed to the greatest extent while remaining recognizable. This balances recognizability against data diversity and thereby improves the image recognition capability of the image recognition model.
In one aspect, an embodiment of the present application provides a data processing method, where the method includes:
acquiring a first image to be processed, and calling a target-domain adversarial augmentation model to process the first image to obtain image augmentation parameters of the first image;
transforming the first image by using the image augmentation parameters to obtain a second image, and calling an initial image recognition model to process the second image to obtain an image recognition result of the second image;
determining target difference data corresponding to the first image according to the image augmentation parameters of the first image, the image recognition result of the second image and the image label;
and updating the model parameters of the target-domain adversarial augmentation model and the model parameters of the initial image recognition model according to the target difference data, and determining a target image recognition model from the initial image recognition model after the model parameter update.
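The steps above can be sketched as a single training step. The following is a minimal runnable sketch under stated assumptions: the stand-in models, the augmentation operation, and the loss weighting (recognition loss minus a weighted augmentation reward) are illustrative, not the patent's implementation.

```python
import numpy as np

def augment(image, params):
    """Transform the first image with its per-image augmentation parameters
    (here assumed to be a brightness offset and an intensity scale)."""
    brightness, scale = params
    return np.clip(image * scale + brightness, 0.0, 1.0)

def training_step(aug_model, rec_model, first_image, label):
    # 1. The augmentation model predicts per-image augmentation parameters.
    params = aug_model(first_image)
    # 2. Transform the first image into the second (augmented) image.
    second_image = augment(first_image, params)
    # 3. The recognition model scores the augmented image.
    prediction = rec_model(second_image)
    # 4. Target difference data: recognition loss minus a small augmentation
    #    "reward", so stronger augmentation is encouraged only while the
    #    image stays recognizable.
    rec_loss = float(np.mean((prediction - label) ** 2))
    aug_reward = float(np.mean(np.abs(params)))
    return rec_loss - 0.1 * aug_reward

# Toy stand-in models operating on a 4x4 grayscale image.
aug_model = lambda img: np.array([0.1, 1.2])    # fixed params for the demo
rec_model = lambda img: np.array([img.mean()])  # trivial "recognizer"

image = np.full((4, 4), 0.5)
loss = training_step(aug_model, rec_model, image, np.array([1.0]))
```

In a real system both stand-in lambdas would be trainable networks updated from this scalar via backpropagation.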
In one aspect, an embodiment of the present application provides a data processing apparatus, including:
the acquisition unit is used for acquiring a first image to be processed, and for calling a target-domain adversarial augmentation model to process the first image to obtain the image augmentation parameters of the first image;
the processing unit is used for transforming the first image with the image augmentation parameters to obtain a second image, and for calling an initial image recognition model to process the second image to obtain an image recognition result of the second image;
the processing unit is further used for determining target difference data corresponding to the first image according to the image augmentation parameters of the first image, the image recognition result of the second image, and the image tag;
the processing unit is further configured to update the model parameters of the target-domain adversarial augmentation model and of the initial image recognition model according to the target difference data, and to determine a target image recognition model from the initial image recognition model after the model parameter update.
In one aspect, an embodiment of the present application provides an intelligent device, where the intelligent device includes a processor, a communication interface, and a memory, where the processor, the communication interface, and the memory are connected to each other, where the memory stores a computer program, and the processor is configured to invoke the computer program to execute a data processing method according to any of the possible implementations described above.
In one aspect, embodiments of the present application provide a computer readable storage medium storing a computer program which, when executed by a processor, implements a data processing method of any one of the possible implementations.
In one aspect, the embodiment of the present application further provides a computer program product, where the computer program product includes a computer program or computer instructions, and the computer program or computer instructions are executed by a processor to implement the steps of the data processing method provided by the embodiment of the present application.
In one aspect, an embodiment of the present application further provides a computer program, where the computer program includes computer instructions, where the computer instructions are stored in a computer readable storage medium, and a processor of an intelligent device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions to implement the data processing method provided by the embodiment of the present application.
In the method provided by the application, the target-domain adversarial augmentation model works at the granularity of a single image sample, determining unique image augmentation parameters according to the characteristics of the image itself. Target difference data are then determined from the image augmentation parameters, the image recognition result output by the initial image recognition model, and the image tag, and the model parameters of the adversarial augmentation model and of the initial image recognition model are updated using the target difference data. Under the action of the target difference data, the augmentation parameters determined by the adversarial augmentation model tend to grow, increasing the augmentation strength; at the same time, their growth is slowed according to the difference between the image recognition result and the image tag. The determined augmentation parameters therefore change the image to the greatest extent while keeping it recognizable. Transforming images with these parameters yields data with stronger diversity and flexibility, and training the image recognition model on such data gives it stronger generalization capability and robustness, which benefits its image recognition capability.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments are briefly introduced below. The drawings in the following description are only some embodiments of the present application; other drawings may be obtained from them by those skilled in the art without inventive effort.
FIG. 1 is a schematic diagram of a data processing system according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of a data processing method according to an embodiment of the present application;
FIG. 3 is a flowchart illustrating another data processing method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of determining weight coefficients according to an embodiment of the present application;
FIG. 5 is a schematic flow chart of a model training method according to an embodiment of the present application;
FIG. 6 is a flowchart illustrating another data processing method according to an embodiment of the present application;
FIG. 7 is a flowchart of another model training method according to an embodiment of the present application;
FIG. 8 is a flowchart of another model training method according to an embodiment of the present application;
FIG. 9 is a schematic diagram of a data processing apparatus according to an embodiment of the present application;
Fig. 10 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments. The described embodiments are only some, not all, embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort fall within the scope of the application.
The embodiment of the application provides a data processing method that can be used in a data processing apparatus, which can be integrated in an intelligent device; the intelligent device may be a terminal device or a server. The server may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDN), big data, and artificial intelligence platforms. Terminal devices include, but are not limited to: smart phones, tablet computers, smart wearable devices, smart voice interaction devices, smart home appliances, personal computers, vehicle-mounted terminals, smart cameras, and the like.
In a specific embodiment, the intelligent device may acquire a first image to be processed and call a target-domain adversarial augmentation model to process the first image, obtaining the image augmentation parameters of the first image. It may then transform the first image with those parameters to obtain a second image, call an initial image recognition model to process the second image to obtain an image recognition result, determine target difference data corresponding to the first image from the augmentation parameters, the recognition result, and the image tag, update the model parameters of both the adversarial augmentation model and the initial image recognition model according to the target difference data, and finally determine a target image recognition model from the initial image recognition model after the model parameter update. Note that the target-domain adversarial augmentation model and the initial image recognition model are simply the models that currently require training; each may be a newly constructed network, or a previously trained or deployed model that needs further optimization training.
In the above embodiment, the target-domain adversarial augmentation model uses a single image sample as the granularity, determining unique image augmentation parameters according to the characteristics of the image itself; target difference data are determined from the image augmentation parameters, the image recognition result output by the initial image recognition model, and the image tag, and the model parameters of the adversarial augmentation model and of the initial image recognition model are updated using the target difference data.
The data processing method provided by the embodiment of the application relates to artificial intelligence (Artificial Intelligence, AI), which is a theory, a method, a technology and an application system for simulating, extending and expanding human intelligence by using a digital computer or a machine controlled by the digital computer, sensing environment, acquiring knowledge and using the knowledge to acquire an optimal result. In other words, artificial intelligence is an integrated technology of computer science that attempts to understand the essence of intelligence and to produce a new intelligent machine that can react in a similar way to human intelligence. Artificial intelligence, i.e. research on design principles and implementation methods of various intelligent machines, enables the machines to have functions of sensing, reasoning and decision.
Artificial intelligence is a comprehensive discipline covering a wide range of fields, involving both hardware-level and software-level technology. Basic artificial intelligence technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly includes computer vision, speech processing, natural language processing, and machine learning/deep learning.
The application relates to the computer vision and machine learning branches of artificial intelligence: an image recognition model is used to perform image recognition, the image recognition model may comprise neural networks, and adjusting the image recognition model (i.e., adjusting its model parameters) specifically involves adjusting the parameters of those neural networks.
Computer Vision (CV) is the science of how to make machines "see": using cameras and computers in place of human eyes to identify and measure targets, and further performing graphic processing so that the result is an image better suited for human observation or for transmission to an instrument for detection. As a scientific discipline, computer vision studies related theory and technology in an attempt to build artificial intelligence systems that can acquire information from images or multidimensional data.
Machine Learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It studies how a computer can simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied throughout all fields of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, and inductive learning.
Based on the data processing method described above, a data processing system as shown in fig. 1 may be provided. The data processing system includes a database 101 and a smart device 102; the database 101 may be connected to the smart device 102 by wired or wireless communication. The database 101 may be a cloud database, a local database, or a private database (i.e., a database in a privately deployed environment). The database 101 may be used to store image data, which may be collected and uploaded by a terminal device or may be a standard image training sample set.
In an embodiment, the logic code of the data processing method may be deployed in the smart device 102. When the smart device 102 receives a corresponding instruction, for example when the object clicks an operation button in the visual deployment interface, the code may be executed, and the model parameters of the target-domain adversarial augmentation model and of the initial image recognition model are updated according to the procedure described in the data processing method. An image acquired by the smart device 102 from the database 101 may be used as the first image to be processed; the smart device 102 may also acquire the first image required for training from its own storage space. After training, the target image recognition model may be deployed in the smart device 102 to provide image recognition services for the corresponding object. The image recognition services may include image classification services, target localization services, target detection services, target segmentation services, and so forth.
For example, the target image recognition model may be used to perform an image classification task, when an object has a requirement of an image classification service, the corresponding terminal device sends a target image to the intelligent device 102, the intelligent device 102 obtains the target image, and invokes the target image recognition model to perform image classification processing on the target image, so as to obtain a category to which the target image belongs; similarly, if the target image recognition model is used for executing a target detection task, the target image recognition model can be called to perform target detection processing on the target image when the target image is acquired, so as to obtain the types and the numbers of targets appearing in the target image; if the target image recognition model is used for executing the target positioning task, the target image recognition model can be called to perform target positioning processing on the target image when the target image is acquired, so as to obtain the position of the target in the target image, and the like.
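The per-task services described above can be pictured as a simple dispatch layer; this is a hypothetical sketch, where the task names and the model's output format are illustrative assumptions, not the patent's interface.

```python
# Hypothetical dispatch of the trained target image recognition model to the
# services listed above; task names and output keys are assumptions.

def recognize(model, image, task):
    output = model(image)
    if task == "classification":
        return {"category": output["category"]}
    if task == "detection":
        return {"category": output["category"], "box": output["box"]}
    if task == "localization":
        return {"box": output["box"]}
    raise ValueError(f"unknown task: {task}")

# Toy stand-in model returning a fixed category and detection box.
toy_model = lambda img: {"category": "cat", "box": (10, 20, 50, 60)}
result = recognize(toy_model, None, "detection")
```

A deployed system would replace the stand-in lambda with the trained target image recognition model and pass the target image received from the terminal device.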
With reference to fig. 2, fig. 2 is a schematic flow chart of a data processing method according to an embodiment of the present application. The method may be performed by the smart device 102 of fig. 1 described above, and the method includes the following steps.
S201, acquiring a first image to be processed, and calling a target-domain adversarial augmentation model to process the first image to obtain the image augmentation parameters of the first image.
The first image may be an image captured by the terminal device, an image downloaded from the network, or an image transmitted by another device. In an embodiment, a training sample set may be obtained; the training sample set is the set of images needed for model training, it includes a plurality of images, and any image taken from it may serve as the first image to be processed. The number of images in the training sample set can be set as required: for example, if the model is to be trained for 30 passes using 10,000 images per pass, the training sample set may comprise 10,000 images, and each pass trains on those 10,000 images.
The target-domain adversarial augmentation model can determine the unique image augmentation parameters of an image according to the characteristics of the image itself. The image augmentation parameters are used to implement data augmentation, which refers to transforming the data form of the image to be transformed.
Note that the target-domain adversarial augmentation model needs to be further trained to update its model parameters. The goal of training the adversarial augmentation model is to obtain better model parameters so that it outputs the image augmentation parameters expected for data augmentation, meaning that the image can be changed to the greatest extent while remaining recognizable. For example, when adjusting the brightness of an already dark image, reducing the brightness by a large amount would make the image unrecognizable to the image recognition model; in that case the augmentation parameters output by the adversarial augmentation model can instead select a large brightness increase, so that the image is still changed substantially without becoming unrecognizable. The structure of the adversarial augmentation model can be chosen as needed; for example, it may be composed of a series of convolutional neural networks, and the application does not limit the structure type. The number of channels of the model's output layer equals the number of image augmentation parameters that need to be acquired.
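As one hedged illustration of such a structure (an assumption, not the patent's architecture), the sketch below uses a single convolution layer whose number of output channels equals the number of augmentation parameters, followed by global average pooling:

```python
import numpy as np

# Illustrative numpy sketch: one convolution whose output-channel count
# equals the number of augmentation parameters, then global average pooling.

rng = np.random.default_rng(0)
NUM_AUG_PARAMS = 3  # e.g. brightness delta, rotation angle, scale factor
kernels = rng.standard_normal((NUM_AUG_PARAMS, 3, 3)) * 0.1

def predict_aug_params(image):
    """Map an (H, W) grayscale image to NUM_AUG_PARAMS scalar parameters."""
    h, w = image.shape
    feature_maps = np.zeros((NUM_AUG_PARAMS, h - 2, w - 2))
    for c in range(NUM_AUG_PARAMS):  # one output channel per parameter
        for i in range(h - 2):
            for j in range(w - 2):
                feature_maps[c, i, j] = np.sum(image[i:i + 3, j:j + 3] * kernels[c])
    # Global average pooling collapses each channel to one scalar; tanh
    # bounds each parameter to a usable augmentation range.
    return np.tanh(feature_maps.mean(axis=(1, 2)))

params = predict_aug_params(rng.standard_normal((8, 8)))
```

A production model would stack several convolution layers and be trained jointly with the recognition model, but the channel-to-parameter correspondence is the same.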
In the embodiment of the application, the input object of the target-domain adversarial augmentation model is an image in the training sample set (i.e., a first image), and the output object is the corresponding image augmentation parameters. In one embodiment, the adversarial augmentation model may be called to process the first image to obtain the image augmentation parameters of the first image.
S202, performing transformation processing on the first image by using the image augmentation parameters to obtain a second image, and calling an initial image recognition model to process the second image to obtain an image recognition result of the second image.
The data-form changes involved in data augmentation can be realized by channel-domain and spatial-domain transformations. A channel-domain transformation operates on the image-channel dimensions of the image; these may include, for example, brightness, hue, saturation, exposure, sharpness, transparency, and contrast, with corresponding transformation modes of brightness adjustment, hue adjustment, saturation adjustment, exposure adjustment, sharpness adjustment, transparency adjustment, contrast adjustment, and so on. A spatial-domain transformation operates on the spatial dimensions of the image, with corresponding transformation modes including rotation, translation, scaling, and the like. Note that each transformation mode has a corresponding image augmentation parameter controlling the degree of change of the data form; for example, the image augmentation parameters may include the adjustment magnitude and direction of brightness (or hue, saturation, etc.), the translation direction and distance, the rotation direction and angle, and so on. Therefore, after the target-domain adversarial augmentation model outputs the image augmentation parameters of the first image, the first image can be transformed using the transformation modes and degrees of change indicated by those parameters to obtain the second image. That is, the first image is an image that requires data augmentation, and the second image is the image expected after the data augmentation of the first image.
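The two transform families just described can be illustrated with a minimal sketch; the function names and the specific operations (an intensity shift and a quarter-turn rotation) are illustrative assumptions, not the patent's transform set.

```python
import numpy as np

def adjust_brightness(image, delta):
    """Channel-domain transform: shift pixel intensities, clipped to [0, 1]."""
    return np.clip(image + delta, 0.0, 1.0)

def rotate90(image, times):
    """Spatial-domain transform: rotate the image in 90-degree steps."""
    return np.rot90(image, k=times)

img = np.array([[0.1, 0.9],
                [0.4, 0.6]])
bright = adjust_brightness(img, 0.2)  # brightness delta from the aug params
rotated = rotate90(img, 1)            # one counter-clockwise quarter turn
```

In the method above, `delta` and `times` would be supplied per image by the adversarial augmentation model rather than fixed constants.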
The initial image recognition model is an image recognition model that requires further model training to update model parameters. The image recognition model is used for executing an image recognition task, and the image recognition task may include an image classification task, a target detection task, a target positioning task, a semantic segmentation task, and the like, and specifically may be determined according to actual requirements, which is not limited by the present application. The image classification task aims at judging the category to which the image belongs; the object detection task may identify the category and location of all objects in the image (e.g., identified by a rectangular detection box); the object positioning task aims at identifying the position of an object in an image; the semantic segmentation task needs to judge which pixel points in the image belong to which target.
The objective of training the initial image recognition model is to obtain better model parameters to improve the image recognition capabilities of the image recognition model, which may include accuracy, robustness, and generalization capabilities. The accuracy is used for indicating whether the image recognition model can accurately perform the image recognition task, for example, if the image recognition task is an image classification task, the category to which each image belongs needs to be accurately determined, or if the image recognition task is a target positioning task, the position of the target in the image needs to be accurately positioned, and so on. Robustness means that the image recognition model can accurately recognize the image after the image is subjected to unintentional or intentional transformation; the transformation process may include dimensional changes, compression encoding, translation, cropping, warping, painting, and the like. The generalization capability refers to the adaptability to an unknown image (i.e. an image which is not seen in the training process), and the stronger the generalization capability is, the stronger the adaptability to the unknown image is, that is to say, the image recognition model can still accurately recognize the unknown image.
In the embodiment of the application, the initial image recognition model can be called to process the second image, so as to obtain the image recognition result of the second image. The image recognition result can be determined according to the image recognition task actually executed by the initial image recognition model; for example, if the image recognition model to be trained performs the image classification task, the image recognition result may include a category to which the image belongs, for example, if the second image is a cat image, the corresponding image recognition result is "cat"; if the image recognition model to be trained performs the target positioning task, the corresponding image recognition result may be the position information of the rectangular detection frame containing the target in the second image; if the image recognition model to be trained performs the target detection task, the corresponding image recognition result may include a category to which the target in the second image belongs and position information of a rectangular detection frame containing the target in the second image; if the image recognition model to be trained performs the semantic segmentation task, the corresponding image recognition result may include a category to which each pixel point in the second image belongs.
S203, determining target difference data corresponding to the first image according to the image augmentation parameters of the first image, the image recognition result of the second image and the image label.
The application performs supervised model training (i.e., supervised learning) on the initial image recognition model. Supervised learning is a machine learning task that infers model parameters from a labeled training sample set; each training sample in the set includes an input object and a desired output. In the embodiment of the application, for the initial image recognition model, the input object is the second image and the desired output is the image tag of the second image. It should be explained that if the image recognition model to be trained performs the image classification task, the image tag may be classification tag information; for example, if the second image is a cat image, the corresponding classification tag information is "cat". If the image recognition model to be trained performs the target positioning task, the image tag may be the position information of a rectangular detection frame containing the target; if it performs the target detection task, the image tag may include both the position information of the rectangular detection frame containing the target and the classification tag information; if it performs a semantic segmentation task, the image tag may include classification tag information for each pixel point in the image.
In an embodiment, the training sample set may include image tags for the first image, and the image tags for the second image may be determined from the image tags for the first image. Specifically, when the second image is obtained after the transformation processing of the first image based on the channel domain, the image tag of the first image may be used as the image tag of the second image, for example, the classification tag information of the first image is "cat", the classification tag information of the second image is also "cat", the position of "cat" in the first image is [2,30], and the position of "cat" in the second image is also [2,30]. When the second image is obtained after the transformation processing of the first image based on the spatial domain, if the image tag comprises the classification tag information, the classification tag information of the first image is determined to be the classification tag information of the second image, and if the image tag comprises the position information of the rectangular detection frame, the corresponding mapping transformation processing is required to be performed on the position information included in the image tag of the first image, so as to generate the position information of the corresponding rectangular detection frame in the second image.
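For spatial-domain transformations, the position information in the image tag must be mapped with the same transformation applied to the image. A minimal sketch of this mapping (the function name and the 2×3 affine-matrix representation are illustrative assumptions, not from the application):

```python
def map_box(box, matrix):
    """Map an axis-aligned detection frame [x1, y1, x2, y2] through a
    2x3 affine matrix [[a, b, tx], [c, d, ty]] applied to the image,
    then re-wrap the four mapped corners into an axis-aligned frame."""
    (a, b, tx), (c, d, ty) = matrix
    x1, y1, x2, y2 = box
    corners = [(x1, y1), (x2, y1), (x1, y2), (x2, y2)]
    mapped = [(a * x + b * y + tx, c * x + d * y + ty) for x, y in corners]
    xs, ys = zip(*mapped)
    return [min(xs), min(ys), max(xs), max(ys)]

# Translating the image by (10, 5): the classification tag is reused
# unchanged, only the frame coordinates are mapped.
print(map_box([2, 30, 50, 80], [(1, 0, 10), (0, 1, 5)]))  # [12, 35, 60, 85]
```

Note that after transformations that change orientation (e.g., a horizontal flip), the mapped corners are re-wrapped so the frame stays axis-aligned.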
Through the steps, the target difference data corresponding to the first image can be determined according to the image augmentation parameters of the first image, the image recognition result of the second image, and the image tag. Specifically, first difference data is determined according to the image augmentation parameters of the first image. The image augmentation parameters and the first difference data are in an inverse relation: the larger the image augmentation parameters, the smaller the first difference data, and vice versa. Since the present application tends to change the image to the maximum extent, i.e., tends to increase the image augmentation parameters, the first difference data may be used to indicate the gap between the target domain countermeasure enhancement model under the current model parameters and the desired target domain countermeasure enhancement model. Further, second difference data is determined according to the image recognition result of the second image and the image tag. The image tag of the second image is more reliable than the image recognition result of the second image, so the accuracy of the image recognition result can be evaluated against the image tag. In a specific implementation, the second difference data may be determined based on the difference between the image recognition result of the second image and the image tag. It will be appreciated that the smaller the difference between the predicted output (i.e., the image recognition result) and the desired output (i.e., the image tag) for the second image, the more accurate the recognition of the second image and the smaller the second difference data; the second difference data may thus be used to indicate the gap between the image recognition model under the current model parameters and the desired image recognition model.
And finally, determining target difference data corresponding to the first image according to the first difference data and the second difference data, wherein the sum of the first difference data and the second difference data can be specifically determined as the target difference data corresponding to the first image.
S204, updating model parameters of the target domain countermeasure enhancement model and model parameters of the initial image recognition model according to the target difference data, and determining the target image recognition model according to the initial image recognition model after updating the model parameters.
Training the target domain countermeasure enhancement model and the initial image recognition model, that is, updating the model parameters of both models according to the target difference data, may specifically be: updating the model parameters of the target domain countermeasure enhancement model and the model parameters of the initial image recognition model in the direction that reduces the target difference data. The "direction in which the target difference data is reduced" refers to the model optimization direction that targets minimizing the target difference data. Optimizing in this direction requires that the target difference data generated by the two models after the parameter update be smaller than the target difference data generated before the update. For example, if the target difference data obtained in the current calculation is 0.85, then after updating the model parameters of the target domain countermeasure enhancement model and the initial image recognition model in the direction of reducing the target difference data, the target difference data they generate should be less than 0.85.
The above steps S201-S204 describe the training process of the target domain countermeasure enhancement model and the initial image recognition model using the first image as an example. In the actual training process, images are continuously acquired from the training sample set to train the two models, and the model parameters of both are updated once per training step. If the target difference data generated after multiple updates satisfies the model training stop condition, the training process may be determined to have ended, and the initial image recognition model obtained by the last update may be determined as the trained image recognition model (i.e., the target image recognition model). The target difference data satisfying the model training stop condition includes one or more of the following: the target difference data is smaller than a set difference threshold; the number of training iterations reaches a preset number. In addition, the target domain countermeasure enhancement model obtained by the last update can be used as a pre-training model to perform data augmentation on input images during the training of other image recognition models.
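The batch-wise loop described above can be sketched as follows; `step_fn` stands in for one joint forward/backward pass over a training batch that returns that batch's target difference data (all names here are illustrative assumptions, not from the application):

```python
def train(step_fn, num_epochs, batches_per_epoch, diff_threshold):
    """Run one parameter update per batch; stop early once the target
    difference data drops below the threshold, otherwise stop after all
    training rounds. Returns the number of updates performed."""
    updates = 0
    for _ in range(num_epochs):
        for batch_index in range(batches_per_epoch):
            target_diff = step_fn(batch_index)  # joint update of both models
            updates += 1
            if target_diff < diff_threshold:
                return updates
    return updates

# A fake step whose target difference data shrinks every update:
history = iter([0.9, 0.85, 0.4])
print(train(lambda b: next(history), num_epochs=2, batches_per_epoch=3,
            diff_threshold=0.5))  # 3 (stopped early: 0.4 < 0.5)
```

In practice `step_fn` would transform the batch with the countermeasure enhancement model, run the recognition model, and back-propagate the target difference data through both.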
The target image recognition model obtained after training can be used to execute an image recognition task on a target image. The target image may be an image to be recognized that is sent to the intelligent device by another device, or an image generated by the intelligent device itself; the intelligent device may call the target image recognition model to process the target image and obtain its image recognition result. For example, a terminal device may send an image to the intelligent device; after receiving the image, the intelligent device calls the target image recognition model to obtain the corresponding image recognition result and returns it to the terminal device.
It should be noted that, during the training of the target domain countermeasure enhancement model and the initial image recognition model, the target difference data gradually decreases, that is, both the first difference data and the second difference data tend to decrease. Because the first difference data and the image augmentation parameters are in an inverse relation, the image augmentation parameters tend to increase under the action of the first difference data; as the image augmentation parameters increase, the data augmentation intensity also increases, making the image gradually harder to recognize. For example, when the brightness of an image is adjusted too much, the image may become blurred. At this point, the gap between the predicted output and the desired output of the initial image recognition model widens, that is, the second difference data increases due to the increase of the image augmentation parameters; but since the second difference data also tends to decrease as the target difference data decreases, the further increase of the image augmentation parameters is slowed down. In the end, the image is changed to the greatest extent on the premise of remaining recognizable, achieving a balance between data diversity and recognizability. It is understood that recognizability in the present application means that the image recognition model can accurately perform the image recognition task on the image.
In the method provided by the application, the target domain countermeasure enhancement model takes a single image sample as the granularity and determines unique image augmentation parameters according to the characteristics of each image. The target difference data are determined according to the image augmentation parameters, the image recognition result output by the initial image recognition model, and the image tag, and are used to update the model parameters of both the target domain countermeasure enhancement model and the initial image recognition model. Under the action of the target difference data, the image augmentation parameters determined by the target domain countermeasure enhancement model tend to increase, so the data augmentation intensity increases; meanwhile, the increase of the image augmentation parameters is slowed down according to the difference between the image recognition result output by the initial image recognition model and the image tag. The determined image augmentation parameters thus change the image to the greatest extent on the premise that the image remains recognizable. Transforming images with these augmentation parameters yields images with stronger diversity and flexibility, and training the image recognition model with such images gives it stronger generalization capability and robustness, which is beneficial to improving its image recognition capability.
The target domain countermeasure enhancement model includes one or both of a channel domain countermeasure enhancement model and a spatial domain countermeasure enhancement model. The channel domain-based transformation processing in S202 above can be implemented by the channel domain countermeasure enhancement model, and the spatial domain-based transformation processing in S202 can be implemented by the spatial domain countermeasure enhancement model. In one embodiment, when the target domain countermeasure enhancement model includes the channel domain countermeasure enhancement model, the present application provides a flowchart of the data processing method shown in fig. 3. The method may be performed by the smart device 102 of fig. 1 described above, and includes the following steps.
S301, acquiring a first image to be processed, and calling a channel domain countermeasure enhancement model to process the first image to obtain channel domain enhancement parameters of the first image.
In one embodiment, a training sample set may be obtained, and a plurality of images may be taken from it as the training samples of one training batch (batch); the first image may be any image of the current training batch. It should be noted that one training round (epoch) performs one full pass of training over all images in the training sample set, and all images in the training sample set can be divided into N (a positive integer) training batches. If there are S (a positive integer) training rounds, there are S×N training batches in total, and the model parameters are updated S×N times, that is, each time the training of one training batch is completed, one update of the model parameters is completed.
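The relationship between rounds, batches, and parameter updates can be expressed as a small helper (hypothetical; the ceiling division for a partial final batch is an assumption on top of the text):

```python
def num_parameter_updates(num_samples, batch_size, num_epochs):
    """S epochs x N batches per epoch -> S*N parameter updates.
    N uses ceiling division so a partial final batch still counts."""
    batches_per_epoch = -(-num_samples // batch_size)  # ceil(num_samples / batch_size)
    return num_epochs * batches_per_epoch

print(num_parameter_updates(num_samples=1000, batch_size=64, num_epochs=5))  # 80
```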
The channel domain countermeasure enhancement model can determine the unique channel domain augmentation parameters of an image according to the characteristics of the image. The channel domain augmentation parameters are one kind of image augmentation parameter and are mainly used for transforming the image to be transformed in the image channel dimensions. The channel domain countermeasure enhancement model may be composed of a number of convolutional neural networks; the application does not limit its type.
It should be noted that the channel domain countermeasure enhancement model needs to be further trained to update its model parameters. The goal of training the channel domain countermeasure enhancement model is to obtain better model parameters so as to output the channel domain augmentation parameters expected for data augmentation. The channel domain augmentation parameters expected for data augmentation are those that change the image to the greatest extent in the image channel dimensions on the premise that the image remains recognizable. For example, when adjusting the brightness of an image that is already dark, reducing the brightness by a large amount would make the image unrecognizable to the image recognition model; in that case, the channel domain augmentation parameters output by the channel domain countermeasure enhancement model can instead choose to increase the brightness of the image by a large amount, so that the image is still changed to a large extent while avoiding becoming unrecognizable.
In a possible embodiment, after the first image is acquired, the channel domain countermeasure enhancement model may be invoked to process the first image to obtain the channel domain augmentation parameters of the first image. Specifically, the first image may include a plurality of image channel dimensions; for example, the first image may include the three image channel dimensions red (Red, R), green (Green, G), and blue (Blue, B), or the three image channel dimensions hue (Hue, H), saturation (Saturation, S), and brightness (Value, V), or the four image channel dimensions red (R), green (G), blue (B), and transparency (Alpha, A), and so on, which is not limited by the present application. Each image channel dimension has a corresponding channel image, and the first image may be synthesized from the channel images of the image channel dimensions it includes; for example, the first image may be synthesized from the channel images of the hue (H), saturation (S), and brightness (V) dimensions (i.e., the hue image, saturation image, and brightness image), or from the channel images of the red (R), green (G), and blue (B) dimensions (i.e., the R image, G image, and B image). Considering that the meaning of the components in HSV space (comprising the hue, saturation, and brightness channel dimensions) is more intuitive, the present application is described below taking the case in which the first image includes the three image channel dimensions hue (H), saturation (S), and brightness (V).
In this case, if the first image is an RGB image or an image in another format, the first image is first converted into HSV space, and the channel domain countermeasure enhancement model is then called to process the first image to obtain its channel domain augmentation parameters. In a specific embodiment, an attention mechanism may be introduced into the channel domain countermeasure enhancement model; for example, the channel domain countermeasure enhancement model may be a convolutional neural network based on the attention mechanism. The channel images corresponding to the image channel dimensions may be extracted from the first image, and the channel domain countermeasure enhancement model may process the channel images of the different image channel dimensions based on the attention mechanism and generate a channel adjustment parameter set corresponding to each image channel dimension, so that a channel adjustment parameter set suited to each image channel dimension is explicitly generated according to the differences between the channel images.
That is, the channel domain augmentation parameters of the first image include a channel adjustment parameter set for each image channel dimension in the first image. Each channel adjustment parameter set includes a forward adjustment parameter and a reverse adjustment parameter. The forward adjustment parameter indicates how to adjust the channel image of the corresponding image channel dimension in the forward direction; for example, increasing hue, saturation, or brightness are all forward adjustments. The reverse adjustment parameter indicates how to adjust the channel image of the corresponding image channel dimension in the reverse direction; for example, decreasing hue, saturation, or brightness are all reverse adjustments. In an embodiment, the forward and reverse adjustment parameters included in the three channel adjustment parameter sets may be processed by a sigmoid function to obtain outputs valued in [0,1], which are then used as the forward and reverse adjustment parameters actually applied. For example, suppose the first image includes the three image channel dimensions hue (H), saturation (S), and brightness (V); if the raw forward adjustment parameter of the hue dimension is 0, the sigmoid function outputs 0.5, and 0.5 is used as the forward adjustment parameter actually applied to the hue dimension.
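The sigmoid squashing of the raw adjustment parameters can be sketched as follows (the per-channel raw values are invented purely for illustration):

```python
import math

def sigmoid(x):
    """Map a raw adjustment parameter into [0, 1]."""
    return 1.0 / (1.0 + math.exp(-x))

# Raw (forward, reverse) adjustment parameters per image channel
# dimension; the numbers are made up for this example.
raw = {"hue": (0.0, -1.2), "saturation": (2.0, 0.3), "value": (-0.5, 1.0)}
actual = {ch: (sigmoid(f), sigmoid(r)) for ch, (f, r) in raw.items()}
print(round(actual["hue"][0], 2))  # 0.5, matching the example in the text
```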
In cognitive neurology, attention is an indispensable, complex cognitive function of humans, referring to the ability to focus on some information while choosing to ignore the rest. In daily life, people receive a large number of sensory inputs through vision, hearing, touch, and so on, yet the brain still works in an orderly fashion under this bombardment of external information, because it can deliberately or unconsciously select a small portion of useful information from the large amount of input for emphasized processing and ignore the rest; for example, when reading, people usually focus on and process only a small number of words. Similarly, the attention mechanism enables a neural network to focus on specific input features, i.e., to select particular inputs; by adopting the attention mechanism, limited computing resources are allocated to the more important tasks, effectively alleviating the problem of information overload.
S302, performing channel domain-based transformation processing on the first image by utilizing the channel domain augmentation parameters to obtain a second image.
The channel domain-based transformation processing, i.e., transforming the image in the image channel dimensions, can be implemented with the channel domain augmentation parameters determined by the channel domain countermeasure enhancement model. In one embodiment, performing the channel domain-based transformation processing on the first image using the channel domain augmentation parameters to obtain the second image includes: first determining the adjustment direction of each image channel. For example, a forward/reverse direction parameter pair {-1, 1} is generated; the adjustment direction is determined to be forward adjustment when 1 is selected from {-1, 1}, and reverse adjustment when -1 is selected. The selection of the adjustment direction, i.e., the process of selecting 1 or -1 from {-1, 1}, may be made randomly by the intelligent device, or determined by the intelligent device according to manually configured parameters. A reference adjustment parameter is then selected from the channel adjustment parameter set of each image channel dimension according to that channel's adjustment direction: when the adjustment direction is forward, the reference adjustment parameter is the forward adjustment parameter in the set; when the adjustment direction is reverse, it is the reverse adjustment parameter.
The adjustment amplitude parameter of each image channel dimension is further determined according to the adjustment direction, the reference adjustment parameter, and a set amplitude parameter of each image channel. The set amplitude parameter may be configured manually, for example, 100. The value selected from the forward/reverse direction parameters is determined by the adjustment direction of each image channel, and the adjustment amplitude parameter of each image channel dimension is the product of the reference adjustment parameter, the set amplitude parameter, and the value selected from the direction parameters. For example, with a reference adjustment parameter of 0.5, a set amplitude parameter of 100, and a selected direction value of 1, the adjustment amplitude parameter is 50; if the corresponding image channel dimension is brightness, the brightness is to be increased, with an increase amplitude of 50. When the selected direction value is -1, the adjustment amplitude parameter is -50, and the brightness is to be decreased, with a decrease amplitude of 50. After the adjustment amplitude parameters of the image channel dimensions are obtained, the channel image of the first image in each image channel dimension can be extracted, and each channel image can be transformed according to the adjustment amplitude parameter of its image channel dimension to obtain the transformed channel image of each image channel dimension.
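The computation of the adjustment amplitude parameter described above, direction value × reference adjustment parameter × set amplitude parameter, can be sketched as (function name is an illustrative assumption):

```python
def adjustment_amplitude(reference_param, set_amplitude, direction):
    """direction is +1 (forward adjustment) or -1 (reverse adjustment);
    the product gives the signed adjustment amplitude parameter."""
    return direction * reference_param * set_amplitude

print(adjustment_amplitude(0.5, 100, direction=1))   # 50.0 -> raise brightness by 50
print(adjustment_amplitude(0.5, 100, direction=-1))  # -50.0 -> lower brightness by 50
```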
Transforming the channel image of each image channel dimension according to its adjustment amplitude parameter to obtain the transformed channel image includes: adding the adjustment amplitude parameter of any image channel dimension to the color value of each pixel point of the first image in the channel image of that dimension, obtaining the transformed channel image of that dimension. When the image channel dimension is brightness, the color value of a pixel point in the corresponding channel image is a brightness value; when it is hue, a hue value; when it is saturation, a saturation value. For example, if the adjustment amplitude parameter is 50 and the brightness value of a pixel point in the channel image is 15, the brightness value of that pixel point in the transformed channel image is 65. It should be noted that, whether for brightness, hue, or saturation, the color value of a pixel point in the transformed channel image must lie within the corresponding value range. For example, the hue has an angular range of [0,360]; if a transformed hue value is 363, then 360 is taken as the final hue value, and similarly, if a hue value is -34, then 0 is taken as the final hue value.
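Applying an adjustment amplitude to a channel image and clamping into the channel's value range, as described above, might look like this pure-Python sketch over a flat list of color values (names are illustrative):

```python
def transform_channel(color_values, amplitude, value_range):
    """Add the adjustment amplitude parameter to every pixel's color
    value, then clamp into the channel's valid range (e.g. [0, 360]
    for hue) so out-of-range values take the boundary value."""
    lo, hi = value_range
    return [min(hi, max(lo, v + amplitude)) for v in color_values]

print(transform_channel([15, 200, 355], 50, (0, 360)))  # [65, 250, 360]
print(transform_channel([15, 200], -50, (0, 360)))      # [0, 150]
```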
It should be noted that, when the transformation processing is performed on the image channel dimensions, one or more of the image channel dimensions included in the first image may be selected to be transformed, for example, only the brightness or the tone of the image may be selected to be changed, and at this time, the adjustment amplitude parameter corresponding to the image channel dimension that does not need to be transformed may be 0.
And finally, integrating the channel images corresponding to the channel dimensions of each transformed image to obtain a second image. It should be noted that, one image may be disassembled into channel images corresponding to multiple image channel dimensions, or may be integrated into one image by channel images corresponding to multiple image channel dimensions.
It can be understood that, when the model is trained, the channel domain countermeasure enhancement model can generate a corresponding output for each image in the training sample set, so data augmentation is realized at the level of individual image samples. This can, to a certain extent, avoid the augmentation intensity being too low due to uniformly configured image augmentation parameters, increase data diversity, and let the image recognition model learn from richer and more varied images during training, improving its generalization capability.
S303, calling an initial image recognition model to process the second image to obtain an image recognition result of the second image, and determining target difference data corresponding to the first image according to the channel domain augmentation parameter of the first image, the image recognition result of the second image and the image label.
The step of calling the initial image recognition model to process the second image may refer to the related description in S202, which is not repeated in this embodiment. The following description will be made by taking an initial image recognition model as a deep convolutional neural network and performing an image classification task as an example.
When the channel domain countermeasure enhancement model is trained, the target difference data corresponding to the first image needs to be determined according to the channel domain augmentation parameters of the first image, the image recognition result of the second image, and the image tag. Specifically, the first difference data is determined according to the channel domain augmentation parameters of the first image. Each image channel dimension has both a forward and a reverse augmentation direction, and only the parameter of the selected direction is optimized during model training. That is, the first difference data is determined based only on the reference adjustment parameters selected from the channel adjustment parameter sets of the image channel dimensions of the first image; the specific calculation expression is the following formula (1):
L_att = -∑ sigmoid(θ_i)    (1)
wherein L_att represents the first difference data, ∑ denotes summation, sigmoid() denotes the sigmoid function, and sigmoid(θ_i) represents a selected reference adjustment parameter, θ_i being the raw forward or reverse adjustment parameter actually output by the channel domain countermeasure enhancement model.
The target domain countermeasure enhancement model and the initial image recognition model are subsequently trained in the direction in which the first difference data becomes smaller; that is, the loss function shown in formula (1) pushes sigmoid(θ_i) toward 1, so the image augmentation parameters tend to become larger, which increases the augmentation amplitude in each image channel dimension.
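Formula (1) can be sketched directly; the θ_i values below are invented for illustration (in practice they are the raw outputs of the channel domain countermeasure enhancement model for the selected directions):

```python
import math

def attack_loss(selected_thetas):
    """Formula (1): L_att = -sum_i sigmoid(theta_i), computed only over
    the reference adjustment parameters of the selected directions.
    Minimising it pushes every sigmoid(theta_i) toward 1, i.e. toward
    larger augmentation amplitudes."""
    return -sum(1.0 / (1.0 + math.exp(-t)) for t in selected_thetas)

print(attack_loss([0.0, 0.0, 0.0]))  # -1.5
```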
When the initial image recognition model performs the image classification task, the second difference data is determined according to the image recognition result of the second image and the image label; the specific calculation expression can be the following formula (2):
L_ce = -Σ_{i=1}^{N} y_i · log(ŷ_i)   (2)
where L_ce denotes the second difference data, y_i denotes the image label, ŷ_i denotes the image recognition result, and N denotes the total number of categories.
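The classification loss described above is the standard cross-entropy between a one-hot label and the model's class probabilities; a minimal sketch in NumPy (values chosen for demonstration):

```python
import numpy as np

def second_difference(y_true, y_pred, eps=1e-12):
    # Formula (2): L_ce = -sum_i y_i * log(y_hat_i) over the N categories,
    # with y_true the one-hot image label and y_pred the predicted class
    # probabilities for the second image. eps guards against log(0).
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0)
    return -float(np.sum(np.asarray(y_true, dtype=float) * np.log(y_pred)))

# A confident correct prediction yields a small loss (-log 0.8 ≈ 0.223).
loss = second_difference([0, 1, 0], [0.1, 0.8, 0.1])
```

The more probability mass the model assigns to the labeled class, the smaller L_ce becomes, so minimizing it trains the recognition model toward accurate classification.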
It should be noted that, the loss function expression shown in the formula (2) may be used to measure the difference between the image recognition result of the second image and the image label, and may be used to perform model classification training, so that the image recognition model may accurately perform image recognition. Accordingly, when the image recognition model is used to perform other types of image recognition tasks, the second difference data may be determined according to a loss function actually used by the image recognition model, for example, when the image recognition model is used to predict position information of a rectangular detection frame, a regression loss function may be used to evaluate a difference between the predicted position information and the true position information, thereby determining the second difference data.
And finally, the target difference data corresponding to the first image is determined according to the first difference data and the second difference data. In a specific implementation, the weight coefficient of the first difference data may be determined first. Specifically, the current training round of the training sample set including the first image may be obtained, where the current training round refers to the number of full passes over the training sample set performed so far. The weight coefficient of the first difference data is then determined according to its positive correlation with the current training round: the weight coefficient of the first difference data gradually increases as the current training round increases. As shown in fig. 4, the current training round may specifically be processed by a cosine function to obtain the corresponding weight coefficient.
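One possible form of the cosine-based weighting described above is a cosine ramp; the text only specifies a cosine function and a positive correlation with the training round, so the maximum value and exact shape below are assumptions:

```python
import math

def weight_coefficient(current_round, total_rounds, alpha_max=1.0):
    # Hypothetical cosine ramp: the weight of the first difference data
    # rises monotonically from 0 to alpha_max as the current training round
    # approaches total_rounds (cf. fig. 4); alpha_max is an assumption.
    return alpha_max * 0.5 * (1.0 - math.cos(math.pi * current_round / total_rounds))

# Small early in training, close to alpha_max late in training:
early = weight_coefficient(5, 100)
late = weight_coefficient(95, 100)
```

Any monotonically increasing schedule would satisfy the stated positive correlation; the cosine shape simply makes the growth slow at both ends and fastest in the middle of training.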
After obtaining the weight coefficient of the first difference data, the product of the first difference data and the weight coefficient may be obtained, and the sum of the product and the second difference data is used as the target difference data corresponding to the first image, where the specific calculation expression is the following formula (3):
L = L_ce + α·L_att   (3)
where L denotes the target difference data and α denotes the weight coefficient of the first difference data.
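Formula (3) combines the two terms directly; the tiny sketch below (with made-up loss values) shows how the weight coefficient shifts the balance between them:

```python
def target_difference(l_ce, l_att, alpha):
    # Formula (3): L = L_ce + alpha * L_att, with alpha the dynamically
    # adjusted weight coefficient of the first difference data.
    return l_ce + alpha * l_att

# Early in training a small alpha keeps L dominated by the recognition loss;
# later a larger alpha strengthens the augmentation term.
early = target_difference(2.0, -1.5, alpha=0.1)  # 1.85
late = target_difference(2.0, -1.5, alpha=0.9)   # 0.65
```

Since L_att is negative (formula (1)), increasing α lowers the total loss contribution available from augmentation, so the optimizer is increasingly rewarded for stronger augmentation as training progresses.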
It should be noted that because the weight coefficient of the first difference data can be dynamically adjusted, it can be kept small at the initial stage of model training. At that stage the second difference data is optimized with priority (i.e. driven to become smaller), the intensity of data augmentation stays low, and since the images change little, the early learning difficulty of the image recognition model is reduced. At the later stage of model training the weight coefficient of the first difference data is increased, so the intensity of data augmentation rises and the images change more strongly. In other words, the intensity of data augmentation is dynamically adjusted through the weight coefficient of the first difference data, which improves the robustness and recognition accuracy of the image recognition model while keeping the model training stage stable.
S304, updating model parameters of the channel domain countermeasure enhancement model and model parameters of the initial image recognition model according to the target difference data, and determining the target image recognition model according to the initial image recognition model updated by the model parameters.
In an embodiment, each image included in the training batch where the first image in the training sample set is located may be acquired, and the target difference data corresponding to each image may be obtained through the above steps S301 to S303. The model parameters of the channel domain countermeasure enhancement model and the model parameters of the initial image recognition model are then updated according to the target difference data corresponding to each image; for example, the mean of the target difference data corresponding to the images in the training batch may be taken as the target difference data corresponding to the training batch, and the model parameters of the channel domain countermeasure enhancement model and of the initial image recognition model are updated in the direction that reduces the target difference data corresponding to the training batch.
One training batch trains the channel domain countermeasure enhancement model and the initial image recognition model once. In the actual training process, multiple training batches are needed, and the model parameters of the channel domain countermeasure enhancement model and of the initial image recognition model are updated once per batch. If, after multiple updates, the target difference data corresponding to a training batch meets the model training stop condition, the training process of the channel domain countermeasure enhancement model and the initial image recognition model can be determined to be finished, and the initial image recognition model obtained by the last update can be determined to be the trained image recognition model (i.e. the target image recognition model). The target difference data corresponding to a training batch meets the model training stop condition when one or more of the following hold: the target difference data corresponding to the training batch is smaller than a set difference threshold, or the number of training iterations reaches a preset number. In addition, the channel domain countermeasure enhancement model obtained by the last update can be used as a pre-trained model to transform input images in the image channel dimensions during the training of other image recognition models.
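The per-batch update loop with the two stop conditions can be sketched as follows; the loss sequence stands in for a real forward/backward step, so all names and values here are illustrative assumptions:

```python
def train_until_stopped(batch_losses, loss_threshold, max_updates):
    # Hypothetical outer loop: one parameter update per training batch.
    # Training stops once a batch's target difference data falls below the
    # set difference threshold, or the preset number of updates is reached.
    updates = 0
    for loss in batch_losses:
        updates += 1  # one model-parameter update per batch
        if loss < loss_threshold or updates >= max_updates:
            break
    return updates

# Stops at the third batch, where the loss drops under the threshold.
n = train_until_stopped([3.0, 2.0, 0.4, 0.3], loss_threshold=0.5, max_updates=10)
```

Either condition alone ends training, matching the "one or more of" phrasing above.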
It should be noted that in the process of training the channel domain countermeasure enhancement model and the initial image recognition model, the target difference data gradually decreases, that is, the first difference data and the second difference data both tend to decrease. Because the first difference data and the channel domain augmentation parameters are inversely correlated, the channel domain augmentation parameters tend to increase under the action of the first difference data; as they increase, the data augmentation intensity of the image also increases, so the image gradually becomes harder to recognize. For example, when the adjustment amplitude of the brightness of the image is too large, the image may become blurred. At this point the difference between the predicted output and the expected output of the initial image recognition model grows, that is, the increase of the channel domain augmentation parameters increases the second difference data; but since the second difference data also tends to decrease under the effect of reducing the target difference data, the further increase of the channel domain augmentation parameters is slowed. Finally, the image is changed to the greatest extent on the premise of remaining recognizable, achieving a balance between data diversity and recognizability. It is understood that "recognizable" in the present application means that the image recognition model can accurately perform an image recognition task on the image.
In summary, a flow chart of a model training method shown in fig. 5 is provided; the method includes steps 1 to 4. Step 1: convert the first image into HSV space to obtain channel images of the first image in the H, S, and V image channel dimensions. Step 2: process the channel images of each image channel dimension with the channel domain countermeasure enhancement model to obtain a channel adjustment parameter set for each image channel dimension, and transform the channel image of the corresponding image channel dimension with the selected reference adjustment parameter to obtain the transformed channel image of each image channel dimension. Sigmoid() in fig. 5 represents converting the selected reference adjustment parameter to a value in [0,1] using a sigmoid function. Step 3: determine the first difference data using the selected reference adjustment parameters. Step 4: integrate the transformed channel images of the image channel dimensions to obtain a second image, input the second image into the initial image recognition model to obtain the image recognition result of the second image, and determine the second difference data according to the image recognition result of the second image and the image label. Finally, train the channel domain countermeasure enhancement model and the initial image recognition model according to the target difference data determined from the first difference data and the second difference data.
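Steps 1–2 above can be sketched compactly. The snippet assumes the image is already in HSV space with values normalized to [0, 1], and the mapping from sigmoid(θ) to a per-channel gain is a made-up illustration of "transforming the channel image with the selected reference adjustment parameter", not the application's actual transform:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_domain_augment(hsv_image, theta):
    # Hypothetical channel-domain transform: each of the H, S, V channel
    # images is scaled by a gain derived from its selected reference
    # adjustment parameter theta[c], squashed by a sigmoid (step 2 of fig. 5).
    out = np.asarray(hsv_image, dtype=float).copy()
    for c in range(3):
        gain = 0.5 + sigmoid(theta[c])  # gain in (0.5, 1.5); identity at theta = 0
        out[..., c] = np.clip(out[..., c] * gain, 0.0, 1.0)
    return out

# theta = 0 in every channel leaves the image unchanged.
img = np.full((2, 2, 3), 0.5)
same = channel_domain_augment(img, [0.0, 0.0, 0.0])
```

A larger θ for a channel pushes its gain up, mirroring how the trained model increases the augmentation amplitude per channel dimension.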
In the method provided by the application, the channel domain countermeasure enhancement model works at the granularity of a single image sample and determines unique channel domain augmentation parameters according to the characteristics of the image. The target difference data is determined according to the channel domain augmentation parameters, the image recognition result output by the initial image recognition model, and the image label, and is used to update the model parameters of both the channel domain countermeasure enhancement model and the initial image recognition model. Under the action of the target difference data, the channel domain augmentation parameters determined by the channel domain countermeasure enhancement model tend to increase, which raises the data augmentation intensity; at the same time, the difference between the image recognition result output by the initial image recognition model and the image label slows that increase, so the determined channel domain augmentation parameters change the image to the greatest extent on the premise that it remains recognizable. The image transformed with these parameters is then used to train the image recognition model, giving the training data more diversity and flexibility, which strengthens the robustness of the image recognition model and benefits its recognition accuracy.
In one embodiment, when the target domain countermeasure enhancement model includes a spatial domain countermeasure enhancement model, the present application provides a flow chart of a data processing method as shown in fig. 6. The method may be performed by the smart device 102 of fig. 1 described above, and the method includes the following steps.
S601, acquiring a first image to be processed, and calling a spatial domain countermeasure enhancement model to process the first image to obtain spatial domain enhancement parameters of the first image.
In one embodiment, a training sample set may be obtained, and a plurality of images may be taken from the training sample set as the training samples of one training batch (batch); the first image may be any image of the current training batch. It should be noted that one training round (epoch) performs a full pass of training over all images in the training sample set, and all images in the training sample set can be divided into N (a positive integer) training batches. If there are S (a positive integer) training rounds, there are S×N training batches, and the model parameters are updated S×N times; that is, the model parameters are updated once each time the training of one batch is completed.
The spatial domain countermeasure enhancement model can determine unique spatial domain augmentation parameters of an image according to the characteristics of the image. The spatial domain augmentation parameters are one kind of image augmentation parameter and are mainly used to transform the image to be transformed in the image spatial dimensions. The spatial domain countermeasure enhancement model may be composed of a convolutional neural network, for example one based on an attention mechanism; the application is not limited in this regard.
It should be noted that the spatial domain countermeasure enhancement model needs to be trained further to update its model parameters. The goal of training the spatial domain countermeasure enhancement model is to obtain better model parameters so as to output the spatial domain augmentation parameters expected for data augmentation. The spatial domain augmentation parameters expected for data augmentation are those that change the image to the greatest degree in its spatial dimensions on the premise that the image remains recognizable. For example, if an image containing a "cat" is enlarged too much, so that the result contains only a small portion of the "cat" region, the image recognition model may no longer be able to perform image classification recognition accurately.
In a possible embodiment, after the first image is acquired, the spatial domain countermeasure enhancement model may be invoked to process the first image to obtain the spatial domain augmentation parameters of the first image. The first image may comprise a plurality of image spatial dimensions, and the spatial domain augmentation parameters of the first image comprise a spatial transformation parameter set for each image spatial dimension. In a specific implementation, the first image includes the image spatial dimensions position, size, and angle. For the position dimension, the corresponding spatial transformation parameter set may include a position left-shift parameter, a position right-shift parameter, a position up-shift parameter, and a position down-shift parameter; for the size dimension, the corresponding spatial transformation parameter set may include a size reduction parameter and a size enlargement parameter; for the angle dimension, the corresponding spatial transformation parameter set may include an angle left-rotation parameter and an angle right-rotation parameter. It is understood that a position left-shift (or right-shift, up-shift, down-shift) parameter may indicate the distance the image moves left (or right, up, down), a size reduction (or enlargement) parameter may indicate the scale by which the image is reduced (or enlarged), and an angle left-rotation (or right-rotation) parameter may indicate the angle by which the image is rotated counterclockwise (or clockwise).
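The parameter sets above come in opposing direction pairs, and only one parameter per pair is later selected as the reference transformation parameter. The sketch below shows one hypothetical layout and selection rule; the dictionary values are made-up raw outputs and "take the stronger direction" is an assumption for illustration:

```python
# Hypothetical spatial transformation parameter sets for one image, grouped
# by opposing direction pairs.
param_sets = {
    "horizontal shift": {"left": 0.2, "right": 0.7},
    "vertical shift": {"up": 0.1, "down": 0.4},
    "size": {"shrink": 0.3, "enlarge": 0.6},
    "angle": {"left-rotate": 0.8, "right-rotate": 0.2},
}

def select_reference(param_sets):
    # Keep exactly one parameter per opposing pair (here: the larger one).
    return {dim: max(opts.items(), key=lambda kv: kv[1])
            for dim, opts in param_sets.items()}

chosen = select_reference(param_sets)
# e.g. chosen["angle"] == ("left-rotate", 0.8)
```

Selecting one direction per pair is what guarantees that the transform never tries to, say, shift the image left and right at the same time.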
S602, performing spatial domain based transformation processing on the first image by using the spatial domain augmentation parameters to obtain a second image.
The spatial domain based transformation processing, namely transforming the image in the image spatial dimensions, can be realized with the spatial domain augmentation parameters determined by the spatial domain countermeasure enhancement model. In one embodiment, performing spatial domain based transformation processing on the first image using the spatial domain augmentation parameters to obtain the second image includes: acquiring the original position information and original pixel value of each pixel point in the first image. The original position information of each pixel point in the first image refers to its position in the first image before data augmentation; for example, a coordinate system of the first image may be established (e.g. with the lower left corner of the image as the origin of coordinates), and the position information of each pixel point is then determined according to its position in that coordinate system. The original pixel value of each pixel point in the first image refers to its pixel value before data augmentation.
Further, the reference transformation parameters are selected from the set of spatial transformation parameters for each image spatial dimension, and it is noted that, since the position shift left and the position shift right are opposite, only one of the position shift left and the position shift right parameters can be selected; similarly, the position up-shift and the position down-shift are opposite, and only one of the position up-shift parameter and the position down-shift parameter can be selected; the downsizing and the upsizing are opposite, and only one of the downsizing parameter and the upsizing parameter can be selected; the angle left-hand and angle right-hand are opposite, and only one of the angle left-hand parameter and the angle right-hand parameter can be selected. In addition, each time the transformation processing of the image space dimension is performed, the transformation processing may be selectively performed on one or more of the three image space dimensions of position, size, and angle, for example, only the position and angle of the image may be selectively changed, and at this time, the reference transformation parameter selected for the corresponding image space dimension may be 0.
After the respective reference transformation parameters are selected, an affine transformation matrix may be constructed using them. Affine transformation is a linear transformation from two-dimensional coordinates to two-dimensional coordinates; it preserves the straightness of two-dimensional figures (straight lines remain straight and circular arcs remain circular arcs after transformation) and their parallelism (the relative positional relationships between figures are kept: parallel lines remain parallel, and the intersection angle of intersecting straight lines is unchanged). An affine transformation may be realized as a composite of a series of atomic transformations such as translation, scaling, and rotation, and the process of transforming an original image into a transformed image by translation, scaling, rotation, etc. can be described by an affine transformation matrix.
Therefore, the application can first construct an affine transformation matrix using the selected reference transformation parameters, and then determine the mapping position information of each pixel point according to the affine transformation matrix and the original position information of each pixel point; the mapping position information of each pixel point refers to its position in the first image after data augmentation. For example, if (x_i, y_i) is the original position information of pixel point i and the selected reference transformation parameter indicates that the first image needs to be rotated by α degrees, then the mapping position information (x̃_i, ỹ_i) of pixel point i can be calculated by the following formula (4):
(x̃_i, ỹ_i)^T = [[cos α, −sin α], [sin α, cos α]] · (x_i, y_i)^T   (4)
where [[cos α, −sin α], [sin α, cos α]] is the determined affine transformation matrix (here containing only the rotation component).
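The rotation case of formula (4) can be checked with a few lines of standard-library code (the function name is ours, not the application's):

```python
import math

def map_position(x, y, alpha_deg):
    # Formula (4): apply the rotation part of the affine transformation
    # matrix to an original position (x_i, y_i), giving the mapping position.
    a = math.radians(alpha_deg)
    return (math.cos(a) * x - math.sin(a) * y,
            math.sin(a) * x + math.cos(a) * y)

# A 90-degree rotation maps (1, 0) to (0, 1).
x_new, y_new = map_position(1.0, 0.0, 90.0)
```

A full affine transform would add scaling factors and a translation column to the same matrix; the rotation block shown here is the part formula (4) makes explicit.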
To a computer, an image is a matrix of pixel values in 0–255, and transforming the image in the image spatial dimensions can change the pixel value of each pixel point. In an embodiment, the target pixel value of each pixel point may be determined according to the original position information, the mapping position information, and the original pixel value of each pixel point. The target pixel value of each pixel point refers to its pixel value in the first image after data augmentation. The specific calculation expression is the following formula (5):
V_i^c = Σ_{n=1}^{H} Σ_{m=1}^{W} U_{nm}^c · max(0, 1 − |x̃_i − m|) · max(0, 1 − |ỹ_i − n|)   (5)
where U_{nm}^c denotes the pixel value (i.e. the original pixel value) at position (m, n) in the c-th image channel dimension of the first image, V_i^c denotes the pixel value (i.e. the target pixel value) at the mapping position (x̃_i, ỹ_i) in the c-th image channel dimension of the second image, and H and W denote the image size of the first image.
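Formula (5) can be read as bilinear sampling: the max(0, 1 − |·|) kernels vanish more than one pixel away from the mapping position, so only the up-to-four nearest pixels contribute. A sketch under that reading, for a single channel image:

```python
import numpy as np

def target_pixel_value(U, x_map, y_map):
    # Formula (5): V = sum_n sum_m U[n, m] * max(0, 1-|x-m|) * max(0, 1-|y-n|)
    # for one channel image U of size H x W and a mapping position (x, y).
    H, W = U.shape
    n = np.arange(H)[:, None]   # row index (y direction)
    m = np.arange(W)[None, :]   # column index (x direction)
    wx = np.maximum(0.0, 1.0 - np.abs(x_map - m))
    wy = np.maximum(0.0, 1.0 - np.abs(y_map - n))
    return float(np.sum(U * wx * wy))

U = np.array([[0.0, 1.0],
              [2.0, 3.0]])
center = target_pixel_value(U, 0.5, 0.5)  # mean of the four pixels: 1.5
```

At integer mapping positions the formula reduces to plain lookup of the corresponding original pixel, and at fractional positions it interpolates, which is what keeps the transform differentiable with respect to the augmentation parameters.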
Finally, after the mapping position information and the target pixel value of each pixel point are known, a second image can be generated according to the mapping position information and the target pixel value of each pixel point.
It can be understood that when the model is trained, the spatial domain countermeasure enhancement model can generate a corresponding output for each image in the training sample set, so data augmentation is realized at the image-sample level. This avoids, to a certain extent, the situation where a uniform configuration of image augmentation parameters makes the intensity of image data augmentation too low; it increases data diversity, lets the image recognition model learn richer and more varied images during training, and improves the generalization capability of the image recognition model.
S603, calling an initial image recognition model to process the second image to obtain an image recognition result of the second image, and determining target difference data corresponding to the first image according to the image augmentation parameter of the first image, the image recognition result of the second image and the image label.
The step of calling the initial image recognition model to process the second image may refer to the related description in S202, which is not repeated in this embodiment. The following description takes as an example an initial image recognition model that is a deep convolutional neural network performing an image classification task.
In training the spatial domain countermeasure enhancement model, the target difference data corresponding to the first image needs to be determined according to the spatial domain augmentation parameters of the first image, the image recognition result of the second image, and the image label. First, the first difference data may be determined according to the spatial domain augmentation parameters of the first image; specifically, it may be determined based only on the reference transformation parameters selected from the spatial transformation parameter sets of the image spatial dimensions of the first image. The calculation expression may be as in formula (1) above, where θ_i now denotes the selected reference transformation parameters and sigmoid(θ_i) denotes converting the selected reference transformation parameters to values in [0,1] using a sigmoid function.
And then determining second difference data according to the image recognition result of the second image and the image label, wherein the second difference data can be determined according to the difference between the image recognition result of the second image and the image label. When the initial image recognition model performs the image classification task, the loss function calculation expression of the second difference data may be as shown in the above formula (2). Accordingly, when the image recognition model is used to perform other types of image recognition tasks, the second difference data may be determined according to a loss function actually used by the image recognition model, for example, when the image recognition model is used to predict position information of a rectangular detection frame, a regression loss function may be used to evaluate a difference between the predicted position information and the true position information, thereby determining the second difference data.
And finally, the target difference data corresponding to the first image is determined according to the first difference data and the second difference data. In a specific implementation, the weight coefficient of the first difference data may be determined first. The step of determining the weight coefficient of the first difference data includes: obtaining the current training round of the training sample set including the first image, the current training round being the number of full passes over the training sample set performed so far; and determining the weight coefficient of the first difference data according to its positive correlation with the current training round, that is, the weight coefficient of the first difference data gradually increases as the current training round increases. As shown in fig. 4, the current training round may specifically be processed by a cosine function to obtain the corresponding weight coefficient.
After obtaining the weight coefficient of the first difference data, the product of the first difference data and the weight coefficient may be obtained, and the sum of the product and the second difference data is used as the target difference data corresponding to the first image, where the specific calculation expression is as shown in the above formula (3).
It should be noted that because the weight coefficient of the first difference data can be dynamically adjusted, it can be kept small at the initial stage of model training. At that stage the second difference data is optimized with priority (i.e. driven to become smaller), the intensity of data augmentation stays low, and since the images change little, the early learning difficulty of the image recognition model is reduced. At the later stage of model training the weight coefficient of the first difference data is increased, so the intensity of data augmentation rises and the images change more strongly. In other words, the intensity of data augmentation is dynamically adjusted through the weight coefficient of the first difference data, which improves the robustness and recognition accuracy of the image recognition model while keeping the model training stage stable.
S604, updating model parameters of the space domain countermeasure enhancement model and model parameters of the initial image recognition model according to the target difference data, and determining the target image recognition model according to the initial image recognition model after updating the model parameters.
In an embodiment, each image included in the training batch where the first image in the training sample set is located may be acquired, and the target difference data corresponding to each image may be obtained through the above steps S601 to S603. The model parameters of the spatial domain countermeasure enhancement model and the model parameters of the initial image recognition model are then updated according to the target difference data corresponding to each image; for example, the mean of the target difference data corresponding to the images in the training batch may be taken as the target difference data corresponding to the training batch, and the model parameters of the spatial domain countermeasure enhancement model and of the initial image recognition model are updated in the direction that reduces the target difference data corresponding to the training batch.
One training batch trains the spatial domain countermeasure enhancement model and the initial image recognition model once. In the actual training process, multiple training batches are needed, and the model parameters of the spatial domain countermeasure enhancement model and of the initial image recognition model are updated once per batch. If, after multiple updates, the target difference data corresponding to a training batch meets the model training stop condition, the training process of the spatial domain countermeasure enhancement model and the initial image recognition model can be determined to be finished, and the initial image recognition model obtained by the last update can be determined to be the trained image recognition model (i.e. the target image recognition model). The target difference data corresponding to a training batch meets the model training stop condition when one or more of the following hold: the target difference data corresponding to the training batch is smaller than a set difference threshold, or the number of training iterations reaches a preset number. In addition, the spatial domain countermeasure enhancement model obtained by the last update can be used as a pre-trained model to transform input images in the image spatial dimensions during the training of other image recognition models.
It should be noted that in the training process of the spatial domain countermeasure enhancement model and the initial image recognition model, the target difference data gradually decreases, that is, the first difference data and the second difference data both tend to decrease. Because the first difference data and the spatial domain augmentation parameters are inversely correlated, the spatial domain augmentation parameters tend to increase under the action of the first difference data; as they increase, the data augmentation intensity of the image also increases, so the image gradually becomes harder to recognize. For example, when the adjustment amplitude of the brightness of the image is too large, the image may become blurred. At this point the difference between the predicted output and the expected output of the initial image recognition model grows, that is, the increase of the spatial domain augmentation parameters increases the second difference data; but since the second difference data also tends to decrease under the effect of reducing the target difference data, the further increase of the spatial domain augmentation parameters is slowed. Finally, the image is changed to the greatest extent on the premise of remaining recognizable, achieving a balance between data diversity and recognizability. It is understood that "recognizable" in the present application means that the image recognition model can accurately perform an image recognition task on the image.
In summary, a flow chart of a model training method shown in fig. 7 is provided, and the method includes steps 1-4. Step 1: the first image is processed by the spatial domain countermeasure enhancement model to obtain a spatial transformation parameter group for each image space dimension. Step 2: reference transformation parameters are selected from the spatial transformation parameter groups of the image space dimensions, an affine transformation matrix is constructed from the selected reference transformation parameters, and the position mapping relation between each pixel point in the first image and the second image is output by the affine transformation matrix (namely, T_θ(G) in the figure), that is, the mapping position information of each pixel point in the first image is obtained; the target pixel value of each pixel point is then generated according to the mapping position information, the original pixel value, and the original position information of each pixel point, and finally the second image is generated according to the mapping position information and the target pixel value of each pixel point. Step 3: first difference data is determined using the selected reference transformation parameters. Step 4: the second image is input into the initial image recognition model to obtain an image recognition result of the second image, and second difference data is determined according to the image recognition result and the image label of the second image. Finally, the spatial domain countermeasure enhancement model and the initial image recognition model are trained according to the target difference data determined from the first difference data and the second difference data.
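The warping in step 2 can be sketched as follows. This is a minimal illustration only: the function name `affine_transform`, the normalized coordinate convention, and the choice of bilinear interpolation for computing target pixel values are assumptions for the sketch, not necessarily the application's exact implementation of T_θ(G).

```python
import numpy as np

def affine_transform(image, theta):
    """Apply a 2x3 affine matrix `theta` to `image` (H, W, C):
    map each output position back to a source position, then
    obtain the target pixel value by bilinear interpolation."""
    h, w = image.shape[:2]
    # Normalized target-grid coordinates in [-1, 1].
    ys, xs = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w), indexing="ij")
    grid = np.stack([xs, ys, np.ones_like(xs)], axis=-1)   # (H, W, 3) homogeneous coords
    src = grid @ theta.T                                   # mapped source coords (H, W, 2)
    # Back to pixel indices, clamped to the image border.
    sx = np.clip((src[..., 0] + 1) * (w - 1) / 2, 0, w - 1)
    sy = np.clip((src[..., 1] + 1) * (h - 1) / 2, 0, h - 1)
    x0, y0 = np.floor(sx).astype(int), np.floor(sy).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    wx, wy = (sx - x0)[..., None], (sy - y0)[..., None]
    # Bilinear interpolation of the original pixel values.
    return (image[y0, x0] * (1 - wx) * (1 - wy) + image[y0, x1] * wx * (1 - wy)
            + image[y1, x0] * (1 - wx) * wy + image[y1, x1] * wx * wy)

# The identity matrix leaves the image unchanged.
img = np.random.rand(8, 8, 3)
identity = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
out = affine_transform(img, identity)
```

With a non-identity `theta` (e.g., rotation plus translation entries produced from the selected reference transformation parameters), the same routine yields the spatially transformed second image.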
In the method provided by the application, the spatial domain countermeasure enhancement model takes a single image sample as the granularity and determines unique spatial domain augmentation parameters according to the characteristics of that image. The target difference data is determined according to the spatial domain augmentation parameters, the image recognition result output by the initial image recognition model, and the image label, and the model parameters of the spatial domain countermeasure enhancement model and of the initial image recognition model are updated using the target difference data. Under the action of the target difference data, the spatial domain augmentation parameters determined by the spatial domain countermeasure enhancement model tend to increase, so the data augmentation intensity increases; meanwhile, the increase of the spatial domain augmentation parameters is slowed down according to the difference between the image recognition result output by the initial image recognition model and the image label. The determined spatial domain augmentation parameters therefore change the image to the greatest extent on the premise of ensuring that it can still be recognized. Transforming images with these parameters and training the image recognition model on the transformed images gives the image recognition model more diversity and flexibility as well as stronger generalization capability and robustness, which is beneficial to improving its image recognition capability.
In one embodiment, when the target domain countermeasure enhancement model includes a channel domain countermeasure enhancement model and a spatial domain countermeasure enhancement model, the present application provides a model structure diagram as shown in fig. 8. In the model structure, the channel domain countermeasure enhancement model and the spatial domain countermeasure enhancement model are in a parallel structure, any one or more of the channel domain countermeasure enhancement model and the spatial domain countermeasure enhancement model can be selected to perform data augmentation on the input image, and an image after the data augmentation is obtained.
The input image may be a first image to be processed, and in one embodiment, a training sample set may be obtained, and multiple images may be obtained from the training sample set as training samples of one training batch (batch), where the first image may be any one of the current training batches.
When the channel domain countermeasure enhancement model is selected to perform data augmentation on the input image, the transformation processing of the first image in the image channel dimension (i.e., the channel-domain-based transformation processing) may be performed according to the foregoing steps S301-S302 to obtain the second image (i.e., the image after data augmentation). Further, as described in the foregoing step S303, the first difference data is determined according to the channel domain enhancement parameters of the first image, that is, according to the reference adjustment parameters selected from the channel adjustment parameter groups of the respective image channel dimensions of the first image. The initial image recognition model is then called to process the second image to obtain an image recognition result of the second image, and the second difference data is determined according to the difference between the image recognition result of the second image and the image label. Finally, according to the foregoing step (3), the product of the first difference data and its weight coefficient is obtained, and the sum of the product and the second difference data is taken as the target difference data corresponding to the first image; the target difference data corresponding to the first image determined via the channel domain countermeasure enhancement model is here called the first target difference data.
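The combination in step (3) can be sketched numerically. The concrete forms below are assumptions for illustration: the first difference is written as the negated mean of the reference parameters (any form inversely related to the augmentation parameters would do), and the second difference as a cross-entropy over recognition probabilities.

```python
import numpy as np

def first_difference(reference_params):
    # Assumed form: inversely related to the augmentation parameters, so
    # minimizing it pushes the parameters (and augmentation strength) up.
    return -np.mean(reference_params)

def second_difference(probs, label):
    # Cross-entropy between the recognition result and the image label.
    return -np.log(probs[label] + 1e-12)

def target_difference(reference_params, probs, label, weight):
    # Step (3): weighted first difference plus second difference.
    return weight * first_difference(reference_params) + second_difference(probs, label)

probs = np.array([0.7, 0.2, 0.1])          # hypothetical recognition result
loss = target_difference(np.array([0.3, 0.5]), probs, label=0, weight=0.1)
```

Minimizing this scalar simultaneously rewards larger augmentation parameters (via the first term) and penalizes recognition errors on the augmented image (via the second term).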
When the spatial domain countermeasure enhancement model is selected to perform data augmentation on the input image, the transformation processing of the first image in the image space dimension (i.e., the spatial-domain-based transformation processing) may be performed according to the foregoing steps S601-S602 to obtain the second image (i.e., the image after data augmentation). Further, as described in the foregoing step S603, the first difference data is determined based on the spatial domain augmentation parameters of the first image, that is, based on the reference transformation parameters selected from the spatial transformation parameter groups of the respective image space dimensions of the first image. The initial image recognition model is then called to process the second image to obtain an image recognition result of the second image, and the second difference data is determined according to the difference between the image recognition result of the second image and the image label. Finally, according to the foregoing step (3), the product of the first difference data and its weight coefficient is obtained, and the sum of the product and the second difference data is taken as the target difference data corresponding to the first image; the target difference data corresponding to the first image determined via the spatial domain countermeasure enhancement model is here called the second target difference data.
It should be noted that the weight coefficient of the first difference data involved in determining the first target difference data is the same as that involved in determining the second target difference data; that is, the weight coefficient of the first difference data is determined according to the current training round of the training sample set that includes the first image. The weight coefficient of the first difference data is in a positive correlation with the current training round and gradually increases as the current training round increases. In the initial stage of model training, the weight coefficient of the first difference data is small, so the data augmentation intensity is low in both the image space dimension and the image channel dimension, the degree of image change is not large, and the learning difficulty of the image recognition model is reduced in the early stage. In the later stage of model training, the weight coefficient of the first difference data is larger, the data augmentation intensity rises, the degree of image change becomes large, and the learning difficulty of the image recognition model is increased accordingly. Therefore, the data augmentation intensity (covering both the image channel dimension and the image space dimension) can be dynamically adjusted through the weight coefficient of the first difference data, ensuring the stability of the model training stage while improving the robustness and recognition accuracy of the image recognition model.
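One way to realize a weight coefficient positively correlated with the current training round is a simple ramp schedule. The linear form, the cap `max_weight`, and the function name below are illustrative assumptions; the application only requires that the weight grow with the round.

```python
def first_difference_weight(epoch, total_epochs, max_weight=0.1):
    """Hypothetical linear schedule: the weight coefficient of the first
    difference data grows with the current training round, so the data
    augmentation intensity ramps up gradually over training."""
    return max_weight * min(epoch / total_epochs, 1.0)

# Small weight early (gentle augmentation), larger weight late.
early = first_difference_weight(1, 100)    # low augmentation pressure
late = first_difference_weight(90, 100)    # high augmentation pressure
```

The cap keeps the weight bounded once training runs past the nominal horizon, which preserves the stability property described above.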
When training the channel domain countermeasure enhancement model, the spatial domain countermeasure enhancement model, and the initial image recognition model, in an embodiment, each image included in the training batch where the first image in the training sample set is located may be acquired, and one or more of the channel domain countermeasure enhancement model and the spatial domain countermeasure enhancement model, together with the initial image recognition model, may be used to process each image, so as to obtain one or more of the first target difference data and the second target difference data corresponding to each image. Understandably, processing an image with the channel domain countermeasure enhancement model and the initial image recognition model yields the first target difference data corresponding to that image; processing an image with the spatial domain countermeasure enhancement model and the initial image recognition model yields the second target difference data corresponding to that image. The average value of the obtained first target difference data and/or second target difference data over the images is then taken as the target difference data corresponding to the training batch, and the model parameters of the channel domain countermeasure enhancement model, the spatial domain countermeasure enhancement model, and the initial image recognition model are updated in the direction of reducing the target difference data corresponding to the training batch.
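The batch-level aggregation described above reduces per-image target difference data to a single scalar that a gradient step can then decrease. A minimal sketch (the helper name and the nested-list layout, one inner list of first/second target differences per image, are assumptions):

```python
def batch_target_difference(per_image_losses):
    """Average the first and/or second target difference data over all
    images in a training batch to get the batch's target difference."""
    flat = [loss for image_losses in per_image_losses for loss in image_losses]
    return sum(flat) / len(flat)

# Each inner list holds the first and/or second target difference
# data produced for one image of the batch.
batch = [[0.4, 0.6], [0.2, 0.8], [0.5, 0.5]]
avg = batch_target_difference(batch)
```

In a real training loop this scalar would be backpropagated, and an optimizer would update the parameters of all three models in the direction that reduces it.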
It can be understood that when the model is trained, the channel domain countermeasure enhancement model and the space domain countermeasure enhancement model can be utilized to generate corresponding output for each image in the training sample set, so that data augmentation is realized at the image sample level, the situation that the intensity of image data augmentation is too low due to unified configuration of image augmentation parameters can be avoided to a certain extent, the data diversity is favorably increased, the image recognition model can learn richer and more changeable images in the training process, and the generalization capability of the image recognition model is improved.
One training batch trains the channel domain countermeasure enhancement model, the spatial domain countermeasure enhancement model, and the initial image recognition model once. In the actual training process, multiple training batches are needed, and the model parameters of the three models are updated once per training batch. If, after multiple updates, the target difference data corresponding to a training batch generated by the channel domain countermeasure enhancement model, the spatial domain countermeasure enhancement model, and the initial image recognition model meets the model training stop condition, it can be determined that the training process of the three models is finished, and the initial image recognition model obtained by the last update can be determined as the trained image recognition model (namely, the target image recognition model). Wherein the target difference data corresponding to the training batch meeting the model training stop condition includes one or more of the following: the target difference data corresponding to the training batch is smaller than a set difference threshold, or the number of training iterations reaches a preset number.
In addition, the channel domain countermeasure enhancement model and the spatial domain countermeasure enhancement model which are updated for the last time can be used as a pre-training model to perform transformation processing on the input image in the image channel dimension and the image space dimension in the training process of other image recognition models.
It will be appreciated that data augmentation approaches other than channel domain and spatial domain augmentation can be handled in a similar sample-level adversarial fashion. For example, Gaussian noise parameters may be generated for an image by a countermeasure enhancement model, and Gaussian noise may then be added to the image using these parameters to obtain the image after data augmentation.
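The Gaussian noise case can be sketched as follows. Here `sigma` stands in for the per-sample noise parameter that would be predicted by a countermeasure enhancement model; the function name and the clipping to [0, 1] are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_noise_augment(image, sigma):
    """Add zero-mean Gaussian noise whose scale `sigma` would, in the
    scheme above, be predicted per image sample by an adversarial
    (countermeasure) enhancement model."""
    noise = rng.normal(loc=0.0, scale=sigma, size=image.shape)
    # Keep the augmented image in the valid intensity range.
    return np.clip(image + noise, 0.0, 1.0)

img = np.full((4, 4, 3), 0.5)
augmented = gaussian_noise_augment(img, sigma=0.05)
```

A larger predicted `sigma` corresponds to a stronger augmentation, mirroring how the channel domain and spatial domain parameters scale their respective transforms.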
In the embodiment of the application, in the process of training the channel domain countermeasure enhancement model, the spatial domain countermeasure enhancement model, and the initial image recognition model, the first target difference data (or the second target difference data) gradually decreases, that is, the first difference data and the second difference data both decrease. Because the first difference data is in an inverse correlation with the channel domain enhancement parameter (or the spatial domain augmentation parameter), the channel domain enhancement parameter (or the spatial domain augmentation parameter) gradually increases under the action of the first difference data; as it increases, the data augmentation intensity applied to the image also increases, so the image gradually becomes harder to recognize. At this point, the difference between the predicted output and the expected output of the initial image recognition model increases, that is, the increase of the channel domain enhancement parameter (or the spatial domain augmentation parameter) causes the second difference data to increase; but since the second difference data also tends to decrease as the first target difference data (or the second target difference data) decreases, the further increase of the channel domain enhancement parameter (or the spatial domain augmentation parameter) is slowed down. Finally, the image is changed to the greatest extent on the premise of remaining recognizable, achieving a balance between data diversity and recognizability.
In addition, the channel domain countermeasure enhancement model and the spatial domain countermeasure enhancement model can perform data augmentation on the input image in different ways, so that richer and more diversified images are generated while the number of images is increased. Training the image recognition model on these images gives it more diversity and flexibility as well as stronger generalization capability and robustness, improving its image recognition capability.
It will be appreciated that in the specific embodiment of the present application, related data such as training sample sets and first images are involved, and when the above embodiments of the present application are applied to specific products or technologies, user permissions or consents need to be obtained, and the collection, use and processing of related data need to comply with relevant laws and regulations and standards of relevant countries and regions.
The foregoing describes the method of the present application in detail; in order to facilitate better practice of the method of the present application, a device of the present application is provided below. Referring to fig. 9, fig. 9 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application, and the data processing apparatus 90 may include:
an obtaining unit 901, configured to obtain a first image to be processed, and call a target domain countermeasure enhancement model to process the first image, so as to obtain an image enhancement parameter of the first image;
the processing unit 902 is configured to perform a transformation process on the first image by using the image augmentation parameter to obtain a second image, and call an initial image recognition model to process the second image, so as to obtain an image recognition result of the second image;
The processing unit 902 is further configured to determine target difference data corresponding to the first image according to the image augmentation parameter of the first image, the image recognition result of the second image, and the image tag;
the processing unit 902 is further configured to update model parameters of the target domain countermeasure enhancement model and model parameters of the initial image recognition model according to the target difference data, and determine a target image recognition model according to the initial image recognition model after updating the model parameters.
In an embodiment, the target domain antagonism enhancement model comprises one or both of a channel domain antagonism enhancement model and a spatial domain antagonism enhancement model; the processing unit 902 is specifically configured to:
when the target domain countermeasure enhancement model comprises the channel domain countermeasure enhancement model, performing channel domain-based transformation processing on the first image by utilizing channel domain enhancement parameters comprising the image enhancement parameters to obtain a second image;
and when the target domain countermeasure enhancement model comprises the spatial domain countermeasure enhancement model, performing spatial domain-based transformation processing on the first image by utilizing the spatial domain enhancement parameters comprising the image enhancement parameters to obtain a second image.
In an embodiment, the first image comprises a plurality of image channel dimensions, the channel domain enhancement parameters comprise a set of channel adjustment parameters for each image channel dimension, the set of channel adjustment parameters comprising a forward adjustment parameter and a reverse adjustment parameter; the processing unit 902 is specifically configured to:
determining the adjustment direction of each image channel, and selecting a reference adjustment parameter from a channel adjustment parameter group of each image channel dimension according to the adjustment direction;
determining the adjustment amplitude parameter of each image channel dimension according to the adjustment direction of each image channel, the reference adjustment parameter and the set amplitude parameter;
extracting channel images of the first image in each image channel dimension, and carrying out transformation processing on the channel images corresponding to each image channel dimension according to the adjustment amplitude parameters of each image channel dimension to obtain transformed channel images corresponding to each image channel dimension;
and integrating the transformed channel images corresponding to the respective image channel dimensions to obtain a second image.
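The channel-domain steps above can be sketched as follows. The additive shift, the ±1 direction encoding, and the names `channel_domain_augment` and `base_amplitude` are illustrative assumptions standing in for the adjustment direction, reference adjustment parameter, and set amplitude parameter.

```python
import numpy as np

def channel_domain_augment(image, directions, param_pairs, base_amplitude=0.2):
    """Per channel: pick the forward or reverse adjustment parameter via
    `directions` (+1 / -1), scale the set amplitude by it, shift that
    channel's values, then reassemble the channels into one image."""
    out = np.empty_like(image)
    for c in range(image.shape[-1]):
        forward, reverse = param_pairs[c]        # the channel adjustment parameter group
        ref = forward if directions[c] > 0 else reverse
        amplitude = directions[c] * ref * base_amplitude   # signed adjustment amplitude
        out[..., c] = np.clip(image[..., c] + amplitude, 0.0, 1.0)
    return out

img = np.full((2, 2, 3), 0.5)
aug = channel_domain_augment(img, directions=[+1, -1, +1],
                             param_pairs=[(0.5, 0.3), (0.4, 0.6), (1.0, 0.2)])
```

Each channel is brightened or darkened independently, so the model can, for instance, push the red channel up while pulling the green channel down on the same sample.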
In an embodiment, the first image comprises a plurality of image space dimensions, the spatial domain augmentation parameters comprising a set of spatial transformation parameters for each image space dimension; the acquiring unit 901 is specifically configured to:
Acquiring original position information and original pixel values of each pixel point in the first image;
the processing unit 902 is specifically configured to:
selecting reference transformation parameters from the space transformation parameter group of each image space dimension, and constructing an affine transformation matrix by utilizing each selected reference transformation parameter;
determining mapping position information of each pixel point according to the affine transformation matrix and the original position information of each pixel point, and determining a target pixel value of each pixel point according to the original position information, the mapping position information and the original pixel value of each pixel point;
and generating a second image according to the mapping position information of each pixel point and the target pixel value.
In an embodiment, the processing unit 902 is specifically configured to:
determining first difference data according to an image augmentation parameter of the first image;
determining second difference data according to an image recognition result and an image label of the second image;
and determining target difference data corresponding to the first image according to the first difference data and the second difference data.
In an embodiment, the obtaining unit 901 is specifically configured to:
Acquiring a current training round of a training sample set comprising the first image;
the processing unit 902 is specifically configured to:
determining a weight coefficient of the first difference data according to the current training round, wherein a positive correlation is formed between the weight coefficient and the current training round;
and determining the product of the first difference data and the weight coefficient, and taking the sum of the product and the second difference data as target difference data corresponding to the first image.
In an embodiment, the processing unit 902 is specifically configured to:
determining first difference data according to reference adjustment parameters selected from a set of channel adjustment parameters for respective image channel dimensions of the first image when the target domain countermeasure enhancement model includes the channel domain countermeasure enhancement model;
when the target domain countermeasure enhancement model includes the spatial domain countermeasure enhancement model, first difference data is determined from reference transformation parameters selected from a set of spatial transformation parameters for respective image spatial dimensions of the first image.
In an embodiment, the obtaining unit 901 is specifically configured to:
acquiring each image included in a training batch where the first image is located in a training sample set;
Acquiring target difference data corresponding to each image;
the processing unit 902 is specifically configured to:
and updating the model parameters of the target domain antagonism enhancement model and the model parameters of the initial image recognition model according to the target difference data corresponding to each image.
It may be understood that the functions of each functional unit of the data processing apparatus described in the embodiments of the present application may be specifically implemented according to the method in the embodiments of the method, and the specific implementation process may refer to the relevant description of the embodiments of the method and will not be repeated herein.
In the method provided by the application, the target domain countermeasure enhancement model takes a single image sample as the granularity and determines unique image augmentation parameters according to the characteristics of that image. The target difference data is determined according to the image augmentation parameters, the image recognition result output by the initial image recognition model, and the image label, and the model parameters of the target domain countermeasure enhancement model and of the initial image recognition model are updated using the target difference data. Under the action of the target difference data, the image augmentation parameters determined by the target domain countermeasure enhancement model tend to increase, so the data augmentation intensity increases; meanwhile, the increase of the image augmentation parameters is slowed down according to the difference between the image recognition result output by the initial image recognition model and the image label. The determined image augmentation parameters therefore change the image to the greatest extent on the premise of ensuring that it can still be recognized. Transforming images with these parameters and training the image recognition model on the transformed images gives the image recognition model more diversity and flexibility as well as stronger generalization capability and robustness, which is beneficial to improving its image recognition capability.
As shown in fig. 10, fig. 10 is a schematic structural diagram of an intelligent device according to an embodiment of the present application, where an internal structure of the intelligent device 100 is shown in fig. 10, and includes: one or more processors 1001, memory 1002, a communication interface 1003. The processor 1001, the memory 1002, and the communication interface 1003 may be connected by a bus 1004 or otherwise, and the embodiment of the present application is exemplified by connection via the bus 1004.
The processor 1001 (or CPU (Central Processing Unit)) is the computing core and control core of the smart device 100, and may parse various instructions in the smart device 100 and process various data of the smart device 100. For example, the CPU may parse a power-on/off instruction sent to the smart device 100 and control the smart device 100 to perform the power-on/off operation; for another example, the CPU may transmit various types of interaction data between the internal structures of the smart device 100, and so on. The communication interface 1003 may optionally include a standard wired interface or a wireless interface (e.g., Wi-Fi, a mobile communication interface, etc.), and is controlled by the processor 1001 to transmit and receive data. The memory 1002 is a memory device in the smart device 100 for storing computer programs and data. It is understood that the memory 1002 here may include the built-in memory of the smart device 100 and may also include extended memory supported by the smart device 100. The memory 1002 provides storage space that stores the operating system of the smart device 100, which may include, but is not limited to: a Windows system, a Linux system, an Android system, an iOS system, etc.; the application is not limited in this regard. In one embodiment, the processor 1001 performs the following operations by running a computer program stored in the memory 1002:
Acquiring a first image to be processed, and calling a target domain countermeasure enhancement model to process the first image to obtain image enhancement parameters of the first image;
transforming the first image by using the image augmentation parameters to obtain a second image, and calling an initial image recognition model to process the second image to obtain an image recognition result of the second image;
determining target difference data corresponding to the first image according to the image augmentation parameters of the first image, the image recognition result of the second image and the image label;
and updating the model parameters of the target domain countermeasure enhancement model and the model parameters of the initial image recognition model according to the target difference data, and determining a target image recognition model according to the initial image recognition model after updating the model parameters.
In an embodiment, the target domain antagonism enhancement model comprises one or both of a channel domain antagonism enhancement model and a spatial domain antagonism enhancement model; the processor 1001 is specifically configured to:
when the target domain countermeasure enhancement model comprises the channel domain countermeasure enhancement model, performing channel domain-based transformation processing on the first image by utilizing channel domain enhancement parameters comprising the image enhancement parameters to obtain a second image;
And when the target domain countermeasure enhancement model comprises the spatial domain countermeasure enhancement model, performing spatial domain-based transformation processing on the first image by utilizing the spatial domain enhancement parameters comprising the image enhancement parameters to obtain a second image.
In an embodiment, the first image comprises a plurality of image channel dimensions, the channel domain enhancement parameters comprise a set of channel adjustment parameters for each image channel dimension, the set of channel adjustment parameters comprising a forward adjustment parameter and a reverse adjustment parameter; the processor 1001 is specifically configured to:
determining the adjustment direction of each image channel, and selecting a reference adjustment parameter from a channel adjustment parameter group of each image channel dimension according to the adjustment direction;
determining the adjustment amplitude parameter of each image channel dimension according to the adjustment direction of each image channel, the reference adjustment parameter and the set amplitude parameter;
extracting channel images of the first image in each image channel dimension, and carrying out transformation processing on the channel images corresponding to each image channel dimension according to the adjustment amplitude parameters of each image channel dimension to obtain transformed channel images corresponding to each image channel dimension;
and integrating the transformed channel images corresponding to the respective image channel dimensions to obtain a second image.
In an embodiment, the first image comprises a plurality of image space dimensions, the spatial domain augmentation parameters comprising a set of spatial transformation parameters for each image space dimension; the processor 1001 is specifically configured to:
acquiring original position information and original pixel values of each pixel point in the first image;
selecting reference transformation parameters from the space transformation parameter group of each image space dimension, and constructing an affine transformation matrix by utilizing each selected reference transformation parameter;
determining mapping position information of each pixel point according to the affine transformation matrix and the original position information of each pixel point, and determining a target pixel value of each pixel point according to the original position information, the mapping position information and the original pixel value of each pixel point;
and generating a second image according to the mapping position information of each pixel point and the target pixel value.
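The affine-transform steps above can be sketched as follows. The particular parameterisation (rotation angle, scale, translation) and the nearest-neighbour pixel mapping are illustrative assumptions, not the embodiment's required form.

```python
import numpy as np

def spatial_domain_augment(image, angle=0.0, scale=1.0, tx=0.0, ty=0.0):
    """Sketch of the spatial-domain transformation processing."""
    h, w = image.shape[:2]
    # affine transformation matrix built from the selected reference
    # transformation parameters (here: rotation, scale, translation)
    cos, sin = np.cos(angle), np.sin(angle)
    m = np.array([[scale * cos, -scale * sin, tx],
                  [scale * sin,  scale * cos, ty]])
    # original position information of each pixel point
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    # mapping position information of each pixel point
    mx, my = np.round(m @ coords).astype(int)
    # target pixel value: copy each original value to its mapped position
    out = np.zeros_like(image)
    valid = (mx >= 0) & (mx < w) & (my >= 0) & (my < h)
    out[my[valid], mx[valid]] = image[ys.ravel()[valid], xs.ravel()[valid]]
    return out
```

With the identity parameters the second image equals the first; a nonzero translation shifts the content and fills uncovered positions with zeros.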
In one embodiment, the processor 1001 is specifically configured to:
determining first difference data according to an image augmentation parameter of the first image;
determining second difference data according to an image recognition result and an image label of the second image;
and determining target difference data corresponding to the first image according to the first difference data and the second difference data.
In one embodiment, the processor 1001 is specifically configured to:
acquiring a current training round of a training sample set comprising the first image;
determining a weight coefficient of the first difference data according to the current training round, wherein a positive correlation is formed between the weight coefficient and the current training round;
and determining the product of the first difference data and the weight coefficient, and taking the sum of the product and the second difference data as target difference data corresponding to the first image.
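A minimal sketch of this weighted combination follows. The positive correlation between the weight coefficient and the current training round is stated, but its exact form is not; the linear ramp below is an assumption.

```python
def target_difference(first_diff, second_diff, current_round, total_rounds,
                      max_weight=1.0):
    # weight coefficient in positive correlation with the current training
    # round (a linear ramp is assumed here)
    weight = max_weight * current_round / total_rounds
    # target difference data = product of the first difference data and the
    # weight coefficient, summed with the second difference data
    return weight * first_diff + second_diff
```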
In one embodiment, the processor 1001 is specifically configured to:
determining first difference data according to reference adjustment parameters selected from a set of channel adjustment parameters for respective image channel dimensions of the first image when the target domain countermeasure enhancement model includes the channel domain countermeasure enhancement model;
when the target domain countermeasure enhancement model includes the spatial domain countermeasure enhancement model, first difference data is determined from reference transformation parameters selected from a set of spatial transformation parameters for respective image spatial dimensions of the first image.
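One plausible reading of the two cases above is sketched below: the first difference data are derived from the reference parameters actually selected per dimension, and the negative-mean-magnitude form is an assumption chosen so that minimising the target difference pushes the augmentation parameters to grow.

```python
import numpy as np

def first_difference(model_kind, selected_reference_params):
    """First difference data from the selected reference parameters (sketch).

    model_kind: "channel" for the channel domain countermeasure enhancement
    model (reference adjustment parameters, one per image channel dimension),
    or "spatial" for the spatial domain model (reference transformation
    parameters, one per image space dimension).
    """
    if model_kind not in ("channel", "spatial"):
        raise ValueError(f"unknown model kind: {model_kind}")
    # negative mean magnitude: more negative when the selected parameters
    # are larger, so minimisation strengthens the augmentation
    return -float(np.mean(np.abs(selected_reference_params)))
```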
In one embodiment, the processor 1001 is specifically configured to:
acquiring each image included in a training batch where the first image is located in a training sample set;
acquiring target difference data corresponding to each image;
and updating the model parameters of the target domain countermeasure enhancement model and the model parameters of the initial image recognition model according to the target difference data corresponding to each image.
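Putting the batch update together, a PyTorch sketch is given below. The module interfaces, the cross-entropy choice for the second difference, and the negative-magnitude form of the first difference are all assumptions for illustration, not the embodiment's required implementation.

```python
import torch

def batch_update(aug_model, rec_model, optimizer, images, labels, weight):
    """One update over the training batch containing the first image.

    aug_model(image) is assumed to return the per-image augmentation
    parameters together with the (differentiable) transformed image, and
    optimizer is assumed to hold the parameters of both models.
    """
    optimizer.zero_grad()
    total = 0.0
    for image, label in zip(images, labels):
        # per-image augmentation parameters and the second image
        params, augmented = aug_model(image.unsqueeze(0))
        # second difference: recognition loss of the second image vs. label
        logits = rec_model(augmented)
        second = torch.nn.functional.cross_entropy(logits, label.unsqueeze(0))
        # first difference: rewards larger augmentation parameters (assumed)
        first = -params.abs().mean()
        total = total + weight * first + second
    # update both models from the batch's accumulated target difference data
    (total / len(images)).backward()
    optimizer.step()
    return float(total / len(images))
```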
The processor 1001, the memory 1002, and the communication interface 1003 described in this embodiment of the present application may perform the implementations described for the data processing method provided in the embodiments of the present application, or the implementations described for the data processing device provided in the embodiments of the present application, and details are not repeated herein.
In the method provided by the present application, the target domain countermeasure enhancement model operates at the granularity of a single image sample, determining unique image enhancement parameters according to the characteristics of each image. Target difference data are then determined from the image enhancement parameters, the image recognition result output by the initial image recognition model, and the image label, and the model parameters of both the target domain countermeasure enhancement model and the initial image recognition model are updated using the target difference data. Under the action of the target difference data, the image enhancement parameters determined by the target domain countermeasure enhancement model tend to increase, so that the data enhancement intensity increases; at the same time, the increase of the image enhancement parameters is slowed down according to the difference between the image recognition result output by the initial image recognition model and the image label. The determined image enhancement parameters thus change the image to the greatest extent on the premise that the image remains recognizable. Transforming images with these parameters yields training data with stronger diversity and flexibility, so that the image recognition model trained on them has stronger generalization capability and robustness, which is beneficial to improving image recognition accuracy.
The embodiment of the application also provides a computer readable storage medium, wherein a computer program is stored in the computer readable storage medium, and when the computer program runs on the intelligent device, the intelligent device is caused to execute the data processing method of any possible implementation manner. The specific implementation manner may refer to the foregoing description, and will not be repeated here.
The embodiment of the application also provides a computer program product, which comprises a computer program or computer instructions, and the computer program or computer instructions realize the steps of the data processing method provided by the embodiment of the application when being executed by a processor. The specific implementation manner may refer to the foregoing description, and will not be repeated here.
The embodiment of the application also provides a computer program, which comprises computer instructions, wherein the computer instructions are stored in a computer readable storage medium, the computer instructions are read from the computer readable storage medium by a processor of the intelligent device, and the computer instructions are executed by the processor, so that the intelligent device executes the data processing method provided by the embodiment of the application. The specific implementation manner may refer to the foregoing description, and will not be repeated here.
It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of action combinations, but those skilled in the art should understand that the present application is not limited by the described order of actions, as some steps may be performed in another order or simultaneously according to the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the actions and modules involved are not necessarily required by the present application.
Those of ordinary skill in the art will appreciate that all or part of the steps in the various methods of the above embodiments may be implemented by a program instructing related hardware. The program may be stored in a computer-readable storage medium, and the storage medium may include: a flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and the like.
The above disclosure is illustrative only of some embodiments of the application and is not intended to limit the scope of the application, which is defined by the claims and their equivalents.

Claims (12)

1. A method of data processing, the method comprising:
acquiring a first image to be processed, and calling a target domain countermeasure enhancement model to process the first image to obtain image enhancement parameters of the first image;
transforming the first image by using the image augmentation parameters to obtain a second image, and calling an initial image recognition model to process the second image to obtain an image recognition result of the second image;
determining target difference data corresponding to the first image according to the image augmentation parameters of the first image, the image recognition result of the second image and the image label;
and updating the model parameters of the target domain countermeasure enhancement model and the model parameters of the initial image recognition model according to the target difference data, and determining a target image recognition model according to the initial image recognition model after updating the model parameters.
2. The method of claim 1, wherein the target domain countermeasure enhancement model comprises one or both of a channel domain countermeasure enhancement model and a spatial domain countermeasure enhancement model; the transforming the first image by using the image augmentation parameter to obtain a second image includes:
when the target domain countermeasure enhancement model comprises the channel domain countermeasure enhancement model, performing channel domain-based transformation processing on the first image by using the channel domain enhancement parameters included in the image enhancement parameters to obtain a second image;
and when the target domain countermeasure enhancement model comprises the spatial domain countermeasure enhancement model, performing spatial domain-based transformation processing on the first image by using the spatial domain enhancement parameters included in the image enhancement parameters to obtain a second image.
3. The method of claim 2, wherein the first image comprises a plurality of image channel dimensions, the channel domain enhancement parameters comprising a channel adjustment parameter set for each image channel dimension, the channel adjustment parameter set comprising a forward adjustment parameter and a reverse adjustment parameter; the performing channel domain-based transformation processing on the first image by using the channel domain augmentation parameters included in the image augmentation parameters to obtain a second image, including:
determining the adjustment direction of each image channel, and selecting a reference adjustment parameter from a channel adjustment parameter group of each image channel dimension according to the adjustment direction;
determining the adjustment amplitude parameter of each image channel dimension according to the adjustment direction of each image channel, the reference adjustment parameter and the set amplitude parameter;
extracting channel images of the first image in each image channel dimension, and carrying out transformation processing on the channel images corresponding to each image channel dimension according to the adjustment amplitude parameters of each image channel dimension to obtain transformed channel images corresponding to each image channel dimension;
and integrating the transformed channel images corresponding to the respective image channel dimensions to obtain a second image.
4. The method of claim 2, wherein the first image comprises a plurality of image space dimensions, and the spatial domain augmentation parameters comprise a set of spatial transformation parameters for each image space dimension; the performing spatial domain based transformation processing on the first image by using the spatial domain augmentation parameters included in the image augmentation parameters to obtain a second image, including:
acquiring original position information and original pixel values of each pixel point in the first image;
selecting reference transformation parameters from the space transformation parameter group of each image space dimension, and constructing an affine transformation matrix by utilizing each selected reference transformation parameter;
determining mapping position information of each pixel point according to the affine transformation matrix and the original position information of each pixel point, and determining a target pixel value of each pixel point according to the original position information, the mapping position information and the original pixel value of each pixel point;
and generating a second image according to the mapping position information of each pixel point and the target pixel value.
5. The method according to any one of claims 2-4, wherein the determining the target difference data corresponding to the first image according to the image augmentation parameter of the first image, the image recognition result of the second image, and the image tag includes:
determining first difference data according to an image augmentation parameter of the first image;
determining second difference data according to an image recognition result and an image label of the second image;
and determining target difference data corresponding to the first image according to the first difference data and the second difference data.
6. The method of claim 5, wherein determining target difference data corresponding to the first image from the first difference data and the second difference data comprises:
acquiring a current training round of a training sample set comprising the first image;
determining a weight coefficient of the first difference data according to the current training round, wherein a positive correlation is formed between the weight coefficient and the current training round;
and determining the product of the first difference data and the weight coefficient, and taking the sum of the product and the second difference data as target difference data corresponding to the first image.
7. The method of claim 5, wherein the determining first difference data from the image enhancement parameters of the first image comprises:
determining first difference data according to reference adjustment parameters selected from a set of channel adjustment parameters for respective image channel dimensions of the first image when the target domain countermeasure enhancement model includes the channel domain countermeasure enhancement model;
when the target domain countermeasure enhancement model includes the spatial domain countermeasure enhancement model, first difference data is determined from reference transformation parameters selected from a set of spatial transformation parameters for respective image spatial dimensions of the first image.
8. The method of claim 1, wherein updating model parameters of the target domain countermeasure enhancement model and model parameters of the initial image recognition model based on the target difference data comprises:
acquiring each image included in a training batch where the first image is located in a training sample set;
acquiring target difference data corresponding to each image;
and updating the model parameters of the target domain countermeasure enhancement model and the model parameters of the initial image recognition model according to the target difference data corresponding to each image.
9. A data processing apparatus, the apparatus comprising:
the acquisition unit is used for acquiring a first image to be processed, and calling a target domain countermeasure enhancement model to process the first image to obtain an image enhancement parameter of the first image;
the processing unit is used for carrying out transformation processing on the first image by utilizing the image augmentation parameters to obtain a second image, and calling an initial image recognition model to process the second image to obtain an image recognition result of the second image;
the processing unit is further used for determining target difference data corresponding to the first image according to the image augmentation parameters of the first image, the image recognition results of the second image and the image labels;
the processing unit is further configured to update model parameters of the target domain countermeasure enhancement model and model parameters of the initial image recognition model according to the target difference data, and determine a target image recognition model according to the initial image recognition model after updating the model parameters.
10. An intelligent device, characterized in that the intelligent device comprises a memory, a communication interface and a processor, wherein the memory, the communication interface and the processor are mutually connected; the memory stores a computer program, and the processor invokes the computer program stored in the memory for implementing the data processing method according to any one of claims 1 to 8.
11. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, implements the data processing method according to any of claims 1-8.
12. A computer program product, characterized in that the computer program product comprises a computer program or computer instructions which, when executed by a processor, implement the data processing method according to any of claims 1-8.
CN202310202755.9A 2023-02-23 2023-02-23 Data processing method, device, intelligent equipment, storage medium and product Pending CN116977815A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310202755.9A CN116977815A (en) 2023-02-23 2023-02-23 Data processing method, device, intelligent equipment, storage medium and product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310202755.9A CN116977815A (en) 2023-02-23 2023-02-23 Data processing method, device, intelligent equipment, storage medium and product

Publications (1)

Publication Number Publication Date
CN116977815A true CN116977815A (en) 2023-10-31

Family

ID=88482030

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310202755.9A Pending CN116977815A (en) 2023-02-23 2023-02-23 Data processing method, device, intelligent equipment, storage medium and product

Country Status (1)

Country Link
CN (1) CN116977815A (en)

Similar Documents

Publication Publication Date Title
CN111950453B (en) Random shape text recognition method based on selective attention mechanism
WO2019100724A1 (en) Method and device for training multi-label classification model
CN109902548B (en) Object attribute identification method and device, computing equipment and system
CN112651438A (en) Multi-class image classification method and device, terminal equipment and storage medium
CN111310775A (en) Data training method and device, terminal equipment and computer readable storage medium
CN113822951B (en) Image processing method, device, electronic equipment and storage medium
CN111739027B (en) Image processing method, device, equipment and readable storage medium
CN112215171B (en) Target detection method, device, equipment and computer readable storage medium
CN113704531A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN108537168A (en) Human facial expression recognition method based on transfer learning technology
CN110222629A (en) Bale No. recognition methods and Bale No. identifying system under a kind of steel scene
CN110674826A (en) Character recognition method based on quantum entanglement
CN114091554A (en) Training set processing method and device
CN112614110A (en) Method and device for evaluating image quality and terminal equipment
CN116778148A (en) Target detection method, target detection device, electronic equipment and storage medium
CN113947613B (en) Target area detection method, device, equipment and storage medium
CN110245669B (en) Palm key point identification method, device, terminal and readable storage medium
CN116704324A (en) Target detection method, system, equipment and storage medium based on underwater image
CN116229188A (en) Image processing display method, classification model generation method and equipment thereof
CN116977815A (en) Data processing method, device, intelligent equipment, storage medium and product
CN117011416A (en) Image processing method, device, equipment, medium and program product
CN116777929A (en) Night scene image semantic segmentation method, device and computer medium
CN115147469A (en) Registration method, device, equipment and storage medium
CN115620054A (en) Defect classification method and device, electronic equipment and storage medium
CN114511877A (en) Behavior recognition method and device, storage medium and terminal

Legal Events

Date Code Title Description
PB01 Publication