CN113065635A - Model training method, image enhancement method and device

Info

Publication number
CN113065635A
Authority
CN
China
Prior art keywords
image
training
loss function
network
trained
Prior art date
Legal status
Pending
Application number
CN202110221444.8A
Other languages
Chinese (zh)
Inventor
张依曼
陈汉亭
陈醒濠
王云鹤
许春景
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202110221444.8A
Publication of CN113065635A

Classifications

    • G06N 3/045 Computing arrangements based on biological models; neural networks; architectures, e.g. interconnection topology; combinations of networks
    • G06N 3/084 Neural network learning methods; backpropagation, e.g. using gradient descent
    • G06T 3/4046 Scaling of whole images or parts thereof, e.g. expanding or contracting, using neural networks
    • G06T 3/4053 Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06T 5/00 Image enhancement or restoration


Abstract

The embodiment of the application discloses a model training method, an image enhancement method, and a device, which can be applied to the image processing field within the field of artificial intelligence, and in particular to super-resolution reconstruction tasks. To improve the training effect of the generation network, a loss function for training the generation network is constructed specifically for the super-resolution reconstruction task, based on its defining characteristic (namely, that the super-resolution image retains all the information of the low-resolution image while containing additional detail); this improves the training effect of the model on the super-resolution reconstruction task. To improve the training effect of the student network, and because data-free knowledge distillation is difficult, the student network is trained by progressive distillation until the complete student network has been trained, which reduces the difficulty of distillation.

Description

Model training method, image enhancement method and device
Technical Field
The application relates to the field of machine learning, and in particular to a model training method, an image enhancement method, and a device.
Background
In order to apply neural networks to small mobile devices with limited computing resources (e.g., cell phones), the networks need to be compressed and accelerated. Knowledge distillation is a model compression technique: the feature representations ("knowledge") learned by a complex network with strong learning ability are distilled out and transferred to a network with few parameters and weaker learning ability. Knowledge distillation can thus transfer knowledge from one network to another: a teacher network is first trained with training data, and a student network is then trained using the output of the teacher network.
Compressing a neural network usually requires training data, but in practical applications the training data is often unavailable due to privacy policies, laws, and other restrictions. Much current work is therefore devoted to data-free model compression techniques, which are typically realized by building a model framework from a generation network, a teacher network, and a student network, and training the student network with distillation. The training process is as follows: the generation network receives random noise and generates images, which are input into the teacher network and the student network simultaneously; distillation is then used to train the student network so that its output approaches the output of the teacher network, and the generation network is trained so that the distribution of its generated images approaches the distribution of the data set. A rough sketch of this loop, for illustration only, follows.
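For illustration only, the generic data-free distillation loop described above can be sketched as follows. This is a minimal sketch assuming PyTorch; the toy network definitions, dimensions, and loss choice are placeholder assumptions, not this application's concrete implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins for the three networks; real architectures are task-specific.
generator = nn.Sequential(nn.Linear(64, 32 * 32), nn.Tanh())  # noise -> "image"
teacher = nn.Sequential(nn.Linear(32 * 32, 10))               # pre-trained, frozen
student = nn.Sequential(nn.Linear(32 * 32, 10))               # small, to be trained

for p in teacher.parameters():
    p.requires_grad = False  # the teacher's parameters always stay fixed

opt_student = torch.optim.Adam(student.parameters(), lr=1e-4)

z = torch.randn(8, 64)             # random noise in place of real training data
images = generator(z).detach()     # generated images; generator fixed during this step
loss = F.mse_loss(student(images), teacher(images))  # pull student toward teacher
opt_student.zero_grad()
loss.backward()
opt_student.step()
```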
Current data-free compression techniques, however, focus primarily on classification and segmentation tasks.
Disclosure of Invention
The embodiment of the application provides a model training method, an image enhancement method, and a device. The model comprises a generation network, a teacher network, and a student network, and a new loss function for training the generation network (namely, a first loss function) is provided for the super-resolution reconstruction task, which improves the training effect of the model on the super-resolution reconstruction task.
Based on this, the embodiment of the present application provides the following technical solutions:
In a first aspect, an embodiment of the present application first provides a method for training a model, which can be used in the field of artificial intelligence. The method includes: firstly, training the generation network in the model by using a constructed first loss function and second loss function, where the first loss function is used to characterize the difference between a first image and a second image, the first image is an image generated by the generation network from input randomly initialized noise, the second image is an image obtained by down-sampling a third image, and the third image is an image obtained by the teacher network performing super-resolution reconstruction on the input first image. It should be noted that, in the embodiment of the present application, the teacher network is a pre-trained neural network whose model parameters always remain unchanged, and the model parameters of the generation network at the first training iteration may be randomly initialized. After the training device performs one round of iterative training on the generation network, the generation network trained in the current round is obtained. The training device then trains the student network in the model by using a third loss function, where the third loss function is used to characterize the difference between a fourth image and a fifth image, the fifth image is an image obtained by the teacher network performing super-resolution reconstruction on an input sixth image, the sixth image is an image generated by the trained generation network from input randomly initialized noise, and the fourth image is an image obtained by the student network performing super-resolution reconstruction on the input sixth image. It should also be noted that during this step the model parameters of the teacher network remain unchanged, the model parameters of the generation network are those obtained in the current round of training, and the model parameters of the student network at the first training iteration may also be randomly initialized. Finally, the training device repeatedly executes this process of alternately training the generation network and the student network until an iteration termination condition (which may be referred to as a first iteration termination condition) is reached.
In the above embodiments of the present application, a model training method is provided, where the model includes a generation network, a teacher network, and a student network, and a new loss function (i.e., a first loss function) for training the generation network is provided for a specific super-resolution reconstruction task, so as to improve a training effect of the model on the super-resolution reconstruction task.
In a possible implementation manner of the first aspect, the student network may be trained progressively. Specifically, the training device trains the generation network by using the first loss function and the second loss function, where the first loss function characterizes the difference between the first image (generated by the generation network from input randomly initialized noise) and the second image (obtained by down-sampling the third image, which the teacher network reconstructs at super-resolution from the input first image), and the second loss function is related to the third loss function. After the training device performs one round of training on the generation network with the first and second loss functions, it trains only a first sub-layer (which may be denoted Bx) and a second neural network layer (which may be denoted T) of the student network with the third loss function; in this embodiment, the sub-network formed by the first sub-layer Bx and the second neural network layer T may be referred to as student network S1. In this round of training, the model parameters of the first sub-layer Bx and the initialized model parameters of the second neural network layer T are both randomly initialized values. The training device repeats these steps until the first iteration termination condition is reached, at which point the training of student network S1 is considered complete. Having executed the above steps, the training device obtains the trained student network S1, that is, the model parameter values of the trained first sub-layer (which may be referred to as first model parameter values) and of the trained second neural network layer (which may be referred to as second model parameter values). Then, the training device continues to train the generation network with the first and second loss functions, and trains the trained first sub-layer Bx, a second sub-layer (which may be denoted By), and the trained second neural network layer T with the third loss function; in this embodiment, the sub-network formed by Bx, By, and T may be referred to as student network S2. In this training process, the initial values of the model parameters of Bx and T are the first model parameter values and the second model parameter values obtained in the previous stage, respectively, while the initialized model parameters of the second sub-layer By are randomly initialized values, because By was not trained in the previous stage. Finally, the training device repeatedly executes the process of training the generation network with the first and second loss functions and training the trained first sub-layer, second sub-layer, and second neural network layer with the third loss function, until a second iteration termination condition is reached.
In the embodiment of the present application, the student network is trained in two stages, that is, it is divided into student network S1 and student network S2; once student network S2 has been trained, the training of the whole student network is complete, and the values of the model parameters of the whole student network can be obtained.
In the above embodiment of the present application, the student network is divided into two parts, a first neural network layer and a second neural network layer. In the progressive training process, the first sub-layer of the first neural network layer and the second neural network layer are trained first (during which the model parameters of the second sub-layer of the first neural network layer remain unchanged). After this part is trained, the resulting model parameter values of the first sub-layer and of the second neural network layer are used as the initialization values for the corresponding layers of the whole student network (the second sub-layer's parameters being randomly initialized values), and the first neural network layer and the second neural network layer are trained further, yielding the finally trained student network. This progressive training process reduces the difficulty of training the student network; a rough sketch of the two-stage procedure follows.
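For illustration only, the two-stage progressive training can be sketched as follows. This is a minimal sketch assuming PyTorch; the layer types, shapes, the upsampling stage standing in for the second neural network layer T, and the optimizers are illustrative assumptions, not this application's concrete network split:

```python
import torch
import torch.nn as nn

# Hypothetical split of the student network: the first neural network layer
# consists of sub-layers Bx and By; the second neural network layer
# (denoted T in the text) is modelled here as an upsampling/output stage.
Bx = nn.Conv2d(3, 16, 3, padding=1)    # first sub-layer Bx
By = nn.Conv2d(16, 16, 3, padding=1)   # second sub-layer By
T_layer = nn.Sequential(nn.Conv2d(16, 3 * 4, 3, padding=1), nn.PixelShuffle(2))

def student_s1(x):
    # Stage 1 sub-network S1: Bx feeds T directly; By is not in the graph.
    return T_layer(torch.relu(Bx(x)))

def student_s2(x):
    # Stage 2 full student S2: Bx -> By -> T. Bx and T keep the parameter
    # values learned in stage 1; only By starts from random initialization.
    return T_layer(torch.relu(By(torch.relu(Bx(x)))))

# Stage 1: train only Bx and T against the teacher (third loss function)
# until the first iteration termination condition is reached.
opt_s1 = torch.optim.Adam(list(Bx.parameters()) + list(T_layer.parameters()))

# Stage 2: train Bx, By, and T together until the second iteration
# termination condition is reached.
opt_s2 = torch.optim.Adam(
    list(Bx.parameters()) + list(By.parameters()) + list(T_layer.parameters())
)
```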
In one possible implementation manner of the first aspect, reaching the second iteration termination condition includes: a preset number of training rounds is reached, or the third loss function converges, or the third loss function reaches a preset threshold.
In the above embodiments of the present application, several ways of determining the second iteration termination condition are set forth, offering flexibility and wide applicability.
In one possible implementation manner of the first aspect, reaching the first iteration termination condition includes: a preset number of training rounds is reached, or the third loss function converges, or the third loss function reaches a preset threshold.
In the above embodiments of the present application, several ways of determining the first iteration termination condition are likewise set forth, offering flexibility and wide applicability.
In a possible implementation manner of the first aspect, a plurality of randomly initialized noise samples are generally input into the generation network together during training, so that a plurality of first images can be obtained. For example, when there are n (n ≥ 2) randomly initialized noise samples, n first images and n second images are obtained; in this case, the first loss function is used to characterize the mean difference or mean square error between the n first images and the n second images.
The above embodiments of the present application specifically describe how the first loss function is characterized when there are a plurality of first images and second images, which covers a variety of practical situations and is readily realizable.
In one possible implementation manner of the first aspect, when there are m (m ≥ 2) randomly initialized noise samples, m fourth images and m fifth images are obtained; in this case, the third loss function is used to characterize the mean difference or mean square error between the m fourth images and the m fifth images.
The above embodiments of the present application specifically describe how the third loss function is characterized when there are a plurality of fourth images and fifth images, which covers a variety of practical situations and is readily realizable.
In one possible implementation manner of the first aspect, the training device may further deploy the trained student network on a target device. For example, it can be deployed on edge devices with limited computing resources such as mobile phones and smart wearable devices (e.g., smart bands, smart watches): the network size of a pre-trained teacher network is generally large and unsuitable for deployment on such edge devices, whereas the network size of the student network is generally small and suitable for deployment.
In the above embodiment of the application, the student network obtained after training can be deployed on the target device for practical application. Generally, because the network size of the student network is smaller than that of the teacher network, deploying the student network can improve the inference speed of the target device and enhance the user experience.
A second aspect of the embodiments of the present application further provides an image enhancement method, applied to an execution device, where the method includes: first, the execution device acquires an input image on which super-resolution reconstruction is to be performed, which may be referred to as a target image. Then, the execution device performs super-resolution reconstruction on the target image through the trained student network to obtain a reconstructed enhanced image. It should be noted that, in this embodiment of the application, the trained student network is the student network obtained by the training in the training stage.
In the above embodiments of the present application, the student network obtained after training is deployed on the target device to perform the super-resolution reconstruction task. Generally, because the network size of the student network is smaller than that of the teacher network, the trained student network can improve the inference speed of the target device and enhance the user experience; a minimal inference sketch follows.
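For illustration only, the inference step of the second aspect can be sketched as follows (a minimal sketch assuming PyTorch; the helper name and the batched tensor layout are assumptions, not fixed by this application):

```python
import torch

def enhance(student: torch.nn.Module, target_image: torch.Tensor) -> torch.Tensor:
    """Run super-resolution reconstruction on a target image with the trained
    student network (hypothetical helper; expects a (1, C, H, W) tensor)."""
    student.eval()
    with torch.no_grad():             # inference only, no gradients needed
        return student(target_image)  # the reconstructed enhanced image
```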
A third aspect of the embodiments of the present application provides a training device, which has the function of implementing the method of the first aspect or any one of the possible implementation manners of the first aspect. The function may be implemented by hardware, or by hardware executing corresponding software; the hardware or software includes one or more modules corresponding to the function described above.
A fourth aspect of the embodiments of the present application provides an execution device, which has the function of implementing the method of the second aspect or any one of the possible implementation manners of the second aspect. The function may be implemented by hardware, or by hardware executing corresponding software; the hardware or software includes one or more modules corresponding to the function described above.
A fifth aspect of the embodiments of the present application provides a training device, which may include a memory, a processor, and a bus system, where the memory is used to store a program, and the processor is used to call the program stored in the memory to execute the method according to the first aspect or any one of the possible implementation manners of the first aspect.
A sixth aspect of the embodiments of the present application provides an execution device, which may include a memory, a processor, and a bus system, where the memory is used to store a program, and the processor is used to call the program stored in the memory to execute the method according to the second aspect or any one of the possible implementation manners of the second aspect.
A seventh aspect of the embodiments of the present application provides a computer-readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the method of the first aspect or any one of the possible implementations of the first aspect, or cause the computer to perform the method of the second aspect or any one of the possible implementations of the second aspect.
An eighth aspect of the embodiments of the present application provides a computer program which, when run on a computer, causes the computer to perform the method of the first aspect or any one of the possible implementation manners of the first aspect, or causes the computer to perform the method of the second aspect or any one of the possible implementation manners of the second aspect.
A ninth aspect of the embodiments of the present application provides a chip. The chip includes at least one processor and at least one interface circuit, where the interface circuit is coupled to the processor; the at least one interface circuit is configured to perform a transceiving function and send instructions to the at least one processor, and the at least one processor is configured to execute a computer program or instructions. The at least one processor has the function of implementing the method of the first aspect or any one of the possible implementations of the first aspect, or of implementing the method of the second aspect or any one of the possible implementations of the second aspect; the function may be implemented by hardware, by software, or by a combination of hardware and software, and the hardware or software includes one or more modules corresponding to the function. In addition, the interface circuit is used to communicate with modules other than the chip; for example, the interface circuit can send the student network obtained after training on the chip to various edge devices (such as a mobile phone, a smart watch, smart glasses, and the like) to perform super-resolution image reconstruction tasks.
Drawings
FIG. 1 is a schematic structural diagram of an artificial intelligence main framework provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of a model framework provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of an overall framework of an image enhancement system provided by an embodiment of the present application;
FIG. 4 is a schematic flowchart of a method for training a model according to an embodiment of the present application;
FIG. 5 is another schematic flowchart of a method for training a model according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a structural split of a student network according to an embodiment of the present application;
FIG. 7 is a schematic diagram of progressive training for a student network according to an embodiment of the present application;
FIG. 8 is another schematic diagram of progressive training for a student network according to an embodiment of the present application;
FIG. 9 is a system architecture diagram of an application provided by an embodiment of the present application;
FIG. 10 is a schematic flowchart of an image enhancement method according to an embodiment of the present application;
FIG. 11 is a schematic diagram of an application scenario provided by an embodiment of the present application;
FIG. 12 is a schematic diagram of a training apparatus provided by an embodiment of the present application;
FIG. 13 is a schematic diagram of an execution device provided by an embodiment of the present application;
FIG. 14 is another schematic diagram of a training apparatus provided by an embodiment of the present application;
FIG. 15 is another schematic diagram of an execution device provided by an embodiment of the present application;
FIG. 16 is a schematic structural diagram of a chip provided by an embodiment of the present application.
Detailed Description
The embodiment of the application provides a model training method, an image enhancement method, and a device. The model comprises a generation network, a teacher network, and a student network, and a new loss function for training the generation network (namely, a first loss function) is provided for the super-resolution reconstruction task, which improves the training effect of the model on the super-resolution reconstruction task.
The terms "first," "second," and the like in the description and in the claims of the present application and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and are merely descriptive of the various embodiments of the application and how objects of the same nature can be distinguished. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of elements is not necessarily limited to those elements, but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
The embodiments of the present application involve a good deal of background knowledge about neural networks. To better understand the scheme of the embodiments of the present application, related terms and concepts that the embodiments may involve are first introduced below. It should be understood that the explanations of these concepts may be constrained by the specific circumstances of the embodiments of the present application, but this does not mean that the present application is limited to those specific circumstances; the specific circumstances may vary from embodiment to embodiment and are not limited here.
(1) Neural network
A neural network may be composed of neural units. In particular, a neural network with an input layer, hidden layers, and an output layer generally takes the first layer as the input layer, the last layer as the output layer, and all the layers in between as hidden layers. A neural network with many hidden layers is called a deep neural network (DNN). The operation of each layer in the neural network can be expressed mathematically as $y = a(W \cdot x + b)$. From the physical level, the work of each layer can be understood as completing the transformation from input space to output space (i.e., from the row space to the column space of the matrix) through five operations on the input space (the set of input vectors): 1. raising/lowering the dimension; 2. zooming in/out; 3. rotation; 4. translation; 5. "bending". Operations 1, 2, and 3 are performed by $W \cdot x$, operation 4 is completed by $+ b$, and operation 5 is realized by $a(\cdot)$. The word "space" is used here because the object being classified is not a single thing but a class of things, and space refers to the set of all individuals of that class. W is the weight matrix of each layer of the neural network, and each value in the matrix represents the weight value of one neuron of that layer. The matrix W determines the spatial transformation from input space to output space described above, i.e., W at each layer of the neural network controls how the space is transformed. The purpose of training the neural network is ultimately to obtain the weight matrices of all layers of the trained neural network. The training process of a neural network is therefore essentially learning how to control the spatial transformation, and more specifically, learning the weight matrices.
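As a toy illustration only (the dimensions and the choice of tanh as the activation $a(\cdot)$ are arbitrary assumptions, not tied to the networks of this application), the per-layer transform $y = a(W \cdot x + b)$ can be computed as:

```python
import torch

W = torch.randn(4, 3)      # weight matrix W of one layer (4 neurons, 3 inputs)
b = torch.randn(4)         # bias, the "+ b" translation term
x = torch.randn(3)         # input vector

y = torch.tanh(W @ x + b)  # a(W·x + b): scale/rotate via W, shift via b, "bend" via a
```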
(2) Loss function
In the process of training a neural network, because the output of the neural network is expected to be as close as possible to the value actually desired, the weight matrix of each layer can be updated according to the difference between the predicted value of the current network and the truly desired target value (of course, an initialization process usually takes place before the first update, i.e., parameters are pre-configured for each layer of the neural network). For example, if the predicted value of the network is too high, the weight matrices are adjusted so that the prediction becomes lower, and the adjustment continues until the neural network can predict the truly desired target value. It is therefore necessary to define in advance "how to compare the difference between the predicted value and the target value"; this is the purpose of loss functions or objective functions, which are important equations for measuring the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) of the loss function indicates a larger difference, so training the neural network becomes a process of reducing this loss as much as possible.
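As a tiny worked example (values chosen arbitrarily for illustration), a mean-squared-error loss measuring the difference between a prediction and a target can be computed as:

```python
import torch

pred = torch.tensor([2.5, 0.0, 2.0])    # predicted values
target = torch.tensor([3.0, -0.5, 2.0]) # truly desired target values
loss = ((pred - target) ** 2).mean()    # MSE = (0.25 + 0.25 + 0) / 3 ≈ 0.1667
```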
(3) Back propagation algorithm
In the training process of a neural network, a back propagation (BP) algorithm can be adopted to correct the values of the parameters in the initial neural network model, so that the reconstruction error loss of the model becomes smaller and smaller. Specifically, the input signal is propagated forward until the output produces an error loss, and the parameters in the initial neural network model are updated by back-propagating the error loss information, so that the error loss converges. The back propagation algorithm is an error-loss-dominated backward pass aimed at obtaining the optimal parameters of the neural network model, such as the weight matrices.
Embodiments of the present application are described below with reference to the accompanying drawings. As can be known to those skilled in the art, with the development of technology and the emergence of new scenarios, the technical solution provided in the embodiments of the present application is also applicable to similar technical problems.
First, the general workflow of an artificial intelligence system is described. Referring to FIG. 1, FIG. 1 shows a structural diagram of an artificial intelligence main framework, which is explained below along two dimensions, the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The "intelligent information chain" reflects the series of processes from data acquisition onwards, for example the general processes of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output; in this process, the data undergoes a "data - information - knowledge - wisdom" refinement. The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (providing and processing technology) up to the industrial ecology of the system.
(1) Infrastructure
The infrastructure provides computing power support for the artificial intelligence system, realizes communication with the outside world, and provides support through a base platform. Communication with the outside is achieved through sensors; computing power is provided by intelligent chips (hardware acceleration chips such as CPUs, NPUs, GPUs, ASICs, and FPGAs); the base platform includes distributed computing frameworks, networks, and other related platform guarantees and support, and may include cloud storage and computing, interconnection networks, and the like. For example, sensors communicate with the outside to acquire data, and the data is provided for computation to the intelligent chips in the distributed computing system provided by the base platform.
(2) Data
Data at the level above the infrastructure represents the data sources of the field of artificial intelligence. The data involves graphs, images, voice, and text, as well as Internet of Things data from traditional devices, including service data of existing systems and sensing data such as force, displacement, liquid level, temperature, and humidity.
(3) Data processing
Data processing typically includes data training, machine learning, deep learning, searching, reasoning, decision making, and the like.
Machine learning and deep learning can perform symbolic and formalized intelligent information modeling, extraction, preprocessing, training, and the like on data.
Inference refers to the process of simulating human intelligent inference in a computer or intelligent system: using formalized information, the machine thinks about and solves problems according to an inference control strategy, the typical functions being searching and matching.
Decision-making refers to the process of making decisions after intelligent information has been reasoned about, and generally provides functions such as classification, ranking, and prediction.
(4) General capabilities
After the data processing mentioned above, some general capabilities may further be formed based on the results of the processing, such as algorithms or a general system, e.g., translation, text analysis, computer vision processing, speech recognition, image recognition, and the like.
(5) Intelligent product and industrial application
Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields; they are the encapsulation of the overall artificial intelligence solution, commercializing intelligent information decision-making and realizing practical applications. The application fields mainly include intelligent terminals, intelligent manufacturing, intelligent transportation, smart home, intelligent healthcare, intelligent security, autonomous driving, smart city, and the like.
The embodiment of the application can be applied to the optimized design of a model framework consisting of a generation network, a teacher network, and a student network, and in particular to the optimized design of the loss functions of the model framework. The model framework with the optimized loss functions can be applied to the computer vision field within the field of artificial intelligence, and to the super-resolution reconstruction field within image processing. Specifically, referring to FIG. 1, the data in the data set acquired by the infrastructure in the embodiment of the present application may be multiple pieces of image data (which may also be referred to as training samples or training data; multiple pieces of training data form a training set) acquired by sensors such as surveillance cameras and mobile phone camera modules, or image data obtained from acquired video data, as long as the training set can be used for iterative training of the neural network; the type of data in the training set is not limited in the embodiment of the present application.
For ease of understanding of the present application, the principle of the model training method provided in the embodiment of the present application and the related model framework are first introduced. Referring to FIG. 2, FIG. 2 shows a model framework 200 provided in an embodiment of the present application (in the following embodiments, the model framework may also be referred to simply as the model). The model framework 200 is composed of a generation network 201, a teacher network 202, and a student network 203. In the training stage, the generation network 201 is used to generate an image from randomly initialized noise; the randomly initialized noise may be denoted z, and the image generated by the generation network 201 may be denoted G(z). The generated image G(z) is further input into the teacher network 202, which is a neural network trained in advance and can perform super-resolution reconstruction on the input image G(z). The generation network 201 is then iteratively trained according to the first loss function and the second loss function constructed in the embodiment of the present application; it should be noted that during the training of the generation network 201, the model parameters of the student network 203 remain unchanged, which may also be described as fixing the student network 203 (the teacher network 202 has been trained in advance and its model parameters always remain unchanged). An image generated from randomly initialized noise by the trained generation network 201 may be denoted G(z)'. The image G(z)' is input into the student network 203, which performs super-resolution reconstruction on it, and the student network 203 is then iteratively trained according to the third loss function constructed in the embodiment of the present application; it should be noted that during the training of the student network 203, the model parameters of the generation network 201 remain unchanged, that is, the generation network 201 is fixed. Finally, during the alternating training of the generation network 201 and the student network 203, the iteration termination condition of the training is reached, at which point the model framework 200 is considered trained.
It should be noted that the network types of the generation network, the teacher network, and the student network are not limited in the embodiments of the present application, and may be, for example, a Convolutional Neural Network (CNN) or a Recurrent Neural Network (RNN), which is specifically selected by a user according to a target task to be performed.
Referring to FIG. 3, the overall architecture of the image enhancement system according to the embodiment of the present application is described next. FIG. 3 is a system architecture diagram of the image enhancement system according to the embodiment of the present application. In FIG. 3, the image enhancement system 300 includes an execution device 210, a training device 220, a database 230, a client device 240, a data storage system 250, and a data acquisition device 260 (in this embodiment, the data acquisition device 260 may be used to generate randomly initialized noise). The execution device 210 includes a computation module 211 and an input/output (I/O) interface 212, and the computation module 211 includes a model framework 201; for example, the model framework 201 may be the model framework 200 described above with respect to the embodiment of FIG. 2. In the embodiment of the present application, the model framework 201 includes a generation network 2011, a teacher network 2012, and a student network 2013.
During the training phase, the data acquisition device 260 may be used to generate randomly initialized noise. Based on the randomly initialized noise generated by the data acquisition device 260, the training device 220 has the generation network 2011 in the model framework 201 generate initial images to be processed. The generation network 2011 in the model framework 201 is trained with the first loss function and the second loss function constructed in the embodiment of the present application, and the student network 2013 in the model framework 201 is trained with the third loss function constructed in the embodiment of the present application; finally, in the alternating training of the generation network 2011 and the student network 2013, the iteration termination condition of the training is reached, at which point the model framework 201 is considered trained. The student network 2013 in the trained model framework 201 may be applied to different systems or devices (i.e., the execution device 210). For example, it may be applied to end-side devices with limited computing resources such as mobile phones, tablets, and notebook computers; to wearable devices such as smart watches, smart bands, and smart glasses; to intelligent vehicles such as smart cars, connected cars, and robots; or to edge devices such as surveillance systems (e.g., cameras), network cameras (IPC) in security systems, augmented reality (AR) and virtual reality (VR) devices, identity recognition devices (e.g., attendance machines, card punchers, and the like), and smart speakers. The student network obtained by the training is not limited to devices with limited computing resources: it may also be applied to cloud servers, computing platforms, and any other device capable of deploying a neural network, which is not limited in the present application.
During the inference phase, the execution device 210 may invoke data, code, etc. from the data storage system 250 and may store data, instructions, etc. in the data storage system 250. The data storage system 250 may be disposed in the execution device 210 or the data storage system 250 may be an external memory with respect to the execution device 210. The calculation module 211 performs super-resolution reconstruction on each input real image data through the trained model frame 201.
In FIG. 3, the execution device 210 is configured with the I/O interface 212 to exchange data with external devices, and a "user" may input data to the I/O interface 212 via the client device 240. For example, the client device 240 may be an image acquisition device of a surveillance system: an image captured by the image acquisition device is input to the computation module 211 of the execution device 210 as input data, the computation module 211 performs super-resolution reconstruction on the input image to obtain an enhanced image, and the enhanced image is then output back to the image acquisition device, or directly displayed on a display interface (if any) of the execution device 210, or stored in a storage module of the execution device 210 for subsequent use.
In addition, in some embodiments of the present application, the client device 240 may also be integrated in the execution device 210. For example, when the execution device 210 is a mobile phone, the target image to be processed (for example, an image captured by the camera of the mobile phone, or an image obtained from a video captured by the camera) or a target image sent by another device (for example, another mobile phone) may be obtained directly by the mobile phone; the computation module 211 in the mobile phone then performs super-resolution reconstruction on the target image to obtain the enhancement result (i.e., the enhanced image), which is directly presented on the display interface of the mobile phone or stored. The product forms of the execution device 210 and the client device 240 are not limited here.
It should be noted that fig. 3 is only a schematic diagram of a system architecture provided in the embodiment of the present application, and the position relationship between the devices, modules, etc. shown in the diagram does not constitute any limitation, for example, in fig. 3, the data storage system 250 is an external memory with respect to the execution device 210, and in other cases, the data storage system 250 may be disposed in the execution device 210; in fig. 3, the client device 240 is an external device with respect to the execution device 210, and in other cases, the client device 240 may be integrated in the execution device 210.
In some embodiments of the present application, for example in FIG. 3, the training device 220 and the execution device 210 are independent, separately distributed devices, but FIG. 3 is only a schematic structural diagram of an image enhancement system provided by an embodiment of the present application, and the positional relationship between the devices, modules, and the like shown in the figure does not constitute any limitation. In other embodiments of the present application, the training device 220 and the execution device 210 may be integrated into the same device. Moreover, the example in FIG. 3 is not intended to limit the number of each device; for example, the database 230 may communicate with a plurality of client devices 240.
It should be further noted that the training of the model framework 201 according to the embodiment of the present application may be implemented on the cloud side. For example, the training device 220 on the cloud side (the training device 220 may be disposed on one or more servers or virtual machines) may obtain a training set, train the model framework 201 according to multiple sets of training data (also called training samples) in the training set to obtain the trained model framework 201, and then send the trained model framework 201, or just the student network 2013 within it, to the execution device 210 for application. For example, if the target task is an image enhancement task, the student network 2013 in the trained model framework 201 may be sent to the execution device 210 for image super-resolution reconstruction; illustratively, in the system architecture corresponding to FIG. 3, the training device 220 trains the model framework 201, and the student network 2013 in the trained framework is sent to the execution device 210 for use. The training of the model framework 201 described in the above embodiment may also be implemented on the terminal side, that is, the training device 220 may be located on the terminal side. For example, a terminal device (e.g., a mobile phone, a smart watch, etc.) or a wheeled mobile device (e.g., an autonomous vehicle, an assisted-driving vehicle, etc.) may obtain a training set and train the model framework 201 according to multiple sets of training data in the training set to obtain the trained model framework 201; the student network 2013 in the trained model framework 201 may then be used directly on the terminal device, or sent by the terminal device to other devices for use. The embodiment of the present application does not specifically limit on which device (cloud side or terminal side) the model framework 201 is trained or applied.
With reference to the above description, a specific implementation flow of the training phase and the application phase of the image enhancement method provided by the embodiment of the present application is set forth below.
First, training phase
In this embodiment of the present application, the training phase describes the process in which the training device 220 in FIG. 3 performs training operations on the model framework 201 by using training data in a training set. Referring to FIG. 4, FIG. 4 is a flowchart of a model training method provided in this embodiment of the present application, where the model includes a generation network, a teacher network, and a student network; the training method may specifically include the following steps:
401. The training device trains the generation network by using a first loss function and a second loss function, where the first loss function is used to characterize the difference between a first image and a second image, the first image is an image generated by the generation network from input randomly initialized noise, the second image is an image obtained by down-sampling a third image, the third image is an image obtained by the teacher network performing super-resolution reconstruction on the input first image, and the second loss function is related to the third loss function.
Firstly, the generation network in the model is trained by using the constructed first loss function and second loss function, where the first loss function is used to characterize the difference between the first image and the second image, the first image is an image generated by the generation network from input randomly initialized noise, the second image is an image obtained by down-sampling the third image, and the third image is an image obtained by the teacher network performing super-resolution reconstruction on the input first image. It should be noted that, in the embodiment of the present application, the teacher network is a pre-trained neural network whose model parameters always remain unchanged, and the model parameters of the generation network at the first training iteration may be randomly initialized.
For ease of understanding, the physical meaning of the first loss function is described below in connection with the model framework 200 in FIG. 2. Firstly, the training device obtains randomly initialized noise z, which is input into the generation network 201 (the generation network may be denoted G) to obtain a first image G(z). The first image G(z) is then input into the teacher network 202 (the teacher network may be denoted T), and the teacher network 202 performs super-resolution reconstruction on the input first image G(z) to obtain a third image T(G(z)). The third image T(G(z)) is then down-sampled (the down-sampling operation may be denoted D) to obtain a second image D(T(G(z))). Accordingly, a representation of the first loss function L1 can be obtained; L1 can be expressed by the following formula (1), where $\|\cdot\|$ denotes the image-difference measure:

$L_1 = \|G(z) - D(T(G(z)))\|$   (1)
It should be noted that formula (1) expresses the first loss function L1 in the case of a single first image G(z). In practical applications, a plurality of noise samples z are generally input into the generation network together during training, so that a plurality of first images are obtained. For example, when there are n (n ≥ 2) randomly initialized noise samples z, n first images and n second images are obtained; in this case, the first loss function L1 characterizes the average difference or mean square error between the n first images and the n second images. Taking the average difference as an example, L1 can be expressed by the following formula (2):

$L_1 = \frac{1}{n}\sum_{i=1}^{n}\|G(z_i) - D(T(G(z_i)))\|$   (2)
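A minimal sketch of how formula (2) could be computed, assuming PyTorch; the bicubic down-sampling operator, the scale factor, the 4-D image batch layout, and the L1-style mean absolute difference are illustrative assumptions, not fixed by this application:

```python
import torch
import torch.nn.functional as F

def first_loss(G, T, z, scale=4):
    """Formula (2) as code: each generated image should match the
    down-sampled super-resolution reconstruction of itself."""
    x = G(z)                               # n first images G(z_i), shape (n, C, H, W)
    sr = T(x)                              # n third images T(G(z_i))
    down = F.interpolate(sr, scale_factor=1.0 / scale,
                         mode='bicubic', align_corners=False)  # second images
    return (x - down).abs().mean()         # average difference over the n pairs
```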
The first loss function is constructed in the embodiment of the application for the basic characteristics of the super-resolution reconstruction task. Because the teacher network T is trained in advance on a super-resolution data set, inputting a low-resolution image $I_L$ from the data set into the teacher network T yields the super-resolution reconstruction result $I_S$, i.e. $I_S = T(I_L)$. Considering that the low-resolution images in a super-resolution data set are generally obtained by down-sampling high-resolution images, if the teacher network is trained well, $I_S$ should be consistent with the high-resolution image $I_H$ corresponding to $I_L$, and $I_{SL}$, obtained by down-sampling $I_S$, should likewise remain consistent with $I_L$, i.e.:

$I_{SL} = D(I_S) = D(T(I_L)) \approx I_L$

In the data-free super-resolution reconstruction task, the images generated by the generation network in the model are expected to obey the distribution of the training set. Therefore, for the generation network with input random noise z and generated image G(z), the generated image G(z) should remain consistent with the result of down-sampling its super-resolution reconstruction $T(G(z))$, i.e.:

$G(z) \approx D(T(G(z)))$
Therefore, according to these characteristics, the embodiment of the present application constructs the first loss function (i.e., formula (1) or formula (2) above) for constraining the training of the generation network.
It should be noted that, in the embodiment of the present application, in addition to the first loss function described above for constraining the training of the generation network, a second loss function is provided to constrain the training of the generation network jointly with the first loss function. The second loss function (denoted L2) is related to a third loss function (denoted L3); for example, the second loss function L2 may be obtained by taking the negative logarithm of the third loss function L3. The second loss function may also be referred to as an adversarial loss function, and the third loss function is the loss function for constraining the training of the student network, which will be specifically set forth in step 402 below.
It should be noted that, in some embodiments of the present application, the negative logarithm processing on the third loss function L3 to obtain the second loss function L2 may be obtained by the following formula (3):
L2=-log L3 (3)
in addition, during the iterative training of the model framework, the value of the third loss function L3 is gradually decreased, and therefore, in some embodiments of the present application, in order to reduce the error of the second loss function L2, the second loss function L2 obtained by performing negative logarithm processing on the third loss function L3 may also be obtained by the following formula (4):
L2=-log(L3+p) (4)
where p is a preset positive number, for example, p is 1.
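A minimal sketch of formula (4), assuming PyTorch (the function name and the default value of p are illustrative assumptions):

```python
import torch

def second_loss(l3: torch.Tensor, p: float = 1.0) -> torch.Tensor:
    # Formula (4): adversarial loss for the generation network. The preset
    # positive number p keeps the logarithm well-behaved as the third loss
    # L3 gradually decreases during training; with p = 0 this reduces to
    # formula (3).
    return -torch.log(l3 + p)
```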
It should be further noted that, in this embodiment of the application, training the generation network with the first loss function and the second loss function is an iterative training process. For example, training may stop when a preset number of training rounds is reached (for example, after 100 rounds), or when another training termination condition is met (for example, either of the first and second loss functions converges, either of them reaches a preset threshold, or the sum of the two reaches a preset threshold), which is not limited here.
402. The training equipment trains the student network by using a third loss function, the third loss function is used for representing the difference between a fourth image and a fifth image, the fifth image is an image obtained by performing super-resolution reconstruction on an input sixth image by the teacher network, the sixth image is an image generated by the trained generation network according to input random initialization noise, and the fourth image is an image obtained by performing super-resolution reconstruction on the input sixth image by the student network.
After the training device performs one round of iterative training on the generation network, the generation network trained in the current round is obtained. The training device then further trains the student network in the model by using a third loss function, where the third loss function is used to characterize the difference between a fourth image and a fifth image, the fifth image is an image obtained by the teacher network performing super-resolution reconstruction on an input sixth image, the sixth image is an image generated by the trained generation network from input randomly initialized noise, and the fourth image is an image obtained by the student network performing super-resolution reconstruction on the input sixth image. It should also be noted that, in the embodiment of the present application, the model parameters of the teacher network remain unchanged, the model parameters of the generation network are those obtained by the training in step 401, and the model parameters of the student network at the first training iteration may also be randomly initialized.
For ease of understanding, the physical meaning characterized by the third loss function is again described in connection with the model framework 200 in fig. 2. First, the training device acquires randomly initialized noise z and inputs it to the trained generation network 201 to obtain a sixth image G(z)'. In the embodiment of the present application, the sixth image G(z)' differs from the first image G(z) in that the first image G(z) is training data of the generation network 201 and is used to train the model parameters of the generation network 201, whereas the sixth image G(z)' may be regarded as an image produced by the generation network 201 from input test data: it is generated from the randomly initialized noise z input again after the generation network 201 has performed the current round of iterative training based on the first image G(z). Then, the sixth image G(z)' is input to the teacher network 202, and the teacher network 202 performs super-resolution reconstruction on it to obtain a fifth image T(G(z)').
Further, the sixth image G(z)' is input not only to the teacher network 202 but also to the student network 203, and the student network 203 performs super-resolution reconstruction on it to obtain the fourth image S(G(z)'). The third loss function L3 can thereby be expressed as the following formula (5), where ||·|| denotes a norm measuring the difference between the two images:

L3 = ||S(G(z)') - T(G(z)')|| (5)
It should also be noted that the above concerns the case where the fourth image involved in the third loss function L3 is a single image; when there are m randomly initialized noises z (m ≥ 2), m fourth images and m fifth images are obtained. In this case, the third loss function L3 is used to characterize the average difference or mean square difference between the m fourth images and the m fifth images, in a form similar to formula (2) above, which is not repeated here.
It should be noted that, in the embodiment of the present application, training the student network with the third loss function is likewise an iterative process. For example, the training may stop once a preset number of training rounds is reached (for example, after 200 rounds), or once another termination condition is met (for example, the third loss function converges or reaches a preset threshold); this is not limited herein.
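The following hedged PyTorch sketch illustrates one such student-training iteration; the module names (generator, teacher, student), the noise dimensionality, the batch size m, and the use of a mean absolute difference for formula (5) are illustrative assumptions rather than details fixed by this application.

```python
# One student-training iteration of step 402 (a sketch under the assumptions
# stated above); the generator and teacher are frozen, only the student learns.
import torch

def train_student_step(generator, teacher, student, optimizer,
                       noise_dim: int = 64, m: int = 16) -> float:
    z = torch.randn(m, noise_dim)               # m randomly initialized noises
    with torch.no_grad():
        sixth = generator(z)                    # sixth image G(z)'
        fifth = teacher(sixth)                  # fifth image: teacher reconstruction
    fourth = student(sixth)                     # fourth image: student reconstruction
    l3 = torch.mean(torch.abs(fourth - fifth))  # formula (5), averaged over m samples
    optimizer.zero_grad()
    l3.backward()
    optimizer.step()
    return l3.item()
```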
403. The training device repeatedly executes the steps until a first iteration termination condition is reached.
The training device repeatedly executes the above process of alternately training the generation network and the student network until an iteration termination condition (which may be referred to as the first iteration termination condition) is reached. It should be noted that each time the generation network is trained, the model parameters of the student network keep the values obtained in the previous round; similarly, each time the student network is trained, the model parameters of the generation network keep the values obtained in the previous round. Throughout the alternating process (one pass of training the generation network and then the student network is called a round of training), the model parameters of the teacher network always keep the values obtained in its pre-training.
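A compact sketch of this alternating schedule is given below; the helper and step functions are hypothetical stand-ins for the training steps of steps 401 and 402, and toggling requires_grad is one common PyTorch way, among others, to keep parameters fixed.

```python
# Alternating one round of generator training and student training, with the
# teacher permanently frozen (a sketch; module and function names are assumed).
def set_trainable(module, flag: bool) -> None:
    for param in module.parameters():
        param.requires_grad = flag

def alternate_round(generator, teacher, student, g_opt, s_opt,
                    g_steps, s_steps, g_step_fn, s_step_fn):
    set_trainable(teacher, False)        # teacher parameters never change
    set_trainable(student, False)        # step 401: only the generator learns
    set_trainable(generator, True)
    for _ in range(g_steps):
        g_step_fn(generator, teacher, student, g_opt)
    set_trainable(generator, False)      # step 402: only the student learns
    set_trainable(student, True)
    for _ in range(s_steps):
        s_step_fn(generator, teacher, student, s_opt)
```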
It should be noted that, in some embodiments of the present application, the first iteration termination condition may have various determining manners, including but not limited to:
(1) the third loss function reaches a preset threshold.
In the embodiment of the application, a threshold (e.g., 0.03) may be set for the third loss function in advance. During the iterative training of the entire model, after each round it is determined whether the value of the third loss function obtained in the current round has reached the threshold; if not, training continues, and if so, training terminates and the model parameter values of each network determined in the current round are taken as the model parameter values of the finally trained model.
(2) The third loss function begins to converge.
In the embodiment of the present application, during the iterative training of the entire model, if the difference between the value of the third loss function obtained in the current round and the value obtained in the previous round is within a preset range (e.g., 0.01), the third loss function is considered to have converged and training may terminate, with the model parameter values of each network determined in the current round taken as the model parameter values of the finally trained model.
(3) The training reaches the preset round.
In the embodiment of the application, the number of training iterations (for example, 1000) may be preconfigured. After each round finishes, the model parameter values of each network in the whole model for that round may be stored, until the number of training iterations reaches the preset number; each stored model is then validated with test data, and the set of model parameter values with the best performance is selected as the final model parameters, or alternatively the set obtained in the last round of training is used as the final model parameters.
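The three tests above can be combined into a single check, as in the sketch below; the example thresholds (0.03, 0.01, 1000) are the ones mentioned above and would be configurable in practice.

```python
# A sketch of the first iteration termination condition, combining tests (1)-(3).
def should_terminate(l3_history, max_rounds: int = 1000,
                     threshold: float = 0.03, converge_eps: float = 0.01) -> bool:
    if len(l3_history) >= max_rounds:                  # (3) preset round reached
        return True
    if l3_history and l3_history[-1] <= threshold:     # (1) preset threshold reached
        return True
    if (len(l3_history) >= 2
            and abs(l3_history[-1] - l3_history[-2]) <= converge_eps):
        return True                                    # (2) convergence
    return False
```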
Finally, in some embodiments of the present application, the student network obtained after the training in steps 401 to 403 may be deployed on a target device, for example an edge device with limited computing resources such as a mobile phone or a smart wearable device (e.g., a smart bracelet or a smart watch). The network size of the pre-trained teacher network is generally large and ill-suited to deployment on edge devices with limited computing power, whereas the network size of the student network is generally small and well suited to such deployment.
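As an illustrative sketch of such a deployment step (not a procedure prescribed by this application), the trained student network could, for example, be exported with TorchScript before being shipped to an edge device; the input shape and file name below are assumptions.

```python
# Exporting the trained student network for edge deployment (a hedged sketch).
import torch

def export_student(student, path: str = "student_sr.pt") -> None:
    student.eval()
    example = torch.randn(1, 3, 64, 64)         # illustrative low-resolution input
    traced = torch.jit.trace(student, example)  # freeze the graph for deployment
    traced.save(path)
```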
It should be noted that, in the above embodiments of the present application, all the neural network layers of the student network are trained together as a whole. Because data-free knowledge distillation is difficult, in order to reduce its difficulty, in other embodiments of the present application the student network may be trained layer by layer (i.e., progressive distillation). Referring to fig. 5, fig. 5 is another schematic flowchart of the model training method provided in an embodiment of the present application, where the model includes a generation network, a teacher network, and a student network, and the student network is trained hierarchically. In this embodiment, the student network is illustrated as being divided into two parts; taking fig. 6 as an example, these two parts may be referred to as a first neural network layer (comprising a multilayer structure, as boxed in fig. 6) and a second neural network layer (also comprising a multilayer structure), where the first neural network layer includes a first sub-layer and a second sub-layer adjacent to each other. With reference to the hierarchy shown in fig. 6, the training method may specifically include the following steps:
501. The training device trains the generation network by using a first loss function and a second loss function, where the first loss function is used to characterize the difference between a first image and a second image, the first image is an image generated by the generation network from input randomly initialized noise, the second image is an image obtained by down-sampling a third image, the third image is an image obtained by the teacher network performing super-resolution reconstruction on the input first image, and the second loss function is related to the third loss function.
In the present embodiment, step 501 is similar to step 401 in the above embodiment, and please refer to step 401 specifically, which is not described herein again.
502. The training device trains the first sub-layer and the second neural network layer of the student network by using a third loss function, where the model parameters of the first sub-layer and of the second neural network layer are randomly initialized values.
After the training device performs one round of training on the generation network with the first and second loss functions, only the first sub-layer (which may be denoted Bx) and the second neural network layer (which may be denoted T) of the student network are trained with the third loss function. In this embodiment, the sub-network formed by the first sub-layer Bx and the second neural network layer T may be referred to as student network S1. In this round of training, the model parameters of the first sub-layer Bx and the initialized model parameters of the second neural network layer T are both randomly initialized values.
503. The training device repeatedly executes the steps until a first iteration termination condition is reached.
The training device repeatedly executes the above steps 501 and 502 until the first iteration termination condition is reached; once it is reached, the training of student network S1 is considered complete. In the embodiment of the present application, this first iteration termination condition is similar to the one described above; for details, please refer to the description under step 403, which is not repeated here.
504. The training device obtains first model parameter values of the trained first sub-layer and second model parameter values of the trained second neural network layer.
After completing the above steps 501 to 503, the training device obtains the trained student network S1; that is, the training device can obtain the model parameter values of the trained first sub-layer (which may be referred to as the first model parameter values) and the model parameter values of the trained second neural network layer (which may be referred to as the second model parameter values).
505. The training device trains the generation network by using the first loss function and the second loss function, and trains the trained first sub-layer, the second sub-layer, and the trained second neural network layer by using the third loss function, where the model parameters of the trained first sub-layer are the first model parameter values, the model parameters of the trained second neural network layer are the second model parameter values, and the model parameters of the second sub-layer are randomly initialized values.
Then, the training device continues to train the generation network with the first and second loss functions, and trains the trained first sub-layer Bx, the second sub-layer (which may be denoted By), and the trained second neural network layer T with the third loss function. In this embodiment, the sub-network formed by the first sub-layer Bx, the second sub-layer By, and the second neural network layer T may be referred to as student network S2. In this training process, the initial values of the model parameters of the first sub-layer Bx and of the second neural network layer T are, respectively, the first and second model parameter values obtained in the previous training, while the initialized model parameters of the second sub-layer By are randomly initialized values, since By was not trained in the previous round.
506. The training device repeatedly executes the process of training the generation network with the first loss function and the second loss function and the process of training the first sub-layer, the second sub-layer, and the second neural network layer with the third loss function, until a second iteration termination condition is reached.
Finally, the training device repeats the above two processes until the second iteration termination condition is reached. In the embodiment of the present application, the student network is trained in two stages, i.e., as student network S1 and then as student network S2; once student network S2 is trained, the training of the whole student network is complete, and the model parameter values of the whole student network are obtained.
It should be noted that, in some embodiments of the present application, the second iteration termination condition may, like the first, be determined in various ways, including but not limited to: the third loss function reaching a preset threshold, the third loss function beginning to converge, the training reaching a preset number of rounds, and so on. These are similar to the ways of determining the first iteration termination condition described above and are not repeated here.
It should be noted that, in the embodiment corresponding to fig. 5, the student network is divided into two parts, a first neural network layer and a second neural network layer. In the progressive training, the first sub-layer of the first neural network layer and the second neural network layer are trained first (the model parameters of the second sub-layer of the first neural network layer remain unchanged at this point); after this stage, the trained model parameter values of the first sub-layer and of the second neural network layer are used as the initialization values for the corresponding parts of the whole student network (the model parameters of the second sub-layer are randomly initialized), and training of the first neural network layer and the second neural network layer continues, yielding the finally trained student network. In other embodiments of the present application, the first neural network layer of the student network to be trained may be divided into more sub-layers; after each stage of training completes one sub-layer, another sub-layer is appended to the previously trained ones and training continues, until all sub-layers are completed, as sketched below.
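The two-stage expansion can be sketched as follows; the concrete layers are placeholders (real super-resolution sub-layers would be deeper and would upscale), and the key point is that stage 2 reuses the stage-1 modules as initialization while the newly inserted sub-layer starts from random values.

```python
# A sketch of progressive distillation with two stages (layer types and shapes
# are placeholders, not the student architecture of this application).
import torch.nn as nn

first_sub = nn.Conv2d(3, 32, 3, padding=1)       # Bx, trained in stage 1
second_sub = nn.Conv2d(32, 32, 3, padding=1)     # By, randomly initialized later
tail = nn.Conv2d(32, 3, 3, padding=1)            # second neural network layer T

s1 = nn.Sequential(first_sub, tail)              # stage 1: student network S1
# ... train s1 until the first iteration termination condition is reached ...

s2 = nn.Sequential(first_sub, second_sub, tail)  # stage 2: student network S2;
# first_sub and tail are the same module objects, so they keep their trained
# parameter values, while second_sub enters with random initialization
```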
In the embodiment of the present application, in combination with the above embodiment corresponding to fig. 5 and as shown in fig. 7, the student network to be trained may still be divided into two parts, the first being a first neural network layer and the second being a second neural network layer, except that the first neural network layer may be further divided into p parts (which may be referred to as B1, B2, ..., Bp), p ≥ 2, so that training proceeds in p steps. At the k-th step, the student network to be trained consists of the first sub-layer B1, the second sub-layer B2, ..., the k-th sub-layer Bk of the first neural network layer, plus the second neural network layer T, and may be referred to as student network Sk. The sub-layers B1, B2, ..., Bk-1 and the second neural network layer T in student network Sk are initialized with the model parameter values trained at step k-1, and at every step the training strategy for the student network to be trained is the same.
The following describes the training process of the student network to be trained based on the step division shown in fig. 7:
1) When k = 1, the generation network is randomly initialized.
2) The student network Sk to be trained is initialized. Specifically, when k = 1, all model parameters in student network S1 are randomly initialized; for the subsequent student networks S2, S3, ..., Sp, the model parameter values trained in the previous step are loaded.
3) Noise z is randomly initialized and input into the generation network to obtain a first image G(z); the first image G(z) is input into the teacher network to obtain a third image T(G(z)); and the third image T(G(z)) is down-sampled to obtain a second image.
The model parameters of the generation network are updated (i.e., the generation network is trained) by using the first loss function L1 constructed by the above formula (1) or formula (2) and the second loss function L2 constructed by the above formula (3) or formula (4) (in combination with the third loss function L3 of the formula (5)).
4) Step 3) is repeated a number of times until a first preset iteration condition is reached.
5) Noise z is randomly initialized and input into the trained generation network to obtain a sixth image G(z)'; the sixth image G(z)' is input into the teacher network to obtain a fifth image T(G(z)').
Further, the sixth image G(z)' is input to the student network Sk to obtain a fourth image S(G(z)'), and the model parameters of the student network Sk are updated with the third loss function L3 shown in formula (5) above (i.e., the student network Sk is trained).
6) Step 5) is repeated a number of times until a second preset iteration condition is reached.
7) Steps 3) to 6) are repeated a number of times until the training of the student network Sk is completed.
8) Whether k equals p is determined. If Sk = Sp, the last student network has been trained and training stops; if Sk ≠ Sp, k is assigned k + 1 and the procedure continues from step 2) to train the next student network Sk.
In the above embodiments of the present application, step 2) and step 8) are core steps for performing progressive training on the student network.
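Steps 1) to 8) can be condensed into the following sketch; build_student, train_generator, and train_student are hypothetical stand-ins for steps 2), 3)-4), and 5)-6), respectively.

```python
# A compact sketch of the p-step progressive training loop of steps 1)-8).
def progressive_distillation(p, build_student, generator,
                             train_generator, train_student, rounds_per_step):
    student = None
    for k in range(1, p + 1):                         # step 2): initialize Sk,
        student = build_student(k, previous=student)  # loading step k-1 parameters
        for _ in range(rounds_per_step):              # step 7): repeat 3)-6)
            train_generator(generator, student)       # steps 3)-4)
            train_student(generator, student)         # steps 5)-6)
    return student                                    # step 8): k == p, done
```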
It should be noted that, in the above embodiments of the present application, the student network is divided into two parts, a first neural network layer and a second neural network layer, the latter in fact being the tail of the student network. Considering that most of the intermediate layers of a super-resolution network have skip connections, in some embodiments of the present application the student network may instead be divided in a new way: as shown in fig. 8, into three parts, namely a head part (H), a body part (B), and a tail part (T). The body part may then be divided in the same way as the sub-layers of the first neural network layer, for example into p sub-parts (referred to as B1, B2, ..., Bp), p ≥ 2, so that training proceeds in p steps. At the k-th step, the student network to be trained consists of the head part, the sub-parts B1, B2, ..., Bk of the body part, and the tail part, and may similarly be referred to as student network Sk; the head part, the sub-parts B1, B2, ..., Bk-1 of the body part, and the tail part in student network Sk are initialized with the model parameter values trained at step k-1, and at every step the training strategy is the same. For the specific training process, please refer to steps 1) to 8) above, which are not repeated here.
In the above embodiments of the present application, on the side of improving the training effect of the generation network, a new loss function for training the generation network (i.e., the first loss function) is provided for the super-resolution reconstruction task, based on its basic characteristic that the super-resolution image retains all the information of the low-resolution image while containing more detail; this improves the training effect of the model on the super-resolution reconstruction task. On the side of improving the training effect of the student network, because data-free knowledge distillation is very difficult, the student network is trained progressively (one part at a time) and gradually expanded to the final whole student network, thereby reducing the difficulty of the distillation.
Generally, the network size of the teacher network (e.g., the number of model parameters and network layers) is relatively large, and that of the student network is relatively small, so the student network obtained by the model training method of this application can be applied to end-side devices such as mobile phones, tablets, and notebook computers; to wearable devices such as smart watches, smart bracelets, and smart glasses; to intelligent mobile platforms such as smart cars, connected vehicles, and robots; and to edge devices such as monitoring systems (e.g., cameras), internet protocol cameras (IP camera, IPC) in security systems, augmented reality (AR) and virtual reality (VR) devices, identity recognition devices (e.g., attendance machines and card punches), and smart speakers.
Specifically, the student network obtained after training is not limited to devices with limited computing resources; it can also be applied to any other device capable of deploying a neural network, such as a cloud server or a computing platform, and this application imposes no limitation on the device.
It should be further noted that the trained student network, in addition to being deployed on a target device to perform the super-resolution reconstruction task, may itself serve as a teacher network in other model training processes to train other neural networks; this application places no particular limitation on the uses of the trained student network.
A typical application scenario of the model training method provided in the embodiments of the present application is as follows: a partner or customer has a model compression requirement for a super-resolution reconstruction task, but cannot hand over training data because of personal privacy, legal terms, or the like, and can only provide a pre-trained model (i.e., a teacher network). In this case, the model training method of this application can be used to compress the super-resolution model, i.e., to obtain a trained lightweight super-resolution model (the trained student network) from the given teacher network in combination with the generation network. Specifically, referring to fig. 9, fig. 9 is a system architecture diagram of an application provided in an embodiment of the present application. As shown in fig. 9, in practice the structure of the lightweight super-resolution model (i.e., the network structure of the student network) is first determined according to the pre-trained super-resolution model (the teacher network) provided by the partner or customer and the target performance; then the pre-trained super-resolution model, the randomly initialized lightweight super-resolution model, and the randomly initialized generation network together form the training architecture of the whole model, and the trained lightweight super-resolution model (the trained student network) is obtained with the model training method provided in the embodiments of the present application.
Second, inference phase
In the embodiment of the present application, the inference phase describes the process in which the execution device 210 in fig. 3 performs super-resolution reconstruction on an input target image by using the trained student network 2013. Specifically, referring to fig. 10, fig. 10 is a flowchart of an image enhancement method provided in an embodiment of the present application, which includes the following steps:
1001. The execution device acquires a target image.
First, the execution device acquires an input image on which super-resolution reconstruction is to be performed; this image may be referred to as the target image.
1002. The execution device performs super-resolution reconstruction on the target image through the trained student network to obtain an enhanced image.
The execution device then performs super-resolution reconstruction on the target image through the trained student network, thereby obtaining a reconstructed enhanced image. It should be noted that, in this embodiment of the application, the trained student network is the one obtained in the training stage described above; for its training process, please refer to the training stage, which is not repeated here.
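A minimal sketch of these two steps is given below, assuming a student network exported as in the training-stage sketch above; the file name and image shape are illustrative.

```python
# Steps 1001-1002 as a sketch: load the trained student and reconstruct.
import torch

student = torch.jit.load("student_sr.pt")  # trained student network (assumed file)
student.eval()
target = torch.rand(1, 3, 64, 64)          # stand-in for the acquired target image
with torch.no_grad():
    enhanced = student(target)             # super-resolution reconstruction
```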
It should be noted that the trained student network obtained in the embodiments of the present application may be applied to image super-resolution reconstruction tasks in a variety of settings; several application scenarios of deployed products are described below.
(1) Camera photo restoration

Camera photo restoration is an important technology with great practical value in scenarios such as improving the imaging quality of mobile phones. Current camera image restoration is mainly performed with neural network models of large network size (such as a teacher network), which consume substantial computing resources and restore images slowly. By deploying the student network obtained with this training method instead, the computing and storage overhead is low and the running speed is high, enabling real-time super-resolution reconstruction of the image to be reconstructed at extremely low computing/storage cost.
(2) Mobile phone photographing optimization

The trained student network can be used for photographing optimization on a terminal (such as a mobile phone, a smart watch, or a personal computer). As shown in fig. 11, taking the mobile phone as an example, when a user takes a photo, objects such as faces and animals can be captured automatically, helping the phone focus and beautify automatically. If the phone is far from the subject, the captured image may be unclear; by applying the trained student network to the phone, the optimized image becomes clear, and because the network size is small, the optimization is fast, bringing a better user experience and improving the quality of the product.
It should be noted that the trained student network described in the present application can be applied not only to the above application scenarios but also to the various sub-fields of artificial intelligence, for example to super-resolution reconstruction of images/videos in application scenarios such as electronic screen display, autonomous driving, and video surveillance. The trained student network provided in the embodiments of the present application can be used in any field and on any device where a neural network can be deployed; the possibilities are not enumerated here.
In order to appreciate more intuitively the beneficial effects of the embodiments of the present application, the resulting technical effects are further compared below. The effectiveness of the method provided by this application was tested on the very deep super-resolution network (VDSR) and the enhanced deep super-resolution network (EDSR). The specific test process is as follows: given a trained VDSR (or EDSR) model and its data set (used for comparison only; not required in practical application of this application), the target network structure (i.e., the student network) is obtained by halving the number of channels of the original model (i.e., the teacher network). A training framework is then formed from the randomly initialized generation network, the pre-trained teacher network, and the randomly initialized student network, the student network is trained with the model training method provided by this application, and the trained student network is finally obtained. For performance comparison, this application compares the student network trained with this application against the student network trained directly on the data set, evaluated on the public data sets Set5, Set14, B100, and Urban100 with the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) metrics; higher values of both indicate better network performance.
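As a reference for the first metric, the sketch below computes PSNR for images scaled to [0, 1]; this is the standard definition, not code taken from this application.

```python
# Peak signal-to-noise ratio for tensors in [0, 1] (standard definition).
import torch

def psnr(pred: torch.Tensor, target: torch.Tensor, max_val: float = 1.0) -> float:
    mse = torch.mean((pred - target) ** 2)
    return float(10.0 * torch.log10(max_val ** 2 / mse))
```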
The beneficial effects of this application are shown in tables 1 and 2. In tables 1 and 2, Teacher refers to the performance of the teacher network trained directly on the data set; Student refers to the performance of the student network trained directly on the data set; Bicubic refers to up-sampling the low-resolution image directly with the bicubic technique, without a student network; Noise refers to the performance obtained by feeding noise directly into the teacher and student networks and training the student network on that noise; and Ours refers to the performance obtained by training the student network with the model training method provided in the embodiments of the present application. As tables 1 and 2 show, the student network trained with this method achieves performance close to that of the student network trained on the data set (PSNR only 0.16 dB lower on Set5 ×2), and is significantly better than the super-resolution results obtained by bicubic interpolation or by noise.
TABLE 1 Comparison of performance on VDSR between the present application and other existing approaches (table data not reproduced in this text)
TABLE 2 Comparison of performance on EDSR between the present application and other existing approaches (table data not reproduced in this text)
On the basis of the above embodiments, related equipment for implementing the above solutions is also provided below. Referring to fig. 12, fig. 12 is a schematic diagram of a training apparatus provided in an embodiment of the present application. The training apparatus 1200 may specifically include a first training module 1201, a second training module 1202, and an iteration triggering module 1203. The first training module 1201 is configured to train the generation network using a first loss function and a second loss function, where the first loss function is used to characterize the difference between a first image and a second image, the first image is an image generated by the generation network from input randomly initialized noise, the second image is an image obtained by down-sampling a third image, the third image is an image obtained by the teacher network performing super-resolution reconstruction on the input first image, and the second loss function is related to the third loss function. The second training module 1202 is configured to train the student network using the third loss function, where the third loss function is used to characterize the difference between a fourth image and a fifth image, the fifth image is an image obtained by the teacher network performing super-resolution reconstruction on an input sixth image, the sixth image is an image generated by the trained generation network from input randomly initialized noise, and the fourth image is an image obtained by the student network performing super-resolution reconstruction on the same sixth image. The iteration triggering module 1203 is configured to trigger the first training module 1201 and the second training module 1202 to repeat their respective steps until a first iteration termination condition is reached.
In the above embodiments of the present application, a model training method is provided, where the model includes a generation network, a teacher network, and a student network, and a new loss function for training the generation network (i.e., the first loss function) is provided specifically for the super-resolution reconstruction task, thereby improving the training effect of the model on that task.
In one possible design, the student network may include a first neural network layer and a second neural network layer, the first neural network layer includes a first sublayer and a second sublayer that are adjacent to each other, and the second training module 1202 is specifically configured to: training the first sublayer and the second neural network layer by using the third loss function, wherein the model parameters of the first sublayer and the model parameters of the second neural network layer are random initialization values;
in this case, the training apparatus 1200 further includes an obtaining module 1204, configured to obtain first model parameter values of the first sub-layer after training and second model parameter values of the second neural network layer after training; in addition, the training apparatus 1200 further includes a third training module 1205, configured to train the generated network by using the first loss function and the second loss function, and train the trained first sublayer, the trained second sublayer, and the trained second neural network layer by using the third loss function, where a model parameter of the trained first sublayer is the first model parameter value, a model parameter of the trained second neural network layer is the second model parameter value, and a model parameter value of the second sublayer is a random initialization value; the third training module 1205 is further configured to repeatedly perform the process of training the generated network by using the first loss function and the second loss function, and the process of training the trained first sublayer, the trained second sublayer, and the trained second neural network layer by using the third loss function until a second iteration termination condition is reached.
In the above embodiment of the present application, the student network is divided into two parts, a first neural network layer and a second neural network layer. In the progressive training, the first sub-layer of the first neural network layer and the second neural network layer are trained first (the model parameters of the second sub-layer of the first neural network layer remain unchanged at this point); after this stage, the trained model parameter values of the first sub-layer and of the second neural network layer are used as the initialization values for the corresponding parts of the whole student network (the model parameters of the second sub-layer are randomly initialized), and training of the first neural network layer and the second neural network layer continues, yielding the finally trained student network. This progressive training process reduces the difficulty of training the student network.
In one possible design, the reaching the second iteration termination condition includes: the preset training round is reached, or the third loss function converges, or the third loss function reaches a preset threshold.
In the above embodiments of the present application, several ways of determining the second iteration termination condition are described, which have flexibility and wide applicability.
In one possible design, the reaching the first iteration termination condition includes: the preset training round is reached, or the third loss function converges, or the third loss function reaches a preset threshold.
In the above embodiments of the present application, several ways of determining the termination condition of the first iteration are also set forth, which have flexibility and wide applicability.
In one possible design, several randomly initialized noises are typically input into the generation network together during training, so that a plurality of first images is obtained; for example, when there are n randomly initialized noises (n ≥ 2), n first images and n second images are obtained, in which case the first loss function is used to characterize the average difference or mean square error between the n first images and the n second images.
In the above embodiments of the present application, it is specifically described how the first loss function is characterized in a plurality of first images and second images, which covers a plurality of practical situations and is realizable.
In one possible design, when there are m randomly initialized noises (m ≥ 2), m fourth images and m fifth images are obtained; in this case, the third loss function is used to characterize the average difference or mean square error between the m fourth images and the m fifth images.
In the above embodiments of the present application, it is specifically described how the third loss function can be characterized in a plurality of fourth images and fifth images, which covers a plurality of practical situations and is realizable.
In one possible design, the iteration triggering module 1203 is further configured to deploy the trained student network on the target device, for example on an edge device with limited computing resources such as a mobile phone or a smart wearable device (e.g., a smart band or a smart watch). The network size of the pre-trained teacher network is generally large and ill-suited to deployment on such devices, whereas the network size of the student network is generally small and well suited to deployment.
In the above embodiment of the application, the student network obtained after training can be deployed on the target device for practical application, and generally, as the network size of the student network is smaller than that of the teacher network, the inference speed of the target device can be improved, and the use experience of the user can be enhanced.
It should be noted that, the contents of information interaction, execution process, and the like between the modules/units in the training apparatus 1200 are based on the same concept as the method embodiment corresponding to fig. 4 or fig. 5 in the present application, and specific contents may refer to the description in the foregoing method embodiment in the present application, and are not described herein again.
An execution device is also provided in an embodiment of the present application. Referring to fig. 13, fig. 13 is a schematic diagram of an execution device provided in an embodiment of the present application. The execution device 1300 may specifically include an obtaining module 1301 and a reconstruction module 1302, where the obtaining module 1301 is configured to obtain a target image, and the reconstruction module 1302 is configured to perform super-resolution reconstruction on the target image through the trained student network to obtain an enhanced image. It should be noted that, in this embodiment of the application, the trained student network is the one obtained in the training phase; for its training process, refer to the training phase.
In the above embodiments of the present application, it is stated that the student network obtained after training is deployed on the target device to perform the super-resolution reconstruction task, and generally, because the network size of the student network is smaller than that of the teacher network, the trained student network can improve the inference speed of the target device and enhance the use experience of the user.
It should be noted that, the contents of information interaction, execution process, and the like between the modules/units in the execution device 1300 are based on the same concept as the method embodiment corresponding to fig. 10 in the present application, and specific contents may refer to the description in the foregoing method embodiment in the present application, and are not described herein again.
Referring to fig. 14, fig. 14 is a schematic structural diagram of a training device provided in an embodiment of the present application. The training apparatus 1200 described in the embodiment corresponding to fig. 12 may be deployed on the training device 1400 to implement the functions of the training apparatus 1200. Specifically, the training device 1400 is implemented by one or more servers and may vary considerably with configuration or performance; it may include one or more central processing units (CPU) 1422, a memory 1432, and one or more storage media 1430 (e.g., one or more mass storage devices) storing applications 1442 or data 1444. The memory 1432 and the storage medium 1430 may be transient or persistent storage. The program stored on the storage medium 1430 may include one or more modules (not shown), each of which may include a series of instruction operations on the training device 1400. Further, the central processing unit 1422 may be configured to communicate with the storage medium 1430 and to execute on the training device 1400 the series of instruction operations stored therein.
Training apparatus 1400 may also include one or more power supplies 1426, one or more wired or wireless network interfaces 1450, one or more input-output interfaces 1458, and/or one or more operating systems 1441, such as Windows Server, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM, etc.
In this embodiment of the application, the central processing unit 1422 is configured to execute the model training method performed by the training device in the embodiments corresponding to fig. 4 or fig. 5. For example, the central processing unit 1422 may be configured as follows. First, the generation network in the model is trained using the constructed first and second loss functions, where the first loss function characterizes the difference between a first image and a second image, the first image is an image generated by the generation network from input randomly initialized noise, the second image is an image obtained by down-sampling a third image, and the third image is an image obtained by the teacher network performing super-resolution reconstruction on the input first image. It should be noted that, in the embodiment of the present application, the teacher network is a pre-trained neural network whose model parameters always remain unchanged, and the model parameters of the generation network may be randomly initialized at the first training. After one round of iterative training of the generation network, the generation network trained in the current round is obtained; the student network in the model is then further trained using a third loss function, where the third loss function characterizes the difference between a fourth image and a fifth image, the fifth image is an image obtained by the teacher network performing super-resolution reconstruction on an input sixth image, the sixth image is an image generated by the trained generation network from input randomly initialized noise, and the fourth image is an image obtained by the student network performing super-resolution reconstruction on the same sixth image. It should also be noted that, in this step, the model parameters of the teacher network remain unchanged, the generation network is the one obtained in the preceding training, and the model parameters of the student network may also be randomly initialized at the first training. Finally, the process of alternately training the generation network and the student network is repeated until an iteration termination condition (which may be referred to as the first iteration termination condition) is reached.
It should be noted that, the specific manner in which the cpu 1422 executes the above steps is based on the same concept as that of the method embodiment corresponding to fig. 4 or fig. 5 in the present application, and the technical effect brought by the method embodiment is also the same as that of the above embodiment in the present application, and specific contents may refer to the description in the foregoing method embodiment in the present application, and are not described herein again.
Referring to fig. 15, fig. 15 is a schematic structural diagram of an execution device provided in an embodiment of the present application. The execution device 1500 may be embodied as any of various terminal devices, such as a virtual reality (VR) device, a mobile phone, a tablet, a notebook computer, a smart wearable device, a monitoring data processing device, or a radar data processing device, which is not limited here. The execution apparatus 1300 described in the embodiment corresponding to fig. 13 may be deployed on the execution device 1500 to implement the functions of the execution apparatus 1300. Specifically, the execution device 1500 includes a receiver 1501, a transmitter 1502, a processor 1503, and a memory 1504 (the number of processors 1503 in the execution device 1500 may be one or more; one processor is taken as an example in fig. 15), where the processor 1503 may comprise an application processor 15031 and a communication processor 15032. In some embodiments of the present application, the receiver 1501, the transmitter 1502, the processor 1503, and the memory 1504 may be connected by a bus or in other ways.
The memory 1504 may include read-only memory and random access memory, and provides instructions and data to the processor 1503. A portion of the memory 1504 may also include non-volatile random access memory (NVRAM). The memory 1504 stores operating instructions, executable modules, or data structures, or a subset or an extended set thereof, where the operating instructions may include various operating instructions for implementing various operations.
The processor 1503 controls the operation of the execution apparatus 1500. In a particular application, the various components of the execution apparatus 1500 are coupled together by a bus system that may include a power bus, a control bus, a status signal bus, and the like, in addition to a data bus. For clarity of illustration, the various buses are referred to in the figures as a bus system.
The method disclosed in the embodiment corresponding to fig. 10 above may be implemented in, or by, the processor 1503. The processor 1503 may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be completed by hardware integrated logic circuits in the processor 1503 or by instructions in the form of software. The processor 1503 may be a general-purpose processor, a digital signal processor (DSP), a microprocessor, or a microcontroller, and may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor 1503 may implement or execute the methods, steps, and logic blocks disclosed in the embodiment corresponding to fig. 10 of the present application. A general-purpose processor may be a microprocessor, or any conventional processor. The steps of the method disclosed in connection with the embodiments of the present application may be embodied as being completed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register. The storage medium is located in the memory 1504, and the processor 1503 reads the information in the memory 1504 and completes the steps of the above method in combination with its hardware.
The receiver 1501 may be used to receive input numeric or character information and to generate signal inputs related to the relevant settings and function control of the execution device 1500. The transmitter 1502 may be configured to output numeric or character information through a first interface; the transmitter 1502 may also be configured to send instructions to a disk pack through the first interface to modify data in the disk pack; and the transmitter 1502 may also include a display device such as a display screen.
In an embodiment of the present application, in one case, the processor 1503 is configured to perform super-resolution reconstruction on an input target image through a trained student network to obtain a corresponding enhanced image. The trained student network can be obtained by the training method corresponding to fig. 4 or fig. 5 of the present application, and specific contents can be referred to the description in the foregoing method embodiments of the present application, which is not described herein again.
Also provided in the embodiments of the present application is a computer-readable storage medium, in which a program for signal processing is stored, and when the program is executed on a computer, the program causes the computer to execute the steps executed by the training apparatus according to the embodiment shown in fig. 4 or fig. 5, or causes the computer to execute the steps executed by the execution apparatus according to the embodiment shown in fig. 10.
The training device, the execution device and the like provided by the embodiment of the application can be specifically chips, and the chips comprise: a processing unit, which may be for example a processor, and a communication unit, which may be for example an input/output interface, a pin or a circuit, etc. The processing unit may execute the computer executable instructions stored by the storage unit to cause the chip within the training apparatus to perform the steps performed by the training apparatus described in the embodiment shown in fig. 4 or fig. 5 above, or to cause the chip within the execution apparatus to perform the steps performed by the execution apparatus described in the embodiment shown in fig. 10 above.
Optionally, the storage unit is a storage unit in the chip, such as a register, a cache, and the like, and the storage unit may also be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a Random Access Memory (RAM), and the like.
Specifically, referring to fig. 16, fig. 16 is a schematic structural diagram of a chip provided in the embodiment of the present application, where the chip may be represented as a neural network processor NPU 200, and the NPU 200 is mounted on a main CPU (Host CPU) as a coprocessor, and the Host CPU allocates tasks. The core portion of the NPU is an arithmetic circuit 2003, and the controller 2004 controls the arithmetic circuit 2003 to extract matrix data in the memory and perform multiplication.
In some implementations, the arithmetic circuit 2003 internally includes a plurality of processing units (PEs). In some implementations, the arithmetic circuitry 2003 is a two-dimensional systolic array. The arithmetic circuit 2003 may also be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 2003 is a general purpose matrix processor.
For example, assume that there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to the matrix B from the weight memory 2002 and buffers it in each PE in the arithmetic circuit. The arithmetic circuit takes the matrix a data from the input memory 2001 and performs matrix arithmetic with the matrix B, and partial results or final results of the obtained matrix are stored in an accumulator (accumulator) 2008.
The unified memory 2006 is used to store input data and output data. Weight data is transferred directly to the weight memory 2002 through a direct memory access controller (DMAC) 2005, and input data is likewise carried into the unified memory 2006 by the DMAC.
The bus interface unit 2010 (BIU) is used for the interaction of the AXI bus with the DMAC and with the instruction fetch buffer (IFB) 2009.

The bus interface unit 2010 is used by the instruction fetch buffer 2009 to fetch instructions from external memory, and by the storage unit access controller 2005 to fetch the original data of the input matrix A or the weight matrix B from external memory.
The DMAC is mainly used to transfer input data in the external memory DDR to the unified memory 2006 or to transfer weight data to the weight memory 2002 or to transfer input data to the input memory 2001.
The vector calculation unit 2007 includes a plurality of operation processing units and, when necessary, further processes the output of the arithmetic circuit, for example with vector multiplication, vector addition, exponential and logarithmic operations, and magnitude comparison. It is mainly used for non-convolutional/fully connected layer computation in the neural network, such as batch normalization, pixel-level summation, and up-sampling of feature planes.
In some implementations, the vector calculation unit 2007 can store the vector of processed outputs to the unified memory 2006. For example, the vector calculation unit 2007 may apply a linear function and/or a nonlinear function to the output of the arithmetic circuit 2003, such as linear interpolation of the feature planes extracted by the convolutional layers, and further such as a vector of accumulated values, to generate the activation values. In some implementations, the vector calculation unit 2007 generates normalized values, pixel-level summed values, or both. In some implementations, the vector of processed outputs can be used as activation inputs to the arithmetic circuit 2003, e.g., for use in subsequent layers in a neural network.
The instruction fetch buffer 2009 is connected to the controller 2004 and stores the instructions used by the controller 2004. The unified memory 2006, the input memory 2001, the weight memory 2002, and the instruction fetch buffer 2009 are all on-chip memories; the external memory is private to the NPU hardware architecture.
Any of the aforementioned processors may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits configured to control execution of a program implementing the method of the first aspect.
It should be noted that the above-described apparatus embodiments are merely illustrative. Units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the apparatus embodiments provided in the present application, the connection relationship between modules indicates a communication connection between them, which may be implemented as one or more communication buses or signal lines.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present application can be implemented by software plus the necessary general-purpose hardware, and certainly can also be implemented by special-purpose hardware including application-specific integrated circuits, special-purpose CPUs, special-purpose memories, special-purpose components, and the like. Generally, any function performed by a computer program can be easily implemented by corresponding hardware, and the specific hardware structure used to implement a given function can take many forms, such as an analog circuit, a digital circuit, or a dedicated circuit. For the present application, however, a software implementation is preferable in most cases. Based on such understanding, the technical solutions of the present application may in essence be embodied in the form of a software product. The software product is stored in a readable storage medium, such as a floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc, and includes several instructions for enabling a computer device (which may be a personal computer, a training device, or a network device) to execute the methods described in the embodiments of the present application.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When software is used, the embodiments may be implemented wholly or partially in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center in a wired manner (e.g., coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (e.g., infrared, radio, or microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device, such as a training device or a data center, integrating one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state drive (SSD)), among others.

Claims (21)

1. A method for training a model, applied to a super-resolution reconstruction task, characterized in that the model comprises a generation network, a teacher network, and a student network, the generation network is used to generate an image from random initialization noise, and the teacher network is a pre-trained neural network, the method comprising:
training the generation network by using a first loss function and a second loss function, wherein the first loss function is used to characterize the difference between a first image and a second image, the first image is an image generated by the generation network from input random initialization noise, the second image is an image obtained by down-sampling a third image, the third image is an image obtained by the teacher network performing super-resolution reconstruction on the input first image, and the second loss function is related to a third loss function;
training the student network by using the third loss function, wherein the third loss function is used to characterize the difference between a fourth image and a fifth image, the fifth image is an image obtained by the teacher network performing super-resolution reconstruction on an input sixth image, the sixth image is an image generated by the trained generation network from input random initialization noise, and the fourth image is an image obtained by the student network performing super-resolution reconstruction on the input sixth image; and
repeating the foregoing steps until a first iteration termination condition is reached.
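By way of illustration, a minimal PyTorch sketch of one possible reading of this training loop follows. The architectures (tiny convolutional stand-ins), the optimizer, the bicubic down-sampling, and the adversarial form assumed for the second loss (the claim states only that it is related to the third loss) are all assumptions, not the claimed method itself:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

scale = 2  # assumed super-resolution factor

# Illustrative stand-in networks; the claim does not fix the architectures.
generator = nn.Conv2d(16, 3, 3, padding=1)                        # noise -> image
teacher = nn.Sequential(nn.Conv2d(3, 3 * scale ** 2, 3, padding=1),
                        nn.PixelShuffle(scale))                   # pre-trained
student = nn.Sequential(nn.Conv2d(3, 3 * scale ** 2, 3, padding=1),
                        nn.PixelShuffle(scale))
for p in teacher.parameters():
    p.requires_grad_(False)                                       # teacher frozen

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-4)
s_opt = torch.optim.Adam(student.parameters(), lr=1e-4)

for step in range(100):                 # stands in for the first iteration
    noise = torch.randn(4, 16, 32, 32)  # termination condition

    # Train the generation network with the first and second losses.
    first_image = generator(noise)
    third_image = teacher(first_image)              # teacher SR reconstruction
    second_image = F.interpolate(third_image, scale_factor=1 / scale,
                                 mode='bicubic', align_corners=False)
    first_loss = F.mse_loss(first_image, second_image)
    second_loss = -F.mse_loss(student(first_image), third_image)  # assumed form
    g_opt.zero_grad()
    (first_loss + second_loss).backward()
    g_opt.step()

    # Train the student network with the third loss.
    sixth_image = generator(noise).detach()         # trained generator output
    with torch.no_grad():
        fifth_image = teacher(sixth_image)          # teacher SR of sixth image
    fourth_image = student(sixth_image)             # student SR of sixth image
    third_loss = F.mse_loss(fourth_image, fifth_image)
    s_opt.zero_grad()
    third_loss.backward()
    s_opt.step()
```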
2. The method of claim 1, wherein the student network comprises a first neural network layer and a second neural network layer, wherein the first neural network layer comprises a first sublayer and a second sublayer that are adjacent, and wherein training the student network with the third loss function comprises:
training the first sublayer and the second neural network layer by using the third loss function, wherein the model parameters of the first sublayer and the model parameters of the second neural network layer are random initialization values;
after the repeating of the foregoing steps until the first iteration termination condition is reached, the method further comprises:
acquiring a first model parameter value of the trained first sublayer and a second model parameter value of the trained second neural network layer;
training the generation network by using the first loss function and the second loss function, and training the trained first sublayer, the second sublayer, and the trained second neural network layer by using the third loss function, wherein the model parameter of the trained first sublayer is the first model parameter value, the model parameter of the trained second neural network layer is the second model parameter value, and the model parameter of the second sublayer is a random initialization value; and
repeating the process of training the generation network by using the first loss function and the second loss function and the process of training the first sublayer, the second sublayer, and the second neural network layer by using the third loss function, until a second iteration termination condition is reached.
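A minimal sketch of this progressive schedule follows, assuming small convolutional layers as stand-ins for the sublayers and for the second neural network layer; shapes and names are illustrative:

```python
import copy
import torch.nn as nn

# Stage 1: the student contains only the first sublayer and the second
# neural network layer, both randomly initialized, and is trained with
# the third loss until the first iteration termination condition.
first_sublayer = nn.Conv2d(3, 8, 3, padding=1)
second_layer = nn.Conv2d(8, 3, 3, padding=1)
stage1_student = nn.Sequential(first_sublayer, second_layer)
# ... run the claim-1 training loop on stage1_student ...

# Acquire the first and second model parameter values after stage 1.
first_params = copy.deepcopy(first_sublayer.state_dict())
second_params = copy.deepcopy(second_layer.state_dict())

# Stage 2: insert the second sublayer with random initialization; the
# trained first sublayer and second layer keep their learned values.
stage2_first = nn.Conv2d(3, 8, 3, padding=1)
stage2_first.load_state_dict(first_params)          # first model parameter value
second_sublayer = nn.Conv2d(8, 8, 3, padding=1)     # random initialization value
stage2_second_layer = nn.Conv2d(8, 3, 3, padding=1)
stage2_second_layer.load_state_dict(second_params)  # second model parameter value
stage2_student = nn.Sequential(stage2_first, second_sublayer, stage2_second_layer)
# ... retrain the generation network and stage2_student until the second
#     iteration termination condition ...
```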
3. The method of claim 2, wherein reaching the second iteration termination condition comprises:
and reaching a preset training turn, or converging the third loss function, or reaching a preset threshold value by the third loss function.
4. The method of any one of claims 1-3, wherein reaching the first iteration termination condition comprises:
and reaching a preset training turn, or converging the third loss function, or reaching a preset threshold value by the third loss function.
5. The method according to any one of claims 1-4, wherein there are n first images and n second images, n ≧ 2, and the first loss function being used to characterize the difference between the first image and the second image comprises:
the first loss function is used to characterize the mean difference or mean square error between the n first images and the n second images.
6. The method according to any one of claims 1-5, wherein there are m fourth images and m fifth images, m ≧ 2, and the third loss function being used to characterize the difference between the fourth image and the fifth image comprises:
the third loss function is used to characterize the mean difference or mean square error between the m fourth images and the m fifth images.
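A one-function sketch of this batched loss, assuming the mean square error variant and (m, C, H, W) tensors; the names are illustrative:

```python
import torch
import torch.nn.functional as F

def third_loss_fn(fourth_images: torch.Tensor, fifth_images: torch.Tensor):
    """Mean square error between the m fourth images (student outputs)
    and the m fifth images (teacher outputs), averaged over the batch."""
    return F.mse_loss(fourth_images, fifth_images)

m = 4
fourth = torch.randn(m, 3, 64, 64)   # student super-resolution outputs
fifth = torch.randn(m, 3, 64, 64)    # teacher super-resolution outputs
print(third_loss_fn(fourth, fifth))  # scalar loss over all m image pairs
```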
7. The method according to any one of claims 1-6, further comprising:
and deploying the trained student network on the target equipment.
8. An image enhancement method, comprising:
acquiring a target image;
performing super-resolution reconstruction on the target image through a trained student network to obtain an enhanced image, wherein the trained student network is a student network obtained by training according to the method of any one of claims 1-7.
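A minimal inference sketch for this claim, assuming the trained student network has the same illustrative stand-in architecture used above and the target image is an RGB tensor:

```python
import torch
import torch.nn as nn

scale = 2
# Stand-in for the trained student network of claims 1-7.
student = nn.Sequential(nn.Conv2d(3, 3 * scale ** 2, 3, padding=1),
                        nn.PixelShuffle(scale))
student.eval()

target_image = torch.rand(1, 3, 64, 64)      # acquired target image (stand-in)
with torch.no_grad():
    enhanced_image = student(target_image)   # super-resolution reconstruction
print(enhanced_image.shape)                  # torch.Size([1, 3, 128, 128])
```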
9. A training apparatus, comprising:
a first training module, configured to train the generation network by using a first loss function and a second loss function, wherein the first loss function is used to characterize the difference between a first image and a second image, the first image is an image generated by the generation network from input random initialization noise, the second image is an image obtained by down-sampling a third image, the third image is an image obtained by the teacher network performing super-resolution reconstruction on the input first image, and the second loss function is related to a third loss function;
a second training module, configured to train the student network by using the third loss function, wherein the third loss function is used to characterize the difference between a fourth image and a fifth image, the fifth image is an image obtained by the teacher network performing super-resolution reconstruction on an input sixth image, the sixth image is an image generated by the trained generation network from input random initialization noise, and the fourth image is an image obtained by the student network performing super-resolution reconstruction on the input sixth image; and
an iteration triggering module, configured to trigger the first training module and the second training module to repeat their respective steps until a first iteration termination condition is reached.
10. The training apparatus of claim 9, wherein the student network comprises a first neural network layer and a second neural network layer, the first neural network layer comprising a first sublayer and a second sublayer that are adjacent, and the second training module is specifically configured to:
training the first sublayer and the second neural network layer by using the third loss function, wherein the model parameters of the first sublayer and the model parameters of the second neural network layer are random initialization values;
the training apparatus further comprises an acquisition module, configured to acquire a first model parameter value of the trained first sublayer and a second model parameter value of the trained second neural network layer;
the training apparatus further comprises a third training module, configured to train the generation network by using the first loss function and the second loss function, and to train the trained first sublayer, the second sublayer, and the trained second neural network layer by using the third loss function, wherein the model parameter of the trained first sublayer is the first model parameter value, the model parameter of the trained second neural network layer is the second model parameter value, and the model parameter of the second sublayer is a random initialization value; and
the third training module is further configured to repeat the process of training the generation network by using the first loss function and the second loss function and the process of training the first sublayer, the second sublayer, and the second neural network layer by using the third loss function, until a second iteration termination condition is reached.
11. The training apparatus of claim 10, wherein reaching the second iteration termination condition comprises:
reaching a preset number of training rounds, the third loss function converging, or the third loss function reaching a preset threshold.
12. The training apparatus according to any one of claims 9-11, wherein reaching the first iteration termination condition comprises:
reaching a preset number of training rounds, the third loss function converging, or the third loss function reaching a preset threshold.
13. The training apparatus according to any one of claims 9-12, wherein there are n first images and n second images, n ≧ 2, and the first loss function being used to characterize the difference between the first image and the second image comprises:
the first loss function is used to characterize the mean difference or mean square error between the n first images and the n second images.
14. The training apparatus according to any one of claims 9-13, wherein there are m fourth images and m fifth images, m ≧ 2, and the third loss function being used to characterize the difference between the fourth image and the fifth image comprises:
the third loss function is used to characterize the mean difference or mean square error between the m fourth images and the m fifth images.
15. The training apparatus according to any one of claims 9-14, wherein the iteration triggering module is further configured to:
and deploying the trained student network on the target equipment.
16. An execution device, comprising:
the acquisition module is used for acquiring a target image;
a reconstruction module, configured to perform super-resolution reconstruction on the target image through a trained student network to obtain an enhanced image, wherein the trained student network is a student network obtained through training by the training apparatus according to any one of claims 9-15.
17. A training apparatus, comprising a processor and a memory, the processor being coupled to the memory, wherein
the memory is used for storing programs;
the processor is configured to execute the program in the memory, so that the training apparatus performs the method of any one of claims 1-7.
18. An execution device comprising a processor and a memory, the processor coupled with the memory,
the memory is used for storing programs;
the processor is configured to execute the program in the memory, so that the execution device performs the method of claim 8.
19. A computer-readable storage medium comprising a program which, when run on a computer, causes the computer to perform the method of any one of claims 1-8.
20. A computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of any one of claims 1-8.
21. A chip comprising a processor and a data interface, the processor reading instructions stored on a memory through the data interface to perform the method of any one of claims 1-8.
CN202110221444.8A 2021-02-27 2021-02-27 Model training method, image enhancement method and device Pending CN113065635A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110221444.8A CN113065635A (en) 2021-02-27 2021-02-27 Model training method, image enhancement method and device


Publications (1)

Publication Number Publication Date
CN113065635A true CN113065635A (en) 2021-07-02

Family

ID=76559439

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110221444.8A Pending CN113065635A (en) 2021-02-27 2021-02-27 Model training method, image enhancement method and device

Country Status (1)

Country Link
CN (1) CN113065635A (en)


Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108197700A (en) * 2018-01-12 2018-06-22 广州视声智能科技有限公司 A kind of production confrontation network modeling method and device
CN108830813A (en) * 2018-06-12 2018-11-16 福建帝视信息科技有限公司 A kind of image super-resolution Enhancement Method of knowledge based distillation
CN109690530A (en) * 2018-11-29 2019-04-26 袁振南 Model training method and its node, network and storage device
CN109903242A (en) * 2019-02-01 2019-06-18 深兰科技(上海)有限公司 A kind of image generating method and device
CN110427799A (en) * 2019-06-12 2019-11-08 中国地质大学(武汉) Based on the manpower depth image data Enhancement Method for generating confrontation network
CN110322418A (en) * 2019-07-11 2019-10-11 北京航空航天大学 A kind of super-resolution image generates the training method and device of confrontation network
CN110503186A (en) * 2019-07-19 2019-11-26 北京三快在线科技有限公司 Commodity sequence neural network model training method, device, electronic equipment
CN111080528A (en) * 2019-12-20 2020-04-28 北京金山云网络技术有限公司 Image super-resolution and model training method, device, electronic equipment and medium
CN111401406A (en) * 2020-02-21 2020-07-10 华为技术有限公司 Neural network training method, video frame processing method and related equipment
CN111523640A (en) * 2020-04-09 2020-08-11 北京百度网讯科技有限公司 Training method and device of neural network model
CN111563843A (en) * 2020-04-30 2020-08-21 苏州大学 Image super-resolution reconstruction method, system and related device
CN112200722A (en) * 2020-10-16 2021-01-08 鹏城实验室 Generation method and reconstruction method of image super-resolution reconstruction model and electronic equipment
CN112365405A (en) * 2020-11-25 2021-02-12 重庆邮电大学 Unsupervised super-resolution reconstruction method based on generation countermeasure network

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113449851A (en) * 2021-07-15 2021-09-28 北京字跳网络技术有限公司 Data processing method and device
CN113657483A (en) * 2021-08-14 2021-11-16 北京百度网讯科技有限公司 Model training method, target detection method, device, equipment and storage medium
CN113793265A (en) * 2021-09-14 2021-12-14 南京理工大学 Image super-resolution method and system based on depth feature relevance
CN113935554A (en) * 2021-12-15 2022-01-14 北京达佳互联信息技术有限公司 Model training method in delivery system, resource delivery method and device
CN114359053A (en) * 2022-01-07 2022-04-15 中国电信股份有限公司 Image processing method, device, equipment and storage medium
WO2023142918A1 (en) * 2022-01-28 2023-08-03 华为云计算技术有限公司 Image processing method based on pre-trained large model, and related apparatus
CN115222600A (en) * 2022-07-29 2022-10-21 大连理工大学 Multispectral remote sensing image super-resolution reconstruction method for contrast learning
CN115564024A (en) * 2022-10-11 2023-01-03 清华大学 Feature distillation method and device for generating network, electronic equipment and storage medium
CN115564024B (en) * 2022-10-11 2023-09-15 清华大学 Characteristic distillation method, device, electronic equipment and storage medium for generating network

Similar Documents

Publication Publication Date Title
CN113065635A (en) Model training method, image enhancement method and device
CN112529150B (en) Model structure, model training method, image enhancement method and device
WO2022042713A1 (en) Deep learning training method and apparatus for use in computing device
CN112418392A (en) Neural network construction method and device
CN112651511A (en) Model training method, data processing method and device
CN113066017B (en) Image enhancement method, model training method and equipment
WO2022179492A1 (en) Pruning processing method for convolutional neural network, data processing method and devices
CN111914997B (en) Method for training neural network, image processing method and device
CN113705769A (en) Neural network training method and device
CN112598597A (en) Training method of noise reduction model and related device
CN113570029A (en) Method for obtaining neural network model, image processing method and device
CN113516227B (en) Neural network training method and device based on federal learning
CN111612215A (en) Method for training time sequence prediction model, time sequence prediction method and device
WO2022012668A1 (en) Training set processing method and apparatus
CN111738403A (en) Neural network optimization method and related equipment
CN111428854A (en) Structure searching method and structure searching device
CN111950700A (en) Neural network optimization method and related equipment
CN115081588A (en) Neural network parameter quantification method and device
CN114492723A (en) Neural network model training method, image processing method and device
WO2021036397A1 (en) Method and apparatus for generating target neural network model
CN112529149A (en) Data processing method and related device
CN113536970A (en) Training method of video classification model and related device
CN114140841A (en) Point cloud data processing method, neural network training method and related equipment
CN113627421A (en) Image processing method, model training method and related equipment
CN114298289A (en) Data processing method, data processing equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination