CN114494006A - Training method and device for image reconstruction model, electronic equipment and storage medium - Google Patents

Training method and device for image reconstruction model, electronic equipment and storage medium

Info

Publication number
CN114494006A
Authority
CN
China
Prior art keywords
neural network
network model
image
reconstruction
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011155802.1A
Other languages
Chinese (zh)
Inventor
陈虹 (Chen Hong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Communications Ltd Research Institute
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Communications Ltd Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Communications Ltd Research Institute filed Critical China Mobile Communications Group Co Ltd
Priority to CN202011155802.1A priority Critical patent/CN114494006A/en
Publication of CN114494006A publication Critical patent/CN114494006A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 - Geometric image transformations in the plane of the image
    • G06T3/40 - Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053 - Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 - Geometric image transformations in the plane of the image
    • G06T3/40 - Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4046 - Scaling of whole images or parts thereof, e.g. expanding or contracting using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a training method for an image reconstruction model, an image super-resolution reconstruction method and apparatus, an electronic device, and a storage medium. The training method for the image reconstruction model comprises: training a first neural network model and a second neural network model simultaneously using the same training image; when training the first neural network model, performing supervised training on the first neural network model based on first information, where the first information characterizes at least one level of image features output by intermediate layers of the second neural network model. Both the first neural network model and the second neural network model are used for image super-resolution reconstruction; the first parameter corresponding to the second neural network model is larger than the first parameter corresponding to the first neural network model, where the first parameter characterizes the structural complexity of a first network structure in a neural network model, and the first network structure characterizes a nonlinear mapping structure.

Description

Training method and device for image reconstruction model, electronic equipment and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a training method for an image reconstruction model, an image super-resolution reconstruction method, an image super-resolution reconstruction device, an electronic apparatus, and a storage medium.
Background
Image super-resolution reconstruction is a technology that generates a single high-quality, high-resolution image from a group of low-quality, low-resolution images, so as to improve the recognition capability and recognition accuracy of the image. In the related art, a single convolutional layer is used as the nonlinear mapping module for image super-resolution reconstruction; because the structure of such a nonlinear mapping network is simple, it is difficult to extract rich feature information from the image, which results in a poor reconstruction effect.
Disclosure of Invention
In order to solve the related technical problems, embodiments of the present application provide a training method for an image reconstruction model, an image super-resolution reconstruction method, an apparatus, an electronic device, and a storage medium.
The embodiment of the application provides a training method of an image reconstruction model, which comprises the following steps:
simultaneously training a first neural network model and a second neural network model by using the same training image; wherein,
when the first neural network model is trained, performing supervised training on the first neural network model based on the first information; the first information characterizes at least one level of image features output in the middle of the second neural network model;
the first neural network model and the second neural network model are both used for image super-resolution reconstruction; the first parameter corresponding to the second neural network model is larger than the first parameter corresponding to the first neural network model; the first parameter characterizes a structural complexity of a first network structure in a neural network model; the first network structure characterizes a non-linear mapping structure.
In the above scheme, the nonlinear mapping structure of the second neural network model includes cascaded M residual blocks; the nonlinear mapping structure of the first neural network model comprises cascaded N residual blocks; wherein,
each residual block is used for extracting corresponding image characteristics; and M and N are integers which are larger than 1, and M is larger than N.
In the above scheme, two adjacent residual blocks are connected by a skip connection.
In the above scheme, each of the N residual blocks of the first neural network model adopts depthwise separable convolution.
In the above scheme, the method further comprises:
calculating a training error of the first neural network model based on the set loss function; wherein,
the set loss function consists of a first loss function and at least one second loss function; a function value of the first loss function characterizes a difference between an output image and a target image of a first neural network model; the function value of each of the at least one second loss function characterizes a difference between the image feature output by the first neural network model at the corresponding layer and the image feature output by the second neural network model at the same corresponding layer.
In the foregoing solution, the first loss function includes one of:
a Mean Absolute Error (MAE) loss function;
a perceptual loss function.
In the above scheme, each of the at least one second loss function includes a pair-wise loss function.
In the above scheme, the first neural network model and the second neural network model both include a second network structure for processing the input image to obtain a shallow feature map; wherein,
the second network structure includes a convolutional layer and a nonlinear active layer.
In the above solution, the first neural network model and the second neural network model both include a third network structure for image reconstruction; wherein,
the third network structure reconstructs super-resolution images by pixel reconstruction.
The embodiment of the application also provides an image super-resolution reconstruction method, which performs image super-resolution reconstruction using an image reconstruction model trained by the above training method for an image reconstruction model.
The embodiment of the present application further provides a training apparatus for an image reconstruction model, including:
the training unit is used for simultaneously training the first neural network model and the second neural network model by adopting the same training image; wherein,
when the first neural network model is trained, performing supervised training on the first neural network model based on the first information; the first information characterizes at least one level of image features output in the middle of the second neural network model;
the first neural network model and the second neural network model are both used for image super-resolution reconstruction; the first parameter corresponding to the second neural network model is larger than the first parameter corresponding to the first neural network model; the first parameter represents the structural complexity corresponding to a first network structure in the neural network model; the first network structure characterizes a non-linear mapping structure.
The embodiment of the application further provides an image super-resolution reconstruction device, which comprises:
and the reconstruction unit is used for performing image super-resolution reconstruction using the image reconstruction model trained by the above training method for an image reconstruction model.
An embodiment of the present application further provides a first electronic device, including: a first processor and a first communication interface; wherein,
the first processor is used for adopting the same training image to train a first neural network model and a second neural network model simultaneously; wherein,
when the first neural network model is trained, performing supervised training on the first neural network model based on the first information; the first information characterizes at least one level of image features output in the middle of the second neural network model;
the first neural network model and the second neural network model are both used for image super-resolution reconstruction; the first parameter corresponding to the second neural network model is larger than the first parameter corresponding to the first neural network model; the first parameter represents the structural complexity corresponding to a first network structure in the neural network model; the first network structure characterizes a non-linear mapping structure.
An embodiment of the present application further provides a second electronic device, including: a second processor and a second communication interface; wherein,
the second processor is used for performing image super-resolution reconstruction using the image reconstruction model trained by the above training method for an image reconstruction model.
An embodiment of the present application further provides a first electronic device, including: a first processor and a first memory for storing a computer program capable of running on the processor,
the first processor is configured to execute the steps of the training method for the image reconstruction model according to any one of the above descriptions when the computer program is executed.
An embodiment of the present application further provides a second electronic device, including: a second processor and a second memory for storing a computer program capable of running on the processor,
the second processor is used for executing the steps of the image super-resolution reconstruction method when the computer program is run.
The embodiment of the present application further provides a storage medium, on which a computer program is stored, which when executed by a processor implements the steps of the method for training the image reconstruction model described in any one of the above, or implements the steps of the method for reconstructing the image super-resolution described above.
In the training method of the image reconstruction model, both a first neural network model and a second neural network model are used for image super-resolution reconstruction, and the first parameter corresponding to the second neural network model is larger than the first parameter corresponding to the first neural network model, where the first parameter characterizes the structural complexity of a first network structure (a nonlinear mapping structure) in a neural network model. The two models are trained simultaneously using the same training image, and when the first neural network model is trained, supervised training is performed based on first information characterizing at least one level of image features output by intermediate layers of the second neural network model. In this way, the intermediate features of the second neural network model, which has the more complex network structure, are used to supervise the training of the first neural network model, so that the first neural network model can learn the rich feature information of the second neural network model and its generalization capability is enhanced. On this basis, when the trained first neural network model is adopted for image super-resolution reconstruction, rich feature information can be extracted from the image and the reconstruction effect is improved.
Drawings
FIG. 1 is a schematic diagram illustrating an implementation process of a training method of an image reconstruction model in the related art;
FIG. 2 is a schematic diagram illustrating an implementation process of a training method for an image reconstruction model according to an embodiment of the present disclosure;
FIG. 3 is an exemplary diagram of a process for implementing the training method of the image reconstruction model according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a training apparatus for image reconstruction models according to an embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of an image super-resolution reconstruction apparatus according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a first apparatus according to an embodiment of the present disclosure;
fig. 7 is a schematic structural diagram of a second apparatus according to an embodiment of the present application.
Detailed Description
Image super-resolution reconstruction is a technology that generates a single high-quality, high-resolution image from a group of low-quality, low-resolution images, so as to improve the recognition capability and recognition accuracy of the image. With reference to the related art of image super-resolution reconstruction shown in fig. 1, the input low-resolution image is first preprocessed, including converting the RGB-format image into a YUV-format image; then the image is subjected to feature extraction and image upsampling, completing the super-resolution reconstruction and outputting a high-resolution image. In the process of training the image reconstruction model, whether the model has converged is judged by calculating the MAE loss between the output high-resolution image and the real high-resolution image. The image feature extraction in the above reconstruction process is usually implemented based on a Convolutional Neural Network (CNN). A CNN is a feedforward neural network with a deep structure that includes convolution calculations; as one of the representative algorithms of deep learning, CNNs are widely applied in computer-vision fields such as image recognition and image super-resolution reconstruction. Because the storage space and computing power of mobile devices such as mobile phones and vehicle-mounted devices are limited, the network model needs to be compressed before a deep neural network can be embedded in mobile-device applications. The related art therefore adopts a single convolutional layer for the nonlinear mapping; however, the structure of such a nonlinear mapping network is simple, so it is difficult to extract rich feature information from the image.
Based on this, in the training method of the image reconstruction model, the image super-resolution reconstruction method, the apparatus, the electronic device, and the storage medium provided by the embodiments of the present application, a teacher network with a more complex network structure supervises the learning of a student network when the image super-resolution model is trained. Compared with the student network, the teacher network has a more complex network structure and more feature extraction modules for image feature extraction. Therefore, the teacher network is introduced for supervised learning while the student network is trained, and the intermediate feature information output by a plurality of feature extraction modules of the teacher network is transferred to the student network. This improves the generalization capability of the trained student network, so that a better reconstruction effect can be obtained when the student network is applied to image super-resolution reconstruction.
The present application will be described in further detail with reference to the following drawings and examples.
The embodiment of the application provides a training method of an image reconstruction model, as shown in fig. 2, the method includes:
step 201: simultaneously training a first neural network model and a second neural network model by using the same training image; in training the first neural network model, supervised training of the first neural network model is performed based on the first information.
Wherein the first information characterizes at least one level of image features intermediate output of a second neural network model; the first neural network model and the second neural network model are both used for image super-resolution reconstruction; the first parameter corresponding to the second neural network model is larger than the first parameter corresponding to the first neural network model; the first parameter characterizes the structural complexity of a first network structure in the neural network model; the first network structure characterizes a non-linear mapping structure.
Here, both the first neural network model and the second neural network model are used for image super-resolution reconstruction: a Low-Resolution (LR) image is input to the neural network model, which sequentially performs image preprocessing, image feature extraction, and image reconstruction on the input image, and finally outputs a High-Resolution (HR) image. In practical application, the structural complexity of the nonlinear mapping structure of the second neural network model is greater than that of the nonlinear mapping structure of the first neural network model; in other words, the second neural network model has more feature extraction modules than the first, and each of its feature extraction modules is more complex, so the second neural network model can extract deeper image features than the first neural network model. Based on this, when model training is performed, the same training image is input to the first and second neural network models at the same time, i.e., the two models are trained simultaneously, and during training the first neural network model is supervised based on at least one level of deeper image features output by intermediate layers of the second neural network model. This keeps the network structure of the first neural network model lightweight while improving its generalization capability without increasing its computational cost.
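As a rough sketch of this joint training step, the following illustrates how one training image can drive both models, with the student additionally penalized against the teacher's intermediate features. All interfaces here (`student`, `teacher`, `mae`, `pairwise`, `weights`, and the scalar "images") are hypothetical stand-ins for illustration, not structures taken from the patent:

```python
def distillation_step(student, teacher, lr_image, hr_image,
                      mae, pairwise, weights):
    # Both models receive the same low-resolution training image.
    s_out, s_feats = student(lr_image)   # output image + intermediate features
    t_out, t_feats = teacher(lr_image)

    # Student loss: reconstruction error plus one feature-matching term
    # per supervised intermediate level of the teacher.
    student_loss = mae(s_out, hr_image) + sum(
        w * pairwise(fs, ft)
        for w, fs, ft in zip(weights, s_feats, t_feats))

    # The teacher is trained on its reconstruction error alone.
    teacher_loss = mae(t_out, hr_image)
    return student_loss, teacher_loss

# Toy usage with scalar "images" and absolute-difference losses.
s_loss, t_loss = distillation_step(
    student=lambda x: (x + 1.0, [x]),
    teacher=lambda x: (x + 2.0, [x + 0.5]),
    lr_image=1.0, hr_image=3.0,
    mae=lambda a, b: abs(a - b),
    pairwise=lambda a, b: abs(a - b),
    weights=[2.0])
```

In a real setting the two models would be optimized with their respective losses after each such step; the sketch only shows how the supervision signals are assembled.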
In an embodiment, the nonlinear mapping structure of the second neural network model comprises a concatenation of M residual blocks; the nonlinear mapping structure of the first neural network model comprises cascaded N residual blocks; wherein,
each residual block is used for extracting corresponding image characteristics; and M and N are integers which are larger than 1, and M is larger than N.
The nonlinear mapping structure of the first neural network model comprises N cascaded residual blocks, and the nonlinear mapping structure of the second neural network model comprises M cascaded residual blocks, wherein M is larger than N, namely the number of the residual blocks in the nonlinear mapping structure of the second neural network model is larger than that of the residual blocks in the nonlinear mapping structure of the first neural network model. Here, each residual block is used for image feature extraction, and thus the number of modules for image feature extraction in the second neural network model is greater than that of modules for image feature extraction in the first neural network model. In practical application, multi-scale residual blocks are adopted in the first neural network model and the second neural network model, so that image features of different scales can be extracted during image feature extraction, and the image reconstruction effect can be improved.
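As an illustration of cascaded residual blocks, the sketch below chains blocks of the form output = input + f(input); the mapping `f` is a toy stand-in for a block's convolutional layers, which the patent does not fully specify at this point:

```python
def residual_block(x, f):
    # Skip connection: output = input + learned residual f(x),
    # so the block only has to model the residual information.
    return [xi + ri for xi, ri in zip(x, f(x))]

def nonlinear_mapping(x, blocks):
    # Cascade the residual blocks; each extracts further features.
    for f in blocks:
        x = residual_block(x, f)
    return x

# Toy residual mapping standing in for a block's conv layers.
halve = lambda x: [0.5 * xi for xi in x]
out = nonlinear_mapping([1.0, 2.0], [halve, halve])  # two cascaded blocks
```

With M > N, the teacher simply runs a longer cascade of such blocks than the student.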
In one embodiment, the connection is skipped between two adjacent residual blocks.
Here, in the nonlinear mapping structure of the neural network model, adjacent residual blocks are connected by skip connections (shortcut connections), which preserve the original pixel information of the input image and allow the neural network model to concentrate on learning the residual information of the image during training. This reduces the training complexity of the model, and an image reconstruction model trained in this way can achieve good reconstruction performance.
In an embodiment, each of the N residual blocks of the first neural network model employs depthwise separable convolution.
In practical application, a depthwise separable convolution factorizes one standard convolution into two independent operations performed in sequence: a depthwise convolution and a pointwise convolution. Compared with a conventional convolution operation, the depthwise separable convolution has fewer parameters and a lower computational cost; therefore, adopting depthwise separable convolution in each residual block of the first neural network model makes its network structure lightweight and better suited to image-processing scenarios on mobile devices.
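As a numeric illustration of this saving, the parameter counts of a standard 3 × 3 convolution and its depthwise separable counterpart can be compared (the 64-channel sizes are chosen arbitrarily for illustration; biases are ignored):

```python
def standard_conv_params(k, c_in, c_out):
    # One k×k×c_in kernel per output channel.
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    depthwise = k * k * c_in   # one k×k filter per input channel
    pointwise = c_in * c_out   # 1×1 conv combining the channels
    return depthwise + pointwise

std = standard_conv_params(3, 64, 64)        # 36864 parameters
sep = depthwise_separable_params(3, 64, 64)  # 4672 parameters
```

Here the separable version uses roughly an eighth of the parameters, which is the kind of reduction that makes the student network lightweight.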
In an embodiment, the method further comprises:
calculating a training error of the first neural network model based on the set loss function; wherein,
the set loss function consists of a first loss function and at least one second loss function; a function value of the first loss function characterizes a difference between an output image of the first neural network model and a target image; the function value of each of the at least one second loss function characterizes a difference between the image feature output by the first neural network model at the corresponding layer and the image feature output by the second neural network model at the same corresponding layer.
In an embodiment, the first loss function comprises one of:
a MAE loss function;
a perceptual loss function, such as the perceptual loss used in the SRGAN network.
In an embodiment, each of the at least one second loss function comprises a pair-wise loss function.
In the related art, the difference between the actual value output by the model and the target value is usually calculated with an MAE loss function or a perceptual loss function during training. Here, because a second neural network model is introduced during model training to supervise the learning of the first neural network model, and the intermediate feature information output by a plurality of feature extraction modules of the second neural network model is transferred to the first neural network model, loss terms for the intermediate feature layers are added to the loss calculation. In practical application, the loss function may be loss = MAE loss + n × pair-wise loss, where loss is the loss function of the first neural network model; MAE loss is the loss value calculated with the MAE function, representing the difference between the output image obtained by super-resolution reconstruction of the training image by the first neural network model and the target image corresponding to the training image; and n × pair-wise loss represents the sum of the loss values of the n pair-wise functions, where the loss value of each pair-wise function represents the difference between the image feature output by the first neural network model at a corresponding layer and the image feature output by the second neural network model at the same corresponding layer.
In practical application, when the loss value is calculated, the intermediate features output by the corresponding layers of the first and second neural network models correspond in turn to image features from shallow to deep. For example, the loss function may be set as loss = loss0 + loss1 + loss2 + loss3, where loss0 represents the difference between the output image of the first neural network model after super-resolution reconstruction of the training image and the target image corresponding to the training image; loss1 represents the difference between the shallow image features output by intermediate layers of the first and second neural network models; loss2 represents the difference between the middle-level image features output by intermediate layers of the two models; and loss3 represents the difference between the deep image features output by intermediate layers of the two models. In practical applications, the set loss function may also be represented as a weighted sum of the four loss functions loss0, loss1, loss2, and loss3.
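The composition of such a loss can be sketched as follows; feature maps are flattened to flat lists, and a mean-squared-difference placeholder stands in for the pair-wise terms, so none of these names come from the patent itself:

```python
def mae_loss(pred, target):
    # Mean absolute error between output and target image pixels.
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def total_student_loss(output, target, s_feats, t_feats, pairwise):
    # loss0: reconstruction error of the student's output image.
    loss = mae_loss(output, target)
    # loss1..lossn: one term per supervised intermediate level.
    for fs, ft in zip(s_feats, t_feats):
        loss += pairwise(fs, ft)
    return loss

# Placeholder pair-wise term: mean squared difference of features.
mse = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
loss = total_student_loss([1.0, 2.0], [1.0, 3.0],
                          s_feats=[[0.0, 1.0]], t_feats=[[1.0, 1.0]],
                          pairwise=mse)
```

Per-level weights, as in the weighted-sum variant, would simply scale each intermediate term before adding it.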
Here, regarding the above pair-wise function, in the context of image feature extraction, let a_ij^T characterize the correlation between the i-th pixel point and the j-th pixel point in the output features of the second neural network model, and let a_ij^S characterize the correlation between the i-th pixel point and the j-th pixel point in the output features of the first neural network model. The pair-wise loss function characterizing the difference between the two is then:

pair-wise loss = (1 / (W × H)^2) × Σ_i Σ_j (a_ij^S − a_ij^T)^2

where W and H denote the width and height of the feature map.
Here, because the pair-wise loss function is used to calculate the loss values for the intermediate features, knowledge distillation is performed using the correlations between paired pixel points rather than single-point information, so the trained first neural network model achieves model compression while retaining good generalization capability.
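A sketch of the pair-wise loss computed from paired-pixel correlations follows; cosine similarity is assumed here as the correlation measure, since the text only speaks of "correlation" without fixing a formula:

```python
import math

def similarity_matrix(feats):
    # feats: one feature vector per pixel position;
    # a_ij is the cosine similarity between pixels i and j.
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u)) *
                      math.sqrt(sum(b * b for b in v)))
    n = len(feats)
    return [[cos(feats[i], feats[j]) for j in range(n)] for i in range(n)]

def pairwise_loss(student_feats, teacher_feats):
    a_s = similarity_matrix(student_feats)
    a_t = similarity_matrix(teacher_feats)
    n = len(student_feats)
    # Average squared difference of the paired-pixel correlations.
    return sum((a_s[i][j] - a_t[i][j]) ** 2
               for i in range(n) for j in range(n)) / (n * n)

# Identical features give zero loss; diverging correlations do not.
zero = pairwise_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
half = pairwise_loss([[1.0, 0.0], [1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]])
```

Note that the loss compares correlation structure, not raw feature values, which is what lets the student match the teacher despite having fewer channels.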
In an embodiment, the first neural network model and the second neural network model both include a second network structure for processing the input image to obtain a shallow feature map; wherein,
the second network structure includes a convolutional layer and a nonlinear active layer.
The first neural network model and the second neural network model both comprise a third network structure for image reconstruction; wherein,
the third network structure reconstructs the super-resolution image through pixel reconstruction (pixel shuffle).
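Assuming the pixel reconstruction here is the common sub-pixel (pixel-shuffle) rearrangement, the parameter-free upscaling step can be sketched as:

```python
def pixel_shuffle(channels, r):
    # channels: list of r*r feature maps (each a list of rows), all h×w;
    # returns one (h*r)×(w*r) map built by interleaving the channels.
    h, w = len(channels[0]), len(channels[0][0])
    out = [[0.0] * (w * r) for _ in range(h * r)]
    for c, fmap in enumerate(channels):
        dy, dx = divmod(c, r)   # sub-pixel position this channel fills
        for y in range(h):
            for x in range(w):
                out[y * r + dy][x * r + dx] = fmap[y][x]
    return out

# Four 1×1 channels → one 2×2 high-resolution map.
hr = pixel_shuffle([[[1.0]], [[2.0]], [[3.0]], [[4.0]]], r=2)
```

Because the operation only rearranges existing values, it adds no parameters and very little computation, which matches the comparison with deconvolution below.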
Here, compared with image reconstruction using deconvolution in the related art, image reconstruction using pixel reconstruction has lower computational complexity and involves no parameters.
The embodiment of the application also provides an image super-resolution reconstruction method, which performs image super-resolution reconstruction using an image reconstruction model trained by the above training method for an image reconstruction model.
Fig. 3 shows a schematic implementation flow of the training method for an image reconstruction model in an embodiment of the present application. Referring to fig. 3, a low-resolution image serving as the training image is input simultaneously to the first neural network model, which acts as the student network, and to the second neural network model, which acts as the teacher network. In each neural network model, after a plurality of multi-scale feature extraction modules, image reconstruction is performed based on the extracted multi-scale image features and a high-resolution image is output. When the loss function is calculated during training, it includes the MAE loss between the real high-resolution image (i.e., the target image) and the output high-resolution image, as well as the pair-wise loss function at each corresponding level between the teacher network and the student network.
Application examples of the embodiments of the present application are given below:
in the first application embodiment, the network structure of the second neural network model includes the following three parts:
1. feature extraction and representation: the part consists of a convolution layer and a nonlinear activation layer and mainly aims to perform feature extraction on an input image to obtain a shallow feature map.
2. Nonlinear mapping: this part consists of 10 residual blocks, each containing 2 convolutional layers, each followed by a nonlinear activation. Furthermore, each residual block has a skip connection that connects the input features of the block with its output features.
3. Image reconstruction: this part reconstructs a high-resolution image using a deconvolution layer and outputs it. Here, the deconvolution layer can be regarded as the inverse operation of a convolution layer and is typically stacked at the end of an image super-resolution reconstruction network.
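For reference, the output spatial size of a deconvolution (transposed convolution) layer follows the standard relation below; the function name and the example sizes are illustrative, not taken from the patent.

```python
def deconv_out_size(n_in, kernel, stride, padding=0):
    """Output spatial size of a transposed (de)convolution layer.

    Standard formula: out = (in - 1) * stride - 2 * padding + kernel.
    For example, a 4x4 kernel with stride 2 and padding 1 doubles the
    resolution, which is why deconvolution layers are stacked at the end
    of a super-resolution network.
    """
    return (n_in - 1) * stride - 2 * padding + kernel

print(deconv_out_size(32, kernel=4, stride=2, padding=1))  # 64: 2x upscaling
```

Unlike pixel reconstruction, each such layer carries a trainable weight tensor of size (channels_in, channels_out, kernel, kernel).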
The network structure of the first neural network model comprises the following three parts:
1. Feature extraction and representation: this part has the same structure as the feature extraction and representation part of the teacher network; it consists of a convolution layer and a nonlinear activation layer, and mainly aims to extract features from the input image to obtain a shallow feature map.
2. Nonlinear mapping: the section employs 3 depth separable convolution modules, each module containing one depth convolution layer and a point-by-point convolution layer. Where the depth convolutional layer applies a single 3 x 3 filter to each input channel, the point-by-point convolutional layer applies a 1 x 1 convolution to combine the output of the depth convolutions.
3. Image reconstruction: the image reconstruction part has the same structure as the image reconstruction part of the teacher network, and reconstructs and outputs a high-resolution image by using the deconvolution layer. Among other things, the deconvolution layer can be considered the inverse operation of the convolution layer and is typically stacked at the end of the image super-resolution reconstruction network.
In the second application embodiment, the network structure of the second neural network model includes the following three parts:
1. Feature extraction and representation: this part consists of a convolution layer and a nonlinear activation layer, and mainly aims to extract features from the input image to obtain a shallow feature map.
2. Nonlinear mapping: this part consists of 8 multi-scale residual blocks, each formed by connecting a 3 × 3 convolutional layer and a 5 × 5 convolutional layer in parallel. In each residual block, the input feature map first passes through the 3 × 3 and the 5 × 5 convolutional layers respectively to obtain two groups of intermediate output feature maps; the two groups are concatenated and passed as input to another pair of 3 × 3 and 5 × 5 convolutional layers; the two convolved output feature maps are concatenated after each passes through an activation function; and a 1 × 1 convolutional layer then compresses the feature-map channels to obtain the output feature map of the multi-scale residual block.
3. Image reconstruction: the part cascades the output feature maps of the 8 multi-scale residual blocks, performs weight redistribution on the cascade feature maps through a channel attention module, reconstructs an image by adopting a pixel shuffle module, and obtains a final reconstructed image through a 1 × 1 convolution.
The network structure of the first neural network model comprises the following three parts:
1. Feature extraction and representation: this part has the same structure as the feature extraction and representation part of the teacher network; it consists of a convolution layer and a nonlinear activation layer, and mainly aims to extract features from the input image to obtain a shallow feature map.
2. Nonlinear mapping: the part adopts 2 multi-scale residual modules, optimizes the convolution structure of the modules by utilizing depth separable convolution, and reduces parameter quantity and calculation complexity.
3. Image reconstruction: the part has the same structure as an image reconstruction part of a teacher network, output feature maps of the 2 multi-scale residual blocks are cascaded, weight redistribution is carried out on the cascaded feature maps through a channel attention module, then a pixel buffer module is adopted to reconstruct the image, and a final reconstructed image is obtained through 1 × 1 convolution.
In order to implement the method of the embodiment of the present application, an embodiment of the present application further provides a training apparatus for an image reconstruction model, which is disposed on a first electronic device, as shown in fig. 4, and includes:
a training unit 401, configured to train a first neural network model and a second neural network model simultaneously using the same training image; wherein,
when the first neural network model is trained, performing supervised training on the first neural network model based on the first information; the first information characterizes at least one level of image features output in the middle of the second neural network model;
the first neural network model and the second neural network model are both used for image super-resolution reconstruction; the first parameter corresponding to the second neural network model is larger than the first parameter corresponding to the first neural network model; the first parameter represents the structural complexity corresponding to a first network structure in the neural network model; the first network structure characterizes a non-linear mapping structure.
Wherein, in an embodiment, the nonlinear mapping structure of the second neural network model comprises cascaded M residual blocks; the nonlinear mapping structure of the first neural network model comprises cascaded N residual blocks; wherein,
each residual block is used for extracting corresponding image characteristics; and M and N are integers which are larger than 1, and M is larger than N.
In an embodiment, there is a skip connection between two adjacent residual blocks.
In an embodiment, each of the N residual blocks of the first neural network model employs a depthwise separable convolution.
In one embodiment, the apparatus further comprises:
the calculation unit is used for calculating the training error of the first neural network model based on the set loss function; wherein,
the set loss function consists of a first loss function and at least one second loss function; a function value of the first loss function characterizes a difference between an output image of the first neural network model and a target image; the function value of each of the at least one second loss function characterizes a difference between the image feature output by the first neural network model at a corresponding layer and the image feature output by the second neural network model at the same corresponding layer.
In an embodiment, the first loss function comprises one of:
a MAE loss function;
a perceptual loss function.
In an embodiment, each of the at least one second loss function comprises a pair-wise loss function.
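One common formulation of a pair-wise loss compares pairwise feature-similarity matrices between student and teacher, as in structured knowledge distillation; the patent does not fix the exact form, so the sketch below is an assumption, with illustrative names and shapes.

```python
import numpy as np

def similarity_matrix(feat):
    """feat: (C, N) features at N spatial positions.
    Returns the N x N matrix of cosine similarities between positions."""
    f = feat / (np.linalg.norm(feat, axis=0, keepdims=True) + 1e-8)
    return f.T @ f

def pair_wise_loss(student_feat, teacher_feat):
    """Mean squared difference between the two similarity matrices."""
    diff = similarity_matrix(student_feat) - similarity_matrix(teacher_feat)
    return np.mean(diff ** 2)

rng = np.random.default_rng(0)
f_t = rng.standard_normal((16, 32))   # teacher features: 16 channels, 32 positions
f_s = rng.standard_normal((16, 32))   # student features, same layout
print(pair_wise_loss(f_s, f_t) >= 0)  # True; zero only when similarities match
```

Because only relations between spatial positions are compared, the student is pushed to reproduce the teacher's feature structure rather than its raw activations.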
In an embodiment, the first neural network model and the second neural network model both include a second network structure for processing a shallow feature map of the input image; wherein,
the second network structure includes a convolutional layer and a nonlinear active layer.
In an embodiment, the first neural network model and the second neural network model each comprise a third network structure for image reconstruction; wherein,
the third network structure reconstructs super-resolution images by pixel reconstruction.
In practical applications, the training unit 401 and the calculating unit may be implemented by a processor in a training apparatus for image reconstruction models.
It should be noted that: in the training apparatus for an image reconstruction model provided in the above embodiment, the division into the above program modules is merely used as an example for illustration; in practical applications, the above processing may be allocated to different program modules as needed, that is, the internal structure of the apparatus may be divided into different program modules to complete all or part of the processing described above. In addition, the training apparatus for an image reconstruction model provided in the above embodiment and the embodiment of the training method for an image reconstruction model belong to the same concept; the specific implementation process is described in detail in the method embodiment and is not repeated here.
In order to implement the method of the embodiment of the present application, an embodiment of the present application further provides an image super-resolution reconstruction apparatus, disposed on a second electronic device, as shown in fig. 5, the apparatus including:
a reconstruction unit 501, configured to perform image super-resolution reconstruction using an image reconstruction model trained by any one of the above training methods for an image reconstruction model.
In practical applications, the reconstruction unit 501 may be implemented by a processor in an image super-resolution reconstruction apparatus.
It should be noted that: in the image super-resolution reconstruction apparatus provided in the above embodiment, the division into the above program modules is merely used as an example for illustration; in practical applications, the above processing may be allocated to different program modules as needed, that is, the internal structure of the apparatus may be divided into different program modules to complete all or part of the processing described above. In addition, the image super-resolution reconstruction apparatus and the image super-resolution reconstruction method provided in the above embodiments belong to the same concept; the specific implementation process is described in detail in the method embodiment and is not repeated here.
Based on the hardware implementation of the program module, and in order to implement the training method for the image reconstruction model in the embodiment of the present application, an embodiment of the present application further provides a first electronic device, as shown in fig. 6, the first electronic device 600 includes:
a first communication interface 601, which is capable of performing information interaction with other network nodes;
the first processor 602 is connected to the first communication interface 601 to implement information interaction with other network nodes, and is configured, when running a computer program, to execute the method provided by one or more technical solutions on the first electronic device side; the computer program is stored in the first memory 603.
Specifically, the first processor 602 is configured to train a first neural network model and a second neural network model simultaneously using the same training image; wherein,
when the first neural network model is trained, performing supervised training on the first neural network model based on the first information; the first information characterizes at least one level of image features output in the middle of the second neural network model;
the first neural network model and the second neural network model are both used for image super-resolution reconstruction; the first parameter corresponding to the second neural network model is larger than the first parameter corresponding to the first neural network model; the first parameter characterizes a structural complexity of a first network structure in a neural network model; the first network structure characterizes a non-linear mapping structure.
Wherein, in an embodiment, the nonlinear mapping structure of the second neural network model comprises cascaded M residual blocks; the nonlinear mapping structure of the first neural network model comprises cascaded N residual blocks; wherein,
each residual block is used for extracting corresponding image characteristics; and M and N are integers which are larger than 1, and M is larger than N.
In an embodiment, there is a skip connection between two adjacent residual blocks.
In an embodiment, each of the N residual blocks of the first neural network model employs a depthwise separable convolution.
In an embodiment, the first processor 602 is further configured to:
calculating a training error of the first neural network model based on the set loss function; wherein,
the set loss function consists of a first loss function and at least one second loss function; a function value of the first loss function characterizes a difference between an output image of the first neural network model and a target image; the function value of each of the at least one second loss function characterizes a difference between the image feature output by the first neural network model at a corresponding layer and the image feature output by the second neural network model at the same corresponding layer.
In an embodiment, the first loss function comprises one of:
a MAE loss function;
a perceptual loss function.
In an embodiment, each of the at least one second loss function comprises a pair-wise loss function.
In an embodiment, the first neural network model and the second neural network model both include a second network structure for processing a shallow feature map of the input image; wherein,
the second network structure includes a convolutional layer and a nonlinear active layer.
In an embodiment, the first neural network model and the second neural network model each comprise a third network structure for image reconstruction; wherein,
the third network structure reconstructs super-resolution images by pixel reconstruction.
It should be noted that: the specific processing procedures of the first processor 602 and the first communication interface 601 can be understood with reference to the above-described methods.
Of course, in practice, the various components in the first electronic device 600 are coupled together by a bus system 604. It is understood that the bus system 604 is used to enable communications among the components. The bus system 604 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 604 in fig. 6.
The first memory 603 in the embodiment of the present application is used to store various types of data to support the operation of the first electronic device 600. Examples of such data include: any computer program for operating on the first electronic device 600.
The method disclosed in the embodiment of the present application may be applied to the first processor 602, or implemented by the first processor 602. The first processor 602 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be completed by an integrated logic circuit of hardware or instructions in the form of software in the first processor 602. The first processor 602 may be a general purpose processor, a Digital Signal Processor (DSP), another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The first processor 602 may implement or perform the methods, steps and logic blocks disclosed in the embodiments of the present application. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method disclosed in the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software modules may be located in a storage medium located in the first memory 603, and the first processor 602 reads the information in the first memory 603 and, in conjunction with its hardware, performs the steps of the foregoing method.
In an exemplary embodiment, the first electronic device 600 may be implemented by one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), general purpose processors, controllers, Micro Controller Units (MCUs), microprocessors, or other electronic components for performing the aforementioned methods.
Based on the hardware implementation of the program module, in order to implement the image super-resolution reconstruction method according to the embodiment of the present application, an embodiment of the present application further provides a second electronic device, as shown in fig. 7, where the second electronic device 700 includes:
a second communication interface 701 capable of performing information interaction with other network nodes;
the second processor 702 is connected to the second communication interface 701 to implement information interaction with other network nodes, and is configured, when running a computer program, to execute the image super-resolution reconstruction method provided by one or more of the above technical solutions; the computer program is stored in the second memory 703.
Specifically, the second processor 702 is configured to perform super-resolution image reconstruction by using an image reconstruction model trained by the training method of the image reconstruction model according to any one of the above-mentioned methods.
It should be noted that: the specific processing procedures of the second processor 702 and the second communication interface 701 may be understood with reference to the above-described methods.
Of course, in practice, the various components in the second electronic device 700 are coupled together by the bus system 704. It is understood that the bus system 704 is used to enable communications among the components. The bus system 704 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled in fig. 7 as the bus system 704.
The second memory 703 in the embodiment of the present application is used for storing various types of data to support the operation of the second electronic device 700. Examples of such data include: any computer program for operating on the second electronic device 700.
The method disclosed in the embodiments of the present application can be applied to the second processor 702, or implemented by the second processor 702. The second processor 702 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be implemented by an integrated logic circuit of hardware or an instruction in the form of software in the second processor 702. The second processor 702 described above may be a general purpose processor, a DSP, or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like. The second processor 702 may implement or perform the methods, steps and logic blocks disclosed in the embodiments of the present application. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method disclosed in the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium located in the second memory 703, and the second processor 702 reads the information in the second memory 703, and completes the steps of the foregoing method in combination with its hardware.
In an exemplary embodiment, the second electronic device 700 may be implemented by one or more ASICs, DSPs, PLDs, CPLDs, FPGAs, general-purpose processors, controllers, MCUs, microprocessors, or other electronic components for performing the aforementioned methods.
It is understood that the memories (the first memory 603 and the second memory 703) of the embodiments of the present application may be volatile memories or nonvolatile memories, and may include both volatile and nonvolatile memories. The nonvolatile memory may be a Read Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Ferroelectric Random Access Memory (FRAM), a Flash Memory, a magnetic surface memory, an optical disk, or a Compact Disc Read-Only Memory (CD-ROM); the magnetic surface memory may be disk storage or tape storage. The volatile memory may be a Random Access Memory (RAM), which acts as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Synchronous Static Random Access Memory (SSRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM), Enhanced Synchronous Dynamic Random Access Memory (ESDRAM), SyncLink Dynamic Random Access Memory (SLDRAM), and Direct Rambus Random Access Memory (DRRAM). The memories described in the embodiments of the present application are intended to comprise, without being limited to, these and any other suitable types of memory.
In an exemplary embodiment, the present application further provides a storage medium, specifically a computer storage medium, which is a computer-readable storage medium. For example, the storage medium includes the first memory 603 storing a computer program, and the computer program is executable by the first processor 602 of the first electronic device 600 to perform the steps of the method on the first electronic device side. As another example, the storage medium includes the second memory 703 storing a computer program, and the computer program is executable by the second processor 702 of the second electronic device 700 to perform the steps of the method on the second electronic device side. The computer-readable storage medium may be a memory such as an FRAM, a ROM, a PROM, an EPROM, an EEPROM, a Flash Memory, a magnetic surface memory, an optical disk, or a CD-ROM.
It should be noted that: "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
The technical means described in the embodiments of the present application may be arbitrarily combined without conflict.
The above description is only a preferred embodiment of the present application, and is not intended to limit the scope of the present application.

Claims (17)

1. A training method of an image reconstruction model is characterized by comprising the following steps:
simultaneously training a first neural network model and a second neural network model by using the same training image; wherein,
when the first neural network model is trained, performing supervised training on the first neural network model based on the first information; the first information characterizes at least one level of image features output in the middle of the second neural network model;
the first neural network model and the second neural network model are both used for image super-resolution reconstruction; the first parameter corresponding to the second neural network model is larger than the first parameter corresponding to the first neural network model; the first parameter characterizes the structural complexity of a first network structure in the neural network model; the first network structure characterizes a non-linear mapping structure.
2. The method of claim 1, wherein the nonlinear mapping structure of the second neural network model comprises a concatenation of M residual blocks; the nonlinear mapping structure of the first neural network model comprises cascaded N residual blocks; wherein,
each residual block is used for extracting corresponding image characteristics; and M and N are integers which are larger than 1, and M is larger than N.
3. The method of claim 1, wherein there is a skip connection between two adjacent residual blocks.
4. The method of claim 2, wherein each of the N residual blocks of the first neural network model employs a depthwise separable convolution.
5. The method of claim 1, further comprising:
calculating a training error of the first neural network model based on the set loss function; wherein,
the set loss function consists of a first loss function and at least one second loss function; a function value of the first loss function characterizes a difference between an output image of the first neural network model and a target image; the function value of each of the at least one second loss function characterizes a difference between the image feature output by the first neural network model at a corresponding layer and the image feature output by the second neural network model at the same corresponding layer.
6. The method of claim 5, wherein the first loss function comprises one of:
mean absolute error MAE loss function;
a perceptual loss function.
7. The method of claim 5, wherein each of the at least one second loss function comprises a pair-wise loss function.
8. The method of claim 1, wherein the first neural network model and the second neural network model each comprise a second network structure for processing a shallow feature map of the input image; wherein,
the second network structure includes a convolutional layer and a nonlinear active layer.
9. The method of claim 1, wherein the first and second neural network models each contain a third network structure for image reconstruction; wherein,
the third network structure reconstructs super-resolution images by pixel reconstruction.
10. An image super-resolution reconstruction method, characterized in that the image super-resolution reconstruction is performed by using an image reconstruction model trained by the method according to any one of claims 1 to 9.
11. An apparatus for training an image reconstruction model, comprising:
the training unit is used for adopting the same training image to train the first neural network model and the second neural network model at the same time; wherein,
when the first neural network model is trained, performing supervised training on the first neural network model based on the first information; the first information characterizes at least one level of image features output in the middle of the second neural network model;
the first neural network model and the second neural network model are both used for image super-resolution reconstruction; the first parameter corresponding to the second neural network model is larger than the first parameter corresponding to the first neural network model; the first parameter represents the structural complexity corresponding to a first network structure in the neural network model; the first network structure characterizes a non-linear mapping structure.
12. An image super-resolution reconstruction apparatus, comprising:
a reconstruction unit for performing image super-resolution reconstruction using an image reconstruction model trained by the method of any one of claims 1 to 9.
13. A first electronic device, comprising: a first processor and a first communication interface; wherein,
the first processor is used for simultaneously training a first neural network model and a second neural network model by adopting the same training image; wherein,
when the first neural network model is trained, performing supervised training on the first neural network model based on the first information; the first information characterizes at least one level of image features output in the middle of the second neural network model;
the first neural network model and the second neural network model are both used for image super-resolution reconstruction; the first parameter corresponding to the second neural network model is larger than the first parameter corresponding to the first neural network model; the first parameter represents the structural complexity corresponding to a first network structure in the neural network model; the first network structure characterizes a non-linear mapping structure.
14. A second electronic device, comprising: a second processor and a second communication interface; wherein,
the second processor is configured to perform super-resolution image reconstruction using an image reconstruction model trained by the method according to any one of claims 1 to 9.
15. A first electronic device, comprising: a first processor and a first memory for storing a computer program capable of running on the processor,
wherein the first processor is adapted to perform the steps of the method of any one of claims 1 to 9 when running the computer program.
16. A second electronic device, comprising: a second processor and a second memory for storing a computer program capable of running on the processor,
wherein the second processor is adapted to perform the steps of the method of claim 10 when running the computer program.
17. A storage medium having stored thereon a computer program for performing the steps of the method of any one of claims 1 to 9, or for performing the steps of the method of claim 10, when the computer program is executed by a processor.
CN202011155802.1A 2020-10-26 2020-10-26 Training method and device for image reconstruction model, electronic equipment and storage medium Pending CN114494006A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011155802.1A CN114494006A (en) 2020-10-26 2020-10-26 Training method and device for image reconstruction model, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011155802.1A CN114494006A (en) 2020-10-26 2020-10-26 Training method and device for image reconstruction model, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114494006A true CN114494006A (en) 2022-05-13

Family

ID=81470717

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011155802.1A Pending CN114494006A (en) 2020-10-26 2020-10-26 Training method and device for image reconstruction model, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114494006A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115115512A (en) * 2022-06-13 2022-09-27 荣耀终端有限公司 Training method and device for image hyper-resolution network
CN115115512B (en) * 2022-06-13 2023-10-03 荣耀终端有限公司 Training method and device for image superdivision network
CN117425013A (en) * 2023-12-19 2024-01-19 杭州靖安防务科技有限公司 Video transmission method and system based on reversible architecture
CN117425013B (en) * 2023-12-19 2024-04-02 杭州靖安防务科技有限公司 Video transmission method and system based on reversible architecture

Similar Documents

Publication Publication Date Title
CN111369440B (en) Model training and image super-resolution processing method, device, terminal and storage medium
WO2022017025A1 (en) Image processing method and apparatus, storage medium, and electronic device
CN111696110B (en) Scene segmentation method and system
CN111951167B (en) Super-resolution image reconstruction method, super-resolution image reconstruction device, computer equipment and storage medium
CN110246084B (en) Super-resolution image reconstruction method, system and device thereof, and storage medium
CN109858613B (en) Compression method and system of deep neural network and terminal equipment
CN111192278B (en) Semantic segmentation method, semantic segmentation device, computer equipment and computer readable storage medium
CN114494006A (en) Training method and device for image reconstruction model, electronic equipment and storage medium
CN113159143A (en) Infrared and visible light image fusion method and device based on skip-connection convolution layers
CN114821058A (en) Image semantic segmentation method and device, electronic equipment and storage medium
CN114913094A (en) Image restoration method, image restoration apparatus, computer device, storage medium, and program product
CN117478949A (en) Method and system for constructing audio visual sense attention prediction model
CN115082306A (en) Image super-resolution method based on blueprint-separable residual network
CN116977169A (en) Data processing method, apparatus, device, readable storage medium, and program product
CN116433686A (en) Medical image segmentation method and related equipment based on transform context information fusion
CN116029905A (en) Face super-resolution reconstruction method and system based on progressive difference complementation
CN116978057A (en) Human body posture migration method and device in image, computer equipment and storage medium
CN116311455A (en) Expression recognition method based on improved Mobile-former
CN112529064B (en) Efficient real-time semantic segmentation method
CN115719297A (en) Visible watermark removing method, system, equipment and medium based on high-dimensional space decoupling
US20210224632A1 (en) Methods, devices, chips, electronic apparatuses, and storage media for processing data
US20220044370A1 (en) Image processing methods
CN115496651A (en) Feature processing method and device, computer-readable storage medium and electronic equipment
CN113902631A (en) Image processing method, electronic device, and storage medium
CN113887719A (en) Model compression method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination