CN115170399A - Multi-target scene image resolution improving method, device, equipment and medium


Info

Publication number: CN115170399A
Application number: CN202211092795.4A
Authority: CN (China)
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 郭金林, 老松杨, 汤俊, 李欣炜
Applicant/Assignee: National University of Defense Technology

Classifications

    • G06T3/4053: Scaling of whole images or parts thereof, e.g. expanding or contracting, based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G06T3/4046: Scaling of whole images or parts thereof, e.g. expanding or contracting, using neural networks
    • G06V10/7715: Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; mappings, e.g. subspace methods
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks


Abstract

The application relates to a method, an apparatus, a device and a medium for improving the resolution of multi-target scene images. The method comprises the following steps: acquiring a low-resolution multi-target scene image; calling a super-resolution network model based on the DRN architecture; inputting the low-resolution multi-target scene image into the trained super-resolution network model, and generating a high-resolution multi-target scene image by updating the forward mapping network in the super-resolution network model; inputting the low-resolution and high-resolution multi-target scene images into a truncated VGG19 network for feature extraction to obtain their feature maps; substituting the feature maps into the loss function of the super-resolution network model to calculate a mean square error; and outputting the high-resolution multi-target scene image after the mean square error calculation. The resolution of the generated multi-target scene images is greatly improved.

Description

Multi-target scene image resolution improving method, device, equipment and medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method, an apparatus, a device, and a medium for improving a resolution of a multi-target scene image.
Background
Among the generation problems of multi-target scene images, the Conditional SinGAN, a conditional generative adversarial network improved from SinGAN, has successfully solved the controllable generation problem of multi-target scene images: it can be trained on a given multi-target scene image and, under the guidance of control conditions, generate pseudo multi-target scene images in the direction desired by the user, with the number, distribution, etc. of targets in the pseudo images being controllable. Such pseudo multi-target scene images better match human visual cognition, have rich application scenarios, and are of particular application value in the fields of news, public opinion and intelligence.
However, in the process of implementing the present invention, the inventors found that in the foregoing conventional multi-target scene image generation method, the Conditional SinGAN generative adversarial network model is trained in a single-image block-wise manner, so that the generated images are small in size and low in resolution, with coarse details that impair the visual experience and reduce the fidelity of the forged image; the generated images therefore suffer from the technical problem of insufficient resolution.
Disclosure of Invention
Accordingly, it is necessary to provide a method, an apparatus, a computer device and a medium for improving the resolution of multi-target scene images, which can greatly improve the resolution of the generated multi-target scene images.
In order to achieve the above purpose, the embodiment of the invention adopts the following technical scheme:
in one aspect, an embodiment of the present invention provides a method for improving resolution of a multi-target scene image, including:
acquiring a multi-target scene image with low resolution;
calling a super-resolution network model based on a DRN architecture; the super-resolution network model comprises a low-resolution to high-resolution forward mapping network and a high-resolution to low-resolution reverse mapping network;
inputting the multi-target scene image with low resolution into the trained super-resolution network model, and generating the multi-target scene image with high resolution by updating a forward mapping network in the super-resolution network model;
inputting the low-resolution multi-target scene images and the high-resolution multi-target scene images into a truncated VGG19 network for feature extraction to obtain feature maps of the low-resolution multi-target scene images and the high-resolution multi-target scene images;
substituting the characteristic diagram into a loss function of the super-resolution network model to calculate a mean square error;
and outputting the high-resolution multi-target scene image after the mean square error.
On the other hand, a multi-target scene image resolution improving device is also provided, which includes:
the image acquisition module is used for acquiring a multi-target scene image with low resolution;
the model calling module is used for calling a super-resolution network model based on the DRN architecture; the super-resolution network model comprises a forward mapping network from low resolution to high resolution and an inverse mapping network from high resolution to low resolution;
the super-generation module is used for inputting the multi-target scene images with low resolution into the trained super-resolution network model and generating the multi-target scene images with high resolution by updating the forward mapping network in the super-resolution network model;
the characteristic extraction module is used for inputting the low-resolution multi-target scene images and the high-resolution multi-target scene images into the truncated VGG19 network for characteristic extraction to obtain characteristic graphs of the low-resolution multi-target scene images and the high-resolution multi-target scene images;
the loss calculation module is used for substituting the characteristic diagram into a loss function of the super-resolution network model to calculate a mean square error;
and the image output module is used for outputting the high-resolution multi-target scene image after the mean square error.
In another aspect, a computer device is further provided, which includes a memory and a processor, where the memory stores a computer program, and the processor implements any one of the steps of the above multi-target scene image resolution enhancement method when executing the computer program.
In another aspect, a computer-readable storage medium is provided, on which a computer program is stored, and when the computer program is executed by a processor, the method for increasing the resolution of a multi-target scene image as described above is implemented.
One of the above technical solutions has the following advantages and beneficial effects:
according to the method, the device, the equipment and the medium for improving the resolution of the multi-target scene image, after the low-resolution multi-target scene image to be reconstructed is obtained, the trained super-resolution network model based on the DRN architecture is called, the low-resolution multi-target scene image is input into the super-resolution network model for processing, and the super-resolution network model generates the high-resolution multi-target scene image (namely, the reconstructed image) through updating the forward mapping. And then, extracting the characteristics of the low-resolution multi-target scene images and the high-resolution multi-target scene images by using a truncated VGG19 network to obtain corresponding characteristic graphs, and substituting the characteristic graphs into a loss function of the model to calculate the mean square error. Therefore, in both the forward mapping and the reverse mapping of the model, the feature extraction plays a role, so that the model pays more attention to the acquisition of the inherent information of the image in the training process, namely, the inherent features of the reconstructed image (the high-resolution multi-target scene image) and the real image (the low-resolution multi-target scene image) are calculated by the loss function, the semantic modification of the reconstructed image to the real image is reduced, the quality of the reconstructed image is improved, the purpose of super-resolution reconstruction of the multi-target scene image is really achieved, the effect of greatly improving the resolution of the generated multi-target scene image is realized, and the fidelity of the generated image is obviously improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments or the conventional technologies of the present application, the drawings used in the descriptions of the embodiments or the conventional technologies will be briefly introduced below, it is obvious that the drawings in the following descriptions are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flowchart illustrating a method for resolution enhancement of multi-target scene images in an embodiment;
FIG. 2 is a diagram of a DRN-based network architecture in one embodiment;
FIG. 3 is a schematic diagram of the steps of closed loop training in one embodiment;
FIG. 4 is a schematic diagram illustrating an exemplary process flow for feature map difference loss based processing;
FIG. 5 is a diagram illustrating sample reconstructed images obtained by different VGG truncation schemes in an embodiment; wherein, (a) is an original forged image, (b) is an image obtained in scheme 1, and (c) is an image obtained in scheme 2;
FIG. 6 is a diagram illustrating details of output results of super-resolution reconstruction methods according to an embodiment; wherein, (a) is an original image, (b) is an image of Bicubic, (c) is an image of SRResNet, (d) is an image of SRGAN, and (e) is an image of a network based on a feature map difference value;
fig. 7 is a schematic block diagram of an embodiment of a multi-target scene image resolution enhancement apparatus.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more clearly understood, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of and not restrictive on the broad application.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.
In addition, the technical solutions in the embodiments of the present invention may be combined with each other, but it must be based on the realization of the technical solutions by those skilled in the art, and when the technical solutions are contradictory to each other or cannot be realized, the combination of the technical solutions should be considered to be absent and not to be within the protection scope of the present invention.
In the application, high-quality super-resolution reconstruction of multi-target scene images with insufficient resolution is realized; the multi-target scene images to be reconstructed are, for example and without limitation, pseudo multi-target scene images generated by a conventional SinGAN, or other low-resolution multi-target scene images in the military field. By performing super-resolution reconstruction, the resolution of the generated pseudo multi-target scene image can be improved, making it more deceptive and improving its practical effect in the intelligence and public-opinion fields.
Although the existing super-resolution networks SRGAN and SRResNet can be used to improve image resolution, both models require a large number of image pairs during training; image intelligence of the kind targeted in this application is usually difficult to obtain, and the pseudo multi-target scene images produced by multi-target scene image generation techniques exist at only a single resolution. If a predefined image degradation method is used to downsample a high-resolution image to obtain an image pair, the actual image degradation mode remains unknown, which reduces the robustness of the model.
The Dual Regression Network (DRN) can learn the unknown image degradation and can be trained whether image pairs are complete or missing. Its drawback is that the reconstructed super-resolution images are of low quality: the inventors found that its loss function directly uses the mean square error of the pixel differences between the original image and the reconstructed image, which loses the high-frequency details of the reconstructed image, blurs target contours, and degrades the visual experience. SRGAN, by contrast, uses a truncated VGG19 to extract features from the real image and the reconstructed image and then substitutes the extracted feature maps into the loss function, which largely alleviates the loss of high-frequency image information and improves generation quality. The two models can therefore be fused for reference, reducing the model's demand for data pairs while improving the quality of the reconstructed image. The super-resolution reconstruction technique based on the feature-map difference network rests on this theoretical basis.
When preparing training material, one can look for image pairs that exist in the real environment, i.e. low-resolution and high-resolution versions of the same picture, but this approach is hard to implement because such pairs are difficult to find in some fields, such as the military field. Alternatively, a low-resolution image can be obtained from a real image by downsampling; this is simple to implement, but since the actual image degradation method is unknown, obtaining low-resolution images by a predefined downsampling mode affects the training of the model, and the trained model then tests well only on a small fraction of images and lacks generality.
Referring to fig. 1, in one aspect, the present invention provides a method for increasing a resolution of a multi-target scene image, including the following steps S12 to S22.
S12, acquiring a low-resolution multi-target scene image;
S14, calling a super-resolution network model based on the DRN architecture; the super-resolution network model comprises a low-resolution to high-resolution forward mapping network and a high-resolution to low-resolution inverse mapping network;
S16, inputting the low-resolution multi-target scene image into the trained super-resolution network model, and generating the high-resolution multi-target scene image by updating the forward mapping network in the super-resolution network model;
S18, inputting the low-resolution and high-resolution multi-target scene images into a truncated VGG19 network for feature extraction to obtain feature maps of the low-resolution and high-resolution multi-target scene images;
S20, substituting the feature maps into the loss function of the super-resolution network model to calculate a mean square error;
S22, outputting the high-resolution multi-target scene image after the mean square error calculation.
It can be understood that, to make the method of the present application easier to follow, this embodiment first introduces the network model of the present application. The new model provided by the present application is built on the DRN architecture, and its structure is shown in fig. 2.
As can be seen from FIG. 2, the network model is based on a U-Net structure and is mainly divided into two parts: a forward mapping from low resolution to high resolution and an inverse mapping from high resolution to low resolution. The upsampling and downsampling modules in the network operate on the image as follows. In the forward mapping of the model, the low-resolution image LR input to the network is first enlarged to the size of the high-resolution image HR by bicubic interpolation, and a feature map is extracted by convolution; two stride-2 convolutions then reduce the feature map by a factor of 4, giving a feature map of 1/4 HR size; the reduced feature map is upsampled back by a factor of 4 through pixel shuffle and RCA (residual channel attention) modules in sequence, and the finally obtained high-resolution image is compared with the real high-resolution image so as to update the forward mapping F. This completes the work of the forward mapping network. Here RCAB denotes a residual channel attention block.
In the inverse mapping process, the high-resolution image is reduced by a factor of 4 through the convolution modules to obtain a reduced image, which is processed by the dual network and then compared with the original low-resolution image, thereby updating the inverse mapping R. Each convolution module consists of a conv(stride = 2)-LeakyReLU-conv group; these modules, denoted CB, halve the image size. The RCA module is more complex and comes from the classical RCAN network; based on an attention mechanism, it adaptively adjusts channel features and enhances the representation of the image's intrinsic information.
The series of operations performed in the RCA stage is: further feature extraction through two RCAB blocks; enlargement of the image size through two pixel-shuffle steps; global average pooling of the obtained feature maps to produce a channel descriptor containing coarse information; division of the channels by a certain ratio, i.e. channel downsampling, followed by channel upsampling to obtain a weight coefficient for each channel. The weights are finally multiplied with the original features carried by the residual connection, producing new features with the channel weights redistributed. These new features are added to the original 1/4-size feature map, giving the output of the RCA network. The purpose is to adaptively adjust channel features through a channel attention mechanism and thereby improve the network's ability to represent low-frequency and high-frequency information.
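To make the RCA description concrete, the following is a minimal PyTorch sketch of a residual channel attention block of the kind outlined above; the channel count, the reduction ratio and all module names are illustrative assumptions rather than the exact configuration of the patent.

```python
import torch
import torch.nn as nn

class RCAB(nn.Module):
    """Residual channel attention block (sketch). Convolutional features are
    re-weighted per channel via global average pooling followed by a channel
    "downsample"/"upsample" pair, then added back through a residual path."""
    def __init__(self, channels: int = 64, reduction: int = 16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.attention = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                        # channel descriptor
            nn.Conv2d(channels, channels // reduction, 1),  # divide channels by a ratio
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),  # restore the channels
            nn.Sigmoid(),                                   # weight coefficient per channel
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feat = self.body(x)
        return x + feat * self.attention(feat)  # redistributed channel weights + residual
```

An upsampling stage of the forward mapping would then interleave such blocks with `nn.PixelShuffle(2)` steps, each doubling the spatial size, to realise the 4x enlargement described above.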
The network model of the application adopts a closed-loop training mode; the training process is shown in fig. 3. In each training step, training from left to right is the forward mapping F, used for conventional training on paired data; training from right to left is the inverse mapping R, which together with the forward mapping F forms a closed-loop training mode.
In some embodiments, the training process of the super-resolution network model includes:
acquiring paired image pairs and inputting the paired image pairs into a super-resolution network model; the image pair comprises a low resolution image and a high resolution image;
improving the resolution of the low-resolution image through a forward mapping network of a super-resolution network model to obtain a first pseudo high-resolution image with the same size as the high-resolution image;
comparing the first pseudo high-resolution image with the high-resolution image, and improving the similarity of the first pseudo high-resolution image to the high-resolution image;
and updating model parameters of the forward mapping network by a gradient descent method according to the high-resolution image and the first pseudo high-resolution image with the improved similarity degree, and completing pre-training of the super-resolution network model.
Specifically, let $x_{LR}$ denote the original low-resolution image LR, $x'_{LR}$ the pseudo LR, $y_{HR}$ the true high-resolution image HR, and $y'_{HR}$ the pseudo HR. $L_P$ denotes the primal (original) loss, $L_D$ the dual loss, and $L_i$ the loss of step $i$. $F$ denotes the forward mapping function and $\theta_F$ its updatable parameters; $R$ denotes the inverse mapping function and $\theta_R$ its updatable parameters; an apostrophe marks a parameter after updating. The training mode differs for different data, and (1), (2) and (3) in each step of the figure indicate the order in which the mappings are executed. The specific training steps (1) and (2) are as follows:

(1) When training on paired images, the basic super-resolution reconstruction process is followed: the forward mapping $F$ first enhances the low-resolution image $x_{LR}$, i.e. upsamples it to the same size as the original high-resolution image $y_{HR}$, yielding the high-resolution image $y'_{HR}$; $y'_{HR}$ is compared with $y_{HR}$ so as to make them more similar, and gradient descent is then used to adjust the model parameters of the forward mapping $F$, so that the forward mapping learns an upsampling behaviour. The mathematical expression of this step is as follows:

$$y'_{HR} = F(x_{LR}; \theta_F) \qquad (1)$$

$$L_1 = \mathrm{MSE}(y'_{HR}, y_{HR}) \qquad (2)$$

$$\theta'_F = \arg\min_{\theta_F} L_1 \quad \text{(update the forward mapping function } F \text{ by minimising } L_1\text{)}$$

The principle of this step is shown as the first step in fig. 3.
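As an illustration, step (1) can be sketched as a single paired training iteration in PyTorch; `forward_net` (standing for the forward mapping F) and the optimizer setup are assumptions for the sake of the example, not code published with the patent.

```python
import torch
from torch.nn.functional import mse_loss

def pretrain_step(forward_net, optimizer, x_lr, y_hr):
    """One paired pre-training step: update the forward mapping F by
    minimising L1 = MSE(F(x_LR), y_HR), as in equations (1)-(2)."""
    y_hr_fake = forward_net(x_lr)      # pseudo HR, upsampled to the size of y_hr
    loss = mse_loss(y_hr_fake, y_hr)   # L1
    optimizer.zero_grad()
    loss.backward()                    # gradient descent on the parameters of F
    optimizer.step()
    return loss.item()
```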
In some embodiments, the training process of the super-resolution network model further includes:
improving the resolution of the low-resolution image by using a forward mapping network of the pre-trained super-resolution network model to obtain a second pseudo high-resolution image;
reducing the resolution of the second pseudo high-resolution image by adopting an inverse mapping network of the super-resolution network model to obtain a pseudo low-resolution image with the same size as the low-resolution image;
and comparing the pseudo low-resolution image with the low-resolution image, and updating model parameters of an inverse mapping network and a forward mapping network of the super-resolution network model to obtain the trained super-resolution network model.
Specifically: (2) When training on unpaired data, the forward mapping $F$ trained in step (1) is first applied to the original low-resolution image $x_{LR}$ to obtain a high-resolution image $y'_{HR}$. However, since the images are unpaired, there is no reference high-resolution image to compare against and the model parameters cannot be corrected; moreover, the semantics of $y'_{HR}$ differ from those of the original image $x_{LR}$, i.e. the output of the mapping is not constrained to lie within the domain of $y_{HR}$. Next, the inverse mapping $R$ reduces the resolution of the high-resolution image $y'_{HR}$, i.e. downsamples it to the same size as the low-resolution image $x_{LR}$, producing the low-resolution image $x'_{LR}$; $x'_{LR}$ is then compared with the original low-resolution image $x_{LR}$, and the model parameters of both the inverse mapping $R$ and the forward mapping $F$ are adjusted so that $x'_{LR}$ and $x_{LR}$ become more similar. The inverse mapping is thereby turned into a learned downsampling, the parameters of the forward mapping are updated accordingly, and inputting the original low-resolution image $x_{LR}$ into the updated forward mapping yields a "native" high-resolution image $y'_{HR}$ of better quality whose semantics are closer to the original low-resolution image ($y'_{HR}$ is constrained to the domain of $y_{HR}$); the low-resolution image $x_{LR}$ can then form an image pair with $y'_{HR}$. The mathematical expression of this step is as follows:

$$y'_{HR} = F(x_{LR}; \theta_F) \qquad (3)$$

$$x'_{LR} = R(y'_{HR}; \theta_R) \qquad (4)$$

$$L_2 = \mathrm{MSE}(x'_{LR}, x_{LR}) \qquad (5)$$

$$(\theta'_F, \theta'_R) = \arg\min_{\theta_F, \theta_R} L_2 \quad \text{(update the forward mapping } F \text{ and the inverse mapping } R \text{ by minimising } L_2\text{)} \qquad (6)$$

The principle of this step is shown as the second step in fig. 3. The $y'_{HR}$ obtained in step (2) forms an image pair with $x_{LR}$; $y'_{HR}$ is now a high-resolution image with the same semantics as $x_{LR}$ but higher resolution. The forward mapping at this point therefore has the ability to convert a low-resolution image into a high-resolution image.
Step (1) is equivalent to pre-training the model, and the training material may be an easily obtained, paired public data set: because the model is pre-trained in step (1), it only needs to learn the mapping relation from low resolution to high resolution, without learning data from a special field.
Step (2) is equivalent to manufacturing low-resolution data, except that the manufacturing method differs from the conventional one: the conventional method obtains paired data by predefined downsampling, while step (2) obtains "native" low-resolution data by learning the corresponding inverse mapping. In practical application scenarios, step (2) is the most commonly used: a high-resolution image is generated by updating the forward mapping, and it constitutes an image pair with the "native" low-resolution image. The generation process of the high-resolution multi-target scene image in step S16 above can therefore be understood with reference to step (2), and a minimal sketch of such a step is given below.
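A minimal sketch of the closed-loop step (2), under the assumption that `forward_net` and `inverse_net` stand for the mappings F and R, and that `optimizer` covers the parameters of both networks (e.g. built with `torch.optim.Adam` over the chained parameter lists):

```python
from torch.nn.functional import mse_loss

def closed_loop_step(forward_net, inverse_net, optimizer, x_lr):
    """One unpaired training step: y'_HR = F(x_LR), x'_LR = R(y'_HR); both
    F and R are updated by minimising L2 = MSE(x'_LR, x_LR), eqs. (3)-(6)."""
    y_hr_fake = forward_net(x_lr)        # "native" high-resolution candidate
    x_lr_fake = inverse_net(y_hr_fake)   # learned downsampling back to LR size
    loss = mse_loss(x_lr_fake, x_lr)     # compare against the original LR image
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # (x_lr, y_hr_fake) can now serve as an image pair for further training
    return y_hr_fake.detach(), loss.item()
```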
By adopting this closed-loop-mapping training mode, the dilemma of scarce real data can be overcome; this is of particular practical significance for military multi-target scenes, where images are rare and resolution is generally low. Only low-resolution images need to be provided and input to the trained model to obtain the corresponding high-resolution images.
The closed-loop training mode can thus address scarce image data and low native resolution, as in the military field and some public-opinion fields. However, because the loss function of the DRN performs no further feature extraction on the reconstructed image and the real image, it cannot express well the intrinsic features of the reconstructed image and the original image (the real image, or the initially provided low-resolution pseudo image); the reconstructed image therefore has low quality and a poor visual effect, and fails to meet the required super-resolution reconstruction standard.
Loss function based on feature-map difference: regarding the loss function of the super-resolution network model, the processing idea of the loss function of the conventional SRGAN network is adopted and improved upon, so that the loss function computes over the intrinsic features of the reconstructed image and the real image, improving the quality of the reconstructed image and realising super-resolution reconstruction of real multi-target scene images.
Specifically, when a traditional learning-based super-resolution reconstruction model, DRN included, is trained, the mean square error of the pixel differences between the real image and the reconstructed image is commonly used as the loss function to update the model parameters. However, such a loss focuses on comparing pixel information and ignores semantic differences between the reconstructed and real images; high-frequency information such as target edges and boundaries is often lost under such a loss function. If edge information is lost during reconstruction, the boundaries of targets in the reconstructed super-resolution image become blurred, greatly reducing its quality. To make the model pay more attention to the difference between the reconstructed image and the real image during training, the loss function is improved on the basis of the overall DRN framework by drawing on the idea of the SRGAN loss function, so that the intrinsic features of the reconstructed image and the real image enter the computation and the quality of the reconstructed image is improved.
To extract the intrinsic features of the reconstructed image and the real image during training, VGG19, a network specialised in extracting image features, is used for feature extraction on both images, and the extraction results are then substituted into the loss function. It should be noted that the network model of the present application does not employ the entire layer stack of VGG19 but a truncated VGG19: the VGG network is cut off before a certain pooling layer or convolutional layer. The deeper the truncation, the greater the computational overhead and the stronger the feature-extraction capability, so in practice the specific truncation of VGG19 can be chosen according to the limits of computing resources and overhead.
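For illustration, a truncated VGG19 can be sliced from a pretrained torchvision model as below. The slice indices are an assumption based on the standard torchvision layer layout (chosen to match the two truncation schemes used in the later experiments); the patent itself does not specify framework-level indices.

```python
import torch.nn as nn
from torchvision.models import vgg19

def truncated_vgg19(n_layers: int) -> nn.Module:
    """Return the first n_layers of the VGG19 feature stack, frozen so it
    acts purely as a fixed feature extractor."""
    features = vgg19(pretrained=True).features[:n_layers]
    for p in features.parameters():
        p.requires_grad = False
    return features.eval()

# Assumed slice points under the standard torchvision layout:
# scheme 1: first 8 conv layers + first 3 pooling layers -> features[:19]
# scheme 2: first 16 conv layers + all 5 pooling layers  -> features[:37]
phi_scheme1 = truncated_vgg19(19)
phi_scheme2 = truncated_vgg19(37)
```

Newer torchvision releases replace `pretrained=True` with a `weights=` argument; either form yields the ImageNet-pretrained feature stack.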
Fig. 4 is a schematic diagram of the processing flow based on the feature-map difference loss newly proposed in the present application. As can be seen from fig. 4, the reconstructed image (the high-resolution multi-target scene image generated in step S16) and the real image (the low-resolution multi-target scene image of step S12) processed by the DRN network are not substituted directly into the loss function to calculate the mean square error; instead, both are input to the truncated VGG19 to obtain feature maps, and the feature maps of the reconstructed and real images are then substituted into the loss function, the loss functions all being mean square errors. This is the greatest innovation of the network of the present application. In this process, the outputs of both the forward and inverse mappings of the closed-loop training, as well as the real low-resolution and high-resolution images, all pass through the truncated VGG19 for feature extraction.
In some embodiments, using the same variable definitions as before, the loss function of the super-resolution network model may include two parts, $L_P$ and $L_D$. Abstracting the feature-extraction process of the truncated VGG19 network as a function $\phi$, the processing flow is expressed as follows:

$$y'_{HR} = F(x_{LR}; \theta_F) \qquad (7)$$

$$L_P = \mathrm{MSE}(\phi(y'_{HR}), \phi(y_{HR})) \qquad (8)$$

$$\theta'_F = \arg\min_{\theta_F} L_P \quad \text{(update the forward mapping function } F \text{ by minimising } L_P\text{)}$$

$$y'_{HR} = F(x_{LR}; \theta_F) \qquad (9)$$

$$x'_{LR} = R(y'_{HR}; \theta_R) \qquad (10)$$

$$L_D = \mathrm{MSE}(\phi(x'_{LR}), \phi(x_{LR})) \qquad (11)$$

$$(\theta'_F, \theta'_R) = \arg\min_{\theta_F, \theta_R} L_D \quad \text{(update the forward mapping } F \text{ and the inverse mapping } R \text{ by minimising } L_D\text{)}$$

where $L_P$ denotes the primal (original) loss, $L_D$ the dual loss, $\phi$ the function abstracting feature extraction with the truncated VGG19 network, $y_{HR}$ a real high-resolution image, $y'_{HR}$ a pseudo high-resolution image, $x_{LR}$ a true low-resolution image, and $x'_{LR}$ a pseudo low-resolution image.
As can be seen from equations (8) and (11) above, the greatest improvement of the new loss function over the conventional one is that the argument of the loss changes from the mean square error between the pixel points of the two images to the mean square error between their feature maps. Feature extraction operates in both the forward and inverse mappings. This improvement lets the model pay more attention to acquiring the intrinsic information of the image during training, reduces the model's modification of the image's original semantics, and improves the visual effect of the reconstructed image.
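Under the same assumptions as the earlier sketches, the feature-map difference loss of equations (8) and (11) reduces to one helper shared by the primal and dual branches; `phi` is a truncated VGG19 such as the one sketched above:

```python
from torch.nn.functional import mse_loss

def feature_map_loss(phi, img_fake, img_real):
    """MSE between truncated-VGG19 feature maps instead of raw pixels."""
    return mse_loss(phi(img_fake), phi(img_real))

# primal loss: L_P = feature_map_loss(phi, y_hr_fake, y_hr)   (eq. 8)
# dual loss:   L_D = feature_map_loss(phi, x_lr_fake, x_lr)   (eq. 11)
```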
In conclusion, the newly proposed network model can be trained on unpaired data through closed-loop mapping, and the feature-map difference loss function improves super-resolution reconstruction quality, thereby improving the fidelity of the generated images.
According to the multi-target scene image resolution improvement method, after the low-resolution multi-target scene image to be reconstructed is obtained, the trained super-resolution network model based on the DRN architecture is called, and the low-resolution multi-target scene image is input into the model for processing; the super-resolution network model generates the high-resolution multi-target scene image (namely, the reconstructed image) by updating the forward mapping. The truncated VGG19 network then extracts features from the low-resolution and high-resolution multi-target scene images to obtain the corresponding feature maps, which are substituted into the model's loss function to calculate the mean square error. Feature extraction thus operates in both the forward and inverse mappings of the model, making the model pay more attention to acquiring the intrinsic information of the image during training: the loss function is computed over the intrinsic features of the reconstructed image (the high-resolution multi-target scene image) and the real image (the low-resolution multi-target scene image), the semantic modification of the real image by the reconstruction is reduced, and the quality of the reconstructed image is improved. The purpose of super-resolution reconstruction of multi-target scene images is genuinely achieved, the resolution of the generated multi-target scene images is greatly improved, and the fidelity of the generated images is markedly improved.
In one embodiment, in order to more intuitively and fully describe the multi-target scene image resolution enhancement method, the following is an experimental example of the method: the experiment was trained on public data sets and self-constructed data.
The public data sets may include Set5, set14, BSD100, and COCO2014, which collectively contain a total of 82902 paired images, where the low resolution images are obtained by predefined downsampling of the true high resolution images. Military scene image data in a self-built data set are collected from the Internet, and comprise 5653 images of four major categories, namely airplanes, ships, tanks and missiles, and no paired images exist in the data set.
Because paired "low-resolution-high-resolution" datasets for the military field are lacking on the Internet, this experiment adopted the closed-loop training mode, executing steps (1) to (2) in sequence to complete the experiment. During training, the model is first trained with the paired public data sets and then trained on the self-built military scene image data.
The number of layers of the truncated VGG19 can be adjusted during training. To analyse the influence of VGG networks of different depths on the reconstruction effect, two truncation schemes are compared in the experiment: scheme 1 performs feature extraction with the first 8 convolutional layers and the first 3 pooling layers as the truncated VGG19, and scheme 2 performs feature extraction with the first 16 convolutional layers and the first 5 pooling layers as the truncated VGG19. The networks of the two truncation schemes are embedded into the DRN model respectively, and their generation results are compared. The structures of the feature extraction networks adopted by the two schemes are shown in table 1.
TABLE 1
[Table 1, rendered as an image in the original, lists the layer configurations of the feature extraction networks used by the two truncation schemes.]
During training, the batch size of the network is set to 400, the number of epochs is set to 130, and the images are uniformly upscaled to 4 times the resolution of the forged image.
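Collected as a sketch (the dict form and key names are illustrative assumptions; only the values are stated in the text):

```python
train_config = {
    "batch_size": 400,    # batch size stated for the network
    "epochs": 130,        # number of training epochs
    "scale_factor": 4,    # output is 4x the resolution of the forged image
}
```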
To highlight the superiority of the network based on the feature-map difference, the experiment performs comparisons against 4 representative image super-resolution reconstruction methods. Set5, Set14, BSD100, COCO2014 and the self-built military scene image dataset were used to train the conventional SRGAN and SRResNet as well as the new model of the present application.
In the testing stage, to compare model performance, low-resolution forged images generated by Conditional SinGAN were used as test material, and the trained SRGAN and SRResNet, the new network model based on the feature-map difference loss, and the traditional Bicubic method (which requires no training) were tested. After obtaining the test results, the super-resolution reconstruction effects of the 4 methods were compared. Besides qualitatively evaluating and comparing the reconstruction effects through human visual perception, the images were also quantitatively evaluated and compared by peak signal-to-noise ratio (PSNR) and structural similarity (SSIM), both common image quality evaluation indices.
PSNR is defined as follows: given two $m \times n$ images, one the original high-resolution image HR (denoted $I$) and the other the high-resolution image SR obtained after super-resolution processing (denoted $K$), the mean square error between the two images is defined as:

$$\mathrm{MSE} = \frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\left[I(i,j) - K(i,j)\right]^2 \qquad (12)$$

PSNR is then defined as:

$$\mathrm{PSNR} = 10 \cdot \log_{10}\!\left(\frac{MAX_I^2}{\mathrm{MSE}}\right) \qquad (13)$$

where $MAX_I$ is the maximum possible pixel value of the original high-resolution image; provided that each pixel is represented by $B$ binary bits, then $MAX_I = 2^B - 1$. MSE denotes the mean square error.
The unit of PSNR is dB, and a larger value indicates less distortion. PSNR is the most common and most widely used image quality evaluation index; its calculation is based on the error between corresponding pixel points, i.e. it is an error-sensitive image quality evaluation method. Because PSNR does not take the visual characteristics of the human eye into account (the human eye is more sensitive to contrast differences at lower spatial frequencies, more sensitive to luminance differences than to chroma differences, and its perception of a region is affected by the surrounding regions), its evaluation results often disagree with subjective human judgement. Empirical studies have shown a correspondence between the PSNR value and the quality of the reconstructed image, as shown in Table 2:
TABLE 2
[Table 2, rendered as an image in the original, maps PSNR value ranges to subjective reconstructed-image quality levels.]
Therefore, PSNR can be used to evaluate the quality of super-resolution reconstructed images, reflecting to a certain degree how close the reconstructed image is to the original and how distorted it is.
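A direct NumPy rendering of equations (12)-(13), as one would implement it; the function name and the 8-bit default are assumptions:

```python
import numpy as np

def psnr(hr: np.ndarray, sr: np.ndarray, max_val: float = 255.0) -> float:
    """PSNR in dB between the original HR image and the reconstruction SR."""
    mse = np.mean((hr.astype(np.float64) - sr.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")   # identical images: zero distortion
    return 10.0 * np.log10(max_val ** 2 / mse)
```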
SSIM is defined as follows: given two $m \times n$ images, one the original image $I$ and the other the high-resolution image $K$ obtained by super-resolution processing, let the image luminance comparison function be $l(I, K)$, the contrast comparison function be $c(I, K)$, and the structure comparison function be $s(I, K)$; then:

$$l(I,K) = \frac{2\mu_I \mu_K + C_1}{\mu_I^2 + \mu_K^2 + C_1} \qquad (14)$$

$$c(I,K) = \frac{2\sigma_I \sigma_K + C_2}{\sigma_I^2 + \sigma_K^2 + C_2} \qquad (15)$$

$$s(I,K) = \frac{\sigma_{IK} + C_3}{\sigma_I \sigma_K + C_3} \qquad (16)$$

$$C_1 = (k_1 L)^2, \quad C_2 = (k_2 L)^2 \qquad (17)$$

$$C_3 = C_2 / 2 \qquad (18)$$

where $C_1$, $C_2$ and $C_3$ are constants used for stabilisation, to avoid division by 0 ($L$ being the dynamic range of the pixel values); $\mu_I$ and $\mu_K$ denote the means of $I$ and $K$ respectively; $\sigma_I$ and $\sigma_K$ denote the standard deviations of $I$ and $K$ respectively; and $\sigma_{IK}$ denotes the covariance of $I$ and $K$. SSIM is therefore expressed as a combination of the three:

$$\mathrm{SSIM}(I,K) = \left[l(I,K)\right]^{\alpha}\left[c(I,K)\right]^{\beta}\left[s(I,K)\right]^{\gamma} \qquad (19)$$

When $\alpha$, $\beta$ and $\gamma$ are 1, SSIM is expressed as:

$$\mathrm{SSIM}(I,K) = \frac{(2\mu_I\mu_K + C_1)(2\sigma_{IK} + C_2)}{(\mu_I^2 + \mu_K^2 + C_1)(\sigma_I^2 + \sigma_K^2 + C_2)} \qquad (20)$$

The value range is 0 to 1; when $\mathrm{SSIM} = 1$, the two images are identical.
The basic idea of structural similarity is that images are highly structured: there is strong correlation between adjacent pixels, and this correlation carries the structural information of the objects in the image. The human visual system is accustomed to extracting such structural information when viewing images. The structural similarity measure combines the luminance, contrast and structure indices of the two images, which reflect the structural attributes of the objects in the images, to analyse and compute image quality, emphasising structural rather than pixel information. It can therefore be used to evaluate the quality of multi-target scene images obtained by super-resolution reconstruction.
In summary, PSNR emphasises measuring the similarity between the pixels of two images, while SSIM emphasises measuring the similarity between their structures; combining the two indices when analysing images obtained by super-resolution reconstruction allows the images to be analysed both microscopically and macroscopically.
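For completeness, a global (whole-image, single-window) rendering of equation (20); production SSIM implementations usually aggregate the statistics over local sliding windows instead, and the constants follow the common k1 = 0.01, k2 = 0.03 convention, which the patent does not state:

```python
import numpy as np

def ssim(i: np.ndarray, k: np.ndarray, max_val: float = 255.0) -> float:
    """Single-window SSIM with alpha = beta = gamma = 1 (eq. (20))."""
    c1 = (0.01 * max_val) ** 2
    c2 = (0.03 * max_val) ** 2
    i = i.astype(np.float64)
    k = k.astype(np.float64)
    mu_i, mu_k = i.mean(), k.mean()
    var_i, var_k = i.var(), k.var()
    cov_ik = ((i - mu_i) * (k - mu_k)).mean()
    return ((2 * mu_i * mu_k + c1) * (2 * cov_ik + c2)) / \
           ((mu_i ** 2 + mu_k ** 2 + c1) * (var_i + var_k + c2))
```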
To facilitate comparison of reconstruction effects, both the horizontal and vertical pixel counts of the reconstructed images are uniformly magnified to 4 times those of the low-resolution image. When computing PSNR and SSIM, since no original high-resolution image exists, the image obtained by enlarging the corresponding real image (the top-level input of the Conditional SinGAN) via bicubic interpolation is compared with the reconstructed image for the calculation.
And (3) analysis of experimental results:
the experiments were performed according to the experimental setup described above. FIG. 5 shows an example of the results of tests with different schemes of truncated VGG19 in the new model.
It can be seen from the test samples that, for the same image, the super-resolution reconstruction effect of truncation scheme 2 is better than that of scheme 1. Target edges in some reconstructed images obtained with scheme 1 are unclear, which becomes more obvious when the pixels are magnified. Images obtained with scheme 2 have clear edges and give a good visual experience. In terms of detail reconstruction, when the details of the tank tracks are reconstructed in fig. 5, scheme 2 is clearly superior to scheme 1: the track contour in scheme 1 is blurred and the track lines cannot be seen, whereas in scheme 2 the contour is clear and the track lines are visible. This is because scheme 2 uses a deeper network for feature extraction and extracts richer image information.
From experimental results, the visual effect of the image reconstructed based on the Bicubic method is poor, the phenomena of image blurring, target contour and boundary unsharpness and the like exist, and a hazy visual feeling is presented in the whole image. The method only mechanically uses interpolation to increase the pixel points of the low-resolution image, and the increased pixel points are only an average value of the surrounding pixel points.
The image reconstruction effect of the SRResNet- and SRGAN-based methods is superior to that of the Bicubic-based method: image blurring is reduced and target contours become clearer. SRResNet is a learning-based approach that uses a deep residual network to learn the low-to-high-resolution mapping from a large number of data pairs. SRGAN applies the idea of adversarial generation, and by continuously improving the generator and discriminator during training, the reconstruction results become better and better. Such models have stronger reconstruction performance than interpolation methods based on the averaging idea. The super-resolution reconstruction results of SRResNet and SRGAN differ little in visual perception.
The reconstruction method based on the feature-map difference loss network gives the best image reconstruction effect. Its comparative advantage in overall image clarity is evident: target contours in the image are sharp and boundaries are distinct, and the fineness of the image is clearly superior to the other three methods; although the specific details of the targets cannot be fully restored, the cluttered noise around the targets is removed. This method presents the best visual perception and the highest fidelity among the 4 methods.
To show clearer details, one image was selected in the experiment, and the original image and its reconstruction results are shown in fig. 6 at the size of the reconstructed original image. The displayed details of the reconstructed images make it even easier to see the superiority of the image resolution improvement method based on the feature-map difference network, which is clearly superior to the other methods in target contour sharpness, scene clarity, human visual perception and subjective aesthetics. The other methods all suffer noise interference to different degrees.
The above analyses are qualitative assessments and comparisons based on human perception. To make the evaluation of the experimental results more convincing, PSNR and SSIM were used to quantitatively evaluate the experimental results of the 4 methods; for the evaluation, 50 pictures per method were randomly selected as evaluation samples. The results are shown in table 3; the PSNR and SSIM values are the averages over the 50 randomly selected images obtained by each method. Let the PSNR of the $i$-th image be $P_i$ and its SSIM be $S_i$; the averages are then computed as in equations (21) and (22):

$$\overline{P} = \frac{1}{50}\sum_{i=1}^{50} P_i \qquad (21)$$

$$\overline{S} = \frac{1}{50}\sum_{i=1}^{50} S_i \qquad (22)$$
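The averaging itself is straightforward; a short sketch over the 50 sampled scores per method:

```python
import numpy as np

def average_scores(psnr_values, ssim_values):
    """Average PSNR and SSIM over the 50 evaluation samples (eqs. (21)-(22))."""
    return float(np.mean(psnr_values)), float(np.mean(ssim_values))
```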
TABLE 3
[Table 3, rendered as an image in the original, lists the average PSNR and SSIM of the four methods over the 50 evaluation samples.]
As can be seen from Table 3, among the 4 methods, the reconstruction results of the Bicubic-based method have the lowest PSNR and SSIM, with a PSNR value of only 26.58, which falls in a lower quality interval of Table 2. The PSNR of the above method of the present application is the highest, exceeding 30 and falling in the interval of good image quality; its SSIM value also reaches 0.84, indicating that the reconstructed image has high structural similarity to the original image: the image structures are close and the image semantics are close. In the quantitative evaluation, the score of the new method (i.e. the method described above in the present application) is also the best overall, which is essentially consistent with the results of the human qualitative assessment. The high-resolution images obtained by the new method are therefore superior to those of the other methods in both fineness of detail and structural similarity to the original image.
In summary, the advantages of the new model are specifically as follows:
(1) The ability to produce "natural" data pairs. The new model adopts a DRN-based closed-loop mapping training mechanism and can use a deep network to learn the downsampling mapping and obtain "native" low-resolution images, thereby obtaining an updated upsampling mapping and high-resolution images confined to a fixed domain; the low-resolution image and the high-resolution image are thus paired. This training mechanism requires only low-resolution images, so compared with traditional super-resolution models, which need a large number of artificially manufactured data pairs during training, the new model is better suited to real application scenarios.
(2) Strong super-resolution reconstruction performance. Before the loss function is calculated, the method improves on the traditional practice in which the super-resolution loss directly computes the mean square error between the pixels of the reconstructed image and the original image: a feature extraction network extracts features from both images, and the mean square error is calculated on the extracted feature maps. In this way the model can pay more attention to the intrinsic information of the image during learning rather than only to pixel differences, further improving the model's reconstruction capability.
In addition, experimental results also show that the effect of reconstructing the multi-target scene image generated by the Conditional SinGAN by using the new method is superior to that of the traditional method and the mainstream learning-based method.
It should be understood that although the various steps in the flowcharts of figs. 1-4 are shown in sequence as indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and they may be performed in other orders. Moreover, at least some of the steps in figs. 1-4 may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments, and their order of execution is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Referring to fig. 7, in an embodiment, an apparatus 100 for improving multi-target scene image resolution is further provided, comprising an image acquiring module 11, a model invoking module 13, a super generating module 15, a feature extracting module 17, a loss calculating module 19, and an image output module 21. The image acquiring module 11 is configured to acquire a low-resolution multi-target scene image. The model invoking module 13 is configured to invoke a super-resolution network model based on the DRN architecture; the super-resolution network model comprises a forward mapping network from low resolution to high resolution and an inverse mapping network from high resolution to low resolution. The super generating module 15 is configured to input the low-resolution multi-target scene image into the trained super-resolution network model and to generate the high-resolution multi-target scene image by updating the forward mapping network in the super-resolution network model.

The feature extracting module 17 is configured to input the low-resolution and high-resolution multi-target scene images into the truncated VGG19 network for feature extraction, so as to obtain the feature maps of the low-resolution and high-resolution multi-target scene images. The loss calculating module 19 is configured to substitute the feature maps into the loss function of the super-resolution network model to calculate the mean square error. The image output module 21 is configured to output the high-resolution multi-target scene image after the mean square error calculation.
After the multi-target scene image resolution improving apparatus 100 obtains, through the cooperation of its modules, the low-resolution multi-target scene image to be reconstructed, it invokes the trained super-resolution network model based on the DRN architecture and inputs the low-resolution multi-target scene image into the model for processing; the super-resolution network model generates the high-resolution multi-target scene image (i.e., the reconstructed image) by updating the forward mapping network. The truncated VGG19 network then extracts features from the low-resolution and high-resolution multi-target scene images to obtain the corresponding feature maps, which are substituted into the loss function of the model to calculate the mean square error. Feature extraction thus participates in both the forward mapping and the inverse mapping of the model, so that the model attends more to the intrinsic information of the images during training: the loss function operates on the intrinsic features of the reconstructed image (the high-resolution multi-target scene image) and the real image (the low-resolution multi-target scene image), which reduces semantic alteration of the real image by the reconstruction, improves the quality of the reconstructed image, genuinely achieves super-resolution reconstruction of multi-target scene images, greatly improves the resolution of the generated multi-target scene images, and significantly improves the fidelity of the generated images.
In one embodiment, the training process of the super-resolution network model comprises the following steps:
acquiring paired images and inputting them into the super-resolution network model; each image pair comprises a low-resolution image and a high-resolution image;
improving the resolution of the low-resolution image through a forward mapping network of a super-resolution network model to obtain a first pseudo high-resolution image with the same size as the high-resolution image;
comparing the first pseudo high-resolution image with the high-resolution image to improve the similarity of the first pseudo high-resolution image to the high-resolution image;
and updating model parameters of the forward mapping network by gradient descent according to the high-resolution image and the first pseudo high-resolution image with improved similarity, thereby completing pre-training of the super-resolution network model (a sketch of this stage follows these steps).
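The pre-training stage above might look like the following sketch, reusing the TruncatedVGG19Loss module sketched earlier. The optimizer choice, learning rate, and epoch count are assumptions, and `forward_net` and `loader` are hypothetical names for the forward mapping network and the paired-image data loader.

```python
# Hedged sketch of the pre-training stage: update only the forward (LR -> HR)
# mapping against real HR images. All hyperparameters are assumptions.
import torch

def pretrain_forward(forward_net, loader, perceptual_loss,
                     epochs=10, lr=1e-4):
    opt = torch.optim.Adam(forward_net.parameters(), lr=lr)
    for _ in range(epochs):
        for lr_img, hr_img in loader:
            pseudo_hr = forward_net(lr_img)            # first pseudo HR image
            loss = perceptual_loss(pseudo_hr, hr_img)  # compare with real HR
            opt.zero_grad()
            loss.backward()                            # gradient descent update
            opt.step()
    return forward_net
```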
In one embodiment, the training process of the super-resolution network model further includes:
improving the resolution of the low-resolution image by using a pre-trained forward mapping network of the super-resolution network model to obtain a second pseudo high-resolution image;
reducing the resolution of the second pseudo high-resolution image by adopting an inverse mapping network of a super-resolution network model to obtain a pseudo low-resolution image with the same size as the low-resolution image;
and comparing the pseudo low-resolution image with the low-resolution image, and updating the model parameters of the inverse mapping network and the forward mapping network of the super-resolution network model to obtain the trained super-resolution network model (a sketch of this stage follows these steps).
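The closed-loop stage might then be sketched as follows, under the same assumptions as the previous sketch; the coefficient `lam` balancing the primal and dual terms is an assumption and is not fixed by this embodiment.

```python
# Hedged sketch of the closed-loop (dual regression) stage: the pre-trained
# forward net produces a second pseudo HR image, the inverse net maps it back
# to a pseudo LR image, and both mappings are updated together.
import torch

def train_closed_loop(forward_net, inverse_net, loader, perceptual_loss,
                      epochs=10, lr=1e-4, lam=0.1):
    params = list(forward_net.parameters()) + list(inverse_net.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):
        for lr_img, hr_img in loader:
            pseudo_hr = forward_net(lr_img)              # second pseudo HR image
            pseudo_lr = inverse_net(pseudo_hr)           # back to LR size
            primal = perceptual_loss(pseudo_hr, hr_img)  # forward-mapping term
            dual = perceptual_loss(pseudo_lr, lr_img)    # inverse-mapping term
            loss = primal + lam * dual
            opt.zero_grad()
            loss.backward()
            opt.step()
    return forward_net, inverse_net
```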
In one embodiment, the loss function of the super-resolution network model comprises a primal loss $L_P$ and a dual loss $L_D$:

$$L_P = \mathrm{MSE}\big(\varphi(y),\ \varphi(\hat{y})\big)$$

$$L_D = \mathrm{MSE}\big(\varphi(x),\ \varphi(\hat{x})\big)$$

wherein $L_P$ represents the original loss, $L_D$ represents the dual loss, $\varphi(\cdot)$ represents the function abstracting the feature-extraction process of the truncated VGG19 network, $y$ represents the real high-resolution image, $\hat{y}$ represents the pseudo high-resolution image, $x$ represents the real low-resolution image, and $\hat{x}$ represents the pseudo low-resolution image.
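These two terms are typically combined into a single training objective. The form below is an assumption rather than something fixed by this embodiment; DRN-style dual regression commonly weights the dual term with a small coefficient such as $\lambda = 0.1$ so that the primal term dominates:

$$L_{\mathrm{total}} = L_P + \lambda\, L_D$$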
For specific limitations of the multi-target scene image resolution improving apparatus 100, reference may be made to the corresponding limitations of the foregoing multi-target scene image resolution improving method, which are not repeated here. All or part of the modules in the multi-target scene image resolution improving apparatus 100 may be implemented by software, by hardware, or by a combination thereof. The modules may be embedded, in hardware form, in or independent of a processor of a device having the corresponding data processing functions, or may be stored, in software form, in a memory of the device, so that the processor can invoke and execute the operations corresponding to the modules; the device may be, but is not limited to, any of the various computer devices known in the art.
In still another aspect, a computer device is provided, comprising a memory and a processor, the memory storing a computer program; when the processor executes the computer program, the following steps are implemented: acquiring a multi-target scene image with low resolution; calling a super-resolution network model based on a DRN architecture, the super-resolution network model comprising a forward mapping network from low resolution to high resolution and an inverse mapping network from high resolution to low resolution; inputting the multi-target scene image with low resolution into the trained super-resolution network model, and generating the multi-target scene image with high resolution by updating the forward mapping network in the super-resolution network model; inputting the low-resolution and high-resolution multi-target scene images into a truncated VGG19 network for feature extraction to obtain feature maps of the low-resolution and high-resolution multi-target scene images; substituting the feature maps into a loss function of the super-resolution network model to calculate a mean square error; and outputting the high-resolution multi-target scene image after the mean square error calculation.
In an embodiment, the processor, when executing the computer program, may further implement the additional steps or sub-steps in the embodiments of the multi-target scene image resolution enhancement method.
In yet another aspect, there is also provided a computer-readable storage medium having a computer program stored thereon; when the computer program is executed by a processor, the following steps are implemented: acquiring a multi-target scene image with low resolution; calling a super-resolution network model based on a DRN architecture, the super-resolution network model comprising a forward mapping network from low resolution to high resolution and an inverse mapping network from high resolution to low resolution; inputting the multi-target scene image with low resolution into the trained super-resolution network model, and generating the multi-target scene image with high resolution by updating the forward mapping network in the super-resolution network model; inputting the low-resolution and high-resolution multi-target scene images into a truncated VGG19 network for feature extraction to obtain feature maps of the low-resolution and high-resolution multi-target scene images; substituting the feature maps into a loss function of the super-resolution network model to calculate a mean square error; and outputting the high-resolution multi-target scene image after the mean square error calculation.
In one embodiment, when being executed by a processor, the computer program may further implement the additional steps or sub-steps in the embodiments of the multi-target scene image resolution enhancement method.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments may be implemented by a computer program instructing related hardware; the computer program may be stored in a non-volatile computer-readable storage medium, and when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), SyncLink DRAM (SLDRAM), Rambus DRAM (RDRAM), and direct Rambus DRAM (DRDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as such combinations of technical features are not contradictory, they should be considered to fall within the scope of this specification.
The above examples express only several embodiments of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that, for those of ordinary skill in the art, various changes and improvements can be made without departing from the concept of the present application, and all of these fall within the protection scope of the present application. Therefore, the protection scope of this patent should be subject to the appended claims.

Claims (10)

1. A multi-target scene image resolution improving method is characterized by comprising the following steps:
acquiring a multi-target scene image with low resolution;
calling a super-resolution network model based on a DRN architecture; the super-resolution network model comprises a low-resolution to high-resolution forward mapping network and a high-resolution to low-resolution reverse mapping network;
inputting the multi-target scene image with low resolution into the trained super-resolution network model, and generating the multi-target scene image with high resolution by updating a forward mapping network in the super-resolution network model;
inputting the low-resolution multi-target scene images and the high-resolution multi-target scene images into a truncated VGG19 network for feature extraction to obtain feature maps of the low-resolution multi-target scene images and the high-resolution multi-target scene images;
substituting the characteristic diagram into a loss function of the super-resolution network model to calculate a mean square error;
and outputting the multi-target scene image with high resolution after the mean square error calculation.
2. The multi-target scene image resolution enhancement method according to claim 1, wherein the training process of the super-resolution network model comprises:
acquiring paired images and inputting them into the super-resolution network model; each image pair comprises a low-resolution image and a high-resolution image;
improving the resolution of the low-resolution image through a forward mapping network of the super-resolution network model to obtain a first pseudo high-resolution image with the same size as the high-resolution image;
comparing the first pseudo high-resolution image with the high-resolution image to improve the similarity degree of the first pseudo high-resolution image to the high-resolution image;
and updating model parameters of the forward mapping network by a gradient descent method according to the high-resolution image and the first pseudo high-resolution image with the improved similarity degree, and completing pre-training of the super-resolution network model.
3. The multi-target scene image resolution enhancement method according to claim 2, wherein the training process of the super-resolution network model further comprises:
improving the resolution of the low-resolution image by using a pre-trained forward mapping network of the super-resolution network model to obtain a second pseudo high-resolution image;
reducing the resolution of the second pseudo high-resolution image by adopting an inverse mapping network of the super-resolution network model to obtain a pseudo low-resolution image with the same size as the low-resolution image;
and comparing the pseudo low-resolution image with the low-resolution image, and updating model parameters of an inverse mapping network and a forward mapping network of the super-resolution network model to obtain the trained super-resolution network model.
4. The multi-target scene image resolution enhancement method according to any one of claims 1 to 3, wherein the loss function of the super-resolution network model comprises a primal loss $L_P$ and a dual loss $L_D$:

$$L_P = \mathrm{MSE}\big(\varphi(y),\ \varphi(\hat{y})\big)$$

$$L_D = \mathrm{MSE}\big(\varphi(x),\ \varphi(\hat{x})\big)$$

wherein $L_P$ represents the original loss, $L_D$ represents the dual loss, $\varphi(\cdot)$ represents the function abstracting the feature-extraction process of the truncated VGG19 network, $y$ represents the real high-resolution image, $\hat{y}$ represents the pseudo high-resolution image, $x$ represents the real low-resolution image, and $\hat{x}$ represents the pseudo low-resolution image.
5. A multi-target scene image resolution improving device is characterized by comprising:
the image acquisition module is used for acquiring a multi-target scene image with low resolution;
the model calling module is used for calling a super-resolution network model based on the DRN architecture; the super-resolution network model comprises a low-resolution to high-resolution forward mapping network and a high-resolution to low-resolution reverse mapping network;
the super-generation module is used for inputting the multi-target scene images with low resolution into the trained super-resolution network model and generating the multi-target scene images with high resolution by updating a forward mapping network in the super-resolution network model;
the feature extraction module is used for inputting the low-resolution multi-target scene images and the high-resolution multi-target scene images into a truncated VGG19 network for feature extraction to obtain feature maps of the low-resolution multi-target scene images and the high-resolution multi-target scene images;
the loss calculation module is used for substituting the characteristic diagram into a loss function of the super-resolution network model to calculate a mean square error;
and the image output module is used for outputting the multi-target scene image with high resolution after the mean square error calculation.
6. The multi-target scene image resolution enhancement device according to claim 5, wherein the training process of the super-resolution network model comprises:
acquiring paired images and inputting them into the super-resolution network model; each image pair comprises a low-resolution image and a high-resolution image;
improving the resolution of the low-resolution image through a forward mapping network of the super-resolution network model to obtain a first pseudo high-resolution image with the same size as the high-resolution image;
comparing the first pseudo high-resolution image with the high-resolution image to improve the similarity degree of the first pseudo high-resolution image to the high-resolution image;
and updating model parameters of the forward mapping network by a gradient descent method according to the high-resolution image and the first pseudo high-resolution image with the improved similarity degree, and completing pre-training of the super-resolution network model.
7. The multi-target scene image resolution enhancement apparatus according to claim 6, wherein the training process of the super-resolution network model further comprises:
improving the resolution of the low-resolution image by using a pre-trained forward mapping network of the super-resolution network model to obtain a second pseudo high-resolution image;
reducing the resolution of the second pseudo high-resolution image by adopting an inverse mapping network of the super-resolution network model to obtain a pseudo low-resolution image with the same size as the low-resolution image;
and comparing the pseudo low-resolution image with the low-resolution image, and updating model parameters of an inverse mapping network and a forward mapping network of the super-resolution network model to obtain the trained super-resolution network model.
8. The multi-target scene image resolution enhancement device according to any one of claims 5 to 7, wherein the loss function of the super-resolution network model comprises a primal loss $L_P$ and a dual loss $L_D$:

$$L_P = \mathrm{MSE}\big(\varphi(y),\ \varphi(\hat{y})\big)$$

$$L_D = \mathrm{MSE}\big(\varphi(x),\ \varphi(\hat{x})\big)$$

wherein $L_P$ represents the original loss, $L_D$ represents the dual loss, $\varphi(\cdot)$ represents the function abstracting the feature-extraction process of the truncated VGG19 network, $y$ represents the real high-resolution image, $\hat{y}$ represents the pseudo high-resolution image, $x$ represents the real low-resolution image, and $\hat{x}$ represents the pseudo low-resolution image.
9. A computer device comprising a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of the multi-target scene image resolution enhancement method according to any one of claims 1 to 4 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when being executed by a processor, implements the steps of the multi-target scene image resolution enhancement method according to any one of claims 1 to 4.
CN202211092795.4A 2022-09-08 2022-09-08 Multi-target scene image resolution improving method, device, equipment and medium Pending CN115170399A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211092795.4A CN115170399A (en) 2022-09-08 2022-09-08 Multi-target scene image resolution improving method, device, equipment and medium


Publications (1)

Publication Number Publication Date
CN115170399A true CN115170399A (en) 2022-10-11

Family

ID=83480984

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211092795.4A Pending CN115170399A (en) 2022-09-08 2022-09-08 Multi-target scene image resolution improving method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN115170399A (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180075581A1 (en) * 2016-09-15 2018-03-15 Twitter, Inc. Super resolution using a generative adversarial network
US20200357096A1 (en) * 2018-01-25 2020-11-12 King Abdullah University Of Science And Technology Deep-learning based structure reconstruction method and apparatus
CN111583109A (en) * 2020-04-23 2020-08-25 华南理工大学 Image super-resolution method based on generation countermeasure network
CN113643183A (en) * 2021-10-14 2021-11-12 湖南大学 Non-matching remote sensing image weak supervised learning super-resolution reconstruction method and system
CN114862679A (en) * 2022-05-09 2022-08-05 南京航空航天大学 Single-image super-resolution reconstruction method based on residual error generation countermeasure network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
AI科技大本营: "用于单图像超分辨率的对偶回归网络,达到最新SOTA | CVPR 2020", 《HTTPS://BLOG.CSDN.NET/DQCFKYQDXYM3F8RB0/ARTICLE/DETAILS/105259338》 *
机器学习AI算法工程: "图像超分辨率重建算法,让模糊图像变清晰(附数据和代码)", 《HTTPS://CLOUD.TENCENT.COM/DEVELOPER/ARTICLE/1771908》 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20221011