CN110399881B - End-to-end quality enhancement method and device based on binocular stereo image - Google Patents

Info

Publication number
CN110399881B
CN110399881B
Authority
CN
China
Prior art keywords
network
quality
quality image
image
low
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910624300.XA
Other languages
Chinese (zh)
Other versions
CN110399881A (en)
Inventor
邹文斌
金枝
彭映青
唐毅
李霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen University
Original Assignee
Shenzhen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen University filed Critical Shenzhen University
Priority to CN201910624300.XA priority Critical patent/CN110399881B/en
Publication of CN110399881A publication Critical patent/CN110399881A/en
Application granted granted Critical
Publication of CN110399881B publication Critical patent/CN110399881B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/34Smoothing or thinning of the pattern; Morphological operations; Skeletonisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

According to the end-to-end quality enhancement method and device based on binocular stereo images disclosed by the embodiments of the invention, the low-quality image and the high-quality image of a binocular stereo pair are each input to a feature extraction network for feature extraction; the resulting shallow feature maps of the low-quality and high-quality images are each input to an information distillation network for information distillation; the resulting high-level feature maps of the two images are input to an information fusion network for information fusion; and finally the fused feature map is input to a long short-term memory network that learns the visual disparity of the stereo pair, so that a quality-enhanced image of the low-quality image is reconstructed. In this end-to-end image quality enhancement algorithm, the high-quality image guides the reconstruction of the low-quality image, and the visual disparity of the stereo pair is learned by a long short-term memory network built on information fusion, which increases the running speed, avoids error propagation, guarantees the rate and accuracy of image reconstruction, and reduces memory and computing-resource consumption.

Description

End-to-end quality enhancement method and device based on binocular stereo image
Technical Field
The invention relates to the technical field of image processing, in particular to an end-to-end quality enhancement method and device based on binocular stereo images.
Background
In recent years, as the additional information carried by visual disparity has been increasingly exploited, quality enhancement of stereoscopic images has become an active research area.
Since the pioneering work of the Super-Resolution Convolutional Neural Network (SRCNN), learning-based methods have been widely adopted to enhance image quality. Currently, commonly used stereo image enhancement methods rely on stereo matching to learn the correspondence between the two views of a stereo pair and use matching costs to model long-range dependencies in the network; however, learning an accurate correspondence between the views is very difficult because of the large differences between the two viewpoints of a stereo pair. In addition, methods based mainly on Convolutional Neural Networks (CNN) resize the input image before feeding it into the network and adopt deeper recursive networks to obtain better reconstruction performance, but such methods require a large amount of computing resources and memory.
Disclosure of Invention
The embodiments of the present invention mainly aim to provide an end-to-end quality enhancement method and apparatus based on binocular stereo images, which can at least solve the problems in the related art that, when enhancing stereo image quality, it is difficult to learn an accurate correspondence between the two views of a stereo pair and a large amount of computing resources and memory is consumed.
In order to achieve the above object, a first aspect of embodiments of the present invention provides an end-to-end quality enhancement method based on binocular stereo images, which is applied to an overall neural network including a first quality enhancement branch network, a second quality enhancement branch network, an information fusion network, and a long short-term memory network, where the first quality enhancement branch network and the second quality enhancement branch network each include a feature extraction network and an information distillation network, and the method includes:
inputting a low-quality image and a high-quality image of a binocular stereo pair into the first quality enhancement branch network and the second quality enhancement branch network, respectively, and performing feature extraction processing on the input images through the respective feature extraction networks to obtain shallow feature maps corresponding to the low-quality image and the high-quality image, respectively;
inputting the shallow feature maps of the low-quality image and the high-quality image into the respective information distillation networks for information distillation processing to obtain high-level feature maps corresponding to the low-quality image and the high-quality image, respectively; wherein the high-level feature maps carry a larger amount of useful information than the shallow feature maps;
inputting both the high-level feature map of the low-quality image and the high-level feature map of the high-quality image into the information fusion network, and performing information fusion processing on the low-quality and high-quality image features to obtain a fused feature map;
inputting the fused feature map into the long short-term memory network, learning position information under the fields of view corresponding to the low-quality image and the high-quality image, respectively, and reconstructing a quality-enhanced image of the low-quality image according to the obtained position difference information between the low-quality image and the high-quality image.
In order to achieve the above object, a second aspect of the embodiments of the present invention provides an end-to-end quality enhancement apparatus based on binocular stereo images, applied to an overall neural network including a first quality enhancement branch network, a second quality enhancement branch network, an information fusion network, and a long short-term memory network, where the first quality enhancement branch network and the second quality enhancement branch network each include a feature extraction network and an information distillation network, the apparatus including:
an extraction module, configured to input a low-quality image and a high-quality image of a binocular stereo pair into the first quality enhancement branch network and the second quality enhancement branch network, respectively, and perform feature extraction processing on the input images through the respective feature extraction networks to obtain shallow feature maps corresponding to the low-quality image and the high-quality image, respectively;
a distillation module, configured to input the shallow feature maps of the low-quality image and the high-quality image into the respective information distillation networks for information distillation processing to obtain high-level feature maps corresponding to the low-quality image and the high-quality image, respectively; wherein the high-level feature maps carry a larger amount of useful information than the shallow feature maps;
a fusion module, configured to input both the high-level feature map of the low-quality image and the high-level feature map of the high-quality image into the information fusion network and perform information fusion processing on the low-quality and high-quality image features to obtain a fused feature map;
and a reconstruction module, configured to input the fused feature map into the long short-term memory network, learn position information under the fields of view corresponding to the low-quality image and the high-quality image, respectively, and reconstruct a quality-enhanced image of the low-quality image according to the obtained position difference information between the low-quality image and the high-quality image.
According to the end-to-end quality enhancement method and apparatus based on binocular stereo images provided by the embodiments of the invention, the low-quality image and the high-quality image of a binocular stereo pair are each input to a feature extraction network for feature extraction; the resulting shallow feature maps of the two images are each input to an information distillation network for information distillation; the resulting high-level feature maps of the two images are input to the information fusion network for information fusion; and the resulting fused feature map is input to the long short-term memory network to learn the visual disparity of the stereo pair, so that a quality-enhanced image of the input low-quality image is reconstructed. In this end-to-end image quality enhancement algorithm, the reconstruction of the low-quality image is guided by the high-quality image, and the visual disparity of the stereo pair is learned by a long short-term memory network built on information fusion, which greatly increases the running speed, avoids the error propagation of multi-stage processing, guarantees the rate and accuracy of image reconstruction, and reduces memory and computing-resource consumption.
Other features and corresponding effects of the present invention are set forth in the following portions of the specification, and it should be understood that at least some of the effects are apparent from the description of the present invention.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description show only some embodiments of the present invention, and that those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a network framework of an overall neural network according to a first embodiment of the present invention;
fig. 2 is a schematic basic flow chart of a quality enhancement method according to a first embodiment of the present invention;
FIG. 3 is a schematic view of a basic flow chart of an information distillation processing method according to a first embodiment of the present invention;
fig. 4 is a schematic diagram of a network architecture of a DBlock according to a first embodiment of the present invention;
fig. 5 is a schematic diagram of a network architecture of an LDBlock according to a first embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a quality enhancing apparatus according to a second embodiment of the present invention;
fig. 7 is a schematic structural diagram of an electronic device according to a third embodiment of the invention.
Detailed Description
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The first embodiment:
In order to solve the technical problems in the related art that, when enhancing stereo image quality, it is difficult to learn an accurate correspondence between the two views of a stereo pair and a large amount of computing resources and memory is consumed, the present embodiment provides an end-to-end quality enhancement method based on binocular stereo images. The method is applied to an overall neural network comprising a first quality enhancement branch network, a second quality enhancement branch network, an information fusion network, and a long short-term memory network, where the first quality enhancement branch network and the second quality enhancement branch network each comprise a feature extraction network and an information distillation network. Fig. 1 shows the network framework of the overall neural network provided by this embodiment, in which A is the first quality enhancement branch network and B is the second quality enhancement branch network. The framework adopts a two-stream design with separate feature extraction and information distillation networks for the high-quality and low-quality images, so that the abundant additional information carried by the binocular stereo pair is used efficiently.
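The two-stream dataflow described above can be summarized in the following minimal PyTorch sketch. This is an illustration only: the class and attribute names (StereoQualityEnhancer, low_branch, high_branch, fusion, reconstructor) are assumptions and are not taken from the patent.

```python
import torch.nn as nn

class StereoQualityEnhancer(nn.Module):
    """Sketch of the overall framework: two branches, fusion, and LSTM-based reconstruction."""
    def __init__(self, low_branch, high_branch, fusion, reconstructor):
        super().__init__()
        self.low_branch = low_branch        # branch A: feature extraction + long information distillation
        self.high_branch = high_branch      # branch B: feature extraction + information distillation
        self.fusion = fusion                # information fusion network
        self.reconstructor = reconstructor  # LSTM stage plus residual reconstruction

    def forward(self, img_low, img_high):
        feat_low = self.low_branch(img_low)      # high-level features of the low-quality view
        feat_high = self.high_branch(img_high)   # high-level features of the high-quality view
        fused = self.fusion(feat_low, feat_high)
        return self.reconstructor(fused)         # quality-enhanced image of the low-quality view
```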
As shown in fig. 2, a basic flow diagram of the quality enhancement method provided in this embodiment is shown, and the quality enhancement method provided in this embodiment includes the following steps:
step 201, respectively inputting a low-quality image and a high-quality image in a binocular stereo image to a first quality enhancement branch network and a second quality enhancement branch network, and respectively performing feature extraction processing on the input images through a feature extraction network to obtain shallow feature maps respectively corresponding to the low-quality image and the high-quality image.
Specifically, in the binocular stereo image of the present embodiment, the image quality of the two views differs: one view corresponds to a low-quality image and the other to a high-quality image. The low-quality image and the high-quality image are each used as the input of the feature extraction network of one of the quality enhancement branch networks for feature extraction. In this embodiment, the feature extraction network may consist of two convolutional layers with 3 × 3 kernels, and its output is a 64-channel feature map.
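A minimal sketch of such a feature extraction stage is given below. The ReLU activations and zero padding are assumptions; the text only specifies two 3 × 3 convolutions and a 64-channel output.

```python
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Two stacked 3x3 convolutions producing a 64-channel shallow feature map."""
    def __init__(self, in_channels=3, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.body(x)  # shallow feature map fed to the information distillation network
```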
Step 202, inputting the shallow characteristic diagrams of the low-quality image and the high-quality image into an information distillation network respectively for information distillation processing to obtain high-level characteristic diagrams corresponding to the low-quality image and the high-quality image respectively; the high-level feature map has a larger amount of useful information than the shallow feature map.
Specifically, in this embodiment, the feature map output by the feature extraction network is further input to the information distillation network, and the extracted feature map has more useful information than the feature map output by the feature extraction network. In practical applications, in order to further ensure that the extracted feature map has as much useful information as possible, the information distillation network in this embodiment may be composed of a plurality of identical information distillation sub-networks in cascade.
As shown in fig. 3, which is a schematic flow chart of the information distillation processing method provided in this embodiment, optionally, the information distillation network includes: an image quality enhancement network and a compression network; the method specifically comprises the following steps of inputting the shallow characteristic diagrams of the low-quality image and the high-quality image into an information distillation network respectively for information distillation processing, and obtaining the high-level characteristic diagrams respectively corresponding to the low-quality image and the high-quality image:
step 301, inputting shallow feature maps of a low-quality image and a high-quality image into an image quality enhancement network respectively to perform information enhancement processing, so as to obtain quality-enhanced feature maps;
and step 302, inputting the quality-enhanced feature map into a compression network for information compression processing to obtain a high-level feature map corresponding to the low-quality image and the high-quality image respectively.
Specifically, in this embodiment, the information distillation network consists of two modules, an image quality enhancement network and a compression network. The image quality enhancement network enhances the information of the image, thereby increasing the useful information in the feature map; information compression is then performed by the compression network, which can use a 1 × 1 convolution kernel to reduce dimensionality and the number of parameters, and which can also greatly increase the nonlinearity of the features while keeping the size of the feature map unchanged.
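A sketch of one such information distillation sub-network is given below. Only the 1 × 1 compression convolution is stated explicitly in the text, so the depth and activations of the enhancement stage are assumptions; as noted above, several identical units may be cascaded to form the full information distillation network.

```python
import torch.nn as nn

class DistillUnit(nn.Module):
    """One information distillation sub-network: enhancement stage + 1x1 compression."""
    def __init__(self, channels=64, enhance_layers=3):
        super().__init__()
        layers = []
        for _ in range(enhance_layers):  # image quality enhancement network (depth assumed)
            layers += [nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True)]
        self.enhance = nn.Sequential(*layers)
        self.compress = nn.Conv2d(channels, channels, 1)  # compression network (1x1 kernel)

    def forward(self, x):
        return self.compress(self.enhance(x))  # high-level feature map with distilled information
```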
Optionally, the first quality enhancement branch network and the second quality enhancement branch network each further include an attention mechanism network. Before the high-level feature maps of the low-quality image and the high-quality image are input into the information fusion network, the method further comprises: inputting the high-level feature maps of the low-quality image and the high-quality image into the attention mechanism network respectively, and selectively learning the features in the high-level feature maps of the two images to obtain high-level feature maps of the low-quality image and the high-quality image from which more useful information has been extracted.
Specifically, in this embodiment, after the high-level feature map is output by the information distillation network, an attention mechanism is further added on top of the information distillation network in order to extract more key and useful information: larger weights are assigned to the useful key features in the extracted feature map so that more detailed information about the target of interest is obtained and the network concentrates on learning more useful features. It should be noted that the attention mechanism network can be configured either inside or outside the information distillation network, and in both cases more representative features can be learned. The information distillation network is formed by cascading one or more information distillation sub-networks, and an attention mechanism network can be placed after each sub-network or on the image quality enhancement unit inside each sub-network. In this embodiment, the placement of the attention mechanism network is not uniquely limited and may be determined according to the specific requirements of the application.
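A common realization of such channel-wise attention is a squeeze-and-excitation style module, sketched below. This specific form is an assumption; the text only states that larger weights are assigned to useful features.

```python
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel-wise attention: re-weight feature channels by learned importance."""
    def __init__(self, channels=64, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)           # per-channel global descriptor
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                             # per-channel weights in (0, 1)
        )

    def forward(self, x):
        return x * self.fc(self.pool(x))              # emphasise useful channels
```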
It should be noted that, in the present embodiment, the process comprising the feature extraction operation, the information distillation operation and the attention mechanism operation can be expressed as D_i = C(D_{i-1}(f(x))), i = 1, …, n, and P_i = P(D_i), where x denotes an input image (the low-quality image or the high-quality image fed to the corresponding feature extraction network), f denotes the feature extraction operation, D_i denotes the function of the i-th information distillation network, C denotes the attention mechanism operation, and P denotes the compression operation.
Optionally, the network depth and the network width of the information distillation network in the first quality enhancement branch network are greater than those of the information distillation network in the second quality enhancement branch network.
Specifically, since a low-quality image requires a deeper network than a high-quality image to learn more features, in the present embodiment a long information distillation module (LDBlock) is used in the first quality enhancement branch network, which processes the low-quality image, to extract useful information, while an information distillation module (DBlock) is used in the second quality enhancement branch network, which processes the high-quality image. The LDBlock is deeper and wider than the DBlock, so the information it can extract is at a higher level of abstraction.
Fig. 4 is a schematic diagram of the network architecture of the DBlock provided in this embodiment, and fig. 5 is a schematic diagram of the network architecture of the LDBlock, where S denotes a split (branch) operation, C denotes concatenation across channels, and Channel-wise denotes an attention mechanism network; the LDBlock and the DBlock in this embodiment both comprise a plurality of sub-networks. Compared with the DBlock, the deeper and wider LDBlock splits the feature map and then applies stacked convolutions to extract more information. In order to reduce the number of network parameters, this embodiment uses grouped convolution (with 4 groups) in the second convolutional layer of each sub-network. In addition, the LDBlock and the DBlock both use 3 × 3 filters, which increases representation capability while reducing the number of parameters and thus improves computational efficiency. Furthermore, considering the interrelation of features across channels, this embodiment adaptively learns the feature map using an attention mechanism embedded in the information distillation module.
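One sub-network of this kind might look like the sketch below. The grouped 3 × 3 convolution with 4 groups, the channel split/concatenation, and the embedded channel-wise attention follow the description; the split ratio, layer ordering, and attention bottleneck size are assumptions.

```python
import torch
import torch.nn as nn

class DBlockUnit(nn.Module):
    """Sketch of one distillation sub-network: conv, grouped conv, split/concat, channel attention."""
    def __init__(self, channels=64, groups=4):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, groups=groups)  # grouped convolution
        self.conv3 = nn.Conv2d(channels // 2, channels // 2, 3, padding=1)
        self.attn = nn.Sequential(                      # embedded channel-wise attention
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        y = torch.relu(self.conv1(x))
        y = torch.relu(self.conv2(y))
        kept, refined = torch.split(y, y.size(1) // 2, dim=1)  # S: split the channels
        refined = torch.relu(self.conv3(refined))
        y = torch.cat([kept, refined], dim=1)                  # C: concatenate across channels
        return y * self.attn(y)                                # re-weight channels
```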
And 203, inputting the high-level feature maps of the low-quality image and the high-level feature map of the high-quality image into the information fusion network, and performing information fusion processing on the low-quality image features and the high-quality image features to obtain a fusion feature map.
Specifically, in this embodiment, after feature extraction is performed on the low-quality image and the high-quality image through the two-stream network formed by the first quality enhancement branch network and the second quality enhancement branch network, the extracted features are fused through the information fusion network. The information fusion process in the present embodiment can be expressed as F_0 = F(I_low + I_high), where I_low denotes the output of the information distillation network of the first quality enhancement branch network, I_high denotes the output of the information distillation network of the second quality enhancement branch network, F denotes the information fusion operation on the high-quality and low-quality image features, and F_0 denotes the output of the information fusion network. Furthermore, it should be understood that if the attention mechanism operation is performed after the information distillation operation, the high-level feature maps of the low-quality image and the high-quality image output by the attention mechanism network, from which more useful information has been extracted, should instead be input to the information fusion network.
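Read literally, F_0 = F(I_low + I_high) suggests element-wise addition of the two feature maps followed by a learned fusion operator. The convolutional form of F sketched below is an assumption (channel concatenation would be an equally plausible reading):

```python
import torch.nn as nn

class FusionNet(nn.Module):
    """Sketch of the information fusion network: F0 = F(I_low + I_high)."""
    def __init__(self, channels=64):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, feat_low, feat_high):
        return self.fuse(feat_low + feat_high)  # fused feature map F0
```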
And step 204, inputting the fusion feature map into a long-term and short-term memory network, learning position information under the visual fields respectively corresponding to the low-quality image and the high-quality image, and reconstructing to obtain a quality enhanced image of the low-quality image according to the acquired position difference information of the low-quality image and the high-quality image.
Specifically, in this embodiment, exploiting the similarity of the features, a long short-term memory network (LSTM) is used to learn the correspondence between positions in the two fields of view and to handle the disparity change between the two viewpoints of the stereo pair, so that the left and right images of the binocular stereo pair remain consistent in position. In order to improve feature utilization, this embodiment also combines the feature map from before the LSTM, and the high-quality image guides the quality enhancement reconstruction of the low-quality image, so that a high-quality result can be reconstructed more effectively. In the present embodiment, the image reconstruction operation can be expressed as y = L(F_0) + F_0, where y is the final quality-enhanced image of the low-quality image, L denotes the LSTM function, and F_0 denotes the output of the information fusion network.
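A heavily hedged sketch of this stage is given below. Scanning each row of the fused feature map with a bidirectional LSTM (disparity in a rectified stereo pair lies along the width axis) and the final 3 × 3 projection to an image are assumptions; the text only fixes the residual form y = L(F_0) + F_0 and the use of an LSTM to model the positional difference between the views.

```python
import torch.nn as nn

class DisparityLSTM(nn.Module):
    """Scan the fused features along the width axis and reconstruct residually."""
    def __init__(self, channels=64, out_channels=3):
        super().__init__()
        self.lstm = nn.LSTM(channels, channels, batch_first=True, bidirectional=True)
        self.proj = nn.Conv2d(2 * channels, channels, 1)
        self.to_image = nn.Conv2d(channels, out_channels, 3, padding=1)

    def forward(self, f0):
        b, c, h, w = f0.shape
        seq = f0.permute(0, 2, 3, 1).reshape(b * h, w, c)   # one sequence per image row
        out, _ = self.lstm(seq)                              # learn positional correspondence
        out = out.reshape(b, h, w, 2 * c).permute(0, 3, 1, 2)
        features = self.proj(out) + f0                       # residual combination L(F0) + F0
        return self.to_image(features)                       # quality-enhanced low-quality view
```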
Furthermore, in an alternative implementation of this embodiment, the loss function of the overall neural network is expressed as Loss = n1(L) + n2(L_lstm); the explicit forms of the main term (denoted L here) and of L_lstm are given as formula images in the original publication and are not reproduced. In these expressions, Loss is the loss function of the overall neural network, L_lstm is the loss function of the long short-term memory network, n1 and n2 are the weight hyperparameters of the corresponding terms, α is the parameter set of the overall neural network, F is the predicted image generated by the overall neural network, I_low is the low-quality image, I_high is the high-quality image, I_GT is the high-quality image of the low-quality view before compression, and I_lstm is the quality-enhanced image of the low-quality image.
Specifically, the network of the present embodiment is optimized by a loss function, and the present embodiment designs two loss functions, including a total loss function of the whole network and a loss function of the LSTM network, wherein the total loss function is used to measure a difference between a predicted image of a low-quality image and a corresponding high-quality image before compression of the low-quality image, and the loss function of the LSTM network is used to optimize a difference between positions of the low-quality image and the high-quality image.
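As a hedged illustration only, the two-term objective could be computed as below. The per-term formulas are not recoverable from this text, so L1 distances against I_GT, the default weights, and which reference I_lstm is compared against are all assumptions.

```python
import torch.nn.functional as F

def total_loss(pred, lstm_out, target_gt, n1=1.0, n2=0.5):
    # pred: final predicted image of the whole network (F in the text)
    # lstm_out: quality-enhanced image produced by the LSTM stage (I_lstm)
    # target_gt: high-quality image of the low-quality view before compression (I_GT)
    # n1, n2: weight hyperparameters of the two loss terms (values assumed)
    loss_main = F.l1_loss(pred, target_gt)      # overall reconstruction term (form assumed)
    loss_lstm = F.l1_loss(lstm_out, target_gt)  # LSTM term; reference image assumed to be I_GT
    return n1 * loss_main + n2 * loss_lstm
```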
According to the end-to-end quality enhancement method based on binocular stereo images provided by this embodiment of the invention, the low-quality image and the high-quality image of a binocular stereo pair are each input to a feature extraction network for feature extraction; the resulting shallow feature maps of the two images are each input to an information distillation network for information distillation; the resulting high-level feature maps of the two images are input to the information fusion network for information fusion; and the resulting fused feature map is input to the long short-term memory network to learn the visual disparity of the stereo pair, so that a quality-enhanced image of the input low-quality image is reconstructed. In this end-to-end image quality enhancement algorithm, the reconstruction of the low-quality image is guided by the high-quality image, and the visual disparity of the stereo pair is learned by a long short-term memory network built on information fusion, which greatly increases the running speed, avoids the error propagation of multi-stage processing, guarantees the rate and accuracy of image reconstruction, and reduces memory and computing-resource consumption.
Second embodiment:
In order to solve the technical problems in the related art that, when enhancing stereo image quality, it is difficult to learn an accurate correspondence between the two views of a stereo pair and a large amount of computing resources and memory is consumed, this embodiment provides an end-to-end quality enhancement device based on binocular stereo images. The device is applied to an overall neural network comprising a first quality enhancement branch network, a second quality enhancement branch network, an information fusion network, and a long short-term memory network, where the first quality enhancement branch network and the second quality enhancement branch network each comprise a feature extraction network and an information distillation network. Referring specifically to fig. 6, the quality enhancement device of this embodiment includes:
the extracting module 601 is configured to input a low-quality image and a high-quality image in a binocular stereo image to the first quality enhancement branch network and the second quality enhancement branch network, and perform feature extraction processing on the input images through the feature extraction networks, so as to obtain shallow feature maps corresponding to the low-quality image and the high-quality image;
a distillation module 602, configured to input the shallow feature maps of the low-quality image and the high-quality image into the respective information distillation networks for information distillation processing to obtain high-level feature maps corresponding to the low-quality image and the high-quality image, respectively, wherein the high-level feature maps carry a larger amount of useful information than the shallow feature maps;
a fusion module 603, configured to input both the high-level feature map of the low-quality image and the high-level feature map of the high-quality image into the information fusion network and perform information fusion processing on the low-quality and high-quality image features to obtain a fused feature map;
and a reconstruction module 604, configured to input the fused feature map into the long short-term memory network, learn position information under the fields of view corresponding to the low-quality image and the high-quality image, respectively, and reconstruct a quality-enhanced image of the low-quality image according to the obtained position difference information between the low-quality image and the high-quality image.
In an alternative embodiment of this embodiment, the information distillation network comprises: image quality enhancement networks and compression networks. Correspondingly, the distillation module 602 is specifically configured to input the shallow feature maps of the low-quality image and the high-quality image to an image quality enhancement network for information enhancement processing, so as to obtain quality-enhanced feature maps; and inputting the feature map with enhanced quality into a compression network for information compression processing to obtain high-level feature maps respectively corresponding to the low-quality image and the high-quality image.
In an optional implementation of this embodiment, the first quality enhancement branch network and the second quality enhancement branch network each further include an attention mechanism network, and the quality enhancement device further comprises a learning module. The learning module is configured to, before the high-level feature maps of the low-quality image and the high-quality image are input into the information fusion network, input these high-level feature maps into the attention mechanism network and selectively learn the features therein, obtaining high-level feature maps of the low-quality image and the high-quality image from which more useful information has been extracted. Correspondingly, the fusion module 603 is specifically configured to input both of these high-level feature maps, from which more useful information has been extracted, into the information fusion network and perform information fusion processing on the low-quality and high-quality image features to obtain a fused feature map.
In an optional implementation manner of this embodiment, the network depth and the network width of the information distillation network in the first quality enhancement branch network are greater than those of the information distillation network in the second quality enhancement branch network.
Further, in an alternative implementation of this embodiment, the loss function of the overall neural network is expressed as Loss = n1(L) + n2(L_lstm); as above, the explicit forms of the main term (denoted L here) and of L_lstm are given as formula images in the original publication. Loss is the loss function of the overall neural network, L_lstm is the loss function of the long short-term memory network, n1 and n2 are the weight hyperparameters of the corresponding terms, α is the parameter set of the overall neural network, F is the predicted image generated by the overall neural network, I_low is the low-quality image, I_high is the high-quality image, I_GT is the high-quality image of the low-quality view before compression, and I_lstm is the quality-enhanced image of the low-quality image.
It should be noted that, the end-to-end quality enhancement method based on the binocular stereo image in the foregoing embodiment can be implemented based on the end-to-end quality enhancement device based on the binocular stereo image provided in this embodiment, and it can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working process of the end-to-end quality enhancement device based on the binocular stereo image described in this embodiment may refer to the corresponding process in the foregoing method embodiment, and is not described herein again.
With the end-to-end quality enhancement device based on binocular stereo images provided by this embodiment, the low-quality image and the high-quality image of a binocular stereo pair are each input to a feature extraction network for feature extraction; the resulting shallow feature maps of the two images are each input to an information distillation network for information distillation; the resulting high-level feature maps of the two images are input to the information fusion network for information fusion; and the resulting fused feature map is input to the long short-term memory network to learn the visual disparity of the stereo pair, so that a quality-enhanced image of the input low-quality image is reconstructed. In this end-to-end image quality enhancement algorithm, the reconstruction of the low-quality image is guided by the high-quality image, and the visual disparity of the stereo pair is learned by a long short-term memory network built on information fusion, which greatly increases the running speed, avoids the error propagation of multi-stage processing, guarantees the rate and accuracy of image reconstruction, and reduces memory and computing-resource consumption.
The third embodiment:
the present embodiment provides an electronic apparatus, as shown in fig. 7, which includes a processor 701, a memory 702, and a communication bus 703, wherein: the communication bus 703 is used for realizing connection communication between the processor 701 and the memory 702; the processor 701 is configured to execute one or more computer programs stored in the memory 702 to implement at least one step of the end-to-end binocular stereo image based quality enhancement method in the first embodiment.
The present embodiments also provide a computer-readable storage medium, including volatile or non-volatile, removable or non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, computer program modules or other data. Computer-readable storage media include, but are not limited to, RAM (Random Access Memory), ROM (Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory or other memory technology, CD-ROM (Compact Disc Read-Only Memory), Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer.
The computer-readable storage medium in this embodiment may be used for storing one or more computer programs, and the stored one or more computer programs may be executed by a processor to implement at least one step of the method in the first embodiment.
The present embodiment also provides a computer program, which can be distributed on a computer readable medium and executed by a computing device to implement at least one step of the method in the first embodiment; and in some cases at least one of the steps shown or described may be performed in an order different than that described in the embodiments above.
The present embodiments also provide a computer program product comprising a computer readable means on which a computer program as shown above is stored. The computer readable means in this embodiment may include a computer readable storage medium as shown above.
It will be apparent to those skilled in the art that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software (which may be implemented in computer program code executable by a computing device), firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit.
In addition, communication media typically embodies computer readable instructions, data structures, computer program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to one of ordinary skill in the art. Thus, the present invention is not limited to any specific combination of hardware and software.
The foregoing is a more detailed description of embodiments of the present invention, and the present invention is not to be considered limited to such descriptions. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims (8)

1. An end-to-end quality enhancement method based on binocular stereo images, applied to an overall neural network comprising a first quality enhancement branch network, a second quality enhancement branch network, an information fusion network and a long short-term memory network, wherein the first quality enhancement branch network and the second quality enhancement branch network each comprise a feature extraction network and an information distillation network, the quality enhancement method being characterized by comprising the following steps:
inputting a low-quality image and a high-quality image of a binocular stereo pair into the first quality enhancement branch network and the second quality enhancement branch network, respectively, and performing feature extraction processing on the input images through the respective feature extraction networks to obtain shallow feature maps corresponding to the low-quality image and the high-quality image, respectively;
inputting the shallow feature maps of the low-quality image and the high-quality image into the respective information distillation networks for information distillation processing to obtain high-level feature maps corresponding to the low-quality image and the high-quality image, respectively, wherein the high-level feature maps carry a larger amount of useful information than the shallow feature maps;
inputting both the high-level feature map of the low-quality image and the high-level feature map of the high-quality image into the information fusion network, and performing information fusion processing on the low-quality and high-quality image features to obtain a fused feature map;
inputting the fused feature map into the long short-term memory network, learning position information under the fields of view corresponding to the low-quality image and the high-quality image, respectively, and reconstructing a quality-enhanced image of the low-quality image according to the obtained position difference information between the low-quality image and the high-quality image.
2. The quality enhancement method of claim 1, wherein the information distillation network comprises: an image quality enhancement network and a compression network;
the step of inputting the shallow feature maps of the low-quality image and the high-quality image into the respective information distillation networks for information distillation processing to obtain the high-level feature maps corresponding to the low-quality image and the high-quality image, respectively, comprises:
inputting the shallow feature maps of the low-quality image and the high-quality image into the image quality enhancement network for information enhancement processing to obtain quality-enhanced feature maps;
and inputting the quality-enhanced feature maps into the compression network for information compression processing to obtain the high-level feature maps corresponding to the low-quality image and the high-quality image, respectively.
3. The quality enhancement method of claim 1, wherein the first quality enhancement branch network and the second quality enhancement branch network further comprise: an attention mechanism network;
before the high-level feature maps of the low-quality image and the high-level feature maps of the high-quality image are input into the information fusion network, the method further comprises the following steps:
respectively inputting the high-level feature maps of the low-quality image and the high-level feature maps of the high-quality image into the attention mechanism network, selectively learning the features in the high-level feature maps of the low-quality image and the high-quality image, and obtaining the high-level feature maps of the low-quality image and the high-level feature maps of the high-quality image, which extract more useful information;
the inputting the high-level feature maps of the low-quality image and the high-quality image into the information fusion network comprises:
and inputting the high-level feature maps of the low-quality image and the high-quality image, from which more useful information has been extracted, into the information fusion network.
4. The quality enhancement method of claim 1, wherein a network depth and a network width of the information distillation network in the first quality enhancement branch network are greater than those of the information distillation network in the second quality enhancement branch network.
5. An end-to-end quality enhancement device based on binocular stereo images, applied to an overall neural network comprising a first quality enhancement branch network, a second quality enhancement branch network, an information fusion network and a long short-term memory network, wherein the first quality enhancement branch network and the second quality enhancement branch network each comprise a feature extraction network and an information distillation network, the quality enhancement device being characterized by comprising:
an extraction module, configured to input a low-quality image and a high-quality image of a binocular stereo pair into the first quality enhancement branch network and the second quality enhancement branch network, respectively, and perform feature extraction processing on the input images through the respective feature extraction networks to obtain shallow feature maps corresponding to the low-quality image and the high-quality image, respectively;
a distillation module, configured to input the shallow feature maps of the low-quality image and the high-quality image into the respective information distillation networks for information distillation processing to obtain high-level feature maps corresponding to the low-quality image and the high-quality image, respectively, wherein the high-level feature maps carry a larger amount of useful information than the shallow feature maps;
a fusion module, configured to input both the high-level feature map of the low-quality image and the high-level feature map of the high-quality image into the information fusion network and perform information fusion processing on the low-quality and high-quality image features to obtain a fused feature map;
and a reconstruction module, configured to input the fused feature map into the long short-term memory network, learn position information under the fields of view corresponding to the low-quality image and the high-quality image, respectively, and reconstruct a quality-enhanced image of the low-quality image according to the obtained position difference information between the low-quality image and the high-quality image.
6. The quality enhancement device of claim 5, wherein the information distillation network comprises: an image quality enhancement network and a compression network;
the distillation module is specifically configured to input the shallow feature maps of the low-quality image and the high-quality image into the image quality enhancement network for information enhancement processing to obtain quality-enhanced feature maps, and to input the quality-enhanced feature maps into the compression network for information compression processing to obtain the high-level feature maps corresponding to the low-quality image and the high-quality image, respectively.
7. The quality enhancement device of claim 5, wherein the first quality enhancement branch network and the second quality enhancement branch network each further comprise an attention mechanism network, and the quality enhancement device further comprises a learning module;
the learning module is configured to, before the high-level feature maps of the low-quality image and the high-quality image are input into the information fusion network, input the high-level feature maps of the low-quality image and the high-quality image into the attention mechanism network respectively and selectively learn the features therein, obtaining high-level feature maps of the low-quality image and the high-quality image from which more useful information has been extracted;
the fusion module is specifically configured to input the high-level feature maps of the low-quality image and the high-quality image, from which more useful information has been extracted, into the information fusion network and perform information fusion processing on the low-quality and high-quality image features to obtain a fused feature map.
8. The quality enhancement device of claim 5, wherein a network depth and a network width of the information distillation network in the first quality enhancement branch network are greater than those of the information distillation network in the second quality enhancement branch network.
CN201910624300.XA 2019-07-11 2019-07-11 End-to-end quality enhancement method and device based on binocular stereo image Active CN110399881B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910624300.XA CN110399881B (en) 2019-07-11 2019-07-11 End-to-end quality enhancement method and device based on binocular stereo image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910624300.XA CN110399881B (en) 2019-07-11 2019-07-11 End-to-end quality enhancement method and device based on binocular stereo image

Publications (2)

Publication Number Publication Date
CN110399881A CN110399881A (en) 2019-11-01
CN110399881B true CN110399881B (en) 2021-06-01

Family

ID=68325378

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910624300.XA Active CN110399881B (en) 2019-07-11 2019-07-11 End-to-end quality enhancement method and device based on binocular stereo image

Country Status (1)

Country Link
CN (1) CN110399881B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111862034B (en) * 2020-07-15 2023-06-30 平安科技(深圳)有限公司 Image detection method, device, electronic equipment and medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107403415A (en) * 2017-07-21 2017-11-28 深圳大学 Compression depth plot quality Enhancement Method and device based on full convolutional neural networks
CN108769671A (en) * 2018-06-13 2018-11-06 天津大学 Stereo image quality evaluation method based on adaptive blending image
CN109360178A (en) * 2018-10-17 2019-02-19 天津大学 Based on blending image without reference stereo image quality evaluation method

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107392868A (en) * 2017-07-21 2017-11-24 深圳大学 Compression binocular image quality enhancement method and device based on full convolutional neural networks
CN107578404B (en) * 2017-08-22 2019-11-15 浙江大学 View-based access control model notable feature is extracted complete with reference to objective evaluation method for quality of stereo images
CN109523506B (en) * 2018-09-21 2021-03-26 浙江大学 Full-reference stereo image quality objective evaluation method based on visual salient image feature enhancement
CN109714592A (en) * 2019-01-31 2019-05-03 天津大学 Stereo image quality evaluation method based on binocular fusion network

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107403415A (en) * 2017-07-21 2017-11-28 深圳大学 Compression depth plot quality Enhancement Method and device based on full convolutional neural networks
CN108769671A (en) * 2018-06-13 2018-11-06 天津大学 Stereo image quality evaluation method based on adaptive blending image
CN109360178A (en) * 2018-10-17 2019-02-19 天津大学 Based on blending image without reference stereo image quality evaluation method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
No-reference stereo image quality assessment based on deep feature learning; Fu Zhenqi et al.; Journal of Optoelectronics·Laser; 2018-05-31; Vol. 29, No. 5; pp. 545-552 *

Also Published As

Publication number Publication date
CN110399881A (en) 2019-11-01

Similar Documents

Publication Publication Date Title
CN109101975B (en) Image semantic segmentation method based on full convolution neural network
Pang et al. Hierarchical dynamic filtering network for RGB-D salient object detection
CN108416327B (en) Target detection method and device, computer equipment and readable storage medium
Sun et al. Deep pixel‐to‐pixel network for underwater image enhancement and restoration
Wang et al. Multi-scale dilated convolution of convolutional neural network for image denoising
CN111402170B (en) Image enhancement method, device, terminal and computer readable storage medium
CN111340077B (en) Attention mechanism-based disparity map acquisition method and device
CN110852961A (en) Real-time video denoising method and system based on convolutional neural network
CN112819157B (en) Neural network training method and device, intelligent driving control method and device
CN111639230B (en) Similar video screening method, device, equipment and storage medium
Ma et al. Modal complementary fusion network for RGB-T salient object detection
CN111833360A (en) Image processing method, device, equipment and computer readable storage medium
CN115115540A (en) Unsupervised low-light image enhancement method and unsupervised low-light image enhancement device based on illumination information guidance
CN115660955A (en) Super-resolution reconstruction model, method, equipment and storage medium for efficient multi-attention feature fusion
CN114780768A (en) Visual question-answering task processing method and system, electronic equipment and storage medium
CN114627035A (en) Multi-focus image fusion method, system, device and storage medium
CN116452810A (en) Multi-level semantic segmentation method and device, electronic equipment and storage medium
Luo et al. Multi-scale receptive field fusion network for lightweight image super-resolution
CN110399881B (en) End-to-end quality enhancement method and device based on binocular stereo image
CN115082306A (en) Image super-resolution method based on blueprint separable residual error network
CN108986210B (en) Method and device for reconstructing three-dimensional scene
CN113033448B (en) Remote sensing image cloud-removing residual error neural network system, method and equipment based on multi-scale convolution and attention and storage medium
Shen et al. RSHAN: Image super-resolution network based on residual separation hybrid attention module
CN110490876B (en) Image segmentation method based on lightweight neural network
CN115984934A (en) Training method of face pose estimation model, face pose estimation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant