CN113034358A - Super-resolution image processing method and related device - Google Patents

Super-resolution image processing method and related device

Info

Publication number
CN113034358A
Authority
CN
China
Prior art keywords
image
resolution
blocks
super
detail
Prior art date
Legal status
Pending
Application number
CN201911252760.0A
Other languages
Chinese (zh)
Inventor
林焕
陈濛
周琛晖
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN201911252760.0A
Priority to PCT/CN2020/134444 (published as WO2021115242A1)
Publication of CN113034358A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a super-resolution image processing method and apparatus, and a device running the method. In the process of generating a super-resolution image from a low-resolution image, the low-resolution image is first divided into image blocks of smaller size, from which detail-rich image blocks are identified. Similar image blocks are then determined from the detail-rich image blocks, so that when the device performs super-resolution processing on the low-resolution image, the similar image blocks can be introduced into the processing together. Because the detail-rich image blocks are small, the computational load on the device is reduced; and because the similar image blocks serve as reference images for the low-resolution image, the sharpness of the super-resolved image is improved.

Description

Super-resolution image processing method and related device
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a super-resolution image processing method and related apparatus.
Background
When a terminal device plays a video, the quality of the finally displayed image is often poor for various reasons, such as the video file itself or limited network bandwidth. To solve such problems, super-resolution (SR) technology has been developed. Super-resolution technology uses one or more low-resolution (LR) images to obtain a clear high-resolution (HR) image; this processing is referred to as super-resolution processing.
In the prior art, a video file is usually super-resolved at a server (cloud), and the processing result is sent to a terminal device (mobile terminal); the image displayed by the terminal device is then the image after the server's super-resolution processing. This scheme requires a large amount of data interaction between the server and the terminal device: for example, the terminal device must send the video file to the server, and the server must send the super-resolved file back to the terminal device, so the process occupies a large amount of network bandwidth. Therefore, schemes for performing super-resolution processing on the terminal device have been proposed.
However, current schemes that perform super-resolution processing on the terminal device occupy a large amount of computing resources in order to guarantee the sharpness of the processed image. A super-resolution image processing method that occupies fewer computing resources while still ensuring the sharpness of the super-resolved image is therefore needed.
Disclosure of Invention
The embodiments of the application provide a super-resolution image processing method and a related apparatus, which can reduce the computational load of a device running the method and improve the sharpness of the image after super-resolution processing.
In a first aspect, an embodiment of the present application provides a super-resolution image processing method, which may include: generating detail-rich image blocks and detail-poor image blocks from a low-resolution image, where the detail-rich image blocks are smaller in size than the low-resolution image. The source of the low-resolution image may be a media file, and specifically may be any encoded frame file in a video file, such as a key frame or another frame (a P-frame or B-frame, etc.). The amount of image feature information included in the detail-rich image blocks is greater than the amount of image feature information included in the detail-poor image blocks; for example, the color information included in a detail-rich image block is greater than that included in a detail-poor image block. Suppose a detail-rich image block includes the three colors red (R), green (G) and blue (B), where the channel corresponding to red is (225, 0, 0), the channel corresponding to green is (0, 225, 0), and the channel corresponding to blue is (0, 0, 225), while a detail-poor image block includes only one color, say (R, G, B) = (225, 0, 0); it is then apparent that the amount of image feature information (color information) included in the detail-rich image block is greater than that included in the detail-poor image block. The method further includes determining similar image blocks according to the detail-rich image blocks, where a similar image block is an image whose image features have a high degree of similarity to those of a detail-rich image block, and the similarity between the image feature information included in the similar image block and the image feature information included in the detail-rich image block is greater than a first threshold. For example, taking the color information of the image as the image feature information: if a detail-rich image block includes (R, G, B) = (225, 0, 0), (0, 225, 0), (0, 0, 225), and the color information of some image block is (R, G, B) = (223, 0, 0), (0, 223, 0), (0, 0, 223), and the calculated similarity between the two sets of color information is greater than the first threshold, then that image block is determined to be a similar image block. Finally, the method includes performing super-resolution processing on the similar image blocks and the low-resolution image to generate a super-resolution image, where the similar image blocks are used as reference images for the low-resolution image. In an optional implementation manner, the similar image blocks and the low-resolution image are processed by a first super-resolution network model to generate the super-resolution image.
In the embodiment of the application, in the process of generating the super-resolution image from the low-resolution image, detail-rich image blocks and detail-poor image blocks are first generated from the low-resolution image: the low-resolution image is split into detail-rich image blocks of smaller size, and the detail-rich image blocks serve as reference images for the low-resolution image, which reduces the computational load of the device running the super-resolution image processing method. Second, similar image blocks are determined from the detail-rich image blocks, so that when the device performs super-resolution processing on the low-resolution image through the first super-resolution network model, the similar image blocks can be introduced into the processing together. Because a detail-rich image block includes a large amount of image feature information, and the similarity between the image feature information of a similar image block and that of a detail-rich image block is high (i.e., greater than the first threshold), the similar image block can also be considered to include a large amount of image feature information; using it as a reference image for the low-resolution image can therefore effectively improve the sharpness of the super-resolution image.
With reference to the first aspect, in some implementations, generating the detail-rich image blocks from the low-resolution image may include: generating an image block set from the low-resolution image, where the image block set includes at least one low-resolution image block. Specifically, after the low-resolution image is acquired, it is first divided into low-resolution image blocks of smaller size, and these low-resolution image blocks form the image block set. For example, the low-resolution image may be divided into blocks 32 pixels high and 32 pixels wide; the specific size of the low-resolution image blocks is determined by actual requirements (i.e., the requirements of the subsequent neural network model) and is not limited here. The low-resolution image blocks are then processed by a first network model to determine the detail-rich image blocks and the detail-poor image blocks, where the first network model may be a classification network, specifically comprising a plurality of convolutional layers and at least one softmax layer. Splitting the low-resolution image into smaller blocks reduces the computational load of determining the detail-rich image blocks. The first network model may be obtained locally through machine-learning training, or may be trained on a remote device, for example a cloud server, and then sent to the local device.
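As a toy illustration (not code from the patent; numpy and the 32 × 32 block size here are assumptions, and edge pixels that do not fill a whole block are simply dropped), the image block set can be formed as follows:

```python
# Assumed sketch: tile a low-resolution image into 32x32 low-resolution image
# blocks, which together form the image block set.
import numpy as np

def build_image_block_set(lr_image, block=32):
    """Return the list of non-overlapping block x block tiles of lr_image."""
    h, w = lr_image.shape[:2]
    return [lr_image[y:y + block, x:x + block]
            for y in range(0, h - block + 1, block)
            for x in range(0, w - block + 1, block)]

lr = (np.random.rand(256, 256, 3) * 255).astype(np.uint8)  # stand-in low-resolution image
image_block_set = build_image_block_set(lr)
print(len(image_block_set))  # 64 blocks, each 32 pixels high and 32 pixels wide
```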
With reference to the first aspect, in some implementations, generating the detail-rich image blocks and the detail-poor image blocks from the low-resolution image may include: performing convolution processing on the low-resolution image blocks through the first network model to generate a first convolution data set. Since the first network model needs to classify the low-resolution image blocks, the image feature data set produced by the initial convolution of the low-resolution image blocks is only a preliminary output; further convolution processing is performed on this image feature data set through the first network model, and the result of that convolution processing serves as the source data for the subsequent classification. This result is called the first convolution data set. After the first convolution data set is generated, binary classification processing is performed on it through the first network model to determine the detail-rich image blocks and the detail-poor image blocks: the first convolution data set is input to the softmax layer, which determines which image blocks in the image block set are detail-rich and which are detail-poor. The classification criterion may be as follows: the first convolution data set includes feature maps of a plurality of image blocks, and a feature map indicates the image feature information of an image block. When the feature map of an image block shows that the block has no contour (e.g., a blue-sky background), the softmax layer outputs "0" for that block to indicate that it is a detail-poor image block. This provides a method of determining detail-rich and detail-poor image blocks in which image feature information can be extracted for different types of image blocks, and the blocks are classified based on the image feature information (feature maps) of each block, improving the implementation flexibility of the scheme.
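The patent only specifies that the first network model contains several convolutional layers and at least one softmax layer; the following PyTorch sketch is one hedged guess at such a model (all layer sizes are assumptions), classifying each 32 × 32 block as detail-poor ("0") or detail-rich ("1"):

```python
# Assumed PyTorch sketch of the first network model: convolutional layers that
# extract feature maps, followed by a softmax layer for binary classification.
import torch
import torch.nn as nn

class FirstNetworkModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(        # convolutional layers (feature maps)
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, 2)    # two classes: detail-poor / detail-rich

    def forward(self, x):
        feats = self.features(x).flatten(1)
        return torch.softmax(self.classifier(feats), dim=1)

model = FirstNetworkModel()
blocks = torch.rand(8, 3, 32, 32)             # eight 32x32 low-resolution blocks
probs = model(blocks)                         # (8, 2) class probabilities
is_detail_rich = probs.argmax(dim=1) == 1     # "0" marks a detail-poor block
```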
With reference to the first aspect, in some implementations, performing convolution processing on the low-resolution image blocks to generate the first convolution data set may include: dividing the low-resolution image to generate the image block set (which comprises the low-resolution image blocks), and performing convolution processing on the low-resolution image blocks in the image block set through the first network model. The first network model outputs the result of its convolutional layers, called the image feature data set corresponding to the low-resolution image blocks, which includes the feature maps of the low-resolution image blocks. Optionally, the first network model includes a plurality of convolutional layers (more than 2), which can extract feature maps of an image, for example edge information, contour information, brightness information and/or color information of the image; convolution processing is then performed on this image feature data set through the first network model to generate the first convolution data set. In this way, in the process of convolving the low-resolution image blocks, the convolutional layers of the first network model extract feature maps of the images, so that the feature maps of the low-resolution image blocks can be output for use in subsequent super-resolution image processing, improving the sharpness of the super-resolution image.
With reference to the first aspect, in some implementations, determining the similar image blocks from the detail-rich image blocks may include: determining, according to the detail-rich image blocks, the image feature data set corresponding to the detail-rich image blocks within the image feature data set corresponding to the low-resolution image blocks. After determining which of the low-resolution image blocks are detail-rich, the terminal device determines which feature maps in the image feature data set output by the first network model correspond to the detail-rich image blocks; these feature maps are collectively called the image feature data set corresponding to the detail-rich image blocks. Binarization processing is then performed on this image feature data set, and the similarity of any two of the detail-rich image blocks is calculated. Binarization refers to setting the feature value of each pixel of an image to 0 or 1, so that the whole image presents a distinct black-and-white effect; it can typically be performed with OpenCV or MATLAB. After the binarized data of the feature maps are obtained, the similarity of any two detail-rich image blocks is calculated using an XOR matching algorithm. Finally, when the similarity of two image blocks is greater than the first threshold, the similar image blocks are determined accordingly. By binarizing the feature maps and computing similarity with an XOR matching algorithm, a relatively accurate similarity can be obtained while occupying few computing resources, improving the matching precision of the similar image blocks.
With reference to the first aspect, in some implementations, the similarity of any two image blocks satisfies:
$$F = 1 - \frac{1}{N^{2}} \sum_{i=1}^{N} \sum_{j=1}^{N} \left( P(i,j) \oplus Q(i,j) \right)$$
wherein F is the similarity, N is the image size (the height and width, in pixels) of the detail-rich image blocks, P(i, j) and Q(i, j) are the binarized feature maps of any two detail-rich image blocks, ⊕ denotes the bitwise XOR, i is the abscissa of a feature-map pixel, and j is its ordinate. This provides a specific implementation for determining the similar image blocks and improves the implementation flexibility of the scheme.
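A minimal sketch of this calculation, assuming N × N single-channel feature maps, binarization with OpenCV (the threshold value 127 is an assumption), and the reconstructed form of F above:

```python
# Assumed sketch: binarize two feature maps with OpenCV, then compute the
# XOR-matching similarity F = 1 - (1/N^2) * sum of P(i,j) XOR Q(i,j).
import cv2
import numpy as np

def xor_similarity(p_map, q_map):
    _, p = cv2.threshold(p_map, 127, 1, cv2.THRESH_BINARY)  # pixel values become 0 or 1
    _, q = cv2.threshold(q_map, 127, 1, cv2.THRESH_BINARY)
    n = p.shape[0]
    return 1.0 - float(np.bitwise_xor(p, q).sum()) / (n * n)

p_map = (np.random.rand(32, 32) * 255).astype(np.uint8)  # stand-in feature maps
q_map = (np.random.rand(32, 32) * 255).astype(np.uint8)
f = xor_similarity(p_map, q_map)  # compared against the first threshold
```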
With reference to the first aspect, in some implementations, determining the similar image blocks from the detail-rich image blocks may include: when the low-resolution image is one frame (a first image frame) of a video, determining the position of the low-resolution image in the video, i.e., the position information of the first image frame. For example, suppose the low-resolution image comes from a video file. First, the low-resolution image corresponding to a detail-rich image block is determined. Second, the position of that low-resolution image in the video is determined: since the low-resolution image is obtained by decoding a certain image frame (the first image frame) of the video file, the position information of the first image frame in the video file is determined from the low-resolution image, for example, the 10th frame. Because a video file is coherent and the pictures of adjacent frames are highly similar, once the low-resolution image corresponding to a detail-rich image block is located in the video file, one or more image frames within a certain range before and after the first image frame are determined as second image frames. In an optional implementation, any image block is selected from the image decoded from a second image frame, compared with a detail-rich image block using the similarity calculation above, and the image blocks of the second image frame that qualify as similar image blocks are thereby determined. In another optional implementation, the image block of the second image frame at the position corresponding to the detail-rich image block is determined to be the similar image block: according to the coordinates of the detail-rich image block in the low-resolution image, the block at the same coordinates of the second image frame is taken as the similar image block. In this way, similar image blocks can be obtained from adjacent frames of the same video file, reducing the occupancy of computing resources while ensuring the sharpness of the super-resolution image.
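A hedged sketch of the second optional implementation (the coordinates and frame contents below are placeholders, not values from the patent): the similar image block is simply the co-located block in an adjacent frame.

```python
# Assumed sketch: take the block at the same coordinates in an adjacent (second)
# image frame, exploiting the coherence of adjacent frames in a video file.
import numpy as np

def colocated_block(second_frame, x, y, size=32):
    """Block of second_frame at the coordinates the detail-rich block
    occupies in the first (low-resolution) image frame."""
    return second_frame[y:y + size, x:x + size]

# e.g. the detail-rich block sits at (x=64, y=96) in the 10th frame; frames 9
# and 11 would be candidate second image frames.
second_frame = (np.random.rand(128, 256, 3) * 255).astype(np.uint8)
similar_block = colocated_block(second_frame, x=64, y=96)
```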
With reference to the first aspect, in some implementations, performing super-resolution processing on the similar image blocks and the low-resolution image to generate the super-resolution image may include: performing image exchange processing on the detail-rich image blocks and the similar image blocks to generate a first exchanged image, which carries the characteristics of both. Specifically, the image exchange processing may be performed in one or more of the following ways: "concat mode", "concat + add mode", or "image swap". The feature maps of the similar image blocks are then determined, from the similar image blocks, within the image feature data set corresponding to the detail-rich image blocks, and image exchange processing is performed on the feature maps of the similar image blocks and the feature maps of the low-resolution image blocks to generate a second exchanged image, which carries the characteristics of both sets of feature maps. Super-resolution processing is then performed on the first exchanged image, the second exchanged image and the low-resolution image to generate a first image, with the similar image blocks and their feature maps serving as reference maps for the super-resolution processing. If no second round of super-resolution processing is performed, the generated first image is the super-resolution image. Because the first and second exchanged images carry rich image feature information, this processing further improves the sharpness of the super-resolution image.
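The patent does not detail the exchange operators; the sketch below is one plausible reading of the "concat + add mode" (channel concatenation plus element-wise addition), with all shapes assumed:

```python
# Assumed sketch of "concat + add" image exchange: the output carries the
# characteristics of both the detail-rich block and its similar block.
import torch

def image_exchange(block_a, block_b):
    """block_a, block_b: (C, H, W) tensors with matching shapes."""
    concat = torch.cat([block_a, block_b], dim=0)  # "concat mode": (2C, H, W)
    added = block_a + block_b                      # "add" component: (C, H, W)
    return torch.cat([concat, added], dim=0)       # fused features: (3C, H, W)

rich = torch.rand(3, 32, 32)
similar = torch.rand(3, 32, 32)
first_exchanged_image = image_exchange(rich, similar)
```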
With reference to the first aspect, in some implementations, generating the super-resolution image from the first image may include: if a second round of super-resolution processing is performed, acquiring a high-definition image whose resolution is greater than that of the low-resolution image, for example a 1280 × 960 high-definition image against a 256 × 128 low-resolution image. The high-definition image has two possible sources: (i) a preset high-definition gallery, in which case the high-definition image is determined in the gallery through the first network model according to the first image, with the similarity between the high-definition image and the first image greater than the first threshold; (ii) a remote device, such as a cloud computing device (cloud computing device system); in an optional implementation, the cloud computing device generates the high-definition image from the uploaded low-resolution image using a third super-resolution network model deployed in the cloud computing device system. To further improve the sharpness of the super-resolution image, the detail-poor image blocks obtained in the preceding steps are also used: the detail-poor image blocks are enlarged to obtain an enlarged image, where the enlargement processing includes bicubic or linear interpolation, and the enlarged image serves as a reference image once acquired by the terminal device. After similar image blocks of the enlarged image and the low-resolution image are determined, in an optional implementation, super-resolution processing is performed on the high-definition image, the enlarged image and the first image through a second super-resolution network model to generate the super-resolution image; specifically, super-resolution processing is performed on the image blocks of the high-definition image similar to the first image, the image blocks of the enlarged image similar to the first image, and the first image. Similar image blocks are found through the XOR matching algorithm, and information complementation can be achieved through them, improving the sharpness of the super-resolution image; moreover, the diverse sources of the similar image blocks improve that sharpness further.
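For the enlargement step, a bicubic upscale of a detail-poor block might look as follows (the ×4 scale factor is an assumption, not a value from the patent):

```python
# Assumed sketch: enlarge a detail-poor image block with bicubic interpolation
# so it can serve as an additional reference image.
import cv2
import numpy as np

detail_poor = (np.random.rand(32, 32, 3) * 255).astype(np.uint8)
enlarged = cv2.resize(detail_poor, None, fx=4, fy=4, interpolation=cv2.INTER_CUBIC)
print(enlarged.shape)  # (128, 128, 3)
```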
With reference to the first aspect, in some implementations, before generating the detail-rich image blocks from the low-resolution image, the method may further include: the super-resolution image processing apparatus acquires various images, which constitute an image set. The image set includes pictures with different texture features, such as animals, sky, human faces or buildings, collected from the Internet and from published data sets, with different types of pictures mixed in equal proportion; the sources may be data sets such as "DIV 2K" or "Timofte 91 images", or images obtained through a search engine. A low-pass filter is used to filter the image set, i.e., smooth image files with little detail are deleted, generating a first sub-training set, where the amount of image feature information included in the images of the first sub-training set is greater than that included in the detail-poor image blocks. Data augmentation processing is then performed on the first sub-training set to generate a second sub-training set; specifically, the data augmentation processing includes image inversion, image rotation, image reduction and image stretching, and may further include cropping, translation, affine and perspective transformation, Gaussian noise, non-uniform lighting, motion blur, random color filling and the like. Finally, a first training set is generated from the first sub-training set and the second sub-training set and is used to train the first network model, which can effectively improve the precision of the first network model.
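A toy sketch of this construction (the Gaussian low-pass filter, its kernel size and the energy threshold are all assumptions; only three of the named augmentations are shown):

```python
# Assumed sketch: drop smooth images (little content removed by a low-pass
# filter), then augment the survivors to build the first training set.
import cv2
import numpy as np

def high_frequency_energy(img):
    """Mean energy removed by a Gaussian low-pass filter; smooth images score low."""
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY).astype(np.float32)
    low_pass = cv2.GaussianBlur(gray, (9, 9), 0)
    return float(np.mean(np.abs(gray - low_pass)))

def augment(img):
    h, w = img.shape[:2]
    return [cv2.flip(img, 1),                          # image inversion (mirror)
            cv2.rotate(img, cv2.ROTATE_90_CLOCKWISE),  # image rotation
            cv2.resize(img, (w // 2, h // 2))]         # image reduction

images = [(np.random.rand(64, 64, 3) * 255).astype(np.uint8) for _ in range(10)]
first_sub_training_set = [im for im in images if high_frequency_energy(im) > 5.0]
second_sub_training_set = [a for im in first_sub_training_set for a in augment(im)]
first_training_set = first_sub_training_set + second_sub_training_set
```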
In a second aspect, embodiments of the present application provide a super-resolution image processing apparatus that can be deployed in a variety of devices, such as a cloud computing device, an edge computing device system, or a terminal device. The super-resolution image processing apparatus includes a generation module and a determination module:
the generating module is configured to generate detail-rich image blocks and detail-poor image blocks according to the low-resolution image, where the detail-rich image blocks and the detail-poor image blocks are smaller in size than the low-resolution image, and the amount of image feature information included in the detail-rich image blocks is greater than the amount of image feature information included in the detail-poor image blocks;
the determining module is configured to determine similar image blocks according to the detail-rich image blocks, where the similarity between the image feature information included in the similar image blocks and the image feature information included in the detail-rich image blocks is greater than a first threshold;
the generating module is further configured to perform super-resolution processing on the similar image blocks and the low-resolution image to generate a super-resolution image, where the similar image blocks are used as reference images for the low-resolution image.
With reference to the second aspect, in some implementations, the generating module is specifically configured to generate an image block set according to the low-resolution image;
the determining module is specifically configured to perform convolution processing on image blocks in the image block set to generate a first convolution data set;
the determining module is specifically configured to perform binary classification processing on the first convolution data set to determine the detail-rich image blocks and the detail-poor image blocks.
With reference to the second aspect, in some implementations, the generating module is specifically configured to determine, according to the detail-rich image block, an image feature dataset corresponding to the detail-rich image block;
the generating module is specifically configured to perform binarization processing on the determined image feature data set to obtain similarity of any two image blocks in the detail-rich image block;
the determining module is specifically configured to determine the similar image block when the similarity of any two image blocks is greater than the first threshold.
With reference to the second aspect, in some implementations, the similarity of any two image blocks satisfies:
$$F = 1 - \frac{1}{N^{2}} \sum_{i=1}^{N} \sum_{j=1}^{N} \left( P(i,j) \oplus Q(i,j) \right)$$
wherein F is the similarity, N is the image size (the height and width, in pixels) of the detail-rich image blocks, P(i, j) and Q(i, j) are the binarized feature maps of any two detail-rich image blocks, ⊕ denotes the bitwise XOR, i is the abscissa of a feature-map pixel, and j is its ordinate.
With reference to the second aspect, in some implementations, the determining module is specifically configured to determine a position of the low-resolution image in the video when the low-resolution image is a frame in the video;
the determining module is specifically configured to determine a second image frame according to the position, where the second image frame is an adjacent frame of the low-resolution image in the video;
the determining module is specifically configured to determine that an image block in the second image frame at a position corresponding to the detail-rich image block is the similar image block.
With reference to the second aspect, in some implementations, the generating module is specifically configured to perform image exchange processing on the detail-rich image blocks and the similar image blocks to generate a first exchanged image;
the generating module is specifically configured to determine a feature map of the similar image block according to the similar image block, perform image exchange processing on the feature map of the similar image block and the feature map of the low-resolution image block, and generate a second exchanged image, where the feature map is used to indicate image feature information of the image block;
the generation module is specifically configured to perform super-resolution processing on the first exchanged image, the second exchanged image, and the low-resolution image to generate a first image;
the generation module is specifically configured to generate the super-resolution image according to the first image.
With reference to the second aspect, in some implementations, the super-resolution image processing apparatus further includes an acquisition module;
the acquisition module is used for acquiring a high-definition image, and the resolution of the high-definition image is greater than that of the low-resolution image;
the generation module is specifically configured to perform enlargement processing on the detail-poor image blocks to generate an enlarged image, where the enlargement processing includes bicubic interpolation;
the generation module is specifically configured to perform super-resolution processing on the high-definition image, the enlarged image and the first image to generate the super-resolution image, where the high-definition image and the enlarged image serve as reference images of the first image.
With reference to the second aspect, in some implementations, the high-definition image comes from a remote device and is an image generated by the remote device through super-resolution processing of the low-resolution image.
With reference to the second aspect, in some implementations, the high-definition images are from a preset high-definition gallery including at least one of the high-definition images.
With reference to the second aspect, in some implementations, the determining module is further configured to determine the high-definition image in the preset high-definition gallery according to the first image, where the similarity between the high-definition image and the first image is greater than the first threshold.
With reference to the second aspect, in some implementations, the obtaining module is further configured to obtain an image set;
the generating module is further configured to perform filtering processing on the image set by using a low-pass filter to generate a first sub-training set, where the amount of image feature information included in the images of the first sub-training set is greater than the amount of image feature information included in the detail-poor image blocks;
the generation module is further used for performing data augmentation processing on the first sub-training set to generate a second sub-training set, and the data augmentation processing comprises image inversion, image rotation, image reduction and image stretching;
the generation module is further configured to generate a first training set according to the first sub-training set and the second sub-training set, where the first training set is used to train the first network model, and the first network model is used to generate the detail-rich image blocks and the detail-poor image blocks.
With reference to the second aspect, in some implementations, the image feature information includes edge information of the image, contour information of the image, brightness information of the image, and/or color information of the image.
In a third aspect, embodiments of the present application provide a super-resolution image processing apparatus, which includes at least one processor and a memory, where the memory stores computer instructions executable on the processor, and when the computer instructions are executed by the processor, the processor performs the method according to the first aspect or any one of the possible implementation manners of the first aspect.
In a fourth aspect, embodiments of the present application provide a terminal device, where the terminal device includes at least one processor, a memory, a communication port, a display, and computer-executable instructions stored in the memory and executable on the processor, and when the computer-executable instructions are executed by the processor, the processor performs the method according to the first aspect or any one of the possible implementation manners of the first aspect.
In a fifth aspect, embodiments of the present application provide a computer-readable storage medium storing one or more computer-executable instructions, which, when executed by a processor, perform the method according to the first aspect or any one of the possible implementation manners of the first aspect.
In a sixth aspect, embodiments of the present application provide a computer program product (or computer program) storing one or more computer-executable instructions, where, when the computer-executable instructions are executed by a processor, the processor executes the method of the first aspect or any one of the possible implementation manners of the first aspect.
In a seventh aspect, the present application provides a chip system, which includes a processor configured to enable a terminal device to implement the functions recited in the foregoing aspects. In one possible design, the chip system further includes a memory for storing program instructions and data necessary for the terminal device. The chip system may be formed by a chip alone, or may include a chip and other discrete devices.
For technical effects brought by any one of the second to seventh aspects or any one of the possible implementation manners, reference may be made to technical effects brought by the first aspect or different possible implementation manners of the first aspect, and details are not described here again.
According to the technical scheme, the embodiment of the application has the following advantages:
the embodiment of the application provides a super-resolution image processing method and a related device, and in the process of generating a super-resolution image according to a low-resolution image, a device operating the super-resolution image processing method firstly generates a detail-rich image block and a detail-poor image block according to the low-resolution image, and splits the low-resolution image into the detail-rich image block with a smaller size, wherein the detail-rich image block is used as a reference image of the low-resolution image, so that the computation load of the device operating the super-resolution image processing method is reduced. Secondly, similar image blocks are determined through the detail-enriched image blocks, so that when the device carries out super-resolution processing on a low-resolution image through the first super-resolution network model, the similar image blocks can be introduced to carry out the super-resolution processing together. Because the image feature information included in the detail-rich image block is more (greater than the detail-poor image block), and the similarity between the image feature information included in the similar image block and the image feature information included in the detail-rich image block is higher (greater than the first threshold), the image feature information included in the similar image block can be considered to be more, and the definition of the super-resolution image can be effectively improved.
Drawings
Fig. 1a is a schematic view of an application scenario proposed in the embodiment of the present application;
Fig. 1b is a schematic diagram of a system architecture according to an embodiment of the present application;
Fig. 1c is a schematic diagram of a system architecture according to an embodiment of the present application;
Fig. 2 is a schematic diagram of a system architecture 200 according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a convolutional neural network provided in an embodiment of the present application;
Fig. 4a is a schematic diagram of an embodiment of a super-resolution image processing method in the embodiment of the present application;
Fig. 4b is a schematic flowchart of a super-resolution image processing method according to an embodiment of the present application;
Fig. 4c is a schematic flowchart of a super-resolution image processing method according to an embodiment of the present application;
Fig. 5 is a schematic flowchart of a process for determining similar image blocks according to an embodiment of the present application;
Fig. 6 is a schematic flowchart of a process for determining similar image blocks according to an embodiment of the present application;
Fig. 7 is a schematic flowchart of super-resolution processing according to an embodiment of the present application;
Fig. 8a is a schematic flowchart of super-resolution processing according to an embodiment of the present application;
Fig. 8b is a schematic diagram of a simulation experiment in an embodiment of the present application;
Fig. 8c is a schematic diagram illustrating the calculation results of the interpolation algorithm;
Fig. 8d is a schematic diagram illustrating a calculation result of the super-resolution image processing method according to the embodiment of the present application;
Fig. 8e is a schematic diagram of a simulation experiment in an embodiment of the present application;
Fig. 9 is a schematic flowchart of generating a training set according to an embodiment of the present application;
Fig. 10 is a schematic diagram of an embodiment of a super-resolution image processing apparatus 1000 according to an embodiment of the present application;
Fig. 11 is a schematic structural diagram of a computing device according to an embodiment of the present application;
Fig. 12 is a schematic structural diagram of a chip according to an embodiment of the present application.
Detailed Description
The embodiments of the application provide a super-resolution image processing method and a related apparatus. When a device running the method generates a super-resolution image from a low-resolution image, detail-rich image blocks are generated from the low-resolution image: the low-resolution image is split into detail-rich image blocks of smaller size, which serve as reference images for the low-resolution image, reducing the computational load of the device. Second, similar image blocks are determined from the detail-rich image blocks, so that when the device performs super-resolution processing on the low-resolution image, the similar image blocks can be introduced into the processing together, improving the sharpness of the processed image.
Embodiments of the present application are described below with reference to the accompanying drawings. As can be known to those skilled in the art, with the development of technology and the emergence of new scenarios, the technical solution provided in the embodiments of the present application is also applicable to similar technical problems.
The terms "first," "second," and the like in the description and in the claims of the present application and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and are merely descriptive of the various embodiments of the application and how objects of the same nature can be distinguished. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of elements is not necessarily limited to those elements, but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
The super-resolution image processing method provided by the application can be deployed on different devices, for example: (1) deployed in a mobile terminal (terminal device); (2) deployed in the cloud (a server, cloud computing device or cloud computing device system); (3) partially deployed in a mobile terminal (terminal device) and partially deployed in the cloud (a server, cloud computing device or cloud computing device system), with the two used in cooperation. For ease of understanding, please refer to fig. 1a, which is a schematic view of an application scenario provided in the embodiment of the present application.
And S1, acquiring the media file.
In step S1, the media file may be a video file, such as an Audio Video Interleave (AVI) video file, or a picture file, such as a Joint Photographic Experts Group (JPEG) picture file; this is not limited here.
There are many cases of S1, and the following description will be made separately:
the super-resolution image processing method is deployed in the terminal equipment and comprises the following steps:
the terminal device plays the local media file, such as the 'album' application program plays the local media file. The terminal equipment acquires the media file from a local memory. And after the terminal equipment acquires the media file, performing subsequent super-resolution image processing on the media file.
Secondly, the terminal device plays the cloud media file, and the super-resolution image processing method provided by the application is deployed in the terminal device:
the terminal device plays the cloud media file, for example, the "you ku" application plays the cloud media file. And the terminal equipment acquires the media file from the server providing the cloud media file playing service and performs subsequent super-resolution image processing on the media file.
Thirdly, the terminal device plays a local media file, and the super-resolution image processing method provided by the application is deployed in the server:
When the terminal device plays a local media file, it acquires the media file from local storage and sends it to the server on which the super-resolution image processing method is deployed, and the server performs super-resolution image processing on the media file. The server sends the super-resolution image processing result to the terminal device, and the terminal device plays the processed local media file based on the result.
Fourthly, the terminal device plays a cloud media file, and the super-resolution image processing method provided by the application is deployed in the server:
When the terminal device plays a cloud media file, the terminal device (or the server providing the cloud media file playing service) notifies the server on which the super-resolution image processing method is deployed, and the media file is acquired through the address of the cloud media file. After the server performs super-resolution image processing on the media file, it sends the processing result either to the terminal device, which plays the processed cloud media file based on the result, or to the server providing the cloud media file playing service, which forwards the result to the terminal device so that the terminal device plays the processed cloud media file.
Fifthly, the terminal device plays a local media file, and the super-resolution image processing method provided by the application is partially deployed in the terminal device and partially deployed in the server:
When the terminal device plays a local media file, it acquires the media file from local storage. After the terminal device acquires the media file, the terminal device and the server cooperate to perform subsequent super-resolution image processing on it.
Sixthly, the terminal device plays a cloud media file, and the super-resolution image processing method provided by the application is partially deployed in the terminal device and partially deployed in the server:
The terminal device acquires the media file from the server providing the cloud media file playing service. After the terminal device acquires the media file, the terminal device and the server cooperate to perform subsequent super-resolution image processing on it.
And S2, acquiring a low-resolution image.
In step S2, the terminal device and/or the server on which the super-resolution image processing method is deployed acquires the media file and processes it. Different media files are processed differently, as described separately below:
When the media file is a video file, the terminal device and/or the server on which the super-resolution image processing method is deployed extracts an image frame file from the video file, where the image frame file may be any encoded frame file in the video file, such as a key frame (I-frame) or another frame (a P-frame or B-frame, etc.), and acquires the low-resolution image corresponding to the image frame file.
When the media file is a picture file, the terminal device and/or the server on which the super-resolution image processing method is deployed acquires the low-resolution image corresponding to the picture file.
And S3, performing super-resolution processing on the low-resolution image.
In step S3, the terminal device and/or the server in which the super-resolution image processing method is deployed performs super-resolution image processing on the low-resolution image, and outputs the super-resolution image. The specific processing flow is described in detail in the following embodiments.
In this embodiment, the super-resolution image processing method provided by the embodiments of the present application can be applied to various application environments and can provide a super-resolution image processing service in each of them, giving it a wide application range and high practicability.
The super-resolution image processing method provided by the embodiments of the application can be executed by a super-resolution image processing apparatus. As in the foregoing embodiments, the position where the super-resolution image processing apparatus is deployed is not limited in the embodiments of the present application. For example, as shown in fig. 1b, which is a schematic diagram of a system architecture provided in the embodiment of the present application, the super-resolution image processing apparatus may run in a cloud computing device system (including at least one cloud computing device, such as a server), an edge computing device system (including at least one edge computing device, such as a server or a desktop computer), or various terminal devices, for example mobile phones, notebook computers and personal desktop computers.
The respective parts of the super-resolution image processing apparatus may also be deployed in different systems or servers. For example, as shown in fig. 1c, the parts of the apparatus may run in three environments, namely a cloud computing device system, an edge computing device system and a terminal device, or in any two of these three environments. The cloud computing device system, the edge computing device system and the terminal device are connected by communication paths and can communicate with each other and transmit data. The training method of the classification model provided by the embodiments of the application is executed cooperatively by the combined parts of the super-resolution image processing apparatus running in the three environments (or in any two of them).
The following description will be given taking an example in which a part of the super-resolution image processing apparatus is disposed in a terminal device and another part is disposed in a cloud computing device system. Referring to fig. 2, fig. 2 is a schematic diagram of a system architecture 200 according to an embodiment of the present application, where portions of a super-resolution image processing apparatus are disposed on different devices on the system architecture 200, so that the devices in the system architecture 200 cooperate to realize functions of the super-resolution image processing apparatus. As shown in fig. 2, the system architecture 200 includes a server 220, a database 230, a first communication device 240, a data storage system 250, and a second communication device 260, wherein the database 230, the server 220, and the data storage system 250 belong to a cloud computing device system, and the first communication device 240 and the second communication device 260 belong to a terminal device.
Illustratively, the first communication device 240 is configured to acquire a low-resolution image and transmit the low-resolution image to the server 220, and the server 220 generates a high-definition image according to the low-resolution image through a third super-resolution network model deployed in the server 220.
Optionally, in order to save network bandwidth and computing resources, the third super-resolution network model deployed in the server 220 may generate a high-definition image from the low-resolution image coming from the first communication device 240 once every period of time T, where T is a positive integer; alternatively, from the (set of) low-resolution images coming from the first communication device 240, one low-resolution image may be selected out of every Y images to generate a high-definition image, where Y is a positive integer. This is not limited here.
Database 230 stores a first training set (the first training set includes a first sub-training set and a second sub-training set) for server 220 to iteratively train the first network model. The server 220 may send the trained first network model to the first communication device 240 every time a period of time elapses, so that the first communication device 240 updates the local first network model. The first training set may be uploaded to the server 220 by the user through the first communication device 240, or may be obtained by the server 220 from a data set such as a search engine or "DIV 2K" through a data collection device.
In this embodiment of the application, after the server 220 generates a high-definition image according to the low-resolution image uploaded by the first communication device 240, the high-definition image is sent to the first communication device 240. The first communication device 240 performs super-resolution processing on the low-resolution image using a first super-resolution network model deployed locally, generating a first image. The first communication device 240 performs super-resolution processing on the first image and the high-definition image using a second super-resolution network model deployed locally, and generates a super-resolution image.
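For orientation only, a hypothetical sketch of this device-side flow (both models below are single-layer placeholders, not the patent's networks; all shapes and scale factors are assumptions):

```python
# Assumed sketch: the first super-resolution network model processes the
# low-resolution image into a first image; the second model fuses the first
# image with the high-definition image returned by the server 220.
import torch
import torch.nn as nn

first_sr_model = nn.Sequential(
    nn.Conv2d(3, 3, 3, padding=1),
    nn.Upsample(scale_factor=2, mode="bicubic"))
second_sr_model = nn.Sequential(
    nn.Conv2d(6, 3, 3, padding=1),
    nn.Upsample(scale_factor=2, mode="bicubic"))

low_res = torch.rand(1, 3, 64, 64)            # low-resolution image
hd_from_server = torch.rand(1, 3, 128, 128)   # produced by the third model in the cloud

first_image = first_sr_model(low_res)                    # (1, 3, 128, 128)
fused = torch.cat([first_image, hd_from_server], dim=1)  # HD image as reference
super_resolution_image = second_sr_model(fused)          # (1, 3, 256, 256)
```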
Optionally, the server 220 may also train one or more of the first super-resolution network model, the second super-resolution network model and the third super-resolution network model. The server 220 may send the trained first and second super-resolution network models to the first communication device 240, so that the first communication device 240 updates its local copies. Before sending a super-resolution network model to the first communication device 240, the server 220 may also process the model using "HiAI Convert" or "sharernn Convert" software so that the first communication device 240 can successfully run it. It should be noted that the first and second super-resolution network models may be two components of the same super-resolution network model, or may be different super-resolution network models; this is not limited here. The server 220 may update its local third super-resolution network model with the trained one; since the server 220 typically has more computing resources than the first communication device 240, the third super-resolution network model has more model parameters than the first (and second) super-resolution network models, i.e., it is a larger model.
Optionally, the high-definition images may also come from a preset high-definition gallery, which is stored in the data storage system 250. The preset high-definition gallery may also be stored in the first communication device 240. The preset high-definition gallery may be acquired by the server 220 through a data acquisition device from a search engine or a data set such as "DIV2K", or acquired by the first communication device 240, which is not limited here.
Optionally, the trained first network model, first super-resolution network model, and second super-resolution network model of the server 220 are sent to the second communication device 260. The second communication device 260 runs these models, so that it functions as part of the super-resolution image processing apparatus and executes the super-resolution image processing method proposed in this application.
The first communication device 240 and the second communication device 260 include, but are not limited to, personal computers, computer workstations, smartphones, tablets, smart cameras, smart vehicles or other types of cellular phones, media consumption devices, wearable devices, set-top boxes, game consoles, and the like.
The first communication device 240 and the second communication device 260 may each be connected to the server 220 via a wireless network, where the wireless network uses standard communication techniques and/or protocols. The wireless network is typically the internet, but may be any network, including but not limited to any combination of a local area network (LAN), metropolitan area network (MAN), wide area network (WAN), mobile network, private network, or virtual private network. In other embodiments, custom or dedicated data communication techniques may also be used in place of, or in addition to, the data communication techniques described above.
Although only one server 220, one first communication device 240, and one second communication device 260 are shown in fig. 2, it should be understood that the example in fig. 2 is only for understanding the present solution; the specific numbers of servers 220, first communication devices 240, and second communication devices 260 should be determined flexibly according to the actual situation.
In the embodiment of the application, the first super-resolution network model, the second super-resolution network model, the third super-resolution network model, and the first network model are all neural network models for processing image data. At present, the neural network models used for processing image data are mainly the convolutional neural network (CNN) and other neural networks based on it (e.g., the recurrent neural network (RNN), super-resolution convolutional neural network (SRCNN), deeply-recursive convolutional network (DRCN), or efficient sub-pixel convolutional neural network (ESPCN), etc.). For ease of understanding, the super-resolution image processing method proposed in this application is described below by taking a convolutional neural network as an example.
Referring to fig. 3, fig. 3 is a schematic structural diagram of a convolutional neural network according to an embodiment of the present disclosure. The convolutional neural network (CNN) is a deep neural network with a convolutional structure and is a deep learning architecture; a deep learning architecture performs multiple levels of learning at different abstraction levels through a machine learning algorithm. As a deep learning architecture, the CNN is a feed-forward artificial neural network. As shown in fig. 3, the convolutional neural network 100 may include an input layer 110, a convolutional layer/pooling layer 120 (where the pooling layer is optional), and a neural network layer 130.
As shown in fig. 3, the convolutional layer/pooling layer 120 may include, for example, layers 121 to 126. In one implementation, layer 121 is a convolutional layer, layer 122 is a pooling layer, layer 123 is a convolutional layer, layer 124 is a pooling layer, layer 125 is a convolutional layer, and layer 126 is a pooling layer; in another implementation, layers 121 and 122 are convolutional layers, layer 123 is a pooling layer, layers 124 and 125 are convolutional layers, and layer 126 is a pooling layer. That is, the output of a convolutional layer may be used as the input of a subsequent pooling layer, or as the input of another convolutional layer to continue the convolution operation.
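For illustration only, the two layer layouts described above can be sketched in PyTorch as follows (channel counts, kernel sizes, and activations are assumptions; the patent does not specify them):

```python
import torch.nn as nn

# Implementation 1: convolutional and pooling layers strictly alternating (121..126)
stack_a = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),   # layer 121 (conv)
    nn.MaxPool2d(2),                             # layer 122 (pool)
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),  # layer 123 (conv)
    nn.MaxPool2d(2),                             # layer 124 (pool)
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),  # layer 125 (conv)
    nn.MaxPool2d(2),                             # layer 126 (pool)
)

# Implementation 2: two convolutional layers feeding one pooling layer, twice
stack_b = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),   # layer 121 (conv)
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),  # layer 122 (conv)
    nn.MaxPool2d(2),                             # layer 123 (pool)
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),  # layer 124 (conv)
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),  # layer 125 (conv)
    nn.MaxPool2d(2),                             # layer 126 (pool)
)
```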
Taking the convolutional layer 121 as an example, the convolutional layer 121 may include a plurality of convolution operators, also called kernels, whose role in image processing is equivalent to a filter that extracts specific information from the input image matrix. A convolution operator is essentially a weight matrix, which is usually predefined. During the convolution operation on an image, the weight matrix is usually slid over the input image pixel by pixel (or two pixels by two pixels, etc.; the step size depends on the value of the stride) along the horizontal direction, so as to extract specific features from the image. The size of the weight matrix should be related to the size of the image. Note that the depth dimension of the weight matrix is the same as the depth dimension of the input image, and during the convolution operation the weight matrix extends over the entire depth of the input image. Thus, convolving with a single weight matrix produces a convolved output with a single depth dimension, but in most cases a plurality of weight matrices of the same dimensions are applied instead of a single one. The outputs of the individual weight matrices are stacked to form the depth dimension of the convolved image. Different weight matrices can be used to extract different features in the image; for example, one weight matrix extracts image edge information, another extracts specific colors of the image, and yet another blurs unnecessary noise points in the image, and so on.
In practical applications, the weight values in these weight matrices need to be obtained through a large amount of training; each weight matrix formed by the trained weight values can extract information from the input image, thereby helping the convolutional neural network 100 make correct predictions.
When the convolutional neural network 100 has multiple convolutional layers, the initial convolutional layer (e.g., 121) tends to extract more general features, which may also be referred to as low-level features. As the depth of the convolutional neural network 100 increases, the features extracted by later convolutional layers (e.g., 126) become more complex, such as features with high-level semantics; the more semantically meaningful the features, the more suitable they are for the problem to be solved.
Since it is often necessary to reduce the number of training parameters, a pooling layer often needs to be introduced periodically after a convolutional layer. That is, for the layers 121 to 126 illustrated by 120 in fig. 3, one convolutional layer may be followed by one pooling layer, or multiple convolutional layers may be followed by one or more pooling layers. During image processing, the only purpose of the pooling layer is to reduce the spatial size of the image. The pooling layer may include an average pooling operator and/or a maximum pooling operator for sampling the input image down to a smaller-sized image. The average pooling operator computes the average of the pixel values within a particular range of the image. The max pooling operator takes the pixel with the largest value within a particular range as the max-pooling result. In addition, just as the size of the weight matrix used in the convolutional layer should be related to the image size, the operators in the pooling layer should also be related to the image size. The size of the image output after processing by the pooling layer may be smaller than the size of the image input to the pooling layer, and each pixel in the output image represents the average or maximum value of a corresponding sub-region of the input image.
After processing by the convolutional layer/pooling layer 120, the convolutional neural network 100 is not yet sufficient to output the required output information, because, as described above, the convolutional layer/pooling layer 120 only extracts features and reduces the number of parameters brought by the input image. To generate the final output information (class information or other relevant information as needed), the convolutional neural network 100 uses the neural network layer 130 to generate one output or a set of outputs of the required number of classes. Therefore, the neural network layer 130 may include a plurality of hidden layers (131, 132 to 13n shown in fig. 3) and an output layer 140. The parameters contained in the hidden layers may be obtained by pre-training on related training data of a specific task type; for example, the task type may include image processing and skill selection after image processing, where the image processing part may include image recognition, image classification, image super-resolution processing, and the like, and after an image is processed, skill selection may be performed according to the acquired image information. As an example, when the application is super-resolution image processing, the neural network is embodied as a convolutional neural network and the task is to super-resolution process the image: a low-resolution image is input into the convolutional neural network, which needs to identify the low-resolution image and then obtain various image feature information from it, for example, determining similar image blocks according to the contour information, brightness information, texture information, and the like of the image. Then, the convolutional neural network combines the similar image blocks to perform super-resolution processing on the low-resolution image and generate a super-resolution image. Optionally, in order to further improve the definition of the super-resolution image, the convolutional neural network identifies high-definition images to determine a high-definition image similar to the low-resolution image, and then performs super-resolution processing using the high-definition image, the similar image blocks, and the low-resolution image to generate the super-resolution image, where the high-definition image and the similar image blocks serve as reference maps of the low-resolution image, and so on.
After the hidden layers in the neural network layer 130, the last layer of the whole convolutional neural network 100 is the output layer 140. The output layer 140 has a loss function similar to categorical cross-entropy and is specifically used for calculating the prediction error. Once the forward propagation of the whole convolutional neural network 100 is completed (in fig. 3, propagation from 110 to 140 is forward propagation), back propagation (in fig. 3, propagation from 140 to 110 is back propagation) starts to update the weight values and biases of the aforementioned layers, so as to reduce the loss of the convolutional neural network 100 and the error between the result output by the convolutional neural network 100 through the output layer and the ideal result.
It should be noted that the convolutional neural network 100 shown in fig. 3 is only an example of a convolutional neural network, and in a specific application, the convolutional neural network may also exist in the form of other network models, and the description of other types of neural networks is not repeated here.
In conjunction with the above description, a specific implementation flow of the super-resolution image processing method provided by the embodiment of the present application is described below. The super-resolution image processing apparatus will be described by taking an example in which a part thereof is disposed in a terminal device and the other part thereof is disposed in a cloud computing device system. Referring to fig. 4a, fig. 4a is a schematic diagram illustrating an embodiment of a super-resolution image processing method according to an embodiment of the present application. An embodiment of a super-resolution image processing method in an embodiment of the present application includes:
401. A low-resolution image is acquired.
In this embodiment, when the terminal device plays the media file (as described in the foregoing embodiment of fig. 1 a), the terminal device may obtain the low-resolution image corresponding to the media file. Specifically, when the media file is a video file, the acquired low-resolution image is any image frame in the video file, such as a key frame (I-frame), or other frames, such as P-frames or B-frames. When the media file is an image file, the acquired low-resolution image is the image file.
402. An image block set is generated from the low resolution image.
In this embodiment, after the terminal device acquires the low-resolution image, the terminal device first divides the low-resolution image into low-resolution image blocks of smaller size, and these low-resolution image blocks form an image block set. Specifically, the low-resolution image is divided into low-resolution image blocks of, for example, 32 × 32 pixels; the specific size of the low-resolution image blocks is determined by actual requirements (i.e., the requirements of the subsequent neural network model, such as the first network model), and is not limited here.
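A minimal sketch of this tiling step, assuming 32 × 32 blocks and reflection padding at the image borders (the padding strategy is an assumption):

```python
import numpy as np

def to_image_block_set(image: np.ndarray, block: int = 32):
    """Split an (H, W) single-channel image into a list of (block, block) image blocks."""
    h, w = image.shape
    pad_h = (-h) % block                       # padding needed to reach a block multiple
    pad_w = (-w) % block
    padded = np.pad(image, ((0, pad_h), (0, pad_w)), mode="reflect")
    blocks = []
    for top in range(0, padded.shape[0], block):
        for left in range(0, padded.shape[1], block):
            blocks.append(padded[top:top + block, left:left + block])
    return blocks
```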
403. Convolution processing is performed on the image block set corresponding to the low-resolution image through the first network model to generate an image feature data set corresponding to the low-resolution image blocks.
In this embodiment, the terminal device divides the low-resolution image to generate low-resolution image blocks. The terminal device performs convolution processing on the image block set corresponding to the low-resolution image through the first network model, and the first network model outputs the convolutional-layer processing result. This result is called the image feature data set corresponding to the low-resolution image blocks and comprises the feature maps of the image blocks.
Optionally, the first network model includes a plurality of convolutional layers (more than 2 layers), and the convolutional layers can extract feature maps of the image, for example, edge information of the image, contour information of the image, brightness information of the image, and/or color information of the image, and so on. The convolution results output by the first two convolutional layers of the first network model are selected as the feature maps of the low-resolution image blocks, also called low-resolution image block feature maps. The set of these low-resolution image block feature maps is referred to as the image feature data set corresponding to the low-resolution image blocks.
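The capture of the first two convolutional layers' outputs can be sketched with standard PyTorch forward hooks (the three-layer stack and channel counts are illustrative assumptions, not values from this application):

```python
import torch
import torch.nn as nn

convs = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
)

captured = []
for layer in (convs[0], convs[2]):  # the first two convolutional layers
    layer.register_forward_hook(lambda mod, inp, out: captured.append(out.detach()))

blocks = torch.rand(8, 1, 32, 32)   # a batch of 8 low-resolution image blocks
_ = convs(blocks)
image_feature_data_set = captured   # one feature-map tensor per hooked layer
```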
404. Convolution processing is performed on the image feature data set corresponding to the low-resolution image blocks through the first network model to generate a first convolution data set.
In this embodiment, since the first network model needs to classify the low-resolution image blocks, the image feature data set generated by convolving the low-resolution image blocks is only the output of a preliminary convolution step. Further convolution processing needs to be performed on this image feature data set through the first network model, and the result of this further convolution serves as the source data for the subsequent classification processing. This convolution result is referred to as the first convolution data set.
405. Binary classification processing is performed on the first convolution data set through the first network model to determine detail-rich image blocks and detail-poor image blocks.
In this embodiment, after the terminal device generates the first convolution data set, binary classification processing (softmax) is performed on the first convolution data set through the first network model, and detail-rich image blocks and detail-poor image blocks are determined. The first network model may be a classification network, specifically composed of several convolutional layers and at least one softmax layer. When the low-resolution image blocks are input into the first network model for processing, the result output at the last convolutional layer is called the first convolution data set. Then, the first convolution data set is input to the softmax layer for binary classification processing to determine which image blocks of the image block set are detail-rich image blocks and which are detail-poor image blocks. A possible classification criterion is: when the feature map of an image block shows that the image block has no contour (e.g., a blue-sky background), the softmax layer outputs "0" for that image block to indicate that it is a detail-poor image block. A detail-rich image block has rich image feature information, including but not limited to: contour information of the image, brightness information of the image, texture information of the image, and the like.
Steps 401 to 405 above can be described using the following formula:
CLASSIFY = Softmax(Conv(resize(Crop(Input(H,W,1)))));
where "Input" denotes the input low-resolution image; in "(H, W, 1)", "H" denotes the height of the low-resolution image, "W" denotes its width, and "1" indicates that the input low-resolution image is a single-channel image (i.e., a grayscale image); "Crop" denotes dividing the input low-resolution image into low-resolution image blocks; "resize" denotes uniformly scaling the low-resolution image blocks to a fixed size; "Conv" denotes the convolution result obtained by performing convolution processing on the scaled low-resolution image blocks; "Softmax" denotes the result obtained by performing binary classification processing on the convolution result (the first convolution data set); and "CLASSIFY" indicates whether a low-resolution image block is a detail-rich image block or a detail-poor image block, for example, "CLASSIFY = 0" indicates that the low-resolution image block is a detail-poor image block.
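A hedged PyTorch sketch of this Crop → resize → Conv → Softmax pipeline (layer sizes and the pooling head are assumptions; only the overall structure follows the formula above):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlockClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 2)  # class 0: detail-poor, class 1: detail-rich

    def forward(self, blocks):  # blocks: (B, 1, h, w), already cropped ("Crop")
        x = F.interpolate(blocks, size=(32, 32), mode="bilinear",
                          align_corners=False)        # "resize" to a fixed size
        x = self.convs(x).flatten(1)                  # "Conv"
        return F.softmax(self.head(x), dim=1)         # "Softmax" -> CLASSIFY

probs = BlockClassifier()(torch.rand(4, 1, 32, 32))
classify = probs.argmax(dim=1)  # 0 marks a detail-poor image block
```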
To facilitate understanding of the above steps (401 to 405), please refer to fig. 4b, which is a flowchart of the super-resolution image processing method according to an embodiment of the present application. After the terminal device acquires the low-resolution image, low-resolution image blocks are generated from the low-resolution image. The low-resolution image blocks are processed using the first network model to finally determine which of them are detail-rich image blocks (e.g., image blocks that include a building or a face contour) and which are detail-poor image blocks (e.g., image blocks that include a sky background).
406. Similar image blocks are determined according to the detail-rich image blocks.
In this embodiment, the terminal device determines similar image blocks according to the detail-rich image blocks. There are several different schemes for determining similar image blocks, which are explained below in connection with the figures. In the first scheme, similar image blocks are determined within the low-resolution image; in the second scheme, similar image blocks are determined in images other than the low-resolution image.
In the first scheme, similar image blocks are determined within the low-resolution image. Referring to fig. 5, fig. 5 is a schematic diagram of a process of determining similar image blocks according to an embodiment of the present disclosure.
D1, determining, according to the detail-rich image blocks, the image feature data set corresponding to the detail-rich image blocks within the image feature data set corresponding to the low-resolution image blocks.
In step D1, after the terminal device determines which image blocks of the low-resolution image blocks are detail-rich image blocks, the terminal device determines which feature maps correspond to the detail-rich image blocks in the image feature data set corresponding to the low-resolution image blocks output by the first network model according to the detail-rich image blocks. These feature maps corresponding to the detail-rich image blocks are collectively referred to as the image feature dataset to which the detail-rich image blocks correspond.
For example: the image feature data set corresponding to the low-resolution image blocks includes the feature maps of 6 low-resolution image blocks A, B, C, D, E, and F. Suppose the first network model determines that, among these 6 low-resolution image blocks, A and B are detail-rich image blocks. Then, in step D1, the terminal device finds the feature maps corresponding to image blocks A and B in the image feature data set corresponding to the low-resolution image blocks, and these feature maps are collectively referred to as the image feature data set corresponding to the detail-rich image blocks. That is, the image feature data set corresponding to the detail-rich image blocks includes the feature map of image block A and the feature map of image block B.
D2, performing binarization processing on the image feature data set corresponding to the detail-rich image blocks, calculating the similarity of any two image blocks among the detail-rich image blocks, and determining similar image blocks according to the similarity.
In this embodiment, after determining the image feature data set corresponding to the detail-rich image blocks (i.e., the feature maps of the detail-rich image blocks), the terminal device calculates the similarity between any two image blocks among the detail-rich image blocks. To facilitate the subsequent calculation, binarization processing is first performed on the feature maps in this image feature data set. Binarization refers to setting the feature value of each pixel of the image to 0 or 1, so that the whole image presents an obvious black-and-white effect. The binarization processing can generally be performed using "OpenCV" or "MATLAB".
Secondly, after the binarized data of the feature maps in the image feature data set corresponding to the detail-rich image blocks are obtained, the similarity of any two image blocks among the detail-rich image blocks is calculated. In this embodiment, in order to reduce the occupancy of computing resources, the similarity is calculated using an exclusive-OR (XOR) matching algorithm. The similarity is specifically calculated as follows:
F = 1 − (Σ_{i=1}^{N} Σ_{j=1}^{N} (P(i,j) ⊕ Q(i,j))) / (N × N);
where F is the similarity; N is the side length of a detail-rich image block (each feature map contains N × N pixels); P(i,j) and Q(i,j) are the binarized feature maps of the image blocks corresponding to any two detail-rich image blocks; ⊕ denotes the exclusive-OR (XOR) operation; i is the abscissa of a feature map pixel; and j is the ordinate of a feature map pixel.
The following example illustrates this: the terminal device selects two feature maps at random from the image feature data set corresponding to the detail-rich image blocks and calculates their similarity; call them the first feature map P and the second feature map Q. "P(i,j)" represents the binarized data of the first feature map at coordinates (i, j) after binarization processing; for example, "P(1,1) = 1" indicates that the binarized data of the first feature map at coordinates (1, 1) is 1. Similarly, "Q(i,j)" represents the binarized data of the second feature map at coordinates (i, j) after binarization processing. Because the first feature map and the second feature map have the same image size, the binarized data at the same coordinates on the two feature maps are selected for the XOR calculation. The XOR results over all coordinates of a feature map are then summed and divided by the total number of pixel coordinates ("N × N"). Finally, the similarity of the image blocks corresponding to the two feature maps is obtained.
Using this calculation method, the similarity of any two image blocks in the image feature data set corresponding to the detail-rich image blocks can be calculated; when the similarity is greater than a first threshold, the two image blocks are determined to be similar image blocks. The first threshold is determined according to practical requirements and is not limited here; in one alternative, the first threshold is 0.7.
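A minimal numpy sketch of the binarization and XOR similarity described in steps D1 and D2 (the binarization threshold of 0.5 is an assumption):

```python
import numpy as np

def xor_similarity(p: np.ndarray, q: np.ndarray, bin_thresh: float = 0.5) -> float:
    """F = 1 - sum(P XOR Q) / (N * N) over two binarized N x N feature maps."""
    assert p.shape == q.shape and p.shape[0] == p.shape[1]
    pb = (p >= bin_thresh).astype(np.uint8)  # binarization (threshold is assumed)
    qb = (q >= bin_thresh).astype(np.uint8)
    n = p.shape[0]
    return 1.0 - float(np.bitwise_xor(pb, qb).sum()) / (n * n)

def are_similar(p, q, first_threshold: float = 0.7) -> bool:
    return xor_similarity(p, q) > first_threshold  # first threshold, e.g., 0.7
```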
To facilitate understanding of the above steps (D1 and D2), please refer to fig. 4c, which is a flowchart of the super-resolution image processing method according to an embodiment of the present application. After the terminal device acquires the low-resolution image, low-resolution image blocks are generated from the low-resolution image. The low-resolution image blocks are processed using the first network model, and the results output during the convolution processing form the image feature data set corresponding to the detail-rich image blocks. Then, binarization processing is performed on the feature maps in the image feature data set corresponding to the detail-rich image blocks. Finally, similar image blocks are determined by calculating the similarity.
The above provides a specific implementation of how to determine similar image blocks, improving the implementation flexibility of the scheme. By binarizing the feature maps and calculating the similarity with the XOR matching algorithm, an accurate image-block similarity can be obtained while occupying fewer computing resources.
In the second scheme, similar image blocks are determined in images other than the low-resolution image.
Note that when similar image blocks are determined in images other than the low-resolution image, there are two cases: (1) the images other than the low-resolution image belong to the same video file as the low-resolution image; (2) the images other than the low-resolution image do not belong to the same video file as the low-resolution image. For case (2), the method of determining similar image blocks is similar to the embodiment corresponding to fig. 5 and is not repeated here.
Case (1) is explained in this embodiment. Referring to fig. 6, fig. 6 is a schematic diagram of a process for determining similar image blocks according to an embodiment of the present disclosure.
F1, determining the position of the low-resolution image in the video.
In step F1, the low-resolution image acquired by the terminal device comes from a video file. First, when the low-resolution image is one frame (the first image frame) in a video, the terminal device determines the position of the low-resolution image in the video, that is, the position information of the first image frame in the video. For example, the low-resolution image corresponding to a certain detail-rich image block is determined, and this low-resolution image (the first image frame) is located at the 10th frame of the video file.
F2, determining a second image frame according to the position.
In step F2, because a video file is coherent and the picture similarity of adjacent frames is high, the low-resolution image corresponding to a certain detail-rich image block can be located in the video file. That is, after the position of the first image frame is determined, image frames within a certain range before and after the first image frame are searched, and one or more of them are determined as second image frames. In an alternative implementation, the second image frame is a key frame containing complete image information.
F3, determining a similar image block from the second image frame.
In step F3, after the terminal device determines the second image frame, the terminal device decodes the second image frame to obtain a corresponding image file. The terminal device divides the image file to generate a plurality of image blocks.
Illustratively, one image block is selected from the image obtained by decoding the second image frame, one detail-rich image block is selected, and a calculation is performed on the two image blocks to determine which image blocks in the image file corresponding to the second image frame are similar image blocks. The specific calculation method is similar to the flow corresponding to fig. 5 and is not repeated here.
Illustratively, the image block in the second image frame corresponding to a detail-rich image block may be determined to be the similar image block. That is, according to the coordinates of the detail-rich image block in the low-resolution image, the image block at the same coordinate position in the image file is determined to be the similar image block. For example: the low-resolution image is divided into 3 × 3 image blocks, and the detail-rich image block is the image block in the first row and first column of the low-resolution image. The terminal device divides the image file obtained by decoding the second image frame into 3 × 3 image blocks (the low-resolution image and the image file have the same size), and determines that the image block in the first row and first column of the image file is the similar image block.
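A hedged OpenCV sketch of steps F1 to F3, taking the co-located block of an adjacent frame as the similar image block (the one-frame offset is an illustrative assumption):

```python
import cv2

def co_located_similar_block(video_path, frame_index, top, left, block=32):
    cap = cv2.VideoCapture(video_path)
    # F1/F2: seek to an adjacent frame of the low-resolution image (first image frame)
    cap.set(cv2.CAP_PROP_POS_FRAMES, max(frame_index - 1, 0))
    ok, second_frame = cap.read()
    cap.release()
    if not ok:
        return None
    gray = cv2.cvtColor(second_frame, cv2.COLOR_BGR2GRAY)
    # F3: take the block at the same coordinate position as the detail-rich block
    return gray[top:top + block, left:left + block]
```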
In this embodiment, a specific implementation of how to determine similar image blocks is provided in the foregoing manner, and similar image blocks may be determined in multiple manners, so that implementation flexibility of the scheme is improved. On the premise of occupying lower computing resources, the similar image blocks can be determined, and the power consumption of the terminal equipment which deploys the super-resolution image processing method provided by the application is reduced.
407. Super-resolution processing is performed on the similar image blocks and the low-resolution image to generate a super-resolution image.
In this embodiment, the terminal device performs super-resolution processing on the similar image block and the low-resolution image to generate a super-resolution image. Taking the example that the terminal device performs super-resolution image processing through the first super-resolution network model, the terminal device may generate different super-resolution images through multiple schemes. The following are described separately:
First processing mode: no secondary super-resolution processing is performed.
In this case, the terminal device performs the super-resolution processing using only the first super-resolution network model. Specifically, please refer to fig. 7, and fig. 7 is a schematic flowchart of a super-resolution processing according to an embodiment of the present disclosure.
G1, performing image exchange processing on the detail-rich image blocks and the similar image blocks to generate a first exchange image.
In step G1, the terminal device performs image exchange processing on the detail-rich image block and the similar image block to generate a first exchanged image. The generated first exchange image has the characteristics of both the detail-rich image block and the similar image block. Specifically, the image exchange processing may be performed by one or more of the following ways: "concat mode", "concat + add mode", or "image swap".
G2, according to the similar image blocks, determining the feature maps of the similar image blocks in the image feature data sets corresponding to the detail-rich image blocks, and performing image exchange processing on the feature maps of the similar image blocks and the feature maps of the low-resolution image blocks to generate a second exchange image.
In step G2, the terminal device determines the feature maps of the similar image blocks in the image feature data sets corresponding to the image blocks with rich details according to the similar image blocks, and performs image exchange processing on the feature maps of the similar image blocks and the feature maps of the low-resolution image blocks to generate a second exchanged image. And the generated second exchange image has the characteristics of the characteristic diagram of the similar image block and the characteristic diagram of the low-resolution image block. Specifically, the image exchange processing may be performed by one or more of the following ways: "concat mode", "concat + add mode", or "feature map Swap".
G3, performing super-resolution processing on the first exchange image, the second exchange image and the low-resolution image to generate a first image.
In step G3, the terminal device performs super-resolution processing on the first exchanged image, the second exchanged image, and the low-resolution image to generate a first image. The similar image blocks and the feature maps of the similar image blocks serve as reference maps for the super-resolution processing.
For the super-resolution processing described above, the following formula can be used to describe:
HR(H,W)=Conv(Input(4,H,W,1));
here, "Input (H, W, 1)" indicates that 4 similar "RGB" or "YUV" images are Input, the number of channels of the image is 1, and "HR (H, W)" indicates a super-resolution image.
G4, generating a super-resolution image according to the first image.
In step G4, if the terminal device does not perform secondary super-resolution processing, the first image generated in step G3, i.e., the first image output by the first super-resolution network model, is the super-resolution image.
Second processing mode: secondary super-resolution processing is performed. If the terminal device performs secondary super-resolution processing, the content of step G4 is specifically shown in fig. 8a. Fig. 8a is a schematic flowchart of super-resolution processing proposed in an embodiment of the present application.
H1, acquiring a high-definition image.
In step H1, the terminal device acquires a high-definition image whose resolution is greater than that of the low-resolution image. There are two possible sources for this high-definition image: (I) the high-definition image comes from a preset high-definition gallery; (II) the high-definition image comes from a cloud computing device system, which generates the high-definition image from the low-resolution image sent by the terminal device using a third super-resolution network model deployed in the cloud computing device system. The two sources are described separately below.
(I) The high-definition image comes from a preset high-definition gallery. A large number of images with higher resolution are stored in the preset high-definition gallery. The terminal device uses these images to improve the precision of the super-resolution processing and the definition of the super-resolution image.
(II) The high-definition image comes from a cloud computing device system, which generates the high-definition image from the low-resolution image sent by the terminal device using a third super-resolution network model deployed in the cloud computing device system. In this case, after the terminal device acquires the low-resolution image (step 401), in addition to performing the subsequent steps itself, the terminal device transmits the low-resolution image to the cloud computing device system in which part of the super-resolution image processing apparatus is deployed. The cloud computing device system performs processing analogous to steps 402 to 406 and generates a high-definition image from the low-resolution image using the third super-resolution network model. The cloud computing device system then sends the generated high-definition image to the terminal device, and the terminal device uses the high-definition image as a reference map, together with the result of its own super-resolution processing, to perform further super-resolution processing and finally generate the super-resolution image, thereby improving the definition of the super-resolution image. Compared with the first and second super-resolution network models deployed in the terminal device, the third super-resolution network model occupies more computing resources and achieves a better super-resolution effect.
When the terminal device acquires the high-definition image, the high-definition image is used as a reference map. The terminal device needs to determine the image blocks of the high-definition image that are similar to the low-resolution image. The method of determining similar image blocks is similar to the description in the foregoing embodiments and is not repeated here.
H2, generating an enlarged image by enlarging the detail-poor image blocks.
In step H2, in order to further improve the definition of the super-resolution image, the terminal device enlarges the detail-poor image blocks obtained in the foregoing steps to generate an enlarged image. Specifically, the enlargement processing includes bicubic interpolation or linear interpolation.
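A minimal OpenCV sketch of this enlargement step (the scale factor 2 is an assumed example):

```python
import cv2

def enlarge_detail_poor(block, scale=2):
    h, w = block.shape[:2]
    # bicubic interpolation; cv2.INTER_LINEAR would give the linear variant
    return cv2.resize(block, (w * scale, h * scale),
                      interpolation=cv2.INTER_CUBIC)
```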
When the terminal device acquires the enlarged image, the enlarged image is used as a reference image. The terminal device needs to determine similar image blocks of the enlarged image and the low resolution image. The method for specifically determining similar image blocks is similar to the description of the foregoing embodiments, and is not repeated here.
H3, performing super-resolution processing on the high-definition image, the enlarged image and the first image through the second super-resolution network model to generate a super-resolution image.
In step H3, the terminal device performs super-resolution processing on the high-definition image, the enlarged image, and the first image to generate a super-resolution image. Specifically, the terminal device performs super-resolution processing on an image block similar to the first image in the high-definition image, an image block similar to the first image in the enlarged image, and the first image through the second super-resolution network model to generate a super-resolution image. The specific process of generating the super-resolution image is similar to that shown in the foregoing steps G1-G3, and is not repeated here.
For the super-resolution processing described above, the following formula can be used to describe:
HR(H,W)=Conv(HR_MOBILE(H,W),HR_CLOUD(H,W))
the "HR _ MOBILE (H, W)" represents an image obtained by the terminal device (including the enlarged image and the first image), "HR _ CLOUD (H, W)" represents a high-definition image generated by the CLOUD computing device system, and "HR (H, W)" represents a super-resolution image.
In the embodiment of the application, the super-resolution image processing apparatus deployed in the terminal device identifies the acquired low-resolution image using the first network model and determines detail-rich image blocks and detail-poor image blocks. The detail-rich image blocks are super-resolution processed using a super-resolution network model; the detail-poor image blocks are enlarged and then super-resolution processed together with the detail-rich image blocks. This effectively reduces the amount of calculation and the energy consumption of the terminal device. For the detail-rich image blocks, similar image blocks are further determined using the XOR matching algorithm, which reduces the amount of calculation and improves the definition of the super-resolution image while improving the matching precision of similar image blocks. Finally, super-resolution processing is performed using a variety of different reference maps, effectively improving the definition of the super-resolution image.

Specifically, referring to fig. 8b, fig. 8b is a schematic diagram of a simulation experiment in the embodiment of the present application. Peak signal-to-noise ratio (PSNR) is currently the most popular and most widely used objective image evaluation index. "Ours (plus similarity block)" denotes the super-resolution image processing method proposed in this application; it can be seen that a high PSNR is maintained even with a large reduction in the number of parameters (i.e., a reduction in the amount of computation). However, PSNR is based on the error between corresponding pixels, i.e., on error-sensitive image quality evaluation. This index does not consider the visual characteristics of the human eye (the human eye is more sensitive to contrast differences at lower spatial frequencies, more sensitive to brightness contrast differences than to chroma differences, and its perception of a region is influenced by the surrounding regions), so the evaluation result often disagrees with subjective human perception.

Therefore, please refer to fig. 8c and 8d, where fig. 8c is a schematic diagram of the calculation result of an interpolation algorithm and fig. 8d is a schematic diagram of the calculation result of the super-resolution image processing method according to the embodiment of the present application. Figs. 8c and 8d show the different results of processing the same low-resolution image with the interpolation algorithm (bicubic) and with the super-resolution image processing method proposed in the embodiment of the present application. It can be seen visually that the image generated by the super-resolution image processing method proposed in the embodiment of the present application has clearer texture details without introducing negative effects.

Please refer to fig. 8e, which is a schematic diagram of a simulation experiment in an embodiment of the present application. Fig. 8e shows the amount of computation saved by the super-resolution image processing method proposed in the embodiment of the present application in different scenes compared with the interpolation algorithm (bicubic). In different scenes, the amount of computation can be reduced by 20% to 60%.
It should be noted that this is only one possible simulation experiment result, and other simulation experiment results may exist according to the difference of actual hardware, which is not limited herein.
Based on the foregoing embodiment, for the first network model, the super-resolution image processing apparatus may also generate a first training set to train the first network model. Specifically, please refer to fig. 9, fig. 9 is a schematic flowchart illustrating a process of generating a training set according to an embodiment of the present application.
901. An image set is acquired.
In step 901, the super-resolution image processing apparatus acquires various images and forms an image set from them. The image set includes pictures with different texture features, such as animals, sky, human faces, or buildings, collected from the Internet and from published data sets; different types of pictures are mixed in equal proportion. Sources include data sets such as "DIV2K" or "Timofte 91 images", images obtained through a search engine, and the like. The images used by the super-resolution image processing apparatus when generating the image feature data sets corresponding to the detail-rich image blocks during image processing may also be used.
902. The image set is filtered using a low-pass filter to generate a first sub-training set.
In step 902, a low-pass filter is used to perform filtering processing on the image set; that is, smoother image files with less detail are removed from the image set, generating the first sub-training set. In an alternative implementation, the images in the first sub-training set carry a detail-rich label that identifies the image as having rich image feature information. The label may be marked manually. The images in the first sub-training set contain a greater amount of image feature information than detail-poor image blocks do.
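A hedged OpenCV sketch of this screening step, assuming a Gaussian blur as the low-pass filter and the variance of the high-frequency residual as the detail measure (both the kernel size and the threshold are assumptions; the patent does not name a specific filter):

```python
import cv2
import numpy as np

def first_sub_training_set(images, energy_threshold=25.0):
    kept = []
    for img in images:                             # img: grayscale np.ndarray
        low_pass = cv2.GaussianBlur(img, (7, 7), 0)
        residual = img.astype(np.float32) - low_pass.astype(np.float32)
        if residual.var() >= energy_threshold:     # enough detail survives the filter
            kept.append(img)
    return kept                                    # smooth, detail-poor images dropped
```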
903. Data augmentation processing is performed on the first sub-training set to generate a second sub-training set.
In step 903, data augmentation is performed on the first sub-training set to generate the second sub-training set. Specifically, the data augmentation processing includes image inversion, image rotation, image reduction, and image stretching. The data augmentation processing may further include cropping, translation, affine transformation, perspective transformation, Gaussian noise, non-uniform lighting, motion blur, random color filling, and the like. One or more of these data augmentation processes are selected to process the image files in the first sub-training set. The data augmentation processing may be performed repeatedly on the same image file or performed on a plurality of image files, which is not limited here.
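A minimal OpenCV sketch of the four listed augmentations (all parameters are assumed examples):

```python
import cv2

def augment(img):
    h, w = img.shape[:2]
    return [
        cv2.flip(img, 1),                              # image inversion (horizontal flip)
        cv2.rotate(img, cv2.ROTATE_90_CLOCKWISE),      # image rotation
        cv2.resize(img, (w // 2, h // 2),
                   interpolation=cv2.INTER_AREA),      # image reduction
        cv2.resize(img, (int(w * 1.5), h),
                   interpolation=cv2.INTER_LINEAR),    # image stretching
    ]

def second_sub_training_set(first_sub_training_set):
    out = []
    for img in first_sub_training_set:
        out.extend(augment(img))
    return out
```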
904. A first training set is generated according to the first sub-training set and the second sub-training set.
In step 904, the super-resolution image processing apparatus generates a first training set from the first sub-training set and the second sub-training set.
In the embodiment of the application, the first network model is trained by generating the first training set, so that the precision of the first network model can be effectively improved.
The scheme provided by the embodiment of the application is mainly introduced in the aspect of a method. It is to be understood that the super-resolution image processing apparatus includes hardware structures and/or software modules corresponding to the respective functions in order to realize the functions. Those of skill in the art will readily appreciate that the various illustrative modules and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is performed as hardware or computer software drives hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The embodiment of the present application may divide the super-resolution image processing apparatus into functional modules according to the above method examples; for example, each function may be divided into a separate functional module, or two or more functions may be integrated into one module. The integrated module can be implemented in the form of hardware or in the form of a software functional module. It should be noted that the division of modules in the embodiment of the present application is schematic and is only a logical division of functions; there may be other division manners in actual implementation.
Referring to fig. 10, the super-resolution image processing apparatus 1000 in the present application will be described in detail. Fig. 10 is a schematic diagram of an embodiment of the super-resolution image processing apparatus 1000 in the present application. The super-resolution image processing apparatus 1000 includes a generation module 1001, a determination module 1002, and an acquisition module 1003:
a generating module 1001 configured to generate a detail-rich image block and a detail-poor image block according to a low-resolution image, where sizes of the detail-rich image block and the detail-poor image block are smaller than that of the low-resolution image, and a number of image feature information included in the detail-rich image block is greater than a number of image feature information included in the detail-poor image block;
a determining module 1002, configured to determine a similar image block according to the detail-rich image block generated by the generating module 1001, where a similarity between image feature information included in the similar image block and image feature information included in the detail-rich image block is greater than a first threshold;
the generating module 1001 is further configured to perform super-resolution processing on the similar image block determined by the determining module 1002 and the low-resolution image to generate a super-resolution image, where the similar image block is used as a reference map of the low-resolution image.
In some embodiments of the present application, the generating module 1001 is specifically configured to generate the image block set according to the low-resolution image, where the image block set includes at least one low-resolution image block;
the determining module 1002 is specifically configured to process the low-resolution image block generated by the generating module 1001, and determine the detail-rich image block and the detail-poor image block;
the determining module 1002 is specifically configured to perform binary classification processing on the first convolution data set to determine the detail-rich image blocks and the detail-poor image blocks.
In some embodiments of the present application, the generating module 1001 is specifically configured to determine, according to the detail-rich image block, an image feature dataset corresponding to the detail-rich image block;
the generating module 1001 is specifically configured to perform binarization processing on the determined image feature data set to obtain similarity between any two image blocks in the detail-rich image block;
the determining module 1002 is specifically configured to determine the similar image block when the similarity between any two image blocks is greater than the first threshold.
In some embodiments of the present application, the similarity of any two image blocks satisfies:
F = 1 − (Σ_{i=1}^{N} Σ_{j=1}^{N} (P(i,j) ⊕ Q(i,j))) / (N × N);
where F is the similarity; N is the side length of a detail-rich image block (each feature map contains N × N pixels); P(i,j) and Q(i,j) are the binarized feature maps of the image blocks corresponding to any two detail-rich image blocks; ⊕ denotes the exclusive-OR (XOR) operation; i is the abscissa of a feature map pixel; and j is the ordinate of a feature map pixel.
In some embodiments of the present application, the determining module 1002 is specifically configured to determine a position of the low-resolution image in the video when the low-resolution image is a frame in the video;
the determining module 1002 is specifically configured to determine a second image frame according to the position, where the second image frame is an adjacent frame of the low-resolution image in the video;
the determining module 1002 is specifically configured to determine that the image block in the second image frame corresponding to the detail-rich image block is the similar image block.
In some embodiments of the present application, the generating module 1001 is specifically configured to perform image exchange processing on the detail-rich image block generated by the generating module 1001 and the similar image block determined by the determining module 1002 to generate a first exchanged image;
the generating module 1001 is specifically configured to determine a feature map of a similar image block according to the similar image block determined by the determining module 1002, perform image exchange processing on the feature map of the similar image block and the feature map of the low-resolution image block, and generate a second exchanged image, where the feature map is used to indicate image feature information of the image block;
the generating module 1001 is specifically configured to perform super-resolution processing on the first exchanged image, the second exchanged image, and the low-resolution image through the first super-resolution network model to generate a first image;
the generating module 1001 is specifically configured to generate the super-resolution image according to the first image generated by the generating module 1001.
In some embodiments of the present application, the super-resolution image processing apparatus 1000 further includes an acquisition module 1003;
the obtaining module 1003 is configured to obtain a high-definition image, where a resolution of the high-definition image is greater than a resolution of the low-resolution image;
the generating module 1001 is specifically configured to perform an amplification process on the detail-poor image block generated by the generating module 1001 to generate an amplified image, where the amplification process includes a bicubic interpolation process;
the generating module 1001 is specifically configured to perform super-resolution processing on the high-definition image acquired by the acquiring module 1003, the enlarged image, and the first image to generate the super-resolution image, where the high-definition image and the enlarged image serve as reference maps of the first image.
In some embodiments of the present application, the high-definition image is from a remote device, and the high-definition image is an image generated by super-resolution processing on the low-resolution image by the remote device.
In some embodiments of the present application, the high definition image is from a preset high definition gallery.
In some embodiments of the present application, the determining module 1002 is further configured to determine the high-definition image in the preset high-definition gallery according to the first image generated by the generating module 1001, where the similarity between the high-definition image and the first image is greater than a first threshold.
In some embodiments of the present application,
the obtaining module 1003 is further configured to obtain an image set;
the generating module 1001 is further configured to perform filtering processing on the image set acquired by the acquiring module 1003 by using a low-pass filter, so as to generate a first sub-training set;
the generating module 1001 is further configured to perform data augmentation processing on the first sub-training set generated by the generating module 1001 to generate a second sub-training set, where the data augmentation processing includes image inversion, image rotation, image reduction, and image stretching;
the generating module 1001 is further configured to generate a first training set according to the first sub-training set and the second sub-training set, where the first training set is used to train the first network model, and the first network model is used to generate image blocks with rich details and image blocks with poor details.
In some embodiments of the present application, the obtaining module 1003 may perform step 401 in the embodiment shown in fig. 4a; the generation module 1001 may perform steps 402 to 404 in the embodiment shown in fig. 4a; the generation module 1001 may also perform step 407 in the embodiment shown in fig. 4a; and the determination module 1002 may perform steps 405 and 406 in the embodiment shown in fig. 4a.
As is apparent from the foregoing description of the embodiments, in the embodiments of the present application, the super-resolution image processing apparatus 1000 identifies an acquired low-resolution image using the first network model and determines detail-rich image blocks and detail-poor image blocks. The detail-rich image blocks are processed with the super-resolution network model, whereas the detail-poor image blocks are merely enlarged before joining the super-resolution processing, which effectively reduces the amount of computation and, in turn, the power consumption of the computing device on which the super-resolution image processing apparatus is deployed. For the detail-rich image blocks, similar image blocks are further determined using an XOR matching algorithm, which reduces the amount of computation and improves the definition of the super-resolution image while improving the matching precision of similar image blocks. Finally, super-resolution processing is performed with several different reference images, which further improves the definition of the super-resolution image.
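The XOR matching mentioned above lends itself to a compact sketch: binarize two feature maps and score how many positions agree. The mean-threshold binarization and the normalization to [0, 1] are assumptions; the exact formula appears only as an image in the claims below.

```python
import numpy as np

def xor_similarity(p, q):
    """Sketch of XOR matching between two N x N feature maps: binarize
    each map against its own mean, XOR the resulting bit planes, and
    return the fraction of positions where the maps agree."""
    bp = p > p.mean()
    bq = q > q.mean()
    disagree = np.logical_xor(bp, bq)
    return 1.0 - disagree.mean()  # 1.0 means identical bit planes
```

Because the comparison runs on single-bit planes, it can be implemented with bitwise XOR and population-count operations, which is where the computational saving over floating-point matching comes from.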
Referring to fig. 11, fig. 11 is a schematic structural diagram of a computing device provided in an embodiment of the present application. The super-resolution image processing apparatus 1000 described in the embodiment corresponding to fig. 10 may be deployed on the computing device 1100 to implement the functions of the super-resolution image processing apparatus in the embodiment corresponding to fig. 10. Specifically, the computing device 1100 may be a terminal device, or one computing device in a cloud computing device system or an edge computing device system. Computing device 1100 may vary widely in configuration or performance and may include one or more central processing units (CPUs) 1122 (e.g., one or more processors), memory 1132, and one or more storage media 1130 (e.g., one or more mass storage devices) storing application programs 1142 or data 1144. The memory 1132 and the storage medium 1130 may each be transient storage or persistent storage. The program stored on the storage medium 1130 may include one or more modules (not shown), each of which may include a series of instruction operations for the computing device. Still further, the central processor 1122 may be configured to communicate with the storage medium 1130 and to execute, on the computing device 1100, the series of instruction operations stored in the storage medium 1130.
The computing device 1100 may also include one or more power supplies 1126, one or more wired or wireless network interfaces 1150, one or more input/output interfaces 1158, and/or one or more operating systems 1141, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
In the embodiment of the present application, the central processor 1122 is configured to execute the super-resolution image processing method described above.
It should be noted that the specific manner in which the central processor 1122 executes the above steps is based on the same concept as the method embodiments described above in the present application, and its technical effect is the same as that of those method embodiments; for details, refer to the descriptions of the foregoing method embodiments, which are not repeated herein.
It should be noted that the processor in the embodiments of the present application may be an integrated circuit chip having signal processing capability. In implementation, the steps of the above method embodiments may be completed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logical blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed in the embodiments of the present application may be directly performed by a hardware decoding processor, or performed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or a register. The storage medium is located in a memory, and the processor reads the information in the memory and completes the steps of the above methods in combination with its hardware.
It will be appreciated that the memory in the embodiments of the present application can be volatile memory or nonvolatile memory, or can include both. The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which acts as an external cache. By way of example but not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DR RAM). It should be noted that the memory of the systems and methods described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
An embodiment of the present application further provides a computer program product, which when run on a computer, causes the computer to execute the steps performed by the super-resolution image processing apparatus in the method as described in the foregoing embodiment.
An embodiment of the present application also provides a computer-readable storage medium in which a program for performing super-resolution image processing is stored, which, when run on a computer, causes the computer to perform the steps performed by the super-resolution image processing apparatus in the method as described in the foregoing embodiment.
An embodiment of the present application further provides a chip, where the chip includes a processing unit and a communication unit. The processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit. The processing unit can execute the computer-executable instructions stored in the storage unit, so that the chip in the execution device performs the super-resolution image processing method described in the above embodiments, including the construction of the training set. Optionally, the storage unit is a storage unit in the chip, such as a register or a cache; the storage unit may also be a storage unit located outside the chip in the super-resolution image processing apparatus, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).
Specifically, referring to fig. 12, fig. 12 is a schematic structural diagram of a chip provided in an embodiment of the present application. The chip may be implemented as a neural network processing unit (NPU) 1200. The NPU 1200 is mounted on a host CPU as a coprocessor, and the host CPU allocates tasks to it. The core of the NPU is the arithmetic circuit 1203; the controller 1204 controls the arithmetic circuit 1203 to fetch matrix data from memory and perform multiplication.
In some implementations, the arithmetic circuit 1203 internally includes multiple processing elements (PEs). In some implementations, the arithmetic circuit 1203 is a two-dimensional systolic array; it may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 1203 is a general-purpose matrix processor.
For example, assume that there are an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to matrix B from the weight memory 1202 and buffers it in each PE of the arithmetic circuit. The arithmetic circuit then takes the data of matrix A from the input memory 1201, performs the matrix operation with matrix B, and stores the partial or final results of the resulting matrix in the accumulator 1208.
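This data flow can be mimicked in software. The following toy sketch (all names hypothetical) keeps matrix B fixed, streams matrix A in tiles, and sums partial products into an accumulator, mirroring the roles of the weight memory 1202, the input memory 1201, and the accumulator 1208:

```python
import numpy as np

def npu_matmul(a, b, tile=4):
    """Toy analogue of the NPU data flow: matrix B is 'cached' once
    (weight memory), A is streamed tile by tile (input memory), and
    partial products are summed into an accumulator."""
    m, k = a.shape
    k2, n = b.shape
    assert k == k2
    acc = np.zeros((m, n), dtype=np.float64)      # accumulator
    for t in range(0, k, tile):                   # stream A in tiles
        acc += a[:, t:t + tile] @ b[t:t + tile, :]  # partial results
    return acc
```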
The unified memory 1206 is used for storing input data and output data. The weight data is transferred to the weight memory 1202 through the direct memory access controller (DMAC) 1205, and the input data is likewise carried into the unified memory 1206 by the DMAC.
The bus interface unit (BIU) 1210 handles the interaction of the AXI bus with the DMAC and the instruction fetch buffer (IFB) 1209: it is used by the instruction fetch buffer 1209 to fetch instructions from the external memory, and by the memory access controller 1205 to fetch the original data of the input matrix A or the weight matrix B from the external memory.
The DMAC is mainly used to transfer input data from the external memory (DDR) to the unified memory 1206, to transfer weight data into the weight memory 1202, or to transfer input data into the input memory 1201.
The vector calculation unit 1207 includes a plurality of operation processing units and, when necessary, performs further processing on the output of the arithmetic circuit, such as vector multiplication, vector addition, exponential operation, logarithmic operation, and magnitude comparison. It is mainly used for non-convolutional/fully connected layer computation in the neural network, such as batch normalization, pixel-level summation, and up-sampling of feature maps.
In some implementations, the vector calculation unit 1207 can store the processed output vector to the unified memory 1206. For example, the vector calculation unit 1207 may apply a linear and/or nonlinear function to the output of the arithmetic circuit 1203, such as linear interpolation of the feature maps extracted by the convolution layers, or the generation of activation values from a vector of accumulated values. In some implementations, the vector calculation unit 1207 generates normalized values, pixel-level summed values, or both. In some implementations, the processed output vector can be used as the activation input to the arithmetic circuit 1203, for example for use in subsequent layers of the neural network.
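As a rough software analogue of this post-processing stage, the sketch below normalizes the accumulator output and applies an activation; the ReLU choice and the scalar batch-norm parameters are illustrative assumptions:

```python
import numpy as np

def vector_unit(acc_out, gamma=1.0, beta=0.0, eps=1e-5):
    """Sketch of the vector calculation unit's work on the arithmetic
    circuit's output: batch-normalization-style scaling followed by a
    nonlinear activation, yielding the next layer's activation input."""
    norm = (acc_out - acc_out.mean()) / np.sqrt(acc_out.var() + eps)
    return np.maximum(gamma * norm + beta, 0.0)  # ReLU activation
```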
The instruction fetch buffer (IFB) 1209, connected to the controller 1204, is configured to store the instructions used by the controller 1204.
The unified memory 1206, the input memory 1201, the weight memory 1202, and the instruction fetch buffer 1209 are all on-chip memories; the external memory is private to the NPU hardware architecture.
The operations of the layers in the super-resolution network models shown in fig. 4a to 8a can be performed by the arithmetic circuit 1203 or the vector calculation unit 1207.
Any of the aforementioned processors may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits configured to control execution of a program implementing the method of the first aspect.
It should be noted that the above-described embodiments of the apparatus are merely schematic, where the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. In addition, in the drawings of the embodiments of the apparatus provided in the present application, the connection relationship between the modules indicates that there is a communication connection therebetween, and may be implemented as one or more communication buses or signal lines.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present application can be implemented by software plus the necessary general-purpose hardware, and certainly also by special-purpose hardware including application-specific integrated circuits, special-purpose CPUs, special-purpose memories, special-purpose components, and the like. Generally, any function performed by a computer program can easily be implemented by corresponding hardware, and the specific hardware structures used to implement the same function can be various, such as analog circuits, digital circuits, or dedicated circuits. For the present application, however, a software implementation is usually preferable. Based on such understanding, the technical solutions of the present application may be substantially embodied in the form of a software product, which is stored in a readable storage medium, such as a floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc of a computer, and includes several instructions for causing a computer device to execute the methods according to the embodiments of the present application.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they can be transmitted from one website, computer, super-resolution image processing apparatus, computing device, or data center to another by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium can be any available medium that a computer can access, or a data storage device, such as a training device or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid-state drive (SSD)), among others.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Additionally, the terms "system" and "network" are often used interchangeably herein. The term "and/or" herein merely describes an association between associated objects, meaning that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship.
It should be understood that in the embodiment of the present application, "B corresponding to a" means that B is associated with a, from which B can be determined. It should also be understood that determining B from a does not mean determining B from a alone, but may be determined from a and/or other information.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of both. To illustrate clearly the interchangeability of hardware and software, the components and steps of the examples have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be substantially implemented or contributed to by the prior art, or all or part of the technical solution may be embodied in a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method of the embodiments of the present application.
In short, the above description is only a preferred embodiment of the present application and is not intended to limit its protection scope. Any modification, equivalent replacement, improvement, and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (27)

1. A super-resolution image processing method, comprising:
generating detail-rich image blocks and detail-poor image blocks according to a low-resolution image, wherein the sizes of the detail-rich image blocks and the detail-poor image blocks are smaller than the size of the low-resolution image, and the amount of image feature information included in the detail-rich image blocks is greater than the amount of image feature information included in the detail-poor image blocks;
determining similar image blocks according to the detail-rich image blocks, wherein the similarity between the image feature information included in the similar image blocks and the image feature information included in the detail-rich image blocks is greater than a first threshold;
and performing super-resolution processing on the similar image blocks and the low-resolution image to generate a super-resolution image, wherein the similar image blocks are used as a reference image of the low-resolution image.
2. The method of claim 1, wherein the generating the detail-rich image blocks and the detail-poor image blocks from the low resolution image comprises:
generating an image block set according to the low-resolution image;
performing convolution processing on the image blocks in the image block set to generate a first convolution data set;
and carrying out binary classification processing on the first convolution data set, and determining the detail-rich image blocks and the detail-poor image blocks.
3. The method according to claim 2, wherein said determining the similar image blocks from the detail-rich image blocks comprises:
determining an image feature data set corresponding to the detail-rich image blocks according to the detail-rich image blocks;
carrying out binarization processing on the determined image feature data set to obtain the similarity of any two image blocks in the detail-rich image blocks;
and when the similarity of any two image blocks is greater than the first threshold, determining the similar image blocks.
4. The method according to claim 3, wherein the similarity between any two image blocks satisfies the following condition:
[Formula, reproduced in the original publication only as image FDA0002309483560000011; it defines the similarity F in terms of the quantities explained below.]
wherein F is the similarity, N is the image size of the detail-rich image block, P (i, j) and Q (i, j) are the feature maps of the image blocks corresponding to any two detail-rich image blocks, respectively, i is an abscissa value of the feature map pixel of the image block, and j is an ordinate value of the feature map pixel of the image block.
5. The method according to claim 1, wherein when the low resolution image is a frame in a video, the determining the similar image blocks from the detail-rich image blocks comprises:
determining a location of the low resolution image in the video;
determining a second image frame according to the position, wherein the second image frame is an adjacent frame of the low-resolution image in the video;
and determining the image block at the position corresponding to the detail-rich image block in the second image frame as the similar image block.
6. The method according to any one of claims 3-5, wherein the performing super-resolution processing on the similar image blocks and the low-resolution image to generate the super-resolution image comprises:
performing image exchange processing on the detail-rich image blocks and the similar image blocks to generate a first exchanged image;
determining a feature map of the similar image block according to the similar image block, and performing image exchange processing on the feature map of the similar image block and the feature map of the low-resolution image block to generate a second exchanged image, wherein the feature map is used for indicating image feature information of the image block;
performing super-resolution processing on the first exchanged image, the second exchanged image and the low-resolution image to generate a first image;
generating the super-resolution image from the first image.
7. The method of claim 6, wherein the generating the super-resolution image from the first image comprises:
acquiring a high-definition image, wherein the resolution of the high-definition image is greater than that of the low-resolution image;
performing amplification processing on the detail-poor image blocks to generate an enlarged image, wherein the amplification processing comprises bicubic interpolation processing;
and performing super-resolution processing on the high-definition image, the enlarged image and the first image to generate the super-resolution image, wherein the high-definition image and the enlarged image serve as reference images of the first image.
8. The method of claim 7, wherein the high-definition image is from a remote device, and the high-definition image is an image generated by the remote device by performing super-resolution processing on the low-resolution image.
9. The method of claim 7, wherein the high definition images are from a preset high definition gallery comprising at least one of the high definition images.
10. The method of claim 9, wherein prior to the acquiring the high definition image, the method further comprises:
and determining the high-definition image in the preset high-definition gallery according to the first image, wherein the similarity between the high-definition image and the first image is greater than the first threshold value.
11. The method of claim 1, wherein prior to generating detail-rich image blocks and detail-poor image blocks from a low resolution image, the method further comprises:
acquiring an image set;
filtering the image set by using a low-pass filter to generate a first sub-training set, wherein the amount of image feature information included in the images in the first sub-training set is greater than the amount of image feature information included in the detail-poor image blocks;
performing data augmentation processing on the first sub-training set to generate a second sub-training set, wherein the data augmentation processing comprises image inversion, image rotation, image reduction and image stretching;
and generating a first training set according to the first sub-training set and the second sub-training set, wherein the first training set is used for training a first network model, and the first network model is used for generating the detail-rich image blocks and the detail-poor image blocks.
12. The method according to any one of claims 1 to 11, wherein the image feature information comprises edge information of the image, contour information of the image, brightness information of the image, and/or color information of the image.
13. A super-resolution image processing apparatus, comprising:
the generating module is used for generating detail-rich image blocks and detail-poor image blocks according to a low-resolution image, wherein the sizes of the detail-rich image blocks and the detail-poor image blocks are smaller than the size of the low-resolution image, and the amount of image feature information included in the detail-rich image blocks is greater than the amount of image feature information included in the detail-poor image blocks;
the determining module is used for determining similar image blocks according to the detail-rich image blocks, wherein the similarity between the image feature information included in the similar image blocks and the image feature information included in the detail-rich image blocks is greater than a first threshold;
the generating module is further configured to perform super-resolution processing on the similar image blocks and the low-resolution image to generate a super-resolution image, where the similar image blocks are used as a reference image of the low-resolution image.
14. The apparatus of claim 13,
the generating module is specifically configured to generate an image block set according to the low-resolution image;
the determining module is specifically configured to perform convolution processing on image blocks in the image block set to generate a first convolution data set;
the determining module is specifically configured to perform binary classification processing on the first convolution data set, and determine the detail-rich image blocks and the detail-poor image blocks.
15. The apparatus of claim 14,
the generating module is specifically configured to determine an image feature data set corresponding to the detail-rich image blocks according to the detail-rich image blocks;
the generating module is specifically configured to perform binarization processing on the determined image feature data set to obtain the similarity of any two image blocks in the detail-rich image blocks;
the determining module is specifically configured to determine the similar image blocks when the similarity of any two image blocks is greater than the first threshold.
16. The apparatus according to claim 15, wherein the similarity between any two image blocks satisfies:
[Formula, reproduced in the original publication only as image FDA0002309483560000031; it defines the similarity F in terms of the quantities explained below.]
wherein F is the similarity, N is the image size of the detail-rich image block, P (i, j) and Q (i, j) are the feature maps of the image blocks corresponding to any two detail-rich image blocks, respectively, i is an abscissa value of the feature map pixel of the image block, and j is an ordinate value of the feature map pixel of the image block.
17. The apparatus of claim 13,
the determining module is specifically configured to determine a position of the low-resolution image in the video when the low-resolution image is a frame in the video;
the determining module is specifically configured to determine a second image frame according to the position, where the second image frame is an adjacent frame of the low-resolution image in the video;
the determining module is specifically configured to determine that an image block in the second image frame at a position corresponding to the detail-rich image block is the similar image block.
18. The apparatus of any one of claims 15-17,
the generating module is specifically configured to perform image exchange processing on the detail-rich image block and the similar image block to generate a first exchanged image;
the generating module is specifically configured to determine a feature map of the similar image block according to the similar image block, perform image exchange processing on the feature map of the similar image block and the feature map of the low-resolution image block, and generate a second exchanged image, where the feature map is used to indicate image feature information of the image block;
the generating module is specifically configured to perform super-resolution processing on the first exchanged image, the second exchanged image, and the low-resolution image to generate a first image;
the generating module is specifically configured to generate the super-resolution image according to the first image.
19. The apparatus of claim 18, wherein the super-resolution image processing apparatus further comprises an acquisition module;
the acquisition module is used for acquiring a high-definition image, and the resolution of the high-definition image is greater than that of the low-resolution image;
the generating module is specifically configured to perform amplification processing on the detail-poor image blocks to generate an enlarged image, where the amplification processing includes bicubic interpolation processing;
the generating module is specifically configured to perform super-resolution processing on the high-definition image, the enlarged image, and the first image to generate the super-resolution image, where the high-definition image and the enlarged image serve as reference images of the first image.
20. The apparatus of claim 19, wherein the high-definition image is from a remote device, and wherein the high-definition image is an image generated by the remote device by performing super-resolution processing on the low-resolution image.
21. The apparatus of claim 19, wherein the high definition images are from a preset high definition gallery comprising at least one of the high definition images.
22. The apparatus of claim 21,
the determining module is further configured to determine the high-definition image in the preset high-definition gallery according to the first image, and the similarity between the high-definition image and the first image is greater than the first threshold.
23. The apparatus of claim 13,
the acquisition module is further configured to acquire an image set;
the generating module is further configured to perform filtering processing on the image set by using a low-pass filter to generate a first sub-training set, where the amount of image feature information included in the images in the first sub-training set is greater than the amount of image feature information included in the detail-poor image blocks;
the generating module is further configured to perform data augmentation processing on the first sub-training set to generate a second sub-training set, where the data augmentation processing includes image inversion, image rotation, image reduction, and image stretching;
the generating module is further configured to generate a first training set according to the first sub-training set and the second sub-training set, where the first training set is used to train a first network model, and the first network model is used to generate the detail-rich image blocks and the detail-poor image blocks.
24. The apparatus according to any of claims 13-23, wherein the image characteristic information comprises edge information of the image, contour information of the image, brightness information of the image, and/or color information of the image.
25. A computing device, comprising a memory and a processor, wherein
the memory is configured to store computer instructions; and
the processor executes the computer instructions stored in the memory to perform the method of any one of claims 1 to 12.
26. A computer readable storage medium having computer instructions stored thereon which, when executed by a computing device, cause the computing device to perform the method of any of claims 1 to 12.
27. A computer program product comprising instructions which, when run on a computer or processor, cause the computer or processor to carry out the method of any one of claims 1 to 12.
CN201911252760.0A 2019-12-09 2019-12-09 Super-resolution image processing method and related device Pending CN113034358A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201911252760.0A CN113034358A (en) 2019-12-09 2019-12-09 Super-resolution image processing method and related device
PCT/CN2020/134444 WO2021115242A1 (en) 2019-12-09 2020-12-08 Super-resolution image processing method and related apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911252760.0A CN113034358A (en) 2019-12-09 2019-12-09 Super-resolution image processing method and related device

Publications (1)

Publication Number Publication Date
CN113034358A 2021-06-25

Family

ID=76329529

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911252760.0A Pending CN113034358A (en) 2019-12-09 2019-12-09 Super-resolution image processing method and related device

Country Status (2)

Country Link
CN (1) CN113034358A (en)
WO (1) WO2021115242A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113344791B (en) * 2021-07-05 2022-06-10 中山大学 Binocular super-resolution image detection method, system and medium based on cavity convolution and feature fusion
CN113808021A (en) * 2021-09-17 2021-12-17 北京金山云网络技术有限公司 Image processing method and device, image processing model training method and device, and electronic equipment
CN113628116B (en) * 2021-10-12 2022-02-11 腾讯科技(深圳)有限公司 Training method and device for image processing network, computer equipment and storage medium
CN114862796A (en) * 2022-05-07 2022-08-05 北京卓翼智能科技有限公司 A unmanned aerial vehicle for fan blade damage detects
CN117931330A (en) * 2022-10-25 2024-04-26 华为技术有限公司 Display method and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010108205A (en) * 2008-10-30 2010-05-13 Hitachi Ltd Super resolution image creating method
CN103632359A (en) * 2013-12-13 2014-03-12 清华大学深圳研究生院 Super-resolution processing method for videos
CN109741258A (en) * 2018-12-25 2019-05-10 广西大学 Image super-resolution method based on reconstruction

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103985085A (en) * 2014-05-26 2014-08-13 三星电子(中国)研发中心 Image super-resolution amplifying method and device

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023272432A1 (en) * 2021-06-28 2023-01-05 华为技术有限公司 Image processing method and image processing apparatus
CN116453028A (en) * 2023-06-13 2023-07-18 荣耀终端有限公司 Video processing method, storage medium and electronic device
CN116453028B (en) * 2023-06-13 2024-04-26 荣耀终端有限公司 Video processing method, storage medium and electronic device

Also Published As

Publication number Publication date
WO2021115242A1 (en) 2021-06-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination