CN113763249A

CN113763249A - Text image super-resolution reconstruction method and related equipment thereof

Info

Publication number: CN113763249A
Application number: CN202111061974.7A
Authority: CN
Inventors: 郑喜民; 翟尤; 舒畅; 陈又新
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2021-09-10
Filing date: 2021-09-10
Publication date: 2021-12-07
Also published as: WO2023035531A1

Abstract

The embodiments of this application belong to the technical field of artificial intelligence, are applicable to smart healthcare, and relate to a text image super-resolution reconstruction method and related equipment. A low-resolution picture is input into a scene text recognition model to obtain text position information and text content information; a text mask is generated based on the text position information and the text content information, and the text mask is upsampled to obtain a target mask; the low-resolution picture and the target mask are input into an adversarial network to obtain a discrimination result, and the discrimination accuracy is calculated from the discrimination result; a loss function is calculated based on the low-resolution picture and the target mask until the loss function converges and the discrimination accuracy falls below an accuracy threshold, yielding a trained adversarial network; and a received low-resolution picture to be converted is input into the trained adversarial network to obtain a target super-resolution picture. The trained adversarial network may be stored in a blockchain. The method and equipment ensure the quality of super-resolution reconstruction of text images.

Description

Text image super-resolution reconstruction method and related equipment thereof

Technical Field

The application relates to the technical field of artificial intelligence, in particular to a text image super-resolution reconstruction method and related equipment.

Background

Super-resolution reconstruction means that, for any given low-resolution picture, a corresponding high-resolution picture is generated by a convolutional neural network while the details and textures of the picture are preserved and restored as far as possible. Super-resolution reconstruction supports the development of related fields such as image classification, segmentation, tracking, and defogging, and plays an important role in the development of neural networks.

However, text pictures differ from natural scenes: text has fixed shapes and sharp edges, so the reconstruction requirements are higher. For ordinary pictures, most of the content is natural scenery, and converting a low-resolution picture into a high-resolution picture is relatively easy. For text in a scene, however, if the reconstructed picture shows distortion, abrupt color changes, or character edges that blend into the surrounding scene and become blurred, the quality of the reconstructed picture is noticeably reduced.

Disclosure of Invention

The embodiment of the application aims to provide a text image super-resolution reconstruction method and related equipment thereof, so that the quality of super-resolution reconstruction of a text image is guaranteed.

In order to solve the above technical problem, an embodiment of the present application provides a text image super-resolution reconstruction method, which adopts the following technical solutions:

a text image super-resolution reconstruction method comprises the following steps:

receiving a low-resolution picture and a corresponding high-resolution picture, inputting the low-resolution picture into a pre-trained scene text recognition model, and obtaining output text position information and text content information;

generating a text mask based on the text position information and the text content information, and performing up-sampling on the text mask to obtain a target mask;

inputting the low-resolution picture and the target mask into a preset generation layer of an adversarial network to obtain an output super-resolution picture;

inputting the super-resolution picture and the high-resolution picture simultaneously into a discrimination layer of the adversarial network to obtain an output discrimination result, and calculating the discrimination accuracy based on the discrimination result;

calculating a loss function of the adversarial network based on the low-resolution picture and the target mask until the loss function converges and the discrimination accuracy is lower than an accuracy threshold, to obtain the trained adversarial network;

and receiving a low-resolution picture to be converted, and inputting it into the trained adversarial network to obtain an output target super-resolution picture.

Further, the step of generating a text mask based on the text position information and the text content information includes:

correcting the text position information based on the text content information to obtain target text position information;

generating the text mask based on the target text position information.

Further, the step of calculating the loss function of the adversarial network based on the low-resolution picture and the target mask includes:

calculating a content loss function of the adversarial network based on the low-resolution picture, the content loss function being characterized by:

$$l^{MSE} = \frac{1}{r^{2}WH}\sum_{x=1}^{rW}\sum_{y=1}^{rH}\left(I^{HR}_{x,y} - G_{\theta_G}(I^{LR})_{x,y}\right)^{2}$$

where $l^{MSE}$ is the content loss function, $I^{HR}_{x,y}$ is the value of the pixel of the high-resolution picture at position (x, y), $G_{\theta_G}(I^{LR})_{x,y}$ is the value of the pixel of the super-resolution picture at position (x, y), rW and rH are the width and length of the super-resolution picture, respectively, and $r^{2}WH$ is the total number of pixels in the super-resolution picture.

calculating an adversarial loss function of the adversarial network based on the low-resolution picture, the adversarial loss function being characterized by:

$$l^{Adv} = \sum_{m=1}^{M} -\log D_{\theta_D}\left(G_{\theta_G}(I^{LR})\right)$$

where $l^{Adv}$ is the adversarial loss function, $G_{\theta_G}(I^{LR})$ is the super-resolution picture, $D_{\theta_D}$ is the discrimination layer, and M is the total number of super-resolution pictures.

calculating a regularization loss function of the adversarial network based on the low-resolution picture, the regularization loss function being characterized by:

$$l^{TV} = \frac{1}{r^{2}WH}\sum_{x=1}^{rW}\sum_{y=1}^{rH}\left\|\nabla G_{\theta_G}(I^{LR})_{x,y}\right\|$$

where $l^{TV}$ is the regularization loss function, $G_{\theta_G}(I^{LR})_{x,y}$ is the value of the pixel of the super-resolution picture at position (x, y), rW and rH are the width and length of the target mask, respectively, $r^{2}WH$ is the total number of pixels in the target mask, $\|\cdot\|$ denotes a norm, and $\nabla$ denotes the gradient.

calculating a text perception loss function of the adversarial network based on the low-resolution picture, the text perception loss function $l^{TR}$ being obtained by calculating, at the positions where the target mask indicates text, the pixel-value differences of the super-resolution picture at the corresponding positions, summing these differences, and dividing the sum by N, where N is the total number of pixel points at positions where text exists.

Further, the step of upsampling the text mask to obtain a target mask includes:

performing multi-fold upsampling on the text mask to obtain the target mask.

In order to solve the above technical problem, an embodiment of the present application further provides a text image super-resolution reconstruction apparatus, which adopts the following technical solutions:

a text image super-resolution reconstruction apparatus includes:

the receiving module is used for receiving a low-resolution picture and a corresponding high-resolution picture, inputting the low-resolution picture into a pre-trained scene text recognition model, and obtaining output text position information and text content information;

the up-sampling module is used for generating a text mask based on the text position information and the text content information and up-sampling the text mask to obtain a target mask;

the generation module is used for inputting the low-resolution picture and the target mask into a preset generation layer of an adversarial network to obtain an output super-resolution picture;

the discrimination module is used for simultaneously inputting the super-resolution picture and the high-resolution picture into a discrimination layer of the adversarial network to obtain an output discrimination result, and calculating the discrimination accuracy based on the discrimination result;

the calculation module is used for calculating a loss function of the adversarial network based on the low-resolution picture and the target mask until the loss function converges and the discrimination accuracy is lower than an accuracy threshold, to obtain the trained adversarial network;

and the obtaining module is used for receiving a low-resolution picture to be converted and inputting it into the trained adversarial network to obtain an output target super-resolution picture.

In order to solve the above technical problem, an embodiment of the present application further provides a computer device, which adopts the following technical solutions:

a computer device comprising a memory having computer readable instructions stored therein and a processor, the processor implementing the steps of the text image super-resolution reconstruction method described above when executing the computer readable instructions.

In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium, which adopts the following technical solutions:

a computer readable storage medium having computer readable instructions stored thereon, which when executed by a processor, implement the steps of the above-mentioned text image super-resolution reconstruction method.

Compared with the prior art, the embodiment of the application mainly has the following beneficial effects:

In the embodiments of this application, the text position information and text content information of a received low-resolution picture are obtained, and a text mask is generated from them. Because the text mask takes both the position and the content of the text into account, the boundary between the text and the surrounding image can be determined, so that the characters in the subsequently generated super-resolution picture are sharp and the quality of the reconstructed picture is markedly improved. Upsampling the text mask increases its resolution, which facilitates the subsequent generation of the super-resolution picture. The trained adversarial network is obtained through adversarial training of the generation layer and the discrimination layer, and is used to generate a higher-quality target super-resolution picture.

Drawings

In order to more clearly illustrate the solution of the present application, the drawings needed for describing the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.

FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;

FIG. 2 is a flow diagram of one embodiment of a text image super resolution reconstruction method according to the present application;

FIG. 3 is a schematic structural diagram of an embodiment of a text image super-resolution reconstruction apparatus according to the present application;

FIG. 4 is a schematic block diagram of one embodiment of a computer device according to the present application.

Reference numerals: 200. a computer device; 201. a memory; 202. a processor; 203. a network interface; 300. a text image super-resolution reconstruction apparatus; 301. a receiving module; 302. an upsampling module; 303. a generation module; 304. a discrimination module; 305. a calculation module; 306. an obtaining module.

Detailed Description

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.

Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.

In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.

As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.

The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages and the like. The terminal devices 101, 102, 103 may have various communication client applications installed, such as a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.

The terminal devices 101, 102, 103 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 players (MPEG-1 Audio Layer III), MP4 players (MPEG-4 Part 14), laptop computers, desktop computers, and the like.

The server 105 may be a server providing various services, such as a background server that supports the pages displayed on the terminal devices 101, 102, 103.

It should be noted that the text image super-resolution reconstruction method provided in the embodiments of the present application is generally executed by a server/terminal device, and accordingly, the text image super-resolution reconstruction apparatus is generally disposed in the server/terminal device.

It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

With continuing reference to FIG. 2, a flow diagram of one embodiment of a method for super-resolution reconstruction of text images according to the present application is shown. The text image super-resolution reconstruction method comprises the following steps:

S1: receiving a low-resolution picture and a corresponding high-resolution picture, inputting the low-resolution picture into a pre-trained scene text recognition model, and obtaining output text position information and text content information.

In this embodiment, the size of the low-resolution picture (LR image) is W × H. The low-resolution picture is input into a scene text recognition model to obtain the position and content of the scene text. The scene text recognition model used in this application is the text recognition model ASTER (ASTER: An Attentional Scene Text Recognizer with Flexible Rectification), trained in advance.

In this embodiment, an electronic device on which the text image super-resolution reconstruction method runs (e.g., the server/terminal device shown in fig. 1) may receive the low-resolution picture and the corresponding high-resolution picture through a wired or wireless connection. It should be noted that the wireless connection may include, but is not limited to, a 3G/4G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a Zigbee connection, a UWB (ultra-wideband) connection, and other wireless connections now known or developed in the future.

S2: generating a text mask based on the text position information and the text content information, and upsampling the text mask to obtain a target mask.

In this embodiment, a text mask that emphasizes only the text portion is generated based on the text position information and the text content information; the text mask has the same size as the low-resolution picture. In the text mask, pixels where text exists are marked 1 and pixels where no text exists are marked 0, giving a two-dimensional binary mask of size H × W, which is the text mask of the low-resolution picture. The text mask is upsampled to obtain a new text mask, the target mask, of size rW × rH, which matches the size of the generated high-resolution (super-resolution) picture; the received high-resolution picture is likewise resized to the size of the target mask to facilitate subsequent calculation. The target mask is subsequently used to supervise the output of the generation layer, i.e., the super-resolution picture. With this approach, the high-resolution picture does not need to be annotated, and the scene text recognition stage is complete.
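The binary mask construction described above can be sketched in Python with NumPy. This is a minimal sketch, assuming the text position information arrives as axis-aligned boxes `(x0, y0, x1, y1)` in low-resolution pixel coordinates with exclusive right/bottom edges; the box format and the helper name `build_text_mask` are hypothetical, since the application does not specify how the recognizer's output is converted into positions.

```python
import numpy as np


def build_text_mask(height, width, boxes):
    """Return an H x W mask with 1 where text exists and 0 elsewhere.

    Assumption: each box is (x0, y0, x1, y1) in low-resolution pixel
    coordinates, with x1 and y1 exclusive (hypothetical format).
    """
    mask = np.zeros((height, width), dtype=np.uint8)
    for x0, y0, x1, y1 in boxes:
        # Clip to the picture so negative or oversized coordinates
        # neither wrap around (negative slicing) nor raise.
        x0, x1 = max(0, x0), min(width, x1)
        y0, y1 = max(0, y0), min(height, y1)
        if x0 < x1 and y0 < y1:
            mask[y0:y1, x0:x1] = 1
    return mask


# One word occupying columns 1-3 and rows 1-2 of a 4 x 6 picture.
mask = build_text_mask(4, 6, [(1, 1, 4, 3)])
print(int(mask.sum()))  # 6
```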

Specifically, the step of generating a text mask based on the text position information and the text content information includes:

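correcting the text position information based on the text content information to obtain target text position information;

generating the text mask based on the target text position information.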

In this embodiment, the text mask is generated from both the text position information and the text content information. Some regions of a picture are unclear, or the recognized position is wrong; taking the recognized text content into account lets the computer generate a more accurate mask. For example, for the word "good", a mask generated without considering the content may not match the word correctly, whereas when the network takes the content into account, the output mask corresponds to "good".

In addition, as another embodiment of the present application, the step of upsampling the text mask to obtain a target mask includes:

performing multi-fold upsampling on the text mask to obtain the target mask.

In this embodiment, the text mask is upsampled by a factor of 5 (r = 5), which enlarges it five times and increases its resolution, so that the generated super-resolution picture is 5 times the width and height of the low-resolution picture.
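The r-fold upsampling of the binary text mask can be sketched as follows. Nearest-neighbour repetition is an assumption (the application only states that the mask is upsampled); it is chosen here because it keeps every value at exactly 0 or 1, whereas bilinear or bicubic interpolation would create fractional values along text edges. The helper name `upsample_mask` is hypothetical.

```python
import numpy as np


def upsample_mask(mask, r=5):
    """Nearest-neighbour upsampling of an H x W binary mask to rH x rW.

    Each mask pixel is repeated r times along both axes, so the result
    stays strictly binary (assumed interpolation scheme).
    """
    if r < 1 or int(r) != r:
        raise ValueError("r must be a positive integer")
    r = int(r)
    return np.repeat(np.repeat(mask, r, axis=0), r, axis=1)


text_mask = np.array([[0, 1], [1, 0]], dtype=np.uint8)
target_mask = upsample_mask(text_mask, r=5)
print(target_mask.shape)  # (10, 10)
```

In a PyTorch pipeline the same effect can be obtained with `torch.nn.functional.interpolate(..., scale_factor=5, mode="nearest")` on a 4-D tensor.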

S3: inputting the low-resolution picture and the target mask into a preset generation layer of an adversarial network to obtain an output super-resolution picture.

In this embodiment, after scene text recognition, the super-resolution picture is generated by the generation layer of a generative adversarial network (GAN): the low-resolution picture and the target mask are input together into the generation layer (Generator), which generates a super-resolution picture (SR image).

S4: inputting the super-resolution picture and the high-resolution picture simultaneously into a discrimination layer of the adversarial network to obtain an output discrimination result, and calculating the discrimination accuracy based on the discrimination result.

In this embodiment, the super-resolution picture and the high-resolution picture (HR image) are input simultaneously into the discrimination layer (Discriminator), and the discrimination layer outputs a discrimination result indicating whether an input is the super-resolution picture or the high-resolution picture. For example, the discrimination layer outputs 0 or 1, where 0 means "the picture is a generated picture (super-resolution picture)" and 1 means "the picture is a real picture (high-resolution picture)". Through adversarial training of the generation layer and the discrimination layer, the super-resolution pictures produced by the generation layer become increasingly similar to natural-scene high-resolution pictures and increasingly hard to distinguish from them. When the accuracy of the discrimination layer falls below the accuracy threshold, the discrimination layer can no longer reliably tell real pictures from the super-resolution pictures generated by the generation layer; the generated pictures are then of high quality and close to real pictures, i.e., the training objective is achieved and the network can be used in practical applications. The accuracy is the ratio of the number of correct discrimination results output by the discrimination layer within a preset time period to the total number of discriminations in that period.
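The accuracy computation described above can be sketched as below. This assumes the discrimination layer's outputs have already been thresholded to 0 (generated) or 1 (real), following the convention in the text, and that the caller collects the decisions made within the preset time period; `discrimination_accuracy` is a hypothetical helper.

```python
import numpy as np


def discrimination_accuracy(predictions, labels):
    """Ratio of correct discrimination results to total discriminations.

    predictions, labels: equal-length sequences of 0/1, where
    0 = generated (super-resolution) and 1 = real (high-resolution).
    """
    predictions = np.asarray(predictions)
    labels = np.asarray(labels)
    if predictions.shape != labels.shape or predictions.size == 0:
        raise ValueError("need equally sized, non-empty inputs")
    return float(np.mean(predictions == labels))


# Two of four decisions in the period are correct.
print(discrimination_accuracy([1, 0, 1, 1], [1, 0, 0, 0]))  # 0.5
```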

S5: calculating a loss function of the adversarial network based on the low-resolution picture and the target mask until the loss function converges and the discrimination accuracy is lower than an accuracy threshold, to obtain the trained adversarial network.

In this embodiment, the loss functions mainly used for the pictures generated by the generative adversarial network include a content loss function (content loss), an adversarial loss function (adversarial loss), a regularization loss function (regularization loss), and a text perceptual loss function (text perceptual loss) designed around the text mask. When the loss function has converged and the discrimination accuracy is lower than the accuracy threshold, training of the adversarial network is determined to be complete, yielding an adversarial network with better performance.
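The two-part stopping condition (loss converged and discrimination accuracy below the threshold) can be expressed as a small check. The application does not define how convergence is detected, so the window-based test below, in which the last `window` loss values must differ by less than `tol`, is an illustrative assumption, as are the parameter names; the accuracy threshold is left to the caller.

```python
def training_finished(loss_history, accuracy, accuracy_threshold,
                      window=5, tol=1e-4):
    """Return True when the loss has converged AND the discrimination
    accuracy is below the threshold.

    Convergence (assumed criterion): the last `window` recorded loss
    values span a range smaller than `tol`.
    """
    if len(loss_history) < window:
        return False  # not enough history to judge convergence yet
    recent = loss_history[-window:]
    converged = max(recent) - min(recent) < tol
    return converged and accuracy < accuracy_threshold


# A flat loss curve with 48% discriminator accuracy against a 0.5 threshold.
print(training_finished([0.31] * 5, accuracy=0.48, accuracy_threshold=0.5))  # True
```

Requiring both conditions avoids stopping on a discriminator that is merely weak early in training while the generator's loss is still moving.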

Specifically, the step of calculating the loss function of the adversarial network based on the low-resolution picture and the target mask includes:

calculating a content loss function of the adversarial network based on the low-resolution picture, the content loss function being characterized by:

$$l^{MSE} = \frac{1}{r^{2}WH}\sum_{x=1}^{rW}\sum_{y=1}^{rH}\left(I^{HR}_{x,y} - G_{\theta_G}(I^{LR})_{x,y}\right)^{2}$$

where $l^{MSE}$ is the content loss function, $I^{HR}_{x,y}$ is the value of the pixel of the high-resolution picture at position (x, y), $G_{\theta_G}(I^{LR})_{x,y}$ is the value of the pixel of the super-resolution picture at position (x, y), rW and rH are the width and length of the super-resolution picture, respectively, and $r^{2}WH$ is the total number of pixels in the super-resolution picture.

In this embodiment, the content loss function is calculated as the mean squared error between the super-resolution picture and the high-resolution picture, both of width rW and length rH: the squared differences of the pixel values at all positions are summed, and the sum is divided by the number of pixels to give the content loss. It measures the loss between the super-resolution picture and the high-resolution picture.
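The mean-squared-error content loss described above can be sketched with NumPy as follows. The helper name `content_loss` is hypothetical, and the high-resolution picture is assumed to have already been resized to the super-resolution size, as in step S2.

```python
import numpy as np


def content_loss(hr, sr):
    """Mean squared error between the high-resolution picture and the
    super-resolution picture, averaged over all rW x rH pixel positions
    (and over channels, if the arrays carry a channel axis)."""
    hr = np.asarray(hr, dtype=np.float64)
    sr = np.asarray(sr, dtype=np.float64)
    if hr.shape != sr.shape:
        raise ValueError("HR and SR pictures must have the same size")
    return float(np.mean((hr - sr) ** 2))


hr = np.zeros((2, 2))
sr = np.array([[1.0, 0.0], [0.0, 1.0]])
print(content_loss(hr, sr))  # 0.5
```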

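As another embodiment of the present application, the step of calculating the loss function of the adversarial network based on the low-resolution picture and the target mask includes:

calculating an adversarial loss function of the adversarial network based on the low-resolution picture, the adversarial loss function being characterized by:

$$l^{Adv} = \sum_{m=1}^{M} -\log D_{\theta_D}\left(G_{\theta_G}(I^{LR})\right)$$

where $l^{Adv}$ is the adversarial loss function, $G_{\theta_G}(I^{LR})$ is the super-resolution picture, and $D_{\theta_D}$ is the discrimination layer.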

In this embodiment, the adversarial loss requires the discrimination layer D to distinguish the super-resolution pictures generated by the generation layer G from the natural high-resolution pictures input to it. Through adversarial training of the generation layer and the discrimination layer, the quality of the super-resolution pictures generated by the network gradually improves. M is the total number of super-resolution pictures input into the discrimination layer.
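Assuming the SRGAN-style generator loss, in which each of the M super-resolution pictures contributes $-\log D_{\theta_D}(G_{\theta_G}(I^{LR}))$, the adversarial term can be sketched with NumPy. The input is taken to be the discrimination layer's probability that each generated picture is real; the `eps` clip that avoids $\log 0$ and the helper name are implementation assumptions.

```python
import numpy as np


def adversarial_loss(d_scores, eps=1e-12):
    """Generator adversarial loss: sum over the M generated pictures of
    -log D(G(I_LR)), where d_scores[m] is the discriminator's probability
    that the m-th super-resolution picture is real (assumed SRGAN form)."""
    d = np.clip(np.asarray(d_scores, dtype=np.float64), eps, 1.0)  # avoid log(0)
    return float(np.sum(-np.log(d)))


# Two generated pictures: one judged surely real, one judged 50/50.
print(adversarial_loss([1.0, 0.5]))  # ln 2, about 0.6931
```

The loss shrinks toward 0 as the discrimination layer is fooled (scores near 1) and grows without bound as it confidently rejects the generated pictures, which is what drives the generation layer during adversarial training.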

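Further, as another embodiment of the present application, the step of calculating the loss function of the adversarial network based on the low-resolution picture and the target mask includes:

calculating a regularization loss function of the adversarial network based on the low-resolution picture, the regularization loss function being characterized by:

$$l^{TV} = \frac{1}{r^{2}WH}\sum_{x=1}^{rW}\sum_{y=1}^{rH}\left\|\nabla G_{\theta_G}(I^{LR})_{x,y}\right\|$$

where $l^{TV}$ is the regularization loss function, $G_{\theta_G}(I^{LR})_{x,y}$ is the value of the pixel of the super-resolution picture at position (x, y), rW and rH are the width and length of the target mask, respectively, $r^{2}WH$ is the total number of pixels in the target mask, $\|\cdot\|$ denotes a norm, and $\nabla$ denotes the gradient.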

In this embodiment, adding the regularization loss function helps prevent the network from overfitting and accelerates the convergence of the overall loss function.
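The regularization term combines a gradient with a norm over the super-resolution picture, which is consistent with a total-variation penalty. The sketch below assumes that form: it approximates the gradient with forward differences (set to zero at the last row and column), takes the L2 norm of the per-pixel gradient vector, and averages over the rW × rH positions. All three choices, the single-channel input, and the helper name are assumptions, since the discretisation is not specified.

```python
import numpy as np


def tv_regularization_loss(sr):
    """Total-variation style penalty on a single-channel SR picture.

    Forward differences approximate the gradient; differences that would
    fall off the last column/row are left at zero. The per-pixel L2 norm
    of (dx, dy) is averaged over all pixel positions.
    """
    sr = np.asarray(sr, dtype=np.float64)
    dx = np.zeros_like(sr)
    dy = np.zeros_like(sr)
    dx[:, :-1] = sr[:, 1:] - sr[:, :-1]  # horizontal differences
    dy[:-1, :] = sr[1:, :] - sr[:-1, :]  # vertical differences
    return float(np.mean(np.sqrt(dx ** 2 + dy ** 2)))


print(tv_regularization_loss([[0.0, 1.0], [0.0, 1.0]]))  # 0.5
```

When this penalty is back-propagated in a training framework, a small constant is usually added inside the square root, because the square root is not differentiable at zero.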

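Further, as another embodiment of the present application, the step of calculating the loss function of the adversarial network based on the low-resolution picture and the target mask includes: calculating a text perception loss function $l^{TR}$ of the adversarial network based on the low-resolution picture, where N is the total number of pixel points at positions where text exists, and the loss is computed from the target mask and the super-resolution picture.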

In this embodiment, the pixel-value differences are calculated between the positions where text exists in the target mask and the corresponding positions of the picture generated by the generation layer, where N is the total number of pixel points at positions where text exists. All the differences are summed and divided by N to obtain the text perception loss. With the text perception loss, the generation layer produces clearer text when constructing a new picture. Pixels at text positions in the mask are marked 1 and all other pixels are marked 0, and the target mask is obtained after upsampling. Through the text perception loss function, the target mask supervises the output of the generation layer so that only the text is emphasized.

S6: receiving a low-resolution picture to be converted, and inputting it into the trained adversarial network to obtain an output target super-resolution picture.

In this embodiment, the trained adversarial network generates a higher-quality target super-resolution picture in which the text information remains clear and complete.

It should be emphasized that, to further ensure the privacy and security of the trained adversarial network, the trained adversarial network may also be stored in a node of a blockchain.

The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, where each data block contains a batch of network transaction information used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and the like.
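The idea of data blocks linked by a cryptographic method can be illustrated with a toy hash chain in which each block stores the SHA-256 hash of its predecessor. This is a single-process sketch only: it has no consensus mechanism, networking, or peer-to-peer transmission, and storing a digest of the serialized trained network (rather than the network itself) is an assumption made for illustration. All helper names are hypothetical.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(block):
    """SHA-256 over the block's index, payload and previous hash."""
    body = {k: block[k] for k in ("index", "payload", "prev")}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_block(chain, payload):
    """Append a block whose `prev` field commits to the last block's hash."""
    prev = chain[-1]["hash"] if chain else GENESIS_PREV
    block = {"index": len(chain), "payload": payload, "prev": prev}
    block["hash"] = block_hash(block)
    chain.append(block)
    return block


def chain_is_valid(chain):
    """Recompute every hash and check each link to the previous block."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i else GENESIS_PREV
        if block["prev"] != expected_prev or block["hash"] != block_hash(block):
            return False
    return True


# Hypothetical: record a digest of the serialized trained network's weights.
weights = b"...serialized network weights..."
ledger = []
append_block(ledger, hashlib.sha256(weights).hexdigest())
print(chain_is_valid(ledger))  # True
```

Changing any stored payload changes its recomputed hash, so `chain_is_valid` detects the tampering; this is the anti-counterfeiting property the paragraph refers to.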

The embodiments of this application can acquire and process related data based on artificial intelligence technology. Artificial Intelligence (AI) is the theory, methods, techniques, and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results.

The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.

This application can be applied to smart healthcare, for example to restore low-resolution pictures in the medical field, thereby supporting the construction of smart cities.

Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by computer-readable instructions directing the relevant hardware. The instructions can be stored in a computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, or a Read-Only Memory (ROM), or may be a Random Access Memory (RAM).

It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not restricted to the exact order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

With further reference to fig. 3, as an implementation of the method shown in fig. 2, the present application provides an embodiment of a text image super-resolution reconstruction apparatus, which corresponds to the embodiment of the method shown in fig. 2, and which can be applied in various electronic devices.

As shown in fig. 3, the text image super-resolution reconstruction apparatus 300 of this embodiment includes a receiving module 301, an upsampling module 302, a generation module 303, a discrimination module 304, a calculation module 305, and an obtaining module 306, where: the receiving module 301 is configured to receive a low-resolution picture and a corresponding high-resolution picture, input the low-resolution picture into a pre-trained scene text recognition model, and obtain output text position information and text content information; the upsampling module 302 is configured to generate a text mask based on the text position information and the text content information, and upsample the text mask to obtain a target mask; the generation module 303 is configured to input the low-resolution picture and the target mask into a preset generation layer of an adversarial network to obtain an output super-resolution picture; the discrimination module 304 is configured to input the super-resolution picture and the high-resolution picture simultaneously into a discrimination layer of the adversarial network, obtain an output discrimination result, and calculate the discrimination accuracy based on the discrimination result; the calculation module 305 is configured to calculate a loss function of the adversarial network based on the low-resolution picture and the target mask until the loss function converges and the discrimination accuracy is lower than an accuracy threshold, to obtain a trained adversarial network; and the obtaining module 306 is configured to receive a low-resolution picture to be converted, input it into the trained adversarial network, and obtain an output target super-resolution picture.

In this embodiment, the text position information and text content information of the received low-resolution picture are obtained, and the text mask is generated from both. Because the mask accounts for where the text is and what it says, it delineates the boundary between the text and the surrounding image, so that the characters in the subsequently generated super-resolution picture are sharp and the quality of the reconstructed picture is markedly improved. Up-sampling the text mask raises its resolution, which facilitates the subsequent generation of the super-resolution picture. Finally, adversarial training of the generation layer against the discrimination layer yields a trained adversarial network, which is used to generate a higher-quality target super-resolution picture.

The upsampling module 302 comprises a correction submodule and a generation submodule: the correction submodule is configured to correct the text position information based on the text content information to obtain target text position information, and the generation submodule is configured to generate the text mask based on the target text position information.
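The text does not say how the correction submodule uses the text content to correct the positions. The sketch below assumes, purely for illustration, that a box is discarded when its recognized text is empty and is otherwise clipped to the picture; the box format `(x0, y0, x1, y1)` with exclusive end coordinates and both function names are hypothetical. The generation step then marks every pixel inside a corrected box as text (1) and every other pixel as background (0).

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # hypothetical format: (x0, y0, x1, y1), end-exclusive pixels


def correct_positions(boxes: Sequence[Box], texts: Sequence[str],
                      width: int, height: int) -> List[Box]:
    """Hypothetical correction: drop boxes with no recognized text, clip the rest to the picture."""
    corrected: List[Box] = []
    for (x0, y0, x1, y1), text in zip(boxes, texts):
        if not text.strip():  # no characters recognized: treat the box as a false detection
            continue
        x0, y0 = max(0, x0), max(0, y0)
        x1, y1 = min(width, x1), min(height, y1)
        if x1 > x0 and y1 > y0:  # keep only boxes that still cover at least one pixel
            corrected.append((x0, y0, x1, y1))
    return corrected


def make_text_mask(boxes: Sequence[Box], width: int, height: int) -> List[List[int]]:
    """Binary mask at low resolution: 1 inside any text box, 0 for background."""
    mask = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        for y in range(y0, y1):
            for x in range(x0, x1):
                mask[y][x] = 1
    return mask
```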

In some optional implementations of this embodiment, the upsampling module 302 is further configured to perform multi-fold up-sampling on the text mask (that is, to enlarge it by a scale factor) to obtain the target mask.
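The interpolation method is not specified. A minimal sketch, assuming nearest-neighbour interpolation by an integer factor `r` (a hypothetical parameter), keeps the mask binary so that text boundaries stay sharp; an H×W mask becomes an rH×rW target mask.

```python
from typing import List


def upsample_nearest(mask: List[List[int]], r: int) -> List[List[int]]:
    """Nearest-neighbour up-sampling: each mask pixel becomes an r x r block."""
    if r < 1:
        raise ValueError("scale factor r must be a positive integer")
    out: List[List[int]] = []
    for row in mask:
        wide = [value for value in row for _ in range(r)]  # repeat each column r times
        for _ in range(r):  # repeat each widened row r times
            out.append(list(wide))
    return out
```

Bilinear interpolation would instead produce fractional values along text edges, which is why nearest-neighbour is a natural choice for a binary mask.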

In some optional implementations of this embodiment, the calculating module 305 is further configured to calculate a content loss function of the adversarial network based on the low-resolution picture, the content loss function being:

$$l_{content} = \frac{1}{r^{2}WH}\sum_{x=1}^{rW}\sum_{y=1}^{rH}\left(I^{HR}_{x,y} - G_{\theta_G}\left(I^{LR}\right)_{x,y}\right)^{2}$$

where $l_{content}$ is the content loss function, $I^{HR}_{x,y}$ is the value of the pixel of the high-resolution picture at position (x, y), $G_{\theta_G}(I^{LR})_{x,y}$ is the value of the pixel of the super-resolution picture at position (x, y), $rW$ and $rH$ are the width and height of the super-resolution picture, respectively, and $r^{2}WH$ is the total number of pixels in the super-resolution picture.
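Reading these definitions as a pixel-wise mean squared error normalised by the r²WH pixels of the super-resolution picture (an assumption, following the usual SRGAN content loss), the computation can be sketched as below; both pictures are assumed to be equally sized 2-D lists of grey-level values, and the function name is hypothetical.

```python
from typing import List


def content_loss(hr: List[List[float]], sr: List[List[float]]) -> float:
    """(1 / r^2WH) * sum over all (x, y) of (I_HR[x, y] - G(I_LR)[x, y])^2."""
    height = len(hr)
    width = len(hr[0]) if height else 0
    if height == 0 or width == 0:
        raise ValueError("pictures must be non-empty")
    if len(sr) != height or any(len(row) != width for row in sr + hr):
        raise ValueError("high- and super-resolution pictures must have the same size")
    total = 0.0
    for hr_row, sr_row in zip(hr, sr):
        for hr_pixel, sr_pixel in zip(hr_row, sr_row):
            total += (hr_pixel - sr_pixel) ** 2
    # height * width equals rH * rW = r^2 W H, the number of super-resolution pixels
    return total / (height * width)
```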

In some optional implementations of this embodiment, the calculating module 305 is further configured to calculate an adversarial loss function of the adversarial network based on the low-resolution picture, the adversarial loss function being:
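The formula of the adversarial loss is not reproduced in this text. As an assumption only, the sketch below uses the generator adversarial loss of SRGAN, the sum over training samples of -log D(G(I^LR)), where D(·) is the discriminator's probability that a super-resolution picture is real; the function name and the `eps` guard against log(0) are hypothetical.

```python
import math
from typing import Sequence


def generator_adversarial_loss(fake_probs: Sequence[float], eps: float = 1e-12) -> float:
    """Sum of -log D(G(I_LR)) over the batch; small when D rates the SR pictures as real."""
    return sum(-math.log(max(p, eps)) for p in fake_probs)
```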

In some optional implementations of this embodiment, the calculating module 305 is further configured to calculate a regularization loss function of the adversarial network based on the low-resolution picture, the regularization loss function being:

where ∇ denotes the gradient.
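The regularization formula is likewise not reproduced. Since the only symbol described is a gradient, one plausible reading, stated here as an assumption, is the total-variation regularizer used in SRGAN: the mean over all pixels of the gradient magnitude of the super-resolution picture, which penalizes noisy, spatially incoherent outputs. The sketch approximates the gradient with forward differences (taken as zero at the right and bottom borders); the function name is hypothetical.

```python
import math
from typing import List


def tv_regularization(sr: List[List[float]]) -> float:
    """Mean L2 norm of the forward-difference gradient of the super-resolution picture."""
    height = len(sr)
    width = len(sr[0]) if height else 0
    if height == 0 or width == 0:
        raise ValueError("picture must be non-empty")
    total = 0.0
    for y in range(height):
        for x in range(width):
            dx = sr[y][x + 1] - sr[y][x] if x + 1 < width else 0.0
            dy = sr[y + 1][x] - sr[y][x] if y + 1 < height else 0.0
            total += math.hypot(dx, dy)  # |gradient| at (x, y)
    return total / (height * width)
```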

In some optional implementations of this embodiment, the calculating module 305 is further configured to calculate a text perception loss function of the adversarial network based on the low-resolution picture, the text perception loss function being:

in which the symbols denote, respectively, the target mask and the super-resolution picture.
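The text perception loss formula is not reproduced either; only its dependence on the target mask and the super-resolution picture survives. The hypothetical sketch below shows one mask-weighted pixel loss: it assumes the super-resolution picture is compared with the high-resolution picture only inside text regions (mask value 1) and averaged over the masked pixels. It illustrates the idea of concentrating the loss on text, not the formula of the application; the function name and `eps` are hypothetical.

```python
from typing import List


def text_perception_loss(sr: List[List[float]], hr: List[List[float]],
                         mask: List[List[int]], eps: float = 1e-8) -> float:
    """Hypothetical: mean absolute SR/HR difference over pixels where the target mask is 1."""
    weighted_error = 0.0
    mask_area = 0.0
    for sr_row, hr_row, mask_row in zip(sr, hr, mask):
        for s, h, m in zip(sr_row, hr_row, mask_row):
            weighted_error += m * abs(s - h)  # background pixels (m = 0) contribute nothing
            mask_area += m
    return weighted_error / (mask_area + eps)  # eps avoids division by zero for empty masks
```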

In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 4, fig. 4 is a block diagram of a basic structure of a computer device according to the present embodiment.

The computer device 200 comprises a memory 201, a processor 202 and a network interface 203, which are communicatively connected to each other via a system bus. It should be noted that only the computer device 200 with components 201 to 203 is shown, but it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.

The computer device may be a desktop computer, a notebook computer, a palmtop computer, a cloud server or another computing device. The computer device may interact with a user through a keyboard, a mouse, a remote controller, a touch panel, a voice control device or the like.

The memory 201 includes at least one type of readable storage medium, including a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 201 may be an internal storage unit of the computer device 200, such as a hard disk or internal memory of the computer device 200. In other embodiments, the memory 201 may be an external storage device of the computer device 200, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card or a Flash Card provided on the computer device 200. Of course, the memory 201 may also include both the internal and external storage devices of the computer device 200. In this embodiment, the memory 201 is generally used to store the operating system and the various application software installed on the computer device 200, such as the computer-readable instructions of the text image super-resolution reconstruction method. The memory 201 may further be used to temporarily store various types of data that have been output or are to be output.

In some embodiments, the processor 202 may be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor or another data processing chip. The processor 202 is generally used to control the overall operation of the computer device 200. In this embodiment, the processor 202 is configured to execute the computer-readable instructions stored in the memory 201 or to process data, for example, to execute the computer-readable instructions of the text image super-resolution reconstruction method.

The network interface 203 may comprise a wireless network interface or a wired network interface, and the network interface 203 is generally used for establishing communication connection between the computer device 200 and other electronic devices.

In this embodiment, the text mask is generated with both the position information and the content information of the text taken into account, so that the boundary between the text and the surrounding image in the picture can be delineated and the quality of the reconstructed picture is markedly improved. Adversarial training of the generation layer against the discrimination layer yields a trained adversarial network, which is used to generate a higher-quality target super-resolution picture.

The present application further provides another embodiment, which is to provide a computer-readable storage medium storing computer-readable instructions executable by at least one processor to cause the at least one processor to perform the steps of the text image super-resolution reconstruction method as described above.

Through the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the preferred implementation. Based on this understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as a ROM/RAM, a magnetic disk or an optical disk) and includes instructions for causing a terminal device (such as a mobile phone, a computer, a server, an air conditioner or a network device) to execute the methods of the embodiments of the present application.

It should be understood that the embodiments described above are merely some, rather than all, of the embodiments of the present application, and that the accompanying drawings show preferred embodiments without limiting the patent scope of the present application. The present application may be implemented in many different forms; these embodiments are provided so that the disclosure of the present application will be understood thoroughly and completely. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or replace some of their technical features with equivalents. Any equivalent structure made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, likewise falls within the protection scope of the present application.

Claims

1. A text image super-resolution reconstruction method is characterized by comprising the following steps:

2. The text image super-resolution reconstruction method according to claim 1, wherein the step of generating a text mask based on the text position information and the text content information comprises:

generating the text mask based on the target text position information.

3. The text image super-resolution reconstruction method according to claim 1, wherein the step of calculating the loss function of the adversarial network based on the low-resolution picture and the target mask comprises:

where the loss so defined is the content loss function,

4. The text image super-resolution reconstruction method according to claim 1, wherein the step of calculating the loss function of the adversarial network based on the low-resolution picture and the target mask comprises:

5. The text image super-resolution reconstruction method according to claim 1, wherein the step of calculating the loss function of the adversarial network based on the low-resolution picture and the target mask comprises:

where ∇ denotes the gradient.

6. The text image super-resolution reconstruction method according to claim 1, wherein the step of calculating the loss function of the adversarial network based on the low-resolution picture and the target mask comprises:

in which the symbols denote, respectively, the target mask and the super-resolution picture.

7. The text image super-resolution reconstruction method according to claim 1, wherein the step of upsampling the text mask to obtain a target mask comprises:

8. A text image super-resolution reconstruction device is characterized by comprising:

9. A computer device, comprising a memory and a processor, wherein the memory stores computer-readable instructions, and the processor, when executing the computer-readable instructions, implements the steps of the text image super-resolution reconstruction method according to any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon computer-readable instructions which, when executed by a processor, implement the steps of the text image super-resolution reconstruction method according to any one of claims 1 to 7.