CN113763249A - Text image super-resolution reconstruction method and related equipment thereof - Google Patents

Text image super-resolution reconstruction method and related equipment thereof

Info

Publication number
CN113763249A
Authority
CN
China
Prior art keywords
text
resolution picture
super
resolution
low
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111061974.7A
Other languages
Chinese (zh)
Inventor
郑喜民
翟尤
舒畅
陈又新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202111061974.7A
Publication of CN113763249A
Priority to PCT/CN2022/071883 (WO2023035531A1)
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053 Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the application belong to the technical field of artificial intelligence, are applied to the field of intelligent medical treatment, and relate to a text image super-resolution reconstruction method and related equipment thereof. A low-resolution picture is input into a scene text recognition model to obtain text position information and text content information; a text mask is generated based on the text position information and the text content information, and the text mask is upsampled to obtain a target mask; the low-resolution picture and the target mask are input into an adversarial network to obtain a discrimination result, and a discrimination accuracy is calculated based on the discrimination result; a loss function is calculated based on the low-resolution picture and the target mask, and training continues until the loss function converges and the discrimination accuracy is lower than an accuracy threshold, yielding a trained adversarial network; and a received low-resolution picture to be converted is input into the trained adversarial network to obtain a target super-resolution picture. The trained adversarial network may be stored in a blockchain. The method and the device ensure the quality of super-resolution reconstruction of text images.

Description

Text image super-resolution reconstruction method and related equipment thereof
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a text image super-resolution reconstruction method and related equipment.
Background
Super-resolution reconstruction means that, for any given low-resolution picture, a corresponding high-resolution picture is generated through a convolutional neural network, with details and textures in the picture preserved and restored as far as possible. Super-resolution reconstruction technology has promoted the development of related fields such as image classification, segmentation, tracking, and defogging, and plays an important role in the development of neural networks.
However, a text picture differs from a natural scene: text content has a fixed shape and clear edges, so the reconstruction requirement is higher. For ordinary pictures, most of the content is natural scenery, and converting a low-resolution picture into a high-resolution picture is comparatively easy. For text in a scene, if the reconstructed picture shows distortion, abrupt color changes, or character edges that blur into the surrounding scene, the quality of the reconstructed picture is obviously reduced.
Disclosure of Invention
The embodiment of the application aims to provide a text image super-resolution reconstruction method and related equipment thereof, so that the quality of super-resolution reconstruction of a text image is guaranteed.
In order to solve the above technical problem, an embodiment of the present application provides a text image super-resolution reconstruction method, which adopts the following technical solutions:
a text image super-resolution reconstruction method comprises the following steps:
receiving a low-resolution picture and a corresponding high-resolution picture, inputting the low-resolution picture into a pre-trained scene text recognition model, and obtaining output text position information and text content information;
generating a text mask based on the text position information and the text content information, and performing up-sampling on the text mask to obtain a target mask;
inputting the low-resolution picture and the target mask into a generation layer of a preset adversarial network to obtain an output super-resolution picture;
inputting the super-resolution picture and the high-resolution picture into a discrimination layer of the adversarial network at the same time, obtaining an output discrimination result, and calculating a discrimination accuracy based on the discrimination result;
calculating a loss function of the adversarial network based on the low-resolution picture and the target mask, and training until the loss function converges and the discrimination accuracy is lower than an accuracy threshold, to obtain the trained adversarial network;
and receiving a low-resolution picture to be converted, and inputting the low-resolution picture to be converted into the trained adversarial network to obtain an output target super-resolution picture.
Further, the step of generating a text mask based on the text position information and the text content information includes:
correcting the text position information based on the text content information to obtain target text position information;
generating the text mask based on the target text position information.
Further, the step of calculating the loss function of the adversarial network based on the low-resolution picture and the target mask includes:
calculating a content loss function of the adversarial network based on the low-resolution picture, the content loss function being characterized by:
$$l_{MSE}^{SR} = \frac{1}{r^{2}WH}\sum_{x=1}^{rW}\sum_{y=1}^{rH}\left(I_{x,y}^{HR} - G_{\theta_G}(I^{LR})_{x,y}\right)^{2}$$
wherein $l_{MSE}^{SR}$ is the content loss function, $I_{x,y}^{HR}$ is the value of the pixel point of the high-resolution picture at position (x, y), $G_{\theta_G}(I^{LR})_{x,y}$ is the value of the pixel point of the super-resolution picture at position (x, y), rW and rH are the width and height of the super-resolution picture, respectively, and $r^{2}WH$ is the total number of pixel points of the super-resolution picture.
Further, the step of calculating the loss function of the adversarial network based on the low-resolution picture and the target mask includes:
calculating an adversarial loss function of the adversarial network based on the low-resolution picture, the adversarial loss function being characterized by:
$$l_{Gen}^{SR} = \sum_{n=1}^{M} -\log D_{\theta_D}\left(G_{\theta_G}(I^{LR})\right)$$
wherein $l_{Gen}^{SR}$ is the adversarial loss function, $G_{\theta_G}(I^{LR})$ is the super-resolution picture, $D_{\theta_D}$ is the discrimination layer, and M is the total number of super-resolution pictures.
Further, the step of calculating the loss function of the adversarial network based on the low-resolution picture and the target mask includes:
calculating a regularization loss function of the adversarial network based on the low-resolution picture, the regularization loss function being characterized by:
$$l_{TV}^{SR} = \frac{1}{r^{2}WH}\sum_{x=1}^{rW}\sum_{y=1}^{rH}\left\|\nabla G_{\theta_G}(I^{LR})_{x,y}\right\|$$
wherein $l_{TV}^{SR}$ is the regularization loss function, $G_{\theta_G}(I^{LR})_{x,y}$ is the value of the pixel point of the super-resolution picture at position (x, y), rW and rH are the width and height of the target mask, respectively, $r^{2}WH$ is the total number of pixel points in the target mask, $\|\cdot\|$ represents a norm, and $\nabla$ represents the gradient.
Further, the step of calculating the loss function of the adversarial network based on the low-resolution picture and the target mask includes:
calculating a text perception loss function of the adversarial network based on the low-resolution picture, the text perception loss function being characterized by:
[Formula image for $l_{TR}$ not reproduced in this text]
wherein $l_{TR}$ is the text perception loss function, N is the total number of pixel points at positions where text exists, and the other two symbols of the formula denote the target mask and the super-resolution picture, respectively.
Further, the step of upsampling the text mask to obtain a target mask includes:
upsampling the text mask by a multiple factor to obtain the target mask.
In order to solve the above technical problem, an embodiment of the present application further provides a text image super-resolution reconstruction apparatus, which adopts the following technical solutions:
a text image super-resolution reconstruction apparatus includes:
the receiving module is used for receiving a low-resolution picture and a corresponding high-resolution picture, inputting the low-resolution picture into a pre-trained scene text recognition model, and obtaining output text position information and text content information;
the up-sampling module is used for generating a text mask based on the text position information and the text content information and up-sampling the text mask to obtain a target mask;
the generation module is used for inputting the low-resolution picture and the target mask into a generation layer of a preset adversarial network to obtain an output super-resolution picture;
the discrimination module is used for simultaneously inputting the super-resolution picture and the high-resolution picture into a discrimination layer of the adversarial network to obtain an output discrimination result, and calculating a discrimination accuracy based on the discrimination result;
the calculation module is used for calculating a loss function of the adversarial network based on the low-resolution picture and the target mask, and training until the loss function converges and the discrimination accuracy is lower than an accuracy threshold, to obtain the trained adversarial network;
and the obtaining module is used for receiving a low-resolution picture to be converted, inputting the low-resolution picture to be converted into the trained adversarial network, and obtaining an output target super-resolution picture.
In order to solve the above technical problem, an embodiment of the present application further provides a computer device, which adopts the following technical solutions:
a computer device comprising a memory having computer readable instructions stored therein and a processor, the processor implementing the steps of the text image super-resolution reconstruction method described above when executing the computer readable instructions.
In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium, which adopts the following technical solutions:
a computer readable storage medium having computer readable instructions stored thereon, which when executed by a processor, implement the steps of the above-mentioned text image super-resolution reconstruction method.
Compared with the prior art, the embodiment of the application mainly has the following beneficial effects:
According to the application, the text position information and text content information of the received low-resolution picture are obtained, and the text mask is generated based on both. Because the generation of the text mask takes both the position and the content of the text into account, the boundary between the text and the surrounding image in the picture can be determined, so that characters in the subsequently generated super-resolution picture are clear and the quality of the reconstructed picture is remarkably improved. Upsampling enlarges the text mask and raises its resolution, which facilitates the subsequent generation of the super-resolution picture. The trained adversarial network is obtained through adversarial training of the generation layer and the discrimination layer in the adversarial network, and is then used to generate a target super-resolution picture of higher quality.
Drawings
In order to more clearly illustrate the solution of the present application, the drawings needed for describing the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a text image super resolution reconstruction method according to the present application;
FIG. 3 is a schematic structural diagram of an embodiment of a super-resolution text image reconstruction apparatus according to the present application;
FIG. 4 is a schematic block diagram of one embodiment of a computer device according to the present application.
Reference numerals: 200. a computer device; 201. a memory; 202. a processor; 203. a network interface; 300. a text image super-resolution reconstruction device; 301. a receiving module; 302. an upsampling module; 303. a generation module; 304. a discrimination module; 305. a calculation module; 306. an obtaining module.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that the text image super-resolution reconstruction method provided in the embodiments of the present application is generally executed by a server/terminal device, and accordingly, the text image super-resolution reconstruction apparatus is generally disposed in the server/terminal device.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continuing reference to FIG. 2, a flow diagram of one embodiment of a method for super-resolution reconstruction of text images according to the present application is shown. The text image super-resolution reconstruction method comprises the following steps:
S1: receiving a low-resolution picture and a corresponding high-resolution picture, inputting the low-resolution picture into a pre-trained scene text recognition model, and obtaining output text position information and text content information.
In this embodiment, the size of the low-resolution picture (LR image) is W × H. The low-resolution picture is input into the scene text recognition model to obtain the position and the content of the scene text. The scene text recognition model of the application is the text recognition model ASTER (ASTER: An Attentional Scene Text Recognizer with Flexible Rectification), which is trained in advance.
In this embodiment, an electronic device (e.g., the server/terminal device shown in fig. 1) on which the text image super-resolution reconstruction method operates may receive the low-resolution picture and the corresponding high-resolution picture through a wired connection or a wireless connection. It should be noted that the wireless connection may include, but is not limited to, a 3G/4G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a Zigbee connection, a UWB (ultra-wideband) connection, and other wireless connections now known or developed in the future.
S2: generating a text mask based on the text position information and the text content information, and upsampling the text mask to obtain a target mask.
In this embodiment, a text mask that emphasizes only the text portion is generated based on the text position information and the text content information, and the size of the text mask is the same as that of the low-resolution picture. In the text mask, pixel points where text exists are marked as 1 and pixel points where no text exists are marked as 0, which yields a two-dimensional mask; with size W × H, this two-dimensional mask is the text mask of the low-resolution picture. The text mask is upsampled to obtain a new text mask, namely the target mask, of size rW × rH, which is the same size as the super-resolution picture to be generated. The received high-resolution picture needs to be resized to match the target mask, so as to facilitate subsequent calculation. The target mask is subsequently used to supervise the output of the generation layer, i.e. the super-resolution picture. In the application, the high-resolution picture does not need to be annotated. This completes the scene text recognition part of the operation.
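As a rough illustration of the binary text mask described above, the following sketch marks pixels inside detected text regions as 1 and all other pixels as 0. The rectangle format `(x0, y0, x1, y1)` with exclusive upper bounds and the function name `build_text_mask` are assumptions made for illustration only; the application does not state how the recognition model reports text positions.

```python
def build_text_mask(width, height, boxes):
    """Return an H x W binary mask (list of rows): 1 where text exists, else 0.

    `boxes` is a hypothetical list of (x0, y0, x1, y1) rectangles in pixel
    coordinates, with x1 and y1 exclusive.
    """
    mask = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        # Clip each box to the picture so out-of-range detections are harmless.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                mask[y][x] = 1
    return mask


# A 6 x 4 low-resolution picture with one detected text region.
mask = build_text_mask(width=6, height=4, boxes=[(1, 1, 4, 3)])
```

Real text regions are rarely axis-aligned rectangles, so a practical version would probably rasterize polygons or per-character regions instead.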
Specifically, the step of generating a text mask based on the text position information and the text content information includes:
correcting the text position information based on the text content information to obtain target text position information;
generating the text mask based on the target text position information.
In this embodiment, the text mask is generated based on both the text position information and the text content information. Some places in the picture may be unclear, or the recognized position may be wrong, and taking the recognized text content into account lets the computer generate a mask with higher accuracy. For example, a character region that would be output incorrectly when only the position is considered can be output correctly once the content is also considered.
In addition, as another embodiment of the present application, the step of upsampling the text mask to obtain a target mask includes:
upsampling the text mask by a multiple factor to obtain the target mask.
In this embodiment, the text mask is upsampled by a factor of 5, that is, enlarged 5 times, which raises the resolution of the text mask; the generated super-resolution picture is likewise 5 times the size of the low-resolution picture.
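The upsampling step can be sketched as follows. The application does not name an interpolation method, so nearest-neighbour replication is assumed here because it keeps the mask strictly binary; the function name `upsample_mask` is hypothetical.

```python
def upsample_mask(mask, r):
    """Nearest-neighbour upsampling of a binary mask by an integer factor r.

    An H x W mask becomes an rH x rW mask; every source pixel is replicated
    into an r x r block.
    """
    if r < 1:
        raise ValueError("r must be a positive integer")
    out = []
    for row in mask:
        # Repeat each pixel r times horizontally.
        wide = [value for value in row for _ in range(r)]
        # Repeat the widened row r times vertically (copy to avoid aliasing).
        out.extend(list(wide) for _ in range(r))
    return out


# A 2 x 2 text mask enlarged 5 times gives a 10 x 10 target mask.
target = upsample_mask([[0, 1], [1, 0]], 5)
```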
S3: inputting the low-resolution picture and the target mask into a generation layer of a preset adversarial network to obtain an output super-resolution picture.
In this embodiment, after the scene text recognition, the computer generates a super-resolution picture through the generation layer of a generative adversarial network (GAN): the low-resolution picture and the target mask are simultaneously input into the generation layer (Generator) of the generative adversarial network, and the generation layer generates a super-resolution picture (SR image).
S4: simultaneously inputting the super-resolution picture and the high-resolution picture into a discrimination layer of the adversarial network to obtain an output discrimination result, and calculating a discrimination accuracy based on the discrimination result.
In this embodiment, the super-resolution picture and the high-resolution picture are simultaneously input into the discrimination layer (Discriminator), and the discrimination layer outputs a discrimination result indicating whether a picture is a super-resolution picture or a high-resolution picture. For example, the discrimination layer outputs 0 or 1, where 0 represents "the picture is a generated picture (super-resolution picture)" and 1 represents "the picture is a real picture (high-resolution picture)". Through adversarial training of the generation layer and the discrimination layer, the super-resolution picture generated by the generation layer becomes more and more similar to the high-resolution picture of the natural scene and harder and harder to distinguish from it. When the accuracy of the discrimination layer falls below the accuracy threshold, it is determined that the discrimination layer can hardly tell a real picture from a super-resolution picture generated by the generation layer; the generated super-resolution picture is then of high quality and close to the real picture, so the training target is reached and the network can be used in practical applications. The accuracy is calculated as the ratio of the number of correct discrimination results output by the discrimination layer within a preset time period to the total number of discriminations.
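The accuracy calculation described above (correct discriminations divided by total discriminations) can be sketched in a few lines. The label convention follows the text (0 for a generated picture, 1 for a real picture); the function name and the sample values are illustrative assumptions.

```python
def discrimination_accuracy(predictions, labels):
    """Fraction of discrimination results that match the true labels.

    Both arguments are sequences of 0 (generated picture) or 1 (real picture).
    """
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal in length")
    correct = sum(1 for p, t in zip(predictions, labels) if p == t)
    return correct / len(labels)


# Eight discriminations collected over one evaluation period (made-up values).
preds = [1, 0, 1, 1, 0, 0, 1, 0]
truth = [1, 0, 0, 1, 1, 0, 0, 0]
acc = discrimination_accuracy(preds, truth)  # 5 of 8 correct
```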
S5: calculating a loss function of the adversarial network based on the low-resolution picture and the target mask, and training until the loss function converges and the discrimination accuracy is lower than an accuracy threshold, to obtain the trained adversarial network.
In this embodiment, the loss functions mainly used for the pictures generated by the generative adversarial network include a content loss function (content loss), an adversarial loss function (adversarial loss), a regularization loss function (regularization loss), and a text perception loss function (text perceptual loss) designed for the text mask. When the loss function has converged and the discrimination accuracy is lower than the accuracy threshold, training of the adversarial network is determined to be complete, and an adversarial network with better performance is obtained.
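The stopping condition for training combines two checks: the loss has converged, and the discrimination accuracy is below the threshold. The sketch below is one possible reading. The convergence test (successive loss changes smaller than `tol` over the last `window` steps), the default values, and the threshold of 0.55 are all assumptions, since the application gives no numeric criteria.

```python
def loss_converged(history, window=3, tol=1e-3):
    """Assumed convergence test: the last `window` loss changes are all below `tol`."""
    if len(history) < window + 1:
        return False
    recent = history[-(window + 1):]
    return all(abs(recent[i + 1] - recent[i]) < tol for i in range(window))


def training_finished(history, accuracy, accuracy_threshold):
    """Training ends only when both conditions from the text hold."""
    return loss_converged(history) and accuracy < accuracy_threshold


losses = [1.0, 0.5, 0.3, 0.3001, 0.3002, 0.3001]
done = training_finished(losses, accuracy=0.52, accuracy_threshold=0.55)
still_training = training_finished(losses, accuracy=0.90, accuracy_threshold=0.55)
```

Requiring both conditions avoids stopping while the discrimination layer can still separate generated pictures from real ones easily, even if the loss curve has flattened.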
Specifically, the step of calculating the loss function of the adversarial network based on the low-resolution picture and the target mask includes:
calculating a content loss function of the adversarial network based on the low-resolution picture, the content loss function being characterized by:
$$l_{MSE}^{SR} = \frac{1}{r^{2}WH}\sum_{x=1}^{rW}\sum_{y=1}^{rH}\left(I_{x,y}^{HR} - G_{\theta_G}(I^{LR})_{x,y}\right)^{2}$$
wherein $l_{MSE}^{SR}$ is the content loss function, $I_{x,y}^{HR}$ is the value of the pixel point of the high-resolution picture at position (x, y), $G_{\theta_G}(I^{LR})_{x,y}$ is the value of the pixel point of the super-resolution picture at position (x, y), rW and rH are the width and height of the super-resolution picture, respectively, and $r^{2}WH$ is the total number of pixel points of the super-resolution picture.
In this embodiment, the content loss function is calculated as the mean square error, where the super-resolution picture and the high-resolution picture both have width rW and height rH. The squared differences between the pixels of the super-resolution picture and the high-resolution picture at all positions are summed, and the sum is divided by the number of pixel points to give the content loss, that is, the loss between the super-resolution picture and the high-resolution picture.
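A minimal sketch of the mean-square-error content loss follows, with pictures represented as nested lists of single-channel pixel values. The function name and the tiny sample pictures are illustrative assumptions; a real implementation would operate on tensors with colour channels.

```python
def content_loss(hr, sr):
    """Mean squared per-pixel difference between two equally sized pictures."""
    rows, cols = len(hr), len(hr[0])
    if len(sr) != rows or any(len(row) != cols for row in list(hr) + list(sr)):
        raise ValueError("pictures must have the same rectangular shape")
    total = sum(
        (h - s) ** 2
        for hr_row, sr_row in zip(hr, sr)
        for h, s in zip(hr_row, sr_row)
    )
    # Divide by the total number of pixel points (r^2 * W * H in the text).
    return total / (rows * cols)


hr_picture = [[1.0, 2.0], [3.0, 4.0]]
sr_picture = [[1.0, 0.0], [3.0, 2.0]]
loss = content_loss(hr_picture, sr_picture)  # (0 + 4 + 0 + 4) / 4
```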
As another embodiment of the present application, the step of calculating the loss function of the adversarial network based on the low-resolution picture and the target mask includes:
calculating an adversarial loss function of the adversarial network based on the low-resolution picture, the adversarial loss function being characterized by:
$$l_{Gen}^{SR} = \sum_{n=1}^{M} -\log D_{\theta_D}\left(G_{\theta_G}(I^{LR})\right)$$
wherein $l_{Gen}^{SR}$ is the adversarial loss function, $G_{\theta_G}(I^{LR})$ is the super-resolution picture, $D_{\theta_D}$ is the discrimination layer, and M is the total number of super-resolution pictures.
In this embodiment, the adversarial loss requires the discrimination layer D to successfully distinguish the super-resolution picture generated by the generation layer G from the natural high-resolution picture input to it. Through adversarial training of the generation layer and the discrimination layer, the quality of the super-resolution picture generated by the network is gradually improved. M is the total number of super-resolution pictures input into the discrimination layer.
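The adversarial loss over M generated pictures can be sketched as below. The form used, a sum of -log D(G(I_LR)) over the pictures, is the one common in GAN-based super-resolution and is assumed here to match the application's formula; the inputs are hypothetical probabilities from the discrimination layer that each generated picture is real, and `eps` is an added numerical guard.

```python
import math


def adversarial_loss(d_outputs, eps=1e-12):
    """Sum of -log(p) over the discrimination layer's outputs for generated pictures.

    `d_outputs` holds, for each of the M super-resolution pictures, the
    probability assigned to "this picture is real". `eps` avoids log(0).
    """
    return sum(-math.log(max(p, eps)) for p in d_outputs)


# Two generated pictures that the discrimination layer cannot tell apart from real ones.
loss = adversarial_loss([0.5, 0.5])   # 2 * ln(2)
fooled = adversarial_loss([1.0, 1.0])  # discrimination layer fully fooled
```

The loss falls as the discrimination layer assigns higher "real" probability to generated pictures, which is what pushes the generation layer toward more realistic outputs.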
Further, as another embodiment of the present application, the step of calculating the loss function of the adversarial network based on the low-resolution picture and the target mask includes:
calculating a regularization loss function of the adversarial network based on the low-resolution picture, the regularization loss function being characterized by:
$$l_{TV}^{SR} = \frac{1}{r^{2}WH}\sum_{x=1}^{rW}\sum_{y=1}^{rH}\left\|\nabla G_{\theta_G}(I^{LR})_{x,y}\right\|$$
wherein $l_{TV}^{SR}$ is the regularization loss function, $G_{\theta_G}(I^{LR})_{x,y}$ is the value of the pixel point of the super-resolution picture at position (x, y), rW and rH are the width and height of the target mask, respectively, $r^{2}WH$ is the total number of pixel points in the target mask, $\|\cdot\|$ represents a norm, and $\nabla$ represents the gradient.
In this embodiment, by adding the regularization loss function, overfitting of the network is prevented, and convergence of the overall loss function is accelerated.
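A gradient-norm regularization term of this kind can be sketched as follows. The discretization is an assumption: forward differences are used for the gradient, the norm is taken as the Euclidean norm, and differences that would cross the picture border are treated as zero.

```python
import math


def regularization_loss(sr):
    """Mean gradient magnitude of a single-channel picture (nested lists)."""
    rows, cols = len(sr), len(sr[0])
    total = 0.0
    for y in range(rows):
        for x in range(cols):
            # Forward differences; zero where the neighbour lies outside the picture.
            dx = sr[y][x + 1] - sr[y][x] if x + 1 < cols else 0.0
            dy = sr[y + 1][x] - sr[y][x] if y + 1 < rows else 0.0
            total += math.hypot(dx, dy)
    return total / (rows * cols)


flat = regularization_loss([[2.0, 2.0], [2.0, 2.0]])
busy = regularization_loss([[0.0, 3.0], [4.0, 0.0]])  # (5 + 3 + 4 + 0) / 4
```

A flat picture scores zero and a noisy one scores high, so the term penalizes spurious pixel-to-pixel variation in the generated picture.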
As another embodiment of the present application, the step of calculating the loss function of the adversarial network based on the low-resolution picture and the target mask includes:
calculating a text perception loss function of the adversarial network based on the low-resolution picture, the text perception loss function being characterized by:
[Formula image for $l_{TR}$ not reproduced in this text]
wherein $l_{TR}$ is the text perception loss function, N is the total number of pixel points at positions where text exists, and the other two symbols of the formula denote the target mask and the super-resolution picture, respectively.
In the present embodiment, the difference in pixel values is calculated between the positions in the target mask
Figure BDA0003256995680000114
where text exists and the corresponding positions of the picture generated by the generation layer, where N is the total number of pixels at positions where text exists. All the differences are summed and divided by N to obtain the text perception loss function. With the text perception loss function, the generation layer produces clearer text when constructing a new picture. In the text mask, pixels at positions where text exists are marked as 1 and pixels at positions where no text exists are marked as 0; the target mask is generated after upsampling. The target mask supervises the output of the generation layer through the text perception loss function, so that the emphasis is placed only on the text.
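The masked averaging described above can be sketched as follows. This is a hedged illustration only: the text does not state explicitly which picture the generated pixels are compared against, so the sketch assumes the reference is the high-resolution picture and that the difference is an absolute difference. The function name `text_perception_loss` and its arguments are hypothetical.

```python
def text_perception_loss(sr, hr, target_mask):
    """Average absolute pixel difference over text positions only.

    sr, hr: super-resolution and reference pictures as lists of rows.
    target_mask: same shape, 1 where text exists and 0 elsewhere.
    Comparing against hr is an assumption, not stated in the source.
    """
    total = 0.0
    n = 0  # N: number of pixels at positions where text exists
    for sr_row, hr_row, mask_row in zip(sr, hr, target_mask):
        for s, h, m in zip(sr_row, hr_row, mask_row):
            if m == 1:
                total += abs(h - s)
                n += 1
    return total / n if n else 0.0


loss = text_perception_loss(
    sr=[[0.0, 1.0], [0.5, 0.5]],
    hr=[[1.0, 1.0], [0.0, 0.5]],
    target_mask=[[1, 0], [1, 0]],
)
```

Because pixels with mask value 0 are skipped, errors in the background do not contribute to this term; only the text region is supervised.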
S6: receiving a low-resolution picture to be converted, and inputting the low-resolution picture to be converted into the trained adversarial network to obtain an output target super-resolution picture.
In this embodiment, the trained adversarial network can generate a higher-quality target super-resolution picture while keeping the text information in the picture clear and complete.
In the present application, the text position information and text content information of the received low-resolution picture are obtained, and the text mask is generated based on both. Because the generation of the text mask takes into account both the position and the content of the text, the boundary between the text and the surrounding image can be determined, so that the characters in the subsequently generated super-resolution picture are clear and the quality of the reconstructed picture is significantly improved. Upsampling the text mask increases its resolution, which facilitates the subsequent generation of the super-resolution picture. The trained adversarial network is obtained through adversarial training of the generation layer and the discrimination layer, and is then used to generate a higher-quality target super-resolution picture.
It is emphasized that, to further ensure the privacy and security of the trained adversarial network, the trained adversarial network may also be stored in a node of a blockchain.
The blockchain referred to in the present application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, where each data block contains information about a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. The blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and the like.
The embodiments of the present application can acquire and process related data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technology, and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best result.
Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, robotics, biometric recognition, speech processing, natural language processing, machine learning/deep learning, and the like.
The present application can be applied in the field of smart healthcare to restore low-resolution pictures in the medical field, thereby promoting the construction of smart cities.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by computer-readable instructions instructing the relevant hardware. The instructions can be stored in a computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, or a read-only memory (ROM), or may be a random access memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time or in sequence, but may be performed at different times, in turn, or alternately with other steps or with at least some of the sub-steps or stages of other steps.
With further reference to fig. 3, as an implementation of the method shown in fig. 2, the present application provides an embodiment of a text image super-resolution reconstruction apparatus, which corresponds to the embodiment of the method shown in fig. 2, and which can be applied in various electronic devices.
As shown in fig. 3, the text image super-resolution reconstruction apparatus 300 according to the present embodiment includes: a receiving module 301, an upsampling module 302, a generating module 303, a discriminating module 304, a calculating module 305, and an obtaining module 306. The receiving module 301 is configured to receive a low-resolution picture and a corresponding high-resolution picture, input the low-resolution picture into a pre-trained scene text recognition model, and obtain output text position information and text content information. The upsampling module 302 is configured to generate a text mask based on the text position information and the text content information, and to upsample the text mask to obtain a target mask. The generating module 303 is configured to input the low-resolution picture and the target mask into a preset generation layer of an adversarial network to obtain an output super-resolution picture. The discriminating module 304 is configured to input the super-resolution picture and the high-resolution picture into a discrimination layer of the adversarial network at the same time, obtain an output discrimination result, and calculate a discrimination accuracy based on the discrimination result. The calculating module 305 is configured to calculate a loss function of the adversarial network based on the low-resolution picture and the target mask until the loss function converges and the discrimination accuracy is lower than an accuracy threshold, to obtain a trained adversarial network. The obtaining module 306 is configured to receive a low-resolution picture to be converted, input the low-resolution picture to be converted into the trained adversarial network, and obtain an output target super-resolution picture.
In this embodiment, the text position information and text content information of the received low-resolution picture are obtained, and the text mask is generated based on both. Because the generation of the text mask takes into account both the position and the content of the text, the boundary between the text and the surrounding image can be determined, so that the characters in the subsequently generated super-resolution picture are clear and the quality of the reconstructed picture is significantly improved. Upsampling the text mask increases its resolution, which facilitates the subsequent generation of the super-resolution picture. The trained adversarial network is obtained through adversarial training of the generation layer and the discrimination layer, and is then used to generate a higher-quality target super-resolution picture.
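The stopping condition used by the calculating module (loss converged and discrimination accuracy below a threshold) can be sketched as follows. The source gives no numeric values or convergence test, so the 0.5 decision cutoff, the accuracy threshold, the tolerance, and the window size are all hypothetical, as are the function names.

```python
def discrimination_accuracy(scores_sr, scores_hr, cutoff=0.5):
    """Fraction of pictures the discrimination layer classifies correctly.

    scores_sr / scores_hr: probabilities of being a natural picture for
    generated and real pictures respectively. cutoff is an assumption.
    """
    correct = sum(1 for p in scores_sr if p < cutoff)
    correct += sum(1 for p in scores_hr if p >= cutoff)
    return correct / (len(scores_sr) + len(scores_hr))


def should_stop(loss_history, accuracy, acc_threshold=0.6, tol=1e-4, window=3):
    """True when the last `window` loss changes are all below `tol`
    and the discrimination accuracy is below `acc_threshold`."""
    if len(loss_history) < window + 1:
        return False
    recent = loss_history[-(window + 1):]
    converged = all(abs(recent[i + 1] - recent[i]) < tol for i in range(window))
    return converged and accuracy < acc_threshold


acc = discrimination_accuracy([0.2, 0.7], [0.9, 0.4])
stop = should_stop([1.0, 0.5, 0.5, 0.5, 0.5], acc)
```

A low discrimination accuracy means the discrimination layer can no longer reliably tell generated pictures from real ones, which is the intended end state of the adversarial training.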
The up-sampling module 302 comprises a correction submodule and a generation submodule, wherein the correction submodule is used for correcting the text position information based on the text content information to obtain target text position information; the generation submodule is used for generating the text mask based on the target text position information.
In some optional implementations of the present embodiment, the upsampling module 302 is further configured to: perform multiple-fold upsampling on the text mask to obtain the target mask.
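Mask generation and upsampling can be sketched as follows. This is a minimal sketch under stated assumptions: text positions are given as hypothetical half-open boxes (x0, y0, x1, y1) in low-resolution pixel coordinates, and upsampling uses nearest-neighbor repetition by an integer factor r; the source does not specify the box format or the interpolation method.

```python
def build_text_mask(height, width, boxes):
    """Binary mask: 1 inside any text box, 0 elsewhere.

    boxes: iterable of (x0, y0, x1, y1), half-open, clipped to the picture.
    """
    mask = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                mask[y][x] = 1
    return mask


def upsample_nearest(mask, r):
    """Repeat every pixel r times along both axes (nearest neighbor)."""
    out = []
    for row in mask:
        wide = [v for v in row for _ in range(r)]
        out.extend(list(wide) for _ in range(r))
    return out


text_mask = build_text_mask(2, 3, [(1, 0, 3, 1)])
target_mask = upsample_nearest(text_mask, 2)
```

With a factor of r, an H x W text mask becomes an rH x rW target mask, matching the size of the super-resolution picture it supervises.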
In some optional implementations of this embodiment, the calculating module 305 is further configured to: calculate a content loss function of the adversarial network based on the low-resolution picture, the content loss function being characterized by:
Figure BDA0003256995680000141
wherein
Figure BDA0003256995680000142
is the content loss function,
Figure BDA0003256995680000143
is the pixel value of the high-resolution picture at position (x, y), $G_{\theta_G}(I^{LR})_{x,y}$ is the pixel value of the super-resolution picture at position (x, y), rW and rH are the width and height of the super-resolution picture respectively, and $r^2WH$ is the total number of pixels of the super-resolution picture.
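A pixel-wise content loss built from the quantities defined above can be sketched as follows. The formula image is not reproduced in this text, so the squared difference averaged over all pixels (a mean squared error) is an assumption, and the function name `content_loss` is hypothetical.

```python
def content_loss(hr, sr):
    """Mean squared pixel difference between two equally sized pictures.

    hr, sr: high-resolution and super-resolution pictures as lists of rows.
    The number of pixels plays the role of r^2 * W * H in the text.
    """
    h, w = len(hr), len(hr[0])
    total = sum(
        (a - b) ** 2
        for hr_row, sr_row in zip(hr, sr)
        for a, b in zip(hr_row, sr_row)
    )
    return total / (h * w)


mse = content_loss([[1.0, 2.0]], [[1.0, 4.0]])
```

Unlike the text perception term, this loss weights every pixel equally, so it constrains the overall picture rather than the text region alone.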
In some optional implementations of this embodiment, the calculating module 305 is further configured to: calculate an adversarial loss function of the adversarial network based on the low-resolution picture, the adversarial loss function being characterized by:
Figure BDA0003256995680000144
wherein
Figure BDA0003256995680000145
is the adversarial loss function, $G_{\theta_G}(I^{LR})$ is the super-resolution picture, $D_{\theta_D}$ is the discrimination layer, and M is the total number of super-resolution pictures.
In some optional implementations of this embodiment, the calculating module 305 is further configured to: calculate a regularization loss function of the adversarial network based on the low-resolution picture, the regularization loss function being characterized by:
Figure BDA0003256995680000146
wherein
Figure BDA0003256995680000147
is the regularization loss function, $G_{\theta_G}(I^{LR})_{x,y}$ is the pixel value of the super-resolution picture at position (x, y), rW and rH are the width and height of the target mask respectively, $r^2WH$ is the total number of pixels in the target mask, $\|\cdot\|$ denotes a norm, and
Figure BDA0003256995680000151
denotes the gradient.
In some optional implementations of this embodiment, the calculating module 305 is further configured to: calculate a text perception loss function of the adversarial network based on the low-resolution picture, the text perception loss function being characterized by:
Figure BDA0003256995680000152
wherein $l_{TR}$ is the text perception loss function, N is the total number of pixels at positions where text exists,
Figure BDA0003256995680000153
is the target mask, and
Figure BDA0003256995680000154
is the super-resolution picture.
In the present application, the text position information and text content information of the received low-resolution picture are obtained, and the text mask is generated based on both. Because the generation of the text mask takes into account both the position and the content of the text, the boundary between the text and the surrounding image can be determined, so that the characters in the subsequently generated super-resolution picture are clear and the quality of the reconstructed picture is significantly improved. Upsampling the text mask increases its resolution, which facilitates the subsequent generation of the super-resolution picture. The trained adversarial network is obtained through adversarial training of the generation layer and the discrimination layer, and is then used to generate a higher-quality target super-resolution picture.
In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 4, fig. 4 is a block diagram of a basic structure of a computer device according to the present embodiment.
The computer device 200 comprises a memory 201, a processor 202, and a network interface 203 communicatively connected to each other via a system bus. It is noted that only a computer device 200 having components 201-203 is shown; it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
The memory 201 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the storage 201 may be an internal storage unit of the computer device 200, such as a hard disk or a memory of the computer device 200. In other embodiments, the memory 201 may also be an external storage device of the computer device 200, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), or the like, provided on the computer device 200. Of course, the memory 201 may also include both internal and external storage devices of the computer device 200. In this embodiment, the memory 201 is generally used for storing an operating system installed in the computer device 200 and various types of application software, such as computer readable instructions of a text image super-resolution reconstruction method. Further, the memory 201 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 202 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 202 is generally operative to control overall operation of the computer device 200. In this embodiment, the processor 202 is configured to execute computer readable instructions stored in the memory 201 or process data, for example, execute computer readable instructions of the text image super-resolution reconstruction method.
The network interface 203 may comprise a wireless network interface or a wired network interface, and the network interface 203 is generally used for establishing communication connection between the computer device 200 and other electronic devices.
In this embodiment, the generation of the text mask takes into account the position information and the content information of the text, so that the boundary between the text and the surrounding image in the picture can be determined and the quality of the reconstructed picture is significantly improved. The trained adversarial network is obtained through adversarial training of the generation layer and the discrimination layer, and is then used to generate a higher-quality target super-resolution picture.
The present application further provides another embodiment, which is to provide a computer-readable storage medium storing computer-readable instructions executable by at least one processor to cause the at least one processor to perform the steps of the text image super-resolution reconstruction method as described above.
In this embodiment, the generation of the text mask takes into account the position information and the content information of the text, so that the boundary between the text and the surrounding image in the picture can be determined and the quality of the reconstructed picture is significantly improved. The trained adversarial network is obtained through adversarial training of the generation layer and the discrimination layer, and is then used to generate a higher-quality target super-resolution picture.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
It is to be understood that the above-described embodiments are merely some, rather than all, of the embodiments of the present application, and that the accompanying drawings show preferred embodiments of the present application without limiting its scope. The present application may be embodied in many different forms; these embodiments are provided so that the disclosure of the present application will be understood thoroughly. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or make equivalent substitutions for some of their technical features. All equivalent structures made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, likewise fall within the protection scope of the present application.

Claims (10)

1. A text image super-resolution reconstruction method is characterized by comprising the following steps:
receiving a low-resolution picture and a corresponding high-resolution picture, inputting the low-resolution picture into a pre-trained scene text recognition model, and obtaining output text position information and text content information;
generating a text mask based on the text position information and the text content information, and performing up-sampling on the text mask to obtain a target mask;
inputting the low-resolution picture and the target mask into a preset generation layer of an adversarial network to obtain an output super-resolution picture;
inputting the super-resolution picture and the high-resolution picture into a discrimination layer of the adversarial network at the same time, obtaining an output discrimination result, and calculating a discrimination accuracy based on the discrimination result;
calculating a loss function of the adversarial network based on the low-resolution picture and the target mask until the loss function converges and the discrimination accuracy is lower than an accuracy threshold, to obtain a trained adversarial network;
receiving a low-resolution picture to be converted, and inputting the low-resolution picture to be converted into the trained adversarial network to obtain an output target super-resolution picture.
2. The text image super-resolution reconstruction method according to claim 1, wherein the step of generating a text mask based on the text position information and the text content information comprises:
correcting the text position information based on the text content information to obtain target text position information;
generating the text mask based on the target text position information.
3. The text image super-resolution reconstruction method according to claim 1, wherein the step of calculating the loss function of the adversarial network based on the low-resolution picture and the target mask comprises:
calculating a content loss function of the adversarial network based on the low-resolution picture, the content loss function being characterized by:
Figure FDA0003256995670000011
wherein
Figure FDA0003256995670000012
is the content loss function,
Figure FDA0003256995670000021
is the pixel value of the high-resolution picture at position (x, y), $G_{\theta_G}(I^{LR})_{x,y}$ is the pixel value of the super-resolution picture at position (x, y), rW and rH are the width and height of the super-resolution picture respectively, and $r^2WH$ is the total number of pixels of the super-resolution picture.
4. The text image super-resolution reconstruction method according to claim 1, wherein the step of calculating the loss function of the adversarial network based on the low-resolution picture and the target mask comprises:
calculating an adversarial loss function of the adversarial network based on the low-resolution picture, the adversarial loss function being characterized by:
Figure FDA0003256995670000022
wherein
Figure FDA0003256995670000023
is the adversarial loss function, $G_{\theta_G}(I^{LR})$ is the super-resolution picture, $D_{\theta_D}$ is the discrimination layer, and M is the total number of super-resolution pictures.
5. The text image super-resolution reconstruction method according to claim 1, wherein the step of calculating the loss function of the adversarial network based on the low-resolution picture and the target mask comprises:
calculating a regularization loss function of the adversarial network based on the low-resolution picture, the regularization loss function being characterized by:
Figure FDA0003256995670000024
wherein
Figure FDA0003256995670000025
is the regularization loss function, $G_{\theta_G}(I^{LR})_{x,y}$ is the pixel value of the super-resolution picture at position (x, y), rW and rH are the width and height of the target mask respectively, $r^2WH$ is the total number of pixels in the target mask, $\|\cdot\|$ denotes a norm, and
Figure FDA0003256995670000026
denotes the gradient.
6. The text image super-resolution reconstruction method according to claim 1, wherein the step of calculating the loss function of the adversarial network based on the low-resolution picture and the target mask comprises:
calculating a text perception loss function of the adversarial network based on the low-resolution picture, the text perception loss function being characterized by:
Figure FDA0003256995670000031
wherein $l_{TR}$ is the text perception loss function, N is the total number of pixels at positions where text exists,
Figure FDA0003256995670000032
is the target mask, and
Figure FDA0003256995670000033
is the super-resolution picture.
7. The text image super-resolution reconstruction method according to claim 1, wherein the step of upsampling the text mask to obtain a target mask comprises:
performing multiple-fold upsampling on the text mask to obtain the target mask.
8. A text image super-resolution reconstruction device is characterized by comprising:
the receiving module is used for receiving a low-resolution picture and a corresponding high-resolution picture, inputting the low-resolution picture into a pre-trained scene text recognition model, and obtaining output text position information and text content information;
the up-sampling module is used for generating a text mask based on the text position information and the text content information and up-sampling the text mask to obtain a target mask;
the generating module is used for inputting the low-resolution picture and the target mask into a preset generation layer of an adversarial network to obtain an output super-resolution picture;
the discriminating module is used for simultaneously inputting the super-resolution picture and the high-resolution picture into a discrimination layer of the adversarial network to obtain an output discrimination result, and calculating a discrimination accuracy based on the discrimination result;
the calculating module is used for calculating a loss function of the adversarial network based on the low-resolution picture and the target mask until the loss function converges and the discrimination accuracy is lower than an accuracy threshold, to obtain a trained adversarial network;
and the obtaining module is used for receiving a low-resolution picture to be converted, inputting the low-resolution picture to be converted into the trained adversarial network, and obtaining an output target super-resolution picture.
9. A computer device comprising a memory and a processor, wherein the memory stores computer-readable instructions, and the processor, when executing the computer-readable instructions, implements the steps of the text image super-resolution reconstruction method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon computer-readable instructions which, when executed by a processor, implement the steps of the text image super-resolution reconstruction method according to any one of claims 1 to 7.

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111061974.7A CN113763249A (en) 2021-09-10 2021-09-10 Text image super-resolution reconstruction method and related equipment thereof
PCT/CN2022/071883 WO2023035531A1 (en) 2021-09-10 2022-01-13 Super-resolution reconstruction method for text image and related device thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111061974.7A CN113763249A (en) 2021-09-10 2021-09-10 Text image super-resolution reconstruction method and related equipment thereof

Publications (1)

Publication Number Publication Date
CN113763249A true CN113763249A (en) 2021-12-07

Family

ID=78794915

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111061974.7A Pending CN113763249A (en) 2021-09-10 2021-09-10 Text image super-resolution reconstruction method and related equipment thereof

Country Status (2)

Country Link
CN (1) CN113763249A (en)
WO (1) WO2023035531A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114172873A (en) * 2021-12-13 2022-03-11 中国平安财产保险股份有限公司 Resolution adjustment method, resolution adjustment device, server and computer-readable storage medium
WO2023035531A1 (en) * 2021-09-10 2023-03-16 平安科技(深圳)有限公司 Super-resolution reconstruction method for text image and related device thereof
CN115829837A (en) * 2022-11-15 2023-03-21 深圳市新良田科技股份有限公司 Text image super-resolution reconstruction method and system
WO2023070495A1 (en) * 2021-10-29 2023-05-04 京东方科技集团股份有限公司 Image processing method, electronic device and non-transitory computer-readable medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116385318B (en) * 2023-06-06 2023-10-10 湖南纵骏信息科技有限公司 Image quality enhancement method and system based on cloud desktop

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107154023B (en) * 2017-05-17 2019-11-05 电子科技大学 Based on the face super-resolution reconstruction method for generating confrontation network and sub-pix convolution
US11003865B1 (en) * 2020-05-20 2021-05-11 Google Llc Retrieval-augmented language model pre-training and fine-tuning
CN112508782B (en) * 2020-09-10 2024-04-26 浙江大华技术股份有限公司 Training method of network model, and super-resolution reconstruction method and device of face image
CN113256494B (en) * 2021-06-02 2022-11-11 同济大学 Text image super-resolution method
CN113763249A (en) * 2021-09-10 2021-12-07 平安科技(深圳)有限公司 Text image super-resolution reconstruction method and related equipment thereof

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023035531A1 (en) * 2021-09-10 2023-03-16 平安科技(深圳)有限公司 Super-resolution reconstruction method for text image and related device thereof
WO2023070495A1 (en) * 2021-10-29 2023-05-04 京东方科技集团股份有限公司 Image processing method, electronic device and non-transitory computer-readable medium
CN114172873A (en) * 2021-12-13 2022-03-11 中国平安财产保险股份有限公司 Resolution adjustment method, resolution adjustment device, server and computer-readable storage medium
CN114172873B (en) * 2021-12-13 2023-05-30 中国平安财产保险股份有限公司 Resolution adjustment method, resolution adjustment device, server and computer readable storage medium
CN115829837A (en) * 2022-11-15 2023-03-21 深圳市新良田科技股份有限公司 Text image super-resolution reconstruction method and system

Also Published As

Publication number Publication date
WO2023035531A1 (en) 2023-03-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination