CN113763249A - Text image super-resolution reconstruction method and related equipment thereof - Google Patents
- Publication number
- CN113763249A (application CN202111061974.7A)
- Authority
- CN
- China
- Prior art keywords
- text
- resolution picture
- super
- resolution
- low
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4053—Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
Abstract
The embodiments of the application belong to the technical field of artificial intelligence, are applicable to the field of intelligent medical treatment, and relate to a text image super-resolution reconstruction method and related equipment. The method comprises: inputting a low-resolution picture into a scene text recognition model to obtain text position information and text content information; generating a text mask based on the text position information and the text content information, and up-sampling the text mask to obtain a target mask; inputting the low-resolution picture and the target mask into an adversarial network to obtain a discrimination result, and calculating a discrimination accuracy based on the discrimination result; calculating a loss function based on the low-resolution picture and the target mask until the loss function converges and the discrimination accuracy is lower than an accuracy threshold, to obtain a trained adversarial network; and inputting a received low-resolution picture to be converted into the trained adversarial network to obtain a target super-resolution picture. The trained adversarial network may be stored in a blockchain. The application ensures the quality of super-resolution reconstruction of text images.
Description
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a text image super-resolution reconstruction method and related equipment.
Background
Super-resolution reconstruction means that, for any given low-resolution picture, a corresponding high-resolution picture is generated through a convolutional neural network, with the details and textures in the picture preserved and restored as far as possible. Super-resolution reconstruction technology promotes the development of related fields such as image classification, segmentation, tracking, and defogging, and plays an important role in the development of neural networks.
However, a text picture differs from a natural scene: text content has fixed shapes and clear edges, so the reconstruction requirement is higher. In ordinary pictures most of the scene is natural, and converting a low-resolution picture into a high-resolution picture is comparatively easy. For text in a scene, if the reconstructed picture shows distortion, abrupt color changes, or character edges that blur into the surrounding scene, the quality of the reconstructed picture is noticeably reduced.
Disclosure of Invention
The embodiment of the application aims to provide a text image super-resolution reconstruction method and related equipment thereof, so that the quality of super-resolution reconstruction of a text image is guaranteed.
In order to solve the above technical problem, an embodiment of the present application provides a text image super-resolution reconstruction method, which adopts the following technical solutions:
a text image super-resolution reconstruction method comprises the following steps:
receiving a low-resolution picture and a corresponding high-resolution picture, inputting the low-resolution picture into a pre-trained scene text recognition model, and obtaining output text position information and text content information;
generating a text mask based on the text position information and the text content information, and performing up-sampling on the text mask to obtain a target mask;
inputting the low-resolution picture and the target mask into a generation layer of a preset adversarial network to obtain an output super-resolution picture;
inputting the super-resolution picture and the high-resolution picture into a discrimination layer of the adversarial network at the same time, obtaining an output discrimination result, and calculating a discrimination accuracy based on the discrimination result;
calculating a loss function of the adversarial network based on the low-resolution picture and the target mask until the loss function converges and the discrimination accuracy is lower than an accuracy threshold, and obtaining the trained adversarial network;
and receiving a low-resolution picture to be converted, and inputting the low-resolution picture to be converted into the trained adversarial network to obtain an output target super-resolution picture.
Further, the step of generating a text mask based on the text position information and the text content information includes:
correcting the text position information based on the text content information to obtain target text position information;
generating the text mask based on the target text position information.
Further, the step of calculating the loss function of the adversarial network based on the low-resolution picture and the target mask includes:
calculating a content loss function of the adversarial network based on the low-resolution picture, the content loss function characterized by:
$$l_{content} = \frac{1}{r^{2}WH}\sum_{x=1}^{rW}\sum_{y=1}^{rH}\left(I^{HR}_{x,y} - G_{\theta_G}(I^{LR})_{x,y}\right)^{2}$$
wherein $l_{content}$ is the content loss function, $I^{HR}_{x,y}$ is the value of the pixel of the high-resolution picture at position $(x, y)$, $G_{\theta_G}(I^{LR})_{x,y}$ is the value of the pixel of the super-resolution picture at position $(x, y)$, $rW$ and $rH$ are the width and length of the super-resolution picture, respectively, and $r^{2}WH$ is the total number of pixels of the super-resolution picture.
Further, the step of calculating the loss function of the adversarial network based on the low-resolution picture and the target mask includes:
calculating an adversarial loss function of the adversarial network based on the low-resolution picture, the adversarial loss function being expressed in terms of:
$G_{\theta_G}(I^{LR})$, the super-resolution picture; $D_{\theta_D}$, the discrimination layer; and $M$, the total number of super-resolution pictures.
Further, the step of calculating the loss function of the adversarial network based on the low-resolution picture and the target mask includes:
calculating a regularization loss function of the adversarial network based on the low-resolution picture, the regularization loss function characterized by:
$$l_{reg} = \frac{1}{r^{2}WH}\sum_{x=1}^{rW}\sum_{y=1}^{rH}\left\|\nabla G_{\theta_G}(I^{LR})_{x,y}\right\|$$
wherein $l_{reg}$ is the regularization loss function, $G_{\theta_G}(I^{LR})_{x,y}$ is the value of the pixel of the super-resolution picture at position $(x, y)$, $rW$ and $rH$ are the width and length of the target mask, respectively, $r^{2}WH$ is the total number of pixels in the target mask, $\|\cdot\|$ denotes a norm, and $\nabla$ denotes the gradient.
Further, the step of calculating the loss function of the adversarial network based on the low-resolution picture and the target mask includes:
calculating a text perception loss function of the adversarial network based on the low-resolution picture, the text perception loss function being expressed in terms of:
$l_{TR}$, the text perception loss function; $N$, the total number of pixels at positions where text exists; the target mask; and the super-resolution picture.
Further, the step of upsampling the text mask to obtain a target mask includes:
and performing multiple-time up-sampling on the text mask to obtain the target mask.
In order to solve the above technical problem, an embodiment of the present application further provides a text image super-resolution reconstruction apparatus, which adopts the following technical solutions:
a text image super-resolution reconstruction apparatus includes:
the receiving module is used for receiving a low-resolution picture and a corresponding high-resolution picture, inputting the low-resolution picture into a pre-trained scene text recognition model, and obtaining output text position information and text content information;
the up-sampling module is used for generating a text mask based on the text position information and the text content information and up-sampling the text mask to obtain a target mask;
the generation module is used for inputting the low-resolution picture and the target mask into a generation layer of a preset adversarial network to obtain an output super-resolution picture;
the discrimination module is used for simultaneously inputting the super-resolution picture and the high-resolution picture into a discrimination layer of the adversarial network to obtain an output discrimination result, and calculating a discrimination accuracy based on the discrimination result;
the calculation module is used for calculating a loss function of the adversarial network based on the low-resolution picture and the target mask until the loss function converges and the discrimination accuracy is lower than an accuracy threshold, and obtaining the trained adversarial network;
and the obtaining module is used for receiving the low-resolution picture to be converted, inputting the low-resolution picture to be converted into the trained adversarial network, and obtaining an output target super-resolution picture.
In order to solve the above technical problem, an embodiment of the present application further provides a computer device, which adopts the following technical solutions:
a computer device comprising a memory having computer readable instructions stored therein and a processor, the processor implementing the steps of the text image super-resolution reconstruction method described above when executing the computer readable instructions.
In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium, which adopts the following technical solutions:
a computer readable storage medium having computer readable instructions stored thereon, which when executed by a processor, implement the steps of the above-mentioned text image super-resolution reconstruction method.
Compared with the prior art, the embodiment of the application mainly has the following beneficial effects:
According to the application, the text position information and text content information of the received low-resolution picture are obtained, and the text mask is generated based on both. Because the generation of the text mask takes both the position and the content of the text into account, the boundary between the text and the surrounding image can be determined, so that the characters in the subsequently generated super-resolution picture are clear and the quality of the reconstructed picture is significantly improved. Up-sampling the text mask enlarges it and raises its resolution, which facilitates the subsequent generation of the super-resolution picture. The trained adversarial network is obtained through adversarial training of the generation layer and the discrimination layer, and is then used to generate a higher-quality target super-resolution picture.
Drawings
In order to more clearly illustrate the solution of the present application, the drawings needed for describing the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a text image super resolution reconstruction method according to the present application;
FIG. 3 is a schematic structural diagram of an embodiment of a super-resolution text image reconstruction apparatus according to the present application;
FIG. 4 is a schematic block diagram of one embodiment of a computer device according to the present application.
Reference numerals: 200. a computer device; 201. a memory; 202. a processor; 203. a network interface; 300. a text image super-resolution reconstruction device; 301. a receiving module; 302. an upsampling module; 303. a generation module; 304. a discrimination module; 305. a calculation module; 306. an obtaining module.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III, mpeg compression standard Audio Layer 3), MP4 players (Moving Picture Experts Group Audio Layer IV, mpeg compression standard Audio Layer 4), laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that the text image super-resolution reconstruction method provided in the embodiments of the present application is generally executed by a server/terminal device, and accordingly, the text image super-resolution reconstruction apparatus is generally disposed in the server/terminal device.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continuing reference to FIG. 2, a flow diagram of one embodiment of a method for super-resolution reconstruction of text images according to the present application is shown. The text image super-resolution reconstruction method comprises the following steps:
S1: receiving a low-resolution picture and a corresponding high-resolution picture, inputting the low-resolution picture into a pre-trained scene text recognition model, and obtaining output text position information and text content information.
In this embodiment, the size of the low-resolution picture (LR image) is W × H. The low-resolution picture is input into the scene text recognition model to obtain the position and the content of the scene text. The scene text recognition model used in this application is the pre-trained text recognition model ASTER (ASTER: An Attentional Scene Text Recognizer with Flexible Rectification).
In this embodiment, an electronic device (e.g., the server/terminal device shown in fig. 1) on which the text image super-resolution reconstruction method runs may receive the low-resolution picture and the corresponding high-resolution picture through a wired or wireless connection. It should be noted that the wireless connection may include, but is not limited to, a 3G/4G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a Zigbee connection, a UWB (ultra-wideband) connection, and other wireless connection means now known or developed in the future.
S2: generating a text mask based on the text position information and the text content information, and up-sampling the text mask to obtain a target mask.
In this embodiment, a text mask that emphasizes only the text portion is generated based on the text position information and the text content information; the text mask has the same size as the low-resolution picture. In the text mask, pixels where text exists are marked 1 and pixels where no text exists are marked 0, giving a two-dimensional mask of size W × H, which is the text mask of the low-resolution picture. The text mask is up-sampled to obtain a new text mask, namely the target mask, of size rW × rH, which is the same size as the generated high-resolution (super-resolution) picture. The received high-resolution picture needs to be resized to match the target mask to facilitate subsequent calculation. The target mask is subsequently used to supervise the output of the generation layer, i.e. the super-resolution picture. With this approach the high-resolution picture does not need to be annotated. This completes the scene text recognition stage.
Specifically, the step of generating a text mask based on the text position information and the text content information includes:
correcting the text position information based on the text content information to obtain target text position information;
generating the text mask based on the target text position information.
In this embodiment, the text mask is generated based on both the text position information and the text content information. Some regions of the picture may be unclear, or the recognized position may be wrong; by also taking the recognized text content into account, the computer generates a more accurate mask. For example, if the content is not considered, the output mask may render the word incorrectly, whereas if the content is considered, the output is "good".
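The binary mask described above can be sketched as follows. This is only an illustration under stated assumptions: the source does not say in what form the recognizer reports text positions, so axis-aligned boxes `(x0, y0, x1, y1)` are assumed here, and `make_text_mask` is a hypothetical helper name. Nested lists stand in for image arrays.

```python
from typing import List, Tuple

# Assumed position format: half-open box (x0, y0, x1, y1) in pixel coordinates.
Box = Tuple[int, int, int, int]


def make_text_mask(width: int, height: int, boxes: List[Box]) -> List[List[int]]:
    """Return a height x width mask: 1 where text exists, 0 elsewhere."""
    mask = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        # Clip each box to the picture so out-of-range positions are harmless.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                mask[y][x] = 1
    return mask


# A 6 x 4 low-resolution picture with one text region.
mask = make_text_mask(width=6, height=4, boxes=[(1, 1, 4, 3)])
for row in mask:
    print(row)
```

Correcting the boxes with the recognized content, as the embodiment describes, would happen before this step; how that correction is done is not specified in the source and is not modeled here.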
In addition, as another embodiment of the present application, the step of upsampling the text mask to obtain a target mask includes:
and performing multiple-time up-sampling on the text mask to obtain the target mask.
In this embodiment, the text mask is up-sampled by a factor of 5, which enlarges the text mask 5 times and raises its resolution; the generated super-resolution picture is likewise 5 times the size of the low-resolution picture.
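A minimal sketch of the up-sampling step follows. The source does not name the interpolation method; nearest-neighbor replication is assumed here because it keeps the mask strictly binary. `upsample_mask` is a hypothetical helper name.

```python
from typing import List


def upsample_mask(mask: List[List[int]], r: int) -> List[List[int]]:
    """Nearest-neighbor up-sampling: every pixel becomes an r x r block."""
    if r < 1:
        raise ValueError("scale factor r must be a positive integer")
    out: List[List[int]] = []
    for row in mask:
        # Repeat each value r times horizontally ...
        wide_row = [value for value in row for _ in range(r)]
        # ... then repeat the widened row r times vertically (as separate lists).
        out.extend(list(wide_row) for _ in range(r))
    return out


text_mask = [[0, 1],
             [1, 0]]
target_mask = upsample_mask(text_mask, r=2)
for row in target_mask:
    print(row)
```

With `r=5`, as in the embodiment, a W × H text mask becomes a 5W × 5H target mask.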
S3: inputting the low-resolution picture and the target mask into a generation layer of a preset adversarial network to obtain an output super-resolution picture.
In this embodiment, after the scene text recognition, the computer generates a super-resolution picture through the generation layer of a generative adversarial network (GAN): the low-resolution picture and the target mask are simultaneously input into the generation layer (Generator) of the generative adversarial network, and the generation layer generates a super-resolution picture (SR image).
S4: simultaneously inputting the super-resolution picture and the high-resolution picture into a discrimination layer of the adversarial network to obtain an output discrimination result, and calculating the discrimination accuracy based on the discrimination result.
In this embodiment, the super-resolution picture and the high-resolution picture are simultaneously input into the discrimination layer (Discriminator), and the discrimination layer outputs a discrimination result, that is, whether a picture is a super-resolution picture or a high-resolution picture. For example, the discrimination layer outputs 0 or 1, where 0 indicates that the picture is a generated picture (super-resolution picture) and 1 indicates that the picture is a real picture (high-resolution picture). Through the adversarial training of the generation layer and the discrimination layer, the super-resolution pictures generated by the generation layer become more and more similar to high-resolution pictures of natural scenes and increasingly difficult to distinguish from them. When the accuracy of the discrimination layer falls below the accuracy threshold, it is determined that the discrimination layer can hardly tell real pictures from the super-resolution pictures generated by the generation layer; the generated super-resolution pictures are then of high quality and close to real pictures, that is, the training target has been reached and the network can be used in practical applications. The accuracy is the ratio of the number of correct discrimination results output by the discrimination layer within a preset time period to the total number of discrimination results in that period.
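The accuracy ratio described above can be computed as in the following sketch. The label convention (0 for generated, 1 for real) is taken from the text; the function name and the sample data are hypothetical.

```python
from typing import Sequence


def discrimination_accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Ratio of correct discrimination results to the total number of results.

    Convention from the text: 0 = generated (super-resolution) picture,
    1 = real (high-resolution) picture.
    """
    if len(predictions) != len(labels) or len(labels) == 0:
        raise ValueError("predictions and labels must be non-empty and equally long")
    correct = sum(1 for p, t in zip(predictions, labels) if p == t)
    return correct / len(labels)


# Eight pictures judged within one (assumed) evaluation period.
labels = [0, 1, 0, 1, 0, 1, 0, 1]
predictions = [0, 1, 1, 1, 0, 0, 1, 1]
accuracy = discrimination_accuracy(predictions, labels)
print(accuracy)  # 5 of 8 correct
```

An accuracy near 0.5 on a balanced set means the discrimination layer is doing little better than guessing, which is the situation the threshold is meant to detect.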
S5: calculating a loss function of the adversarial network based on the low-resolution picture and the target mask until the loss function converges and the discrimination accuracy is lower than an accuracy threshold, and obtaining the trained adversarial network.
In this embodiment, the loss functions mainly used when the generative adversarial network generates pictures include a content loss function (content loss), an adversarial loss function (adversarial loss), a regularization loss function (regularization loss), and a text perception loss function (text perceptual loss) designed for the text mask. When the loss function has converged and the discrimination accuracy is lower than the accuracy threshold, training of the adversarial network is considered complete, and an adversarial network with better performance is obtained.
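The two-part stopping condition can be sketched as below. The source gives neither a convergence test nor a threshold value, so everything numeric here is an assumption: convergence is taken to mean that the last few loss values differ by less than a tolerance, and the threshold of 0.55 is purely illustrative.

```python
from typing import Sequence


def has_converged(losses: Sequence[float], window: int = 3, tol: float = 1e-3) -> bool:
    """Assumed convergence test: the last `window` losses vary by less than `tol`."""
    if len(losses) < window:
        return False
    recent = losses[-window:]
    return max(recent) - min(recent) < tol


def training_finished(losses: Sequence[float], accuracy: float,
                      accuracy_threshold: float = 0.55) -> bool:
    """Both conditions from step S5 must hold at the same time."""
    return has_converged(losses) and accuracy < accuracy_threshold


history = [1.0, 0.5, 0.2001, 0.2000, 0.2002]
print(training_finished(history, accuracy=0.52))  # loss flat, discriminator near chance
print(training_finished(history, accuracy=0.90))  # discriminator still too accurate
```

Requiring both conditions guards against stopping on a flat loss while the discrimination layer can still easily separate generated pictures from real ones.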
Specifically, the step of calculating the loss function of the adversarial network based on the low-resolution picture and the target mask includes:
calculating a content loss function of the adversarial network based on the low-resolution picture, the content loss function characterized by:
$$l_{content} = \frac{1}{r^{2}WH}\sum_{x=1}^{rW}\sum_{y=1}^{rH}\left(I^{HR}_{x,y} - G_{\theta_G}(I^{LR})_{x,y}\right)^{2}$$
wherein $l_{content}$ is the content loss function, $I^{HR}_{x,y}$ is the value of the pixel of the high-resolution picture at position $(x, y)$, $G_{\theta_G}(I^{LR})_{x,y}$ is the value of the pixel of the super-resolution picture at position $(x, y)$, $rW$ and $rH$ are the width and length of the super-resolution picture, respectively, and $r^{2}WH$ is the total number of pixels of the super-resolution picture.
In this embodiment, the content loss function is calculated as the mean squared error between the super-resolution picture and the high-resolution picture, each of width rW and length rH: the squared differences of the pixel values at all positions of the two pictures are summed, and the sum is divided by the number of pixels to give the content loss. This measures the loss between the super-resolution picture and the high-resolution picture.
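As a small worked example of the mean squared error described above, the sketch below compares two single-channel pictures stored as nested lists. The function name and the pixel values are hypothetical; a real implementation would operate on image tensors.

```python
from typing import List

Picture = List[List[float]]


def content_loss(hr: Picture, sr: Picture) -> float:
    """Mean squared error over all pixels of two equally sized pictures."""
    if len(hr) != len(sr) or any(len(a) != len(b) for a, b in zip(hr, sr)):
        raise ValueError("pictures must have the same size")
    squared = [(h - s) ** 2
               for hr_row, sr_row in zip(hr, sr)
               for h, s in zip(hr_row, sr_row)]
    return sum(squared) / len(squared)


hr = [[0.0, 1.0],
      [1.0, 0.0]]
sr = [[0.0, 0.5],
      [1.0, 1.0]]
# Squared differences are 0, 0.25, 0 and 1, so the mean is 1.25 / 4.
print(content_loss(hr, sr))
```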
As another embodiment of the present application, the step of calculating the loss function of the adversarial network based on the low-resolution picture and the target mask includes:
calculating an adversarial loss function of the adversarial network based on the low-resolution picture, the adversarial loss function being expressed in terms of:
$G_{\theta_G}(I^{LR})$, the super-resolution picture; $D_{\theta_D}$, the discrimination layer; and $M$, the total number of super-resolution pictures.
In this embodiment, the adversarial loss requires the discrimination layer D to successfully distinguish the super-resolution pictures generated by the generation layer G from the natural high-resolution pictures input to it. Through the adversarial training of the generation layer and the discrimination layer, the quality of the super-resolution pictures generated by the network gradually improves. M is the total number of super-resolution pictures input into the discrimination layer.
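The exact expression of the adversarial loss is not recoverable from this text. One commonly used generator-side form, assumed here purely for illustration, sums $-\log D_{\theta_D}(G_{\theta_G}(I^{LR}))$ over the M generated pictures, where the discrimination layer is taken to output the probability that a picture is real. The function name and the clamp value `eps` are hypothetical.

```python
import math
from typing import Sequence


def adversarial_loss(d_scores: Sequence[float], eps: float = 1e-12) -> float:
    """Assumed form: sum of -log D(G(I_LR)) over the M generated pictures.

    d_scores[i] is the discrimination layer's probability that the i-th
    super-resolution picture is real; eps avoids taking log(0).
    """
    return sum(-math.log(max(score, eps)) for score in d_scores)


fooled = adversarial_loss([0.9, 0.8, 0.95])   # discriminator mostly fooled
caught = adversarial_loss([0.1, 0.2, 0.05])   # discriminator mostly right
print(fooled, caught)
```

Under this assumed form the loss shrinks as the generated pictures become harder to tell apart from real ones, which matches the training behavior the embodiment describes.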
Further, as another embodiment of the present application, the step of calculating the loss function of the adversarial network based on the low-resolution picture and the target mask includes:
calculating a regularization loss function of the adversarial network based on the low-resolution picture, the regularization loss function characterized by:
$$l_{reg} = \frac{1}{r^{2}WH}\sum_{x=1}^{rW}\sum_{y=1}^{rH}\left\|\nabla G_{\theta_G}(I^{LR})_{x,y}\right\|$$
wherein $l_{reg}$ is the regularization loss function, $G_{\theta_G}(I^{LR})_{x,y}$ is the value of the pixel of the super-resolution picture at position $(x, y)$, $rW$ and $rH$ are the width and length of the target mask, respectively, $r^{2}WH$ is the total number of pixels in the target mask, $\|\cdot\|$ denotes a norm, and $\nabla$ denotes the gradient.
In this embodiment, by adding the regularization loss function, overfitting of the network is prevented, and convergence of the overall loss function is accelerated.
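A regularization term of this kind, that is, the norm of the image gradient averaged over all pixel points, can be sketched as follows. The sketch assumes forward differences for the gradient, the Euclidean norm, and a zero difference at the last row and column. These choices, and the name `regularization_loss`, are illustrative assumptions rather than details stated in the present application.

```python
import math


def regularization_loss(sr):
    """Average gradient magnitude of a single-channel picture.

    sr: list of rows of pixel values (hypothetical representation).
    """
    height, width = len(sr), len(sr[0])
    total = 0.0
    for y in range(height):
        for x in range(width):
            # Forward differences; taken as 0 at the last column / row.
            dx = sr[y][x + 1] - sr[y][x] if x + 1 < width else 0.0
            dy = sr[y + 1][x] - sr[y][x] if y + 1 < height else 0.0
            total += math.hypot(dx, dy)
    return total / (height * width)


flat = regularization_loss([[0.5, 0.5], [0.5, 0.5]])
noisy = regularization_loss([[0.0, 1.0], [1.0, 0.0]])
```

A uniform picture yields a loss of 0, whereas a checkerboard-like picture yields a larger value, so the term penalizes high-frequency noise in the generated picture.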
As another embodiment of the present application, the step of calculating the loss function of the countermeasure network based on the low resolution picture and the target mask includes:
calculating a text perception loss function of the countermeasure network based on the low resolution picture, the text perception loss function characterized by:
wherein $l_{TR}$ is the text perception loss function, $N$ is the total number of pixel points at the positions where text exists, and the two remaining symbols denote the target mask and the super-resolution picture, respectively.
In the present embodiment, the pixel value differences between the positions of the target mask where text exists and the corresponding positions of the picture generated by the generation layer are calculated, where N represents the total number of pixel points at the positions where text exists. All the differences are summed and divided by N to obtain the text perception loss. Through the text perception loss function, the generation layer can generate clearer text when constructing a new picture. In the text mask, pixels at positions where text exists are marked as 1 and pixels at positions without text are marked as 0, and the target mask is generated after up-sampling. The target mask supervises the output of the generation layer through the text perception loss function, so that the loss emphasizes only the text.
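One possible reading of the text perception loss is sketched below. The present application does not state unambiguously which pixel values the generated picture is compared against. The sketch assumes that the comparison is made with a reference picture (for example, the high-resolution picture) and that the absolute difference is used, restricted to positions where the target mask is 1. The names `text_perception_loss`, `reference` and `target_mask` are hypothetical.

```python
def text_perception_loss(sr, reference, target_mask):
    """Mean absolute difference over text positions only.

    sr, reference: lists of rows of pixel values.
    target_mask: list of rows with 1 at text positions, 0 elsewhere.
    """
    total, n_text = 0.0, 0
    for sr_row, ref_row, mask_row in zip(sr, reference, target_mask):
        for s, r, m in zip(sr_row, ref_row, mask_row):
            if m == 1:
                total += abs(r - s)
                n_text += 1
    # N is the number of text pixel points; return 0 if there are none.
    return total / n_text if n_text else 0.0


target_mask = [[1, 1, 0], [0, 0, 0]]
hr_picture = [[1.0, 0.0, 0.3], [0.3, 0.3, 0.3]]
sr_picture = [[0.8, 0.1, 0.9], [0.9, 0.9, 0.9]]
loss = text_perception_loss(sr_picture, hr_picture, target_mask)
```

Only the two text pixels contribute, so the large errors outside the mask are ignored and the example evaluates to approximately 0.15.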
S6: receiving a low-resolution picture to be converted, and inputting the low-resolution picture to be converted into the trained countermeasure network to obtain an output target super-resolution picture.
In this embodiment, the trained countermeasure network can generate a target super-resolution picture of higher quality, in which the text information is clear and complete.
In the present application, the text position information and the text content information of the received low-resolution picture are obtained, and the text mask is generated based on both. Because the generation of the text mask takes into account both the position information and the content information of the text, the boundary between the text and the surrounding image in the picture can be determined, so that the characters in the subsequently generated super-resolution picture are clear and the quality of the reconstructed picture is improved. The text mask is enlarged by up-sampling, which raises its resolution and facilitates the subsequent generation of the super-resolution picture. The trained countermeasure network is obtained through adversarial training of the generation layer and the discrimination layer in the countermeasure network, and is then used to generate a target super-resolution picture of higher quality.
It is emphasized that, to further ensure the privacy and security of the trained countermeasure network, the trained countermeasure network may also be stored in a node of a blockchain.
The block chain referred by the application is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
The embodiment of the application can acquire and process related data based on an artificial intelligence technology. Among them, Artificial Intelligence (AI) is a theory, method, technique and application system that simulates, extends and expands human Intelligence using a digital computer or a machine controlled by a digital computer, senses the environment, acquires knowledge and uses the knowledge to obtain the best result.
The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
The method and the device can be applied to the field of intelligent medical treatment and are used for recovering low-resolution pictures in the field of medical treatment, and therefore construction of a smart city is promoted.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by computer-readable instructions instructing the relevant hardware. The instructions can be stored in a computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, or a Read-Only Memory (ROM), or may be a Random Access Memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not restricted to the exact order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
With further reference to fig. 3, as an implementation of the method shown in fig. 2, the present application provides an embodiment of a text image super-resolution reconstruction apparatus, which corresponds to the embodiment of the method shown in fig. 2, and which can be applied in various electronic devices.
As shown in fig. 3, the text image super-resolution reconstruction apparatus 300 according to the present embodiment includes: a receiving module 301, an upsampling module 302, a generating module 303, a discriminating module 304, a calculating module 305 and an obtaining module 306, wherein:
the receiving module 301 is configured to receive a low-resolution picture and a corresponding high-resolution picture, input the low-resolution picture into a pre-trained scene text recognition model, and obtain output text position information and text content information;
the upsampling module 302 is configured to generate a text mask based on the text position information and the text content information, and perform upsampling on the text mask to obtain a target mask;
the generating module 303 is configured to input the low-resolution picture and the target mask into a preset generation layer of a countermeasure network, and obtain an output super-resolution picture;
the discriminating module 304 is configured to input the super-resolution picture and the high-resolution picture into a discrimination layer of the countermeasure network at the same time, obtain an output discrimination result, and calculate a discrimination accuracy based on the discrimination result;
the calculating module 305 is configured to calculate a loss function of the countermeasure network based on the low-resolution picture and the target mask until the loss function converges and the discrimination accuracy is lower than an accuracy threshold, to obtain a trained countermeasure network;
the obtaining module 306 is configured to receive a low-resolution picture to be converted, input the low-resolution picture to be converted into the trained countermeasure network, and obtain an output target super-resolution picture.
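The training stop condition used by the calculating module, namely that the loss function has converged and the discrimination accuracy is lower than an accuracy threshold, can be sketched as follows. The present application does not define how convergence is measured. The sketch assumes that the loss has converged when the last few recorded values change by less than a tolerance. The names `discrimination_accuracy`, `training_finished`, `tol` and `window` are hypothetical.

```python
def discrimination_accuracy(predicted_real, is_real):
    """Fraction of pictures the discrimination layer classified correctly."""
    if not is_real:
        raise ValueError("at least one picture is required")
    correct = sum(p == t for p, t in zip(predicted_real, is_real))
    return correct / len(is_real)


def training_finished(loss_history, accuracy, accuracy_threshold,
                      tol=1e-4, window=3):
    """True when the loss has levelled off AND the accuracy is below threshold."""
    if len(loss_history) < window + 1:
        return False
    recent = loss_history[-(window + 1):]
    # Assumed convergence test: consecutive losses differ by less than tol.
    converged = all(abs(a - b) < tol for a, b in zip(recent, recent[1:]))
    return converged and accuracy < accuracy_threshold


accuracy = discrimination_accuracy(
    [True, False, True, True],   # discrimination results
    [True, False, False, True],  # ground truth: True = high-resolution picture
)
losses = [0.9, 0.5, 0.30001, 0.30002, 0.30001, 0.30002]
done_high_acc = training_finished(losses, accuracy, accuracy_threshold=0.6)
done_low_acc = training_finished(losses, 0.55, accuracy_threshold=0.6)
```

In the example the loss has levelled off in both cases, but training is treated as finished only when the discrimination accuracy (0.55) falls below the threshold (0.6), which indicates that the discrimination layer can no longer reliably tell the generated pictures from the high-resolution ones.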
In this embodiment, the text position information and the text content information of the received low-resolution picture are obtained, and the text mask is generated based on both. Because the generation of the text mask takes into account both the position information and the content information of the text, the boundary between the text and the surrounding image in the picture can be defined, so that the characters in the subsequently generated super-resolution picture are clear and the quality of the reconstructed picture is improved. The text mask is enlarged by up-sampling, which raises its resolution and facilitates the subsequent generation of the super-resolution picture. The trained countermeasure network is obtained through adversarial training of the generation layer and the discrimination layer in the countermeasure network, and is then used to generate a target super-resolution picture of higher quality.
The up-sampling module 302 comprises a correction submodule and a generation submodule, wherein the correction submodule is used for correcting the text position information based on the text content information to obtain target text position information; the generation submodule is used for generating the text mask based on the target text position information.
In some optional implementations of the present embodiment, the upsampling module 302 is further configured to perform multiple-time up-sampling on the text mask to obtain the target mask.
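As an illustration of generating a text mask and up-sampling it, the sketch below builds a binary mask from rectangular text boxes and enlarges it by nearest-neighbour repetition. It assumes that the text positions are axis-aligned boxes and that the up-sampling enlarges the mask by an integer factor r, so that a mask of size W x H becomes rW x rH. The box format, the interpolation method and the names `build_text_mask` and `upsample_mask` are assumptions not specified in the present application.

```python
def build_text_mask(height, width, text_boxes):
    """Binary mask: 1 inside any text box, 0 elsewhere.

    text_boxes: hypothetical (top, left, bottom, right) tuples,
    with bottom and right exclusive.
    """
    mask = [[0] * width for _ in range(height)]
    for top, left, bottom, right in text_boxes:
        for y in range(max(top, 0), min(bottom, height)):
            for x in range(max(left, 0), min(right, width)):
                mask[y][x] = 1
    return mask


def upsample_mask(mask, r):
    """Nearest-neighbour up-sampling by an integer factor r."""
    if r < 1:
        raise ValueError("r must be a positive integer")
    return [
        [value for value in row for _ in range(r)]  # repeat each column r times
        for row in mask
        for _ in range(r)                           # repeat each row r times
    ]


text_mask = build_text_mask(2, 3, [(0, 1, 1, 3)])
target_mask = upsample_mask(text_mask, 2)
```

With r = 2, the 2 x 3 text mask becomes a 4 x 6 target mask in which each text pixel is expanded into a 2 x 2 block, so the mask lines up with a super-resolution picture enlarged by the same factor.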
In some optional implementations of this embodiment, the calculating module 305 is further configured to calculate a content loss function for the countermeasure network based on the low resolution picture, the content loss function characterized by:
$$l_{MSE} = \frac{1}{r^2WH}\sum_{x=1}^{rW}\sum_{y=1}^{rH}\left(I^{HR}_{x,y} - G_{\theta_G}(I^{LR})_{x,y}\right)^2$$
wherein $l_{MSE}$ is the content loss function, $I^{HR}_{x,y}$ is the value of the pixel point of the high-resolution picture at the $(x, y)$ position, $G_{\theta_G}(I^{LR})_{x,y}$ is the value of the pixel point of the super-resolution picture at the $(x, y)$ position, $rW$ and $rH$ are the width and length of the super-resolution picture, respectively, and $r^2WH$ is the total number of pixel points of the super-resolution picture.
In some optional implementations of this embodiment, the calculating module 305 is further configured to calculate a countermeasure loss function for the countermeasure network based on the low resolution picture, the countermeasure loss function characterized by:
$$l_{Gen} = \sum_{m=1}^{M} -\log D_{\theta_D}\left(G_{\theta_G}(I^{LR})\right)$$
wherein $l_{Gen}$ is the countermeasure loss function, $G_{\theta_G}(I^{LR})$ is the super-resolution picture, $D_{\theta_D}$ is the discrimination layer, and $M$ is the total number of super-resolution pictures.
In some optional implementations of this embodiment, the calculating module 305 is further configured to calculate a regularization loss function of the countermeasure network based on the low resolution picture, the regularization loss function characterized by:
$$l_{TV} = \frac{1}{r^2WH}\sum_{x=1}^{rW}\sum_{y=1}^{rH}\left\lVert \nabla G_{\theta_G}(I^{LR})_{x,y} \right\rVert$$
wherein $l_{TV}$ is the regularization loss function, $G_{\theta_G}(I^{LR})_{x,y}$ is the value of the pixel point of the super-resolution picture at the $(x, y)$ position, $rW$ and $rH$ are the width and length of the target mask, respectively, $r^2WH$ is the total number of pixel points in the target mask, $\lVert\cdot\rVert$ denotes a norm, and $\nabla$ denotes the gradient.
In some optional implementations of this embodiment, the calculating module 305 is further configured to calculate a text perception loss function of the countermeasure network based on the low resolution picture, the text perception loss function characterized by:
wherein $l_{TR}$ is the text perception loss function, $N$ is the total number of pixel points at the positions where text exists, and the two remaining symbols denote the target mask and the super-resolution picture, respectively.
In the present application, the text position information and the text content information of the received low-resolution picture are obtained, and the text mask is generated based on both. Because the generation of the text mask takes into account both the position information and the content information of the text, the boundary between the text and the surrounding image in the picture can be determined, so that the characters in the subsequently generated super-resolution picture are clear and the quality of the reconstructed picture is improved. The text mask is enlarged by up-sampling, which raises its resolution and facilitates the subsequent generation of the super-resolution picture. The trained countermeasure network is obtained through adversarial training of the generation layer and the discrimination layer in the countermeasure network, and is then used to generate a target super-resolution picture of higher quality.
In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 4, fig. 4 is a block diagram of a basic structure of a computer device according to the present embodiment.
The computer device 200 comprises a memory 201, a processor 202 and a network interface 203, which are communicatively connected to each other via a system bus. It is noted that only the computer device 200 having components 201-203 is shown, but it is understood that not all of the illustrated components are required and that more or fewer components may alternatively be implemented. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
The memory 201 includes at least one type of readable storage medium, including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 201 may be an internal storage unit of the computer device 200, such as a hard disk or a memory of the computer device 200. In other embodiments, the memory 201 may also be an external storage device of the computer device 200, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, or a Flash memory Card (Flash Card) provided on the computer device 200. Of course, the memory 201 may also include both an internal storage unit and an external storage device of the computer device 200. In this embodiment, the memory 201 is generally used for storing an operating system installed in the computer device 200 and various types of application software, such as the computer-readable instructions of the text image super-resolution reconstruction method. Further, the memory 201 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 202 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 202 is generally operative to control overall operation of the computer device 200. In this embodiment, the processor 202 is configured to execute computer readable instructions stored in the memory 201 or process data, for example, execute computer readable instructions of the text image super-resolution reconstruction method.
The network interface 203 may comprise a wireless network interface or a wired network interface, and the network interface 203 is generally used for establishing communication connection between the computer device 200 and other electronic devices.
In this embodiment, the generation of the text mask takes into account the position information and the content information of the text, so that the boundary between the text and the surrounding image in the picture can be defined and the quality of the reconstructed picture is improved. The trained countermeasure network is obtained through adversarial training of the generation layer and the discrimination layer in the countermeasure network, and is used to generate a target super-resolution picture of higher quality.
The present application further provides another embodiment, which is to provide a computer-readable storage medium storing computer-readable instructions executable by at least one processor to cause the at least one processor to perform the steps of the text image super-resolution reconstruction method as described above.
In this embodiment, the generation of the text mask takes into account the position information and the content information of the text, so that the boundary between the text and the surrounding image in the picture can be defined and the quality of the reconstructed picture is improved. The trained countermeasure network is obtained through adversarial training of the generation layer and the discrimination layer in the countermeasure network, and is used to generate a target super-resolution picture of higher quality.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
It is to be understood that the above-described embodiments are merely some, rather than all, of the embodiments of the present application, and that the appended drawings illustrate preferred embodiments of the application without limiting its scope. This application may be embodied in many different forms; the embodiments are provided so that the disclosure of the application will be understood thoroughly. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or replace some of their technical features with equivalents. All equivalent structures made by using the contents of the specification and the drawings of the present application, whether applied directly or indirectly in other related technical fields, fall within the protection scope of the present application.
Claims (10)
1. A text image super-resolution reconstruction method is characterized by comprising the following steps:
receiving a low-resolution picture and a corresponding high-resolution picture, inputting the low-resolution picture into a pre-trained scene text recognition model, and obtaining output text position information and text content information;
generating a text mask based on the text position information and the text content information, and performing up-sampling on the text mask to obtain a target mask;
inputting the low-resolution picture and the target mask into a preset generation layer of a countermeasure network to obtain an output super-resolution picture;
inputting the super-resolution picture and the high-resolution picture into a discrimination layer of the countermeasure network at the same time, obtaining an output discrimination result, and calculating discrimination accuracy based on the discrimination result;
calculating a loss function of the countermeasure network based on the low-resolution picture and the target mask until the loss function converges and the discrimination accuracy is lower than an accuracy threshold, and obtaining the trained countermeasure network;
and receiving a low-resolution picture to be converted, and inputting the low-resolution picture to be converted into the trained countermeasure network to obtain an output target super-resolution picture.
2. The text image super-resolution reconstruction method according to claim 1, wherein the step of generating a text mask based on the text position information and the text content information comprises:
correcting the text position information based on the text content information to obtain target text position information;
generating the text mask based on the target text position information.
3. The method for super-resolution reconstruction of text images according to claim 1, wherein the step of calculating the loss function of the countermeasure network based on the low-resolution picture and the target mask comprises:
calculating a content loss function for the countermeasure network based on the low resolution picture, the content loss function characterized by:
$$l_{MSE} = \frac{1}{r^2WH}\sum_{x=1}^{rW}\sum_{y=1}^{rH}\left(I^{HR}_{x,y} - G_{\theta_G}(I^{LR})_{x,y}\right)^2$$
wherein $l_{MSE}$ is the content loss function, $I^{HR}_{x,y}$ is the value of the pixel point of the high-resolution picture at the $(x, y)$ position, $G_{\theta_G}(I^{LR})_{x,y}$ is the value of the pixel point of the super-resolution picture at the $(x, y)$ position, $rW$ and $rH$ are the width and length of the super-resolution picture, respectively, and $r^2WH$ is the total number of pixel points of the super-resolution picture.
4. The method for super-resolution reconstruction of text images according to claim 1, wherein the step of calculating the loss function of the countermeasure network based on the low-resolution picture and the target mask comprises:
calculating a countermeasure loss function for the countermeasure network based on the low resolution picture, the countermeasure loss function characterized by:
5. The method for super-resolution reconstruction of text images according to claim 1, wherein the step of calculating the loss function of the countermeasure network based on the low-resolution picture and the target mask comprises:
calculating a regularization loss function of the countermeasure network based on the low resolution picture, the regularization loss function characterized by:
$$l_{TV} = \frac{1}{r^2WH}\sum_{x=1}^{rW}\sum_{y=1}^{rH}\left\lVert \nabla G_{\theta_G}(I^{LR})_{x,y} \right\rVert$$
wherein $l_{TV}$ is the regularization loss function, $G_{\theta_G}(I^{LR})_{x,y}$ is the value of the pixel point of the super-resolution picture at the $(x, y)$ position, $rW$ and $rH$ are the width and length of the target mask, respectively, $r^2WH$ is the total number of pixel points in the target mask, $\lVert\cdot\rVert$ denotes a norm, and $\nabla$ denotes the gradient.
6. The method for super-resolution reconstruction of text images according to claim 1, wherein the step of calculating the loss function of the countermeasure network based on the low-resolution picture and the target mask comprises:
calculating a text perception loss function of the countermeasure network based on the low resolution picture, the text perception loss function characterized by:
7. The text image super-resolution reconstruction method according to claim 1, wherein the step of upsampling the text mask to obtain a target mask comprises:
performing multiple-time up-sampling on the text mask to obtain the target mask.
8. A text image super-resolution reconstruction device is characterized by comprising:
the receiving module is used for receiving a low-resolution picture and a corresponding high-resolution picture, inputting the low-resolution picture into a pre-trained scene text recognition model, and obtaining output text position information and text content information;
the up-sampling module is used for generating a text mask based on the text position information and the text content information and up-sampling the text mask to obtain a target mask;
the generation module is used for inputting the low-resolution picture and the target mask into a preset generation layer of a countermeasure network to obtain an output super-resolution picture;
the discrimination module is used for simultaneously inputting the super-resolution picture and the high-resolution picture into a discrimination layer of the countermeasure network to obtain an output discrimination result and calculating discrimination accuracy based on the discrimination result;
the calculation module is used for calculating a loss function of the countermeasure network based on the low-resolution picture and the target mask until the loss function converges and the discrimination accuracy is lower than an accuracy threshold, and obtaining the trained countermeasure network;
and the obtaining module is used for receiving the low-resolution picture to be converted, inputting the low-resolution picture to be converted into the trained countermeasure network and obtaining an output target super-resolution picture.
9. A computer device comprising a memory having computer-readable instructions stored therein and a processor, wherein the processor, when executing the computer-readable instructions, implements the steps of the text image super-resolution reconstruction method of any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon computer-readable instructions which, when executed by a processor, implement the steps of the text image super-resolution reconstruction method according to any one of claims 1 to 7.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111061974.7A CN113763249A (en) | 2021-09-10 | 2021-09-10 | Text image super-resolution reconstruction method and related equipment thereof |
PCT/CN2022/071883 WO2023035531A1 (en) | 2021-09-10 | 2022-01-13 | Super-resolution reconstruction method for text image and related device thereof |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111061974.7A CN113763249A (en) | 2021-09-10 | 2021-09-10 | Text image super-resolution reconstruction method and related equipment thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113763249A (en) | 2021-12-07 |
Family
ID=78794915
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111061974.7A Pending CN113763249A (en) | 2021-09-10 | 2021-09-10 | Text image super-resolution reconstruction method and related equipment thereof |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113763249A (en) |
WO (1) | WO2023035531A1 (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114172873A (en) * | 2021-12-13 | 2022-03-11 | 中国平安财产保险股份有限公司 | Resolution adjustment method, resolution adjustment device, server and computer-readable storage medium |
WO2023035531A1 (en) * | 2021-09-10 | 2023-03-16 | 平安科技(深圳)有限公司 | Super-resolution reconstruction method for text image and related device thereof |
CN115829837A (en) * | 2022-11-15 | 2023-03-21 | 深圳市新良田科技股份有限公司 | Text image super-resolution reconstruction method and system |
WO2023070495A1 (en) * | 2021-10-29 | 2023-05-04 | 京东方科技集团股份有限公司 | Image processing method, electronic device and non-transitory computer-readable medium |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116385318B (en) * | 2023-06-06 | 2023-10-10 | 湖南纵骏信息科技有限公司 | Image quality enhancement method and system based on cloud desktop |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107154023B (en) * | 2017-05-17 | 2019-11-05 | 电子科技大学 | Based on the face super-resolution reconstruction method for generating confrontation network and sub-pix convolution |
US11003865B1 (en) * | 2020-05-20 | 2021-05-11 | Google Llc | Retrieval-augmented language model pre-training and fine-tuning |
CN112508782B (en) * | 2020-09-10 | 2024-04-26 | 浙江大华技术股份有限公司 | Training method of network model, and super-resolution reconstruction method and device of face image |
CN113256494B (en) * | 2021-06-02 | 2022-11-11 | 同济大学 | Text image super-resolution method |
CN113763249A (en) * | 2021-09-10 | 2021-12-07 | 平安科技(深圳)有限公司 | Text image super-resolution reconstruction method and related equipment thereof |
- 2021-09-10 CN CN202111061974.7A patent/CN113763249A/en active Pending
- 2022-01-13 WO PCT/CN2022/071883 patent/WO2023035531A1/en unknown
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2023035531A1 (en) * | 2021-09-10 | 2023-03-16 | 平安科技(深圳)有限公司 | Super-resolution reconstruction method for text image and related device thereof |
WO2023070495A1 (en) * | 2021-10-29 | 2023-05-04 | 京东方科技集团股份有限公司 | Image processing method, electronic device and non-transitory computer-readable medium |
CN114172873A (en) * | 2021-12-13 | 2022-03-11 | 中国平安财产保险股份有限公司 | Resolution adjustment method, resolution adjustment device, server and computer-readable storage medium |
CN114172873B (en) * | 2021-12-13 | 2023-05-30 | 中国平安财产保险股份有限公司 | Resolution adjustment method, resolution adjustment device, server and computer readable storage medium |
CN115829837A (en) * | 2022-11-15 | 2023-03-21 | 深圳市新良田科技股份有限公司 | Text image super-resolution reconstruction method and system |
Also Published As
Publication number | Publication date |
---|---|
WO2023035531A1 (en) | 2023-03-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113763249A (en) | | Text image super-resolution reconstruction method and related equipment thereof |
US20230081645A1 (en) | | Detecting forged facial images using frequency domain information and local correlation |
US11436863B2 (en) | | Method and apparatus for outputting data |
CN109858333B (en) | | Image processing method, image processing device, electronic equipment and computer readable medium |
CN112686243A (en) | | Method and device for intelligently identifying picture characters, computer equipment and storage medium |
CN113673519B (en) | | Character recognition method based on character detection model and related equipment thereof |
CN110795714A (en) | | Identity authentication method and device, computer equipment and storage medium |
CN112749695A (en) | | Text recognition method and device |
CN112418292A (en) | | Image quality evaluation method and device, computer equipment and storage medium |
CN113254491A (en) | | Information recommendation method and device, computer equipment and storage medium |
CN112330331A (en) | | Identity verification method, device and equipment based on face recognition and storage medium |
CN114241459B (en) | | Driver identity verification method and device, computer equipment and storage medium |
CN114529574A (en) | | Image matting method and device based on image segmentation, computer equipment and medium |
CN112016502B (en) | | Safety belt detection method, safety belt detection device, computer equipment and storage medium |
CN112634158A (en) | | Face image recovery method and device, computer equipment and storage medium |
CN113012075A (en) | | Image correction method and device, computer equipment and storage medium |
CN112651399B (en) | | Method for detecting same-line characters in inclined image and related equipment thereof |
CN115510186A (en) | | Instant question and answer method, device, equipment and storage medium based on intention recognition |
CN114861241A (en) | | Anti-peeping screen method based on intelligent detection and related equipment thereof |
CN112434746B (en) | | Pre-labeling method based on hierarchical migration learning and related equipment thereof |
CN112669244A (en) | | Face image enhancement method and device, computer equipment and readable storage medium |
CN116774973A (en) | | Data rendering method, device, computer equipment and storage medium |
CN114241411B (en) | | Counting model processing method and device based on target detection and computer equipment |
CN113362249B (en) | | Text image synthesis method, text image synthesis device, computer equipment and storage medium |
CN115601235A (en) | | Image super-resolution network training method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||