CN114202765A - Image text recognition method and storage medium

Image text recognition method and storage medium

Info

Publication number
CN114202765A
CN114202765A
Authority
CN
China
Prior art keywords
text
information
image
model
text information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111330318.2A
Other languages
Chinese (zh)
Inventor
陈江海
梁懿
苏江文
卢伟龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Information and Telecommunication Co Ltd
Fujian Yirong Information Technology Co Ltd
Great Power Science and Technology Co of State Grid Information and Telecommunication Co Ltd
Original Assignee
State Grid Information and Telecommunication Co Ltd
Fujian Yirong Information Technology Co Ltd
Great Power Science and Technology Co of State Grid Information and Telecommunication Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Information and Telecommunication Co Ltd, Fujian Yirong Information Technology Co Ltd, and Great Power Science and Technology Co of State Grid Information and Telecommunication Co Ltd
Priority to CN202111330318.2A
Publication of CN114202765A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Character Discrimination (AREA)

Abstract

The invention relates to an image text recognition method and a storage medium, wherein the method comprises the following steps: S1: receiving first image text information, the image text information comprising first text information and background image information; S2: extracting the first text information and the background image information, and determining parameter information corresponding to the first text information; S3: acquiring one or more pieces of character information from a character database, and processing the acquired character information with the parameter information corresponding to the first text information to obtain second text information; S4: synthesizing the second text information and the background image information into second image text information, and inputting the second image text information into a text detection model for training. With this scheme, the amount of training data for the detection model can be effectively expanded, and the accuracy of the trained model in text detection improved.

Description

Image text recognition method and storage medium
Technical Field
The present invention relates to the field of image recognition, and in particular, to an image text recognition method and a storage medium.
Background
Image text recognition, also known as OCR (Optical Character Recognition), is a technology that recognizes the words in an image and returns them as text. OCR has evolved through several stages, from early systems that could only recognize printed English in a given font to today's systems that can recognize the characters of many languages, including handwriting.
In recent years, with the rapid development of artificial intelligence, the demand for recognizing the text content of digitized documents, such as scans and photographed images, has kept growing. Image text recognition has become an essential step in the intelligent processing of unstructured files, and it plays a vital role in many business domains, such as marketing archive recognition, audit document recognition, engineering file recognition, and electronic license recognition.
At present, several methods exist for general-purpose image text recognition, but they suffer from defects such as low recognition accuracy, low recognition speed, inability to recognize curved text, and lack of support for mixed multi-language recognition.
For example, patent application CN201911221023.4, entitled "A method for recognizing text of images in natural scenes based on pruning depth model", proposes an image text recognition method that performs feature extraction with Darknet, performs target detection with YoloV3 to locate the bounding boxes of text areas in an image, and then performs recognition. This solution has the following drawbacks: 1. adopting Darknet as the backbone results in a slow overall recognition speed; 2. YoloV3 is designed for general object detection scenarios, so its text detection accuracy is mediocre. In short, the scheme clearly suffers from low recognition speed and low recognition accuracy.
For another example, patent application CN202110584533.9, entitled "Optical character fast recognition method and system", proposes a fast character recognition method with the following basic steps: (1) text detection with the DB algorithm; (2) text recognition with the CRNN algorithm. This scheme represents the currently common OCR approach: detect with DB, recognize with CRNN. It has the following drawbacks: 1. the DB algorithm performs single-line text detection, so multi-line text requires multiple detection boxes recognized separately; its recognition rate on overlong text (e.g., longer than 25 characters) is low and needs additional measures (e.g., a sliding window) to improve accuracy; 2. although the CRNN algorithm predicts somewhat faster, its recognition accuracy is clearly lower than that of algorithms such as SRN and NRTR.
Disclosure of Invention
Therefore, a technical scheme for image text recognition is needed to solve the problems of low recognition rate and low speed in existing image text recognition methods.
To achieve the above object, in a first aspect, the present application provides an image text recognition method, including the steps of:
S1: receiving first image text information; the image text information comprises first text information and background image information;
S2: extracting the first text information and the background image information, and determining parameter information corresponding to the first text information;
S3: acquiring one or more pieces of character information from a character database, and processing the acquired character information with the parameter information corresponding to the first text information to obtain second text information;
S4: synthesizing the second text information and the background image information into second image text information, and inputting the second image text information into a text detection model for training.
As an alternative embodiment, step S3 includes:
randomly acquiring one or more pieces of character information from the character database, repeating the acquisition multiple times, and processing the acquired pieces of character information with the parameter information corresponding to the first text information to obtain multiple pieces of second text information.
As an alternative embodiment, the parameter information includes any one or more of a font, a font size, a font style, a color, a typesetting mode, and a decoration effect.
As an alternative embodiment, the first image text information includes any one or more of invoice data, tickets, business licenses, electronic itineraries, identity cards, social security cards, and bank cards.
As an alternative embodiment, the text detection model is a ResNet50_vd and SAST algorithm detection model; specifically: ResNet50_vd is adopted as the network structure, and the fully connected layer in the network structure is replaced with an FCN fully convolutional layer.
As an alternative embodiment, the loss function of the text detection model is as follows: L_total = λ1·L_tcl + λ2·L_tco + λ3·L_tvo + λ4·L_tbo
where λ1, λ2, λ3 and λ4 are weight values, and tcl, tco, tvo and tbo denote four feature maps; tcl denotes the text area where the first text information is located; tco, tvo and tbo denote pixel offsets relative to tcl; specifically: the tco feature map is the text pixel center offset relative to the tcl feature map; the tvo feature map is the pixel offset relative to the four bounding-box vertices of the text in the tcl feature map; and tbo is the offset relative to the upper and lower bounds of the tcl feature map.
As an alternative embodiment, λ1 = 1.0, λ2 = 0.5, λ3 = 0.5 and λ4 = 1.0.
As an alternative embodiment, step S4 is followed by step S5:
inputting the output result of the text detection model into a text recognition model for training; the text recognition model is a ResNet50_vd_fpn and SRN algorithm recognition model.
As an alternative embodiment, the method further comprises:
optimizing the trained model, specifically comprising: distilling, quantizing and pruning the trained model in sequence to obtain the final model.
In a second aspect, the present application also provides a storage medium storing a computer program which, when executed by a processor, performs the method steps as in the first aspect of the present application.
Different from the prior art, the image text recognition method and storage medium of the invention comprise the following steps: S1: receiving first image text information, the image text information comprising first text information and background image information; S2: extracting the first text information and the background image information, and determining parameter information corresponding to the first text information; S3: acquiring one or more pieces of character information from a character database, and processing the acquired character information with the parameter information corresponding to the first text information to obtain second text information; S4: synthesizing the second text information and the background image information into second image text information, and inputting the second image text information into a text detection model for training. In this scheme, the second text information is generated by expansion from the first text information, then synthesized with the background image information into second image text information and fed to the text detection model for training, which effectively increases the amount of training data and thereby improves the accuracy of the trained model in text detection.
Drawings
Fig. 1 is a flowchart of an image text recognition method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method for recognizing text in an image according to another embodiment of the present invention;
FIG. 3 is a flow chart of a method for image text recognition according to another embodiment of the present invention;
FIG. 4 is a flow chart of model training according to an embodiment of the present invention;
FIG. 5 is a flow chart of model optimization according to an embodiment of the present invention;
FIG. 6 is a flow chart of predictive identification according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a SAST algorithm according to an embodiment of the present invention;
fig. 8 is a schematic diagram of a GSRM model structure according to an embodiment of the present invention.
Detailed Description
To explain technical contents, structural features, and objects and effects of the technical solutions in detail, the following detailed description is given with reference to the accompanying drawings in conjunction with the embodiments.
Fig. 1 is a flowchart illustrating an image text recognition method according to an embodiment of the present invention. The method comprises the following steps: S1: receiving first image text information, the image text information comprising first text information and background image information; S2: extracting the first text information and the background image information, and determining parameter information corresponding to the first text information; S3: acquiring one or more pieces of character information from a character database, and processing the acquired character information with the parameter information corresponding to the first text information to obtain second text information; S4: synthesizing the second text information and the background image information into second image text information, and inputting the second image text information into a text detection model for training.
As an alternative embodiment, the parameter information includes any one or more of a font, a font size, a font style, a color, a typesetting mode, and a decoration effect. The typesetting mode includes the spacing between characters, the line spacing, and so on; the decoration effect can be an effect added on top of the characters, such as a shadow.
As an alternative embodiment, the first image text information includes any one or more of invoice data, tickets, business licenses, electronic itineraries, identity cards, social security cards, and bank cards. Of course, in other embodiments, the first image text information may also be other image data containing text.
In the present application, the character database refers to a dictionary database containing multiple characters; text information refers to data containing one or more characters; and background image information refers to the background corresponding to the position of the text information on the image. In general, detecting text information on an image means first detecting the text box in which the text lies and then recognizing the characters inside it, so the background image information of the application can be the background left after subtracting the character information from the text box.
In the above scheme, training data can be expanded by acquiring one or more pieces of character information from the character database and processing them with the parameter information corresponding to the first text information to obtain the second text information. Meanwhile, since the parameter information of the second text information is fully consistent with that of the first text information, feeding the synthesized data into the text detection model for training greatly improves the model's recognition speed on text with this type of parameter information and raises the accuracy of text recognition, with a remarkable effect.
As an alternative embodiment, as shown in fig. 2, step S3 comprises: S31: randomly acquiring one or more pieces of character information from the character database, repeating the acquisition multiple times, and processing the acquired pieces of character information with the parameter information corresponding to the first text information to obtain multiple pieces of second text information. Step S4 comprises: S41: combining each piece of second text information with the background image information to obtain multiple pieces of second image text information, and inputting them into the text detection model for training. Because the character information needed to generate the second text information is randomly acquired from the character database, and the acquired characters are each time synthesized with the corresponding background image information from the first image text information, multiple pieces of second text information with the same style as the first text information can be obtained; and since the second text information and the first image text information share a similar background, the trained model's rapid recognition of this type of character information is greatly enhanced.
In the present application, the number of characters contained in the second text information may be the same as or different from the number contained in the first text information. Preferably, when generating the second text information, the same number of characters as contained in the first text information can be randomly acquired from the character database, as in the sketch below.
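The following is a minimal sketch of this synthesis step: random characters are sampled from the dictionary database, rendered with the style parameters extracted from the first text information, and drawn onto the background patch. The parameter names (font_path, origin, shadow, etc.) and the contents of char_db are illustrative assumptions, not the patent's exact interface.

```python
import random
from PIL import Image, ImageDraw, ImageFont

def synthesize_sample(background: Image.Image, char_db: list,
                      params: dict) -> Image.Image:
    """Compose one piece of 'second image text information'."""
    n_chars = params.get("num_chars", 8)  # e.g., match the source text length
    text = "".join(random.choices(char_db, k=n_chars))
    font = ImageFont.truetype(params["font_path"], params["font_size"])

    sample = background.copy()
    draw = ImageDraw.Draw(sample)
    x, y = params.get("origin", (0, 0))
    if params.get("shadow"):  # optional decoration effect
        draw.text((x + 2, y + 2), text, font=font, fill=(64, 64, 64))
    draw.text((x, y), text, font=font, fill=params.get("color", (0, 0, 0)))
    return sample

# Repeating the random sampling step multiplies the training set:
# samples = [synthesize_sample(bg, chars, style) for _ in range(1000)]
```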
As an alternative embodiment, in the present application, the text detection model is a ResNet50_vd and SAST algorithm detection model; specifically: ResNet50_vd is adopted as the network structure, and the fully connected layer in the network structure is replaced with an FCN fully convolutional layer.
For the text detection model, this application can adopt the combination of ResNet50_vd (as the backbone) and the SAST algorithm. Verification on multiple training sets shows that this combination is clearly superior to common text detection models that use ResNet34_vd, MobileNetV3 and the like as network structures combined with algorithms such as DB and EAST.
The main principle of SAST is shown in fig. 7. Specifically: ResNet50_vd is used as the network structure, the last fully connected layer is replaced with an FCN fully convolutional layer, and a semantic segmentation result of the same size as the original image is output. Feature points from feature maps at different levels are fused several times (e.g., three times) with an FPN algorithm, so that the feature network can carry more information about objects of different sizes; see the sketch below.
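A sketch of such top-down FPN-style fusion in PyTorch. The channel counts are typical ResNet50 stage widths and are assumptions; the actual network fuses SAST-specific levels.

```python
import torch.nn as nn
import torch.nn.functional as F

class FPNFusion(nn.Module):
    """Fuse three backbone feature levels top-down, as described above."""
    def __init__(self, in_channels=(512, 1024, 2048), out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels)
        self.smooth = nn.Conv2d(out_channels, out_channels, 3, padding=1)

    def forward(self, c3, c4, c5):
        p5 = self.lateral[2](c5)
        p4 = self.lateral[1](c4) + F.interpolate(p5, scale_factor=2)
        p3 = self.lateral[0](c3) + F.interpolate(p4, scale_factor=2)
        return self.smooth(p3)  # fused map feeding the FCN segmentation head
```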
Preferably, the output of the SAST network is divided into four parts: the tcl, tco, tvo and tbo feature maps. The loss function of the text detection model is as follows: L_total = λ1·L_tcl + λ2·L_tco + λ3·L_tvo + λ4·L_tbo, where λ1, λ2, λ3 and λ4 are weight values and tcl, tco, tvo and tbo denote the four feature maps; tcl denotes the text area where the first text information is located; tco, tvo and tbo denote pixel offsets relative to tcl; specifically: the tco feature map is the text pixel center offset relative to the tcl feature map; the tvo feature map is the pixel offset relative to the four bounding-box vertices of the text in the tcl feature map; and tbo is the offset relative to the upper and lower bounds of the tcl feature map.
Preferably, λ1 = 1.0, λ2 = 0.5, λ3 = 0.5 and λ4 = 1.0. In this application, λ1, λ2, λ3 and λ4 balance the four tasks, i.e., they make the tasks equally important in the model, so {1.0, 0.5, 0.5, 1.0} is chosen so that the four loss gradients contribute comparably during backpropagation, as in the sketch below.
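A minimal sketch of this weighted combination; the four per-task losses are assumed to be scalars already computed elsewhere by the detection head.

```python
# Four-task SAST loss combination with the weights given above.
def total_loss(l_tcl, l_tco, l_tvo, l_tbo, lambdas=(1.0, 0.5, 0.5, 1.0)):
    l1, l2, l3, l4 = lambdas
    return l1 * l_tcl + l2 * l_tco + l3 * l_tvo + l4 * l_tbo
```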
As shown in fig. 3, step S4 is followed by step S5: inputting the output result of the text detection model into a text recognition model for training; the text recognition model is a ResNet50_vd_fpn and SRN algorithm recognition model.
In this application, ResNet50_vd_fpn (as the backbone) and the SRN algorithm are adopted as the network structure and algorithm for text recognition. Verification on several public data sets shows that the effect is clearly superior to common text recognition models that use ResNet34_vd, MobileNetV3 and the like as network structures combined with algorithms such as CRNN, Rosetta, StarNet and RARE.
The main steps of an SRN are generally as follows: first, sequence features are re-encoded using the reading/writing order of the characters to obtain a preliminary recognition result; the preliminary result is then re-integrated into the sequence features, i.e., the model judges at the global level whether the result is correct, decides whether fine-tuning is needed, and obtains the recognition result again. An SRN generally consists of four parts: a backbone network, a Parallel Visual Attention Module (PVAM), a Global Semantic Reasoning Module (GSRM), and a Visual-Semantic Fusion Decoder (VSFD). The present invention adopts ResNet50_vd_fpn as the backbone network of the SRN. The PVAM is used to generate N aligned one-dimensional features G, where each feature corresponds to one character in the text and captures the aligned visual information. These N one-dimensional features G are then fed into the GSRM to capture semantic information S. Finally, the VSFD fuses the aligned visual features G and the semantic information S to predict the N characters. The GSRM model structure is shown in fig. 8. This data flow is sketched below.
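Under assumed module interfaces (the PVAM, GSRM and VSFD internals are stand-ins; only the data flow follows the description above), the forward pass looks roughly like this:

```python
import torch.nn as nn

class SRN(nn.Module):
    """Schematic SRN: backbone -> PVAM -> GSRM -> VSFD, as described above."""
    def __init__(self, backbone, pvam, gsrm, vsfd):
        super().__init__()
        self.backbone, self.pvam = backbone, pvam
        self.gsrm, self.vsfd = gsrm, vsfd

    def forward(self, image):
        feats = self.backbone(image)   # ResNet50_vd_fpn feature maps
        g = self.pvam(feats)           # N aligned one-dimensional features G
        s = self.gsrm(g)               # global semantic information S
        return self.vsfd(g, s)         # fused prediction of the N characters
```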
After text detection and text recognition are completed, the method further evaluates each trained text recognition model with the data held out from each random sample, obtaining evaluation data for each model; the model hyperparameters are then continually adjusted and the text detection/text recognition steps repeated until the best evaluation index is obtained, at which point the hyperparameters are fixed, and the model set with the highest evaluation index is saved as the initially available model.
Specifically, as shown in fig. 4, this proceeds through the following steps: first, step S41, training data preparation; then step S42, data expansion, which may be performed according to the method shown in fig. 1; then step S43, text detection/text recognition model training; then step S44, model evaluation; and finally step S45, initial model release. Through steps S41-S45, an initial training model is obtained.
As an optional embodiment, after the initial training model is obtained, it may be optimized, specifically: distilling, quantizing and pruning the trained model in sequence to obtain the final model.
As shown in fig. 5, the model optimization method is specifically as follows:
the process first proceeds to step S51 where an initial model is input. Specifically, an initial text detection model and a text recognition module obtained after last training are used as input models of the model optimization step.
Then comes step S52, model distillation. In this application, distillation uses transfer learning: the output of a pre-trained complex model (the teacher model) serves as the supervision signal for training another, simpler network (the student model). The goal of model distillation is to let the student model learn the generalization ability of the teacher model; the final result is better than a student model that merely fits the training data. Meanwhile, because the student model adopts a lightweight backbone, the model file size can be greatly reduced and the prediction speed improved. In this application, the model trained as described above is used as the teacher model, MobileNetV3 as the backbone of the student model, and softmax_with_cross_entropy_loss as the distillation loss function.
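One plausible reading of that loss, offered as a sketch rather than the patent's exact implementation: the student is trained against the teacher's softened class distribution. The temperature T is an assumed hyperparameter.

```python
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, T=2.0):
    # Soft cross-entropy between student predictions and teacher soft targets.
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    log_probs = F.log_softmax(student_logits / T, dim=-1)
    return -(soft_targets * log_probs).sum(dim=-1).mean() * (T * T)
```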
Then comes step S53, model quantization. Model quantization addresses the large parameter count, heavy computation and large memory footprint of existing convolutional neural networks by quantizing the network: reducing the parameters improves speed and lowers memory use, with the ultimate goal of shrinking the model file, reducing memory occupation and improving prediction speed. In this application, a BNN algorithm is adopted for model quantization: during the forward and backward passes of training, binary weights replace the floating-point weights and activation values. The quantization formula is given as an image in the original specification (Figure BDA0003348563250000081), where x_n denotes the value of the n-th bit, represented with 8 bits.
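Since the formula itself survives only as an image, the following shows the standard BNN binarization rule (a sign function with a straight-through estimator for the gradient), offered as a plausible reconstruction rather than the patent's exact formula:

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Binarize values to +/-1; pass gradients through inside [-1, 1]."""
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out * (x.abs() <= 1).float()  # straight-through estimator
```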
Then comes step S54, model pruning. Model pruning judges the importance of parameters through sensitivity analysis of the trained model's parameters and prunes the unimportant connections or filters, reducing the redundancy of the model and thereby shrinking the model file and improving prediction speed. Since most neuron activations tend toward zero and zero-activated neurons are redundant, eliminating them can greatly reduce the size and computation of the model without affecting its performance. The number of zero activations in each filter is measured by the variable APoZ (Average Percentage of Zeros) as the criterion for evaluating whether a filter is important. APoZ is defined as follows:
APoZ_c^(i) = ( Σ_k Σ_j f( O_c^(i)(k, j) = 0 ) ) / (N × M)
(the original renders this definition as an image, Figure BDA0003348563250000091; in the standard formulation, O_c^(i) is the output of the c-th filter in the i-th layer, f(·) equals 1 when its condition holds and 0 otherwise, N is the number of validation examples and M is the size of the output feature map).
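A small sketch of how APoZ can be measured per filter from post-ReLU activations, under the standard definition above:

```python
import torch

def apoz(activations: torch.Tensor) -> torch.Tensor:
    """activations: (batch, channels, H, W) post-ReLU feature maps.
    Returns one Average-Percentage-of-Zeros value per filter/channel."""
    zeros = (activations == 0).float()
    return zeros.mean(dim=(0, 2, 3))  # high APoZ -> candidate for pruning
```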
and then proceeds to step S55 for final model release. After the initial model is processed through steps S52-S54, a finally usable text detection model and a text recognition model can be obtained.
After the final model is obtained, recognition and prediction on images can be carried out. As shown in fig. 6, this comprises the following steps:
the flow first proceeds to step S61 to input an image to be recognized. The image to be recognized is particularly an image containing text information.
The process then proceeds to step S62, classification adjustment according to image orientation. This step may be implemented by an image orientation classifier that identifies whether the input image has a rotation angle, e.g., 90, 180 or 270 degrees, and automatically corrects it. Extensive practice shows that directly feeding a rotated image into the model greatly degrades recognition, because such data was not included when the model's training data was collected, and adding it would enlarge the training data fourfold and lengthen the overall training time. Judging and adjusting the image orientation therefore effectively improves the accuracy of the final text recognition. Preferably, a CNN image classification algorithm is used for this classification; a sketch follows.
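A hypothetical correction step built on such a four-way classifier; the classifier itself, the angle convention, and the preprocessing are assumptions.

```python
from PIL import Image
from torchvision.transforms.functional import to_tensor

ANGLES = [0, 90, 180, 270]  # classes of the assumed orientation classifier

def correct_orientation(img: Image.Image, classifier) -> Image.Image:
    logits = classifier(to_tensor(img).unsqueeze(0))
    idx = logits.argmax(dim=1).item()
    return img.rotate(-ANGLES[idx], expand=True)  # undo the detected rotation
```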
Then comes step S63, text detection. Specifically, the image is fed to the text detection model, which returns the set of regions where text information is located, i.e., the aforementioned text box information.
Then comes step S64, text recognition. Specifically, each text box is fed to the text recognition model, which returns the text information inside it, i.e., the text is extracted from the text box as described above.
Then comes step S65, output. The text recognition result may be displayed on the display unit. The overall prediction flow is sketched below.
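Putting steps S61-S65 together under assumed model interfaces (the detector returns box coordinates, the recognizer returns the string for a cropped region; correct_orientation is the sketch above):

```python
def recognize_image(img, orient_clf, detector, recognizer):
    img = correct_orientation(img, orient_clf)  # S62: orientation adjustment
    boxes = detector(img)                       # S63: text box detection
    results = []
    for box in boxes:                           # S64: recognize each region
        crop = img.crop(box)
        results.append((box, recognizer(crop)))
    return results                              # S65: output (box, text) pairs
```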
In a second aspect, the present application also provides a storage medium storing a computer program which, when executed by a processor, performs the method steps as in the first aspect of the present application.
Preferably, the processor is an electronic component with a data processing function, such as a Central Processing Unit (CPU), a Digital Signal Processor (DSP) or a System on Chip (SoC).
Preferably, the storage medium is an electronic component with a data storage function, including but not limited to: RAM, ROM, magnetic disks, magnetic tapes, optical disks, flash memory, USB flash drives, removable hard disks, memory cards, memory sticks, etc.
The invention provides a deep-learning-based image text recognition algorithm that builds end-to-end text detection and recognition capability from an image classification model, a text block detection model and a text recognition model, and augments training data through a custom image enhancement scheme. In addition, a series of model optimization strategies greatly improves recognition accuracy and speed.
The method of the present application has the following three advantages:
(1) The method adopts a two-stage recognition approach based on deep learning, combined with a unique data enhancement scheme, greatly improving the accuracy of text detection and text recognition.
In existing two-stage deep-learning image text recognition methods, data is often the key factor limiting the model's final metrics, because training data is hard to collect and labeling is time-consuming. Conventional image data enhancement measures, such as randomly adjusting brightness, randomly adjusting contrast, or Gaussian blur, do not improve model metrics in the image text recognition task. This scheme therefore proposes a new data enhancement approach: the text foreground style and the picture background are extracted from the existing training data, and new random text is fused with that foreground style and background to generate new training data. Extensive practical verification shows that this kind of data enhancement typically improves the final recognition accuracy by more than 10%.
(2) A series of model compression algorithms reduces the model size and improves the prediction speed.
To pursue high accuracy, prior-art text detection and recognition models are often trained with deep residual networks, such as ResNet50 or ResNet101, as the backbone, which yields large model files and slow prediction. To improve prediction speed and shrink the model file while preserving accuracy as much as possible, the invention adopts several compression methods, such as L1NormFilterPruner (L1-norm statistics) and Embedding quantization, to greatly increase prediction speed and reduce model size.
(3) A model distillation algorithm improves the generalization ability of the model and ultimately its accuracy.
A deep learning model cannot perform well in later practical applications by merely fitting the training data; it performs optimally only by learning to generalize to new data (i.e., generalization ability). The goal of model distillation is to let the student model (the new model) learn the generalization ability of the teacher model (the original model or a model ensemble), yielding better results than a student that simply fits the training data. In the invention, ResNet101 is used as the teacher network for distillation training, and the model is distilled after training, which effectively improves the generalization ability of the model and ultimately its accuracy.
It should be noted that although the above embodiments have been described herein, the invention is not limited thereto. Therefore, based on the innovative concepts of the present invention, changes and modifications to the embodiments described herein, or equivalent structures or processes derived from the content of this specification and the accompanying drawings, whether applied directly or indirectly to other related technical fields, all fall within the scope of the present invention.

Claims (10)

1. An image text recognition method, characterized in that the method comprises the steps of:
s1: receiving first image text information; the image text information comprises first text information and background image information;
s2: extracting the first text information and the background image information, and determining parameter information corresponding to the first text information;
s3: acquiring one or more pieces of character information from a character database, and processing the acquired character information by adopting parameter information corresponding to the first text information to obtain second text information;
s4: and synthesizing the second text information and the background image information into second image text information, and inputting the second image text information into a text detection model for training.
2. The image text recognition method according to claim 1, wherein step S3 includes:
randomly acquiring one or more pieces of character information from a character database, repeating the acquisition multiple times, and processing the acquired character information with the parameter information corresponding to the first text information to obtain multiple pieces of second text information.
3. The image text recognition method according to claim 1 or 2, wherein the parameter information includes any one or more of a font, a font size, a font style, a color, a layout style, and a decoration effect.
4. The image text recognition method according to claim 1 or 2, wherein the first image text information includes any one or more of invoice data, tickets, business licenses, electronic itineraries, identification cards, social security cards, and bank cards.
5. The image text recognition method of claim 1, wherein the text detection model is a ResNet50_vd and SAST algorithm detection model; specifically: ResNet50_vd is adopted as a network structure, and a fully connected layer in the network structure is replaced with an FCN fully convolutional layer.
6. The image text recognition method of claim 5, wherein the loss function of the text detection model is as follows: L_total = λ1·L_tcl + λ2·L_tco + λ3·L_tvo + λ4·L_tbo
where λ1, λ2, λ3 and λ4 are weight values, and tcl, tco, tvo and tbo denote four feature maps; tcl denotes the text area where the first text information is located; tco, tvo and tbo denote pixel offsets relative to tcl; specifically: the tco feature map is the text pixel center offset relative to the tcl feature map; the tvo feature map is the pixel offset relative to the four bounding-box vertices of the text in the tcl feature map; and tbo is the offset relative to the upper and lower bounds of the tcl feature map.
7. The image text recognition method of claim 6, wherein λ1 = 1.0, λ2 = 0.5, λ3 = 0.5 and λ4 = 1.0.
8. The image text recognition method of claim 1, wherein step S4 is followed by step S5:
inputting the output result of the text detection model into a text recognition model for training; the text recognition model is a ResNet50_vd_fpn and SRN algorithm recognition model.
9. The image text recognition method of claim 1, wherein the method further comprises:
optimizing the trained model, specifically comprising: distilling, quantizing and pruning the trained model in sequence to obtain the final model.
10. A storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, carries out the method steps of any one of claims 1 to 9.
CN202111330318.2A, filed 2021-11-11, priority 2021-11-11: Image text recognition method and storage medium (Pending; published as CN114202765A)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111330318.2A CN114202765A (en) 2021-11-11 2021-11-11 Image text recognition method and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111330318.2A CN114202765A (en) 2021-11-11 2021-11-11 Image text recognition method and storage medium

Publications (1)

Publication Number Publication Date
CN114202765A 2022-03-18

Family

ID=80647285

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111330318.2A Pending CN114202765A (en) 2021-11-11 2021-11-11 Image text recognition method and storage medium

Country Status (1)

Country Link
CN (1) CN114202765A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115393872A (en) * 2022-10-27 2022-11-25 腾讯科技(深圳)有限公司 Method, device and equipment for training text classification model and storage medium
CN115393872B (en) * 2022-10-27 2023-01-17 腾讯科技(深圳)有限公司 Method, device and equipment for training text classification model and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination