CN112580507B - Deep learning text character detection method based on image moment correction

Deep learning text character detection method based on image moment correction

Info

Publication number
CN112580507B
CN112580507B (application CN202011506599.8A)
Authority
CN
China
Prior art keywords
character
loss
box
label
loss function
Prior art date
Legal status
Active
Application number
CN202011506599.8A
Other languages
Chinese (zh)
Other versions
CN112580507A (en)
Inventor
田辉 (Tian Hui)
刘其开 (Liu Qikai)
Current Assignee
Hefei High Dimensional Data Technology Co., Ltd.
Original Assignee
Hefei High Dimensional Data Technology Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Hefei High Dimensional Data Technology Co., Ltd.
Priority to CN202011506599.8A
Publication of CN112580507A
Application granted
Publication of CN112580507B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Character Discrimination (AREA)

Abstract

The invention discloses a deep learning text character detection method based on image moment correction, which specifically comprises the following steps: preparing a data set; manually correcting inaccurately pre-labeled boxes and generating heat-map labels in Gaussian heat-map form from the boxes; defining a neural network structure and a loss function; pre-training; expanding the training sample set with actual-scene samples; performing adaptive binarization on the expanded training sample set and computing the Hu moment feature vector of each character, the mean of the vector serving as the character's auxiliary label; modifying the loss function for fine-tuning training; and model test and verification. By combining the heat-map labels and the moment feature labels into an optimized loss function, the method improves the accuracy of the character boxes and alleviates the over-segmentation and under-segmentation of character frames; preprocessing the expanded sample set compensates for the shortage of character-level annotations, giving better generalization in character detection.

Description

Deep learning text character detection method based on image moment correction
Technical Field
The invention belongs to the field of target detection, and particularly relates to a deep learning text character detection method based on image moment correction.
Background
At present, text detection is widely applied in the field of computer vision, for example in real-time translation, image retrieval, scene parsing, geolocation and navigation for the blind, and therefore has high application value and research significance for scene understanding and text analysis.
The existing text detection methods fall into the following categories:
1. Traditional image processing methods based on hand-crafted features, such as MSER (maximally stable extremal regions) and SWT (stroke width transform); these mainly handle printed fonts and print-and-scan scenes and perform poorly on natural-scene text.
2. Two-stage methods based on deep learning, which generate candidate regions, extract the corresponding features, fine-tune the network and output the corresponding text region boxes; they offer higher precision, perform well on small-scale targets and share computation, but inference is slow and the training period is long.
3. One-stage methods based on deep learning, which skip the candidate-box generation step and predict the target text region boxes end to end; inference is fast, but precision is lower than two-stage methods and small-target detection is weaker.
Most existing text detection algorithms output the position coordinates of text-line regions. For example, CTPN, a reference network among existing text detectors, is an improvement of the two-stage approach: building on Faster R-CNN, it adds modifications specific to horizontally or vertically arranged target text and outputs text-line regions. Existing text detection techniques therefore do not provide accurate character-level detection, and the information they provide is limited.
Existing character-level text detection algorithms are based on the idea of semantic segmentation: the pixel-level block heat-map label is replaced with a Gaussian center heat map, the network is optimized with a region score or a compactness score, and the final character boxes are obtained by binarizing the probability map in post-processing. Character-level detection can output the coordinates of individual character boxes as well as of text-line regions, so its output is richer and can satisfy broader customer requirements. However, existing character-level detection algorithms are affected by their parameters and by the complex Chinese text scenes they operate in, and the segmented character boxes can be over-segmented or under-segmented, corresponding respectively to the rectangular boxes and the darkened rectangular boxes shown in FIG. 4.
Disclosure of Invention
In order to solve the above problems, the invention provides a deep learning text character detection method based on image moment correction, comprising the following steps:
A: preparing a data set, pre-labeling samples randomly drawn from the data set, and storing the box of each character in each sample;
B: manually correcting inaccurately pre-labeled boxes, and generating heat-map labels in Gaussian heat-map form from the boxes;
C: defining a neural network structure and a loss function loss_cross;
D: performing preliminary pre-training with the network structure and loss function loss_cross determined in step C;
E: expanding the training sample set with actual-scene samples;
F: performing adaptive binarization on the training sample set expanded in step E, and computing the Hu moment feature vector of each character, the mean of the vector being used as the character's auxiliary label;
G: modifying the loss function by adding a regular-term branch, and performing fine-tuning training on the expanded training sample set with the modified loss function loss;
H: model test and verification: varying the parameter theta used to generate the Gaussian heat maps from the pre-labels, and plotting the accuracy curve of the character boxes under different theta thresholds, so that a suitable parameter theta is selected as required.
Further, the data set in step A mainly comprises data from ICDAR2017, ICDAR2019 and CTW, and the samples randomly drawn from the data set are pre-labeled with a public character segmentation model trained with EasyOCR.
Further, pre-labeling inaccuracy in step B specifically refers to over-segmentation or under-segmentation of the character box: over-segmentation means that the box does not enclose the whole of the current character, and under-segmentation means that the box contains characters or symbols other than the current character.
Further, in step B the box is mapped onto a two-dimensional Gaussian map by perspective transformation to generate the Gaussian heat-map label.
Further, determining the neural network structure in step C specifically comprises: the network takes samples of a preset size as input, uses a VGG16 backbone as the feature extraction network and U-net as the decoding network, and outputs a pixel score matrix representing the confidence region. The loss function loss_cross in step C is determined as follows: loss_cross is a pixel-level cross-entropy loss, i.e. a theta threshold is set on the label heat map, pixels above the threshold are considered character regions (class 1), and pixels below it non-character regions (class 0).
Further, the training sample set of actual scenes in step E is expanded by taking random screenshots, or photographs at different angles, of computer-screen interfaces containing documents, pre-labeling them with the pre-trained model, and manually correcting them in the manner of step B.
Further, the theta threshold is obtained as follows:
performing Gaussian smoothing on the heat-map label and computing its gradient map;
determining the connected regions under different thresholds with a watershed algorithm, and taking the minimum enclosing rectangle of each connected region, i.e. the character box under that threshold;
randomly sampling a number of characters, judging the accuracy of the minimum enclosing boxes under the corresponding thresholds, and taking the threshold with the highest accuracy as the theta threshold.
Further, the modified loss function loss in step G is the loss function loss_cross of step C plus an L2 loss:

loss = loss_cross + m * loss_L2

where loss_L2 = Σ_{i=1}^{m} Σ_{j=1}^{K} (y_ij − f(x_ij))² denotes the L2 loss of the sample moment features, m denotes the number of samples, K denotes the number of characters of a single sample, y_ij denotes the mean of the moment feature vector corresponding to the j-th character in the i-th sample, and f(x_ij) denotes the network's predicted mean of the moment feature vector corresponding to the j-th character in the i-th sample.
Further, the samples used in the model test and verification of step H are characters in text scenes from photographs or screenshots of arbitrarily chosen computer documents.
The invention has the following advantages:
The detection method uses image moment features to characterize the center of a single character and to provide more robust auxiliary information, i.e. the Gaussian heat map and the moment features are combined into an optimized loss function, which improves the accuracy of the character boxes; combining a segmentation task (heat-map labels) with a regression task (moment feature labels) improves the model's character detection and segmentation ability and alleviates the over-segmentation and under-segmentation of character frames. In addition, samples are synthesized from on-screen text scenes to pre-train a preliminary character text detection model; real text samples are then pre-labeled and corrected manually, the moment features of each character in the real samples are computed, and these serve as the regular term of the loss function during fine-tuning. This preprocessing compensates for the shortage of character-level annotations on the one hand, and on the other hand yields better character detection generalization in actual printed, photographed or screenshot text scenes.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 illustrates a prior art character segmentation algorithm flow diagram;
FIG. 2 shows a flow chart of a character segmentation algorithm according to an embodiment of the present invention;
FIG. 3 shows an example of a sample Gaussian heat-map label of the present invention;
FIG. 4 shows examples of the over-segmentation and under-segmentation phenomena.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Because the backgrounds of natural-scene samples are complex, the computed image moment features would be biased; the image moment feature values are therefore computed only for screenshots of computer documents or photographs of that specific scene. Moments of different orders also have different properties: raw (origin) moments or central moments used as image features cannot be guaranteed to be simultaneously invariant to translation, rotation and scale. Central moments alone provide translation invariance, normalized central moments additionally provide scale invariance, and the Hu moments built from them are invariant to translation, scale and rotation, so Hu moment vectors are used as the auxiliary information, providing the network with more prior knowledge for training.
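For reference, the standard definitions behind these invariances, which the original text does not write out, are, for an image I(x, y) with centroid (x̄, ȳ):

```latex
% Central moments: invariant to translation.
\mu_{pq} = \sum_{x}\sum_{y} (x-\bar{x})^{p}\,(y-\bar{y})^{q}\, I(x,y)

% Normalized central moments: additionally invariant to scale.
\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\,1+(p+q)/2}}

% The seven Hu moments are fixed polynomial combinations of the \eta_{pq}
% that are additionally invariant to rotation; the first one is:
h_{1} = \eta_{20} + \eta_{02}
```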
The invention discloses a method for detecting deep learning text characters based on image moment correction, which comprises the following steps:
A. Preparing the data set. The public Chinese data sets used in the method mainly comprise the ICDAR2017 data set, the ICDAR2019 data set and CTW (Chinese Text in the Wild) data. The CTW data has high diversity and complexity, including planar text, raised text, urban street-view text, rural street-view text, text under weak illumination, distant text, and partially displayed text. For each image, all Chinese characters are annotated in the data set; for each Chinese character, its character class and bounding box are annotated. Samples randomly drawn from the data set are pre-labeled with a public character segmentation model trained with EasyOCR, and the box of each character of each sample is stored.
B. A simple human-computer interaction labeling interface for fine correction, similar to an object detection labeling tool, is developed; it automatically loads pictures and the json-format labels of the corresponding pictures, and character boxes with inaccurate pre-labels are then corrected manually in a pop-up dialog. Inaccurate prediction here means the box does not fully enclose the current character (over-segmentation) or extends into adjacent characters, commas and the like (under-segmentation); specific examples are the rectangular boxes (over-segmentation) and the darkened rectangular boxes (under-segmentation) in FIG. 4. A Gaussian heat-map label is then generated from each character's box: in this step the character's box is mapped onto a two-dimensional Gaussian map by perspective transformation to represent the character's heat-map label, as in the sample Gaussian heat-map label of FIG. 3 and as sketched below.
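A minimal sketch of this label-generation step, assuming OpenCV and character boxes given as four corner points; the function name, template size and sigma are illustrative choices, not values from the patent:

```python
import cv2
import numpy as np

def make_gaussian_heatmap_label(image_shape, char_boxes, patch=64, sigma=0.35):
    """Warp a canonical 2-D Gaussian into each character box by
    perspective transform and take the pixel-wise maximum."""
    h, w = image_shape
    heatmap = np.zeros((h, w), dtype=np.float32)

    # Canonical isotropic Gaussian on a patch x patch template, peak 1 at center.
    xs, ys = np.meshgrid(np.arange(patch), np.arange(patch))
    c = (patch - 1) / 2.0
    gauss = np.exp(-((xs - c) ** 2 + (ys - c) ** 2) / (2 * (sigma * patch) ** 2))
    gauss = gauss.astype(np.float32)
    src = np.float32([[0, 0], [patch - 1, 0], [patch - 1, patch - 1], [0, patch - 1]])

    for box in char_boxes:  # box: 4 corner points, clockwise from top-left
        M = cv2.getPerspectiveTransform(src, np.float32(box))
        warped = cv2.warpPerspective(gauss, M, (w, h))
        heatmap = np.maximum(heatmap, warped)  # overlaps keep the stronger score
    return heatmap
```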
C. Defining the network structure and loss function: the network takes a sample of size h x w x 3 as input, uses a VGG16 backbone as the feature extraction network and an improved U-net as the decoding network, and outputs a pixel score matrix representing the confidence region (the specific structure is shown in FIG. 2); h denotes the height of the image input to the network, w denotes its width, and 3 is the number of RGB channels.
The loss function uses pixel-level cross-entropy loss: a theta threshold is set on the label heat map, pixels above the threshold are considered character regions (class 1), and pixels below it non-character regions (class 0).
The accuracy under different values of the parameter theta must therefore be compared so that the best parameter theta can be selected; the theta threshold is obtained by testing on actual training samples with the help of the watershed algorithm from graphics, mainly comprising the following steps:
First, Gaussian smoothing is applied to the label heat map and its gradient map is computed; then the connected regions under different thresholds are determined with the watershed algorithm, and the minimum enclosing rectangle of each connected region (i.e. the character box under that threshold) is taken; finally, a number of characters are randomly sampled, the accuracy of the minimum enclosing boxes under each threshold is judged by human inspection, and the threshold with the relatively highest accuracy is taken as the theta threshold. A simplified sketch of this threshold search follows.
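A simplified sketch of the search, assuming OpenCV; for brevity, connected components on the smoothed, thresholded map stand in for the full gradient-plus-watershed split, and all names are illustrative:

```python
import cv2
import numpy as np

def char_boxes_at_threshold(heatmap, theta):
    """Return the minimum enclosing rotated rectangle of every
    connected region of the smoothed heat map above `theta`."""
    smoothed = cv2.GaussianBlur(heatmap.astype(np.float32), (5, 5), 0)
    binary = (smoothed > theta).astype(np.uint8)
    n, labels = cv2.connectedComponents(binary)
    boxes = []
    for lbl in range(1, n):  # label 0 is the background
        ys, xs = np.where(labels == lbl)
        pts = np.column_stack([xs, ys]).astype(np.float32)
        boxes.append(cv2.boxPoints(cv2.minAreaRect(pts)))
    return boxes

# The theta threshold is then chosen by sweeping candidate values,
# e.g. np.arange(0.2, 0.8, 0.05), and keeping the one whose boxes are
# judged most accurate on a randomly sampled set of characters.
```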
D. Pre-training: preliminary pre-training is performed with the network structure and loss function defined in step C.
E. Expanding the training sample set of actual scenes: random screenshots, or photographs at different angles, are taken of computer-screen interfaces containing documents, such as web pages and Word documents; these are pre-labeled with the pre-trained model and corrected manually in the manner of step B.
F. Adaptive binarization is applied to the samples expanded in step E to obtain binary images; the Hu moment feature vector of each character is then computed, and the mean of the vector is taken as the character's auxiliary label. In theory, the moment feature means of character regions differ little from one another, while being much larger than those of non-character regions. Introducing the moment feature branch, on the one hand, tilts the model's attention toward character regions, which benefits detection; on the other hand, the moment feature mean guides the network to learn more accurate character boxes, which benefits segmentation. A sketch of this labeling step follows.
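A minimal sketch of this auxiliary-label computation, assuming OpenCV, axis-aligned character boxes, and illustrative adaptive-threshold parameters:

```python
import cv2
import numpy as np

def hu_moment_labels(image_bgr, char_boxes):
    """Adaptive binarization, then the 7-dim Hu moment vector of each
    character patch; the mean of that vector is the auxiliary label."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    binary = cv2.adaptiveThreshold(
        gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
        cv2.THRESH_BINARY_INV, 31, 10)  # block size / offset are assumptions
    labels = []
    for (x0, y0, x1, y1) in char_boxes:
        patch = binary[y0:y1, x0:x1]
        hu = cv2.HuMoments(cv2.moments(patch, binaryImage=True)).flatten()
        labels.append(float(hu.mean()))  # vector mean used as the label
    return labels
```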
G. The loss function is modified by adding a regular-term branch, and fine-tuning training is performed on the expanded training sample set with the modified loss function. Model training uses the expanded training samples; the detail distinguishing this step from pre-training is as follows: the network's loss function is modified by adding a regular-term branch that takes the Hu moment feature vectors as auxiliary label information, and the original cross-entropy loss_cross is trained jointly with the L2 loss of the character-box moment vectors, the weight m taking a value of 0.01 to 0.05:

loss = loss_cross + m * loss_L2

where loss_L2 = Σ_{i=1}^{m} Σ_{j=1}^{K} (y_ij − f(x_ij))² denotes the L2 (least-squares) loss of the sample moment features, m denotes the number of samples, and K denotes the number of characters of a single sample; y_ij denotes the mean of the moment feature vector corresponding to the j-th character in the i-th sample and is used as the moment feature label, and f(x_ij) denotes the network's predicted mean of the moment feature vector corresponding to the j-th character in the i-th sample. A sketch of this combined loss follows.
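A sketch of the modified loss in PyTorch, under the assumption that the score branch is sigmoid-activated; note that the patent text reuses the symbol m both for the sample count inside loss_L2 and for the weighting coefficient (0.01 to 0.05), which is made an explicit `weight` argument here:

```python
import torch
import torch.nn.functional as F

def combined_loss(score_pred, heat_label, moment_pred, moment_label,
                  theta=0.4, weight=0.03):
    """loss = loss_cross + m * loss_L2, with the weight m assumed in [0.01, 0.05]."""
    # Pixel-level cross entropy: heat-map values above theta are class 1.
    target = (heat_label > theta).float()
    loss_cross = F.binary_cross_entropy(score_pred, target)
    # Regular-term branch: L2 loss between predicted and labeled
    # per-character moment-feature means, summed over samples i and
    # characters j as in the formula above.
    loss_l2 = ((moment_pred - moment_label) ** 2).sum()
    return loss_cross + weight * loss_l2
```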
H. Model test and verification: the model of the method is mainly intended to improve character detection in text scenes photographed from computer documents, so samples from that scene are used for testing and verification, and the character segmentation accuracy is measured. Since the pre-labeled heat maps are affected by the parameter theta, the accuracy under different values of theta must be compared so that the best parameter theta can be selected: the parameter theta used to generate the Gaussian heat maps from the pre-labels is varied, and the accuracy curve of the character boxes under different theta thresholds is plotted, so that a suitable parameter theta is chosen as required.
FIG. 1 illustrates a character segmentation algorithm representative of the prior art.
The input sample is scaled to h x w x 3 as the network input, and a VGG16 backbone serves as the feature extraction network; the deeper the stage of the extraction network, the more abstract the generated feature map, with each stage halving the spatial size. To fuse low-level and high-level feature information, the decoding network U-net upsamples the feature map of an output layer to the size of the feature map at some stage of the extraction network so the two can be merged, and a final 1x1 convolution layer outputs a pixel score matrix representing the character and character-link confidence regions. The main idea is to predict character detection boxes with a segmentation task: a character-link confidence matrix is added to the output branch to solve character localization in non-rectangular regions, and weakly supervised learning on synthesized character data completes the model's pre-training task, improving character segmentation in general natural scenes.
FIG. 2 shows the character segmentation algorithm of the present method.
The network structure is basically the same as above; the input sample size and the outputs differ. The input has the h x w x 3 structure, a VGG16 backbone is used as the feature extraction network with a decoding network that fuses high-level and low-level features, a 1x1 convolution layer outputs a pixel score matrix representing the character moment-mean vectors, and a fully connected branch is introduced to output the moment feature vector. The two branches combine segmentation and regression tasks, replacing the box coordinates of ordinary object detection with moment features; owing to their invariance properties, moment feature vectors are more robust in the localization and segmentation of Chinese character text, whose aspect ratios are relatively consistent. A batch of data sets relevant to the algorithm's actual application, such as text data sets of computer photograph and screenshot scenes, is constructed to address the problem of character-level text detection. Using the idea of semantic segmentation, each character is labeled with a Gaussian heat map: the higher a pixel's heat-map value, the closer that pixel is to the character's center point. A simplified sketch of this two-branch structure follows.
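A simplified PyTorch sketch of the two-branch structure; the decoder below is a stand-in without the U-net skip connections, and the channel widths, 7-dim moment output and pooling are illustrative assumptions, not the patent's exact configuration:

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

class MomentCorrectedDetector(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = vgg16(weights=None).features        # VGG16 backbone
        self.decoder = nn.Sequential(                      # simplified decoder
            nn.Conv2d(512, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(256, 64, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.score_head = nn.Conv2d(64, 1, kernel_size=1)  # 1x1 conv -> pixel score map
        self.moment_head = nn.Sequential(                  # fully connected branch
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 7),
        )

    def forward(self, x):                                  # x: (N, 3, h, w)
        feat = self.decoder(self.encoder(x))
        score = torch.sigmoid(self.score_head(feat))       # confidence-region scores
        moments = self.moment_head(feat)                   # moment feature vector
        return score, moments
```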
Although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (7)

1. A method for detecting deep learning text characters based on image moment correction, the method comprising the steps of:
A: preparing a data set, pre-labeling samples randomly drawn from the data set, and storing the box of each character in each sample;
B: manually correcting inaccurately pre-labeled boxes, and generating heat-map labels in Gaussian heat-map form from the boxes;
C: defining a neural network structure and a loss function loss_cross;
D: performing preliminary pre-training with the network structure and loss function loss_cross determined in step C;
E: expanding the training sample set with actual-scene samples;
F: performing adaptive binarization on the training sample set expanded in step E, and computing the Hu moment feature vector of each character, the mean of the vector being used as the character's auxiliary label;
G: modifying the loss function by adding a regular-term branch, and performing fine-tuning training on the expanded training sample set with the modified loss function loss, the modified loss function loss being the loss function loss_cross plus an L2 loss:

loss = loss_cross + m * loss_L2

where loss_L2 = Σ_{i=1}^{m} Σ_{j=1}^{K} (y_ij − f(x_ij))² denotes the L2 loss of the sample moment features, m denotes the number of samples, K denotes the number of characters of a single sample, y_ij denotes the mean of the moment feature vector corresponding to the j-th character in the i-th sample, and f(x_ij) denotes the network's predicted mean of the moment feature vector corresponding to the j-th character in the i-th sample;
H: model test and verification: varying the parameter theta used to generate the Gaussian heat maps from the pre-labels, and plotting the accuracy curve of the character boxes under different theta thresholds, so that a suitable parameter theta is selected as required, the theta threshold being obtained as follows:
performing Gaussian smoothing on the heat-map label and computing its gradient map;
determining the connected regions under different thresholds with a watershed algorithm, and taking the minimum enclosing rectangle of each connected region, i.e. the character box under that threshold;
randomly sampling a number of characters, judging the accuracy of the minimum enclosing boxes under the corresponding thresholds, and taking the threshold with the highest accuracy as the theta threshold.
2. The method for deep learning text character detection based on image moment correction as claimed in claim 1, wherein,
The data set in step A mainly comprises data from ICDAR2017, ICDAR2019 and CTW, and the samples randomly drawn from the data set are pre-labeled with a public character segmentation model trained with EasyOCR.
3. The method for deep learning text character detection based on image moment correction as claimed in claim 1, wherein,
The pre-labeling inaccuracy in step B specifically refers to over-segmentation or under-segmentation of the character box;
over-segmentation means that the box does not enclose the whole of the current character, and under-segmentation means that the box contains characters or symbols other than the current character.
4. The method for deep learning text character detection based on image moment correction as claimed in claim 1, wherein,
In step B, the box is mapped onto a two-dimensional Gaussian map by perspective transformation to generate the Gaussian heat-map label.
5. The method for deep learning text character detection based on image moment correction as claimed in claim 1, wherein,
The specific operation of determining the neural network structure in step C is as follows:
the network takes samples of a preset size as input, uses a VGG16 backbone as the feature extraction network and U-net as the decoding network, and outputs a pixel score matrix representing the confidence region;
the loss function loss_cross in step C is determined as follows:
loss_cross is a pixel-level cross-entropy loss, i.e. a theta threshold is set on the label heat map, pixels above the threshold are considered character regions (class 1), and pixels below it non-character regions (class 0).
6. The method for detecting deep learning text characters based on image moment correction according to any one of claims 1 to 5, wherein,
The training sample set of actual scenes in step E is expanded by taking random screenshots, or photographs at different angles, of computer-screen interfaces containing documents, pre-labeling them with the pre-trained model, and manually correcting them in the manner of step B.
7. The method for deep learning text character detection based on image moment correction as claimed in claim 6, wherein,
The samples used in the model test and verification of step H are characters in text scenes from photographs or screenshots of arbitrarily chosen computer documents.
CN202011506599.8A 2020-12-18 2020-12-18 Deep learning text character detection method based on image moment correction Active CN112580507B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011506599.8A CN112580507B (en) 2020-12-18 2020-12-18 Deep learning text character detection method based on image moment correction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011506599.8A CN112580507B (en) 2020-12-18 2020-12-18 Deep learning text character detection method based on image moment correction

Publications (2)

Publication Number Publication Date
CN112580507A (en) 2021-03-30
CN112580507B (en) 2024-05-31

Family

ID=75136268

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011506599.8A Active CN112580507B (en) 2020-12-18 2020-12-18 Deep learning text character detection method based on image moment correction

Country Status (1)

Country Link
CN (1) CN112580507B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113221867A (en) * 2021-05-11 2021-08-06 北京邮电大学 Deep learning-based PCB image character detection method
CN113313720B (en) * 2021-06-30 2024-03-29 上海商汤科技开发有限公司 Object segmentation method and device
CN113743416B (en) * 2021-08-24 2024-03-05 的卢技术有限公司 Data enhancement method for non-real sample situation in OCR field
CN114579046B (en) * 2022-01-21 2024-01-02 南华大学 Cloud storage similar data detection method and system
CN117649672B (en) * 2024-01-30 2024-04-26 湖南大学 Font type visual detection method and system based on active learning and transfer learning

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104899821A (en) * 2015-05-27 2015-09-09 Hefei High Dimensional Data Technology Co., Ltd. Method for erasing visible watermark of document image
WO2017185257A1 (en) * 2016-04-27 2017-11-02 Beijing Zhongke Cambricon Technology Co., Ltd. Device and method for performing Adam gradient descent training algorithm
RU2656708C1 (en) * 2017-06-29 2018-06-06 Samsung Electronics Co., Ltd. Method for separating texts and illustrations in images of documents using a descriptor of document spectrum and two-level clustering
CN108399421A (en) * 2018-01-31 2018-08-14 Nanjing University of Posts and Telecommunications A deep zero-shot classification method based on word embedding
EP3422254A1 (en) * 2017-06-29 2019-01-02 Samsung Electronics Co., Ltd. Method and apparatus for separating text and figures in document images
EP3499457A1 (en) * 2017-12-15 2019-06-19 Samsung Display Co., Ltd. System and method of defect detection on a display
CN110717492A (en) * 2019-10-16 2020-01-21 University of Electronic Science and Technology of China Method for correcting direction of character string in drawing based on joint features
WO2020046960A1 (en) * 2018-08-31 2020-03-05 Alibaba Group Holding Limited System and method for optimizing damage detection results
CN111079638A (en) * 2019-12-13 2020-04-28 Hebei Aier Industrial Internet Technology Co., Ltd. Target detection model training method, device and medium based on convolutional neural network
CN111222434A (en) * 2019-12-30 2020-06-02 Shenzhen Aixiesheng Technology Co., Ltd. Method for forensics of synthesized face images based on local binary patterns and deep learning
CN111553346A (en) * 2020-04-26 2020-08-18 Foshan Nanhai Guangdong University of Technology CNC Equipment Collaborative Innovation Institute Scene text detection method based on character region perception

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7958068B2 (en) * 2007-12-12 2011-06-07 International Business Machines Corporation Method and apparatus for model-shared subspace boosting for multi-label classification


Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
A Segmentation Algorithm for Touching Character Based on the Invariant Moments and Profile Feature; Junming Chang et al.; 2012 International Conference on Control Engineering and Communication Technology; 2012; 188-191 *
Promising Techniques for Anomaly Detection on Network Traffic; Tian, H. et al.; Computer Science and Information Systems; Nov. 2017; vol. 14, no. 3; 597-609 *
A natural scene text detection algorithm based on image moments and texture features; Yang Lingling et al.; Journal of Chinese Computer Systems; Jun. 2016; vol. 37, no. 6; 1313-1317 *
A news video text region detection and localization algorithm based on multi-scale image fusion; Zhang Hui et al.; Journal of Guizhou University (Natural Sciences); Dec. 2012; vol. 29, no. 6; 86-90 *
License plate character recognition based on stacked denoising autoencoder neural networks; Jia Wenqi et al.; Computer Engineering and Design; Mar. 2016; vol. 37, no. 3; 751-756 *
Food label text detection based on semantic segmentation; Tian Xuan et al.; Transactions of the Chinese Society for Agricultural Machinery; Aug. 2020; vol. 51, no. 8; 336-343 *
Research on copyright protection of computer games; Tian Hui; China Doctoral Dissertations Full-text Database, Social Sciences I; Sep. 2019; no. 9 (2019); G117-3 *

Also Published As

Publication number Publication date
CN112580507A (en) 2021-03-30

Similar Documents

Publication Publication Date Title
CN112580507B (en) Deep learning text character detection method based on image moment correction
CN111723585B (en) Style-controllable image text real-time translation and conversion method
CN111325203B (en) American license plate recognition method and system based on image correction
WO2023134073A1 (en) Artificial intelligence-based image description generation method and apparatus, device, and medium
US11475681B2 (en) Image processing method, apparatus, electronic device and computer readable storage medium
CN103049763B (en) Context-constraint-based target identification method
CN110647829A (en) Bill text recognition method and system
CN113673338B (en) Automatic labeling method, system and medium for weak supervision of natural scene text image character pixels
CN108509881A (en) A kind of the Off-line Handwritten Chinese text recognition method of no cutting
CN110969129B (en) End-to-end tax bill text detection and recognition method
CN111368846B (en) Road ponding identification method based on boundary semantic segmentation
CN111914698B (en) Human body segmentation method, segmentation system, electronic equipment and storage medium in image
CN112287941B (en) License plate recognition method based on automatic character region perception
CN113158977B (en) Image character editing method for improving FANnet generation network
CN111738055A (en) Multi-class text detection system and bill form detection method based on same
CN112069900A (en) Bill character recognition method and system based on convolutional neural network
CN111523622B (en) Method for simulating handwriting by mechanical arm based on characteristic image self-learning
CN111353544A (en) Improved Mixed Pooling-Yolov 3-based target detection method
CN112070174A (en) Text detection method in natural scene based on deep learning
CN112070040A (en) Text line detection method for video subtitles
CN111507337A (en) License plate recognition method based on hybrid neural network
CN114612732A (en) Sample data enhancement method, system and device, medium and target classification method
CN114943888B (en) Sea surface small target detection method based on multi-scale information fusion
CN113361467A (en) License plate recognition method based on field adaptation
CN110991374B (en) Fingerprint singular point detection method based on RCNN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant