CN111753812A - Text recognition method and equipment - Google Patents
- Publication number
- CN111753812A (application number CN202010752292.XA)
- Authority
- CN
- China
- Prior art keywords
- text
- boundary
- bounding box
- curve
- feature map
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/148—Segmentation of character regions
- G06V30/153—Segmentation of character regions using recognition of characters or words
Abstract
The invention provides a text recognition method and equipment. The method includes: acquiring a curved text image to be recognized; performing feature extraction on the curved text image to obtain a feature map; processing the feature map with a target network model to obtain a text bounding box in the feature map, where the target network model is used to detect text boundaries in an image; and recognizing the text bounding box to obtain the curved text within it. By processing the feature map with the target network model, embodiments of the invention obtain the text bounding box of the curved text, that is, the region of the feature map where the curved text is located, achieving accurate localization of the curved text. Text recognition is then performed on the bounding box to obtain the curved text within it, achieving automatic recognition of curved text and improving both the efficiency and the accuracy of curved text recognition.
Description
Technical Field
Embodiments of the invention relate to the field of image technology, and in particular to a text recognition method and text recognition equipment.
Background
The frame number is the unique identification code of a vehicle, generally a combination of 17 digits and letters. Some frame numbers are arc-shaped (for example, double-row arc-shaped frame numbers), that is, arranged along a curve. Currently, when the frame number of a vehicle needs to be recorded (for example, when a vehicle dealer takes inventory), a person must identify the frame number manually and record it.
However, the inventors found that the prior art has at least the following problem: because the frame number must be identified manually, recognition is slow and error-prone, so both the efficiency and the accuracy of frame number recognition are low.
Disclosure of Invention
Embodiments of the invention provide a text recognition method and text recognition equipment, aiming to solve the problems of low recognition efficiency and low accuracy in the prior art.
In a first aspect, an embodiment of the present invention provides a text recognition method, including:
acquiring a curved text image to be recognized, and performing feature extraction on the curved text image to obtain a feature map;
processing the feature map by adopting a target network model to obtain a text boundary box in the feature map, wherein the target network model is used for detecting a text boundary in an image;
and recognizing the text bounding box to obtain the curved text in the text bounding box.
In one possible design, performing feature extraction on the curved text image to obtain the feature map includes:
performing feature extraction on the curved text image with a shared network model to obtain the feature map, where the lower network layers in the shared network model are connected to the higher network layers.
In a possible design, if the target network model includes a rectangular bounding box detection model and a boundary point detection model, processing the feature map with the target network model to obtain the text bounding box in the feature map includes:
performing text box detection on the feature map with the rectangular bounding box detection model to obtain a rectangular text bounding box in the feature map, where the rectangular text bounding box contains curved text and the rectangular bounding box detection model is used to extract rectangular text bounding boxes from an image;
performing boundary point detection on the rectangular text bounding box with the boundary point detection model to obtain the boundary points of the curved text in the rectangular text bounding box, where the boundary point detection model is used to detect the boundary points of text;
and obtaining the text bounding box from the boundary points and the feature map.
In a possible design, if the curved text image is a frame number image, recognizing the text bounding box to obtain the curved text in the text bounding box includes:
performing frame number recognition on the text bounding box with a text recognition model to obtain the frame number in the text bounding box.
In one possible design, the method further includes:
determining whether the curved text in the text bounding box is in a horizontal state;
and if not, horizontally correcting the text bounding box.
In a possible design, performing boundary point detection on the rectangular text bounding box with the boundary point detection model to obtain the boundary points of the curved text in the rectangular text bounding box includes:
detecting the boundary points of the rectangular text bounding box with the boundary point detection model to obtain the offset distance between each boundary point and the corresponding equal-division point on the long edge of the rectangular text bounding box;
and obtaining the coordinates of the long-edge equal-division points, and determining the coordinates of the boundary points from the coordinates of the long-edge equal-division points and the offset distances.
In one possible design, obtaining the text bounding box from the boundary points and the feature map includes:
determining the curved text boundaries in the feature map based on the coordinates of the boundary points;
and connecting the curved text boundaries to obtain the text bounding box.
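The step of connecting the curved text boundaries can be illustrated with a minimal sketch (a hypothetical helper, not part of the claimed method): the upper boundary points are traversed left to right and the lower boundary points right to left, yielding a closed polygon around the curved text.

```python
# Hypothetical illustration only: build a closed polygon (the text bounding
# box) from the detected upper and lower boundary points of the curved text.
def connect_boundaries(top_points, bottom_points):
    """Return the polygon vertices enclosing the curved text:
    top points left-to-right, then bottom points right-to-left."""
    return list(top_points) + list(reversed(bottom_points))
```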
In a second aspect, an embodiment of the present invention provides a text recognition apparatus, including:
an image acquisition module, configured to acquire a curved text image to be recognized and perform feature extraction on the curved text image to obtain a feature map;
a processing module, configured to process the feature map with a target network model to obtain a text bounding box in the feature map, where the target network model is used to detect text boundaries in an image;
the processing module is further configured to recognize the text bounding box to obtain the curved text in the text bounding box.
In one possible design, the image acquisition module is further configured to:
perform feature extraction on the curved text image with a shared network model to obtain the feature map, where the lower network layers in the shared network model are connected to the higher network layers.
In one possible design, the target network model includes a rectangular bounding box detection model and a boundary point detection model, and the processing module is further configured to:
perform text box detection on the feature map with the rectangular bounding box detection model to obtain a rectangular text bounding box in the feature map, where the rectangular text bounding box contains curved text and the rectangular bounding box detection model is used to extract rectangular text bounding boxes from an image; perform boundary point detection on the rectangular text bounding box with the boundary point detection model to obtain the boundary points of the curved text in the rectangular text bounding box, where the boundary point detection model is used to detect the boundary points of text; and obtain the text bounding box from the boundary points and the feature map.
In one possible design, the curved text image is a frame number image, and the processing module is further configured to:
perform frame number recognition on the text bounding box with a text recognition model to obtain the frame number in the text bounding box.
In one possible design, the processing module is further configured to:
determine whether the curved text in the text bounding box is in a horizontal state; and if not, horizontally correct the text bounding box.
In one possible design, the processing module is further configured to:
detect the boundary points of the rectangular text bounding box with the boundary point detection model to obtain the offset distance between each boundary point and the corresponding equal-division point on the long edge of the rectangular text bounding box; and obtain the coordinates of the long-edge equal-division points, and determine the coordinates of the boundary points from those coordinates and the offset distances.
In one possible design, the processing module is further configured to:
determine the curved text boundaries in the feature map based on the coordinates of the boundary points; and connect the curved text boundaries to obtain the text bounding box.
In a third aspect, an embodiment of the present invention provides an electronic device, including: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executes the computer-executable instructions stored in the memory, causing the at least one processor to perform the text recognition method described in the first aspect and its various possible designs.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the text recognition method described in the first aspect and its various possible designs.
With the text recognition method and equipment provided by the invention, a curved text image to be recognized is acquired, feature extraction is performed on it to obtain a feature map, the feature map is processed with a target network model to obtain a text bounding box in the feature map (the target network model being used to detect text boundaries in an image), and the text bounding box is recognized to obtain the curved text within it. Feature extraction on the curved text image yields a feature map that better represents its characteristics. Processing the feature map with the target network model yields the text bounding box of the curved text, that is, the region of the feature map where the curved text is located, achieving accurate localization. Text recognition is then performed on the bounding box to obtain the curved text, achieving automatic recognition, improving the efficiency and accuracy of curved text recognition, and removing the need to identify the curved text in the image manually, thereby solving the existing problems of low recognition efficiency and accuracy.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a diagram of a curved text provided by an embodiment of the present invention;
fig. 2 is a first flowchart of a text recognition method according to an embodiment of the present invention;
fig. 3 is a second flowchart of a text recognition method according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating a shared network model according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of boundary points provided by an embodiment of the present invention;
FIG. 6 is a first diagram illustrating a text box according to an embodiment of the present invention;
fig. 7 is a third schematic flowchart of a text recognition method according to an embodiment of the present invention;
FIG. 8 is a second diagram of a text box according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of a text recognition apparatus according to an embodiment of the present invention;
fig. 10 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The frame numbers are generally arcuate, i.e., arranged in a curve (such as the frame numbers shown in fig. 1). Currently, when a frame number of a vehicle needs to be recorded (for example, a vehicle dealer is counting vehicle inventory), a relevant person is required to manually identify the frame number of the vehicle and perform recording. However, the frame number needs to be manually identified, so that the identification speed of the frame number is low, and errors are easy to occur, so that the identification efficiency and accuracy of the frame number are low.
Therefore, in view of these problems, the technical idea of the invention is to detect and recognize the frame number in an image with an end-to-end detection-and-recognition network framework. Features of the frame number image are extracted by a constructed shared network model to obtain a corresponding feature map; because the lower and higher network layers of the shared network model are connected, the resulting feature map better reflects the characteristics of the frame number image. The feature map is input into a rectangular bounding box detection model to obtain a rectangular text bounding box. The boundary point detection model then detects the frame number within the rectangular text bounding box to obtain boundary point coordinates, which are combined with the feature map to determine the text bounding box, achieving accurate localization of the frame number with little background information carried along. The text bounding box is rotated into a horizontal state and input into a text recognition model, which recognizes the frame number within it, achieving automatic recognition. The detection model and the recognition model are fused, that is, the frame number is detected and recognized by a single end-to-end network framework. This effectively reduces the video memory required when the electronic equipment runs, avoids the influence of background information introduced by localization on the recognition process, greatly increases the speed of detecting and recognizing the frame number, and improves recognition accuracy. The frame number no longer needs to be recognized manually, avoiding the existing problems of low recognition efficiency and accuracy.
The following describes the technical solutions of the present disclosure and how to solve the above technical problems in detail by specific examples. Several of these specific examples may be combined with each other below, and some of the same or similar concepts or processes may not be repeated in some examples. Examples of the present disclosure will now be described with reference to the accompanying drawings.
Fig. 2 is a first flowchart of a text recognition method according to an embodiment of the present invention. The execution subject of this embodiment may be an electronic device, which is not limited here. As shown in fig. 2, the method includes:
s201, obtaining a curve text image to be identified, and performing feature extraction on the curve text image to obtain a feature map.
In this embodiment, a curved text image to be recognized is acquired. The curved text image contains the curved text to be recognized, that is, text arranged along a curve, such as a vehicle frame number.
The curved text image may be sent by a server or another terminal, or imported through an associated transmission device (e.g., a USB disk); the invention does not limit the source of the curved text image.
Optionally, the curved text image is a frame number image containing a frame number.
In this embodiment, after the curved text image is acquired, feature extraction is performed on it to obtain the corresponding feature map. The feature map better reflects the characteristics of the curved text image, that is, it better highlights the curved text within the image, so subsequent localization and recognition of the curved text based on the feature map are more accurate.
S202, processing the feature map with a target network model to obtain a text bounding box in the feature map, where the target network model is used to detect text boundaries in an image.
In this embodiment, after the feature map corresponding to the curved text image is obtained, the target network model processes the feature map, that is, it detects the text boundary of the curved text in the feature map to obtain the text bounding box, i.e., the text region in the feature map. This region contains little background (non-curved-text) information, so the curved text is accurately localized and automatically detected, and can then be recognized from the text bounding box with high accuracy.
The target network model includes a rectangular bounding box detection model and a boundary point detection model.
S203, recognizing the text bounding box to obtain the curved text in the text bounding box.
In this embodiment, after the text bounding box in the feature map, that is, the text region in the curved text image, is obtained, text recognition is performed on the text bounding box to determine the curved text within it, thereby recognizing the curved text.
From the above description, feature extraction on the curved text image to be recognized yields a feature map that better represents its characteristics; processing the feature map with the target network model yields the text bounding box of the curved text, that is, the region of the feature map where the curved text is located, achieving accurate localization. Text recognition on the bounding box then yields the curved text within it, achieving automatic recognition, improving the efficiency and accuracy of curved text recognition, removing the need to identify the curved text in the image manually, and avoiding the existing problems of low recognition efficiency and accuracy.
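The three steps S201 to S203 can be sketched as a simple pipeline. The sketch below is a hypothetical illustration, not the patented implementation; the three callables stand in for the shared network, the target network model, and the text recognition model.

```python
# Hypothetical sketch of the S201-S203 pipeline. The three model callables
# are placeholders, injected as arguments so the flow itself is testable.
def recognize_curved_text(image, extract_features, detect_text_box, recognize):
    """Feature extraction -> text bounding box detection -> text recognition."""
    feature_map = extract_features(image)    # S201: curved text image -> feature map
    text_box = detect_text_box(feature_map)  # S202: feature map -> text bounding box
    return recognize(text_box)               # S203: bounding box -> curved text
```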
Fig. 3 is a second flowchart of a text recognition method according to an embodiment of the present invention. In this embodiment, building on the embodiment of fig. 2, after the feature map is obtained, the curved text in the feature map is localized with the relevant models: the boundary points of the curved text are detected and the corresponding text bounding box is obtained, so that the curved text can be determined from the text bounding box. The process of detecting the boundary points of the curved text with these models is described below with reference to a specific embodiment. As shown in fig. 3, the method includes:
s301, obtaining a curve text image to be identified, and performing feature extraction on the curve text image to obtain a feature map.
In this embodiment, feature extraction is performed on the curved text image with a shared network model: the curved text image is input into the shared network model, which extracts its features to obtain the required feature map. The feature map better reflects the characteristic parts of the curved text image, that is, it better highlights the curved text and reduces the influence of noise.
The shared network model includes a plurality of first network layers and a plurality of second network layers. The first network layers are connected in sequence: each first network layer receives the feature map output by the previous first network layer and downsamples it (i.e., reduces its dimensions) to obtain a first feature map whose size equals the preset size corresponding to that layer. The second network layers are also connected in sequence, and each second network layer is connected to its corresponding first network layer: each second network layer receives the first feature map output by its corresponding first network layer and the second feature map output by the previous second network layer, and upsamples them (i.e., increases their dimensions) to obtain a second feature map whose size equals the preset size corresponding to that layer. The second feature map output by the last second network layer is taken as the required feature map. As shown in fig. 4, network layers 1, 2, 3, and 4 are first network layers, and network layers 4, 5, 6, and 7 are second network layers. Network layers 1, 2, and 3 are connected in sequence, network layers 5, 6, and 7 are connected in sequence, and network layers 5, 6, and 7 are each connected to their corresponding first network layer; for example, network layer 5 is connected to network layer 3. Each network layer outputs a feature map of its corresponding size; for example, the feature map output by network layer 2 is one eighth the size of the original image. The feature map output by the last second network layer, i.e., network layer 7, is taken as the required feature map.
A first network layer may include a convolution layer or a pooling layer, and a second network layer may include a deconvolution (transposed convolution) layer or a pooling layer; the user may set the structure of the network layers according to implementation requirements.
Optionally, the shared network model is constructed based on the SegNet network framework. The shared network model comprehensively processes the low-level and high-level semantic feature maps, that is, the first and second feature maps, so that the resulting feature map highlights the characteristic parts of the curved text image.
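A minimal numerical sketch of such a shared encoder-decoder network is given below, assuming average pooling for the first network layers and nearest-neighbour upsampling with simple averaging fusion for the second network layers. The real model uses learned convolution and deconvolution layers, so every operation here is a stand-in for illustration only.

```python
import numpy as np

def downsample(x):
    """2x2 average pooling: stand-in for a conv/pool first network layer."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(x):
    """Nearest-neighbour 2x upsampling: stand-in for a deconv second layer."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def shared_network(image, depth=3):
    """Encoder halves the spatial size at each layer; decoder doubles it back,
    fusing each stage with the matching encoder output (the lower-layer to
    higher-layer connections described above)."""
    encoder_outputs = []
    x = image
    for _ in range(depth):                  # first network layers
        x = downsample(x)
        encoder_outputs.append(x)
    for skip in reversed(encoder_outputs):  # second network layers
        x = upsample((x + skip) / 2.0)      # fuse low- and high-level features
    return x                                # feature map, same size as input
```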
S302, performing text box detection on the feature map with a rectangular bounding box detection model to obtain a rectangular text bounding box in the feature map, where the rectangular text bounding box contains curved text and the rectangular bounding box detection model is used to extract rectangular text bounding boxes from an image.
S303, performing boundary point detection on the rectangular text bounding box with a boundary point detection model to obtain the boundary points of the curved text in the rectangular text bounding box, where the boundary point detection model is used to detect the boundary points of text.
In this embodiment, the feature map is input into the rectangular bounding box detection model, which extracts candidate regions from the feature map and then regresses the center offset, width, height, and inclination angle of the target rectangular box, finally obtaining the multi-directional rectangular box that encloses the text information, that is, the rectangular text bounding box in the feature map.
The rectangular bounding box detection model is in fact a multi-directional rectangular bounding box detection model that can extract the rectangular box enclosing the curved text in an image. The process by which it extracts this rectangular box is an existing one and is not repeated here.
In this embodiment, optionally, the rectangular bounding box detection model is a Region Proposal Network (RPN) model.
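Assuming the model regresses each box as (center x, center y, width, height, inclination angle), the multi-directional rectangular box can be decoded into its four corner points as follows. This decode step is a common convention for rotated boxes, not a detail disclosed by the patent.

```python
import math

def rotated_box_corners(cx, cy, w, h, angle_rad):
    """Decode a regressed (center, width, height, angle) rotated rectangle
    into its four corner points, counter-clockwise from the top-left."""
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    corners = []
    for dx, dy in [(-w / 2, -h / 2), (w / 2, -h / 2),
                   (w / 2, h / 2), (-w / 2, h / 2)]:
        # rotate the axis-aligned offset, then shift by the box center
        corners.append((cx + dx * cos_a - dy * sin_a,
                        cy + dx * sin_a + dy * cos_a))
    return corners
```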
In this embodiment, after the text rectangular bounding box is obtained, it still contains considerable background noise, that is, many non-curved-text portions. Therefore, to improve the accuracy of subsequent text recognition, the text rectangular bounding box needs to be further segmented (that is, cropped) according to the boundary of the curved text to obtain a text bounding box containing fewer non-text portions.
Specifically, when the rectangular text bounding box is further divided according to the boundary of the curved text, the rectangular text bounding box needs to be input into the boundary point detection model first, so that the boundary point detection model performs boundary point detection on the rectangular text bounding box to obtain the boundary points of the curved text in the rectangular text bounding box.
Optionally, the implementation manner of step S303 is: and detecting the boundary points of the rectangular text bounding box by adopting a boundary point detection model to obtain the offset distance between the boundary points and the equal division points of the long edges on the rectangular text bounding box. And obtaining the coordinates of the long-edge equally-dividing points, and determining the coordinates of the boundary points according to the coordinates of the long-edge equally-dividing points and the offset distance.
Specifically, the text rectangular bounding box is input into the boundary point detection model, which extracts the boundary points of the curved text in the box. Each boundary point corresponds to one long-edge equal-division point on the text rectangular bounding box. For each boundary point, the offset distance between the boundary point and its corresponding long-edge equal-division point is then determined and output.
And for each boundary point, obtaining the coordinates of the boundary point according to the coordinates of the long-edge equally-divided points corresponding to the boundary point and the offset distance corresponding to the boundary point based on a preset coordinate calculation formula.
The offset distance comprises an abscissa offset distance and an ordinate offset distance. The preset coordinate formula is x′_b = x′_d + Δx · w₀ and y′_b = y′_d + Δy · h₀, where (x′_b, y′_b) are the coordinates of the boundary point, (x′_d, y′_d) are the coordinates of the long-edge equal-division point corresponding to the boundary point, (Δx, Δy) are the abscissa and ordinate offset distances, w₀ is the width of the text rectangular bounding box (i.e., the length of its long side), and h₀ is the height of the text rectangular bounding box (i.e., the length of its short side).
Optionally, when determining the boundary points of the curve text, a preset boundary point generating algorithm may be used to generate them.
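The algorithm's code does not survive in this text. A minimal sketch of one plausible boundary point generating routine is given below; every name, the edge representation, and the normalized-offset formula are assumptions, not taken from the patent:

```python
def generate_boundary_points(top_edge, bottom_edge, offsets, w0, h0, n=8):
    """Assumed sketch of a preset boundary point generating algorithm.

    top_edge / bottom_edge: (start, end) point pairs for the two long
    edges of the text rectangular bounding box. offsets: one (dx, dy)
    pair per equal-division point, expressed as fractions of the box
    width w0 and height h0 (a normalization assumption).
    """
    def divisions(p0, p1):
        # interior equal-division points of the segment p0 -> p1
        (x0, y0), (x1, y1) = p0, p1
        return [(x0 + (x1 - x0) * k / n, y0 + (y1 - y0) * k / n)
                for k in range(1, n)]

    points = divisions(*top_edge) + divisions(*bottom_edge)
    # shift each division point by its predicted, denormalized offset
    return [(xd + dx * w0, yd + dy * h0)
            for (xd, yd), (dx, dy) in zip(points, offsets)]
```

With eight-part division (n = 8), each long edge contributes seven division points, matching the count discussed in the text.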
It should be noted that the long-edge equal-division points on the text rectangular bounding box are preset points that may be configured according to actual requirements. For example, if each long edge is divided into eight equal parts, there are seven equal-division points on each long edge of the text rectangular bounding box.
In this embodiment, the rectangular bounding box detection model and the boundary point detection model are both trained network models, the rectangular bounding box detection model can accurately extract a text rectangular bounding box in an image, and the boundary point detection model can determine boundary points of a curve text in the text rectangular bounding box and offset distances between the boundary points and equal division points of a long edge of the text rectangular bounding box.
And S304, obtaining a text boundary box according to the boundary points and the feature map.
In this embodiment, the boundary points are mapped onto the feature map, and the portion of the feature map enclosed by the boundary points is extracted to obtain a text bounding box containing the curved text. This text bounding box contains less background noise than the rectangular bounding box, which benefits subsequent text recognition and ensures its accuracy.
In this embodiment, optionally, when mapping the boundary points onto the feature map, the coordinates of the boundary points may be used, so that the portion of the feature map enclosed by the boundary points can be extracted to obtain the text bounding box. The specific process is as follows: determine the curve text boundary in the feature map based on the coordinates of the boundary points, and connect the curve text boundaries to obtain the text bounding box.
Specifically, the position of the boundary point on the feature map is determined according to the coordinate of the boundary point, so that a curve text boundary is obtained, and the curve text boundary is composed of a plurality of boundary points. The boundary points on the boundary of the curved text are sequentially connected to obtain a text boundary box surrounding the curved text, for example, the boundary points in fig. 5 are sequentially connected to obtain the text boundary box in fig. 6.
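The sequential connection of boundary points described above can be sketched as follows; the traversal order (upper boundary left to right, lower boundary reversed) and the shoelace-area sanity check are illustration-only assumptions:

```python
def boundary_polygon(top_points, bottom_points):
    """Connect curve text boundary points into a closed text bounding box.

    Assumes top_points run left to right and bottom_points are reversed,
    so consecutive points are adjacent on the polygon outline; the patent
    only states that the points are connected sequentially.
    """
    return top_points + bottom_points[::-1]

def polygon_area(poly):
    """Shoelace area of the closed polygon, as a sanity check on ordering."""
    s = 0.0
    for (x0, y0), (x1, y1) in zip(poly, poly[1:] + poly[:1]):
        s += x0 * y1 - x1 * y0
    return abs(s) / 2.0
```

A non-degenerate (nonzero) area confirms the points were ordered into a simple closed outline rather than a self-crossing one.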
S305, recognizing the text boundary box to obtain a curve text in the text boundary box.
In this embodiment, after the text bounding box is obtained, that is, after the curve text portion is obtained, text recognition is performed on the curve text portion to obtain a curve text in the curve text portion, so as to implement automatic recognition of the curve text.
When the text bounding box is recognized, the text recognition model can be used for recognizing the text.
In addition, in this embodiment, a multi-task loss is adopted to train the network models in the end-to-end framework, namely the rectangular bounding box detection model, the boundary point detection model and the text recognition model. That is, loss_total = loss_Rect + loss_Bound + loss_REC, where loss_Rect denotes the loss of the rectangular bounding box detection model, loss_Bound denotes the loss of the boundary point detection model, and loss_REC denotes the loss of the text recognition model.
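The multi-task loss can be written directly as the sum of the three sub-model losses. The optional weights below are a common extension and default to the unweighted form stated in the text:

```python
def total_loss(loss_rect, loss_bound, loss_rec,
               w_rect=1.0, w_bound=1.0, w_rec=1.0):
    """Multi-task loss: loss_total = loss_Rect + loss_Bound + loss_REC.

    The text sums the three terms with unit weights; the weight
    parameters are an assumed extension, not part of the patent.
    """
    return w_rect * loss_rect + w_bound * loss_bound + w_rec * loss_rec
```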
In addition, when training the network models in the end-to-end framework, manually annotated training sample images are obtained and input into the framework, so that the relevant network models are trained using these samples.
Fig. 7 is a schematic flow chart of a text recognition method according to an embodiment of the present invention. On the basis of any of the above embodiments, when recognizing a frame number in a frame number image, this embodiment first detects the frame number region in the frame number image to obtain the corresponding text bounding box, achieving accurate positioning of the frame number, and then recognizes the text bounding box to obtain the frame number in it, achieving automatic recognition of the frame number. The process of recognizing a frame number in a text bounding box is described below with reference to a specific embodiment. As shown in fig. 7, the method includes:
S701, acquiring a frame number image to be recognized, and performing feature extraction on the frame number image to obtain a feature map.
S702, processing the feature map by adopting a target network model to obtain a text boundary box in the feature map, wherein the target network model is used for detecting the text boundary in the image.
The implementation process of steps S701 to S702 in this embodiment is similar to the implementation process of steps S201 to S202 in the embodiment of fig. 2, and is not described again here.
And S703, recognizing the frame number of the text boundary box by adopting a text recognition model so as to obtain the frame number in the text boundary box.
In this embodiment, when the curve text image is a frame number image, the frame number in the image needs to be acquired. After the text bounding box in the feature map corresponding to the frame number image is obtained, that is, after the frame number region in the feature map is obtained, the text bounding box contains the frame number. Text recognition is then performed on the text bounding box by using a trained text recognition model: the text bounding box is input into the text recognition model, which performs text recognition on it and determines the frame number in the text bounding box, achieving automatic recognition of the frame number.
The text recognition model is a trained Long Short-Term Memory (LSTM) network model, which can accurately recognize the frame number.
In addition, the training process of the text recognition model is similar to that of the existing neural network model, that is, a required training sample is obtained first, and then the basic neural network model is trained by using the training sample, so that the trained text recognition model is obtained, and the detailed description of the training process is omitted here.
In addition, in any embodiment, optionally, since the text recognition model recognizes horizontally arranged text more accurately, in order to improve the accuracy with which it recognizes the curved text (i.e., the frame number) in the text bounding box, after the text bounding box is obtained it is determined whether the curved text in the text bounding box is in a horizontal state. If not, which indicates that the curved text is arranged along a curve rather than horizontally (as shown in a in fig. 8), the curved text in the text bounding box is horizontally rectified so that it is in a horizontal state, i.e., horizontally arranged (as shown in b in fig. 8). The text bounding box with the curved text in a horizontal state is then input into the text recognition model so that the model recognizes the horizontal text in the text bounding box.
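The patent does not specify how the horizontal-state check is performed. One simple heuristic, assumed here for illustration, is to test the vertical spread of the boundary points along one long boundary:

```python
def is_horizontal(boundary_points, tol=2.0):
    """Heuristic test of whether curved text is in a horizontal state.

    Treats the text as horizontal when the vertical spread of the
    boundary points along one long boundary stays within tol pixels.
    Both the criterion and the tolerance are assumptions, not taken
    from the patent.
    """
    ys = [y for _, y in boundary_points]
    return max(ys) - min(ys) <= tol
```

If the check fails, the text is curved and must be rectified (for example by a piecewise or thin-plate-spline warp) before being fed to the recognition model.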
After the curve text in the text boundary box is determined to be in the horizontal state, the curve text is indicated to be in horizontal arrangement, and the text boundary box with the curve text in the horizontal state is directly input into a text recognition model to realize accurate recognition of the text.
In this embodiment, the end-to-end detection and recognition network framework detects and recognizes the frame number in the frame number image. That is, the text bounding box in the frame number image is determined using the relevant models; since the text bounding box contains little non-frame-number content, accurate positioning (i.e., accurate detection) of the frame number is achieved. The text bounding box is then recognized by the text recognition model to obtain the corresponding frame number, achieving automatic recognition of the frame number. Because the recognized text bounding box contains little background, the influence of the background on the frame number is reduced, ensuring recognition accuracy.
Based on the same idea, an embodiment of the present specification further provides a device corresponding to the foregoing method, and fig. 9 is a schematic structural diagram of a text recognition device provided in an embodiment of the present invention. As shown in fig. 9, the text recognition apparatus 90 includes: an image acquisition module 901 and a processing module 902.
The image obtaining module 901 is configured to obtain a curve text image to be identified, and perform feature extraction on the curve text image to obtain a feature map.
And the processing module 902 is configured to process the feature map by using a target network model to obtain a text bounding box in the feature map, where the target network model is used to detect a text boundary in the image.
The processing module 902 is further configured to identify the text bounding box to obtain a curve text in the text bounding box.
In another embodiment of the present invention, based on the embodiment shown in fig. 9, the target network model includes a rectangular bounding box detection model and a boundary point detection model, and the processing module 902 is further configured to:
and detecting the text box of the feature graph by adopting a rectangular surrounding box detection model so as to obtain the text rectangular surrounding box in the feature graph, wherein the text rectangular surrounding box comprises a curve text, and the rectangular surrounding box detection model is used for extracting the text rectangular surrounding box in the image. And carrying out boundary point detection on the text rectangular bounding box by adopting a boundary point detection model to obtain boundary points of the curve text in the text rectangular bounding box, wherein the boundary point detection model is used for detecting the boundary points of the text. And obtaining a text boundary box according to the boundary points and the feature map.
In this embodiment, optionally, the image obtaining module 901 is further configured to:
and extracting the characteristics of the curve text image by adopting a shared network model to obtain a characteristic diagram, wherein a lower layer network layer and a higher layer network layer in the shared network model are connected.
In this embodiment, optionally, the processing module 902 is further configured to:
and detecting the boundary points of the rectangular text bounding box by adopting a boundary point detection model to obtain the offset distance between the boundary points and the equal division points of the long edges on the rectangular text bounding box. And obtaining the coordinates of the long-edge equally-dividing points, and determining the coordinates of the boundary points according to the coordinates of the long-edge equally-dividing points and the offset distance.
In this embodiment, optionally, the processing module 902 is further configured to:
and determining the curve text boundary in the feature map based on the coordinates of the boundary points. And connecting the curve text boundaries to obtain a text boundary box.
In another embodiment of the present invention, based on the embodiment shown in fig. 9, the curved text image includes a frame number image, and the processing module 902 is further configured to:
and recognizing the frame number of the text boundary box by adopting a text recognition model so as to obtain the frame number in the text boundary box.
In addition, optionally, the processing module 902 is further configured to:
it is determined whether the curved text in the text bounding box is in a horizontal state. And if the text bounding box is not in a horizontal state, horizontally correcting the text bounding box.
The device provided in this embodiment may be used to implement the technical solution of the above method embodiment, and the implementation principle and technical effect are similar, which are not described herein again.
Fig. 10 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present invention. As shown in fig. 10, the electronic device 100 of this embodiment includes: a processor 1001 and a memory 1002, wherein:
a memory 1002 for storing computer-executable instructions;
the processor 1001 is configured to execute the computer-executable instructions stored in the memory to implement the steps performed in the above embodiments. Reference may be made in particular to the description of the method embodiments above.
Alternatively, the memory 1002 may be separate or integrated with the processor 1001.
When the memory 1002 is provided separately, the electronic device further includes a bus 1003 for connecting the memory 1002 and the processor 1001.
The embodiment of the present invention further provides a computer-readable storage medium, where computer-executable instructions are stored in the computer-readable storage medium, and when a processor executes the computer-executable instructions, the text recognition method as described above is implemented.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative, and for example, the division of the modules is only one logical division, and other divisions may be realized in practice, for example, a plurality of modules may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be in an electrical, mechanical or other form.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each module may exist alone physically, or two or more modules are integrated into one unit. The unit formed by the modules can be realized in a hardware form, and can also be realized in a form of hardware and a software functional unit.
The integrated module implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) or a processor (processor) to execute some steps of the methods according to the embodiments of the present application.
It should be understood that the Processor may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present invention may be embodied directly in a hardware processor, or in a combination of the hardware and software modules within the processor.
The memory may comprise a high-speed RAM and may further comprise non-volatile memory (NVM), such as at least one magnetic disk; it may also be a USB flash drive, a removable hard disk, a read-only memory, or a magnetic or optical disk.
The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, the buses in the figures of the present application are not limited to only one bus or one type of bus.
The storage medium may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.
An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an application-specific integrated circuit (ASIC). Alternatively, the processor and the storage medium may reside as discrete components in an electronic device or host device.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. A text recognition method, comprising:
acquiring a curve text image to be identified, and extracting the characteristics of the curve text image to obtain a characteristic diagram;
processing the feature map by adopting a target network model to obtain a text boundary box in the feature map, wherein the target network model is used for detecting a text boundary in an image;
and identifying the text boundary box to obtain a curve text in the text boundary box.
2. The method of claim 1, wherein the extracting the features of the curved text image to obtain a feature map comprises:
and extracting the characteristics of the curve text image by adopting a shared network model to obtain the characteristic diagram, wherein a lower layer network layer in the shared network model is connected with a higher layer network layer.
3. The method according to claim 1, wherein the target network model includes a rectangular bounding box detection model and a boundary point detection model, and the processing the feature map to obtain the text bounding box in the feature map by using the target network model includes:
performing text box detection on the feature map by using the rectangular bounding box detection model to obtain a text rectangular bounding box in the feature map, wherein the text rectangular bounding box comprises a curve text, and the rectangular bounding box detection model is used for extracting the text rectangular bounding box in an image;
carrying out boundary point detection on the text rectangular bounding box by adopting the boundary point detection model to obtain boundary points of the curve text in the text rectangular bounding box, wherein the boundary point detection model is used for detecting the boundary points of the text;
and obtaining the text bounding box according to the boundary points and the feature map.
4. The method according to any one of claims 1 to 3, wherein if the curved text image includes a car frame number image, the recognizing the text bounding box to obtain the curved text in the text bounding box comprises:
and recognizing the frame number of the text boundary box by adopting a text recognition model so as to obtain the frame number in the text boundary box.
5. The method of claim 1, further comprising:
determining whether the curve text in the text bounding box is in a horizontal state;
and if not, horizontally correcting the text bounding box.
6. The method according to claim 3, wherein the performing boundary point detection on the rectangular bounding box of the text by using the boundary point detection model to obtain boundary points of the curved text in the rectangular bounding box of the text comprises:
detecting the boundary points of the rectangular text bounding box by adopting the boundary point detection model to obtain the offset distance between the boundary points and the equal division points of the long edges of the rectangular text bounding box;
and obtaining the coordinates of the long-edge equally-dividing points, and determining the coordinates of the boundary points according to the coordinates of the long-edge equally-dividing points and the offset distance.
7. The method according to claim 6, wherein the obtaining the text bounding box according to the boundary point and the feature map comprises:
determining a curve text boundary in the feature map based on the coordinates of the boundary points;
and connecting the curve text boundaries to obtain the text boundary box.
8. A text recognition apparatus, comprising:
the image acquisition module is used for acquiring a curve text image to be identified and extracting the characteristics of the curve text image to obtain a characteristic diagram;
the processing module is used for processing the feature map by adopting a target network model to obtain a text boundary box in the feature map, wherein the target network model is used for detecting a text boundary in an image;
the processing module is further configured to identify the text bounding box to obtain a curve text in the text bounding box.
9. An electronic device, comprising: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executing the computer-executable instructions stored by the memory causes the at least one processor to perform the text recognition method of any of claims 1-7.
10. A computer-readable storage medium having computer-executable instructions stored thereon which, when executed by a processor, implement the text recognition method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010752292.XA CN111753812A (en) | 2020-07-30 | 2020-07-30 | Text recognition method and equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010752292.XA CN111753812A (en) | 2020-07-30 | 2020-07-30 | Text recognition method and equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111753812A true CN111753812A (en) | 2020-10-09 |
Family
ID=72712378
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010752292.XA Pending CN111753812A (en) | 2020-07-30 | 2020-07-30 | Text recognition method and equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111753812A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113673497A (en) * | 2021-07-21 | 2021-11-19 | 浙江大华技术股份有限公司 | Text detection method, terminal and computer readable storage medium thereof |
CN114581652A (en) * | 2020-12-01 | 2022-06-03 | 北京四维图新科技股份有限公司 | Target object detection method and device, electronic equipment and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019192397A1 (en) * | 2018-04-04 | 2019-10-10 | 华中科技大学 | End-to-end recognition method for scene text in any shape |
CN110751151A (en) * | 2019-10-12 | 2020-02-04 | 上海眼控科技股份有限公司 | Text character detection method and equipment for vehicle body image |
CN110837835A (en) * | 2019-10-29 | 2020-02-25 | 华中科技大学 | End-to-end scene text identification method based on boundary point detection |
CN110929665A (en) * | 2019-11-29 | 2020-03-27 | 河海大学 | Natural scene curve text detection method |
WO2020097909A1 (en) * | 2018-11-16 | 2020-05-22 | 北京比特大陆科技有限公司 | Text detection method and apparatus, and storage medium |
-
2020
- 2020-07-30 CN CN202010752292.XA patent/CN111753812A/en active Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019192397A1 (en) * | 2018-04-04 | 2019-10-10 | 华中科技大学 | End-to-end recognition method for scene text in any shape |
WO2020097909A1 (en) * | 2018-11-16 | 2020-05-22 | 北京比特大陆科技有限公司 | Text detection method and apparatus, and storage medium |
CN110751151A (en) * | 2019-10-12 | 2020-02-04 | 上海眼控科技股份有限公司 | Text character detection method and equipment for vehicle body image |
CN110837835A (en) * | 2019-10-29 | 2020-02-25 | 华中科技大学 | End-to-end scene text identification method based on boundary point detection |
CN110929665A (en) * | 2019-11-29 | 2020-03-27 | 河海大学 | Natural scene curve text detection method |
Non-Patent Citations (2)
Title |
---|
刘业鑫; 卜巍; 邬向前: "Natural scene text detection method based on text center lines", 智能计算机与应用 (Intelligent Computer and Applications), no. 02, 1 February 2020 (2020-02-01) *
王建新; 王子亚; 田萱: "Survey of natural scene text detection and recognition based on deep learning", 软件学报 (Journal of Software), no. 05, 15 May 2020 (2020-05-15) *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114581652A (en) * | 2020-12-01 | 2022-06-03 | 北京四维图新科技股份有限公司 | Target object detection method and device, electronic equipment and storage medium |
CN113673497A (en) * | 2021-07-21 | 2021-11-19 | 浙江大华技术股份有限公司 | Text detection method, terminal and computer readable storage medium thereof |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107545241B (en) | Neural network model training and living body detection method, device and storage medium | |
US9171204B2 (en) | Method of perspective correction for devanagari text | |
CN110287854B (en) | Table extraction method and device, computer equipment and storage medium | |
CN111046859B (en) | Character recognition method and device | |
CN111353501A (en) | Book point-reading method and system based on deep learning | |
CN110647882A (en) | Image correction method, device, equipment and storage medium | |
CN108717744B (en) | Method and device for identifying seal serial number on financial document and terminal equipment | |
CN110443242B (en) | Reading frame detection method, target recognition model training method and related device | |
CN111967286A (en) | Method and device for identifying information bearing medium, computer equipment and medium | |
CN108734161B (en) | Method, device and equipment for identifying prefix number area and storage medium | |
CN111753812A (en) | Text recognition method and equipment | |
US11651604B2 (en) | Word recognition method, apparatus and storage medium | |
CN112668580A (en) | Text recognition method, text recognition device and terminal equipment | |
CN115631112A (en) | Building contour correction method and device based on deep learning | |
CN109829383B (en) | Palmprint recognition method, palmprint recognition device and computer equipment | |
CN114445843A (en) | Card image character recognition method and device of fixed format | |
CN112560856B (en) | License plate detection and identification method, device, equipment and storage medium | |
CN110909816B (en) | Picture identification method and device | |
CN110969640A (en) | Video image segmentation method, terminal device and computer-readable storage medium | |
CN112949649A (en) | Text image identification method and device and computing equipment | |
WO2020244076A1 (en) | Face recognition method and apparatus, and electronic device and storage medium | |
CN114495132A (en) | Character recognition method, device, equipment and storage medium | |
CN114299509A (en) | Method, device, equipment and medium for acquiring information | |
CN110751158B (en) | Digital identification method, device and storage medium in therapeutic bed display | |
CN114120305A (en) | Training method of text classification model, and recognition method and device of text content |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |