CN111753812A - Text recognition method and equipment - Google Patents

Text recognition method and equipment

Info

Publication number
CN111753812A
Authority
CN
China
Prior art keywords
text
boundary
bounding box
curve
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010752292.XA
Other languages
Chinese (zh)
Inventor
丁子凡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Eye Control Technology Co Ltd
Original Assignee
Shanghai Eye Control Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Eye Control Technology Co Ltd filed Critical Shanghai Eye Control Technology Co Ltd
Priority to CN202010752292.XA
Publication of CN111753812A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a text recognition method and equipment. The method includes: acquiring a curved text image to be recognized and extracting features from it to obtain a feature map; processing the feature map with a target network model to obtain a text bounding box in the feature map, where the target network model is used to detect text boundaries in an image; and recognizing the text bounding box to obtain the curved text it contains. The embodiment of the invention processes the feature map with the target network model to obtain the text bounding box of the curved text in the feature map, i.e., the region where the curved text is located, achieving accurate localization of the curved text. Text recognition is then performed on the text bounding box to obtain the curved text it contains, achieving automatic recognition of curved text and improving the efficiency and accuracy of curved text recognition.

Description

Text recognition method and equipment
Technical Field
The embodiment of the invention relates to the technical field of images, in particular to a text recognition method and text recognition equipment.
Background
The frame number is the unique identification code of a vehicle and generally consists of a combination of 17 digits and letters. Some frame numbers are arc-shaped (for example, double-row arc-shaped frame numbers), i.e., arranged along a curve. Currently, when the frame number of a vehicle needs to be recorded (for example, when a vehicle dealer is counting vehicle inventory), a relevant person must manually identify the frame number and record it.
However, the inventors found that the prior art has at least the following problem: because the frame number must be identified manually, recognition of the frame number is slow and errors occur easily, resulting in low recognition efficiency and accuracy.
Disclosure of Invention
The embodiment of the invention provides a text recognition method and text recognition equipment, and aims to solve the problems of low recognition efficiency and low accuracy in the prior art.
In a first aspect, an embodiment of the present invention provides a text recognition method, including:
acquiring a curve text image to be identified, and extracting the characteristics of the curve text image to obtain a characteristic diagram;
processing the feature map by adopting a target network model to obtain a text boundary box in the feature map, wherein the target network model is used for detecting a text boundary in an image;
and identifying the text boundary box to obtain a curve text in the text boundary box.
In one possible design, the performing feature extraction on the curved text image to obtain a feature map includes:
and extracting features from the curved text image by using a shared network model to obtain the feature map, wherein lower network layers in the shared network model are connected to higher network layers.
In a possible design, if the target network model includes a rectangular bounding box detection model and a boundary point detection model, the processing the feature map by using the target network model to obtain the text bounding box in the feature map includes:
performing text box detection on the feature map by using the rectangular bounding box detection model to obtain a text rectangular bounding box in the feature map, wherein the text rectangular bounding box comprises a curve text, and the rectangular bounding box detection model is used for extracting the text rectangular bounding box in an image;
carrying out boundary point detection on the text rectangular bounding box by adopting the boundary point detection model to obtain boundary points of the curve text in the text rectangular bounding box, wherein the boundary point detection model is used for detecting the boundary points of the text;
and obtaining the text bounding box according to the boundary points and the feature map.
In a possible design, if the curved text image includes a car frame number image, the recognizing the text bounding box to obtain the curved text in the text bounding box includes:
and recognizing the frame number of the text boundary box by adopting a text recognition model so as to obtain the frame number in the text boundary box.
In one possible design, the method further includes:
determining whether the curve text in the text bounding box is in a horizontal state;
and if not, horizontally correcting the text bounding box.
In a possible design, the performing boundary point detection on the text rectangle bounding box by using the boundary point detection model to obtain boundary points of a curve text in the text rectangle bounding box includes:
detecting the boundary points of the rectangular text bounding box by adopting the boundary point detection model to obtain the offset distance between the boundary points and the equal division points of the long edges of the rectangular text bounding box;
and obtaining the coordinates of the long-edge equally-dividing points, and determining the coordinates of the boundary points according to the coordinates of the long-edge equally-dividing points and the offset distance.
In one possible design, the obtaining the text bounding box according to the boundary point and the feature map includes:
determining a curve text boundary in the feature map based on the coordinates of the boundary points;
and connecting the curve text boundaries to obtain the text boundary box.
In a second aspect, an embodiment of the present invention provides a text recognition apparatus, including:
the image acquisition module is used for acquiring a curve text image to be identified and extracting the characteristics of the curve text image to obtain a characteristic diagram;
the processing module is used for processing the feature map by adopting a target network model to obtain a text boundary box in the feature map, wherein the target network model is used for detecting a text boundary in an image;
the processing module is further configured to identify the text bounding box to obtain a curve text in the text bounding box.
In one possible design, the image acquisition module is further configured to:
and extracting features from the curved text image by using a shared network model to obtain the feature map, wherein lower network layers in the shared network model are connected to higher network layers.
In one possible design, the target network model includes a rectangular bounding box detection model and a boundary point detection model, and the processing module is further configured to:
performing text box detection on the feature map by using the rectangular bounding box detection model to obtain a text rectangular bounding box in the feature map, wherein the text rectangular bounding box comprises a curve text, and the rectangular bounding box detection model is used for extracting the text rectangular bounding box in an image; carrying out boundary point detection on the text rectangular bounding box by adopting the boundary point detection model to obtain boundary points of the curve text in the text rectangular bounding box, wherein the boundary point detection model is used for detecting the boundary points of the text; and obtaining the text bounding box according to the boundary points and the feature map.
In one possible design, the curved text image includes a frame number image, and the processing module is further configured to:
and recognizing the frame number of the text boundary box by adopting a text recognition model so as to obtain the frame number in the text boundary box.
In one possible design, the processing module is further to:
determining whether the curve text in the text bounding box is in a horizontal state; and if not, horizontally correcting the text bounding box.
In one possible design, the processing module is further to:
detecting the boundary points of the rectangular text bounding box by adopting the boundary point detection model to obtain the offset distance between the boundary points and the equal division points of the long edges of the rectangular text bounding box; and obtaining the coordinates of the long-edge equally-dividing points, and determining the coordinates of the boundary points according to the coordinates of the long-edge equally-dividing points and the offset distance.
In one possible design, the processing module is further to:
determining a curve text boundary in the feature map based on the coordinates of the boundary points; and connecting the curve text boundaries to obtain the text boundary box.
In a third aspect, an embodiment of the present invention provides an electronic device, including: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executing the computer-executable instructions stored by the memory causes the at least one processor to perform the text recognition method as described above in the first aspect and various possible designs of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, where computer-executable instructions are stored, and when a processor executes the computer-executable instructions, the text recognition method according to the first aspect and various possible designs of the first aspect is implemented.
The text recognition method and equipment provided by the invention acquire a curved text image to be recognized, extract features from the image to obtain a feature map, and process the feature map with a target network model to obtain a text bounding box in the feature map, where the target network model is used to detect text boundaries in an image; the text bounding box is then recognized to obtain the curved text it contains. Feature extraction yields a feature map that better represents the characteristics of the curved text image; processing the feature map with the target network model yields the text bounding box of the curved text, i.e., the region where the curved text is located, achieving accurate localization of the curved text; and performing text recognition on the text bounding box yields the curved text it contains, achieving automatic recognition of curved text. This improves the efficiency and accuracy of curved text recognition, removes the need for manual recognition of curved text in images, and solves the existing problems of low recognition efficiency and accuracy.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a diagram of a curved text provided by an embodiment of the present invention;
fig. 2 is a first flowchart of a text recognition method according to an embodiment of the present invention;
fig. 3 is a second schematic flowchart of a text recognition method according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating a shared network model according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of boundary points provided by an embodiment of the present invention;
FIG. 6 is a first diagram illustrating a text box according to an embodiment of the present invention;
fig. 7 is a third schematic flowchart of a text recognition method according to an embodiment of the present invention;
FIG. 8 is a second diagram of a text box according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of a text recognition apparatus according to an embodiment of the present invention;
fig. 10 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Frame numbers are often arc-shaped, i.e., arranged along a curve (such as the frame number shown in fig. 1). Currently, when the frame number of a vehicle needs to be recorded (for example, when a vehicle dealer is counting vehicle inventory), a relevant person must manually identify the frame number and record it. However, because the frame number must be identified manually, recognition of the frame number is slow and errors occur easily, so the recognition efficiency and accuracy of frame numbers are low.
In view of these problems, the technical idea of the invention is to detect and recognize the frame number in an image with an end-to-end detection-and-recognition network framework. Features are extracted from the frame number image by a constructed shared network model to obtain a corresponding feature map; because the lower and higher network layers of the shared network model are connected, the feature map better reflects the characteristics of the frame number image. The feature map is input into a rectangular bounding box detection model to obtain a text rectangular bounding box; the frame number within that box is then detected by a boundary point detection model to obtain the coordinates of the boundary points, which are combined with the feature map to determine a text bounding box. This accurately localizes the frame number, and the text bounding box carries little background information. The text bounding box is rotated into a horizontal state and input into a text recognition model, which recognizes the frame number inside it, achieving automatic recognition of the frame number. Because the detection model and the recognition model are fused, i.e., the frame number in the image is detected and recognized by a single end-to-end framework, the video memory required by the electronic device is effectively reduced, the influence of background information introduced by localization on the recognition process is avoided, and both the speed and the accuracy of frame number detection and recognition are greatly improved. Manual recognition of the frame number is no longer needed, avoiding the existing problems of low recognition efficiency and accuracy.
The following describes the technical solutions of the present disclosure and how to solve the above technical problems in detail by specific examples. Several of these specific examples may be combined with each other below, and some of the same or similar concepts or processes may not be repeated in some examples. Examples of the present disclosure will now be described with reference to the accompanying drawings.
Fig. 2 is a first schematic flowchart of a text recognition method according to an embodiment of the present invention. The execution subject of this embodiment may be an electronic device, which is not limited here. As shown in fig. 2, the method includes:
s201, obtaining a curve text image to be identified, and performing feature extraction on the curve text image to obtain a feature map.
In this embodiment, a curved text image to be recognized is obtained, where the curved text image includes curved texts to be recognized, and the curved texts are texts arranged in a curve, such as car frame numbers.
The curved text image may be sent by a server or other terminals, or may be imported through an associated transmission device (e.g., a usb disk), and the source of the curved text image is not limited in the present invention.
Optionally, the curved text image is a frame number image, and the frame number image includes a frame number.
In this embodiment, after the curved text image is obtained, feature extraction is performed on it to obtain a corresponding feature map. The feature map better reflects the characteristics of the curved text image, i.e., better highlights the curved text within it, so that subsequent localization and recognition of the curved text based on the feature map are more accurate.
S202, processing the feature graph by adopting a target network model to obtain a text boundary box in the feature graph, wherein the target network model is used for detecting the text boundary in the image.
In this embodiment, after the feature map corresponding to the curved text image is obtained, the target network model is used to process it: the text boundary of the curved text in the feature map is detected to obtain the text bounding box, i.e., the text region, in the feature map. This region contains little background information, i.e., little non-text content, so the curved text is accurately localized and automatically detected, and can then be recognized from the text bounding box.
The target network model includes a rectangular bounding box detection model and a boundary point detection model.
And S203, identifying the text boundary box to obtain a curve text in the text boundary box.
In this embodiment, after obtaining the text bounding box in the feature map, that is, after obtaining the text region in the curved text image, text recognition is performed on the text bounding box to determine the curved text in the text bounding box, so as to implement recognition of the curved text.
From the above description, feature extraction is performed on the curved text image to be recognized to obtain a feature map that better represents its characteristics; the feature map is processed with the target network model to obtain the text bounding box of the curved text, i.e., the region where the curved text is located, achieving accurate localization of the curved text; and text recognition is performed on the text bounding box to obtain the curved text it contains, achieving automatic recognition of curved text. This improves the efficiency and accuracy of curved text recognition, removes the need for manual recognition of curved text in images, and avoids the existing problems of low recognition efficiency and accuracy.
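As an illustration only, the three steps S201-S203 can be sketched as a pipeline with hypothetical stand-in functions (none of these names or return values come from the patent; the models themselves are not disclosed in this excerpt):

```python
# Hypothetical stand-ins for the three models used in S201-S203; the patent
# does not disclose concrete implementations, so these bodies are placeholders.
def extract_features(image):
    """S201: shared network model -> feature map (placeholder)."""
    return {"feature_map": image}

def detect_text_bounding_box(feature_map):
    """S202: target network model -> text bounding box (placeholder polygon)."""
    return [(0, 0), (10, 0), (10, 5), (0, 5)]

def recognize_text(bounding_box):
    """S203: text recognition model -> curved text (placeholder 17-char code)."""
    return "EXAMPLE17CHARCODE"

def recognize_curved_text(image):
    """End-to-end flow of steps S201-S203."""
    feature_map = extract_features(image)["feature_map"]
    bounding_box = detect_text_bounding_box(feature_map)
    return recognize_text(bounding_box)

print(recognize_curved_text("frame_number.jpg"))  # EXAMPLE17CHARCODE
```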
Fig. 3 is a second schematic flowchart of a text recognition method according to an embodiment of the present invention. In this embodiment, on the basis of the embodiment of fig. 2, after the feature map is obtained, the curved text portion in the feature map is localized by the relevant models, i.e., the boundary points of the curved text are detected and the corresponding text bounding box is obtained, so that the curved text is determined from the text bounding box. The process of detecting the boundary points of the curved text with the relevant models is described below with reference to a specific embodiment. As shown in fig. 3, the method includes:
s301, obtaining a curve text image to be identified, and performing feature extraction on the curve text image to obtain a feature map.
In this embodiment, feature extraction is performed with a shared network model: the curved text image is input into the shared network model, which extracts its features to obtain the required feature map. The feature map better reflects the characteristic parts of the curved text image, i.e., better highlights the curved text portion while reducing the influence of noise.
The shared network model includes a plurality of first network layers and a plurality of second network layers. The first network layers are connected in sequence; each first network layer receives the feature map output by the previous first network layer and downsamples (i.e., reduces the dimension of) it to obtain a first feature map of the preset size corresponding to that layer. The second network layers are also connected in sequence, and each second network layer is connected to its corresponding first network layer; each second network layer receives the first feature map output by its corresponding first network layer and the second feature map output by the previous second network layer, and upsamples (i.e., increases the dimension of) them to obtain a second feature map of the preset size corresponding to that layer. The second feature map output by the last second network layer is taken as the required feature map. As shown in fig. 4, network layers 1, 2, 3, and 4 are first network layers, and network layers 4, 5, 6, and 7 are second network layers. Network layers 1, 2, and 3 are connected in sequence, network layers 5, 6, and 7 are connected in sequence, and network layers 5, 6, and 7 are each connected to their corresponding first network layers; e.g., network layer 5 is connected to network layer 3. Each network layer outputs a feature map of its corresponding size; e.g., the feature map output by network layer 2 is one eighth the size of the original image. The feature map output by the last second network layer, i.e., network layer 7, is taken as the required feature map.
The first network layer may include a convolution layer or a pooling layer, the second network layer may include a reverse convolution layer or a pooling layer, and a user may set a structure of the network layer according to implementation requirements.
Optionally, the shared network model is constructed on the SegNet network framework. The shared network model can comprehensively process the low-level and high-level semantic feature maps, i.e., the first and second feature maps, so that the resulting feature map highlights the characteristic parts of the curved text image.
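As a rough illustration of the downsampling/upsampling structure described above (not the patent's actual network), the following NumPy sketch replaces learned convolutions with 2×2 max pooling and nearest-neighbour upsampling, and fuses each upsampled map with the skip feature map from the corresponding first network layer:

```python
import numpy as np

def max_pool2x2(fmap):
    """Downsample a (H, W) feature map by a factor of 2 with 2x2 max pooling."""
    h, w = fmap.shape
    return fmap[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample2x2(fmap):
    """Upsample a (H, W) feature map by a factor of 2 via nearest-neighbour repetition."""
    return fmap.repeat(2, axis=0).repeat(2, axis=1)

def shared_network(image, depth=3):
    """Encoder-decoder sketch: downsampling 'first' network layers feed skip
    connections into upsampling 'second' network layers, SegNet/U-Net style."""
    skips = []
    x = image
    for _ in range(depth):          # first network layers: downsample
        skips.append(x)
        x = max_pool2x2(x)
    for skip in reversed(skips):    # second network layers: upsample + fuse
        x = upsample2x2(x)
        x = 0.5 * (x + skip)        # fuse low-level and high-level features
    return x

feat = shared_network(np.random.rand(32, 64))
print(feat.shape)  # (32, 64): final feature map matches the input resolution
```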
S302, performing text box detection on the feature map by using a rectangular bounding box detection model to obtain a text rectangular bounding box in the feature map, where the text rectangular bounding box contains curved text and the rectangular bounding box detection model is used to extract text rectangular bounding boxes in an image.
S303, performing boundary point detection on the text rectangular bounding box by using a boundary point detection model to obtain the boundary points of the curved text in the text rectangular bounding box, where the boundary point detection model is used to detect the boundary points of text.
In this embodiment, the rectangular bounding box detection model performs text box detection on the feature map: the feature map is input into the model, which extracts candidate regions and then regresses the center offset, width, height, and inclination angle of the target rectangular box, finally obtaining the multi-directional rectangular box information that encloses the text, i.e., the text rectangular bounding box in the feature map.
The rectangular bounding box detection model is in fact a multi-directional rectangular bounding box detection model that can extract the rectangular box enclosing the curved text in an image. The extraction process is conventional and is not repeated here.
In this embodiment, optionally, the rectangular bounding box detection model is a Region Proposal Network (RPN) model.
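For illustration, the regression targets mentioned above (center offset, width, height, inclination angle) can be decoded into an oriented box roughly as follows. This parametrisation is a common convention for rotated region-proposal networks and is an assumption here, not the patent's specified encoding:

```python
import math

def decode_rotated_box(anchor, deltas):
    """Decode regression outputs (center offsets, log-scale width/height,
    angle offset) relative to an anchor into an oriented rectangle
    (cx, cy, w, h, theta). Hypothetical encoding for illustration only."""
    ax, ay, aw, ah, atheta = anchor
    dx, dy, dw, dh, dtheta = deltas
    cx = ax + dx * aw               # center offset scaled by anchor size
    cy = ay + dy * ah
    w = aw * math.exp(dw)           # log-space size regression
    h = ah * math.exp(dh)
    theta = atheta + dtheta         # inclination angle of the text box
    return (cx, cy, w, h, theta)

box = decode_rotated_box((50.0, 40.0, 20.0, 10.0, 0.0),
                         (0.1, -0.2, math.log(1.5), 0.0, 0.3))
print(box)  # ≈ (52.0, 38.0, 30.0, 10.0, 0.3)
```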
In this embodiment, the obtained text rectangular bounding box still contains considerable background noise, i.e., many non-text portions. To improve the accuracy of subsequent text recognition, the text rectangular bounding box therefore needs to be further segmented (i.e., cropped) along the boundary of the curved text to obtain a text bounding box containing fewer non-text portions.
Specifically, when the rectangular text bounding box is further divided according to the boundary of the curved text, the rectangular text bounding box needs to be input into the boundary point detection model first, so that the boundary point detection model performs boundary point detection on the rectangular text bounding box to obtain the boundary points of the curved text in the rectangular text bounding box.
Optionally, the implementation manner of step S303 is: and detecting the boundary points of the rectangular text bounding box by adopting a boundary point detection model to obtain the offset distance between the boundary points and the equal division points of the long edges on the rectangular text bounding box. And obtaining the coordinates of the long-edge equally-dividing points, and determining the coordinates of the boundary points according to the coordinates of the long-edge equally-dividing points and the offset distance.
Specifically, the text rectangular bounding box is input into the boundary point detection model, which extracts the boundary points of the curved text in the box. Each boundary point corresponds to one long-edge equal-division point on the text rectangular bounding box; for each boundary point, the offset distance between the boundary point and its corresponding long-edge equal-division point is then determined and output.
And for each boundary point, obtaining the coordinates of the boundary point according to the coordinates of the long-edge equally-divided points corresponding to the boundary point and the offset distance corresponding to the boundary point based on a preset coordinate calculation formula.
Wherein the offset distance comprises an abscissa offset distance Δx and an ordinate offset distance Δy. The preset coordinate calculation formula is:

x′_b = x′_d + Δx·w₀,  y′_b = y′_d + Δy·h₀

where (x′_b, y′_b) are the coordinates of the boundary point, (x′_d, y′_d) are the coordinates of the long-edge equal-division point corresponding to the boundary point, w₀ is the width of the text rectangular bounding box (i.e., the length of its long side), and h₀ is the height of the text rectangular bounding box (i.e., the length of its short side).
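A minimal sketch of this coordinate calculation, assuming (as the role of w₀ and h₀ in the formula description suggests) that the predicted offsets are normalized by the box width and height:

```python
def boundary_point_coords(division_point, offset, box_w, box_h):
    """Recover a boundary point from its long-edge equal-division point.

    division_point: (x_d, y_d), coordinates of the equal-division point.
    offset: (dx, dy), the abscissa and ordinate offset distances
            predicted by the boundary point detection model, assumed to
            be normalized by the box width and height respectively.
    box_w, box_h: width w0 and height h0 of the text rectangular
            bounding box.
    """
    x_d, y_d = division_point
    dx, dy = offset
    # x'_b = x'_d + dx * w0, y'_b = y'_d + dy * h0
    return (x_d + dx * box_w, y_d + dy * box_h)
```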
Optionally, when determining the boundary point of the curve text, the boundary point may be determined by using a preset boundary point generating algorithm, where a code of the preset boundary point generating algorithm is as follows:
(The code of the preset boundary point generating algorithm is given as a figure in the original application and is not reproduced here.)
It should be noted that the long-edge equal-division points on the text rectangular bounding box are preset points, the number of which may be set according to actual requirements. For example, with eighth-division points, seven equal-division points exist on each long edge of the text rectangular bounding box.
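For illustration, the default long-edge equal-division points can be generated as below; `n_divisions=8` matches the eighth-division example, giving seven interior points per long edge. This is a sketch only, not the patent's actual boundary point generating algorithm, which is given only as a figure.

```python
def long_edge_division_points(p_start, p_end, n_divisions=8):
    """Return the interior equal-division points of one long edge.

    p_start, p_end: the two endpoints of a long edge of the text
    rectangular bounding box. With n_divisions = 8 there are 7 interior
    division points, as in the eighth-division example.
    """
    (x0, y0), (x1, y1) = p_start, p_end
    points = []
    for i in range(1, n_divisions):
        t = i / n_divisions  # fractional position along the edge
        points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return points
```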
In this embodiment, the rectangular bounding box detection model and the boundary point detection model are both trained network models, the rectangular bounding box detection model can accurately extract a text rectangular bounding box in an image, and the boundary point detection model can determine boundary points of a curve text in the text rectangular bounding box and offset distances between the boundary points and equal division points of a long edge of the text rectangular bounding box.
And S304, obtaining a text boundary box according to the boundary points and the feature map.
In this embodiment, the boundary points are mapped onto the feature map, and the portion enclosed by the boundary points on the feature map is extracted to obtain a text bounding box containing the curved text. The text bounding box contains less background noise than the rectangular bounding box, which ensures the accuracy of the subsequent text recognition.
In this embodiment, optionally, when the boundary points are mapped onto the feature map, the coordinates of the boundary points may be used for the mapping, so as to extract the portion enclosed by the boundary points on the feature map and obtain the text bounding box. The specific process is: determining the curve text boundary in the feature map based on the coordinates of the boundary points, and connecting the curve text boundaries to obtain the text bounding box.
Specifically, the position of the boundary point on the feature map is determined according to the coordinate of the boundary point, so that a curve text boundary is obtained, and the curve text boundary is composed of a plurality of boundary points. The boundary points on the boundary of the curved text are sequentially connected to obtain a text boundary box surrounding the curved text, for example, the boundary points in fig. 5 are sequentially connected to obtain the text boundary box in fig. 6.
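The connection step amounts to ordering the boundary points into a closed polygon. A minimal sketch, assuming the points on the two long-edge boundaries are each ordered left to right:

```python
def build_text_polygon(top_points, bottom_points):
    """Connect the boundary points into a closed polygon around the text.

    top_points: boundary points along the upper curve boundary, ordered
                left to right.
    bottom_points: boundary points along the lower boundary, also ordered
                left to right; reversed here so the polygon is traversed
                in one consistent direction.
    """
    return list(top_points) + list(reversed(bottom_points))
```

The resulting vertex list corresponds to sequentially connecting the boundary points into the text bounding box, as in figs. 5 and 6.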
S305, recognizing the text boundary box to obtain a curve text in the text boundary box.
In this embodiment, after the text bounding box is obtained, that is, after the curve text portion is obtained, text recognition is performed on the curve text portion to obtain a curve text in the curve text portion, so as to implement automatic recognition of the curve text.
When the text bounding box is recognized, the text recognition model can be used for recognizing the text.
In addition, in the present embodiment, a multitask loss is adopted for training the network models in the end-to-end framework, i.e. the rectangular bounding box detection model, the boundary point detection model and the text recognition model. That is, loss_total = loss_Rect + loss_Bound + loss_REC, where loss_Rect represents the loss of the rectangular bounding box detection model, loss_Bound represents the loss of the boundary point detection model, and loss_REC represents the loss of the text recognition model.
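The multitask loss is simply the sum of the three model losses. The sketch below adds optional per-task weights defaulting to 1, a common generalization; the patent itself uses the plain sum.

```python
def total_loss(loss_rect, loss_bound, loss_rec,
               w_rect=1.0, w_bound=1.0, w_rec=1.0):
    """Multitask loss for the end-to-end framework.

    With the default weights this is exactly
    loss_total = loss_rect + loss_bound + loss_rec.
    """
    return w_rect * loss_rect + w_bound * loss_bound + w_rec * loss_rec
```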
In addition, when the network models in the end-to-end framework are trained, manually annotated training sample images are obtained and input into the end-to-end framework, so that each network model is trained with the training samples.
Fig. 7 is a schematic flow chart of a text recognition method according to an embodiment of the present invention. On the basis of any of the above embodiments, when recognizing a frame number in a frame number image, this embodiment first detects the frame number region in the frame number image to obtain the corresponding text bounding box, thereby accurately locating the frame number, and then recognizes the text bounding box to obtain the frame number in it, thereby automatically recognizing the frame number. The process of identifying the frame number in a text bounding box is described below in conjunction with a specific embodiment. As shown in fig. 7, the method includes:
S701, acquiring a frame number image to be identified, and performing feature extraction on the frame number image to obtain a feature map.
S702, processing the feature map by adopting a target network model to obtain a text boundary box in the feature map, wherein the target network model is used for detecting the text boundary in the image.
The implementation process of steps S701 to S702 in this embodiment is similar to the implementation process of steps S201 to S202 in the embodiment of fig. 2, and is not described again here.
And S703, recognizing the frame number of the text boundary box by adopting a text recognition model so as to obtain the frame number in the text boundary box.
In this embodiment, when the curve text image is a frame number image, the frame number in the frame number image needs to be acquired. After the text bounding box in the feature map corresponding to the frame number image is obtained, that is, after the frame number region in the feature map is obtained, the text bounding box contains the frame number. Text recognition is then performed on the text bounding box by using a trained text recognition model: the text bounding box is input into the text recognition model, which performs text recognition on it and determines the frame number in the text bounding box, realizing automatic recognition of the frame number.
The text recognition model is a trained Long Short-term memory (LSTM) network model, and the trained LSTM network model can accurately recognize the frame number.
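The patent does not say how the LSTM's per-frame outputs are turned into a character string; greedy CTC decoding, sketched below, is a common choice for such sequence recognizers and is shown purely as an assumption.

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Greedy CTC decoding: collapse consecutive repeats, drop blanks.

    frame_labels: the argmax label index for each LSTM output frame.
    blank: the index of the CTC blank symbol.
    """
    decoded = []
    prev = None
    for label in frame_labels:
        # A label is emitted only when it differs from the previous
        # frame's label and is not the blank symbol.
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded
```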
In addition, the training process of the text recognition model is similar to that of the existing neural network model, that is, a required training sample is obtained first, and then the basic neural network model is trained by using the training sample, so that the trained text recognition model is obtained, and the detailed description of the training process is omitted here.
In addition, in any embodiment, optionally, since the text recognition model recognizes horizontally arranged text with higher accuracy, in order to improve the accuracy with which it recognizes the curved text (that is, the frame number) in the text bounding box, it is determined, after the text bounding box is obtained, whether the curved text in the text bounding box is in a horizontal state. If not, which indicates that the curved text is arranged along a curve rather than horizontally (as shown in a in fig. 8), the curved text in the text bounding box is horizontally rectified so that it is in a horizontal state, that is, horizontally arranged (as shown in b in fig. 8). The text bounding box with the curved text in a horizontal line is then input into the text recognition model, so that the model recognizes the horizontally arranged text in the text bounding box.
After the curve text in the text boundary box is determined to be in the horizontal state, the curve text is indicated to be in horizontal arrangement, and the text boundary box with the curve text in the horizontal state is directly input into a text recognition model to realize accurate recognition of the text.
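One way to rectify is to "unroll" the curved boundary: each boundary point on the upper curve is mapped to an x position equal to its cumulative arc length, with the lower boundary mapped directly below it. The patent does not describe its rectification procedure, so this geometric sketch is an assumption.

```python
import math

def rectified_targets(top_points, height):
    """Compute target positions of the boundary points after horizontal
    rectification.

    top_points: boundary points along the upper curve boundary, ordered
                left to right.
    height: desired height of the rectified horizontal text strip.
    Returns (top_targets, bottom_targets): where each upper boundary
    point, and the point directly below it, should land.
    """
    xs = [0.0]
    for (x0, y0), (x1, y1) in zip(top_points, top_points[1:]):
        xs.append(xs[-1] + math.hypot(x1 - x0, y1 - y0))  # arc length
    top_targets = [(x, 0.0) for x in xs]
    bottom_targets = [(x, float(height)) for x in xs]
    return top_targets, bottom_targets
```

A thin-plate-spline or piecewise perspective warp fitted to these source/target point pairs would then produce the horizontally arranged text of fig. 8 b.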
In this embodiment, the end-to-end detection and recognition network framework detects and recognizes the frame number in the frame number image: the text bounding box in the frame number image is determined by using the relevant models, and since the text bounding box contains few non-frame-number parts, the frame number is accurately located, that is, accurately detected. The text bounding box is then recognized by the text recognition model to obtain the corresponding frame number, realizing automatic recognition of the frame number. Because the recognized text bounding box contains little background, the influence of the background on the frame number is reduced, ensuring the accuracy of the recognition.
Based on the same idea, an embodiment of the present specification further provides a device corresponding to the foregoing method, and fig. 9 is a schematic structural diagram of a text recognition device provided in an embodiment of the present invention. As shown in fig. 9, the text recognition apparatus 90 includes: an image acquisition module 901 and a processing module 902.
The image obtaining module 901 is configured to obtain a curve text image to be identified, and perform feature extraction on the curve text image to obtain a feature map.
And the processing module 902 is configured to process the feature map by using a target network model to obtain a text bounding box in the feature map, where the target network model is used to detect a text boundary in the image.
The processing module 902 is further configured to identify the text bounding box to obtain a curve text in the text bounding box.
In another embodiment of the present invention, based on the embodiment shown in fig. 9, the target network model includes a rectangular bounding box detection model and a boundary point detection model, and the processing module 902 is further configured to:
and detecting the text box of the feature graph by adopting a rectangular surrounding box detection model so as to obtain the text rectangular surrounding box in the feature graph, wherein the text rectangular surrounding box comprises a curve text, and the rectangular surrounding box detection model is used for extracting the text rectangular surrounding box in the image. And carrying out boundary point detection on the text rectangular bounding box by adopting a boundary point detection model to obtain boundary points of the curve text in the text rectangular bounding box, wherein the boundary point detection model is used for detecting the boundary points of the text. And obtaining a text boundary box according to the boundary points and the feature map.
In this embodiment, optionally, the image obtaining module 901 is further configured to:
and extracting the characteristics of the curve text image by adopting a shared network model to obtain a characteristic diagram, wherein a lower layer network layer and a higher layer network layer in the shared network model are connected.
In this embodiment, optionally, the processing module 902 is further configured to:
and detecting the boundary points of the rectangular text bounding box by adopting a boundary point detection model to obtain the offset distance between the boundary points and the equal division points of the long edges on the rectangular text bounding box. And obtaining the coordinates of the long-edge equally-dividing points, and determining the coordinates of the boundary points according to the coordinates of the long-edge equally-dividing points and the offset distance.
In this embodiment, optionally, the processing module 902 is further configured to:
and determining the curve text boundary in the feature map based on the coordinates of the boundary points. And connecting the curve text boundaries to obtain a text boundary box.
In another embodiment of the present invention, based on the embodiment shown in fig. 9, the curved text image includes a frame number image, and the processing module 902 is further configured to:
and recognizing the frame number of the text boundary box by adopting a text recognition model so as to obtain the frame number in the text boundary box.
In addition, optionally, the processing module 902 is further configured to:
it is determined whether the curved text in the text bounding box is in a horizontal state. And if the text bounding box is not in a horizontal state, horizontally correcting the text bounding box.
The device provided in this embodiment may be used to implement the technical solution of the above method embodiment, and the implementation principle and technical effect are similar, which are not described herein again.
Fig. 10 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present invention. As shown in fig. 10, the electronic device 100 of this embodiment includes: a processor 1001 and a memory 1002, wherein:
a memory 1002 for storing computer-executable instructions;
the processor 1001 is configured to execute the computer-executable instructions stored in the memory to implement the steps performed by the receiving device in the above embodiments. Reference may be made in particular to the description relating to the method embodiments described above.
Alternatively, the memory 1002 may be separate or integrated with the processor 1001.
When the memory 1002 is provided separately, the electronic device further includes a bus 1003 for connecting the memory 1002 and the processor 1001.
The embodiment of the present invention further provides a computer-readable storage medium, where computer-executable instructions are stored in the computer-readable storage medium, and when a processor executes the computer-executable instructions, the text recognition method as described above is implemented.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative, and for example, the division of the modules is only one logical division, and other divisions may be realized in practice, for example, a plurality of modules may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be in an electrical, mechanical or other form.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each module may exist alone physically, or two or more modules are integrated into one unit. The unit formed by the modules can be realized in a hardware form, and can also be realized in a form of hardware and a software functional unit.
The integrated module implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) or a processor (processor) to execute some steps of the methods according to the embodiments of the present application.
It should be understood that the Processor may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present invention may be embodied directly in a hardware processor, or in a combination of the hardware and software modules within the processor.
The memory may comprise a high-speed RAM memory, and may further comprise a non-volatile storage NVM, such as at least one disk memory, and may also be a usb disk, a removable hard disk, a read-only memory, a magnetic or optical disk, etc.
The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, the buses in the figures of the present application are not limited to only one bus or one type of bus.
The storage medium may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.
An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an application-specific integrated circuit (ASIC). Of course, the processor and the storage medium may also reside as discrete components in an electronic device or host device.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A text recognition method, comprising:
acquiring a curve text image to be identified, and extracting the characteristics of the curve text image to obtain a characteristic diagram;
processing the feature map by adopting a target network model to obtain a text boundary box in the feature map, wherein the target network model is used for detecting a text boundary in an image;
and identifying the text boundary box to obtain a curve text in the text boundary box.
2. The method of claim 1, wherein the extracting the features of the curved text image to obtain a feature map comprises:
and extracting the characteristics of the curve text image by adopting a shared network model to obtain the characteristic diagram, wherein a lower layer network layer in the shared network model is connected with a higher layer network layer.
3. The method according to claim 1, wherein the target network model includes a rectangular bounding box detection model and a boundary point detection model, and the processing the feature map to obtain the text bounding box in the feature map by using the target network model includes:
performing text box detection on the feature map by using the rectangular bounding box detection model to obtain a text rectangular bounding box in the feature map, wherein the text rectangular bounding box comprises a curve text, and the rectangular bounding box detection model is used for extracting the text rectangular bounding box in an image;
carrying out boundary point detection on the text rectangular bounding box by adopting the boundary point detection model to obtain boundary points of the curve text in the text rectangular bounding box, wherein the boundary point detection model is used for detecting the boundary points of the text;
and obtaining the text bounding box according to the boundary points and the feature map.
4. The method according to any one of claims 1 to 3, wherein if the curved text image includes a car frame number image, the recognizing the text bounding box to obtain the curved text in the text bounding box comprises:
and recognizing the frame number of the text boundary box by adopting a text recognition model so as to obtain the frame number in the text boundary box.
5. The method of claim 1, further comprising:
determining whether the curve text in the text bounding box is in a horizontal state;
and if not, horizontally correcting the text bounding box.
6. The method according to claim 3, wherein the performing boundary point detection on the rectangular bounding box of the text by using the boundary point detection model to obtain boundary points of the curved text in the rectangular bounding box of the text comprises:
detecting the boundary points of the rectangular text bounding box by adopting the boundary point detection model to obtain the offset distance between the boundary points and the equal division points of the long edges of the rectangular text bounding box;
and obtaining the coordinates of the long-edge equally-dividing points, and determining the coordinates of the boundary points according to the coordinates of the long-edge equally-dividing points and the offset distance.
7. The method according to claim 6, wherein the obtaining the text bounding box according to the boundary point and the feature map comprises:
determining a curve text boundary in the feature map based on the coordinates of the boundary points;
and connecting the curve text boundaries to obtain the text boundary box.
8. A text recognition apparatus, comprising:
the image acquisition module is used for acquiring a curve text image to be identified and extracting the characteristics of the curve text image to obtain a characteristic diagram;
the processing module is used for processing the feature map by adopting a target network model to obtain a text boundary box in the feature map, wherein the target network model is used for detecting a text boundary in an image;
the processing module is further configured to identify the text bounding box to obtain a curve text in the text bounding box.
9. An electronic device, comprising: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executing the computer-executable instructions stored by the memory causes the at least one processor to perform the text recognition method of any of claims 1-7.
10. A computer-readable storage medium having computer-executable instructions stored thereon which, when executed by a processor, implement the text recognition method of any one of claims 1 to 7.
CN202010752292.XA 2020-07-30 2020-07-30 Text recognition method and equipment Pending CN111753812A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010752292.XA CN111753812A (en) 2020-07-30 2020-07-30 Text recognition method and equipment


Publications (1)

Publication Number Publication Date
CN111753812A true CN111753812A (en) 2020-10-09

Family

ID=72712378

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010752292.XA Pending CN111753812A (en) 2020-07-30 2020-07-30 Text recognition method and equipment

Country Status (1)

Country Link
CN (1) CN111753812A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113673497A (en) * 2021-07-21 2021-11-19 浙江大华技术股份有限公司 Text detection method, terminal and computer readable storage medium thereof
CN114581652A (en) * 2020-12-01 2022-06-03 北京四维图新科技股份有限公司 Target object detection method and device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019192397A1 (en) * 2018-04-04 2019-10-10 华中科技大学 End-to-end recognition method for scene text in any shape
CN110751151A (en) * 2019-10-12 2020-02-04 上海眼控科技股份有限公司 Text character detection method and equipment for vehicle body image
CN110837835A (en) * 2019-10-29 2020-02-25 华中科技大学 End-to-end scene text identification method based on boundary point detection
CN110929665A (en) * 2019-11-29 2020-03-27 河海大学 Natural scene curve text detection method
WO2020097909A1 (en) * 2018-11-16 2020-05-22 北京比特大陆科技有限公司 Text detection method and apparatus, and storage medium


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
刘业鑫; 卜巍; 邬向前: "Natural Scene Text Detection Method Based on Text Center Line", Intelligent Computer and Applications (智能计算机与应用), no. 02, 1 February 2020 (2020-02-01) *
王建新; 王子亚; 田萱: "A Survey of Natural Scene Text Detection and Recognition Based on Deep Learning", Journal of Software (软件学报), no. 05, 15 May 2020 (2020-05-15) *


Similar Documents

Publication Publication Date Title
CN107545241B (en) Neural network model training and living body detection method, device and storage medium
US9171204B2 (en) Method of perspective correction for devanagari text
CN110287854B (en) Table extraction method and device, computer equipment and storage medium
CN111046859B (en) Character recognition method and device
CN111353501A (en) Book point-reading method and system based on deep learning
CN110647882A (en) Image correction method, device, equipment and storage medium
CN108717744B (en) Method and device for identifying seal serial number on financial document and terminal equipment
CN110443242B (en) Reading frame detection method, target recognition model training method and related device
CN111967286A (en) Method and device for identifying information bearing medium, computer equipment and medium
CN108734161B (en) Method, device and equipment for identifying prefix number area and storage medium
CN111753812A (en) Text recognition method and equipment
US11651604B2 (en) Word recognition method, apparatus and storage medium
CN112668580A (en) Text recognition method, text recognition device and terminal equipment
CN115631112A (en) Building contour correction method and device based on deep learning
CN109829383B (en) Palmprint recognition method, palmprint recognition device and computer equipment
CN114445843A (en) Card image character recognition method and device of fixed format
CN112560856B (en) License plate detection and identification method, device, equipment and storage medium
CN110909816B (en) Picture identification method and device
CN110969640A (en) Video image segmentation method, terminal device and computer-readable storage medium
CN112949649A (en) Text image identification method and device and computing equipment
WO2020244076A1 (en) Face recognition method and apparatus, and electronic device and storage medium
CN114495132A (en) Character recognition method, device, equipment and storage medium
CN114299509A (en) Method, device, equipment and medium for acquiring information
CN110751158B (en) Digital identification method, device and storage medium in therapeutic bed display
CN114120305A (en) Training method of text classification model, and recognition method and device of text content

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination