CN113033377A - Character position correction method, character position correction device, electronic equipment and storage medium - Google Patents

Character position correction method, character position correction device, electronic equipment and storage medium

Info

Publication number
CN113033377A
CN113033377A
Authority
CN
China
Prior art keywords
character
text line
recognition result
line
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110304878.4A
Other languages
Chinese (zh)
Inventor
蔡悦
张宇轩
庄妮
黄灿
王长虎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Youzhuju Network Technology Co Ltd
Original Assignee
Beijing Youzhuju Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Youzhuju Network Technology Co Ltd filed Critical Beijing Youzhuju Network Technology Co Ltd
Priority to CN202110304878.4A priority Critical patent/CN113033377A/en
Publication of CN113033377A publication Critical patent/CN113033377A/en
Priority to PCT/CN2022/080874 priority patent/WO2022194130A1/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40Document-oriented image-based pattern recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/22Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Character Discrimination (AREA)

Abstract

The present disclosure provides a character position correction method, apparatus, electronic device, and storage medium. By using the character positions from a single-character detection and recognition result to correct the character positions in a text line recognition result, the method achieves both strong character recognition performance and accurate single-character positions.

Description

Character position correction method, character position correction device, electronic equipment and storage medium
Technical Field
The embodiment of the disclosure relates to the technical field of character recognition, in particular to a character position correction method, a character position correction device, electronic equipment and a storage medium.
Background
Optical Character Recognition (OCR) is a technique for recognizing characters in an image. Current OCR operates at two primary granularities: the text line and the single character. Text line OCR is the mainstream approach; it can accurately recognize the characters appearing in a text line image, but it only roughly estimates the position of each individual character from its internal features.
Disclosure of Invention
The embodiment of the disclosure provides a character position correction method and device, electronic equipment and a storage medium.
In a first aspect, an embodiment of the present disclosure provides a character position correction method, including: acquiring a text line recognition result sequence and a single-character detection recognition result sequence corresponding to a text line image, wherein each text line recognition result comprises a first character and a first character bounding box position, and each single-character detection recognition result comprises a second character and a second character bounding box position; and executing the following position update operation for each single-character detection recognition result in the sequence: searching the text line recognition result sequence for text line recognition results whose first character is the same as the second character in the single-character detection recognition result; in response to finding at least one such text line recognition result, determining, among the found results, the text line recognition result closest to the single-character detection recognition result; and updating the first character bounding box position in the determined text line recognition result to the second character bounding box position in the single-character detection recognition result.
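The position update operation above can be sketched as follows. This is an illustrative reading of the claim, not the patent's implementation; the `RecogResult` type and the simplified one-dimensional box representation are assumptions.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class RecogResult:
    char: str     # the recognized character (first or second character)
    box: tuple    # bounding box position, simplified here to (x_left, x_right)

def center(box: tuple) -> float:
    return (box[0] + box[1]) / 2

def update_positions(line_results: List[RecogResult],
                     word_results: List[RecogResult]) -> None:
    """For each single-character detection result, search the text line
    results for the same character; among the matches, pick the one
    closest to the detection box and overwrite its (coarse) box with the
    (accurate) detection box."""
    for word in word_results:
        matches = [r for r in line_results if r.char == word.char]
        if not matches:
            continue  # no text line result with the same character: skip
        closest = min(matches,
                      key=lambda r: abs(center(r.box) - center(word.box)))
        closest.box = word.box
```

Matching on the character first and then on distance keeps a repeated character (e.g. two occurrences of the same letter in one line) from being corrected with the wrong detection box.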
In some alternative embodiments, the first character bounding box position comprises an in-line start position and an in-line end position, and the method further comprises:
calculating an in-line character gap average for the text line recognition result sequence, i.e. the average distance between the in-line end position of the preceding result and the in-line start position of the following result, taken over all adjacent pairs of text line recognition results whose first character bounding box positions have both been updated;
and for each text line recognition result whose in-line start and end positions were not updated, updating those positions according to the gap average, such that the distance between the updated in-line start position and the in-line end position of the preceding result equals the gap average, and/or the distance between the updated in-line end position and the in-line start position of the following result equals the gap average.
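A minimal sketch of this gap-average fill, under the assumption that in-line positions are scalar start/end coordinates and that `updated` marks which results received a corrected box:

```python
def fill_by_gap_average(results, updated):
    """results: per-character [in-line start, in-line end] positions;
    updated: parallel booleans marking results whose boxes were corrected.
    Computes the mean gap between adjacent corrected results, then moves
    each uncorrected result to sit at that gap from its neighbours."""
    gaps = [results[i + 1][0] - results[i][1]
            for i in range(len(results) - 1)
            if updated[i] and updated[i + 1]]
    if not gaps:
        return  # no adjacent corrected pair: nothing to average
    mean_gap = sum(gaps) / len(gaps)
    for i, done in enumerate(updated):
        if done:
            continue
        if i > 0:                      # start = previous end + mean gap
            results[i][0] = results[i - 1][1] + mean_gap
        if i + 1 < len(results):       # end = next start - mean gap
            results[i][1] = results[i + 1][0] - mean_gap
```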
In some alternative embodiments, the first character bounding box position comprises a line start position and a line end position, and the method further comprises:
determining the minimum of the line start positions and the maximum of the line end positions, over the text line recognition results whose line start and end positions have been updated, as the text line start position and text line end position, respectively;
and for each text line recognition result whose line start and end positions were not updated, updating those positions with the text line start position and text line end position, respectively.
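These two steps (min/max over corrected results, then copying onto uncorrected ones) can be sketched as follows; the flat `starts`/`ends` lists are an assumed representation:

```python
def fill_line_boundaries(starts, ends, updated):
    """starts/ends: per-character line start and end positions (e.g. the
    vertical extent of each box); updated: which results were corrected.
    Takes the minimum start and maximum end over corrected results as the
    text line's boundaries and copies them onto uncorrected results."""
    corrected = [i for i, u in enumerate(updated) if u]
    if not corrected:
        return
    line_start = min(starts[i] for i in corrected)
    line_end = max(ends[i] for i in corrected)
    for i, u in enumerate(updated):
        if not u:
            starts[i] = line_start
            ends[i] = line_end
```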
In some alternative embodiments, the text line recognition result sequence corresponding to the text line image is obtained by:
inputting the text line image into a pre-trained text line recognition model to obtain the text line recognition result sequence, wherein the text line recognition model characterizes the correspondence between an image to be recognized and a text line recognition result sequence.
In some alternative embodiments, the text line recognition model includes, arranged in sequence, a convolutional neural network, a recurrent neural network, and a connectionist temporal classification (CTC) layer.
In some alternative embodiments, the text line recognition model includes, arranged in sequence, a convolutional neural network and an attention-based recurrent neural network.
In some alternative embodiments, the single-character detection recognition result sequence corresponding to the text line image is obtained by:
performing single-character detection on the text line image with a target detection algorithm to obtain at least one character bounding box position;
cropping a character image from the text line image at each detected character bounding box position, and inputting each cropped character image into a single-character recognition model to obtain the corresponding character recognition result;
and for each detected character bounding box, generating a single-character detection recognition result from the character recognition result and the bounding box position, and generating the single-character detection recognition result sequence by ordering the generated results according to the positions of their character bounding boxes in the text line image.
In a second aspect, an embodiment of the present disclosure provides a character position correction apparatus, including: an acquisition unit configured to acquire a text line recognition result sequence and a single-character detection recognition result sequence corresponding to a text line image, wherein each text line recognition result comprises a first character and a first character bounding box position, and each single-character detection recognition result comprises a second character and a second character bounding box position; and a first updating unit configured to perform the following position update operation for each single-character detection recognition result in the sequence: searching the text line recognition result sequence for text line recognition results whose first character is the same as the second character in the single-character detection recognition result; in response to finding at least one such text line recognition result, determining, among the found results, the text line recognition result closest to the single-character detection recognition result; and updating the first character bounding box position in the determined text line recognition result to the second character bounding box position in the single-character detection recognition result.
In some alternative embodiments, the first character bounding box position comprises an in-line start position and an in-line end position, and the apparatus further comprises:
an average value calculation unit configured to calculate an in-line character gap average for the text line recognition result sequence, i.e. the average distance between the in-line end position of the preceding result and the in-line start position of the following result, taken over all adjacent pairs of text line recognition results whose first character bounding box positions have both been updated;
and a second updating unit configured to, for each text line recognition result whose in-line start and end positions were not updated, update those positions according to the gap average, such that the distance between the updated in-line start position and the in-line end position of the preceding result equals the gap average, and/or the distance between the updated in-line end position and the in-line start position of the following result equals the gap average.
In some alternative embodiments, the first character bounding box position comprises a line start position and a line end position, and the apparatus further comprises:
a determination unit configured to determine the minimum of the line start positions and the maximum of the line end positions, over the text line recognition results whose line start and end positions have been updated, as the text line start position and text line end position, respectively;
and a third updating unit configured to, for each text line recognition result whose line start and end positions were not updated, update those positions with the text line start position and text line end position, respectively.
In some alternative embodiments, the text line recognition result sequence corresponding to the text line image is obtained by:
inputting the text line image into a pre-trained text line recognition model to obtain the text line recognition result sequence, wherein the text line recognition model characterizes the correspondence between an image to be recognized and a text line recognition result sequence.
In some alternative embodiments, the text line recognition model includes, arranged in sequence, a convolutional neural network, a recurrent neural network, and a connectionist temporal classification (CTC) layer.
In some alternative embodiments, the text line recognition model includes, arranged in sequence, a convolutional neural network and an attention-based recurrent neural network.
In some alternative embodiments, the single-character detection recognition result sequence corresponding to the text line image is obtained by:
performing single-character detection on the text line image with a target detection algorithm to obtain at least one character bounding box position;
cropping a character image from the text line image at each detected character bounding box position, and inputting each cropped character image into a single-character recognition model to obtain the corresponding character recognition result;
and for each detected character bounding box, generating a single-character detection recognition result from the character recognition result and the bounding box position, and generating the single-character detection recognition result sequence by ordering the generated results according to the positions of their character bounding boxes in the text line image.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: one or more processors; a storage device, on which one or more programs are stored, which, when executed by the one or more processors, cause the one or more processors to implement the method as described in any implementation manner of the first aspect.
In a fourth aspect, embodiments of the present disclosure provide a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by one or more processors, implements the method as described in any of the implementations of the first aspect.
Text line OCR can accurately recognize the characters appearing in a text line image, but it only roughly estimates each single character's position. For example, CRNN (Convolutional Recurrent Neural Network) infers character positions primarily by back-inference from CTC (Connectionist Temporal Classification) alignments, while Transformer-based models infer them primarily from the attention mechanism. The single-character positions obtained in this way have low accuracy and cannot be applied in scenarios with strict requirements on character position. For example, when two documents (e.g., contract documents) are compared for differences and aligned, high single-character position accuracy is desired.
To achieve both character recognition performance and single-character position accuracy, the applicant found through practice that character detection and recognition at single-character granularity yields low recognition accuracy but high position accuracy. Therefore, the character position correction method, apparatus, electronic device, and storage medium provided in the embodiments of the present disclosure correct the single-character positions in the text line recognition result using the single-character positions in the detection recognition result, thereby achieving both. The specific implementation is as follows: first, a text line recognition result sequence and a single-character detection recognition result sequence corresponding to a text line image are acquired, wherein each text line recognition result comprises a first character and a first character bounding box position, and each single-character detection recognition result comprises a second character and a second character bounding box position; then, for each single-character detection recognition result in the sequence, a position update operation is executed: the text line recognition result sequence is searched for text line recognition results whose first character is the same as the second character in the single-character detection recognition result; in response to finding at least one such result, the text line recognition result closest to the single-character detection recognition result is determined among them; and the first character bounding box position in the determined text line recognition result is updated to the second character bounding box position in the single-character detection recognition result. Updating the character bounding box positions in the text line recognition result in this way improves the accuracy of the character positions in the text line recognition result.
Drawings
Other features, objects, and advantages of the present disclosure will become apparent from the following detailed description of non-limiting embodiments, read in conjunction with the accompanying drawings. The drawings are only for the purpose of illustrating particular embodiments and are not to be construed as limiting the invention. In the drawings:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the character position correction method of the present disclosure may be applied;
FIG. 2 is a flow diagram of one embodiment of a character position correction method according to the present disclosure;
FIGS. 3A-3C are schematic diagrams of an application scenario of the character position correction method according to the present disclosure;
FIG. 4 is a flow chart diagram of yet another embodiment of a character position correction method according to the present disclosure;
FIG. 5 is a schematic structural diagram of one embodiment of a character position correction apparatus according to the present disclosure;
FIG. 6 is a schematic block diagram of a computer system suitable for use with an electronic device implementing embodiments of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the character position correction method, apparatus, electronic device, and storage medium of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various communication client applications, such as a character recognition application, a text processing application, a voice recognition application, a short video social application, a web conference application, a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like, may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, and 103 may be hardware or software. When they are hardware, they may be various electronic devices having a video capture device (e.g., a camera) and a display screen, including but not limited to a smart phone, a tablet computer, an e-book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop computer, a desktop computer, and the like. When the terminal devices 101, 102, 103 are software, they may be installed in the electronic devices listed above and implemented either as multiple pieces of software or software modules (for example, for providing a character position correction service) or as a single piece of software or software module. No specific limitation is made here.
In some cases, the character position correction method provided by the present disclosure may be executed by the terminal devices 101, 102, 103, and accordingly, the character position correction means may be provided in the terminal devices 101, 102, 103. In this case, the system architecture 100 may not include the server 105.
In some cases, the character position correction method provided by the present disclosure may be performed jointly by the terminal devices 101, 102, 103 and the server 105. For example, the step of "acquiring a text line recognition result sequence and a single-character detection recognition result sequence corresponding to a text line image" may be performed by the terminal devices 101, 102, 103, while the step of "performing the position update operation for each single-character detection recognition result in the sequence" may be performed by the server 105; the present disclosure is not limited thereto. Accordingly, parts of the character position correction apparatus may be provided in the terminal devices 101, 102, 103 and in the server 105, respectively.
In some cases, the character position correction method provided by the present disclosure may be executed by the server 105, and accordingly, the character position correction apparatus may also be disposed in the server 105, and in this case, the system architecture 100 may not include the terminal devices 101, 102, and 103.
The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or may be implemented as a single server. When the server 105 is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a character position correction method according to the present disclosure is shown, the character position correction method comprising the steps of:
step 201, obtaining a text line recognition result sequence and a single character detection recognition result sequence corresponding to the text line image.
In this embodiment, the execution subject of the character position correction method (for example, the terminal devices 101, 102, 103 shown in FIG. 1) may acquire the text line recognition result sequence and the single-character detection recognition result sequence corresponding to the text line image locally, or remotely from another electronic device (for example, the server 105 shown in FIG. 1) connected to the execution subject over a network.
Here, the text line image may be an image containing a text line object. The characters in the text line object may or may not be of the same size, and may belong to a single language or to more than one language; the present disclosure is not limited in this respect.
The text line recognition result sequence corresponding to the text line image can be obtained by recognizing the text line image with text line OCR. It is a sequence of text line recognition results, each comprising a first character and a first character bounding box position, where the first character bounding box position characterizes the range of positions in the text line image occupied by the first character. The results in the sequence are arranged in the order in which their first characters appear in the text line object. In practice, the circumscribed rectangle of a character in the text line image is typically used as its bounding box; accordingly, the first character bounding box position may represent that rectangle in various ways. For example, it may comprise the four vertex coordinates of the circumscribed rectangle, or the coordinates of its top-left vertex together with its width and height.
The single-character detection recognition result sequence corresponding to the text line image can be obtained by recognizing the text line image with OCR based on single-character detection and recognition. It is a sequence of single-character detection recognition results, each comprising a second character and a second character bounding box position, where the second character bounding box position characterizes the range of positions in the text line image occupied by the second character. The results in the sequence are arranged in the order in which their second characters appear in the text line object. Similar to the first character bounding box, the second character bounding box may also be represented as the circumscribed rectangle of the second character in the text line image in various ways, which are not repeated here.
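The two circumscribed-rectangle representations mentioned above (four vertex coordinates vs. top-left vertex plus side lengths) are interchangeable; a small illustrative conversion:

```python
def corners_to_xywh(corners):
    """Convert the four (x, y) vertex coordinates of an axis-aligned
    circumscribed rectangle to (top-left x, top-left y, width, height)."""
    xs = [p[0] for p in corners]
    ys = [p[1] for p in corners]
    x, y = min(xs), min(ys)
    return (x, y, max(xs) - x, max(ys) - y)
```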
In some alternative embodiments, the text line recognition result sequence corresponding to the text line image may be obtained by:
and inputting the text line image into a pre-trained text line recognition model to obtain a text line recognition result sequence corresponding to the text line image. Here, the text line recognition model may be used to characterize a correspondence between an image to be recognized and a sequence of text line recognition results. For example, the text line recognition model may be derived from training a machine learning model based on a large number of training samples. The training sample may include a sample text line image and corresponding annotation information, and the annotation information may include a character and a corresponding character position of each character in the sample text line image.
Alternatively, the text line recognition model here may include a Convolutional Neural Network (CNN), a Recurrent Neural Network (RNN), and a Connectionist Temporal Classification (CTC) layer arranged in sequence. Specifically, the text line image may be input into the CNN to obtain a feature image, the feature image may be input into the RNN (specifically, a deep bidirectional LSTM network) to further extract character sequence features on the basis of the convolutional features, and the character sequence features may then be decoded by the CTC layer to obtain the text line recognition result sequence.
Optionally, the text line recognition model here may also include a convolutional neural network and an attention-based recurrent neural network arranged in sequence.
In some alternative embodiments, the sequence of single-word detection recognition results corresponding to the text line image may be obtained by:
First, single-character detection is performed on the text line image by using a target detection algorithm to obtain at least one character bounding box position.
Second, character images are cropped from the text line image according to the detected character bounding box positions, and each cropped character image is input into a single-character recognition model to obtain the corresponding character recognition result. For example, the single-character recognition model may be obtained by training a machine learning model based on a large number of single-character training samples. A single-character training sample may include a sample single-character image and the corresponding character.
Finally, for each detected character bounding box, a single character detection recognition result is generated from the character recognition result corresponding to that character bounding box and the character bounding box position, and the generated single character detection recognition results are arranged, in the order in which their character bounding box positions appear in the text line image, into the single character detection recognition result sequence.
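Under assumed interfaces, the three steps above can be sketched as follows. The `detect_boxes` and `recognize_crop` callables stand in for a real target detection algorithm and a trained single-character recognition model, and the dict layout of each result is an illustrative choice, not part of this disclosure.

```python
def build_char_results(image, detect_boxes, recognize_crop):
    """Sketch of the single-character pipeline: detect -> recognize -> arrange in order."""
    boxes = detect_boxes(image)  # each box assumed (start, top, end, bottom) in image coordinates
    # one single character detection recognition result per detected bounding box
    results = [{"char": recognize_crop(image, box), "box": box} for box in boxes]
    # arrange in the order the characters appear along the text line,
    # here assumed horizontal: sort by the box's in-line start coordinate
    results.sort(key=lambda r: r["box"][0])
    return results
```

A usage example would pass a detector returning boxes in arbitrary order; the sort restores reading order.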
In step 202, a position updating operation is performed on each single character detection recognition result in the single character detection recognition result sequence.
In this embodiment, the execution body may execute the position updating operation on each single character detection recognition result in the sequence of single character detection recognition results acquired in step 201. The position updating operation may specifically include the following sub-steps 2021 to 2023:
substep 2021, searching the text line recognition result sequence for the text line recognition result with the first character being the same as the second character in the single character detection recognition result.
Substep 2022, in response to finding at least one text line recognition result, determining the text line recognition result closest to the single character detection recognition result among the found text line recognition results.
Here, the distance between the text line recognition result and the single word detection recognition result may be determined in various implementations.
For example, the difference between the position of the text line recognition result in the text line recognition result sequence and the position of the single character detection recognition result in the single character detection recognition result sequence may be used as the distance between the text line recognition result and the single character detection recognition result.
For another example, the distance between the first character bounding box position in the text line recognition result and the second character bounding box position in the single-character detection recognition result may be used as the distance between the text line recognition result and the single-character detection recognition result.
Substep 2023, updating the first character bounding box position in the determined text line recognition result to the second character bounding box position in the single character detection recognition result.
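Sub-steps 2021 to 2023 can be sketched as below. The dict-based result records, the `updated` flag, and the use of the sequence-index difference as the distance measure are illustrative assumptions (the bounding-box distance described above would work equally), not the only implementation of the method.

```python
def update_positions(line_results, char_results):
    """line_results: text line recognition results, each {"char": str, "box": ...}.
    char_results: single character detection recognition results, same layout.
    Copies the accurate detection boxes onto line results with matching characters."""
    for j, det in enumerate(char_results):  # one position updating operation per result
        # sub-step 2021: find text line results whose first character equals the second character
        candidates = [i for i, r in enumerate(line_results) if r["char"] == det["char"]]
        if not candidates:  # nothing found: sub-steps 2022 and 2023 are skipped
            continue
        # sub-step 2022: choose the nearest candidate, here by sequence-index difference
        nearest = min(candidates, key=lambda i: abs(i - j))
        # sub-step 2023: overwrite the first character bounding box position
        line_results[nearest]["box"] = det["box"]
        line_results[nearest]["updated"] = True
    return line_results
```

With a six-character line where the detector misrecognizes the first and fifth characters, only the four matching characters receive updated boxes, mirroring the fig. 3A example.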
The location update operation described above is described below with reference to specific examples:
As shown in fig. 3A, a text line recognition result sequence 302 and a single character detection recognition result sequence 303 corresponding to a text line image 301 are shown. It can be seen from the figure that the first characters in the text line recognition result sequence 302 are recognized correctly, but the first character bounding box positions are relatively rough. The second character bounding box positions in the single character detection recognition result sequence 303 are accurate, but some second characters are misrecognized: for example, the character corresponding to "present" is misrecognized as "head", and the character corresponding to "pre" is misrecognized as "hectare".
The step 202 executed based on the text line recognition result sequence 302 and the single character detection recognition result sequence 303 may be:
the position updating operation is performed for each individual character detection recognition result in each individual character detection recognition result sequence in the individual character detection recognition result sequence 303.
Here, when sub-step 2021 is performed for the second characters "head" and "hectare" in the single character detection recognition result sequence 303, no identical first character exists in the text line recognition result sequence 302, so sub-steps 2022 and 2023 are not performed.
For the first occurrence of the second character "day" in the single character detection recognition result sequence 303, when sub-step 2021 is executed, two text line recognition results whose first character is the same as "day" are found. When sub-step 2022 is executed, the text line recognition result corresponding to the first occurrence of the first character "day" is determined to be the one closest to this single character detection recognition result. Then, when sub-step 2023 is executed, the first character bounding box position in that text line recognition result is updated to the second character bounding box position corresponding to the first occurrence of the second character "day" in the single character detection recognition result sequence 303.
Similarly, the same process is performed for the second occurrence of the second character "day" in the single character detection recognition result sequence 303: the first character bounding box position in the text line recognition result corresponding to the second occurrence of the first character "day" in the text line recognition result sequence 302 is finally updated to the second character bounding box position corresponding to the second occurrence of the second character "day" in the single character detection recognition result sequence 303.
For the second characters "atmosphere" and "newspaper" in the single character detection recognition result sequence 303, when sub-step 2021 is executed, exactly one text line recognition result whose first character is identical to "atmosphere" or "newspaper", respectively, is found. Thus, when sub-step 2022 is executed, that text line recognition result can be directly determined as the closest one. Then, when sub-step 2023 is executed, the first character bounding box position in the text line recognition result whose first character is "atmosphere" or "newspaper" in the text line recognition result sequence 302 is updated to the second character bounding box position in the corresponding single character detection recognition result in the single character detection recognition result sequence 303.
After step 202, the text line recognition result sequence 302 is as shown in the lower part of fig. 3A. It can be seen that, except for the first character bounding box positions corresponding to the misrecognized characters "present" and "pre" in the single character detection recognition results, the other first character bounding box positions have been updated with the second character bounding box positions in the single character detection recognition results. Compared with the first character bounding box positions in the text line recognition result sequence before updating, the updated positions are more accurate, making the method more suitable for scenarios with higher requirements on character position.
According to the character position correction method provided by this embodiment of the disclosure, the character bounding box positions in the text line recognition results are corrected by using the character bounding box positions in the corresponding single character detection recognition results, which are more accurate than those in the text line recognition results, while the characters in the text line recognition results are more accurate than those in the single character detection recognition results. The accuracy of the character bounding box positions in the text line recognition result sequence is thereby improved, making the method more suitable for scenarios with higher requirements on character position.
With continued reference to FIG. 4, a flow 400 of yet another embodiment of a character position correction method according to the present disclosure is shown. The character position correction method comprises the following steps:
step 401, obtaining a text line recognition result sequence and a single character detection recognition result sequence corresponding to the text line image.
In step 402, a position updating operation is performed on each single character detection recognition result in the sequence of single character detection recognition results.
In this embodiment, the specific operations of step 401 and step 402 and the technical effects thereof are substantially the same as the operations and effects of step 201 and step 202 in the embodiment shown in fig. 2, and are not repeated herein.
In step 403, an intra-line character gap average value of the text line recognition result sequence is calculated.
In this embodiment, the first character bounding box position in each text line recognition result in the sequence of text line recognition results may include an in-line start position and an in-line end position. Here, the in-line start position and the in-line end position in the first character bounding box position are used to respectively represent the minimum coordinate value and the maximum coordinate value of the circumscribed rectangle of the first character in the text line image in the direction parallel to the text line.
For example, when the text line in the text line image is in the horizontal direction, the origin of coordinates of the text line image is the top left corner vertex of the text line image, and the characters in the text line are arranged from left to right in the horizontal direction. At this time, the in-line start position in the first character bounding box position corresponding to the first character may be an abscissa value of a vertex coordinate of an upper left corner or a lower left corner of the circumscribed rectangle of the first character, and the in-line end position may be an abscissa value of a vertex coordinate of an upper right corner or a lower right corner of the circumscribed rectangle of the first character.
For another example, when the text line in the text line image is in the vertical direction, the origin of coordinates of the text line image is the top left vertex of the text line image, and the characters in the text line are arranged from top to bottom in the vertical direction. At this time, the in-line starting position in the first character bounding box position corresponding to the first character may be an ordinate value of a vertex coordinate of an upper left corner or an upper right corner of the circumscribed rectangle of the first character, and the in-line ending position may be an ordinate value of a coordinate of a lower left corner or a lower right corner of the circumscribed rectangle of the first character.
Here, the text line in the text line image does not necessarily define a specific direction. For example, the text lines may be arranged horizontally from left to right, or vertically from top to bottom. The lines of text may also be arranged from top left to bottom right.
In this embodiment, the intra-line character gap average value of the text line recognition result sequence is the average value of the distances, over each pair of adjacent text line recognition results among those whose first character bounding box positions have been updated, between the in-line end position of the preceding text line recognition result and the in-line start position of the following text line recognition result.
Referring specifically to fig. 3B, the upper part 302 in fig. 3B shows an enlarged view of the first character bounding box positions in the text line recognition result sequence after the correction performed in step 202 in fig. 3A.
As can be seen from fig. 3B, the text line recognition results whose first character bounding box positions have been updated are those corresponding to the first characters "day" (twice), "atmosphere", and "newspaper". The character gap between the adjacent first "day" and second "day" is d1, the gap between the second "day" and "atmosphere" is d2, and the gap between "atmosphere" and "newspaper" is d3. The intra-line character gap average value of the text line recognition result sequence is then the average d0 of d1, d2, and d3, that is, d0 = (d1 + d2 + d3)/3.
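The computation in step 403 can be sketched as follows. Each record is assumed to carry in-line "start"/"end" coordinates and an "updated" flag; this layout is illustrative, not mandated by the method, and at least two updated results are assumed to exist.

```python
def mean_intraline_gap(line_results):
    """Average gap between consecutive results whose boxes were already updated."""
    updated = [r for r in line_results if r.get("updated")]
    # gap = next result's in-line start minus previous result's in-line end
    gaps = [nxt["start"] - prev["end"] for prev, nxt in zip(updated, updated[1:])]
    return sum(gaps) / len(gaps)
```

In the fig. 3B example this returns (d1 + d2 + d3)/3 for the three gaps between the four updated characters.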
In step 404, for each text line recognition result in the text line recognition result sequence whose in-line start position and in-line end position have not been updated, the in-line start position and the in-line end position in that text line recognition result are updated.
In this embodiment, for each text line recognition result in the text line recognition result sequence whose in-line start position and in-line end position have not been updated, the execution body may update its in-line start position and in-line end position according to the intra-line character gap average value calculated in step 403. Specifically, the distance between the updated in-line start position of the text line recognition result and the in-line end position of the preceding text line recognition result in the sequence is the calculated intra-line character gap average value, and/or the distance between the updated in-line end position of the text line recognition result and the in-line start position of the following text line recognition result in the sequence is the calculated intra-line character gap average value.
Here, the description continues with the text line recognition result sequence 302 shown in fig. 3B as an example. As can be seen from the description of the embodiment in fig. 2, after step 202 or step 402, the text line recognition results in the text line recognition result sequence 302 whose in-line start position and in-line end position have not been updated are those corresponding to the misrecognized second characters, namely "present" and "pre". In step 404, their in-line start positions and in-line end positions may be updated based on the intra-line character gap average value d0 calculated above: the distance between the updated in-line end position of "present" and the in-line start position of the following "day" may be d0; likewise, the distance between the updated in-line start position of "pre" and the in-line end position of the preceding "day" may be d0, and/or the distance between the updated in-line end position of "pre" and the in-line start position of "newspaper" may be d0. The updated text line recognition result sequence may be as shown in the lower part 302 of fig. 3B.
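Step 404 can be sketched as below, taking the average gap d0 as a parameter. The record layout and the handling of a missing neighbour (a result at either end of the line keeps its original coordinate on that side) are assumptions chosen for illustration; the description itself does not fix them.

```python
def fill_unupdated_inline(line_results, d0):
    """Re-derive in-line start/end of never-updated results from their neighbours."""
    for i, r in enumerate(line_results):
        if r.get("updated"):
            continue
        if i > 0:
            # distance to the preceding result's in-line end becomes d0
            r["start"] = line_results[i - 1]["end"] + d0
        if i + 1 < len(line_results):
            # distance to the following result's in-line start becomes d0
            r["end"] = line_results[i + 1]["start"] - d0
    return line_results
```

Note that if two never-updated results were adjacent, the left one would use a value written earlier in the same pass; in the fig. 3B example ("present" and "pre" are separated by updated characters) this case does not arise.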
As can be seen intuitively from fig. 3B, the first character bounding box positions in the text line recognition result sequence 302 shown in the lower part of fig. 3B are more accurate than those shown in the upper part; in particular, the gaps between characters in the text line are more uniform.
As is apparent from the above description, in the text line recognition result sequence, the in-line start positions and in-line end positions of the text line recognition results whose first character bounding box positions were not updated in step 402 are further corrected in steps 403 and 404, so that the updated text line recognition results conform more closely to the intra-line character gaps of the entire text line recognition result sequence. In practice, the characters in a text line tend to be evenly spaced, so the accuracy of the first character bounding box positions in the text line recognition result sequence is further improved through the operations of step 403 and step 404.
In some optional embodiments, the execution body may further execute the following steps 405 and 406 after executing step 402 or after executing step 404:
step 405, respectively determining the minimum value in the line start positions and the maximum value in the line end positions of the text line recognition results of which the line start positions and the line end positions are updated in the text line recognition result sequence as the text line start positions and the text line end positions.
Here, the first character bounding box position in each text line recognition result in the text line recognition result sequence may further include a line start position and a line end position.
Here, the line start position and the line end position in the first character bounding box position are used to represent the minimum coordinate value and the maximum coordinate value of the circumscribed rectangle of the first character in the text line image in the direction perpendicular to the text line, respectively.
For example, when the text line in the text line image is in the horizontal direction, the origin of coordinates of the text line image is the top left corner vertex of the text line image, and the characters in the text line are arranged from left to right in the horizontal direction. At this time, the line start position in the first character bounding box position corresponding to the first character may be a vertical coordinate value of a vertex coordinate of an upper left corner or an upper right corner of the circumscribed rectangle of the first character, and the line end position may be a vertical coordinate value of a vertex coordinate of a lower left corner or a lower right corner of the circumscribed rectangle of the first character.
For another example, when the text line in the text line image is in the vertical direction, the origin of coordinates of the text line image is the top left vertex of the text line image, and the characters in the text line are arranged from top to bottom in the vertical direction. At this time, the line start position in the first character bounding box position corresponding to the first character may be an abscissa value of a vertex coordinate of an upper left corner or a lower left corner of the circumscribed rectangle of the first character, and the line end position may be an abscissa value of a vertex coordinate of an upper right corner or a lower right corner of the circumscribed rectangle of the first character.
Here, the text line in the text line image does not necessarily define a specific direction. For example, the text lines may be arranged horizontally from left to right, or vertically from top to bottom. The lines of text may also be arranged from top left to bottom right.
Referring specifically to fig. 3C, the upper part 302 in fig. 3C shows the text line recognition result sequence 302 shown in the lower part of fig. 3B. The text line recognition results whose line start positions and line end positions have been updated in the upper text line recognition result sequence 302 in fig. 3C include those corresponding to the first characters "day", "atmosphere", and "newspaper". The minimum value among the line start positions of these text line recognition results is the line start position y1 corresponding to the first character "day", namely the ordinate corresponding to the upper side of the circumscribed rectangle of the first character "day"; the maximum value among the line end positions is the line end position y2 corresponding to the first character "newspaper", namely the ordinate corresponding to the lower side of the circumscribed rectangle of the first character "newspaper". Thus, y1 and y2 may be determined as the text line start position and the text line end position, respectively.
In step 406, for the text line recognition result whose line start position and line end position have not been updated in the text line recognition result sequence, the line start position and line end position in the text line recognition result are updated with the text line start position and the text line end position, respectively.
For example, for the text line recognition results in the upper text line recognition result sequence 302 in fig. 3C whose line start positions and line end positions have not been updated, namely those whose first characters are "present" and "pre", the line start position and the line end position in the first character bounding box position are updated to y1 and y2, respectively. The updated text line recognition result sequence 302 is shown in the lower part of fig. 3C.
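Steps 405 and 406 can be sketched as follows; the `line_start`/`line_end` keys and the `updated` flag are illustrative assumptions consistent with the earlier sketches, not a mandated data layout.

```python
def fill_unupdated_line_extent(line_results):
    """Copy the text line's overall line start/end onto never-updated results."""
    updated = [r for r in line_results if r.get("updated")]
    # step 405: text line start = min line start, text line end = max line end
    y1 = min(r["line_start"] for r in updated)
    y2 = max(r["line_end"] for r in updated)
    # step 406: results never updated take the whole line's extent
    for r in line_results:
        if not r.get("updated"):
            r["line_start"], r["line_end"] = y1, y2
    return line_results
```

In the fig. 3C example, the boxes of "present" and "pre" are stretched to the y1..y2 line-height range spanned by the updated characters.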
As can be seen intuitively from fig. 3C, the first character bounding box positions in the text line recognition result sequence 302 shown in the lower part of fig. 3C are more accurate than those shown in the upper part; in particular, viewed in terms of line height, the line height of each character is closer to the real situation.
As can be seen from the above description, the line start positions and line end positions of the text line recognition results that were not updated in the text line recognition result sequence are further corrected through steps 405 and 406, so that the updated text line recognition results conform more closely to the line height range of the entire text line recognition result sequence. In practice, the characters in a text line tend to lie within the same line height range, that is, between a common line start position and line end position, so the accuracy of the first character bounding box positions in the text line recognition result sequence is further improved through the operations of step 405 and step 406.
As can be seen from fig. 4, compared with the embodiment corresponding to fig. 2, the flow 400 of the character position correction method in this embodiment additionally calculates the intra-line character gap average value, uses it to update the in-line start positions and in-line end positions of the text line recognition results that were not updated, and optionally also updates the line start positions and line end positions of those text line recognition results. Therefore, the scheme described in this embodiment can further improve the accuracy of the first character bounding box positions in the text line recognition result sequence.
With further reference to fig. 5, as an implementation of the methods shown in the above-mentioned figures, the present disclosure provides an embodiment of a character position correction apparatus, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable to various electronic devices.
As shown in fig. 5, the character position correction apparatus 500 of the present embodiment includes: an acquisition unit 501 and a first updating unit 502. The acquisition unit 501 is configured to acquire a text line recognition result sequence and a single character detection recognition result sequence corresponding to a text line image, where each text line recognition result includes a first character and a first character bounding box position, and each single character detection recognition result includes a second character and a second character bounding box position. The first updating unit 502 is configured to perform the following position updating operation for each single character detection recognition result in the sequence of single character detection recognition results: searching the text line recognition result sequence for a text line recognition result whose first character is the same as the second character in the single character detection recognition result; in response to finding at least one text line recognition result, determining, among the found text line recognition results, the text line recognition result closest to the single character detection recognition result; and updating the first character bounding box position in the determined text line recognition result to the second character bounding box position in the single character detection recognition result.
In this embodiment, the specific processing of the obtaining unit 501 and the first updating unit 502 of the character position correcting apparatus 500 and the technical effects thereof can refer to the related descriptions of step 201 and step 202 in the corresponding embodiment of fig. 2, which are not repeated herein.
In some alternative embodiments, the first character bounding box location may include an in-line start location and an in-line end location; and the apparatus 500 may further comprise:
an average value calculating unit 503 configured to calculate an intra-line character gap average value of the text line recognition result sequence, where the intra-line character gap average value of the text line recognition result sequence is an average value of distances between an intra-line end position in a preceding text line recognition result and an intra-line start position in a following text line recognition result in two adjacent text line recognition results in the text line recognition results each updated by the first character bounding box position in the text line recognition result sequence;
a second updating unit 504 configured to, for each text line recognition result in the text line recognition result sequence whose in-line start position and in-line end position have not been updated, update the in-line start position and the in-line end position in that text line recognition result according to the intra-line character gap average value, wherein the distance between the updated in-line start position of the text line recognition result and the in-line end position of the preceding text line recognition result in the sequence is the intra-line character gap average value, and/or the distance between the updated in-line end position of the text line recognition result and the in-line start position of the following text line recognition result in the sequence is the intra-line character gap average value.
In some alternative embodiments, the first character bounding box location may include a line start location and a line end location; and the apparatus 500 may further comprise:
a determining unit 505 configured to determine a minimum value of line start positions and a maximum value of line end positions of the respective text line recognition results updated with the line start position and the line end position in the text line recognition result sequence as a text line start position and a text line end position, respectively;
a third updating unit 506, configured to update the line start position and the line end position in the text line recognition result with the text line start position and the text line end position, respectively, for the text line recognition result in which the line start position and the line end position are not updated in the text line recognition result sequence.
In some alternative embodiments, the text line recognition result sequence corresponding to the text line image may be obtained by:
and inputting the text line image into a pre-trained text line recognition model to obtain a text line recognition result sequence corresponding to the text line image, wherein the text line recognition model is used for representing the corresponding relation between the image to be recognized and the text line recognition result sequence.
In some alternative embodiments, the text line recognition model may include a convolutional neural network, a recurrent neural network, and a Connectionist Temporal Classification (CTC) layer arranged in sequence.
In some alternative embodiments, the text line recognition model may include a convolutional neural network and an attention-based recurrent neural network arranged in sequence.
In some alternative embodiments, the sequence of single-word detection recognition results corresponding to the text line image may be obtained by:
performing single-character detection on the text line image using an object detection algorithm to obtain the position of at least one character bounding box;
cropping a character image from the text line image according to the position of each detected character bounding box, and inputting each cropped character image into a single-character recognition model to obtain a corresponding character recognition result;
and, for each detected character bounding box, generating a single character detection recognition result from the character recognition result corresponding to that bounding box and the bounding box position, and arranging the generated single character detection recognition results into a sequence according to the order in which their bounding box positions occur in the text line image.
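As a hedged sketch of the last step (the box format and field names are illustrative assumptions, not from this disclosure), assembling the single character detection recognition result sequence in reading order might look like:

```python
def build_char_detection_sequence(boxes, chars):
    """Pair each detected character bounding box with its recognized
    character and sort by position, yielding the single character
    detection recognition result sequence.

    boxes: list of (x_min, y_min, x_max, y_max) tuples (illustrative format)
    chars: recognized character for each box, in the same order as `boxes`
    """
    results = [{"char": c, "box": b} for b, c in zip(boxes, chars)]
    # Order by the left edge of each box, i.e. reading order for a
    # horizontal text line (an assumption; a vertical line would sort by y).
    results.sort(key=lambda r: r["box"][0])
    return results
```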
It should be noted that details of implementation and technical effects of each unit in the character position correction device provided in the embodiments of the present disclosure may refer to descriptions of other embodiments in the present disclosure, and are not described herein again.
Referring now to FIG. 6, a block diagram of a computer system 600 suitable for implementing the electronic device of the present disclosure is shown. The computer system 600 shown in FIG. 6 is only an example and should not impose any limitation on the functionality or scope of use of embodiments of the present disclosure.
As shown in FIG. 6, the computer system 600 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 601 that may perform various appropriate actions and processes in accordance with a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data necessary for the operation of the computer system 600. The processing device 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, and the like; output devices 607 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, and the like; storage devices 608 including, for example, a magnetic tape, a hard disk, and the like; and a communication device 609. The communication device 609 may allow the computer system 600 to communicate with other devices, wirelessly or by wire, to exchange data. While FIG. 6 illustrates a computer system 600 having various devices, it should be understood that not all of the illustrated devices need be implemented or provided; more or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or may be installed from the storage means 608, or may be installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the character position correction method shown in the embodiment shown in fig. 2 and its alternative embodiments, and/or the character position correction method shown in the embodiment shown in fig. 4 and its alternative embodiments.
Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages or any combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. Here, the name of a unit does not constitute a limitation of the unit itself in some cases, and for example, the acquisition unit may also be described as a "unit that acquires a text line recognition result sequence and a single character detection recognition result sequence corresponding to a text line image".
The foregoing description is merely a description of preferred embodiments of the present disclosure and of the principles of the technology employed. Those skilled in the art will appreciate that the scope of the disclosure is not limited to technical solutions formed by the particular combinations of features described above, and also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the spirit of the disclosure, for example, technical solutions formed by substituting the above features with (but not limited to) features having similar functions disclosed in the present disclosure.

Claims (10)

1. A character position correction method comprising:
acquiring a text line recognition result sequence and a single character detection recognition result sequence corresponding to a text line image, wherein the text line recognition result comprises a first character and a first character bounding box position, and the single character detection recognition result comprises a second character and a second character bounding box position;
and performing the following position updating operation for each single character detection recognition result in the single character detection recognition result sequence: searching the text line recognition result sequence for text line recognition results whose first character is the same as the second character in the single character detection recognition result; in response to finding at least one text line recognition result, determining, among the found text line recognition results, the text line recognition result closest to the single character detection recognition result; and updating the first character bounding box position in the determined text line recognition result to the second character bounding box position in the single character detection recognition result.
2. The method of claim 1, wherein the first character bounding box position comprises an in-line start position and an in-line end position; and
the method further comprises the following steps:
calculating an intra-line character gap average of the text line recognition result sequence, wherein the intra-line character gap average is the average, over adjacent pairs of text line recognition results whose first character bounding box positions have been updated, of the distance between the in-line end position of the preceding text line recognition result and the in-line start position of the following text line recognition result;
and for each text line recognition result in the text line recognition result sequence whose in-line start position and in-line end position have not been updated, updating its in-line start position and in-line end position according to the intra-line character gap average, wherein the distance between the updated in-line start position of the text line recognition result and the in-line end position of the preceding text line recognition result in the sequence equals the intra-line character gap average, and/or the distance between the updated in-line end position of the text line recognition result and the in-line start position of the following text line recognition result in the sequence equals the intra-line character gap average.
3. The method of claim 1 or 2, wherein the first character bounding box position comprises a line start position and a line end position; and
the method further comprises the following steps:
determining, among the text line recognition results in the text line recognition result sequence whose line start positions and line end positions have been updated, the minimum of the line start positions and the maximum of the line end positions as the text line start position and the text line end position, respectively;
and, for each text line recognition result in the text line recognition result sequence whose line start position and line end position have not been updated, updating its line start position and line end position with the text line start position and the text line end position, respectively.
4. The method of claim 1, wherein the text line recognition result sequence corresponding to the text line image is obtained by:
and inputting the text line image into a pre-trained text line recognition model to obtain a text line recognition result sequence corresponding to the text line image, wherein the text line recognition model is used for representing the corresponding relation between the image to be recognized and the text line recognition result sequence.
5. The method of claim 4, wherein the text line recognition model comprises, arranged in sequence, a convolutional neural network, a recurrent neural network, and a Connectionist Temporal Classification (CTC) layer.
6. The method of claim 4, wherein the text line recognition model comprises, arranged in sequence, a convolutional neural network and an attention-based recurrent neural network.
7. The method of claim 1, wherein the single character detection recognition result sequence corresponding to the text line image is obtained by:
performing single-character detection on the text line image using an object detection algorithm to obtain the position of at least one character bounding box;
cropping a character image from the text line image according to the position of each detected character bounding box, and inputting each cropped character image into a single-character recognition model to obtain a corresponding character recognition result;
and, for each detected character bounding box, generating a single character detection recognition result from the character recognition result corresponding to that bounding box and the bounding box position, and arranging the generated single character detection recognition results into a sequence according to the order in which their bounding box positions occur in the text line image.
8. A character position correction apparatus comprising:
an acquisition unit configured to acquire a text line recognition result sequence and a single character detection recognition result sequence corresponding to a text line image, wherein the text line recognition result comprises a first character and a first character bounding box position, and the single character detection recognition result comprises a second character and a second character bounding box position;
a first updating unit configured to perform the following position updating operation for each single character detection recognition result in the single character detection recognition result sequence: searching the text line recognition result sequence for text line recognition results whose first character is the same as the second character in the single character detection recognition result; in response to finding at least one text line recognition result, determining, among the found text line recognition results, the text line recognition result closest to the single character detection recognition result; and updating the first character bounding box position in the determined text line recognition result to the second character bounding box position in the single character detection recognition result.
9. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method recited in any of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by one or more processors, implements the method of any one of claims 1-7.
CN202110304878.4A 2021-03-16 2021-03-16 Character position correction method, character position correction device, electronic equipment and storage medium Pending CN113033377A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110304878.4A CN113033377A (en) 2021-03-16 2021-03-16 Character position correction method, character position correction device, electronic equipment and storage medium
PCT/CN2022/080874 WO2022194130A1 (en) 2021-03-16 2022-03-15 Character position correction method and apparatus, electronic device and storage medium


Publications (1)

Publication Number Publication Date
CN113033377A true CN113033377A (en) 2021-06-25

Family

ID=76472557


Country Status (2)

Country Link
CN (1) CN113033377A (en)
WO (1) WO2022194130A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022194130A1 (en) * 2021-03-16 2022-09-22 北京有竹居网络技术有限公司 Character position correction method and apparatus, electronic device and storage medium

Citations (6)

Publication number Priority date Publication date Assignee Title
CN108549893A (en) * 2018-04-04 2018-09-18 华中科技大学 A kind of end-to-end recognition methods of the scene text of arbitrary shape
CN108885699A (en) * 2018-07-11 2018-11-23 深圳前海达闼云端智能科技有限公司 Character identifying method, device, storage medium and electronic equipment
CN109711412A (en) * 2018-12-27 2019-05-03 信雅达***工程股份有限公司 A kind of optical character identification error correction method based on dictionary
CN110866529A (en) * 2019-10-29 2020-03-06 腾讯科技(深圳)有限公司 Character recognition method, character recognition device, electronic equipment and storage medium
CN111680684A (en) * 2020-03-16 2020-09-18 广东技术师范大学 Method, device and storage medium for recognizing spine text based on deep learning
CN112396049A (en) * 2020-11-19 2021-02-23 平安普惠企业管理有限公司 Text error correction method and device, computer equipment and storage medium

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
CN110147786B (en) * 2019-04-11 2021-06-29 北京百度网讯科技有限公司 Method, apparatus, device, and medium for detecting text region in image
CN111860348A (en) * 2020-07-21 2020-10-30 国网山东省电力公司青岛供电公司 Deep learning-based weak supervision power drawing OCR recognition method
CN111950555B (en) * 2020-08-17 2024-02-09 北京字节跳动网络技术有限公司 Text recognition method and device, readable medium and electronic equipment
CN112232341B (en) * 2020-12-10 2021-04-09 北京易真学思教育科技有限公司 Text detection method, electronic device and computer readable medium
CN113033377A (en) * 2021-03-16 2021-06-25 北京有竹居网络技术有限公司 Character position correction method, character position correction device, electronic equipment and storage medium


Also Published As

Publication number Publication date
WO2022194130A1 (en) 2022-09-22


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination