CN111325195B - Text recognition method and device and electronic equipment

Info

Publication number: CN111325195B
Application number: CN202010097683.2A
Authority: CN (China)
Prior art keywords: text blocks, text, merging, adjacent lines, recognized
Legal status: Active
Other languages: Chinese (zh)
Other versions: CN111325195A
Inventor: 余红
Current Assignee: Alipay Hangzhou Information Technology Co Ltd
Original Assignee: Alipay Hangzhou Information Technology Co Ltd
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202410096251.8A (publication CN117912017A)
Priority to CN202010097683.2A (publication CN111325195B)
Publication of CN111325195A
Application granted; publication of CN111325195B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/22 - Image preprocessing by selection of a specific region containing or referencing a pattern; locating or processing of specific regions to guide the detection or recognition


Abstract

The embodiments of this specification disclose a text recognition method, a text recognition apparatus, and an electronic device. Character recognition is performed on an object to be recognized, and a text block is obtained from each recognized line of text. Block features are extracted for each text block, and it is determined whether the block features of two adjacent lines of text blocks reach a preset feature condition, where the preset feature condition is a condition, established from training samples, that the block features of two adjacent lines of text blocks satisfy when the two text blocks belong to the same text information. An operation is then performed on the two adjacent lines of text blocks according to the determination result, the operation being either merging them into one piece of text information or not merging them.

Description

Text recognition method and device and electronic equipment
Technical Field
Embodiments of the present disclosure relate to the field of computer technologies, and in particular, to a text recognition method, a text recognition device, and an electronic device.
Background
In daily life, it is often necessary to perform character recognition on a carrier such as a picture and then process the text based on the recognition result. How to recognize complete text information has therefore been a continuing topic of discussion in the industry.
Disclosure of Invention
In view of this, the embodiments of the present disclosure provide a text recognition method, apparatus and electronic device for accurately recognizing text information.
The embodiment of the specification adopts the following technical scheme:
the embodiment of the specification provides a text recognition method, which comprises the following steps:
performing character recognition on the object to be recognized, and respectively obtaining text blocks based on the recognized text of each row;
extracting block features of the text block;
judging whether the block characteristics of two adjacent lines of text blocks reach preset characteristic conditions or not, wherein the preset characteristic conditions are characteristic conditions which are established by using training samples and are met by the block characteristics of the two adjacent lines of text blocks when the two adjacent lines of text blocks belong to the same text information;
and executing operation on the text blocks in the two adjacent lines according to the judging result, wherein the operation comprises one of merging and non-merging.
The embodiment of the specification also provides a text recognition method, which comprises the following steps:
performing character recognition on the training sample, and respectively obtaining text blocks based on the recognized text of each row;
extracting block features of the text block;
and training a merging model by using the block characteristics of the text blocks to determine preset characteristic conditions in the merging model, so as to judge whether the block characteristics of two adjacent lines of text blocks reach the preset characteristic conditions when the block characteristics of two adjacent lines of text blocks in an object to be identified are identified, and executing operations on the two adjacent lines of text blocks according to a judging result, wherein the operations comprise one of merging and non-merging.
The embodiment of the specification also provides a text recognition method, which comprises the following steps:
performing character recognition on the object to be recognized, and respectively obtaining text blocks based on the recognized text of each row;
extracting block features of the text block;
processing block features of two adjacent lines of text blocks by using a merging model to obtain a judging result of whether the two adjacent lines of text blocks reach preset feature conditions, wherein the merging model is obtained by training the block features of the two adjacent lines of text blocks identified from training samples to determine the preset feature conditions;
and executing operation on the text blocks in the two adjacent lines according to the judging result, wherein the operation comprises one of merging and non-merging.
The embodiment of the specification also provides a text recognition device, which comprises:
the character recognition module is used for recognizing characters of the object to be recognized, and text blocks are respectively obtained based on the recognized characters of each row;
the extraction module is used for extracting block characteristics of the text block;
the judging module is used for judging whether the block characteristics of the adjacent two lines of text blocks reach preset characteristic conditions or not, wherein the preset characteristic conditions are characteristic conditions which are established by using training samples and are met by the block characteristics of the adjacent two lines of text blocks when the adjacent two lines of text blocks belong to the same text information;
And the execution module is used for executing operation on the text blocks in the two adjacent lines according to the judging result, wherein the operation comprises one of merging and non-merging.
The embodiment of the specification also provides a text recognition device, which comprises:
the character recognition module is used for recognizing characters of the training sample and respectively obtaining text blocks based on the recognized characters of each row;
the extraction module is used for extracting block characteristics of the text block;
the training module is used for training a merging model by utilizing the block characteristics of the text blocks so as to determine preset characteristic conditions in the merging model, so that when the block characteristics of two adjacent lines of text blocks in an object to be identified are identified, whether the block characteristics of the two adjacent lines of text blocks reach the preset characteristic conditions or not is judged, and an operation is carried out on the two adjacent lines of text blocks according to a judging result, wherein the operation comprises one of merging and non-merging.
The embodiment of the specification also provides a text recognition device, which comprises:
the character recognition module is used for recognizing characters of the object to be recognized, and text blocks are respectively obtained based on the recognized characters of each row;
the extraction module is used for extracting block characteristics of the text block;
the model processing module is used for processing block characteristics of two adjacent lines of text blocks by utilizing a merging model to obtain a judging result of whether the two adjacent lines of text blocks reach preset characteristic conditions, and the merging model is obtained by training the block characteristics of the two adjacent lines of text blocks identified from training samples to determine the preset characteristic conditions;
And the execution module is used for executing operation on the text blocks in the two adjacent lines according to the judging result, wherein the operation comprises one of merging and non-merging.
The embodiment of the specification also provides an electronic device, including:
a processor; and
a memory configured to store a computer program that, when executed, causes the processor to:
performing character recognition on the object to be recognized, and respectively obtaining text blocks based on the recognized text of each row;
extracting block features of the text block;
judging whether the block characteristics of two adjacent lines of text blocks reach preset characteristic conditions or not, wherein the preset characteristic conditions are characteristic conditions which are established by using training samples and are met by the block characteristics of the two adjacent lines of text blocks when the two adjacent lines of text blocks belong to the same text information;
and executing operation on the text blocks in the two adjacent lines according to the judging result, wherein the operation comprises one of merging and non-merging.
The embodiment of the specification also provides an electronic device, including:
a processor; and
a memory configured to store a computer program that, when executed, causes the processor to:
Performing character recognition on the training sample, and respectively obtaining text blocks based on the recognized text of each row;
extracting block features of the text block;
and training a merging model by using the block characteristics of the text blocks to determine preset characteristic conditions in the merging model, so as to judge whether the block characteristics of two adjacent lines of text blocks reach the preset characteristic conditions when the block characteristics of two adjacent lines of text blocks in an object to be identified are identified, and executing operations on the two adjacent lines of text blocks according to a judging result, wherein the operations comprise one of merging and non-merging.
The embodiment of the specification also provides an electronic device, including:
a processor; and
a memory configured to store a computer program that, when executed, causes the processor to:
performing character recognition on the object to be recognized, and respectively obtaining text blocks based on the recognized text of each row;
extracting block features of the text block;
processing block features of two adjacent lines of text blocks by using a merging model to obtain a judging result of whether the two adjacent lines of text blocks reach preset feature conditions, wherein the merging model is obtained by training the block features of the two adjacent lines of text blocks identified from training samples to determine the preset feature conditions;
And executing operation on the text blocks in the two adjacent lines according to the judging result, wherein the operation comprises one of merging and non-merging.
The at least one technical solution adopted by the embodiments of this specification can achieve the following beneficial effects:
The embodiments of this specification provide a scheme for automatically identifying whether text should be merged. In this scheme, character recognition is performed on each line of text in the object to be recognized, and a text block is obtained from each recognized line. Block features are extracted for each text block, and it is determined whether the block features of two adjacent lines of text blocks reach a preset feature condition, where the preset feature condition is a condition, established from training samples, that the block features of two adjacent lines of text blocks satisfy when the two text blocks belong to the same text information. An operation is then performed on the two adjacent lines of text blocks according to the determination result, the operation being either merging them into one piece of text information or not merging them.
Thus, by using the block features of text blocks together with the preset feature condition established from training samples, the method and apparatus can automatically identify whether two adjacent lines of text blocks belong to the same text information and decide, based on the determination result, whether to merge them. The scheme disclosed by the embodiments of this specification can improve text recognition efficiency.
Drawings
The accompanying drawings, which are included to provide a further understanding of embodiments of the present specification and are incorporated in and constitute a part of this specification, illustrate embodiments of the present specification and together with the description serve to explain the present application and do not constitute an undue limitation to the present application. In the drawings:
fig. 1 is a schematic diagram of a system architecture of a text recognition scheme according to an embodiment of the present disclosure;
FIG. 2 is a flowchart of a text recognition method according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of text blocks in a text recognition process according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of text blocks in a text recognition process according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of text blocks in a text recognition process according to an embodiment of the present disclosure;
Fig. 6 is a flowchart of an application example of a text recognition method according to an embodiment of the present disclosure;
FIG. 7 is a flowchart of a text recognition method according to an embodiment of the present disclosure;
FIG. 8 is a flowchart of a text recognition method according to an embodiment of the present disclosure;
fig. 9 is a flowchart of an application example of a text recognition method provided in an embodiment of the present disclosure;
Fig. 10 is a flowchart of an application example of a text recognition method provided in an embodiment of the present disclosure;
fig. 11 is a schematic structural diagram of a text recognition device according to an embodiment of the present disclosure;
fig. 12 is a schematic structural diagram of a text recognition device according to an embodiment of the present disclosure;
fig. 13 is a schematic structural diagram of a text recognition device according to an embodiment of the present disclosure.
Detailed Description
Analysis of the prior art shows that it provides only a character recognition technique, in particular character recognition based on optical character recognition; the recognized characters are then combined manually to form complete text information.
The embodiments of this specification provide a text recognition method, a text recognition apparatus, and an electronic device. In this scheme, character recognition is performed on each line of text in the object to be recognized, and a text block is obtained from each recognized line. Block features are extracted for each text block, and it is determined whether the block features of two adjacent lines of text blocks reach a preset feature condition, where the preset feature condition is a condition, established from training samples, that the block features of two adjacent lines of text blocks satisfy when the two text blocks belong to the same text information. An operation is then performed on the two adjacent lines of text blocks according to the determination result, the operation being either merging them into one piece of text information or not merging them.
The embodiments of this specification therefore provide a scheme for automatically identifying whether text should be merged. By using the block features of text blocks together with the preset feature condition established from training samples, the scheme can automatically identify whether two adjacent lines of text blocks belong to the same text information and decide, based on the determination result, whether to merge them. The scheme disclosed by the embodiments of this specification can improve text recognition efficiency.
To make the objects, technical solutions, and advantages of the present application clearer, the technical solutions of the present application will be described clearly and completely below with reference to specific embodiments of the present application and the corresponding drawings. It will be apparent that the described embodiments are only some, rather than all, of the embodiments of this specification. Based on the embodiments in this specification, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the scope of protection of the present application.
The following describes in detail the technical solutions provided by the embodiments of the present specification with reference to the accompanying drawings.
Fig. 1 is a schematic system architecture diagram of a text recognition scheme according to an embodiment of the present disclosure.
As shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The terminal devices 101, 102, 103 interact with the server 105 via the network 104 to receive or send messages or the like. Various client applications can be installed on the terminal devices 101, 102, 103. Such as browser-like applications, search-like applications, instant messaging-like tools, etc.
The terminal devices 101, 102, 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices including, but not limited to, smartphones, tablet computers, electronic book readers, laptop and desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they can be installed in the above-listed electronic devices. Which may be implemented as multiple software or software modules (e.g., multiple software or software modules for providing distributed services) or as a single software or software module. The present invention is not particularly limited herein.
The server 105 may be a server providing various services, such as a backend server testing client applications installed on the terminal devices 101, 102, 103. It should be noted that, the text recognition method provided by the embodiments of the present disclosure is generally performed by the server 105, and accordingly, the text recognition device is generally disposed in the server 105. At this time, the terminal apparatuses 101, 102, 103 and the network 104 may not exist.
It is also noted that the testing of the client applications installed on the terminal devices 101, 102, 103 may also be performed by the terminal devices 101, 102, 103. At this time, the text recognition method may be performed by the terminal apparatuses 101, 102, 103, and accordingly, the text recognition means may be provided in the terminal apparatuses 101, 102, 103. At this point, the exemplary system architecture 100 may not have the server 105 and network 104 present.
The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster formed by a plurality of servers, or as a single server. When server 105 is software, it may be implemented as multiple software or software modules (e.g., multiple software or software modules for providing distributed services), or as a single software or software module. The present invention is not particularly limited herein.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Fig. 2 is a flowchart of a text recognition method according to an embodiment of the present disclosure.
Step 201: and carrying out character recognition on the object to be recognized, and respectively obtaining text blocks based on the recognized text of each row.
In a specific application, text information can be presented in the object to be identified. The object to be identified described in the embodiment of the present disclosure may be an image, which may be obtained by scanning a carrier carrying text information, or may be obtained by digital synthesis, which is not particularly limited herein.
The characters in each line of text in the object to be recognized may be recognized, in particular, by optical character recognition (OCR). OCR refers to the process in which an electronic device (for example, a scanner or a digital camera) examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then translates the shapes into computer text using a character recognition method; in short, it recognizes the text in an image.
In the embodiment of the present specification, the object to be identified may include a plurality of lines of text. The rows herein are not limited to a particular direction, but rather represent a set of words arranged along a straight line. In the embodiment of the present disclosure, performing text recognition on each line of text in an object to be recognized, and obtaining text blocks based on the recognized text lines respectively may include:
Performing character recognition on the object to be recognized;
the character arrangement mode in the object to be identified is also identified, and each identified row of characters is obtained;
each identified line of text is marked as a text block.
The text block described in the embodiment of the present specification is a text unit, and is not limited to a text structure.
The embodiment of the specification carries out subsequent processing on the recognized characters by taking the text block as a unit.
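As a minimal illustration of this step, the sketch below wraps a generic OCR call and turns each recognized line into a text block carrying its bounding-box coordinates. The run_line_ocr function and its return format are assumptions standing in for whatever OCR engine is actually used; they are not part of the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextBlock:
    """One recognized line of text, treated as a single text unit."""
    text: str
    x: float       # left edge in the image coordinate space
    y: float       # top edge in the image coordinate space
    width: float
    height: float


def run_line_ocr(image_path: str) -> List[dict]:
    """Hypothetical OCR call: returns one dict per recognized line,
    e.g. {"text": "...", "x": 10, "y": 40, "width": 300, "height": 18}.
    Replace with the OCR engine that is actually available."""
    raise NotImplementedError("plug in a real OCR engine here")


def image_to_text_blocks(image_path: str) -> List[TextBlock]:
    """Mark each recognized line of text as one text block (step 201)."""
    lines = run_line_ocr(image_path)
    blocks = [TextBlock(l["text"], l["x"], l["y"], l["width"], l["height"])
              for l in lines]
    # Keep blocks in top-to-bottom reading order so "adjacent lines" are neighbors.
    blocks.sort(key=lambda b: b.y)
    return blocks
```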
Step 203: and extracting block characteristics of the text block.
In the embodiment of the present specification, the block feature is a feature possessed by the text block itself. The innovation of the automatic text block identification and merging scheme provided by the embodiment of the specification is that the text blocks are taken as units, the block characteristics of the text blocks are identified, and the block characteristics of two adjacent lines of text blocks are combined to judge whether the text blocks belong to the same text information.
In the embodiment of the present specification, the block feature may include one or both of a line height of each text block and a line spacing between two adjacent lines of the text blocks, which is not particularly limited herein.
In an embodiment of the present specification, extracting the block feature of the text block may include:
creating a coordinate space;
Identifying coordinate values of the text block in a coordinate space;
and determining the block characteristics of the text block based on the coordinate values.
Specifically, the line height of the text block, the line spacing between two adjacent lines of text blocks, or other dimensions may be calculated using the coordinate values, which are not particularly limited herein.
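Continuing the sketch above, block features such as line height and line spacing can be derived directly from the coordinate values of each text block. The formulas below (bounding-box height as line height, vertical gap between boxes as line spacing) are one plausible reading of the description, not the only possible one.

```python
def line_height(block: TextBlock) -> float:
    """Line height of a text block: the height of its bounding box."""
    return block.height


def line_spacing(upper: TextBlock, lower: TextBlock) -> float:
    """Line spacing between two vertically adjacent text blocks: the
    vertical gap between the bottom of the upper block and the top of
    the lower block (clamped at zero if the boxes touch or overlap)."""
    gap = lower.y - (upper.y + upper.height)
    return max(gap, 0.0)


def block_features(upper: TextBlock, lower: TextBlock) -> dict:
    """Block features for a pair of adjacent lines (step 203)."""
    return {
        "upper_height": line_height(upper),
        "lower_height": line_height(lower),
        "spacing": line_spacing(upper, lower),
    }
```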
Step 205: judging whether the block characteristics of two adjacent lines of text blocks reach preset characteristic conditions or not, wherein the preset characteristic conditions are characteristic conditions which are established by using training samples and are met by the block characteristics of the two adjacent lines of text blocks when the two adjacent lines of text blocks belong to the same text information.
In the embodiments of this specification, the training samples are analyzed or learned from to extract the preset feature condition, namely a threshold condition that the block features of two adjacent lines of text blocks satisfy when those two lines belong to the same text information. The preset feature condition can therefore serve as the condition for merging text blocks, which makes the automatic text block merging scheme proposed by the embodiments of this specification feasible.
The text information described in the embodiments of this specification refers to a paragraph or document, composed of text, that describes a given item. Two adjacent lines of text blocks belonging to the same text information means that the text they contain describes the same item and, in terms of form, can be combined into one paragraph or document.
Specifically, in one example of this specification, determining whether the block features of two adjacent lines of text blocks reach the preset feature condition includes:
determining whether the line height of the two adjacent lines of text blocks is not smaller than the line spacing.
In this case, the line height of the two adjacent lines of text blocks being not smaller than the line spacing is one example of the preset feature condition.
The underlying idea is that if the actual line height is not smaller than the line spacing, that is, the line height is close to or larger than the line spacing, the two adjacent lines of text blocks are close to each other and very likely belong to the same text information. Conversely, if the actual line height is smaller than the line spacing, that is, the line spacing is large and the two adjacent lines of text blocks are far apart, they very likely do not belong to the same text information.
In this example, the line heights of the two adjacent lines of text blocks may be close or equal.
Determining that the line height of the two adjacent lines of text blocks is not smaller than the line spacing may include:
determining whether the line height of the two adjacent lines of text blocks exceeds the line spacing by at least a preset difference.
In this case, the preset difference is part of the preset feature condition.
The underlying idea is that if the line height exceeds the line spacing by at least the preset difference, that is, the character height in the text block is much larger than the line spacing (large characters, small spacing), the two adjacent lines of text blocks very likely belong to the same text information. Conversely, if the line height does not exceed the line spacing, or exceeds it by less than the preset difference, that is, the character height is small relative to the line spacing (small characters, large spacing), the two adjacent lines of text blocks very likely do not belong to the same text information.
In another example of the present specification, determining whether the block features of two adjacent lines of the text blocks reach a preset feature condition includes:
if the block features of the text blocks include the line height, determining whether the difference between the line heights of the two adjacent lines of text blocks is not larger than a preset line height difference.
In this case, the preset line height difference is part of the preset feature condition.
The underlying idea is that if the line height difference is not larger than the preset line height difference, that is, the line heights of the two adjacent lines of text blocks are close or equal, the two adjacent lines of text blocks very likely belong to the same text information. If the line height difference is larger than the preset line height difference, that is, the line heights differ greatly, the two adjacent lines of text blocks very likely do not belong to the same text information.
The embodiments of the present specification propose that the specific examples described above may be used alone or in combination for determination, and are not limited thereto. In addition to the above two examples, other preset feature conditions and corresponding block features may be analyzed or identified based on the training samples, which are not specifically limited herein.
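For concreteness, the sketch below expresses the two example conditions as a single rule on the block features computed earlier. The specific thresholds (min_excess, max_height_diff) are illustrative placeholders for the preset difference and the preset line height difference; in the described scheme they would come from the training samples or the trained merging model rather than being hand-picked.

```python
def reaches_preset_condition(features: dict,
                             min_excess: float = 2.0,
                             max_height_diff: float = 3.0) -> bool:
    """Rule-based reading of step 205, combining the two examples:
    (1) the line height should exceed the line spacing by at least a
        preset difference, and
    (2) the line heights of the two adjacent lines should differ by no
        more than a preset line height difference.
    Threshold values here are illustrative placeholders only."""
    avg_height = (features["upper_height"] + features["lower_height"]) / 2.0
    height_diff = abs(features["upper_height"] - features["lower_height"])
    exceeds_spacing = (avg_height - features["spacing"]) >= min_excess
    similar_heights = height_diff <= max_height_diff
    return exceeds_spacing and similar_heights
```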
The determination result of step 205 may include two types, that is, the predetermined feature condition is reached or the predetermined feature condition is not reached.
Step 207: and executing operation on the two adjacent lines of text blocks according to the judging result, wherein the operation can comprise one of merging and non-merging.
For non-merging, the two lines of text blocks may be arranged according to their arrangement in the original object to be recognized, or may be kept separate according to a preset strategy.
If the determination result is merging, the two adjacent lines of text blocks can be merged into a paragraph or document. The text format after merging can be set arbitrarily and is not limited here.
In the embodiment of the present disclosure, after the text blocks are combined, subsequent text processing, such as text review and quality inspection, may be performed, which is not specifically limited herein.
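A possible end-to-end sketch of steps 201 through 207 is shown below, merging consecutive lines whose pairwise features reach the condition into one piece of text information. How per-pair decisions are folded into whole paragraphs is an assumption; the patent leaves this open.

```python
def merge_text_blocks(blocks: List[TextBlock]) -> List[str]:
    """Walk the blocks in reading order; whenever two adjacent lines
    reach the preset feature condition, append the lower line to the
    current piece of text information, otherwise start a new one."""
    if not blocks:
        return []
    pieces = [blocks[0].text]
    for upper, lower in zip(blocks, blocks[1:]):
        features = block_features(upper, lower)
        if reaches_preset_condition(features):
            pieces[-1] = pieces[-1] + lower.text   # merge into the same piece
        else:
            pieces.append(lower.text)              # start a new piece
    return pieces
```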
The embodiments of this specification provide a scheme for automatically identifying whether text should be merged. By using the block features of text blocks together with the preset feature condition established from training samples, the scheme can automatically identify whether two adjacent lines of text blocks belong to the same text information and decide, based on the determination result, whether to merge them. The scheme disclosed by the embodiments of this specification can improve text recognition efficiency.
As shown in FIG. 3, "x" denotes a character. FIG. 3 shows two vertically adjacent lines of text blocks whose line height exceeds the line spacing; in this case it can be determined that the two lines of text blocks belong to the same text information and can be merged together.
As shown in FIG. 4, the two vertically adjacent lines of text blocks have a relatively small line spacing; in this case it can be determined that the two text blocks belong to the same text information and can be merged together.
As shown in FIG. 5, the line height of the upper line of text blocks is larger than that of the lower line; it is determined that the two lines of text blocks do not belong to the same text information and do not need to be merged.
Fig. 6 is a flowchart of an application example of a text recognition method according to an embodiment of the present disclosure.
Step 602: and detecting service information issued by the user.
The execution subject of the method may be a service information platform, which receives various service information uploaded or published by users, that is, it detects the service information published by a user. The service information may be, for example, video, text, or pictures, such as a poster, and its type is not particularly limited here.
In the embodiment of the present disclosure, detecting the service information issued by the user may include detecting the service information issued by the user periodically or aperiodically, or retrieving the scanned service information from a database, which is not limited herein.
Step 604: the object to be identified is extracted from the business information.
The service information described in the embodiments of the present disclosure may include an object to be identified, such as an image. In this case, the service information may be text containing pictures, or may be video, and the image is an image frame extracted from the video.
Step 606 may refer to the content of step 201 above, step 608 may refer to the content of step 203 above, step 610 may refer to the content of step 205 above, and step 612 may refer to the content of step 207 above, which is not specifically limited herein.
Fig. 7 is a flowchart of a text recognition method according to an embodiment of the present disclosure. This method describes how the merging model used in practice is formed.
Step 702: and performing character recognition on the training sample, and respectively obtaining text blocks based on the recognized text of each row.
The text recognition scheme of the training sample may refer to step 201 above, and will not be described herein.
The training samples may include white samples, black samples, or both, where two adjacent lines of text blocks in a white sample belong to the same text information and can be merged, while two adjacent lines of text blocks in a black sample cannot be merged. The numbers of white samples and black samples may be chosen as needed in practice and are not particularly limited.
Step 704: block features of text blocks are extracted.
Reference may be made specifically to step 203 above, and details are not described here.
Step 706: training a merging model by using block features of text blocks to determine preset feature conditions in the merging model, so as to judge whether the block features of two adjacent lines of text blocks reach the preset feature conditions when the block features of two adjacent lines of text blocks in an object to be identified are identified, and executing operations on the two adjacent lines of text blocks according to judgment results, wherein the operations comprise one of merging and non-merging.
In the embodiments of this specification, the process of determining whether the block features of two adjacent lines of text blocks reach the preset feature condition is modeled: a merging model is trained according to machine learning principles, and the merging model continually adjusts and updates the preset feature condition based on the block features of the text blocks in the training samples until the model output reaches a preset effect.
The merging model described in the embodiments of the present disclosure may select a classification model, such as a decision tree, and is not specifically limited herein.
In addition, before training the merging model by using the block features of the text block, the method may further include:
Acquiring merging information of two adjacent lines of text blocks in the training sample, wherein the merging information represents whether the two adjacent text blocks are merged or not;
and marking the block characteristics according to the merging information.
The merging information records whether the two adjacent lines of text blocks can be merged, and labels are added to the block features based on this merging information. This corresponds to a supervised learning approach and improves the training efficiency of the merging model.
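As a small illustration of this labeling step, each adjacent pair's block features can be tagged with its merging information before training; how the labeled data is organized below (one feature row and one label per adjacent pair) is an assumption for the sketch, continuing the functions defined earlier.

```python
def build_labeled_pairs(blocks: List[TextBlock], merge_flags: List[bool]):
    """merge_flags[i] records whether blocks[i] and blocks[i + 1] were
    merged in the training sample (the merging information).
    Returns (feature_rows, labels) ready for supervised training."""
    feature_rows, labels = [], []
    for (upper, lower), merged in zip(zip(blocks, blocks[1:]), merge_flags):
        f = block_features(upper, lower)
        feature_rows.append([f["upper_height"], f["lower_height"], f["spacing"]])
        labels.append(1 if merged else 0)
    return feature_rows, labels
```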
In another embodiment, the merging model may be trained without pre-marking by using an unsupervised learning method or a semi-supervised learning method.
In other embodiments of the present disclosure, the preset feature condition may also be determined using a manual identification scheme.
Fig. 8 is a flowchart of a text recognition method according to an embodiment of the present disclosure.
Step 801 may refer to the content of step 201 above, and step 803 may refer to the content of step 203 above, which is not specifically limited herein.
Step 805: and processing the block characteristics of the adjacent two lines of text blocks by using a merging model to obtain a judging result of whether the adjacent two lines of text blocks reach preset characteristic conditions, wherein the merging model is obtained by training the block characteristics of the adjacent two lines of text blocks identified from training samples to determine the preset characteristic conditions.
Processing block characteristics of two adjacent lines of text blocks by utilizing a merging model, specifically comprising the following steps:
and using the characteristics of the text blocks in two adjacent lines as input, and judging whether the characteristics of the text blocks in two adjacent lines reach preset characteristic conditions or not by using a merging model so as to obtain a judging result.
Step 807 may refer to step 207 above, and is not specifically limited herein.
Fig. 9 is a flowchart of an application example of a text recognition method according to an embodiment of the present disclosure.
Step 902: and obtaining a training picture.
Step 904: OCR recognition is carried out on the training pictures;
step 906: combining the identified text of each line to obtain text blocks, and acquiring the coordinates of each text block;
step 908: calculating the row height of the text block according to the coordinates of the text block;
step 910: calculating the line spacing between two adjacent lines of text blocks according to the coordinates of the two adjacent lines of text blocks;
step 912: labeling whether two adjacent lines of text blocks are combined or not;
step 914: acquiring block characteristics formed by the line height of the text blocks and the line spacing between two adjacent lines of text blocks and marked merging information;
step 916: training the decision tree model by using the acquired block characteristics and marked merging information to adjust preset characteristic conditions, thereby obtaining a merging model.
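Since the description names a decision tree as one possible merging model, the sketch below shows what the training flow of Fig. 9 could look like with scikit-learn. The feature layout (upper line height, lower line height, line spacing) and the use of DecisionTreeClassifier are assumptions consistent with the description, not a fixed implementation.

```python
from sklearn.tree import DecisionTreeClassifier


def train_merging_model(feature_rows, merge_labels):
    """feature_rows: one [upper_height, lower_height, spacing] row per
    adjacent pair of text blocks from the training pictures.
    merge_labels: 1 if the pair was labeled as mergeable, 0 otherwise.
    The fitted tree's split thresholds play the role of the preset
    feature condition learned from the training samples."""
    model = DecisionTreeClassifier(max_depth=3)  # depth is an illustrative choice
    model.fit(feature_rows, merge_labels)
    return model
```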
Fig. 10 is a flowchart of an application example of a text recognition method provided in an embodiment of the present disclosure.
Step 1001: and obtaining a predicted picture.
Step 1003: OCR recognition is carried out on the predicted picture;
step 1005: combining the identified text of each line to obtain text blocks, and acquiring the coordinates of each text block;
step 1007: calculating the row height of the text block according to the coordinates of the text block;
step 1009: calculating the line spacing between two adjacent lines of text blocks according to the coordinates of the two adjacent lines of text blocks;
step 1011: labeling whether two adjacent lines of text blocks are combined or not;
step 1013: acquiring block characteristics formed by the line height of the text blocks and the line spacing between two adjacent lines of text blocks;
step 1015: inputting the obtained block characteristics into a merging model obtained by training, and judging whether the block characteristics reach preset characteristic conditions or not by the merging model;
step 1017: the merge model outputs whether the result is merged.
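A matching sketch of the prediction flow in Fig. 10: the block features of each adjacent pair in the picture to be predicted are fed to the trained merging model, which outputs whether the pair should be merged. The feature layout mirrors the assumed training sketch above.

```python
def predict_merge(model, upper: TextBlock, lower: TextBlock) -> bool:
    """Return True if the merging model decides the two adjacent lines
    of text blocks should be merged (steps 1013 to 1017)."""
    f = block_features(upper, lower)
    row = [[f["upper_height"], f["lower_height"], f["spacing"]]]
    return bool(model.predict(row)[0])
```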
Fig. 11 is a schematic structural diagram of a text recognition device according to an embodiment of the present disclosure.
The apparatus may include:
the character recognition module 1101 performs character recognition on the object to be recognized, and obtains text blocks based on the recognized text of each line;
an extraction module 1102 that extracts block features of the text block;
A judging module 1103, configured to judge whether the block features of the two adjacent lines of text blocks reach a preset feature condition, where the preset feature condition is a feature condition that is established by using a training sample and is satisfied by the block features of the two adjacent lines of text blocks when the two adjacent lines of text blocks belong to the same text information;
and an execution module 1104, configured to execute an operation on the text blocks in the two adjacent lines according to the determination result, where the operation includes one of merging and non-merging.
The text recognition device described in the embodiments of the present disclosure provides an automatic recognition scheme for whether text is merged, and by using block features of text blocks and combining preset feature conditions established in training samples, it is able to automatically recognize whether two adjacent lines of text blocks belong to the same text information, and determine whether to merge the two adjacent lines of text blocks according to a determination result. The scheme disclosed by the embodiment of the specification can improve the text recognition efficiency.
Based on the same inventive concept, the embodiments of the present disclosure further provide an electronic device, including:
a processor; and
a memory configured to store a computer program that, when executed, causes the processor to:
Performing character recognition on the object to be recognized, and respectively obtaining text blocks based on the recognized text of each row;
extracting block features of the text block;
judging whether the block characteristics of two adjacent lines of text blocks reach preset characteristic conditions or not, wherein the preset characteristic conditions are characteristic conditions which are established by using training samples and are met by the block characteristics of the two adjacent lines of text blocks when the two adjacent lines of text blocks belong to the same text information;
and executing operation on the text blocks in the two adjacent lines according to the judging result, wherein the operation comprises one of merging and non-merging.
Based on the same inventive concept, there is also provided in embodiments of the present specification a computer readable storage medium comprising a computer program for use in connection with an electronic device, the computer program being executable by a processor to perform the steps of:
performing character recognition on the object to be recognized, and respectively obtaining text blocks based on the recognized text of each row;
extracting block features of the text block;
judging whether the block characteristics of two adjacent lines of text blocks reach preset characteristic conditions or not, wherein the preset characteristic conditions are characteristic conditions which are established by using training samples and are met by the block characteristics of the two adjacent lines of text blocks when the two adjacent lines of text blocks belong to the same text information;
And executing operation on the text blocks in the two adjacent lines according to the judging result, wherein the operation comprises one of merging and non-merging.
Fig. 12 is a schematic structural diagram of a text recognition device according to an embodiment of the present disclosure.
The apparatus may include:
the text recognition module 1201 performs text recognition on the training sample, and obtains text blocks based on the recognized text of each line respectively;
an extraction module 1202 that extracts block features of the text block;
the training module 1203 trains a merging model by using the block features of the text blocks to determine preset feature conditions in the merging model, so as to judge whether the block features of two adjacent lines of text blocks reach the preset feature conditions when the block features of two adjacent lines of text blocks in an object to be identified are identified, and execute operations on the two adjacent lines of text blocks according to a judgment result, wherein the operations comprise one of merging and non-merging.
Based on the same inventive concept, the embodiments of the present disclosure further provide an electronic device, including:
a processor; and
a memory configured to store a computer program that, when executed, causes the processor to:
Performing character recognition on the training sample, and respectively obtaining text blocks based on the recognized text of each row;
extracting block features of the text block;
and training a merging model by using the block characteristics of the text blocks to determine preset characteristic conditions in the merging model, so as to judge whether the block characteristics of two adjacent lines of text blocks reach the preset characteristic conditions when the block characteristics of two adjacent lines of text blocks in an object to be identified are identified, and executing operations on the two adjacent lines of text blocks according to a judging result, wherein the operations comprise one of merging and non-merging.
Based on the same inventive concept, there is also provided in embodiments of the present specification a computer readable storage medium comprising a computer program for use in connection with an electronic device, the computer program being executable by a processor to perform the steps of:
performing character recognition on the training sample, and respectively obtaining text blocks based on the recognized text of each row;
extracting block features of the text block;
and training a merging model by using the block characteristics of the text blocks to determine preset characteristic conditions in the merging model, so as to judge whether the block characteristics of two adjacent lines of text blocks reach the preset characteristic conditions when the block characteristics of two adjacent lines of text blocks in an object to be identified are identified, and executing operations on the two adjacent lines of text blocks according to a judging result, wherein the operations comprise one of merging and non-merging.
Fig. 13 is a schematic structural diagram of a text recognition device according to an embodiment of the present disclosure.
The apparatus may include:
the character recognition module 1301 is used for recognizing characters of the object to be recognized, and text blocks are respectively obtained based on the recognized characters of each row;
an extraction module 1302 that extracts block features of the text block;
the model processing module 1303 processes the block characteristics of the two adjacent lines of text blocks by using a merging model to obtain a judging result of whether the two adjacent lines of text blocks reach a preset characteristic condition, wherein the merging model is obtained by training the block characteristics of the two adjacent lines of text blocks identified from a training sample to determine the preset characteristic condition;
and an execution module 1304, configured to execute an operation on the text blocks in the two adjacent lines according to the determination result, where the operation includes one of merging and non-merging.
Based on the same inventive concept, the embodiments of the present disclosure further provide an electronic device, including:
a processor; and
a memory configured to store a computer program that, when executed, causes the processor to:
performing character recognition on the object to be recognized, and respectively obtaining text blocks based on the recognized text of each row;
Extracting block features of the text block;
processing block features of two adjacent lines of text blocks by using a merging model to obtain a judging result of whether the two adjacent lines of text blocks reach preset feature conditions, wherein the merging model is obtained by training the block features of the two adjacent lines of text blocks identified from training samples to determine the preset feature conditions;
and executing operation on the text blocks in the two adjacent lines according to the judging result, wherein the operation comprises one of merging and non-merging.
Based on the same inventive concept, there is also provided in embodiments of the present specification a computer readable storage medium comprising a computer program for use in connection with an electronic device, the computer program being executable by a processor to perform the steps of:
performing character recognition on the object to be recognized, and respectively obtaining text blocks based on the recognized text of each row;
extracting block features of the text block;
processing block features of two adjacent lines of text blocks by using a merging model to obtain a judging result of whether the two adjacent lines of text blocks reach preset feature conditions, wherein the merging model is obtained by training the block features of the two adjacent lines of text blocks identified from training samples to determine the preset feature conditions;
And executing operation on the text blocks in the two adjacent lines according to the judging result, wherein the operation comprises one of merging and non-merging.
In the 1990s, improvements to a technology could clearly be distinguished as improvements in hardware (e.g., improvements to circuit structures such as diodes, transistors, switches, etc.) or improvements in software (improvements to the method flow). However, with the development of technology, many improvements of current method flows can be regarded as direct improvements of hardware circuit structures. Designers almost always obtain corresponding hardware circuit structures by programming improved method flows into hardware circuits. Therefore, it cannot be said that an improvement of a method flow cannot be realized by a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD) (e.g., a field programmable gate array (Field Programmable Gate Array, FPGA)) is an integrated circuit whose logic function is determined by the programming of the device by a user. A designer programs to "integrate" a digital system onto a PLD without requiring the chip manufacturer to design and fabricate application-specific integrated circuit chips. Moreover, nowadays, instead of manually manufacturing integrated circuit chips, such programming is mostly implemented by using "logic compiler" software, which is similar to the software compiler used in program development and writing, and the original code before compiling is also written in a specific programming language, which is called a hardware description language (Hardware Description Language, HDL). There is not just one HDL but many kinds, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, RHDL (Ruby Hardware Description Language), etc.; VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used. It will also be apparent to those skilled in the art that a hardware circuit implementing the logic method flow can be readily obtained by merely slightly programming the method flow into an integrated circuit using several of the hardware description languages described above.
The controller may be implemented in any suitable manner, for example, the controller may take the form of, for example, a microprocessor or processor and a computer readable medium storing computer readable program code (e.g., software or firmware) executable by the (micro) processor, logic gates, switches, application specific integrated circuits (Application Specific Integrated Circuit, ASIC), programmable logic controllers, and embedded microcontrollers, examples of which include, but are not limited to, the following microcontrollers: ARC 625D, atmel AT91SAM, microchip PIC18F26K20, and Silicone Labs C8051F320, the memory controller may also be implemented as part of the control logic of the memory. Those skilled in the art will also appreciate that, in addition to implementing the controller in a pure computer readable program code, it is well possible to implement the same functionality by logically programming the method steps such that the controller is in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, etc. Such a controller may thus be regarded as a kind of hardware component, and means for performing various functions included therein may also be regarded as structures within the hardware component. Or even means for achieving the various functions may be regarded as either software modules implementing the methods or structures within hardware components.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each element may be implemented in one or more software and/or hardware elements when implemented in the present application.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a …" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In this specification, the embodiments are described in a progressive manner; for identical or similar parts of the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the system embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the corresponding parts of the description of the method embodiments.
The foregoing descriptions are merely embodiments of the present application and are not intended to limit the present application. Those skilled in the art may make various modifications and changes to the present application. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.

Claims (17)

1. A text recognition method, comprising:
performing text recognition on the object to be recognized and recognizing a text arrangement mode in the object to be recognized, and respectively obtaining text blocks based on the recognized text of each row, wherein the text blocks comprise: performing character recognition on the object to be recognized and recognizing a character arrangement mode in the object to be recognized to obtain recognized characters of each row, and marking the recognized characters of each row as text blocks respectively;
Extracting block characteristics of the text blocks, wherein the block characteristics comprise the row height of the text blocks and the row spacing between two adjacent rows of the text blocks;
judging whether the block characteristics of two adjacent lines of text blocks reach preset characteristic conditions or not, wherein the preset characteristic conditions comprise that the line heights of the two adjacent lines of text blocks are not smaller than the line spacing, and/or the line heights of the two adjacent lines of text blocks are close to or equal to each other; the preset characteristic condition is a characteristic condition which is established by using a training sample and is met by the block characteristics of two adjacent lines of text blocks when the two adjacent lines of text blocks belong to the same text information; the preset characteristic condition is obtained by analyzing or learning a training sample, and is extracted from the training sample, wherein the training sample comprises the steps of acquiring merging information of two adjacent lines of text blocks in the training sample, and the merging information represents whether the two adjacent text blocks are merged or not; marking the block features according to the merging information; modeling a process of judging whether the block characteristics of two adjacent lines of text blocks reach the preset characteristic conditions, training a merging model, and enabling the merging model to adjust and update the preset characteristic conditions based on the block characteristics of the text blocks in a training sample so as to refine the preset characteristic conditions;
Executing operation on the two adjacent lines of text blocks according to the judging result, wherein the operation comprises one of merging and non-merging; and if the judging result is that the preset characteristic condition is met, merging the two adjacent lines of text blocks into a paragraph.
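For illustration only (this sketch is not part of the patented disclosure), the merging rule recited in claim 1 can be pictured as follows: each recognized line becomes a text block with a line height, and two adjacent blocks are merged into one paragraph when neither line height is smaller than the spacing between the lines and the two line heights are close to each other. The TextBlock structure, the 0.2 closeness tolerance, and the helper names below are assumptions made for this example, not values taken from the claims.

```python
from dataclasses import dataclass

@dataclass
class TextBlock:
    text: str
    top: float      # y coordinate of the top edge of the recognized line
    bottom: float   # y coordinate of the bottom edge of the recognized line

    @property
    def height(self) -> float:
        return self.bottom - self.top

def should_merge(upper: TextBlock, lower: TextBlock,
                 closeness_tol: float = 0.2) -> bool:
    """Preset feature condition of claim 1 (illustrative thresholds):
    both line heights are not smaller than the line spacing, and the
    two line heights are close to or equal to each other."""
    spacing = max(lower.top - upper.bottom, 0.0)
    heights_close = (abs(upper.height - lower.height)
                     <= closeness_tol * max(upper.height, lower.height))
    return min(upper.height, lower.height) >= spacing and heights_close

def merge_into_paragraphs(blocks: list[TextBlock]) -> list[str]:
    """Walk the recognized lines top to bottom and merge adjacent lines
    whose block features satisfy the preset feature condition."""
    paragraphs: list[list[str]] = []
    prev: TextBlock | None = None
    for block in blocks:
        if prev is not None and should_merge(prev, block):
            paragraphs[-1].append(block.text)
        else:
            paragraphs.append([block.text])
        prev = block
    return [" ".join(p) for p in paragraphs]
```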
2. The method of claim 1, the object to be identified being an image.
3. The method of claim 1, further comprising, prior to text recognition of the object to be recognized:
detecting service information issued by a user;
and extracting the object to be identified from the service information.
4. The method of claim 1, the block features comprising one or both of a row height of each of the text blocks and a row spacing between two adjacent rows of the text blocks.
5. The method of claim 4, wherein determining whether the block features of two adjacent lines of the text block meet a preset feature condition comprises:
and judging that the row height of the text blocks in the two adjacent rows is not smaller than the row spacing.
6. The method of claim 5, wherein determining that the line height of the text blocks in the two adjacent lines is not less than the line spacing comprises:
and judging whether the line height of the text blocks in the two adjacent lines exceeds the line spacing by a preset difference value.
7. The method of claim 4, wherein determining whether the block features of two adjacent lines of the text block meet a preset feature condition comprises:
and judging whether the difference between the line heights of the text blocks in the two adjacent lines is not larger than a preset line height difference.
8. The method of claim 4, wherein performing an operation on the text blocks in the two adjacent lines according to the determination result comprises:
and if the judging result is that the preset characteristic condition is met, merging the two adjacent lines of text blocks into a paragraph.
9. A text recognition method, comprising:
performing text recognition on the training sample and recognizing a text arrangement mode in the training sample, and respectively obtaining text blocks based on recognized text of each row, wherein the text blocks comprise: performing character recognition on the object to be recognized and recognizing a character arrangement mode in the object to be recognized to obtain recognized characters of each row, and marking the recognized characters of each row as text blocks respectively;
extracting block characteristics of the text blocks, wherein the block characteristics comprise the row height of the text blocks and the row spacing between two adjacent rows of the text blocks;
training a merging model by using block features of the text blocks to determine preset feature conditions in the merging model, so as to judge whether the block features of two adjacent lines of text blocks reach the preset feature conditions when the block features of two adjacent lines of text blocks in an object to be identified are identified, wherein the preset feature conditions comprise that the line heights of the two adjacent lines of text blocks are not smaller than the line spacing, and/or the line heights of the two adjacent lines of text blocks are close to or equal to each other; the preset characteristic condition is obtained by analyzing or learning a training sample, and is extracted from the training sample, wherein the training sample comprises the steps of acquiring merging information of two adjacent lines of text blocks in the training sample, and the merging information represents whether the two adjacent text blocks are merged or not; marking the block features according to the merging information; modeling a process of judging whether the block characteristics of two adjacent lines of text blocks reach the preset characteristic conditions, training a merging model, and enabling the merging model to adjust and update the preset characteristic conditions based on the block characteristics of the text blocks in a training sample so as to refine the preset characteristic conditions;
Executing operation on the two adjacent lines of text blocks according to the judging result, wherein the operation comprises one of merging and non-merging; and if the judging result is that the preset characteristic condition is met, merging the two adjacent lines of text blocks into a paragraph.
10. The method of claim 9, further comprising, prior to training a merge model using block features of the text block:
acquiring merging information of two adjacent lines of text blocks in the training sample, wherein the merging information represents whether the two adjacent text blocks are merged or not;
and marking the block characteristics according to the merging information.
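Claims 9 and 10 describe learning the preset feature condition from a labeled training sample, where the merging information of two adjacent lines serves as the label for their block features. The following is a minimal sketch of that training step; the choice of logistic regression, the three-element feature vector, and the toy numbers are illustrative assumptions only, since the claims do not fix a particular model family.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_merge_model(block_features, merge_labels):
    """block_features: one row per pair of adjacent lines in the training
    sample, e.g. [upper line height, lower line height, line spacing];
    merge_labels: 1 if the annotated merging information marks the two
    adjacent text blocks as belonging to the same paragraph, else 0."""
    X = np.asarray(block_features, dtype=float)
    y = np.asarray(merge_labels)
    # Fitting the classifier corresponds to "adjusting and updating the
    # preset feature condition": the learned decision boundary is the
    # refined condition applied to new block features.
    return LogisticRegression().fit(X, y)

# Tiny made-up training set (heights and spacings in pixels):
features = [[20, 20, 6], [20, 21, 8], [20, 20, 40], [18, 30, 25]]
labels = [1, 1, 0, 0]
model = train_merge_model(features, labels)
```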
11. A text recognition method, comprising:
performing text recognition on the object to be recognized and recognizing a text arrangement mode in the object to be recognized, and respectively obtaining text blocks based on the recognized text of each row, wherein the text blocks comprise: performing character recognition on the object to be recognized and recognizing a character arrangement mode in the object to be recognized to obtain recognized characters of each row, and marking the recognized characters of each row as text blocks respectively;
extracting block characteristics of the text blocks, wherein the block characteristics comprise the row height of the text blocks and the row spacing between two adjacent rows of the text blocks;
Processing block characteristics of two adjacent lines of text blocks by utilizing a merging model to obtain a judging result of whether the two adjacent lines of text blocks reach preset characteristic conditions, wherein the preset characteristic conditions comprise that the line heights of the two adjacent lines of text blocks are not smaller than the line spacing, and/or the line heights of the two adjacent lines of text blocks are close to or equal to each other; the preset characteristic condition is obtained by analyzing or learning a training sample, and is extracted from the training sample, wherein the training sample comprises the steps of acquiring merging information of two adjacent lines of text blocks in the training sample, and the merging information represents whether the two adjacent text blocks are merged or not; marking the block features according to the merging information; modeling a process of judging whether the block characteristics of two adjacent lines of text blocks reach the preset characteristic conditions, training a merging model, and enabling the merging model to adjust and update the preset characteristic conditions based on the block characteristics of the text blocks in a training sample so as to refine the preset characteristic conditions; the merging model is obtained by training block features of two adjacent lines of text blocks identified from training samples to determine the preset feature conditions;
Executing operation on the two adjacent lines of text blocks according to the judging result, wherein the operation comprises one of merging and non-merging; and if the judging result is that the preset characteristic condition is met, merging the two adjacent lines of text blocks into a paragraph.
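Claim 11 applies the trained merging model to the block features of adjacent lines in the object to be recognized. Continuing the illustrative sketches above (same assumed line-height/line-spacing feature vector, and a list of recognized lines whose items expose text, top, and bottom attributes), inference might look like this:

```python
def merge_with_model(blocks, model) -> list[str]:
    """Merge adjacent recognized lines into paragraphs whenever the trained
    merging model predicts class 1 for their block features
    [upper line height, lower line height, line spacing]."""
    paragraphs: list[list[str]] = []
    prev = None
    for block in blocks:
        if prev is not None:
            feats = [[prev.bottom - prev.top,
                      block.bottom - block.top,
                      max(block.top - prev.bottom, 0.0)]]
            if model.predict(feats)[0] == 1:
                paragraphs[-1].append(block.text)
                prev = block
                continue
        paragraphs.append([block.text])
        prev = block
    return [" ".join(p) for p in paragraphs]
```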
12. A text recognition device, comprising:
the character recognition module is used for recognizing characters of an object to be recognized and recognizing a character arrangement mode in the object to be recognized, and respectively obtaining text blocks based on recognized characters of each row, and comprises the following steps: performing character recognition on the object to be recognized and recognizing a character arrangement mode in the object to be recognized to obtain recognized characters of each row, and marking the recognized characters of each row as text blocks respectively;
the extraction module is used for extracting block characteristics of the text blocks, wherein the block characteristics comprise the row height of the text blocks and the row spacing between two adjacent rows of the text blocks;
the judging module is used for judging whether the block characteristics of the adjacent two lines of text blocks reach preset characteristic conditions or not, wherein the preset characteristic conditions comprise that the line heights of the adjacent two lines of text blocks are not smaller than the line spacing, and/or the line heights of the adjacent two lines of text blocks are close to or equal to each other; the preset characteristic condition is a characteristic condition which is established by using a training sample and is met by the block characteristics of two adjacent lines of text blocks when the two adjacent lines of text blocks belong to the same text information; the preset characteristic condition is obtained by analyzing or learning a training sample, and is extracted from the training sample, wherein the training sample comprises the steps of acquiring merging information of two adjacent lines of text blocks in the training sample, and the merging information represents whether the two adjacent text blocks are merged or not; marking the block features according to the merging information; modeling a process of judging whether the block characteristics of two adjacent lines of text blocks reach the preset characteristic conditions, training a merging model, and enabling the merging model to adjust and update the preset characteristic conditions based on the block characteristics of the text blocks in a training sample so as to refine the preset characteristic conditions;
The execution module is used for executing operation on the two adjacent lines of text blocks according to the judging result, wherein the operation comprises one of merging and non-merging; and if the judging result is that the preset characteristic condition is met, merging the two adjacent lines of text blocks into a paragraph.
13. A text recognition device, comprising:
the character recognition module is used for recognizing characters of the training sample and recognizing character arrangement modes in the training sample, and respectively obtaining text blocks based on recognized characters of each row, and comprises the following steps: performing character recognition on the object to be recognized and recognizing a character arrangement mode in the object to be recognized to obtain recognized characters of each row, and marking the recognized characters of each row as text blocks respectively;
the extraction module is used for extracting block characteristics of the text blocks, wherein the block characteristics comprise the row height of the text blocks and the row spacing between two adjacent rows of the text blocks;
the training module is used for training a merging model by utilizing the block characteristics of the text blocks so as to determine preset characteristic conditions in the merging model, so that when the block characteristics of two adjacent lines of text blocks in an object to be identified are identified, whether the block characteristics of the two adjacent lines of text blocks reach the preset characteristic conditions or not is judged, wherein the preset characteristic conditions comprise that the line height of the two adjacent lines of text blocks is not smaller than the line spacing, and/or the line heights of the two adjacent lines of text blocks are close to or equal to each other; the preset characteristic condition is obtained by analyzing or learning a training sample, and is extracted from the training sample, wherein the training sample comprises the steps of acquiring merging information of two adjacent lines of text blocks in the training sample, and the merging information represents whether the two adjacent text blocks are merged or not; marking the block features according to the merging information; modeling a process of judging whether the block characteristics of two adjacent lines of text blocks reach the preset characteristic conditions, training a merging model, and enabling the merging model to adjust and update the preset characteristic conditions based on the block characteristics of the text blocks in a training sample so as to refine the preset characteristic conditions; executing operation on the two adjacent lines of text blocks according to the judging result, wherein the operation comprises one of merging and non-merging; and if the judging result is that the preset characteristic condition is met, merging the two adjacent lines of text blocks into a paragraph.
14. A text recognition device, comprising:
the character recognition module is used for recognizing characters of an object to be recognized and recognizing a character arrangement mode in the object to be recognized, and respectively obtaining text blocks based on recognized characters of each row, and comprises the following steps: performing character recognition on the object to be recognized and recognizing a character arrangement mode in the object to be recognized to obtain recognized characters of each row, and marking the recognized characters of each row as text blocks respectively;
the extraction module is used for extracting block characteristics of the text blocks, wherein the block characteristics comprise the row height of the text blocks and the row spacing between two adjacent rows of the text blocks;
the model processing module is used for processing block characteristics of two adjacent lines of text blocks by utilizing a merging model to obtain a judging result of whether the two adjacent lines of text blocks reach preset characteristic conditions, wherein the preset characteristic conditions comprise that the line heights of the two adjacent lines of text blocks are not smaller than the line spacing, and/or the line heights of the two adjacent lines of text blocks are close to or equal to each other; the preset characteristic condition is obtained by analyzing or learning a training sample, and is extracted from the training sample, wherein the training sample comprises the steps of acquiring merging information of two adjacent lines of text blocks in the training sample, and the merging information represents whether the two adjacent text blocks are merged or not; marking the block features according to the merging information; modeling a process of judging whether the block characteristics of two adjacent lines of text blocks reach the preset characteristic conditions, training a merging model, and enabling the merging model to adjust and update the preset characteristic conditions based on the block characteristics of the text blocks in a training sample so as to refine the preset characteristic conditions; the merging model is obtained by training block features of two adjacent lines of text blocks identified from training samples to determine the preset feature conditions;
The execution module is used for executing operation on the two adjacent lines of text blocks according to the judging result, wherein the operation comprises one of merging and non-merging; and if the judging result is that the preset characteristic condition is met, merging the two adjacent lines of text blocks into a paragraph.
15. An electronic device, comprising:
a processor; and
a memory configured to store a computer program that, when executed, causes the processor to:
performing text recognition on the object to be recognized and recognizing a text arrangement mode in the object to be recognized, and respectively obtaining text blocks based on the recognized text of each row, wherein the text blocks comprise: performing character recognition on the object to be recognized and recognizing a character arrangement mode in the object to be recognized to obtain recognized characters of each row, and marking the recognized characters of each row as text blocks respectively;
extracting block characteristics of the text blocks, wherein the block characteristics comprise the row height of the text blocks and the row spacing between two adjacent rows of the text blocks;
judging whether the block characteristics of two adjacent lines of text blocks reach preset characteristic conditions or not, wherein the preset characteristic conditions comprise that the line heights of the two adjacent lines of text blocks are not smaller than the line spacing, and/or the line heights of the two adjacent lines of text blocks are close to or equal to each other; the preset characteristic condition is a characteristic condition which is established by using a training sample and is met by the block characteristics of two adjacent lines of text blocks when the two adjacent lines of text blocks belong to the same text information; the preset characteristic condition is obtained by analyzing or learning a training sample, and is extracted from the training sample, wherein the training sample comprises the steps of acquiring merging information of two adjacent lines of text blocks in the training sample, and the merging information represents whether the two adjacent text blocks are merged or not; marking the block features according to the merging information; modeling a process of judging whether the block characteristics of two adjacent lines of text blocks reach the preset characteristic conditions, training a merging model, and enabling the merging model to adjust and update the preset characteristic conditions based on the block characteristics of the text blocks in a training sample so as to refine the preset characteristic conditions;
Executing operation on the two adjacent lines of text blocks according to the judging result, wherein the operation comprises one of merging and non-merging; and if the judging result is that the preset characteristic condition is met, merging the two adjacent lines of text blocks into a paragraph.
16. An electronic device, comprising:
a processor; and
a memory configured to store a computer program that, when executed, causes the processor to:
performing text recognition on the training sample and recognizing a text arrangement mode in the training sample, and respectively obtaining text blocks based on recognized text of each row, wherein the text blocks comprise: performing character recognition on the object to be recognized and recognizing a character arrangement mode in the object to be recognized to obtain recognized characters of each row, and marking the recognized characters of each row as text blocks respectively;
extracting block characteristics of the text blocks, wherein the block characteristics comprise the row height of the text blocks and the row spacing between two adjacent rows of the text blocks;
training a merging model by using block features of the text blocks to determine preset feature conditions in the merging model, so as to judge whether the block features of two adjacent lines of text blocks reach the preset feature conditions when the block features of two adjacent lines of text blocks in an object to be identified are identified, wherein the preset feature conditions comprise that the line heights of the two adjacent lines of text blocks are not smaller than the line spacing, and/or the line heights of the two adjacent lines of text blocks are close to or equal to each other; the preset characteristic condition is obtained by analyzing or learning a training sample, and is extracted from the training sample, wherein the training sample comprises the steps of acquiring merging information of two adjacent lines of text blocks in the training sample, and the merging information represents whether the two adjacent text blocks are merged or not; marking the block features according to the merging information; modeling a process of judging whether the block characteristics of two adjacent lines of text blocks reach the preset characteristic conditions, training a merging model, and enabling the merging model to adjust and update the preset characteristic conditions based on the block characteristics of the text blocks in a training sample so as to refine the preset characteristic conditions; executing operation on the two adjacent lines of text blocks according to the judging result, wherein the operation comprises one of merging and non-merging; and if the judging result is that the preset characteristic condition is met, merging the two adjacent lines of text blocks into a paragraph.
17. An electronic device, comprising:
a processor; and
a memory configured to store a computer program that, when executed, causes the processor to:
performing text recognition on the object to be recognized and recognizing a text arrangement mode in the object to be recognized, and respectively obtaining text blocks based on the recognized text of each row, wherein the text blocks comprise: performing character recognition on the object to be recognized and recognizing a character arrangement mode in the object to be recognized to obtain recognized characters of each row, and marking the recognized characters of each row as text blocks respectively;
extracting block characteristics of the text blocks, wherein the block characteristics comprise the row height of the text blocks and the row spacing between two adjacent rows of the text blocks;
processing block characteristics of two adjacent lines of text blocks by utilizing a merging model to obtain a judging result of whether the two adjacent lines of text blocks reach preset characteristic conditions, wherein the preset characteristic conditions comprise that the line heights of the two adjacent lines of text blocks are not smaller than the line spacing, and/or the line heights of the two adjacent lines of text blocks are close to or equal to each other; the preset characteristic condition is obtained by analyzing or learning a training sample, and is extracted from the training sample, wherein the training sample comprises the steps of acquiring merging information of two adjacent lines of text blocks in the training sample, and the merging information represents whether the two adjacent text blocks are merged or not; marking the block features according to the merging information; modeling a process of judging whether the block characteristics of two adjacent lines of text blocks reach the preset characteristic conditions, training a merging model, and enabling the merging model to adjust and update the preset characteristic conditions based on the block characteristics of the text blocks in a training sample so as to refine the preset characteristic conditions; the merging model is obtained by training block features of two adjacent lines of text blocks identified from training samples to determine the preset feature conditions;
Executing operation on the two adjacent lines of text blocks according to the judging result, wherein the operation comprises one of merging and non-merging; and if the judging result is that the preset characteristic condition is met, merging the two adjacent lines of text blocks into a paragraph.
CN202010097683.2A 2020-02-17 2020-02-17 Text recognition method and device and electronic equipment Active CN111325195B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202410096251.8A CN117912017A (en) 2020-02-17 2020-02-17 Text recognition method and device and electronic equipment
CN202010097683.2A CN111325195B (en) 2020-02-17 2020-02-17 Text recognition method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010097683.2A CN111325195B (en) 2020-02-17 2020-02-17 Text recognition method and device and electronic equipment

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202410096251.8A Division CN117912017A (en) 2020-02-17 2020-02-17 Text recognition method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN111325195A CN111325195A (en) 2020-06-23
CN111325195B (en) 2024-01-26

Family

ID=71172116

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202410096251.8A Pending CN117912017A (en) 2020-02-17 2020-02-17 Text recognition method and device and electronic equipment
CN202010097683.2A Active CN111325195B (en) 2020-02-17 2020-02-17 Text recognition method and device and electronic equipment

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202410096251.8A Pending CN117912017A (en) 2020-02-17 2020-02-17 Text recognition method and device and electronic equipment

Country Status (1)

Country Link
CN (2) CN117912017A (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115905865A (en) * 2022-11-22 2023-04-04 蚂蚁财富(上海)金融信息服务有限公司 Training method of text merging judgment model and text merging judgment method

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5410611A (en) * 1993-12-17 1995-04-25 Xerox Corporation Method for identifying word bounding boxes in text
US6801673B2 (en) * 2001-10-09 2004-10-05 Hewlett-Packard Development Company, L.P. Section extraction tool for PDF documents
CN101833544A (en) * 2009-03-10 2010-09-15 株式会社理光 Method and system for extracting word part from portable electronic document
TW201039149A (en) * 2009-04-17 2010-11-01 Yu-Chieh Wu Robust algorithms for video text information extraction and question-answer retrieval
CN102063619A (en) * 2010-11-30 2011-05-18 汉王科技股份有限公司 Character row extraction method and device
CN107784301A (en) * 2016-08-31 2018-03-09 百度在线网络技术(北京)有限公司 Method and apparatus for identifying character area in image
CN109711406A (en) * 2018-12-25 2019-05-03 中南大学 A kind of multidirectional image Method for text detection based on multiple dimensioned rotation anchor mechanism
KR101985612B1 (en) * 2018-01-16 2019-06-03 김학선 Method for manufacturing digital articles of paper-articles
CN109948518A (en) * 2019-03-18 2019-06-28 武汉汉王大数据技术有限公司 A kind of method of PDF document content text paragraph polymerization neural network based
CN110321895A (en) * 2019-04-30 2019-10-11 北京市商汤科技开发有限公司 Certificate recognition methods and device, electronic equipment, computer readable storage medium
CN110413962A (en) * 2019-06-28 2019-11-05 南京智录信息科技有限公司 Rimless form analysis technology in file and picture
CN110414505A (en) * 2019-06-27 2019-11-05 深圳中兴网信科技有限公司 Processing method, processing system and the computer readable storage medium of image

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108416279B (en) * 2018-02-26 2022-04-19 北京阿博茨科技有限公司 Table analysis method and device in document image

Also Published As

Publication number Publication date
CN117912017A (en) 2024-04-19
CN111325195A (en) 2020-06-23

Similar Documents

Publication Publication Date Title
CN112685565B (en) Text classification method based on multi-mode information fusion and related equipment thereof
CN110991520B (en) Method and device for generating training samples
CN112417093B (en) Model training method and device
CN111401062B (en) Text risk identification method, device and equipment
CN112966088B (en) Unknown intention recognition method, device, equipment and storage medium
CN115618964B (en) Model training method and device, storage medium and electronic equipment
CN112861842A (en) Case text recognition method based on OCR and electronic equipment
CN112347512A (en) Image processing method, device, equipment and storage medium
CN112182217A (en) Method, device, equipment and storage medium for identifying multi-label text categories
CN114332873A (en) Training method and device for recognition model
CN113205047A (en) Drug name identification method and device, computer equipment and storage medium
CN111325195B (en) Text recognition method and device and electronic equipment
CN111368902A (en) Data labeling method and device
CN108804563B (en) Data labeling method, device and equipment
CN116186330B (en) Video deduplication method and device based on multi-mode learning
CN110309859B (en) Image authenticity detection method and device and electronic equipment
CN116863484A (en) Character recognition method, device, storage medium and electronic equipment
CN116824291A (en) Remote sensing image learning method, device and equipment
CN116246276A (en) Information identification method, device, equipment and readable storage medium
CN115168575A (en) Subject supplement method applied to audit field and related equipment
CN111242195B (en) Model, insurance wind control model training method and device and electronic equipment
CN114926437A (en) Image quality evaluation method and device
CN112115952B (en) Image classification method, device and medium based on full convolution neural network
CN114973225B (en) License plate identification method, device and equipment
CN116563581A (en) Training method and device for image detection model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant