CN112070076A - Text paragraph structure reduction method, device, equipment and computer storage medium

Info

Publication number: CN112070076A
Application number: CN202011264865.0A
Authority: CN
Inventors: 高超; 徐国强
Original assignee: OneConnect Financial Technology Co Ltd Shanghai
Current assignee: OneConnect Smart Technology Co Ltd; OneConnect Financial Technology Co Ltd Shanghai
Priority date: 2020-11-13
Filing date: 2020-11-13
Publication date: 2020-12-11
Anticipated expiration: 2040-11-13
Also published as: CN112070076B; WO2022100376A1

Abstract

The invention relates to the technical field of image processing and discloses a text paragraph structure restoration method, device, equipment and computer storage medium. The method comprises: recognizing a target picture, and determining all text boxes in the target picture and the position of each text box based on the recognition result; sorting the text boxes according to their positions, and inputting the text features of the text boxes into a preset deep learning model for training based on the sorting result; and merging the text boxes based on the training result to obtain all text paragraphs corresponding to the target picture. The method and the device improve the accuracy of text paragraph structure restoration.

Description

Text paragraph structure reduction method, device, equipment and computer storage medium

Technical Field

The present invention relates to the field of image processing technologies, and in particular to a method, an apparatus, a device, and a computer-readable storage medium for text paragraph structure restoration.

Background

When a paper document is digitized, its content must be entered and its original format preserved, but existing text-line-based detection and recognition methods cannot obtain text paragraph information directly. Two approaches are currently used. The top-down approach first analyzes the layout of the whole page and segments it into paragraphs, and then detects and recognizes the text lines within each paragraph region. During layout analysis this approach cannot capture the fine-grained features of local characters, and it uses only the picture information, not the text content, so its accuracy is low. The bottom-up approach first detects the text lines and then merges them into paragraphs. It merges text boxes mainly by the relationship between their positions, using fixed rules or heuristic algorithms; it requires a large number of manually extracted features and can hardly take the text content into account, so its accuracy is also low.

Disclosure of Invention

The main object of the invention is to provide a text paragraph structure restoration method, device, equipment and computer storage medium, so as to solve the technical problem of how to improve the accuracy of text paragraph structure restoration.

In order to achieve the above object, the present invention provides a text paragraph structure restoring method, including:

recognizing a target picture, and determining all text boxes in the target picture and the text box position of each text box based on the recognition result;

sorting the text boxes according to the text box positions, and inputting the text features of the text boxes into a preset deep learning model for training based on the sorting result;

and merging the text boxes based on the training result to obtain all text paragraphs corresponding to the target picture.

Optionally, the step of merging the text boxes based on the training result of the training to obtain all text paragraphs corresponding to the target picture includes:

determining text labels corresponding to the text boxes based on the training result of the training, traversing the text labels, and detecting whether traversal label information corresponding to the traversed text labels is a paragraph;

and if the traversal label information is a paragraph, determining that a text box corresponding to the traversed text label is a text paragraph corresponding to the target picture.

Optionally, after the step of detecting whether traversal label information corresponding to a traversed text label is a paragraph, the method includes:

if not, detecting whether the traversal label information is paragraph content;

if the traversal label information is paragraph content, determining whether the label information of a text label before the traversal text label is paragraph starting information;

and if the label information of the previous text label is paragraph starting information, determining the text paragraph corresponding to the target picture based on the traversed text label and the previous text label.

Optionally, the step of determining a text paragraph corresponding to the target picture based on the traversed text label and the previous text label includes:

detecting whether continuous adjacent content tags exist in the text tags or not;

if continuous adjacent content tags exist, determining whether traversed text tags exist in the continuous adjacent content tags;

if the traversed text label does not exist, combining the text box corresponding to the traversed text label and the text box corresponding to the previous text label to obtain a combined text box, and taking the combined text box as a text paragraph corresponding to the target picture.

Optionally, after the step of determining whether there is a traversed text label in the consecutive adjacent content labels, the method includes:

if the traversed text label exists, merging all text boxes corresponding to the consecutive adjacent content labels that contain the traversed text label with the text box corresponding to the previous text label to obtain a merged text box, and taking the merged text box as a text paragraph corresponding to the picture.

Optionally, the step of inputting the text features of each text box into a preset deep learning model for training based on the sorting result includes:

and sequentially extracting text features of the text boxes, fusing the text features into sequence features according to the sequencing result, and inputting the sequence features into a preset deep learning model for training.

Optionally, the step of sequentially extracting text features of each text box includes:

and traversing each text box in sequence, extracting the position feature, the language feature and the image feature of the traversed text box, and taking the position feature, the language feature and the image feature as the text feature of the traversed text box.

In addition, to achieve the above object, the present invention further provides a text paragraph structure restoring apparatus, including:

the determining module is used for recognizing a target picture and determining all text boxes in the target picture and the text box position of each text box based on the recognition result;

the input module is used for sorting the text boxes according to the text box positions and inputting the text features of the text boxes into a preset deep learning model for training based on the sorting result;

and the obtaining module is used for merging the text boxes based on the training result to obtain all text paragraphs corresponding to the target picture.

In addition, in order to achieve the above object, the present invention further provides a text paragraph structure restoring apparatus;

the text paragraph structure restoring apparatus includes: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein:

the computer program when executed by the processor implements the steps of the text paragraph structure restoring method as described above.

In addition, to achieve the above object, the present invention also provides a computer-readable storage medium;

the computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the text paragraph structure restoring method as described above.

According to the method, a target picture is recognized, and all text boxes in the target picture and their positions are determined based on the recognition result; the text boxes are sorted according to their positions, and the text features of the text boxes are input into a preset deep learning model for training based on the sorting result; and the text boxes are merged based on the training result to obtain all text paragraphs corresponding to the target picture. Because the text boxes are sorted by position, their text features are processed by the preset deep learning model, and the merging is driven by the training result, the low paragraph accuracy caused by manual operation is avoided, and the accuracy of text paragraph structure restoration is improved.

Drawings

Fig. 1 is a schematic structural diagram of the text paragraph structure restoring device in the hardware operating environment according to an embodiment of the present invention;

Fig. 2 is a flowchart of the text paragraph structure restoration method according to a first embodiment of the present invention;

Fig. 3 is a functional block diagram of the text paragraph structure restoring apparatus according to the present invention;

Fig. 4 is a schematic diagram of text box sorting in the text paragraph structure restoration method according to the present invention.

The objects, features and advantages of the present invention will be further explained with reference to the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Fig. 1 is a schematic structural diagram of the text paragraph structure restoring device in the hardware operating environment according to an embodiment of the present invention.

As shown in Fig. 1, the text paragraph structure restoring device may include a processor 1001 (such as a CPU), a network interface 1004, a user interface 1003, a memory 1005 and a communication bus 1002. The communication bus 1002 enables communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard; optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM or a non-volatile memory (e.g., a magnetic disk memory). Alternatively, the memory 1005 may be a storage device separate from the processor 1001.

Optionally, the text paragraph structure restoring device may further include a camera, a radio frequency (RF) circuit, sensors, an audio circuit, a Wi-Fi module, and the like. The sensors may include, for example, light sensors and motion sensors. Specifically, the light sensors may include an ambient light sensor and a proximity sensor, where the ambient light sensor may adjust the brightness of the display according to the brightness of the ambient light. The device may also be equipped with other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer and an infrared sensor, which are not described further here.

Those skilled in the art will appreciate that the structure shown in Fig. 1 does not limit the text paragraph structure restoring device, which may include more or fewer components than those shown, combine some components, or arrange the components differently.

As shown in fig. 1, a memory 1005, which is a kind of computer storage medium, may include therein an operating system, a network communication module, a user interface module, and a text paragraph structure restoring program.

In the text paragraph structure restoring apparatus shown in fig. 1, the network interface 1004 is mainly used for connecting to a background server and performing data communication with the background server; the user interface 1003 is mainly used for connecting a client (user side) and performing data communication with the client; the processor 1001 may be configured to call the text paragraph structure restoring program stored in the memory 1005, and execute the text paragraph structure restoring method provided by the embodiment of the present invention.

Referring to Fig. 2, the present invention provides a text paragraph structure restoration method. In one embodiment, the method includes the following steps:

step S10, identifying a target picture, and determining all text boxes and the text box positions of all the text boxes in the target picture based on the identification result;

In this embodiment, when the text paragraphs in a target picture need to be restored, the text line information (i.e., text boxes) obtained by detecting and recognizing the target picture may be converted into sequence features and input into a preset deep learning model. The deep learning model performs sequence labeling to obtain the category of each text box, and the text boxes are merged according to their categories to obtain the text paragraphs.

Therefore, in this embodiment, the target picture may first be recognized using a text recognition technology, such as OCR (Optical Character Recognition), or another text recognition model, to determine whether text content exists in the target picture. If text content exists and is distributed at different positions, the text box positions (such as x1, y1, x2 and y2) and the text content of each text box can be obtained from the OCR recognition result. After the text boxes and their positions are obtained, it is detected whether there is only one text box. If so, that text box is taken directly as the text paragraph of the target picture, and its position is the position of the text paragraph. If multiple text boxes are detected, the text paragraphs in the target picture must be determined from the text boxes and their positions.

Step S20, sorting the text boxes according to their text box positions, and inputting the text features of the text boxes into a preset deep learning model for training based on the sorting result;

After the text boxes and their positions are obtained, the text boxes can be numbered according to their positions. The numbering order can be set according to the user's requirements; for example, numbering may start from the top of the target picture and continue until every text box has its own number. The text boxes are then sorted by number to obtain the sorted text boxes. For example, as shown in Fig. 4, the text boxes may be numbered and sorted from top to bottom and from left to right, such as 1 to 12, according to their positions.
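The numbering step can be sketched as follows. This is a minimal illustration, assuming each text box is an (x1, y1, x2, y2) tuple with the origin at the top-left corner of the picture. The function name `number_text_boxes` and the row tolerance `row_tol` are hypothetical and are not taken from the described method.

```python
from typing import List, Tuple

# Assumed box layout: (x1, y1, x2, y2), origin at the top-left of the picture.
Box = Tuple[float, float, float, float]


def number_text_boxes(boxes: List[Box], row_tol: float = 5.0) -> List[Tuple[int, Box]]:
    """Number boxes from top to bottom, then left to right within a row."""
    # Sort by top edge first so that rows can be grouped in a single pass.
    by_top = sorted(boxes, key=lambda b: b[1])
    rows: List[List[Box]] = []
    for box in by_top:
        # A box joins the current row when its top edge lies within row_tol
        # of the top edge of the first box in that row (an assumed heuristic).
        if rows and abs(box[1] - rows[-1][0][1]) <= row_tol:
            rows[-1].append(box)
        else:
            rows.append([box])
    # Within each row, order the boxes by their left edge.
    ordered = [b for row in rows for b in sorted(row, key=lambda r: r[0])]
    # Numbers start at 1, matching the 1 to 12 example.
    return list(enumerate(ordered, start=1))


example = [(200, 12, 300, 30), (10, 10, 100, 30), (10, 50, 100, 70)]
numbered = number_text_boxes(example)
```

The tolerance lets two boxes on the same visual line share a row even when OCR reports slightly different top edges; a plain sort on (y1, x1) would otherwise order them by that small vertical difference.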

After the text boxes are sorted and the sorting result is obtained, the text features of each text box, such as position features, language features and image features, can be obtained in order. The position features may be the vertex coordinates, center point, width and height of the text box. The language features may be language model features of the text in the text box, such as word vectors, sentence vectors and n-gram scores of the text. The image features may be features extracted from the text region of the image using a convolutional neural network or the like. The acquired text features are then input, as sequence features, into the deep learning model that has been set up and trained in advance, to obtain the training result.
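As an illustration of how the three feature types might be combined, the sketch below computes the position features from an (x1, y1, x2, y2) box and concatenates them with language and image vectors. Those vectors are assumed to have been computed elsewhere (for example by a language model and a convolutional network). Concatenation as the fusion method and the names `position_features` and `build_sequence_features` are assumptions; the text does not specify how the features are fused.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def position_features(box: Box) -> List[float]:
    """Corner coordinates, center point, width and height of one text box."""
    x1, y1, x2, y2 = box
    width, height = x2 - x1, y2 - y1
    cx, cy = x1 + width / 2, y1 + height / 2
    # The two opposite corners determine all four vertices of the rectangle.
    return [x1, y1, x2, y2, cx, cy, width, height]


def build_sequence_features(
    sorted_boxes: Sequence[Box],
    language_vecs: Sequence[Sequence[float]],
    image_vecs: Sequence[Sequence[float]],
) -> List[List[float]]:
    """One feature vector per text box, kept in the sorted order."""
    if not (len(sorted_boxes) == len(language_vecs) == len(image_vecs)):
        raise ValueError("each text box needs one language and one image vector")
    # Assumption: the three feature types are fused by simple concatenation.
    return [
        position_features(box) + list(lang) + list(img)
        for box, lang, img in zip(sorted_boxes, language_vecs, image_vecs)
    ]


sequence = build_sequence_features([(0, 0, 2, 2)], [[0.5]], [[0.25, 0.75]])
```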

The deep learning model may be trained as follows. A large number of document pictures are collected, and an OCR system is used for character detection and recognition to obtain the three types of features (position, language and image features) of each text box in the document pictures. Each text box is then labeled manually, that is, annotated with which part of a text paragraph it belongs to, giving the manual labeling result for each text box. The three types of features of each text box are input into the deep learning model to obtain the model output, namely a label carrying text paragraph information for each text box. The labels are traversed in order and compared with the corresponding manual labeling results; where they differ, the deep learning model is optimized, for example by gradient descent, until optimization is complete. The result is the trained deep learning model, namely the preset deep learning model. The deep learning model can be any of various sequence models, such as a recurrent neural network model or a convolutional neural network model.

Step S30, merging the text boxes based on the training result to obtain all text paragraphs corresponding to the target picture.

After the training result is obtained, the label of each text box, such as a BIO label, can be obtained, and the text boxes are merged according to their labels to obtain all text paragraphs corresponding to the target picture. The merge operation depends on the BIO label. For example, if the label of a text box is an O label, the text box is a paragraph by itself, and the position of the text box is the range of the text paragraph. If the label of a text box is a B label, the text paragraph starts at the text box carrying the B label and extends to the last of the consecutive I labels that follow it in the sequence; the range of the text paragraph is then given by the positions of all text boxes corresponding to the B label and the consecutive I labels. In this way, all text paragraph ranges in the target picture can be determined from the BIO labels.
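The BIO rule above can be sketched as a single pass over the label sequence. This is a minimal illustration, assuming the labels are the plain strings "B", "I" and "O" in sorted text box order. The function name `group_by_bio` is hypothetical, and the handling of an I label with no open paragraph is an assumption, since the text does not cover that case.

```python
from typing import List


def group_by_bio(labels: List[str]) -> List[List[int]]:
    """Return each paragraph as a list of text box indices (sorted order)."""
    paragraphs: List[List[int]] = []
    open_paragraph = False  # True while the last paragraph can still take I boxes
    for idx, label in enumerate(labels):
        if label == "I" and open_paragraph:
            # Paragraph content: extend the paragraph opened by the B label.
            paragraphs[-1].append(idx)
        else:
            # "O" is a paragraph by itself; "B" opens a new paragraph.
            # Assumption: a stray "I" with no open paragraph is treated like "B".
            paragraphs.append([idx])
            open_paragraph = label in ("B", "I")
    return paragraphs


groups = group_by_bio(["O", "B", "I", "I", "B", "I", "O"])
```

With the example labels, the first and last boxes are standalone paragraphs, boxes 1 to 3 form one paragraph, and boxes 4 and 5 form another.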

That is, after the training result of the deep learning model is obtained, the text label corresponding to each text box, such as a B label, an I label or an O label, can be determined from the training result. The text labels are traversed, and the label information corresponding to the traversed text label, namely the traversal label information, is detected. It is then determined whether the traversal label information is a paragraph (for example, the label information corresponding to an O label), and hence whether the text box corresponding to the traversed text label is a text paragraph. When the traversal label information is a paragraph, the text box corresponding to it is determined to be a text paragraph of the target picture.

However, if the traversal label information is not a paragraph, it must be further checked whether it is paragraph content, such as the label information corresponding to an I label. When the traversal label information is paragraph content, it must be determined whether the label information of the text label preceding the traversed text label is paragraph start information, such as the label information corresponding to a B label. If so, the text paragraph of the target picture can be determined from the text box corresponding to the traversed text label and the text box corresponding to the previous text label. When determining a text paragraph, it is also necessary to detect whether the text labels contain consecutive adjacent content labels, such as consecutive adjacent I labels. If they do, it must be determined whether the currently traversed text label is among those consecutive adjacent content labels. If it is not, the text box corresponding to the traversed text label and the text box corresponding to the previous text label can be merged directly, and the merged text box is taken as the text paragraph corresponding to the target picture.

However, if the traversed text label is among the consecutive adjacent content labels, the whole run of consecutive adjacent content labels containing the traversed text label must be determined, and all text boxes corresponding to that run are merged. For example, if the traversed text label is I4 and the consecutive adjacent content labels are I1 to I5, the run contains I4. The text boxes corresponding to I1, I2, I3, I4 and I5 are merged, and the result is then merged with the text box corresponding to the text label preceding the run, namely the B label. The resulting merged text box can be used directly as a text paragraph corresponding to the picture.
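The text does not state how the position of a merged text box is computed. One plausible reading, shown here purely as an assumption, is the smallest axis-aligned rectangle that encloses all merged boxes. The function name `merge_boxes` and the (x1, y1, x2, y2) layout are hypothetical.

```python
from typing import Iterable, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def merge_boxes(boxes: Iterable[Box]) -> Box:
    """Smallest axis-aligned rectangle enclosing all given text boxes."""
    items = list(boxes)
    if not items:
        raise ValueError("need at least one text box to merge")
    return (
        min(b[0] for b in items),  # leftmost edge
        min(b[1] for b in items),  # topmost edge
        max(b[2] for b in items),  # rightmost edge
        max(b[3] for b in items),  # bottommost edge
    )


# A B-labeled box followed by two I-labeled boxes, merged in one call.
b_box = (10, 10, 200, 30)
i_boxes = [(10, 35, 210, 55), (10, 60, 150, 80)]
paragraph_box = merge_boxes([b_box, *i_boxes])
```

Because the enclosing rectangle is the same regardless of merge order, merging the I boxes first and then adding the B box gives the same result as merging them all at once.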

In this embodiment, the target picture is detected, the text boxes and the text features of each text box are determined, and sequence features formed from the text features are input into the deep learning model for training and prediction. This avoids manually designed rules, supports a variety of complex structures, and achieves higher accuracy. Scattered text line information in the recognition result can also be converted into text paragraphs, which improves the efficiency of document entry and of digitizing paper texts, and better promotes informatization and digitization across industries.

In this embodiment, a target picture is recognized, and all text boxes in the target picture and their positions are determined based on the recognition result; the text boxes are sorted according to their positions, and the text features of the text boxes are input into a preset deep learning model for training based on the sorting result; and the text boxes are merged based on the training result to obtain all text paragraphs corresponding to the target picture. Because the text boxes are sorted by position, their text features are processed by the preset deep learning model, and the merging is driven by the training result, the low paragraph accuracy caused by manual operation is avoided, and the accuracy of text paragraph structure restoration is improved.

Further, on the basis of the first embodiment of the present invention, a second embodiment of the text paragraph structure restoration method is provided. This embodiment refines step S30 of the first embodiment, in which the step of merging the text boxes based on the training result to obtain all text paragraphs corresponding to the target picture includes:

step a, determining text labels corresponding to the text boxes based on the training result of the training, traversing the text labels, and detecting whether traversal label information corresponding to the traversed text labels is a paragraph;

In this embodiment, after the training result of the deep learning model is obtained, the text label corresponding to each text box, such as a B label, an I label or an O label, can be determined from the training result. The text labels are traversed, and the label information corresponding to the traversed text label, namely the traversal label information, is detected. It is then determined whether the traversal label information is a paragraph (for example, the label information corresponding to an O label), and hence whether the text box corresponding to the traversed text label is a text paragraph.

And b, if the traversal label information is a paragraph, determining that a text box corresponding to the traversed text label is the text paragraph corresponding to the target picture.

When the traversal label information is a paragraph, the text box corresponding to it is determined to be a text paragraph of the target picture. In this embodiment, all label information may be checked in the same manner until all text paragraphs are determined.

In this embodiment, the text labels corresponding to the text boxes are determined according to the training result, and the text labels are traversed, so that when the traversal label information of the traversed text labels is a paragraph, the text boxes corresponding to the traversed text labels are used as the text paragraphs corresponding to the target picture, thereby ensuring the accuracy of the obtained text paragraphs.

Further, after the step of detecting whether traversal label information corresponding to the traversed text label is a paragraph, the method includes:

step c, if not, detecting whether the traversal label information is paragraph content;

if the traversal label information is found not to be a paragraph by the judgment, it is further required to check whether the traversal label information is paragraph content, such as label information corresponding to the I label, and execute different operations according to different detection results.

Step d, if the traversal label information is paragraph content, determining whether the label information of a text label before the traversal text label is paragraph initial information;

when the traversal label information is found to be paragraph content by judgment, it is further required to determine whether label information of a previous label of the traversed text label is paragraph start information, such as label information corresponding to the B label, so as to determine a start position of a paragraph where a text frame corresponding to the traversal label information is located, and perform different operations according to different detection results.

And e, if the label information of the previous text label is paragraph starting information, determining the text paragraph corresponding to the target picture based on the traversed text label and the previous text label.

When the label information of the previous text label is not paragraph start information but paragraph content, the runs of consecutive adjacent text labels whose label information is paragraph content are determined, and the target run containing the traversed text label is selected from them. All text boxes corresponding to the target run are then merged with the text box corresponding to the text label preceding the target run to form one text paragraph in the target picture.

If the label information of the previous text label is paragraph start information, it must be determined whether the label information of the text label following the traversed text label is paragraph content. If not, the text box corresponding to the traversed text label and the text box corresponding to the previous text label can be merged directly into one text paragraph in the target picture. If the label information of the next text label is paragraph content, the same check is repeated on each subsequent text label until one is found whose label information is not paragraph content. All adjacent text boxes whose label information is paragraph content, including the one for the traversed text label, are then merged into one text paragraph in the target picture.
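The backward and forward checks described in the two paragraphs above can be sketched as a function that, given the index of a traversed I label, finds the extent of its paragraph. This is an illustrative sketch, assuming labels are the strings "B", "I" and "O". The name `paragraph_span` is hypothetical, the B-labeled box is assumed to be included in the merged paragraph, and returning `None` when no B label precedes the run is an assumption, as the text does not describe that case.

```python
from typing import List, Optional, Tuple


def paragraph_span(labels: List[str], idx: int) -> Optional[Tuple[int, int]]:
    """Inclusive (start, end) indices of the paragraph containing the I label at idx."""
    if labels[idx] != "I":
        return None  # only paragraph-content labels are handled here
    # Walk backward over preceding I labels to the start of the run.
    start = idx
    while start > 0 and labels[start - 1] == "I":
        start -= 1
    # The label just before the run must be the paragraph start (B).
    if start == 0 or labels[start - 1] != "B":
        return None  # assumption: a run without a B label is left unresolved
    start -= 1
    # Walk forward until the next label is no longer paragraph content.
    end = idx
    while end + 1 < len(labels) and labels[end + 1] == "I":
        end += 1
    return (start, end)


labels = ["O", "B", "I", "I", "I", "O"]
span = paragraph_span(labels, 3)
```

Every I label in the same run yields the same span, so a caller that traverses all labels would need to skip indices already covered by a span to avoid emitting a paragraph twice.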

In this embodiment, when it is determined that the traversal tag information is paragraph content and the tag information of the previous text tag of the traversed text tag is paragraph start information, the text paragraph is determined according to the traversed text tag and the previous text tag, so that the accuracy of the obtained text paragraph is ensured.

Specifically, the step of determining the text paragraph corresponding to the target picture based on the traversed text label and the previous text label includes:

step f, detecting whether continuous adjacent content tags exist in the text tags or not;

When determining the text paragraph, it is also necessary to detect whether the text labels contain consecutive adjacent content labels, such as consecutive adjacent I labels, and different operations are executed according to the detection result. If the label information of a text label is paragraph content and the label information of a text label adjacent to it is also paragraph content, both text labels are regarded as consecutive adjacent content labels.

Step g, if continuous adjacent content tags exist, determining whether traversed text tags exist in the continuous adjacent content tags;

When consecutive adjacent content labels are found, possibly in several separate runs, it must be determined for each run whether it contains the traversed text label, and different operations are performed according to the result. If no consecutive adjacent content labels exist, the text box corresponding to the traversed text label is merged directly with the text box corresponding to the previous text label, and the merged result is taken as the text paragraph of the target picture.

Step h, if the traversed text label is not present, merging the text box corresponding to the traversed text label and the text box corresponding to the previous text label to obtain a merged text box, and taking the merged text box as a text paragraph corresponding to the target picture.

That is, if the traversed text label is not among the consecutive adjacent content labels, the text box corresponding to the traversed text label and the text box corresponding to the previous text label are merged directly, and the merged text box is used as the text paragraph corresponding to the target picture.

In this embodiment, when it is determined that consecutive adjacent content labels exist among the text labels and the traversed text label is not among them, the text box corresponding to the traversed text label and the text box corresponding to the previous text label are merged, and the merged text box is used as a text paragraph, so that the accuracy of the obtained text paragraph is ensured.

Further, after the step of determining whether the traversed text label is among the consecutive adjacent content labels, the method further comprises:

Step k, if the traversed text label is present, merging all text boxes corresponding to the consecutive adjacent content labels that include the traversed text label with the text box corresponding to the previous text label to obtain a merged text box, and taking the merged text box as a text paragraph corresponding to the picture.

When the determination finds that the traversed text label is present, all consecutive adjacent content labels in the group containing the traversed text label are determined, and the text boxes corresponding to them are merged. For example, if the traversed text label is I4 and the consecutive adjacent content labels are I1 to I5, the group includes the traversed text label I4. The text boxes corresponding to I1, I2, I3, I4 and I5 are then merged to obtain a merged text box, which is in turn merged with the text box corresponding to the previous text label, that is, the text box corresponding to the B label. The resulting merged text box may be used directly as a text paragraph corresponding to the picture.

In this embodiment, when it is determined that the traversed text label is among the consecutive adjacent content labels, all text boxes corresponding to the consecutive adjacent content labels that include the traversed text label are merged with the text box corresponding to the previous text label, and the merged text box is used as a text paragraph, so that the accuracy of the obtained text paragraph is ensured.
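Steps g, h and k can be sketched as follows. The embodiment does not state how two text boxes are merged geometrically, so this sketch assumes axis-aligned boxes, takes the union rectangle and concatenates the texts; the names TextBox, merge_boxes and paragraph_for are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextBox:
    x0: float  # left
    y0: float  # top
    x1: float  # right
    y1: float  # bottom
    text: str


def merge_boxes(boxes: List[TextBox], sep: str = "") -> TextBox:
    """Merge boxes into their union rectangle and join their texts in order."""
    if not boxes:
        raise ValueError("need at least one text box to merge")
    return TextBox(
        x0=min(b.x0 for b in boxes),
        y0=min(b.y0 for b in boxes),
        x1=max(b.x1 for b in boxes),
        y1=max(b.y1 for b in boxes),
        text=sep.join(b.text for b in boxes),
    )


def paragraph_for(boxes: List[TextBox], labels: List[str], t: int) -> TextBox:
    """Merge the box at index t (a content label) into its paragraph.

    The run of adjacent "I" labels around t is located first. If t has no
    adjacent "I" label, only box t and its predecessor are merged (step h);
    otherwise the whole run is merged with the box before it (step k).
    """
    lo = hi = t
    while lo - 1 >= 0 and labels[lo - 1] == "I":
        lo -= 1
    while hi + 1 < len(labels) and labels[hi + 1] == "I":
        hi += 1
    start = max(lo - 1, 0)  # the box preceding the run, e.g. the B label
    return merge_boxes(boxes[start:hi + 1])


if __name__ == "__main__":
    demo = [TextBox(0, 10 * i, 100 - i, 10 * i + 8, f"line{i}") for i in range(7)]
    print(paragraph_for(demo, ["B", "I", "I", "I", "I", "I", "B"], 4))
```

With the labels B, I1 to I5, B and the traversed label I4, the sketch merges the first six boxes, which corresponds to the example given above.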

Further, the step of inputting the text features of each text box into a preset deep learning model for training based on the sorting result of the sorting includes:

Step m, sequentially extracting the text features of the text boxes, fusing the text features into sequence features according to the sorting result, and inputting the sequence features into the preset deep learning model for training.

In this embodiment, after the text boxes have been obtained and sorted, the text features of each text box, such as the position feature, the language feature and the image feature, may be extracted in turn. After the text features are extracted, they are fused into a sequence feature according to the sorting result of the text boxes, the sequence feature is used as the input of the preset deep learning model, and the output result, that is, the training result, is obtained by training in the preset deep learning model.

In this embodiment, the text features of the text boxes are sequentially extracted and fused, and the fused sequence features are input into the preset deep learning model for training, which ensures that the training is carried out effectively.
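One possible reading of the fusion in step m is sketched below. It assumes that each text box already has three numeric feature vectors stored under the hypothetical keys "position", "language" and "image", and that fusing means concatenating them per box and arranging the boxes in the sorted order; the embodiment does not fix the fusion operation, so this is only an illustration.

```python
from typing import Dict, List, Sequence


def fuse_sequence(
    features: List[Dict[str, Sequence[float]]],
    order: List[int],
) -> List[List[float]]:
    """Build a sequence feature: one concatenated vector per box, in sorted order."""
    sequence: List[List[float]] = []
    for idx in order:
        f = features[idx]
        # Concatenate the three per-box feature vectors into one step vector.
        sequence.append([*f["position"], *f["language"], *f["image"]])
    return sequence


if __name__ == "__main__":
    feats = [
        {"position": [0.0, 50.0], "language": [0.2], "image": [0.9]},
        {"position": [0.0, 10.0], "language": [0.7], "image": [0.1]},
    ]
    # The second box lies higher on the page, so it comes first in the order.
    print(fuse_sequence(feats, order=[1, 0]))
```

The resulting list of vectors has one entry per text box and could be passed to a sequence model as its input.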

Further, the step of sequentially extracting the text features of each text box includes:

Step n, sequentially traversing each text box, extracting the position feature, the language feature and the image feature of the traversed text box, and taking the position feature, the language feature and the image feature as the text features of the traversed text box.

In this embodiment, when extracting the text features of all the text boxes, the text boxes may be traversed in turn, the position feature, the language feature and the image feature of the traversed text box are extracted, and these features are then used as the text features of the traversed text box. That is, the same extraction is applied to every text box. The position features may be the vertex coordinates, the center point, the width and the height of the text box. The language features may be language model features of the text in the text box, such as word vectors, sentence vectors and n-gram scores of the text. The image features may be features extracted from the text region in the image using a convolutional neural network or the like.

In this embodiment, the position feature, the language feature and the image feature of the traversed text box are extracted and used as the text feature of the traversed text box, so that the effectiveness of the acquired text feature is guaranteed.
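The position feature can be illustrated with a short sketch. It assumes that a text box is given as four (x, y) vertices, and that the width and height are taken from the axis-aligned extent of those vertices; the function name position_features and the ordering of the output vector are assumptions. Language and image features depend on external models and are not sketched here.

```python
from typing import List, Sequence, Tuple


def position_features(vertices: Sequence[Tuple[float, float]]) -> List[float]:
    """Return vertex coordinates, center point, width and height of a text box."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    # The center is taken as the mean of the vertices.
    center_x = sum(xs) / len(xs)
    center_y = sum(ys) / len(ys)
    flat = [coord for vertex in vertices for coord in vertex]
    return flat + [center_x, center_y, width, height]


if __name__ == "__main__":
    print(position_features([(0, 0), (10, 0), (10, 4), (0, 4)]))
```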

In addition, referring to fig. 3, an embodiment of the present invention further provides a text paragraph structure restoring apparatus, where the text paragraph structure restoring apparatus includes:

a determining module A10, configured to identify a target picture, and determine, based on an identification result of the identification, all text boxes and text box positions of the text boxes in the target picture;

an input module A20, configured to sort the text boxes according to the positions of the text boxes, and input the text features of the text boxes to a preset deep learning model for training based on the sorting result of the sorting;

an obtaining module A30, configured to perform merging processing on each text box based on the training result of the training to obtain all text paragraphs corresponding to the target picture.
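The cooperation of the three modules can be sketched as a simple pipeline. The class name ParagraphRestorer and its three callables are hypothetical stand-ins for modules A10, A20 and A30; the actual recognition, model and merging logic are left to the caller.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class ParagraphRestorer:
    # A10: picture -> text boxes with their positions
    determine: Callable[[Any], List[Any]]
    # A20: text boxes -> one label per box (sorting and the model live here)
    label: Callable[[List[Any]], List[str]]
    # A30: text boxes and labels -> merged text paragraphs
    merge: Callable[[List[Any], List[str]], List[Any]]

    def run(self, picture: Any) -> List[Any]:
        """Run the three modules in order on one target picture."""
        boxes = self.determine(picture)
        labels = self.label(boxes)
        return self.merge(boxes, labels)


if __name__ == "__main__":
    restorer = ParagraphRestorer(
        determine=lambda picture: ["a", "b", "c"],
        label=lambda boxes: ["B", "I", "B"],
        merge=lambda boxes, labels: [boxes[0] + boxes[1], boxes[2]],
    )
    print(restorer.run(None))
```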

Further, the obtaining module A30 is further configured to:

if not, detecting whether the traversal label information is paragraph content;

Further, the obtaining module A30 is further configured to:

Further, the input module A20 is further configured to:

The steps implemented by each functional module of the text paragraph structure restoring device may refer to each embodiment of the text paragraph structure restoring method of the present invention, and are not described herein again.

The present invention also provides a text paragraph structure restoring device, including: a memory, a processor, and a text paragraph structure restoring program stored on the memory; the processor is configured to execute the text paragraph structure restoring program to implement the following steps:

The present invention also provides a computer-readable storage medium storing one or more programs, which are further executable by one or more processors for implementing the steps of the embodiments of the text paragraph structure restoring method described above.

The specific implementation manner of the computer-readable storage medium of the present invention is substantially the same as that of each embodiment of the text paragraph structure restoring method described above, and is not described herein again.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.

The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims

1. A text paragraph structure reduction method is characterized by comprising the following steps:

2. The method for restoring a text paragraph structure according to claim 1, wherein the step of merging each text box based on the training result of the training to obtain all text paragraphs corresponding to the target picture comprises:

3. The method for text paragraph structure recovery according to claim 2, wherein the step of detecting whether the traversal label information corresponding to the traversed text label is a paragraph comprises:

if not, detecting whether the traversal label information is paragraph content;

4. The method of claim 3, wherein the step of determining the text paragraph corresponding to the target picture based on the traversed text label and the previous text label comprises:

5. The text paragraph structure reduction method of claim 4 wherein the step of determining whether there is a traversed text label in the consecutive adjacent content labels is followed by:

6. The method for text paragraph structure reduction according to any one of claims 1-5, wherein the step of inputting the text features of each text box into a preset deep learning model for training based on the sorted result comprises:

7. The method for restoring a text paragraph structure as claimed in claim 6, wherein the step of sequentially extracting the text features of each text box comprises:

8. A text paragraph structure restoring apparatus, comprising:

9. A text paragraph structure restoring apparatus, characterized in that the text paragraph structure restoring apparatus comprises: a memory, a processor and a text paragraph structure reduction program stored on the memory and executable on the processor, the text paragraph structure reduction program when executed by the processor implementing the steps of the text paragraph structure reduction method as claimed in any one of claims 1 to 7.

10. A computer storage medium having stored thereon a text paragraph structure restoring program that, when executed by a processor, performs the steps of the text paragraph structure restoring method according to any one of claims 1 to 7.