CN112070076A - Text paragraph structure reduction method, device, equipment and computer storage medium - Google Patents

Text paragraph structure reduction method, device, equipment and computer storage medium

Info

Publication number
CN112070076A
Authority
CN
China
Prior art keywords
text
paragraph
label
boxes
traversed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011264865.0A
Other languages
Chinese (zh)
Other versions
CN112070076B (en
Inventor
高超
徐国强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Smart Technology Co Ltd
OneConnect Financial Technology Co Ltd Shanghai
Original Assignee
OneConnect Financial Technology Co Ltd Shanghai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by OneConnect Financial Technology Co Ltd Shanghai filed Critical OneConnect Financial Technology Co Ltd Shanghai
Priority to CN202011264865.0A priority Critical patent/CN112070076B/en
Publication of CN112070076A publication Critical patent/CN112070076A/en
Application granted granted Critical
Publication of CN112070076B publication Critical patent/CN112070076B/en
Priority to PCT/CN2021/124605 priority patent/WO2022100376A1/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/412 Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)
  • Character Input (AREA)

Abstract

The invention relates to the technical field of image processing, and discloses a text paragraph structure restoration method, device, equipment and computer storage medium. The method comprises the following steps: recognizing a target picture, and determining all text boxes in the target picture and the position of each text box based on the recognition result; sorting the text boxes according to their positions, and inputting the text features of the text boxes into a preset deep learning model for training based on the sorting result; and merging the text boxes based on the training result to obtain all text paragraphs corresponding to the target picture. The method and the device improve the accuracy of text paragraph structure restoration.

Description

Text paragraph structure reduction method, device, equipment and computer storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method, an apparatus, a device, and a computer-readable storage medium for text paragraph structure reduction.
Background
When a paper document is digitized, the document must be entered while preserving its original format, but existing detection and recognition methods based on text lines cannot directly obtain text paragraph information. At present there are two approaches. The top-down approach first analyzes the layout of the whole page and segments it into paragraphs, and then detects and recognizes the text lines within each paragraph area. This approach cannot capture the detailed features of local characters during layout analysis and uses only the picture information without the text content information, so its accuracy is low. The bottom-up approach first detects the text lines and then combines them into paragraphs. It mainly merges text boxes according to the compatibility of their positions, using rules or heuristic algorithms; it requires a large number of manually extracted features and has difficulty making use of text content information, so its accuracy is also not high.
Disclosure of Invention
The invention mainly aims to provide a text paragraph structure reduction method, a text paragraph structure reduction device, text paragraph structure reduction equipment and a computer storage medium, and aims to solve the technical problem of how to improve the accuracy of text paragraph structure reduction.
In order to achieve the above object, the present invention provides a text paragraph structure restoring method, including:
identifying a target picture, and determining all text boxes and text box positions of all text boxes in the target picture based on the identified identification result;
sequencing the text boxes according to the positions of the text boxes, and inputting the text features of the text boxes to a preset deep learning model for training based on the sequencing result of the sequencing;
and merging the text boxes based on the training result of the training to obtain all text paragraphs corresponding to the target picture.
Optionally, the step of merging the text boxes based on the training result of the training to obtain all text paragraphs corresponding to the target picture includes:
determining text labels corresponding to the text boxes based on the training result of the training, traversing the text labels, and detecting whether traversal label information corresponding to the traversed text labels is a paragraph;
and if the traversal label information is a paragraph, determining that a text box corresponding to the traversed text label is a text paragraph corresponding to the target picture.
Optionally, after the step of detecting whether traversal label information corresponding to a traversed text label is a paragraph, the method includes:
if not, detecting whether the traversal label information is paragraph content;
if the traversal label information is paragraph content, determining whether the label information of a text label before the traversal text label is paragraph starting information;
and if the label information of the previous text label is paragraph starting information, determining the text paragraph corresponding to the target picture based on the traversed text label and the previous text label.
Optionally, the step of determining a text paragraph corresponding to the target picture based on the traversed text label and the previous text label includes:
detecting whether continuous adjacent content tags exist in the text tags or not;
if continuous adjacent content tags exist, determining whether traversed text tags exist in the continuous adjacent content tags;
if the traversed text label does not exist, combining the text box corresponding to the traversed text label and the text box corresponding to the previous text label to obtain a combined text box, and taking the combined text box as a text paragraph corresponding to the target picture.
Optionally, after the step of determining whether there is a traversed text label in the consecutive adjacent content labels, the method includes:
if the traversed text labels exist, combining all text boxes corresponding to the continuous adjacent content labels with the traversed text labels with the text box corresponding to the previous text label to obtain a combined text box, and taking the combined text box as a text paragraph corresponding to the picture.
Optionally, the step of inputting the text features of each text box into a preset deep learning model for training based on the sorted sorting result includes:
and sequentially extracting text features of the text boxes, fusing the text features into sequence features according to the sequencing result, and inputting the sequence features into a preset deep learning model for training.
Optionally, the step of sequentially extracting text features of each text box includes:
and traversing each text box in sequence, extracting the position feature, the language feature and the image feature of the traversed text box, and taking the position feature, the language feature and the image feature as the text feature of the traversed text box.
In addition, to achieve the above object, the present invention further provides a text paragraph structure restoring apparatus, including:
the determining module is used for identifying a target picture and determining all text boxes and text box positions of all the text boxes in the target picture based on the identified identification result;
the input module is used for sequencing the text boxes according to the positions of the text boxes and inputting the text features of the text boxes to a preset deep learning model for training based on the sequencing result of the sequencing;
and the obtaining module is used for carrying out merging processing on each text box based on the training result of the training so as to obtain all text paragraphs corresponding to the target picture.
In addition, in order to achieve the above object, the present invention further provides a text paragraph structure restoring apparatus;
the text paragraph structure restoring apparatus includes: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein:
the computer program when executed by the processor implements the steps of the text paragraph structure restoring method as described above.
In addition, to achieve the above object, the present invention also provides a computer-readable storage medium;
the computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the text paragraph structure restoring method as described above.
According to the method, a target picture is identified, and all text boxes and the text box positions of all the text boxes in the target picture are determined based on the identified identification result; sequencing the text boxes according to the positions of the text boxes, and inputting the text features of the text boxes to a preset deep learning model for training based on the sequencing result of the sequencing; and merging the text boxes based on the training result of the training to obtain all text paragraphs corresponding to the target picture. The text boxes and the positions of the text boxes are determined according to the recognition result of the target picture, the text boxes are sequenced according to the positions of the text boxes, the text characteristics of the text boxes are input into a preset deep learning model for training, and the text boxes are combined based on the training result to obtain the text paragraphs, so that the phenomenon that the accuracy of the obtained text paragraphs is low due to manual operation of a user is avoided, and the accuracy of text paragraph structure reduction is improved.
Drawings
Fig. 1 is a schematic structural diagram of a text paragraph structure restoring device of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a text paragraph structure reduction method according to a first embodiment of the present invention;
FIG. 3 is a functional block diagram of the text paragraph structure recovery apparatus according to the present invention;
fig. 4 is a schematic diagram of text box ordering in the text paragraph structure reduction method according to the present invention.
The objects, features and advantages of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in fig. 1, fig. 1 is a schematic structural diagram of a text paragraph structure restoring device of a hardware operating environment according to an embodiment of the present invention.
As shown in fig. 1, the text paragraph structure restoring apparatus may include: a processor 1001, such as a CPU, a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard), and may optionally also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a storage device separate from the processor 1001.
Optionally, the text paragraph structure restoring device may further include a camera, a Radio Frequency (RF) circuit, sensors, an audio circuit, a WiFi module, and the like. The sensors may be, for example, light sensors, motion sensors, and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor, wherein the ambient light sensor may adjust the brightness of the display screen according to the brightness of ambient light. Of course, the text paragraph structure restoring device may also be configured with other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which are not described herein again.
Those skilled in the art will appreciate that the structure of the text paragraph structure reducing device shown in fig. 1 does not constitute a limitation of the text paragraph structure reducing device and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
As shown in fig. 1, a memory 1005, which is a kind of computer storage medium, may include therein an operating system, a network communication module, a user interface module, and a text paragraph structure restoring program.
In the text paragraph structure restoring apparatus shown in fig. 1, the network interface 1004 is mainly used for connecting to a background server and performing data communication with the background server; the user interface 1003 is mainly used for connecting a client (user side) and performing data communication with the client; the processor 1001 may be configured to call the text paragraph structure restoring program stored in the memory 1005, and execute the text paragraph structure restoring method provided by the embodiment of the present invention.
Referring to fig. 2, the present invention provides a text paragraph structure reduction method, in an embodiment of the text paragraph structure reduction method, the text paragraph structure reduction method includes the following steps:
step S10, identifying a target picture, and determining all text boxes and the text box positions of all the text boxes in the target picture based on the identification result;
in this embodiment, when text paragraphs in a target picture need to be restored, text line information (i.e., text boxes) obtained by detecting and identifying the target picture may be converted into sequence features, the sequence features are input into a preset deep learning model, sequence labeling is performed through the deep learning model, so as to obtain a category of each text box, and the categories are respectively combined to obtain a specific text paragraph.
Therefore, in this embodiment, the target picture may be recognized first. The recognition may be performed with a text recognition technology, such as OCR (Optical Character Recognition), or with another text recognition model, to determine whether text content exists in the target picture. If text content exists and is distributed at different positions, the position of each text box (for example x1, y1, x2, and y2) and the text content information in the target picture may be obtained from the OCR recognition result. After the target picture has been recognized and the text boxes and their positions have been obtained, it is detected whether there is only one text box. If so, that text box is taken directly as the text paragraph in the target picture, and its position is the position of the text paragraph. If a plurality of text boxes are detected, the text paragraphs in the target picture need to be determined according to the text boxes and the corresponding text box positions.
Step S20, sequencing each text box according to the position of each text box, and inputting the text characteristics of each text box into a preset deep learning model for training based on the sequencing result of the sequencing;
After the text boxes and their positions are obtained, the text boxes can be numbered according to their positions. The numbering order can be set according to the requirements of the user; for example, numbering may start from the top of the target picture and continue until every text box has its own number. The text boxes are then sorted by number to obtain the sorted text boxes. For example, as shown in FIG. 4, the text boxes may be numbered and ordered from top to bottom and from left to right, such as 1-12, according to their positions.
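The ordering step can be sketched as follows. This is only a minimal sketch, assuming each text box is an axis-aligned rectangle (x1, y1, x2, y2) with the origin at the top-left corner of the picture. The TextBox type, the sort_boxes function and the row_tolerance parameter are hypothetical names introduced for illustration and do not come from the patent. Boxes whose top edges lie within the tolerance of each other are treated as one row and ordered from left to right.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextBox:
    # Hypothetical container: corner coordinates plus the recognized text.
    x1: float
    y1: float
    x2: float
    y2: float
    text: str = ""


def sort_boxes(boxes: List[TextBox], row_tolerance: float = 5.0) -> List[TextBox]:
    """Order boxes top to bottom, then left to right within each row."""
    remaining = sorted(boxes, key=lambda b: (b.y1, b.x1))
    ordered: List[TextBox] = []
    while remaining:
        # The topmost remaining box defines the current row.
        row_top = remaining[0].y1
        row = [b for b in remaining if b.y1 - row_top <= row_tolerance]
        remaining = [b for b in remaining if b.y1 - row_top > row_tolerance]
        ordered.extend(sorted(row, key=lambda b: b.x1))
    return ordered
```

The position of a box in the returned list plays the role of its number. A fixed pixel tolerance is a simplification; multi-column layouts would likely need a different ordering rule.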
After the text boxes have been sorted and the sorting result obtained, the text features of each text box, such as position features, language features and image features, can be obtained in sequence. The position features may be the vertex coordinates, the center point, the width and the height of the text box. The language features may be language model features of the text in the text box, such as word vectors, sentence vectors, and n-gram scores of the text. The image features may be features extracted from the text regions of the image using, for example, a convolutional neural network. The acquired text features are input as sequence features into the preset, previously trained deep learning model to obtain the training result.
The deep learning model may be trained as follows. A large number of document pictures are collected, and an OCR system performs character detection and recognition to obtain the three types of features, namely position, language and image features, of each text box in the document pictures. Each text box in the document pictures is then annotated manually, that is, it is marked which part of a text paragraph the text box belongs to, giving the manual annotation result of each text box. The three types of features of each text box are input into the deep learning model for training to obtain the model output, namely the label (carrying text paragraph information) of each text box in the document picture. The labels are traversed in sequence and each is compared with the corresponding manual annotation result; if they differ, the deep learning model is optimized, for example by gradient descent, until optimization is complete. The resulting trained model is the preset deep learning model. The deep learning model can be any of various sequence models, such as a recurrent neural network model or a convolutional neural network model.
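As an illustration of the position features, the following minimal sketch derives the vertex coordinates, center point, width and height from a box given as (x1, y1, x2, y2). Dividing by the picture size is an assumption made here so that the values are comparable across pictures; the patent does not specify any normalization, and the function name position_features is hypothetical. The language and image features would be computed separately and concatenated with this vector.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), top-left origin


def position_features(box: Box, image_width: float, image_height: float) -> List[float]:
    """Return corner, center, width and height features scaled to [0, 1]."""
    x1, y1, x2, y2 = box
    width = x2 - x1
    height = y2 - y1
    center_x = x1 + width / 2
    center_y = y1 + height / 2
    return [
        x1 / image_width, y1 / image_height,              # top-left vertex
        x2 / image_width, y2 / image_height,              # bottom-right vertex
        center_x / image_width, center_y / image_height,  # center point
        width / image_width, height / image_height,       # box size
    ]
```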
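One way the manual annotation could be turned into training targets is sketched below. It assumes, hypothetically, that the annotation is stored as one paragraph identifier per sorted text box and that boxes of the same paragraph are contiguous in the sorted order. The target scheme is the one this description uses for the model output: B for the first box of a multi-box paragraph, I for each following box of that paragraph, and O for a box that is a paragraph on its own. The function name is illustrative only.

```python
from typing import List


def paragraph_ids_to_labels(paragraph_ids: List[int]) -> List[str]:
    """Convert per-box paragraph identifiers into B / I / O labels."""
    labels: List[str] = []
    count = len(paragraph_ids)
    for i, pid in enumerate(paragraph_ids):
        same_as_previous = i > 0 and paragraph_ids[i - 1] == pid
        same_as_next = i + 1 < count and paragraph_ids[i + 1] == pid
        if same_as_previous:
            labels.append("I")   # continues the paragraph opened earlier
        elif same_as_next:
            labels.append("B")   # first box of a multi-box paragraph
        else:
            labels.append("O")   # single-box paragraph
    return labels
```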
Step S30, merging each text box based on the training result of the training to obtain all text paragraphs corresponding to the target picture.
After the training result is obtained, the label of each text box, such as a BIO label, can be obtained, and the text boxes are merged according to their labels to obtain all text paragraphs corresponding to the target picture. The merge operation depends on the BIO tag. For example, if the label of a text box is an O label, the text box is itself a paragraph, and the position of the text box is the range of the text paragraph. If the label of a text box is a B label, the text paragraph starts at the text box with the B label and extends to the last of the consecutive I labels that follow it in the sequence; the range of the text paragraph is then given by the positions of all text boxes corresponding to the B label and the consecutive I labels. All text paragraph ranges in the target picture can be determined from the BIO tags in this way.
That is, after the training result of the deep learning model is obtained, the text label corresponding to each text box, such as a B label, an I label or an O label, can be determined according to the training result. The text labels are traversed, and the label information corresponding to the traversed text label, namely the traversal label information, is detected. It is then determined whether the traversal label information is a paragraph (such as the label information corresponding to an O label), so as to determine whether the text box corresponding to the traversed text label is a text paragraph. When the traversal label information is found to be a paragraph, the text box corresponding to the traversal label information is determined to be a text paragraph corresponding to the target picture.
However, if the traversal label information is not a paragraph, it must further be checked whether the traversal label information is paragraph content, such as the label information corresponding to an I label. When the traversal label information is found to be paragraph content, it must be determined whether the label information of the text label preceding the traversed text label is paragraph start information, such as the label information corresponding to a B label. If so, the text paragraph of the target picture can be determined from the text box corresponding to the traversed text label and the text box corresponding to the previous text label. When determining the text paragraph, it is also necessary to detect whether consecutive adjacent content labels, such as consecutive adjacent I labels, exist among the text labels. If they exist, it must be judged whether the currently traversed text label is among them; if it is not, the text box corresponding to the traversed text label and the text box corresponding to the previous text label can be merged directly, and the merged text box is used as the text paragraph corresponding to the target picture.
However, if the traversed text label is among the consecutive adjacent content labels, all consecutive adjacent content labels that include the traversed text label must be determined, and all text boxes corresponding to them are merged. For example, if the traversed text label is I4 and the consecutive adjacent text labels are I1-I5, the consecutive adjacent text labels include the traversed text label I4. The text boxes corresponding to all of these labels, namely I1, I2, I3, I4 and I5, are merged to obtain a merged text box, which is then merged with the text box corresponding to the previous text label, namely the B label. The resulting merged text box can be used directly as a text paragraph corresponding to the picture.
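The merge rule for the labels can be sketched in a single pass over the sorted label sequence. This is a minimal sketch rather than the patented implementation: the function name group_by_labels is hypothetical, and treating an I label that has no open B paragraph as a paragraph of its own is an assumption, since the description does not say how such a label should be handled.

```python
from typing import List


def group_by_labels(labels: List[str]) -> List[List[int]]:
    """Group box indices into paragraphs from a sequence of B / I / O labels."""
    paragraphs: List[List[int]] = []
    current: List[int] = []  # indices of the paragraph opened by the last B
    for index, label in enumerate(labels):
        if label == "I" and current:
            current.append(index)  # extend the open paragraph
            continue
        if current:
            paragraphs.append(current)  # any other label closes the open paragraph
            current = []
        if label == "B":
            current = [index]
        else:
            # "O", or a stray "I" with no preceding "B" (assumption: standalone)
            paragraphs.append([index])
    if current:
        paragraphs.append(current)
    return paragraphs
```

Each inner list holds the indices of the text boxes that form one paragraph, in reading order.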
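The description does not state how the geometry of a merged text box is computed. A natural choice, assumed here purely for illustration, is the smallest axis-aligned rectangle that encloses all of the boxes being merged; the function name merge_boxes is hypothetical.

```python
from typing import Iterable, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), top-left origin


def merge_boxes(boxes: Iterable[Box]) -> Box:
    """Return the smallest axis-aligned box enclosing every input box."""
    box_list = list(boxes)
    if not box_list:
        raise ValueError("at least one box is required")
    return (
        min(b[0] for b in box_list),  # leftmost edge
        min(b[1] for b in box_list),  # topmost edge
        max(b[2] for b in box_list),  # rightmost edge
        max(b[3] for b in box_list),  # bottommost edge
    )
```

Because the enclosing rectangle does not depend on the order of merging, combining the I boxes first and then adding the B box gives the same result as merging all of them at once.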
In the embodiment, the target picture is detected, the text features of each text box and each text box are determined, the sequence features are formed according to each text feature and input into the deep learning model for training and prediction, so that the manual design rule can be avoided, various complex structures can be supported, and higher accuracy can be obtained. And scattered text line information in a text result can be converted into text paragraphs, so that the efficiency of document entry and electronization of paper texts is improved, and the development of informatization and digitization of various industries is promoted better.
In the embodiment, by identifying a target picture, determining all text boxes and text box positions of the text boxes in the target picture based on the identification result; sequencing the text boxes according to the positions of the text boxes, and inputting the text features of the text boxes to a preset deep learning model for training based on the sequencing result of the sequencing; and merging the text boxes based on the training result of the training to obtain all text paragraphs corresponding to the target picture. The text boxes and the positions of the text boxes are determined according to the recognition result of the target picture, the text boxes are sequenced according to the positions of the text boxes, the text characteristics of the text boxes are input into a preset deep learning model for training, and the text boxes are combined based on the training result to obtain the text paragraphs, so that the phenomenon that the accuracy of the obtained text paragraphs is low due to manual operation of a user is avoided, and the accuracy of text paragraph structure reduction is improved.
Further, on the basis of the first embodiment of the present invention, a second embodiment of the text paragraph structure restoring method according to the present invention is provided. This embodiment refines step S30 of the first embodiment, in which the step of merging each text box based on the training result of the training to obtain all text paragraphs corresponding to the target picture includes:
step a, determining text labels corresponding to the text boxes based on the training result of the training, traversing the text labels, and detecting whether traversal label information corresponding to the traversed text labels is a paragraph;
in this embodiment, after the training result of the deep learning model is obtained, text labels, such as a B label, an I label, an O label, and the like, corresponding to each text box may be determined according to the training result. And traversing the text labels corresponding to the text boxes, and detecting label information corresponding to the traversed text labels, namely traversing the label information. And then determining whether the traversal label information is a paragraph, so as to determine whether a text box corresponding to the traversed text label is a text paragraph according to the determination result, such as label information corresponding to an O label.
And b, if the traversal label information is a paragraph, determining that a text box corresponding to the traversed text label is the text paragraph corresponding to the target picture.
When the traversal label information is found to be a paragraph, the text box corresponding to the traversal label information is determined to be a text paragraph corresponding to the target picture. In this embodiment, all the label information may be detected in the same manner until all the text paragraphs are determined.
In this embodiment, the text labels corresponding to the text boxes are determined according to the training result, and the text labels are traversed, so that when the traversal label information of the traversed text labels is a paragraph, the text boxes corresponding to the traversed text labels are used as the text paragraphs corresponding to the target picture, thereby ensuring the accuracy of the obtained text paragraphs.
Further, after the step of detecting whether traversal label information corresponding to the traversed text label is a paragraph, the method includes:
step c, if not, detecting whether the traversal label information is paragraph content;
if the traversal label information is found not to be a paragraph by the judgment, it is further required to check whether the traversal label information is paragraph content, such as label information corresponding to the I label, and execute different operations according to different detection results.
Step d, if the traversal label information is paragraph content, determining whether the label information of a text label before the traversal text label is paragraph initial information;
When the traversal label information is found to be paragraph content, it must further be determined whether the label information of the text label preceding the traversed text label is paragraph start information, such as the label information corresponding to a B label, so as to determine the start position of the paragraph containing the text box that corresponds to the traversal label information. Different operations are performed according to the detection result.
And e, if the label information of the previous text label is paragraph starting information, determining the text paragraph corresponding to the target picture based on the traversed text label and the previous text label.
When the label information of the previous text label is found to be not paragraph start information but paragraph content, all consecutive adjacent labels whose label information is paragraph content are determined to be consecutive adjacent content text labels. The target consecutive adjacent content text labels, namely those that include the traversed text label, are determined from among them, and all text boxes corresponding to the target consecutive adjacent content text labels are merged with the text box corresponding to the text label immediately preceding them to form one text paragraph in the target picture.
If the label information of the previous text label is paragraph start information, it must be determined whether the label information of the text label following the traversed text label is paragraph content. If not, the text box corresponding to the traversed text label and the text box corresponding to the previous text label can be merged directly and used as one text paragraph in the target picture. If the label information of the following text label is paragraph content, the same check is repeated on each subsequent text label until a label whose information is not paragraph content is reached; all adjacent text boxes whose label information is paragraph content, including the one for the traversed text label, are then merged and used as one text paragraph in the target picture.
In this embodiment, when it is determined that the traversal tag information is paragraph content and the tag information of the previous text tag of the traversed text tag is paragraph start information, the text paragraph is determined according to the traversed text tag and the previous text tag, so that the accuracy of the obtained text paragraph is ensured.
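The backward and forward checks described in steps c to e can be sketched as follows. This is only an illustrative sketch, not the implementation of the invention: the function name `paragraph_span` and the plain-string label encoding are assumptions; only the B label (paragraph start) and the I label (paragraph content) come from the description.

```python
from typing import List, Optional, Tuple

B, I = "B", "I"  # paragraph start / paragraph content, as named in the description


def paragraph_span(tags: List[str], idx: int) -> Optional[Tuple[int, int]]:
    """Return (start, end) indices of the paragraph containing tags[idx].

    Hypothetical helper: it only handles a traversed label that is paragraph
    content (I) and returns None when no B label begins the run.
    """
    if tags[idx] != I:
        return None
    # Walk backwards over consecutive I labels to the first label of the run.
    start = idx
    while start > 0 and tags[start - 1] == I:
        start -= 1
    # The label just before the run must be paragraph start information.
    if start == 0 or tags[start - 1] != B:
        return None
    start -= 1  # include the B label itself
    # Walk forwards while the next label is still paragraph content.
    end = idx
    while end + 1 < len(tags) and tags[end + 1] == I:
        end += 1
    return start, end
```

The returned index pair covers the B label and every I label of the run, so the corresponding text boxes are exactly those that the description merges into one text paragraph.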
Specifically, the step of determining the text paragraph corresponding to the target picture based on the traversed text label and the previous text label includes:
Step f, detecting whether consecutive adjacent content labels exist among the text labels;
When determining the text paragraph, it is also necessary to detect whether consecutive adjacent content labels, such as consecutive adjacent I labels, exist among the text labels of the text boxes, and different operations are executed according to the detection result. If the label information of a text label is paragraph content and the label information of a text label adjacent to it is also paragraph content, both text labels are regarded as consecutive adjacent content labels.
Step g, if consecutive adjacent content labels exist, determining whether the traversed text label exists among the consecutive adjacent content labels;
When consecutive adjacent content labels are found, possibly forming several separate runs, it is determined for each run whether it contains the traversed text label, and different operations are executed according to the determination result. If no consecutive adjacent content labels exist, the text box corresponding to the traversed text label is merged directly with the text box corresponding to the previous text label, and the merged result is used as the text paragraph of the target picture.
Step h, if the traversed text label does not exist among them, merging the text box corresponding to the traversed text label and the text box corresponding to the previous text label to obtain a merged text box, and taking the merged text box as the text paragraph corresponding to the target picture.
That is, if the traversed text label is not among the consecutive adjacent content labels, the text box corresponding to the traversed text label and the text box corresponding to the previous text label are merged directly, and the merged text box is used as the text paragraph corresponding to the target picture.
In this embodiment, when it is determined that consecutive adjacent content labels exist among the text labels and the traversed text label is not among them, the text box corresponding to the traversed text label and the text box corresponding to the previous text label are merged, and the merged text box is used as a text paragraph, which ensures the accuracy of the obtained text paragraph.
Further, after the step of determining whether the traversed text label exists among the consecutive adjacent content labels, the method includes:
Step k, if the traversed text label exists among them, merging all text boxes corresponding to the consecutive adjacent content labels that contain the traversed text label with the text box corresponding to the previous text label to obtain a merged text box, and taking the merged text box as the text paragraph corresponding to the picture.
When the traversed text label is found among the consecutive adjacent content labels, the whole run of adjacent labels containing it is determined, and all text boxes corresponding to that run are merged. For example, if the traversed text label is I4 and the consecutive adjacent text labels are I1-I5, the run contains the traversed text label I4. The text boxes corresponding to I1, I2, I3, I4 and I5 are then merged, and the result is merged with the text box corresponding to the previous text label, that is, the text box corresponding to the B label. The resulting merged text box can be used directly as the text paragraph corresponding to the picture.
In this embodiment, when it is determined that the traversed text label exists among the consecutive adjacent content labels, all text boxes corresponding to the run containing the traversed text label are merged with the text box corresponding to the previous text label, and the merged text box is used as a text paragraph, which ensures the accuracy of the obtained text paragraph.
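The merging in steps h and k can be illustrated with a short sketch. Several points are assumptions made for illustration only: the `TextBox` fields, the label "S" for a text box that is a paragraph by itself (the description above only names the B and I labels), merging as the union of the bounding rectangles with the texts concatenated, and treating an I label with no open paragraph as the start of a new one.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextBox:
    x0: float
    y0: float
    x1: float
    y1: float
    text: str


def merge_boxes(boxes: List[TextBox], sep: str = "") -> TextBox:
    """Merge boxes into one: union of the rectangles, texts joined in order."""
    return TextBox(
        x0=min(b.x0 for b in boxes),
        y0=min(b.y0 for b in boxes),
        x1=max(b.x1 for b in boxes),
        y1=max(b.y1 for b in boxes),
        text=sep.join(b.text for b in boxes),
    )


def restore_paragraphs(boxes: List[TextBox], tags: List[str]) -> List[TextBox]:
    """Group sorted text boxes into paragraphs from their B/I/S labels."""
    paragraphs: List[TextBox] = []
    current: List[TextBox] = []
    for box, tag in zip(boxes, tags):
        if tag == "I" and current:
            current.append(box)  # extend the paragraph that is open
            continue
        if current:  # close the paragraph that was open
            paragraphs.append(merge_boxes(current))
            current = []
        if tag == "S":
            paragraphs.append(box)  # a box that is a paragraph on its own
        else:
            current = [box]  # "B" (or a stray "I") opens a new paragraph
    if current:
        paragraphs.append(merge_boxes(current))
    return paragraphs
```

With the labels B, I1, I2, I3, I4, I5 from the example above, the six text boxes come out as a single merged text box, whichever of the I labels is currently being traversed.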
Further, the step of inputting the text features of each text box into a preset deep learning model for training based on the sorting result includes:
Step m, sequentially extracting the text features of the text boxes, fusing the text features into a sequence feature according to the sorting result, and inputting the sequence feature into the preset deep learning model for training.
In this embodiment, after the text boxes are obtained and sorted, the text features of each text box, such as the position feature, the language feature and the image feature, are extracted in turn. The extracted text features are then fused into a sequence feature according to the sorting result of the text boxes, the sequence feature is used as the input of the preset deep learning model, and the output of the model is taken as the training result.
In this embodiment, the text features of the text boxes are extracted in turn and fused, and the fused sequence feature is input into the preset deep learning model for training, which ensures that the training is carried out effectively.
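One simple way to read "fusing the text features into a sequence feature according to the sorting result" is to concatenate the three feature groups of each text box and stack the resulting vectors in sorted order. The sketch below assumes exactly that; the description does not specify the fusion operator, and the function name, the dictionary layout and the integer box identifiers are hypothetical.

```python
from typing import Dict, List, Sequence


def fuse_sequence_features(
    features: Dict[int, Dict[str, Sequence[float]]],
    order: List[int],
) -> List[List[float]]:
    """Concatenate each box's feature groups, then stack them in sorted order.

    `features` maps a box id to its "position", "language" and "image"
    feature vectors; `order` lists the box ids in the sorted order.
    """
    sequence: List[List[float]] = []
    for box_id in order:
        f = features[box_id]
        fused = list(f["position"]) + list(f["language"]) + list(f["image"])
        sequence.append(fused)
    return sequence
```

The result has one row per text box, in reading order, which is the shape a sequence labeling model would typically take as input.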
Further, the step of sequentially extracting the text features of each text box includes:
Step n, sequentially traversing each text box, extracting the position feature, the language feature and the image feature of the traversed text box, and taking the position feature, the language feature and the image feature as the text features of the traversed text box.
In this embodiment, when extracting the text features of all the text boxes, the text boxes are traversed in turn, the position feature, the language feature and the image feature of the traversed text box are extracted, and these three features are used as the text features of the traversed text box. The same extraction is applied to every text box. The position features may be the vertex coordinates, the center point, the width and the height of the text box. The language features may be language model features of the text in the text box, such as word vectors, sentence vectors and text n-gram scores. The image features may be features extracted from the text region of the image using a convolutional neural network or the like.
In this embodiment, the position feature, the language feature and the image feature of the traversed text box are extracted and used as the text features of the traversed text box, which ensures the effectiveness of the acquired text features.
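As an illustration of the position features listed above, the following sketch derives the center point, width and height from the vertex coordinates of a text box. The function name, the vertex ordering and the layout of the returned vector are assumptions; the language and image features are omitted because they depend on models that the description does not specify.

```python
from typing import List, Sequence, Tuple


def position_features(vertices: Sequence[Tuple[float, float]]) -> List[float]:
    """Flattened vertex coordinates, then center x, center y, width, height."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    center_x = (min(xs) + max(xs)) / 2
    center_y = (min(ys) + max(ys)) / 2
    flat = [coord for vertex in vertices for coord in vertex]
    return flat + [center_x, center_y, width, height]
```

For a box with vertices (10, 20), (110, 20), (110, 50) and (10, 50), the last four values are the center (60, 35), the width 100 and the height 30.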
In addition, referring to Fig. 3, an embodiment of the present invention further provides a text paragraph structure restoring apparatus, where the text paragraph structure restoring apparatus includes:
a determining module A10, configured to identify a target picture, and determine, based on the identification result, all text boxes in the target picture and the text box position of each text box;
an input module A20, configured to sort the text boxes according to the text box positions, and input the text features of the text boxes into a preset deep learning model for training based on the sorting result;
an obtaining module A30, configured to merge the text boxes based on the training result to obtain all text paragraphs corresponding to the target picture.
Further, the obtaining module A30 is further configured to:
determine the text labels corresponding to the text boxes based on the training result, traverse the text labels, and detect whether the traversal label information corresponding to the traversed text label is a paragraph;
and if the traversal label information is a paragraph, determine that the text box corresponding to the traversed text label is a text paragraph corresponding to the target picture.
Further, the obtaining module A30 is further configured to:
if not, detect whether the traversal label information is paragraph content;
if the traversal label information is paragraph content, determine whether the label information of the text label preceding the traversed text label is paragraph start information;
and if the label information of the previous text label is paragraph start information, determine the text paragraph corresponding to the target picture based on the traversed text label and the previous text label.
Further, the obtaining module A30 is further configured to:
detect whether consecutive adjacent content labels exist among the text labels;
if consecutive adjacent content labels exist, determine whether the traversed text label exists among the consecutive adjacent content labels;
if the traversed text label does not exist among them, merge the text box corresponding to the traversed text label and the text box corresponding to the previous text label to obtain a merged text box, and take the merged text box as the text paragraph corresponding to the target picture.
Further, the obtaining module A30 is further configured to:
if the traversed text label exists among them, merge all text boxes corresponding to the consecutive adjacent content labels that contain the traversed text label with the text box corresponding to the previous text label to obtain a merged text box, and take the merged text box as the text paragraph corresponding to the picture.
Further, the input module A20 is further configured to:
sequentially extract the text features of the text boxes, fuse the text features into a sequence feature according to the sorting result, and input the sequence feature into the preset deep learning model for training.
Further, the input module A20 is further configured to:
traverse each text box in turn, extract the position feature, the language feature and the image feature of the traversed text box, and take the position feature, the language feature and the image feature as the text features of the traversed text box.
The steps implemented by each functional module of the text paragraph structure restoring device may refer to each embodiment of the text paragraph structure restoring method of the present invention, and are not described herein again.
The present invention also provides a text paragraph structure restoring apparatus, including: a memory, a processor, and a text paragraph structure restoring program stored on the memory; the processor is configured to execute the text paragraph structure restoring program to implement the following steps:
identifying a target picture, and determining all text boxes and text box positions of all text boxes in the target picture based on the identified identification result;
sequencing the text boxes according to the positions of the text boxes, and inputting the text features of the text boxes to a preset deep learning model for training based on the sequencing result of the sequencing;
and merging the text boxes based on the training result of the training to obtain all text paragraphs corresponding to the target picture.
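The three steps above can be tied together in a small end-to-end sketch. Everything beyond the three steps themselves is an assumption: the tuple layout of a text box, the top-to-bottom then left-to-right sorting rule, the label "S" for a text box that is a paragraph on its own, and the `tagger` callable, which stands in for the preset deep learning model and is supplied by the caller.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float, str]  # (x0, y0, x1, y1, text)


def sort_boxes(boxes: List[Box]) -> List[Box]:
    """Sort by top edge, then by left edge (an assumed reading order)."""
    return sorted(boxes, key=lambda b: (b[1], b[0]))


def restore(boxes: List[Box], tagger: Callable[[List[Box]], List[str]]) -> List[str]:
    """Sort the boxes, label them with `tagger`, and merge them by label."""
    ordered = sort_boxes(boxes)
    tags = tagger(ordered)  # one of "B", "I" or "S" per box
    groups: List[List[Box]] = []
    open_group = False
    for box, tag in zip(ordered, tags):
        if tag == "I" and open_group:
            groups[-1].append(box)  # continue the current paragraph
        else:
            groups.append([box])  # start a new paragraph
            open_group = tag != "S"  # an "S" box is closed immediately
    return ["".join(b[4] for b in group) for group in groups]
```

Passing the model as a callable keeps the sketch runnable without committing to any particular network or framework.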
The present invention also provides a computer-readable storage medium storing one or more programs, which are further executable by one or more processors for implementing the steps of the embodiments of the text paragraph structure restoring method described above.
The specific implementation manner of the computer-readable storage medium of the present invention is substantially the same as that of each embodiment of the text paragraph structure restoring method described above, and is not described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A text paragraph structure reduction method is characterized by comprising the following steps:
identifying a target picture, and determining all text boxes and text box positions of all text boxes in the target picture based on the identified identification result;
sequencing the text boxes according to the positions of the text boxes, and inputting the text features of the text boxes to a preset deep learning model for training based on the sequencing result of the sequencing;
and merging the text boxes based on the training result of the training to obtain all text paragraphs corresponding to the target picture.
2. The text paragraph structure reduction method according to claim 1, wherein the step of merging the text boxes based on the training result of the training to obtain all text paragraphs corresponding to the target picture comprises:
determining text labels corresponding to the text boxes based on the training result of the training, traversing the text labels, and detecting whether traversal label information corresponding to the traversed text labels is a paragraph;
and if the traversal label information is a paragraph, determining that a text box corresponding to the traversed text label is a text paragraph corresponding to the target picture.
3. The text paragraph structure reduction method according to claim 2, wherein the step of detecting whether the traversal label information corresponding to the traversed text label is a paragraph is followed by:
if not, detecting whether the traversal label information is paragraph content;
if the traversal label information is paragraph content, determining whether the label information of a text label before the traversal text label is paragraph starting information;
and if the label information of the previous text label is paragraph starting information, determining the text paragraph corresponding to the target picture based on the traversed text label and the previous text label.
4. The text paragraph structure reduction method according to claim 3, wherein the step of determining the text paragraph corresponding to the target picture based on the traversed text label and the previous text label comprises:
detecting whether consecutive adjacent content labels exist among the text labels;
if consecutive adjacent content labels exist, determining whether the traversed text label exists among the consecutive adjacent content labels;
if the traversed text label does not exist among them, merging the text box corresponding to the traversed text label and the text box corresponding to the previous text label to obtain a merged text box, and taking the merged text box as the text paragraph corresponding to the target picture.
5. The text paragraph structure reduction method according to claim 4, wherein the step of determining whether the traversed text label exists among the consecutive adjacent content labels is followed by:
if the traversed text label exists among them, merging all text boxes corresponding to the consecutive adjacent content labels that contain the traversed text label with the text box corresponding to the previous text label to obtain a merged text box, and taking the merged text box as the text paragraph corresponding to the picture.
6. The text paragraph structure reduction method according to any one of claims 1-5, wherein the step of inputting the text features of each text box into a preset deep learning model for training based on the sorting result comprises:
and sequentially extracting text features of the text boxes, fusing the text features into sequence features according to the sequencing result, and inputting the sequence features into a preset deep learning model for training.
7. The text paragraph structure reduction method according to claim 6, wherein the step of sequentially extracting the text features of each text box comprises:
and traversing each text box in sequence, extracting the position feature, the language feature and the image feature of the traversed text box, and taking the position feature, the language feature and the image feature as the text feature of the traversed text box.
8. A text paragraph structure restoring apparatus, comprising:
the determining module is used for identifying a target picture and determining all text boxes and text box positions of all the text boxes in the target picture based on the identified identification result;
the input module is used for sequencing the text boxes according to the positions of the text boxes and inputting the text features of the text boxes to a preset deep learning model for training based on the sequencing result of the sequencing;
and the obtaining module is used for carrying out merging processing on each text box based on the training result of the training so as to obtain all text paragraphs corresponding to the target picture.
9. A text paragraph structure restoring apparatus, characterized in that the text paragraph structure restoring apparatus comprises: a memory, a processor and a text paragraph structure restoring program stored on the memory and executable on the processor, the text paragraph structure restoring program, when executed by the processor, implementing the steps of the text paragraph structure reduction method according to any one of claims 1 to 7.
10. A computer storage medium having stored thereon a text paragraph structure restoring program that, when executed by a processor, implements the steps of the text paragraph structure reduction method according to any one of claims 1 to 7.
CN202011264865.0A 2020-11-13 2020-11-13 Text paragraph structure reduction method, device, equipment and computer storage medium Active CN112070076B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011264865.0A CN112070076B (en) 2020-11-13 2020-11-13 Text paragraph structure reduction method, device, equipment and computer storage medium
PCT/CN2021/124605 WO2022100376A1 (en) 2020-11-13 2021-10-19 Text paragraph structure restoration method and apparatus, and device and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011264865.0A CN112070076B (en) 2020-11-13 2020-11-13 Text paragraph structure reduction method, device, equipment and computer storage medium

Publications (2)

Publication Number Publication Date
CN112070076A (en) 2020-12-11
CN112070076B (en) 2021-04-06

Family

ID=73655111

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011264865.0A Active CN112070076B (en) 2020-11-13 2020-11-13 Text paragraph structure reduction method, device, equipment and computer storage medium

Country Status (2)

Country Link
CN (1) CN112070076B (en)
WO (1) WO2022100376A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113221632A (en) * 2021-03-23 2021-08-06 奇安信科技集团股份有限公司 Document picture identification method and device and computer equipment
CN113392827A (en) * 2021-06-22 2021-09-14 平安健康保险股份有限公司 Character recognition method, device, equipment and medium
CN113610075A (en) * 2021-07-16 2021-11-05 苏州浪潮智能科技有限公司 Lightweight label text box detection method, device, terminal and storage medium
CN114170423A (en) * 2022-02-14 2022-03-11 成都数之联科技股份有限公司 Image document layout identification method, device and system
WO2022100376A1 (en) * 2020-11-13 2022-05-19 深圳壹账通智能科技有限公司 Text paragraph structure restoration method and apparatus, and device and computer storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140250361A1 (en) * 2004-09-30 2014-09-04 Macromedia, Inc. Preserving document construct fidelity in converting graphic-represented documents into text-readable documents
CN108804591A (en) * 2018-05-28 2018-11-13 杭州依图医疗技术有限公司 A kind of file classification method and device of case history text
CN109697414A (en) * 2018-12-13 2019-04-30 北京金山数字娱乐科技有限公司 A kind of text positioning method and device
CN110532563A (en) * 2019-09-02 2019-12-03 苏州美能华智能科技有限公司 The detection method and device of crucial paragraph in text
CN111507112A (en) * 2019-01-31 2020-08-07 搜狗(杭州)智能科技有限公司 Translation method and device and translation device
CN111639250A (en) * 2020-06-05 2020-09-08 深圳市小满科技有限公司 Enterprise description information acquisition method and device, electronic equipment and storage medium
CN111639175A (en) * 2020-05-29 2020-09-08 电子科技大学 Self-monitoring dialog text summarization method and system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10565443B2 (en) * 2018-02-16 2020-02-18 Wipro Limited Method and system for determining structural blocks of a document
CN112070076B (en) * 2020-11-13 2021-04-06 深圳壹账通智能科技有限公司 Text paragraph structure reduction method, device, equipment and computer storage medium

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140250361A1 (en) * 2004-09-30 2014-09-04 Macromedia, Inc. Preserving document construct fidelity in converting graphic-represented documents into text-readable documents
CN108804591A (en) * 2018-05-28 2018-11-13 杭州依图医疗技术有限公司 A kind of file classification method and device of case history text
CN109697414A (en) * 2018-12-13 2019-04-30 北京金山数字娱乐科技有限公司 A kind of text positioning method and device
CN111507112A (en) * 2019-01-31 2020-08-07 搜狗(杭州)智能科技有限公司 Translation method and device and translation device
CN110532563A (en) * 2019-09-02 2019-12-03 苏州美能华智能科技有限公司 The detection method and device of crucial paragraph in text
CN111639175A (en) * 2020-05-29 2020-09-08 电子科技大学 Self-monitoring dialog text summarization method and system
CN111639250A (en) * 2020-06-05 2020-09-08 深圳市小满科技有限公司 Enterprise description information acquisition method and device, electronic equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
S. SACHIN ET AL: "Text/Image Region Separation for Document Layout Detection of Old Document Images Using Non-linear Diffusion and Level Set", 《PROCEDIA COMPUTER SCIENCE》 *
李睿凡 ET AL: "全卷积神经结构的段落式图像描述算法", 《北京邮电大学学报》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022100376A1 (en) * 2020-11-13 2022-05-19 深圳壹账通智能科技有限公司 Text paragraph structure restoration method and apparatus, and device and computer storage medium
CN113221632A (en) * 2021-03-23 2021-08-06 奇安信科技集团股份有限公司 Document picture identification method and device and computer equipment
CN113392827A (en) * 2021-06-22 2021-09-14 平安健康保险股份有限公司 Character recognition method, device, equipment and medium
CN113610075A (en) * 2021-07-16 2021-11-05 苏州浪潮智能科技有限公司 Lightweight label text box detection method, device, terminal and storage medium
CN113610075B (en) * 2021-07-16 2023-05-26 苏州浪潮智能科技有限公司 Lightweight label text box detection method, device, terminal and storage medium
CN114170423A (en) * 2022-02-14 2022-03-11 成都数之联科技股份有限公司 Image document layout identification method, device and system

Also Published As

Publication number Publication date
CN112070076B (en) 2021-04-06
WO2022100376A1 (en) 2022-05-19

Similar Documents

Publication Publication Date Title
CN112070076B (en) Text paragraph structure reduction method, device, equipment and computer storage medium
CN110390269B (en) PDF document table extraction method, device, equipment and computer readable storage medium
CN111476227B (en) Target field identification method and device based on OCR and storage medium
US10032072B1 (en) Text recognition and localization with deep learning
CN111428723B (en) Character recognition method and device, electronic equipment and storage medium
CN110033018B (en) Graph similarity judging method and device and computer readable storage medium
CN112101317B (en) Page direction identification method, device, equipment and computer readable storage medium
CN112434691A (en) HS code matching and displaying method and system based on intelligent analysis and identification and storage medium
CN111832447A (en) Building drawing component identification method, electronic equipment and related product
CN110659346B (en) Form extraction method, form extraction device, terminal and computer readable storage medium
CN113239227B (en) Image data structuring method, device, electronic equipment and computer readable medium
CN111738252B (en) Text line detection method, device and computer system in image
CN112308069A (en) Click test method, device, equipment and storage medium for software interface
CN113469067A (en) Document analysis method and device, computer equipment and storage medium
CN111522901A (en) Method and device for processing address information in text
CN111368045A (en) User intention identification method, device, equipment and computer readable storage medium
CN111612081A (en) Recognition model training method, device, equipment and storage medium
CN111753522A (en) Event extraction method, device, equipment and computer readable storage medium
CN110363190A (en) A kind of character recognition method, device and equipment
CN113762257A (en) Identification method and device for marks in makeup brand images
CN114022891A (en) Method, device and equipment for extracting key information of scanned text and storage medium
KR102086600B1 (en) Apparatus and method for providing purchase information of products
CN110750501A (en) File retrieval method and device, storage medium and related equipment
CN111414758A (en) Zero-reference position detection method, device, equipment and computer-readable storage medium
CN113297411B (en) Method, device and equipment for measuring similarity of wheel-shaped atlas and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant