CN113743318A

CN113743318A - Table structure identification method based on row and column division, storage medium and electronic device

Info

Publication number: CN113743318A
Application number: CN202111042986.5A
Authority: CN
Inventors: 孔令军; 包云超; 王茜雯; 侯文涛; 刘伟光; 周耀威; 闫佳艺; 李华康
Original assignee: Jinling Institute of Technology
Current assignee: Jinling Institute of Technology
Priority date: 2021-09-07
Filing date: 2021-09-07
Publication date: 2021-12-03

Abstract

The invention discloses a table structure identification method based on row-column division, a storage medium, and an electronic device. The method comprises: obtaining a table image; extracting a table feature map comprising row features and column features; processing the row features and the column features to obtain the row distribution and the column distribution of the table, respectively; and judging whether the row distribution and the column distribution overlap, wherein an overlapping area is a table cell and a non-overlapping area is background. The invention simplifies table row and column prediction and makes the prediction more stable. Rows and columns are predicted in the same convolutional network, which facilitates debugging and deployment. Because the row and column distributions are obtained first and the cell distribution is derived from them, this bottom-up approach improves robustness.

Description

Table structure identification method based on row and column division, storage medium and electronic device

Technical Field

The invention belongs to the technical field of computer vision and artificial intelligence, and particularly relates to a table structure identification method based on row-column division, a storage medium and an electronic device.

Background

Tables are a general and common kind of text object in daily life, and detecting and recognizing tables in massive amounts of data has become a necessary and challenging task. Table detection and table structure identification together make up the complete table recognition task. Table detection aims to locate the table area in a page, and many researchers formulate it as an object detection problem. Table structure identification is more difficult than table detection; its goal is to obtain the structural information of the table. Early studies on table structure identification were mainly based on heuristic rules, i.e., a series of rules was designed to detect tables that satisfy specific conditions. However, heuristic-rule-based methods are difficult to design, are limited to particular scenarios, and generalize poorly. At present, most researchers identify table structure with deep learning methods such as object detection and image segmentation. Given the special structure of tables, the ruling lines between rows and columns can be taken as the objects to identify, but these lines occupy few pixels, which causes an imbalance between positive and negative samples. Some studies propose a consistency assumption for table structure: every row of the table starts at the start of the first column and ends at the end of the last column, and every column starts at the start of the first row and ends at the end of the last row. Under this assumption, for the column features only the classification of the first row of pixels needs to be predicted and then expanded to obtain the whole column prediction map; likewise, for the row features only the classification of the first column of pixels needs to be predicted. Although this reduces the complexity of row-column division, it is error-prone: a classification error at a single pixel position affects the entire prediction map.
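The error propagation under the consistency assumption described above can be illustrated with a small hypothetical example. It is a sketch only: the names `expand_first_row` and `count_wrong_pixels` and the toy 0/1 labels (1 = inside a column, 0 = separator) are assumptions made for illustration, not part of any cited method.

```python
# Hypothetical illustration of the prior-art consistency assumption:
# only the first pixel row is classified (1 = inside a column, 0 = separator),
# and that 1 x W prediction is copied down all H rows.

def expand_first_row(first_row, height):
    """Expand a 1 x W column-class prediction into an H x W prediction map."""
    return [list(first_row) for _ in range(height)]


def count_wrong_pixels(pred, truth):
    """Count the pixels at which two H x W maps disagree."""
    return sum(p != t for p_line, t_line in zip(pred, truth) for p, t in zip(p_line, t_line))


H = 6
truth_row = [1, 1, 0, 1, 1, 0, 1, 1]  # ground-truth first row, W = 8
noisy_row = [1, 1, 0, 1, 0, 0, 1, 1]  # same row with one pixel misclassified at x = 4

truth_map = expand_first_row(truth_row, H)
pred_map = expand_first_row(noisy_row, H)
print(count_wrong_pixels(pred_map, truth_map))  # 6: the single error spans the full height
```

One wrong pixel in the first row becomes a wrong pixel in every one of the H rows, which is the instability the method aims to avoid.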

Disclosure of Invention

To address these shortcomings of the prior art, the invention divides the table structure identification task into table row division and table column division tasks, and constructs complete table structure information from the divided row and column information.

In a first aspect of the present invention, a table structure identification method based on row-column division is provided, comprising the following steps:

S1, obtaining a table image;

S2, extracting a table feature map comprising row features and column features;

S3, processing the row features and the column features to obtain the row distribution and the column distribution of the table, respectively;

S4, judging whether the row distribution and the column distribution overlap in each area, wherein an overlapping area is a table cell and a non-overlapping area is background.
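Step S4 can be sketched as a pixel-wise intersection. This minimal sketch assumes the row and column distributions have already been binarized into H × W maps (1 = inside a row or column region, 0 = separator); the binarization threshold is not specified by the method and is left out, and the name `cell_mask` is hypothetical.

```python
# Sketch of step S4, assuming the row and column distributions have already been
# binarized (1 = inside a row / column region, 0 = separator); the threshold used
# for binarization is not specified by the method and is left out here.

def cell_mask(row_dist, col_dist):
    """Mark a pixel as cell (1) where row and column regions overlap, else background (0)."""
    return [
        [1 if r and c else 0 for r, c in zip(r_line, c_line)]
        for r_line, c_line in zip(row_dist, col_dist)
    ]


H, W = 4, 5
row_labels = [1, 1, 0, 1]     # per-row label (row 2 is a separator)
col_labels = [1, 1, 0, 1, 1]  # per-column label (column 2 is a separator)
row_dist = [[v] * W for v in row_labels]         # each row label spread across the width
col_dist = [list(col_labels) for _ in range(H)]  # column labels repeated down the height

for line in cell_mask(row_dist, col_dist):
    print(line)
```

Only pixels that lie inside both a row region and a column region survive, so the separators of either kind become background.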

Further, in step S2, the row features and the column features of the table are extracted using a deep-learning convolutional neural network as the backbone network, the convolutional neural network being VGG, ResNet, or MobileNet.

Further, step S3 specifically comprises:

S31, using a network based on an attention mechanism to extract, for each channel, the maximum value of each row and of each column of the feature map;

S32, correspondingly generating a one-column distribution of pixels and a one-row distribution of pixels;

assuming that the input table feature map has size H × W × C, a row feature map F_row of size H × 1 × C and a column feature map F_col of size 1 × W × C are output;

S33, tiling the row feature map F_row and the column feature map F_col to obtain a row distribution and a column distribution, each of size H × W × C.
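The slicing in steps S31 and S32 can be sketched with a plain per-channel max reduction on a nested-list feature map `fmap[h][w][c]`. This is a simplified sketch: the attention-based network that performs the extraction is not modeled, and the helper names `shape`, `slice_rows`, and `slice_cols` are hypothetical.

```python
# Sketch of steps S31/S32 on a nested-list feature map fmap[h][w][c].
# Plain max reduction stands in for the attention-based network, which is not modeled.

def shape(t):
    """(dim0, dim1, dim2) of a 3-level nested list."""
    return (len(t), len(t[0]), len(t[0][0]))


def slice_rows(fmap):
    """H x W x C -> H x 1 x C: per channel, the maximum across each row (width axis)."""
    H, W, C = shape(fmap)
    return [[[max(fmap[h][w][c] for w in range(W)) for c in range(C)]] for h in range(H)]


def slice_cols(fmap):
    """H x W x C -> 1 x W x C: per channel, the maximum down each column (height axis)."""
    H, W, C = shape(fmap)
    return [[[max(fmap[h][w][c] for h in range(H)) for c in range(C)] for w in range(W)]]


fmap = [  # H = 2, W = 3, C = 2
    [[1, 9], [5, 2], [3, 4]],
    [[7, 0], [2, 8], [6, 1]],
]
f_row, f_col = slice_rows(fmap), slice_cols(fmap)
print(shape(f_row), f_row)  # (2, 1, 2) [[[5, 9]], [[7, 8]]]
print(shape(f_col), f_col)  # (1, 3, 2) [[[7, 9], [5, 8], [6, 4]]]
```

The output shapes match the H × 1 × C and 1 × W × C sizes stated for F_row and F_col; the channel dimension C is preserved because the maximum is taken separately for each channel.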

In a second aspect of the present invention, a computer-readable storage medium is provided, in which a computer program is stored, wherein the computer program is configured to perform the method according to any one of the above-mentioned aspects when the computer program runs.

In a third aspect of the present invention, an electronic device is provided, comprising a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute the computer program to perform the method according to any one of the above technical solutions.

The invention has the following beneficial effects: table row and column prediction is simplified and the prediction is more stable; rows and columns are predicted in the same convolutional network, which facilitates debugging and deployment; and because the row and column distributions are obtained first and the cell distribution is derived from them, the bottom-up approach improves robustness.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a flowchart illustrating a method for identifying a table structure based on row-column division according to an embodiment of the present invention;

FIG. 2 is a schematic illustration of the row distribution obtained in the embodiment of FIG. 1;

FIG. 3 is a schematic illustration of the column distribution obtained in the embodiment of FIG. 1;

FIG. 4 is a schematic illustration of the cell distribution obtained in the embodiment of FIG. 1.

Detailed Description

In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

It should be noted that the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

As shown in fig. 1, a first aspect of this embodiment is a method for identifying a table structure based on row-column division, including the following steps:

S1, acquiring a table image.

In the embodiment of the present invention, the picture including the table may be obtained by a scanner, a high-speed scanner, a digital camera, a mobile terminal with a camera, and the like, which is not limited in the present invention.

In the embodiment of the present invention, the picture may include contents such as tables, characters, and pictures, and colors of the background, the tables, and the characters in the picture may be white, black, red, yellow, blue, and the like, which is not limited in the present invention.

S2, extracting a table feature map comprising row features and column features.

In the embodiment of the invention, a table is an ordered arrangement of multiple rows and columns, and the intersection areas of the rows and columns form the cells of the table. The cell distribution can therefore be constructed from the row and column distributions, which yields the structure of the table.
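Reading cells off as row-column intersections can be sketched as follows. The sketch assumes a simple grid without merged cells and assumes the row and column distributions have been reduced to hypothetical per-row and per-column 0/1 labels (1 = inside a row or column, 0 = separator); the helpers `runs` and `cells` are illustrative names, not part of the described method.

```python
# Sketch: turn 1-D row/column labels into cell rectangles. Assumes a simple grid
# (no merged cells) and hypothetical per-row / per-column 0/1 labels, where 1 means
# "inside a row (column)" and 0 means "separator".

def runs(labels):
    """Return (start, end) pairs, end exclusive, of consecutive 1s in a 0/1 list."""
    spans, start = [], None
    for i, v in enumerate(list(labels) + [0]):  # trailing 0 closes a final run
        if v and start is None:
            start = i
        elif not v and start is not None:
            spans.append((start, i))
            start = None
    return spans


def cells(row_labels, col_labels):
    """Each row span x column span intersection is one cell: (top, bottom, left, right)."""
    return [(r0, r1, c0, c1) for r0, r1 in runs(row_labels) for c0, c1 in runs(col_labels)]


print(cells([1, 1, 0, 1, 1, 1], [0, 1, 1, 0, 1]))
# [(0, 2, 1, 3), (0, 2, 4, 5), (3, 6, 1, 3), (3, 6, 4, 5)]
```

Two row spans and two column spans give 2 × 2 = 4 cells; handling merged cells would need additional logic beyond this sketch.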

Specifically, the convolutional neural network based on deep learning is used as a backbone network for feature extraction, and the backbone network may be VGG, ResNet, MobileNet, or the like, which is not limited in this disclosure.

S3, processing the row features and the column features to obtain the row distribution and the column distribution of the table, respectively.

Specifically, the method comprises the following steps:

S31, slicing the table feature map, i.e., using a network based on an attention mechanism to extract, for each channel, the maximum value of each row and of each column of the feature map;

S32, correspondingly generating a one-column distribution of pixels and a one-row distribution of pixels: for an input table feature map of size H × W × C, a row feature map F_row of size H × 1 × C and a column feature map F_col of size 1 × W × C are output;

S33, tiling the row feature map F_row and the column feature map F_col, i.e., copying F_row W times along the width axis and copying F_col H times along the height axis, to obtain a row distribution and a column distribution, each of size H × W × C.
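The tiling in step S33 (copying F_row W times along the width axis and F_col H times along the height axis) can be sketched on nested lists as below. The function names `tile_rows` and `tile_cols` and the toy values are hypothetical; in a tensor framework the same operation would be a repeat or broadcast along one axis.

```python
# Sketch of the S33 tiling on nested lists: F_row (H x 1 x C) is copied W times
# along the width axis and F_col (1 x W x C) is copied H times along the height axis.

def tile_rows(f_row, width):
    """H x 1 x C -> H x W x C; each channel vector is copied, not aliased."""
    return [[list(line[0]) for _ in range(width)] for line in f_row]


def tile_cols(f_col, height):
    """1 x W x C -> H x W x C; each channel vector is copied, not aliased."""
    return [[list(vec) for vec in f_col[0]] for _ in range(height)]


f_row = [[[5, 9]], [[7, 8]]]        # H = 2, C = 2
f_col = [[[7, 9], [5, 8], [6, 4]]]  # W = 3, C = 2
row_dist = tile_rows(f_row, width=3)
col_dist = tile_cols(f_col, height=2)
print(row_dist)  # [[[5, 9], [5, 9], [5, 9]], [[7, 8], [7, 8], [7, 8]]]
print(col_dist)  # [[[7, 9], [5, 8], [6, 4]], [[7, 9], [5, 8], [6, 4]]]
```

After tiling, both distributions have the same H × W × C size as the feature map before slicing, so they can be combined position by position.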

The slicing operation reduces column division on each channel to the prediction of a single row of elements (and, symmetrically, row division to the prediction of a single column of elements), and the tiling operation restores the feature map to its size before slicing. On the one hand, this produces a coarse soft prediction that guides the learning of the row and column prediction network; on the other hand, the row and column prediction network can correct errors in this coarse prediction, preventing large errors.

The values of the feature map obtained by the tiling operation are normalized to the range 0 to 1 with Softmax, and the row and column information streams are each added to the upsampled overall information stream. Finally, the normalized feature map is multiplied by the summed feature map to obtain the output feature map. This operation is intended to extract attention over the columns from the column information stream and to suppress irrelevant information; the attention is then applied to the information stream enhanced by the overall information stream.
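The normalize, add, and multiply sequence for the column stream can be sketched on a single channel. Several details are assumptions not fixed by the description: Softmax is taken along the width axis of each row, the overall information stream is taken as already upsampled to the same H × W size, and the function names `softmax` and `fuse` are hypothetical. The row stream would be handled symmetrically.

```python
import math

# Hypothetical single-channel sketch of the fusion step. Assumptions not fixed by
# the description: Softmax is taken along the width axis of each row, and the
# overall (global) stream has already been upsampled to the same H x W size.

def softmax(xs):
    """Numerically stable Softmax; outputs lie in (0, 1) and sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(tiled_col, col_stream, global_up):
    """output = Softmax(tiled_col) * (col_stream + global_up), element-wise, all H x W."""
    attention = [softmax(line) for line in tiled_col]
    enhanced = [[a + b for a, b in zip(l1, l2)] for l1, l2 in zip(col_stream, global_up)]
    return [[w * e for w, e in zip(wl, el)] for wl, el in zip(attention, enhanced)]


out = fuse(
    tiled_col=[[0.0, 0.0], [0.0, 0.0]],  # uniform scores -> weights of 0.5 each
    col_stream=[[1.0, 2.0], [3.0, 4.0]],
    global_up=[[1.0, 0.0], [1.0, 0.0]],
)
print(out)  # [[1.0, 1.0], [2.0, 2.0]]
```

The Softmax weights act as the attention map: positions with higher tiled scores receive larger weights and scale up the enhanced stream, while low-scoring positions are suppressed.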

In a second aspect of this embodiment, a computer-readable storage medium is provided, in which a computer program is stored, where the computer program is configured to execute the method in any one of the above technical solutions when the computer program is executed.

In a third aspect of the present embodiment, an electronic device is provided, which includes a memory and a processor, where the memory stores a computer program, and the processor is configured to execute the computer program to perform the method in any one of the above technical solutions.

The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims

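1. A table structure identification method based on row-column division, characterized by comprising the following steps:

S1, obtaining a table image;

S2, extracting a table feature map comprising row features and column features;

S3, processing the row features and the column features to obtain the row distribution and the column distribution of the table, respectively;

S4, judging whether the row distribution and the column distribution overlap in each area, wherein an overlapping area is a table cell and a non-overlapping area is background.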

2. The table structure identification method according to claim 1, wherein in step S2 the row features and the column features of the table are extracted using a deep-learning convolutional neural network as the backbone network, the convolutional neural network being VGG, ResNet, or MobileNet.

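3. The table structure identification method according to claim 1, wherein step S3 specifically comprises:

S31, using a network based on an attention mechanism to extract, for each channel, the maximum value of each row and of each column of the feature map;

S32, correspondingly generating a one-column distribution of pixels and a one-row distribution of pixels, wherein, for an input table feature map of size H × W × C, a row feature map F_row of size H × 1 × C and a column feature map F_col of size 1 × W × C are output;

S33, tiling the row feature map F_row and the column feature map F_col to obtain a row distribution and a column distribution, each of size H × W × C.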

4. A computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to carry out the method of any one of claims 1 to 3 when executed.

5. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and wherein the processor is arranged to execute the computer program to perform the method of any of claims 1 to 3.