CN113743318A - Table structure identification method based on row and column division, storage medium and electronic device - Google Patents
- Publication number
- CN113743318A (application number CN202111042986.5A)
- Authority
- CN
- China
- Prior art keywords
- row
- column
- distribution
- features
- feature map
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a table structure identification method based on row-column division, a storage medium and an electronic device. The method comprises: obtaining a table image; extracting a table feature map comprising row features and column features; processing the row features and the column features to obtain the row distribution and the column distribution of the table, respectively; and determining where the regions of the row distribution and the column distribution overlap, the overlapping regions being table cells and the remaining regions being background. The invention simplifies table row and column prediction while keeping the prediction stable; the prediction of table rows and columns is completed in the same convolutional network, which facilitates debugging and deployment; and because the row and column distributions are obtained first and the cell distribution is derived from them, this bottom-up approach improves robustness.
Description
Technical Field
The invention belongs to the technical field of computer vision and artificial intelligence, and particularly relates to a table structure identification method based on row-column division, a storage medium and an electronic device.
Background
In daily life, a table is a general and common text object, and detecting and identifying tables in massive amounts of data has become a necessary and challenging task. Table detection and table structure identification together form a complete table recognition task. The purpose of table detection is to locate the table area in a page, which many researchers define as an object detection problem. Table structure identification is the more difficult task of the two; its goal is to obtain the structure information of the table. Early studies on table structure identification were primarily based on heuristic rules, i.e., a series of rules was developed to detect tables that met specific conditions. However, methods based on heuristic rules are difficult to design, are limited to particular scenarios, and do not generalize well. At present, most researchers use deep learning methods such as object detection and image segmentation to identify the table structure. Given the special structure of a table, the ruling lines between rows and columns can be used as the objects to be identified, but the ruling lines occupy few pixels, which causes an imbalance between positive and negative samples. Some studies propose a consistency assumption for the table structure: all rows of the table start at the start of the first column and end at the end of the last column, and all columns start at the start of the first row and end at the end of the last row. Under this assumption, for the column features only the classification of the first row of pixels needs to be predicted and then expanded to obtain the whole column prediction map, and for the row features only the classification of the first column of pixels needs to be predicted.
Although this reduces the complexity of the row-column division, its fault tolerance is poor: a classification error at a single pixel position affects the whole prediction map.
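The sensitivity described above can be illustrated with a minimal NumPy sketch. It is not taken from any cited study; the function name `expand_first_row`, the 8-pixel-wide labelling and the image height are hypothetical values chosen only for illustration.

```python
import numpy as np

def expand_first_row(first_row_labels, height):
    """Copy a 1-D labelling of the first pixel row down the whole image."""
    labels = np.asarray(first_row_labels)
    return np.tile(labels[None, :], (height, 1))

# Hypothetical labelling of the first pixel row: 1 = column region, 0 = separator.
truth = np.array([1, 1, 1, 0, 1, 1, 1, 1])
pred = truth.copy()
pred[3] = 1  # a single misclassified pixel in the first row

height = 6
truth_map = expand_first_row(truth, height)
pred_map = expand_first_row(pred, height)

# The single wrong pixel is copied into every row of the expanded map.
wrong_pixels = int((truth_map != pred_map).sum())
print(wrong_pixels)
```

In this toy case the one wrong pixel removes the only column separator from the whole prediction map, so two columns merge into one.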
Disclosure of Invention
In view of the defects in the prior art, the invention decomposes the table structure identification task into table row division and table column division tasks, and constructs the complete table structure information from the resulting row and column information.
In a first aspect of the present invention, a method for identifying a table structure based on row-column division is provided, which includes the following steps:
S1, obtaining a table image;
S2, extracting a table feature map comprising row features and column features;
S3, processing the row features and the column features to obtain the row distribution and the column distribution of the table, respectively;
S4, determining whether the regions of the row distribution and the column distribution overlap, wherein an overlapping region is a table cell and any other region is background.
Further, in step S2, the row features and the column features of the table are extracted by using a deep-learning-based convolutional neural network as the backbone network, where the convolutional neural network is VGG, ResNet, or MobileNet.
Further, step S3 specifically comprises:
S31, using a network based on an attention mechanism to extract, for each channel, the maximum value of each row and of each column of the feature map;
S32, correspondingly generating the distribution of one column of pixels and the distribution of one row of pixels;
assuming that the input table feature map has a size of $H \times W \times C$, a row feature map $F_{row}$ of size $H \times 1 \times C$ and a column feature map $F_{col}$ of size $1 \times W \times C$ are output;
S33, tiling the row feature map $F_{row}$ and the column feature map $F_{col}$ to obtain a row distribution and a column distribution, each of size $H \times W \times C$.
In a second aspect of the present invention, a computer-readable storage medium is provided, in which a computer program is stored, wherein the computer program is configured to perform the method according to any one of the above-mentioned aspects when the computer program runs.
In a third aspect of the present invention, an electronic device is provided, comprising a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute the computer program to perform the method according to any one of the above technical solutions.
The invention has the following beneficial effects: the table row and column prediction is simplified, and the higher stability of the prediction is ensured; the prediction of table rows and columns is completed in the same convolutional network, so that the debugging and the deployment are facilitated; the table row and column distribution is obtained first, and then the table cell distribution is obtained, so that the robustness is improved by the bottom-up method.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flowchart illustrating a method for identifying a table structure based on row-column division according to an embodiment of the present invention;
FIG. 2 is a schematic illustration of the row distribution obtained in the embodiment of FIG. 1;
FIG. 3 is a schematic illustration of the column distribution obtained in the embodiment of FIG. 1;
FIG. 4 is a schematic diagram of the cell distribution in the embodiment of FIG. 1.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
As shown in fig. 1, a first aspect of this embodiment is a method for identifying a table structure based on row-column division, including the following steps:
S1, acquiring a table image.
In the embodiment of the present invention, the picture containing the table may be obtained by a scanner, a high-speed scanner, a digital camera, a mobile terminal with a camera, and the like, which is not limited in the present invention.
In the embodiment of the present invention, the picture may include content such as tables, characters, and pictures, and the colors of the background, the tables, and the characters in the picture may be white, black, red, yellow, blue, and the like, which is not limited in the present invention.
S2, extracting a table feature map comprising row features and column features.
In the embodiment of the invention, a table is an ordered organization formed by a number of rows and columns, and the intersection areas of the rows and the columns form the cells of the table. The cell distribution can therefore be constructed from the row and column distributions, which gives the structure of the table.
Specifically, the convolutional neural network based on deep learning is used as a backbone network for feature extraction, and the backbone network may be VGG, ResNet, MobileNet, or the like, which is not limited in this disclosure.
S3, processing the row features and the column features to obtain the row distribution and the column distribution of the table, respectively.
Specifically, the method comprises the following steps:
S31, slicing the table feature map, namely, using a network based on an attention mechanism to extract, for each channel, the maximum value of each row and of each column of the feature map;
S32, correspondingly generating the distribution of one column of pixels and the distribution of one row of pixels;
assuming that the input table feature map has a size of $H \times W \times C$, a row feature map $F_{row}$ of size $H \times 1 \times C$ and a column feature map $F_{col}$ of size $1 \times W \times C$ are output;
S33, tiling the row feature map $F_{row}$ and the column feature map $F_{col}$, i.e., copying $F_{row}$ $W$ times along the width axis and copying $F_{col}$ $H$ times along the height axis, to obtain a row distribution and a column distribution, each of size $H \times W \times C$.
The slicing operation reduces the column division on each channel to the prediction of a single row of elements (and, likewise, the row division to the prediction of a single column of elements), and the tiling operation restores the feature map to its size before slicing. On the one hand, this produces a rough soft prediction that guides the learning of the row and column prediction network; on the other hand, the row and column prediction network can correct errors in this soft prediction, which avoids large errors.
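The shapes involved in slicing and tiling can be sketched as follows. This is only an illustration under stated assumptions: a plain per-channel maximum stands in for the attention-based network of S31, whose internals are not specified here, and the function name `slice_and_tile` and the random input are hypothetical.

```python
import numpy as np

def slice_and_tile(feature_map):
    """Slice an H x W x C feature map into F_row and F_col, then tile them back."""
    h, w, _ = feature_map.shape
    # Slicing: per-channel maximum of each row (over width) and each column (over height).
    f_row = feature_map.max(axis=1, keepdims=True)  # H x 1 x C
    f_col = feature_map.max(axis=0, keepdims=True)  # 1 x W x C
    # Tiling: copy F_row W times along the width axis, F_col H times along the height axis.
    row_dist = np.tile(f_row, (1, w, 1))            # H x W x C
    col_dist = np.tile(f_col, (h, 1, 1))            # H x W x C
    return f_row, f_col, row_dist, col_dist

rng = np.random.default_rng(0)
features = rng.random((4, 5, 3))  # hypothetical H=4, W=5, C=3
f_row, f_col, row_dist, col_dist = slice_and_tile(features)
print(f_row.shape, f_col.shape, row_dist.shape, col_dist.shape)
```

After tiling, every pixel in a given row of `row_dist` carries the same value per channel, and every pixel in a given column of `col_dist` does likewise, which is what makes the result a row-wise or column-wise soft prediction.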
After the tiling operation, the values of the feature map are normalized to the range 0 to 1 by Softmax, and the row and column information streams are each added to the upsampled overall information stream. Finally, the normalized feature map is multiplied by the summed feature map to obtain the output feature map. This operation is intended to extract column attention from the column information stream and suppress irrelevant information; the attention is finally applied to the information stream that has been enhanced by the overall information stream.
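One possible reading of this fusion step, for the column branch only, is sketched below. Several details are assumptions rather than statements of the embodiment: the Softmax is taken along the width axis, the overall stream is upsampled by nearest-neighbour repetition with a factor of 2, and the names `fuse_column_branch`, `col_stream` and `overall_stream` are hypothetical.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of an H x W x C map (assumed upsampling method)."""
    return x.repeat(factor, axis=0).repeat(factor, axis=1)

def fuse_column_branch(col_tiled, col_stream, overall_stream, factor=2):
    # Normalize the tiled column map to 0-1 (softmax axis is an assumption).
    attention = softmax(col_tiled, axis=1)
    # Add the column information stream to the upsampled overall stream.
    enhanced = col_stream + upsample_nearest(overall_stream, factor)
    # Multiply attention with the enhanced stream to get the output feature map.
    return attention * enhanced

rng = np.random.default_rng(0)
h, w, c = 4, 6, 2
col_tiled = np.tile(rng.random((1, w, c)), (h, 1, 1))
col_stream = rng.random((h, w, c))
overall_stream = rng.random((h // 2, w // 2, c))
out = fuse_column_branch(col_tiled, col_stream, overall_stream)
print(out.shape)
```

The row branch would be handled symmetrically, with the tiled row map in place of `col_tiled` and, under the same assumption, the Softmax taken along the height axis.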
S4, determining whether the regions of the row distribution and the column distribution overlap, wherein an overlapping region is a table cell and any other region is background.
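Step S4 can be sketched as a pixel-wise logical AND of two binary masks. The example assumes the row and column distributions have already been thresholded into boolean masks; the profiles, sizes and the function name `cell_mask` are hypothetical.

```python
import numpy as np

def cell_mask(row_mask, col_mask):
    """A pixel is a cell pixel only where a row region and a column region overlap."""
    return np.logical_and(row_mask, col_mask)

# Hypothetical binary profiles: True = inside a row/column, False = separator.
row_profile = np.array([1, 1, 0, 1, 1], dtype=bool)     # H = 5
col_profile = np.array([1, 0, 1, 1, 0, 1], dtype=bool)  # W = 6

# Tile the profiles to H x W masks, mirroring the tiling in S33.
row_mask = np.tile(row_profile[:, None], (1, col_profile.size))
col_mask = np.tile(col_profile[None, :], (row_profile.size, 1))

cells = cell_mask(row_mask, col_mask)
print(cells.astype(int))
```

Pixels that lie in a row separator or a column separator come out as `False`, i.e. background, and the remaining `True` blocks are the cells.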
In a second aspect of this embodiment, a computer-readable storage medium is provided, in which a computer program is stored, where the computer program is configured to execute the method in any one of the above technical solutions when the computer program is executed.
In a third aspect of the present embodiment, an electronic device is provided, which includes a memory and a processor, where the memory stores a computer program, and the processor is configured to execute the computer program to perform the method in any one of the above technical solutions.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.
Claims (5)
1. A table structure identification method based on row-column division, characterized by comprising the following steps:
S1, obtaining a table image;
S2, extracting a table feature map comprising row features and column features;
S3, processing the row features and the column features to obtain the row distribution and the column distribution of the table, respectively;
S4, determining whether the regions of the row distribution and the column distribution overlap, wherein an overlapping region is a table cell and any other region is background.
2. The table structure recognition method of claim 1, wherein the extracting of the row features and the column features of the table in S2 specifically includes performing feature extraction by using a convolutional neural network based on deep learning as a backbone network, where the convolutional neural network is VGG, ResNet, or MobileNet.
3. The table structure recognition method according to claim 1, wherein step S3 specifically comprises:
S31, using a network based on an attention mechanism to extract, for each channel, the maximum value of each row and of each column of the feature map;
S32, correspondingly generating the distribution of one column of pixels and the distribution of one row of pixels;
assuming that the input table feature map has a size of $H \times W \times C$, a row feature map $F_{row}$ of size $H \times 1 \times C$ and a column feature map $F_{col}$ of size $1 \times W \times C$ are output;
S33, tiling the row feature map $F_{row}$ and the column feature map $F_{col}$ to obtain a row distribution and a column distribution, each of size $H \times W \times C$.
4. A computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to carry out the method of any one of claims 1 to 3 when executed.
5. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and wherein the processor is arranged to execute the computer program to perform the method of any of claims 1 to 3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111042986.5A CN113743318A (en) | 2021-09-07 | 2021-09-07 | Table structure identification method based on row and column division, storage medium and electronic device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111042986.5A CN113743318A (en) | 2021-09-07 | 2021-09-07 | Table structure identification method based on row and column division, storage medium and electronic device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113743318A (en) | 2021-12-03 |
Family
ID=78736459
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111042986.5A Withdrawn CN113743318A (en) | 2021-09-07 | 2021-09-07 | Table structure identification method based on row and column division, storage medium and electronic device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113743318A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI806392B (en) * | 2022-01-27 | 2023-06-21 | 國立高雄師範大學 | Table detection method of table text |
CN115331245A (en) * | 2022-10-12 | 2022-11-11 | 中南民族大学 | Table structure identification method based on image instance segmentation |
CN115331245B (en) * | 2022-10-12 | 2023-02-03 | 中南民族大学 | Table structure identification method based on image instance segmentation |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8712188B2 (en) | System and method for document orientation detection | |
US8947736B2 (en) | Method for binarizing scanned document images containing gray or light colored text printed with halftone pattern | |
US8693779B1 (en) | Segmenting printed media pages into articles | |
CN109241861B (en) | Mathematical formula identification method, device, equipment and storage medium | |
CN109635805B (en) | Image text positioning method and device and image text identification method and device | |
JPH0721319A (en) | Automatic determination device of asian language | |
CN112070649B (en) | Method and system for removing specific character string watermark | |
CN113743318A (en) | Table structure identification method based on row and column division, storage medium and electronic device | |
CN110443235B (en) | Intelligent paper test paper total score identification method and system | |
US10423851B2 (en) | Method, apparatus, and computer-readable medium for processing an image with horizontal and vertical text | |
CN111680690A (en) | Character recognition method and device | |
US20190005325A1 (en) | Identification of emphasized text in electronic documents | |
US10586125B2 (en) | Line removal method, apparatus, and computer-readable medium | |
CN115035539B (en) | Document anomaly detection network model construction method and device, electronic equipment and medium | |
CN114283156A (en) | Method and device for removing document image color and handwriting | |
CN111461070A (en) | Text recognition method and device, electronic equipment and storage medium | |
CN111626145A (en) | Simple and effective incomplete form identification and page-crossing splicing method | |
CN114565927A (en) | Table identification method and device, electronic equipment and storage medium | |
CN109948598B (en) | Document layout intelligent analysis method and device | |
US20080310715A1 (en) | Applying a segmentation engine to different mappings of a digital image | |
CN115797939A (en) | Two-stage italic character recognition method and device based on deep learning | |
CN112580738B (en) | AttentionOCR text recognition method and device based on improvement | |
CN114926829A (en) | Certificate detection method and device, electronic equipment and storage medium | |
US10185885B2 (en) | Tex line detection | |
CN113793264A (en) | Archive image processing method and system based on convolution model and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | Application publication date: 20211203 |