CN114419643A - Method, system, equipment and storage medium for identifying table structure - Google Patents

Method, system, equipment and storage medium for identifying table structure

Info

Publication number
CN114419643A
Authority
CN
China
Prior art keywords
area, segmentation graph, segmentation, cell, graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111563490.2A
Other languages
Chinese (zh)
Inventor
薛洋
王弘毅
金连文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202111563490.2A
Publication of CN114419643A
Legal status: Pending

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 — Pattern recognition
    • G06F 18/20 — Analysing
    • G06F 18/22 — Matching criteria, e.g. proximity measures
    • G06F 18/25 — Fusion techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the field of artificial intelligence and computers, and in particular to a method, a system, electronic equipment and a storage medium for identifying a table structure. The method comprises the following steps: performing semantic segmentation of the cell area, the table frame line area and the table area of an image to be recognized by using a pre-trained semantic segmentation model, to obtain a cell area segmentation graph, a table frame line area segmentation graph and a table area segmentation graph; fusing the cell area segmentation graph and the table frame line area segmentation graph to obtain a fused cell area segmentation graph; extracting control points from the table area segmentation graph to obtain a correction transformation; correcting the fused cell segmentation graph with the correction transformation to obtain an aligned cell area segmentation graph; and performing connected-domain extraction and analysis on the aligned cell area segmentation graph, obtaining the table structure information according to matching conditions. The method mitigates the effects of table deformation in table structure recognition scenes and improves the accuracy of table structure recognition.

Description

Method, system, equipment and storage medium for identifying table structure
Technical Field
The invention relates to the field of artificial intelligence and computers, in particular to a method, a system, equipment and a storage medium for identifying a table structure.
Background
A table is a form of data organization commonly used in daily life. Table scenes are rich and diverse, including electronic documents such as PDF files, Excel files and scanned documents, printed documents such as tickets and bills, and more complex natural scenes such as food packaging and street posters. Tables photographed in such natural scenes usually suffer serious structural deformation, which is one of the more challenging parts of table structure recognition technology.
At present, mainstream table structure identification methods, whether based on traditional image processing or on deep learning, rely on a strong structure prior, namely that the table must be rectangular. Once deformation occurs, this structure prior is lost and the identification accuracy drops severely.
Disclosure of Invention
In order to solve the above technical problems in the prior art, the invention provides a method, a system, equipment and a storage medium for identifying a table structure, which mitigate the effects of table deformation in table structure identification scenes and improve the accuracy of table structure identification.
A first object of the invention is to provide a table structure identification method.
A second object of the present invention is to provide a system for table structure identification.
It is a third object of the invention to provide a computer apparatus.
It is a fourth object of the present invention to provide a storage medium.
The first purpose of the invention can be achieved by adopting the following technical scheme:
a method of table structure identification, the method comprising:
s1, performing semantic segmentation on the cell area, the table frame line area and the table area of the image to be recognized by using a pre-trained semantic segmentation model to obtain a cell area segmentation graph, a table frame line area segmentation graph and a table area segmentation graph;
s2, fusing the cell area segmentation graph and the table frame line area segmentation graph to obtain a fused cell area segmentation graph;
s3, extracting control points of the table area segmentation graph to obtain correction transformation;
s4, correcting the fusion cell segmentation graph by using correction transformation to obtain an alignment cell region segmentation graph;
and S5, performing connected domain extraction analysis on the aligned cell region segmentation graph, and obtaining table structure information according to matching conditions.
Specifically, the pre-trained semantic segmentation model in step S1 is the neural network DeepLabV3+, and step S1 specifically includes:
replacing the multi-class output of the last layer of DeepLabV3+ with three binary-classification outputs, computing the loss function against the labeled images of the cell area, the table frame line area and the table area respectively, and back-propagating the gradients;
during the forward pass of DeepLabV3+, simultaneously outputting the semantic segmentation graphs of the cell area, the table frame line area and the table area.
Further, the step S2 specifically includes:
s21, carrying out pixel-level inversion operation on the table frame line region segmentation graph to obtain an inverted table frame line region segmentation graph;
and S22, carrying out pixel level and operation on the cell area segmentation graph and the frame line area segmentation graph of the reverse table to obtain a fused cell area segmentation graph.
The second purpose of the invention can be achieved by adopting the following technical scheme:
a system for table structure identification, the system comprising:
the semantic segmentation module is used for performing semantic segmentation on the cell area, the table frame line area and the table area of the image to be recognized by using a pre-trained semantic segmentation model to obtain a cell area segmentation graph, a table frame line area segmentation graph and a table area segmentation graph;
the segmentation map fusion module is used for fusing the cell region segmentation map and the table frame line region segmentation map to obtain a fused cell region segmentation map;
the control point extraction module is used for extracting control points of the table region segmentation graph to obtain correction transformation;
the segmentation map correction module is used for correcting the fusion cell segmentation map by using correction transformation to obtain an aligned cell region segmentation map;
and the extraction analysis module is used for carrying out connected domain extraction analysis on the aligned cell region segmentation graph and obtaining the table structure information according to the matching conditions.
The third purpose of the invention can be achieved by adopting the following technical scheme:
a computer device comprises a processor and a memory for storing a program executable by the processor, wherein the processor executes the program stored in the memory to realize the table structure identification method.
The fourth purpose of the invention can be achieved by adopting the following technical scheme:
a storage medium stores a program which, when executed by a processor, implements the above-described method of table structure recognition.
Compared with the prior art, the invention has the following advantages and beneficial effects:
according to the invention, the segmentation maps of the cell areas and the table frame line areas are fused to obtain the fusion cell area segmentation map, and the segmentation maps of the cell areas and the table frame line areas make up for the deficiencies of the cell areas and the table frame line areas to improve the identification accuracy of the table structure; control point extraction is carried out on the table region segmentation graph to obtain correction transformation, and structure prior is recovered, so that the next processing is facilitated, and the table structure identification accuracy is improved; and correcting the fusion cell segmentation graph by using correction transformation to obtain an alignment cell region segmentation graph, performing connected domain extraction analysis on the alignment cell region segmentation graph, and obtaining table structure information according to matching conditions. The method and the device can solve the influence caused by the deformation of the table structure in the table structure recognition scene, and improve the accuracy rate of table structure recognition.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the structures shown in the drawings without creative efforts.
Fig. 1 is a general flowchart of a table structure identification method in embodiment 1 of the present invention;
fig. 2 is a schematic diagram of the cell area segmentation graph, the table frame line area segmentation graph, and the table area segmentation graph in embodiment 1 of the present invention;
FIG. 3 is a flowchart illustrating a process of generating a segmentation map of the fusion cell in embodiment 1 of the present invention;
FIG. 4 is a diagram illustrating the effect of an image to be recognized according to embodiment 1 of the present invention;
FIG. 5 is the cell area segmentation graph obtained in embodiment 1 of the present invention;
FIG. 6 is the table frame line area segmentation graph obtained in embodiment 1 of the present invention;
FIG. 7 is a schematic diagram of the fused cell area segmentation graph obtained in embodiment 1 of the present invention;
FIG. 8 is a schematic view of the cell area segmentation graphs before and after correction in embodiment 1 of the present invention;
fig. 9 is a specific flowchart for obtaining table structure information in embodiment 1 of the present invention.
Detailed Description
The technical solutions of the present invention will be further described in detail with reference to the accompanying drawings and examples, and it is obvious that the described examples are some, but not all, examples of the present invention, and the embodiments of the present invention are not limited thereto. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
As shown in fig. 1, the present embodiment provides a table structure identification method, including the following steps:
s1, performing semantic segmentation on the cell area, the table frame line area and the table area of the image to be recognized by using the pre-trained semantic segmentation model to obtain a cell area segmentation graph, a table frame line area segmentation graph and a table area segmentation graph.
The image to be recognized in this embodiment is a scene image containing a table, including photographs of printed tickets, bills, food packaging, street posters and the like taken with a camera, as well as electronic documents such as PDF files, Excel files and scanned documents.
The pre-trained semantic segmentation model described in this embodiment is the deep-learning-based neural network DeepLabV3+, which can perform pixel-level classification and labeling of an image.
In this embodiment, the training data of the pre-trained semantic segmentation model is obtained from an existing public data set.
Further, the specific step of performing semantic segmentation on the cell area, the table frame line area and the table area of the image to be recognized at the same time includes:
and changing the use of the depllabV 3+ multi-classification output of the last layer into three two-classification output, and calculating the loss function relative to the labeled images of the cell area, the table frame line area and the table area respectively and performing gradient back transmission.
During the forward pass of the DeepLabV3+ network, the semantic segmentation graphs of the cell area, the table frame line area and the table area are output simultaneously.
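As a minimal sketch of what the modified output layer produces at inference time (the DeepLabV3+ backbone itself is omitted; the function name and the 0.5 threshold are illustrative assumptions, not from the patent text), the three binary heads can be emulated in NumPy:

```python
import numpy as np

def binary_segmentation_heads(logits):
    """Apply three independent binary (sigmoid) classifiers to the final
    logits, in place of a single multi-class softmax output layer.

    logits: float array of shape (3, H, W) — one channel each for the cell
    area, the table frame line area, and the table area.
    Returns three binary segmentation maps of shape (H, W).
    """
    probs = 1.0 / (1.0 + np.exp(-logits))   # per-pixel sigmoid, per head
    masks = (probs > 0.5).astype(np.uint8)  # threshold each head separately
    cell_map, line_map, table_map = masks
    return cell_map, line_map, table_map
```

Because the heads are independent, a pixel can belong to more than one class (e.g. both the table area and a cell area), which a single softmax would forbid.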
As shown in fig. 2, the cell area segmentation graph, the table frame line area segmentation graph, and the table area segmentation graph are arranged from left to right.
And S2, fusing the cell area segmentation graph and the table frame line area segmentation graph to obtain a fused cell area segmentation graph.
As shown in fig. 3, the fusing the cell region segmentation map and the table frame line segmentation map specifically includes the steps of:
s21, carrying out pixel-level inversion operation on the table frame line region segmentation graph to obtain an inverted table frame line region segmentation graph;
and S22, carrying out pixel level and operation on the cell area segmentation graph and the frame line area segmentation graph of the reverse table to obtain a fused cell area segmentation graph.
As shown in figs. 4-7, the operations of steps S1 and S2 are performed on one image to be recognized to obtain the fused cell area segmentation graph. Fig. 4 shows the image to be recognized. As shown in fig. 5, because of the nature of tables, cells in the cell area segmentation graph are very densely arranged, and different cells may stick together in the segmentation graph; stuck-together cells would be treated as a single cell in subsequent operations, harming the accuracy of table structure recognition.
Meanwhile, if the cell structure is extracted from the table frame line segmentation graph alone, each required area must be completely enclosed by the table frame line marks. As shown in fig. 6, because table frame lines are long and thin, the segmentation model is prone to producing broken regions; a break destroys the complete enclosure, causes cells to be lost, and likewise harms the accuracy of table structure identification.
Fig. 7 is a schematic diagram of the fused cell area segmentation graph obtained by fusing the cell area segmentation graph with the table frame line segmentation graph; the two segmentation graphs compensate for each other's deficiencies, improving the accuracy of table structure identification.
And S3, extracting control points of the table region segmentation graph to obtain the correction transformation.
In practical application scenes, the table to be detected is often tilted or distorted, destroying the original structure prior. In that case the table area needs to be corrected to recover the structure prior, which facilitates subsequent processing and improves the accuracy of table structure identification.
Extracting control points from the table area segmentation graph to obtain the correction transformation comprises the following steps:
and S31, extracting four vertexes of the circumscribed quadrangle of the table area segmentation graph by using a polygon fitting algorithm.
S32, determining the center point and the width and the height of the table area according to the four vertexes, searching a middle control point according to a preset proportion, and enabling the middle control point and the four vertexes to form a first control point set.
And S33, generating a second control point set according to the width and the height of the table area and the preset proportion.
And S34, generating the correction transformation by using a thin plate spline interpolation algorithm according to the first control point set and the second control point set.
The correction transformation used in this embodiment is generated by the thin-plate spline interpolation algorithm, a two-dimensional interpolation method used for image rectification and image registration. Given a pair of control point sets, the first set describes the outline of the table area and the second set describes the target of the correction transformation; since the target describes a rectangle, the distorted table area is corrected into a rectangle after the transformation and the structure prior is recovered.
From the first set of control points and the second set of control points, a corrective transformation can be obtained using the disclosed generic thin-plate spline interpolation calculation formula.
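A sketch of generating the second control point set of step S33, assuming evenly spaced sampling along the top and bottom edges of the target rectangle (the exact "preset proportion" is not specified here, so the ratios are illustrative, as is the function name). The thin-plate spline warp itself could then be computed with, e.g., OpenCV's `ThinPlateSplineShapeTransformer` from the contrib modules:

```python
import numpy as np

def target_control_points(width, height, n_per_edge=5):
    """Second control point set (S33): evenly spaced points along the top
    and bottom edges of the target rectangle. Because this set describes a
    rectangle, warping the first set (the distorted table outline) onto it
    corrects the table area into a rectangle."""
    ratios = np.linspace(0.0, 1.0, n_per_edge)
    top = np.stack([ratios * width, np.zeros(n_per_edge)], axis=1)
    bottom = np.stack([ratios * width, np.full(n_per_edge, float(height))], axis=1)
    return np.concatenate([top, bottom], axis=0)  # shape (2 * n_per_edge, 2)
```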
And S4, correcting the fused cell segmentation graph by using correction transformation to obtain an aligned cell region segmentation graph.
In step S3, the correction transformation is obtained by analyzing the table area segmentation graph. Since the fused cell segmentation graph has the same overall shape as the table area segmentation graph, only more detailed (it separates the cells inside the table area), applying the correction transformation to the fused cell segmentation graph yields the aligned cell area segmentation graph, corrected into a rectangle. As shown in fig. 8, the curved fused cell area segmentation graph on the left is corrected into the aligned cell area segmentation graph on the right.
And S5, performing connected domain extraction analysis on the aligned cell region segmentation graph, and obtaining table structure information according to matching conditions.
As shown in fig. 9, performing connected-domain extraction and analysis on the aligned cell area segmentation graph specifically includes the following steps:
and S51, carrying out image connected domain analysis on the alignment unit cell region segmentation graph by using a contour-based marking algorithm to obtain a plurality of connected regions, and converting the connected regions into a plurality of marking frames.
And S52, converting all the connected areas into mark frames respectively, wherein the mark frames are defined by coordinates, widths and heights of the upper left corners of the rectangles which are circumscribed by the connected areas.
The definition of the mark frame is a circumscribed rectangle of the connected region, and is described by the coordinates, width and height of the upper left corner of the circumscribed rectangle, namely [ x, y, w, h ]
S53, filtering the mark frames with preset adaptive conditions, removing every mark frame that meets any of the following conditions:
1) the mark frame lies outside the range of the table area segmentation graph;
2) the area of the mark frame is smaller than a preset threshold.
In this embodiment, the threshold is calculated according to the width and height of the table area and the number of the mark frames, and the threshold calculation formula is as follows:
threshold=W*H/N/20
wherein, threshold is a threshold value, W is the width of the table area, H is the height of the table area, and N is the number of the mark frames.
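The threshold formula and the area filter of step S53 can be sketched as follows (function name assumed; the out-of-range check of condition 1 is omitted for brevity):

```python
def filter_mark_boxes(boxes, table_w, table_h):
    """Keep only mark frames whose area reaches the adaptive threshold
    threshold = W * H / N / 20, where W and H are the table area's width
    and height and N is the number of mark frames.
    Boxes are [x, y, w, h] lists."""
    n = len(boxes)
    threshold = table_w * table_h / n / 20
    return [b for b in boxes if b[2] * b[3] >= threshold]
```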
S54, performing horizontal matching and vertical matching on the mark frames, each generating a series of matching lists. The matching rules are as follows:
a horizontal match between a pair of mark frames succeeds when the error between the midpoints of their y coordinates does not exceed an adaptive threshold;
a vertical match succeeds when the error between the midpoints of their x coordinates does not exceed an adaptive threshold.
Further, each match divides a pair of mark frames into a source mark frame and a target mark frame, and the adaptive threshold is determined from the target mark frame: for a horizontal match it is 1/2 of the height of the target mark frame, and for a vertical match it is 1/2 of the width of the target mark frame.
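Under the stated rules, the two matching predicates can be sketched as (boxes in [x, y, w, h] form; function names assumed):

```python
def horizontal_match(src, tgt):
    """Same-row match: the vertical midpoints of the two mark frames
    differ by at most half the target frame's height."""
    src_mid = src[1] + src[3] / 2
    tgt_mid = tgt[1] + tgt[3] / 2
    return abs(src_mid - tgt_mid) <= tgt[3] / 2

def vertical_match(src, tgt):
    """Same-column match: the horizontal midpoints differ by at most
    half the target frame's width."""
    src_mid = src[0] + src[2] / 2
    tgt_mid = tgt[0] + tgt[2] / 2
    return abs(src_mid - tgt_mid) <= tgt[2] / 2
```

Note the predicates are not symmetric: the threshold depends on the target frame, which is why every frame is later tried as a source against all others.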
S55, de-duplicating the matching lists, and counting the number of occurrences of each mark frame in the horizontal and vertical matching lists to obtain the number of columns and rows each mark frame spans.
During matching, to ensure completeness, every mark frame is used as a source mark frame and matched against all other mark frames, so the final series of matching lists contains duplicates. These must be de-duplicated, keeping only one copy of each, to obtain matching lists that correctly describe the table structure information.
A mark frame that spans rows or columns appears in several row or column matching lists at the same time; counting its occurrences gives the number of rows and columns it spans.
S56, sorting the row matching lists with respect to one another, and sorting the mark frames within each row matching list, to obtain the table structure information.
The row matching lists are sorted with respect to one another in ascending order of the mean vertical midpoint of their mark frames; the mark frames within each row matching list are sorted in ascending order of the horizontal coordinate of their upper left corners. Reading the row matching lists from top to bottom and the mark frames within each list from left to right, combined with the row-span and column-span counts of the mark frames, yields the table structure information.
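The two sorting passes of step S56 can be sketched as (function name assumed; boxes in [x, y, w, h] form):

```python
def order_table_structure(row_lists):
    """Sort row matching lists top-to-bottom by the mean vertical midpoint
    of their mark frames, then sort the frames within each row
    left-to-right by the x coordinate of the upper left corner."""
    rows = sorted(
        row_lists,
        key=lambda row: sum(b[1] + b[3] / 2 for b in row) / len(row),
    )
    return [sorted(row, key=lambda b: b[0]) for row in rows]
```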
Example 2:
This embodiment provides a table structure recognition system, which includes a semantic segmentation module, a segmentation map fusion module, a control point extraction module, a segmentation map correction module, and an extraction analysis module; the specific functions of each module are as follows:
the semantic segmentation module is used for performing semantic segmentation on the cell area, the table frame line area and the table area of the image to be recognized by using a pre-trained semantic segmentation model to obtain a cell area segmentation graph, a table frame line area segmentation graph and a table area segmentation graph;
the segmentation map fusion module is used for fusing the cell region segmentation map and the table frame line region segmentation map to obtain a fused cell region segmentation map;
the control point extraction module is used for extracting control points of the table region segmentation graph to obtain correction transformation;
the segmentation map correction module is used for correcting the fusion cell segmentation map by using correction transformation to obtain an aligned cell region segmentation map;
and the extraction analysis module is used for carrying out connected domain extraction analysis on the aligned cell region segmentation graph and obtaining the table structure information according to the matching conditions.
Example 3:
This embodiment provides a computer device, which may be a server, a computer or the like, comprising a processor, a memory, an input device, a display and a network interface connected by a system bus. The processor provides computing and control capabilities; the memory includes a nonvolatile storage medium and an internal memory, where the nonvolatile storage medium stores an operating system, a computer program and a database, and the internal memory provides an environment for running the operating system and the computer program. When the processor executes the computer program stored in the memory, the table structure identification method of embodiment 1 above is implemented, as follows:
performing semantic segmentation on a cell area, a table frame line area and a table area of an image to be recognized by using a pre-trained semantic segmentation model to obtain a cell area segmentation graph, a table frame line area segmentation graph and a table area segmentation graph;
fusing the cell area segmentation graph and the table frame line area segmentation graph to obtain a fused cell area segmentation graph;
carrying out control point extraction on the table area segmentation graph to obtain correction transformation;
correcting the fusion cell segmentation graph by using correction transformation to obtain an alignment cell region segmentation graph;
performing connected domain extraction analysis on the aligned cell region segmentation graph, and obtaining table structure information according to matching conditions.
example 4:
This embodiment provides a storage medium, namely a computer-readable storage medium storing a computer program; when the program is executed by a processor, the table structure identification method of embodiment 1 above is implemented, as follows:
performing semantic segmentation on a cell area, a table frame line area and a table area of an image to be recognized by using a pre-trained semantic segmentation model to obtain a cell area segmentation graph, a table frame line area segmentation graph and a table area segmentation graph;
fusing the cell area segmentation graph and the table frame line area segmentation graph to obtain a fused cell area segmentation graph;
carrying out control point extraction on the table area segmentation graph to obtain correction transformation;
correcting the fusion cell segmentation graph by using correction transformation to obtain an alignment cell region segmentation graph;
performing connected domain extraction analysis on the aligned cell region segmentation graph, and obtaining table structure information according to matching conditions.
the storage medium in this embodiment may be a magnetic disk, an optical disk, a computer Memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), a usb disk, a removable hard disk, or other media.
In conclusion, the pre-trained semantic segmentation model performs semantic segmentation of the cell area, the table frame line area and the table area of the image to be recognized simultaneously; the cell area and table frame line area segmentation graphs are fused into the fused cell area segmentation graph, the two compensating for each other's deficiencies to improve the accuracy of table structure recognition. Control points are extracted from the table area segmentation graph to obtain a correction transformation that recovers the structure prior, facilitating subsequent processing and further improving accuracy. The fused cell segmentation graph is corrected with this transformation to obtain the aligned cell area segmentation graph, on which connected-domain extraction and analysis are performed, and the table structure information is obtained according to matching conditions. The invention thus mitigates the effects of table deformation in table structure recognition scenes and improves the accuracy of table structure recognition.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims (10)

1. A method for identifying a table structure, the method comprising the steps of:
s1, performing semantic segmentation on the cell area, the table frame line area and the table area of the image to be recognized by using a pre-trained semantic segmentation model to obtain a cell area segmentation graph, a table frame line area segmentation graph and a table area segmentation graph;
s2, fusing the cell area segmentation graph and the table frame line area segmentation graph to obtain a fused cell area segmentation graph;
s3, extracting control points of the table area segmentation graph to obtain correction transformation;
s4, correcting the fusion cell segmentation graph by using correction transformation to obtain an alignment cell region segmentation graph;
and S5, performing connected domain extraction analysis on the aligned cell region segmentation graph, and obtaining table structure information according to matching conditions.
2. The method for identifying a table structure according to claim 1, wherein the pre-trained semantic segmentation model is the neural network DeepLab V3+, and step S1 specifically includes:
replacing the multi-class output of the last layer of the DeepLab V3+ network with three binary-classification outputs, whose loss functions are computed and back-propagated against the annotated images of the cell area, the table frame line area, and the table area, respectively;
during a forward pass, the DeepLab V3+ network simultaneously outputs the semantic segmentation maps of the cell area, the table frame line area, and the table area.
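As an illustrative sketch of the three-head modification described in claim 2 (not the patented network itself: a toy convolutional backbone stands in for DeepLab V3+, and all class and variable names here are our own assumptions), each binary head produces a 1-channel logit map, gets its own binary cross-entropy loss against its own annotation mask, and a single backward pass propagates gradients through all three heads jointly:

```python
import torch
import torch.nn as nn

class ThreeHeadSegmenter(nn.Module):
    """A shared backbone with three independent binary (1-channel) output
    heads, mirroring the modified last layer of claim 2. The tiny conv
    stack below is only a stand-in for the real DeepLab V3+ encoder."""
    def __init__(self, channels: int = 16):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        # one 1-channel logit map per target: cells, frame lines, table region
        self.head_cell = nn.Conv2d(channels, 1, 1)
        self.head_line = nn.Conv2d(channels, 1, 1)
        self.head_table = nn.Conv2d(channels, 1, 1)

    def forward(self, x):
        f = self.backbone(x)
        return self.head_cell(f), self.head_line(f), self.head_table(f)

model = ThreeHeadSegmenter()
img = torch.randn(1, 3, 64, 64)
cell_logits, line_logits, table_logits = model(img)

# each head gets its own binary cross-entropy loss against its own mask,
# and the summed loss back-propagates through all three heads at once
bce = nn.BCEWithLogitsLoss()
masks = [torch.randint(0, 2, (1, 1, 64, 64)).float() for _ in range(3)]
outputs = (cell_logits, line_logits, table_logits)
loss = sum(bce(o, m) for o, m in zip(outputs, masks))
loss.backward()
```

One forward pass thus yields all three segmentation maps at inference time, as the claim requires.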
3. The method for identifying a table structure according to claim 1, wherein the step S2 specifically includes:
S21, performing a pixel-level inversion operation on the table frame line area segmentation map to obtain an inverted table frame line area segmentation map;
and S22, performing a pixel-level AND operation on the cell area segmentation map and the inverted table frame line area segmentation map to obtain the fused cell area segmentation map.
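Steps S21 and S22 amount to masking frame-line pixels out of the cell map, so that detected frame lines carve apart cell blobs the cell branch had merged. A minimal NumPy sketch (the function name and toy masks are ours, not the patent's):

```python
import numpy as np

def fuse_cell_map(cell_map: np.ndarray, line_map: np.ndarray) -> np.ndarray:
    """Fuse binary cell and frame-line masks as in claim 3:
    S21 inverts the frame-line map, S22 ANDs it with the cell map."""
    inverted_lines = 1 - line_map          # S21: pixel-level inversion
    return cell_map & inverted_lines       # S22: pixel-level AND

cell = np.array([[1, 1, 1],
                 [1, 1, 1],
                 [1, 1, 1]], dtype=np.uint8)    # one merged cell blob
lines = np.array([[0, 1, 0],
                  [0, 1, 0],
                  [0, 1, 0]], dtype=np.uint8)   # a vertical separator line
fused = fuse_cell_map(cell, lines)
# the line column is carved out, splitting the blob into two cells
```

The fused map is what the later connected component analysis (claim 5) operates on after rectification.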
4. The method for identifying a table structure according to claim 1, wherein the step S3 specifically includes:
S31, extracting the four vertices of the circumscribed quadrilateral of the table area segmentation map by using a polygon fitting algorithm;
S32, determining the center point, width, and height of the table area from the four vertices, and locating intermediate control points according to a preset proportion, the intermediate control points and the four vertices forming a first control point set;
S33, generating a second control point set from the table area width and height and the preset proportion;
and S34, generating the correction transformation from the first control point set and the second control point set by using a thin plate spline interpolation algorithm.
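The following is a sketch of how the two control point sets of S31–S34 could be constructed, assuming the four fitted vertices are already available (the polygon fitting itself, e.g. via `cv2.approxPolyDP`, and the thin-plate-spline solve are omitted; the vertex order, sampled edges, and point counts are illustrative assumptions, not the patent's specification):

```python
import numpy as np

def boundary_control_points(corners: np.ndarray, n_per_edge: int = 3) -> np.ndarray:
    """Given the four vertices of the table's circumscribed quadrilateral
    (assumed order: top-left, top-right, bottom-right, bottom-left),
    sample intermediate control points along the top and bottom edges at
    even proportions. Together with the vertices this plays the role of
    the 'first control point set'."""
    tl, tr, br, bl = corners.astype(float)
    t = np.linspace(0.0, 1.0, n_per_edge)[:, None]   # preset proportions
    top = tl + t * (tr - tl)
    bottom = bl + t * (br - bl)
    return np.vstack([top, bottom])

def target_control_points(width: float, height: float, n_per_edge: int = 3) -> np.ndarray:
    """The 'second control point set': the same proportions laid out on an
    axis-aligned width x height rectangle, i.e. where the thin-plate-spline
    warp should move each source control point."""
    t = np.linspace(0.0, 1.0, n_per_edge)[:, None]
    top = np.hstack([t * width, np.zeros_like(t)])
    bottom = np.hstack([t * width, np.full_like(t, height)])
    return np.vstack([top, bottom])

# a slightly skewed table outline
quad = np.array([[10, 12], [110, 8], [112, 58], [8, 62]])
src = boundary_control_points(quad)
dst = target_control_points(width=100, height=50)
# the (src, dst) pairs would then parameterize the thin plate spline
# correction transformation of S34
```

Matching source points on the deformed outline to target points on an upright rectangle is what lets the TPS warp flatten a curved or skewed table before cell extraction.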
5. The method for identifying a table structure according to claim 1, wherein the step S5 specifically includes:
S51, performing image connected component analysis on the aligned cell area segmentation map by using a contour-based labeling algorithm to obtain a plurality of connected regions;
S52, converting each connected region into a mark frame, wherein a mark frame is defined by the top-left corner coordinates, width, and height of the connected region's circumscribed rectangle;
S53, filtering the mark frames with preset adaptive conditions, and removing any mark frame that meets the preset adaptive conditions;
S54, performing horizontal matching and vertical matching on the mark frames, the horizontal and vertical matching each generating a series of matching lists;
S55, de-duplicating the matching lists, and counting the number of occurrences of each mark frame in the horizontal and vertical matching lists to obtain the column span and row span of each mark frame;
and S56, sorting the row matching lists with respect to one another, and sorting the mark frames within each row's matching list, to obtain the table structure information.
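The patent does not spell out its exact matching rule, so the following plain-Python sketch shows one plausible reading of S54 and S56: mark frames `(x, y, w, h)` whose vertical extents overlap are matched into the same row list, rows are sorted top-to-bottom, and frames left-to-right within each row (the grouping criterion and function name are our assumptions):

```python
def boxes_to_rows(boxes):
    """Group mark frames (x, y, w, h) into row matching lists.
    Two frames match horizontally when their vertical extents overlap;
    each new frame is compared against the first frame of each row."""
    rows = []
    for box in sorted(boxes, key=lambda b: b[1]):   # scan top to bottom
        x, y, w, h = box
        for row in rows:
            rx, ry, rw, rh = row[0]
            if y < ry + rh and ry < y + h:           # vertical overlap -> same row
                row.append(box)
                break
        else:
            rows.append([box])                       # start a new row
    # S56: sort rows top-to-bottom, and frames left-to-right within each row
    rows.sort(key=lambda r: r[0][1])
    for row in rows:
        row.sort(key=lambda b: b[0])
    return rows

cells = [(0, 0, 40, 20), (50, 2, 40, 18),    # first row (slightly jittered)
         (0, 30, 40, 20), (50, 32, 40, 18)]  # second row
grid = boxes_to_rows(cells)
# grid -> two row lists of two mark frames each
```

Running the same grouping on horizontal extents would give the column matching lists; counting how many row and column lists each frame appears in then yields its row span and column span, per S55.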
6. The table structure recognition method of claim 5, wherein the preset adaptive conditions used in step S53 are:
1) the mark frame lies outside the range of the table area segmentation map;
2) the area of the mark frame is smaller than a preset threshold.
7. The method for identifying a table structure according to claim 6, wherein the preset threshold is calculated by the formula:
threshold = W * H / N / 20
where threshold is the threshold value, W is the width of the table area, H is the height of the table area, and N is the number of mark frames.
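The claim 7 formula translates directly to code: the threshold is one twentieth of the average area each mark frame would occupy if the table were divided evenly among all of them (the function name and the sample table dimensions are illustrative):

```python
def noise_threshold(table_w: float, table_h: float, n_boxes: int) -> float:
    """Claim 7's adaptive area threshold: W * H / N / 20, i.e. one
    twentieth of the mean per-box share of the table area."""
    return table_w * table_h / n_boxes / 20

# a 1000 x 500 table with 50 candidate mark frames
t = noise_threshold(1000, 500, 50)
# mark frames with area below t are filtered out as noise in S53
```

Because the threshold scales with table size and shrinks as the number of detected frames grows, small specks are rejected without discarding the genuinely small cells of a dense table.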
8. A form structure recognition system, the system comprising:
the semantic segmentation module is used for performing semantic segmentation on the cell area, the table frame line area, and the table area of an image to be recognized by using a pre-trained semantic segmentation model, to obtain a cell area segmentation map, a table frame line area segmentation map, and a table area segmentation map;
the segmentation map fusion module is used for fusing the cell area segmentation map and the table frame line area segmentation map to obtain a fused cell area segmentation map;
the control point extraction module is used for extracting control points from the table area segmentation map to obtain a correction transformation;
the segmentation map correction module is used for correcting the fused cell area segmentation map by using the correction transformation to obtain an aligned cell area segmentation map;
and the extraction analysis module is used for performing connected component extraction and analysis on the aligned cell area segmentation map and obtaining the table structure information according to matching conditions.
9. A computer device comprising a processor and a memory for storing a program executable by the processor, wherein the processor, when executing the program stored in the memory, implements a table structure recognition method as claimed in any one of claims 1 to 7.
10. A storage medium storing a program, wherein the program, when executed by a processor, implements a table structure recognition method according to any one of claims 1 to 7.
CN202111563490.2A 2021-12-20 2021-12-20 Method, system, equipment and storage medium for identifying table structure Pending CN114419643A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111563490.2A CN114419643A (en) 2021-12-20 2021-12-20 Method, system, equipment and storage medium for identifying table structure


Publications (1)

Publication Number Publication Date
CN114419643A true CN114419643A (en) 2022-04-29

Family

ID=81267177

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111563490.2A Pending CN114419643A (en) 2021-12-20 2021-12-20 Method, system, equipment and storage medium for identifying table structure

Country Status (1)

Country Link
CN (1) CN114419643A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination