CN110362620B - Table data structuring method based on machine learning - Google Patents
Table data structuring method based on machine learning
- Publication number
- CN110362620B (application CN201910623601.0A)
- Authority
- CN
- China
- Prior art keywords
- score
- row
- processed
- column
- cell
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/258—Data format conversion from or to a database
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/374—Thesaurus
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a table data structuring method based on machine learning. The objects contained in the cells of a large number of sample spreadsheets are counted to form a dictionary table. For the spreadsheet to be processed, a score is obtained for each cell by combining the number of times the object in that cell occurs in the spreadsheet with the number recorded for that object in the dictionary table. With the cell score as the smallest unit, rows and columns are compared to identify the header rows or header columns of the spreadsheet and hence each header item, and the data items are then extracted and structured according to those header items. This overcomes the defects of the prior art, which relies on rules, recognizes only horizontal headers, and cannot recognize multiple headers, and it allows spreadsheet data to be structured accurately and efficiently.
Description
Technical Field
The invention relates to a table data structuring method based on machine learning, and belongs to the technical field of table data structuring.
Background
Spreadsheets are among the most commonly used computer software tools. In the prior art, for a Sheet (worksheet) with unknown content, the data item in each cell can only be read after the file has been opened, in the following steps:
(1) open the Excel file through an interface;
(2) read the Sheet in the Excel file through the interface;
(3) read the cells in the Sheet through the interface.
With this procedure the meaning of each data item is unknown, so the data cannot be structured. The meaning of a data item is described by the table header, and the data cannot be interpreted without knowing the header. To structure table data, some work therefore assumes that the header occupies the first row of the table; under this assumption the header is extracted first and the data afterwards, in the following steps:
(1) open the Excel file through an interface;
(2) read the Sheet in the Excel file through the interface;
(3) read the first row of cells in the Sheet through the interface and use it as the header;
(4) read the data corresponding to each header item column by column to complete the data structuring.
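The first-row assumption can be sketched as follows. This is a minimal illustration, not code from the patent: it assumes the Sheet has already been read into a list of rows, and the name `structure_with_first_row_header` is hypothetical.

```python
from typing import Any


def structure_with_first_row_header(rows: list[list[Any]]) -> list[dict[Any, Any]]:
    """Treat the first row as the header and map every later row onto it."""
    if not rows:
        return []
    header, *data_rows = rows
    return [dict(zip(header, row)) for row in data_rows]


# A sheet with a horizontal header in the first row is structured correctly.
sheet = [
    ["name", "phone"],
    ["Alice", "13800000000"],
    ["Bob", "13900000000"],
]
records = structure_with_first_row_header(sheet)

# A sheet with a vertical header (header items in the first column) is misread:
# the first data record is used as if it were the header.
vertical = [
    ["name", "Alice", "Bob"],
    ["phone", "13800000000", "13900000000"],
]
misread = structure_with_first_row_header(vertical)
```

The second call shows the kind of misjudgment the text describes: the keys of `misread` are `"name"`, `"Alice"` and `"Bob"` rather than the real header items.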
This assumption has obvious defects: the extracted header can only be a horizontal header, and it must be in the first row. Tables with a vertical header, with a header that is not in the first row, or with several header rows in one table are misjudged. Some work therefore refines the procedure with prior knowledge to handle headers that are not in the first row, in the following steps:
(1) open the Excel file through an interface;
(2) read the Sheet in the Excel file through the interface;
(3) read the data of each row and each column in the Sheet in turn through the interface until recognizable data is encountered (identified by rule matching, for example a mobile phone number, an identity card number, or a bank card number), then search the rows and columns in turn for the first row that does not match the rules and use that row as the header;
(4) read the data corresponding to each header item column by column to complete the data structuring.
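One possible reading of the rule-based search in step (3) is sketched below. The source does not spell out the search direction or the rules, so everything here is an assumption: the 11-digit pattern stands in for "mobile phone number", the search walks upward from the first row that contains recognizable data, and the names `KNOWN_DATA`, `is_known_data` and `find_header_row` are hypothetical.

```python
import re
from typing import Any, Optional

# Assumed rule: a string of 11 digits starting with 1 counts as a mobile number.
KNOWN_DATA = re.compile(r"1\d{10}")


def is_known_data(cell: Any) -> bool:
    """Return True when the cell matches the assumed data rule."""
    return isinstance(cell, str) and KNOWN_DATA.fullmatch(cell) is not None


def find_header_row(rows: list[list[Any]]) -> Optional[int]:
    """Find the first row with recognizable data, then the nearest row above
    it that contains no recognizable data; return that row's index or None."""
    for i, row in enumerate(rows):
        if any(is_known_data(cell) for cell in row):
            for j in range(i - 1, -1, -1):
                if not any(is_known_data(cell) for cell in rows[j]):
                    return j
            return None
    return None


with_title = [
    ["Contact list", ""],
    ["name", "phone"],
    ["Ann", "13800000000"],
]
no_known_data = [
    ["name", "city"],
    ["Ann", "Nanjing"],
]
header_index = find_header_row(with_title)
missing = find_header_row(no_known_data)
```

In this sketch a sheet without any rule-recognizable value yields `None`, which is one of the limitations of the rule-based approach.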
This method also has problems: misjudgment still occurs for tables with vertical headers and for tables containing several headers, and the header cannot be recognized in a table that contains no rule-recognizable data.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a table data structuring method based on machine learning that accurately identifies the header items in a spreadsheet and, based on each header item, efficiently structures the data items in the spreadsheet.
To solve this technical problem, the invention adopts the following technical scheme: a table data structuring method based on machine learning, used for structuring the data items in a spreadsheet to be processed, characterized by comprising the following steps:
Step A: count the objects in each cell of a preset number of sample spreadsheets to obtain each object and its corresponding number, construct a dictionary table, and proceed to step B;
Step B: for each cell in the spreadsheet to be processed, count the number of times, count, that the object in the cell appears in the spreadsheet to be processed, and proceed to step C;
Step C: for each cell in the spreadsheet to be processed, obtain the number c recorded in the dictionary table for the object in the cell, where c is 0 if the dictionary table does not contain that object, and proceed to step D;
Step D: for each cell in the spreadsheet to be processed, according to the following formula:
obtain the score corresponding to the cell, and proceed to step E;
Step E: for each row in the spreadsheet to be processed, take the sum of the scores of the cells in the row as the score of the row;
likewise, for each column in the spreadsheet to be processed, take the sum of the scores of the cells in the column as the score of the column;
once the scores of every row and every column in the spreadsheet to be processed have been obtained, proceed to step F;
Step F: cluster the rows of the spreadsheet to be processed according to their scores, take the average of the scores of the rows in each row cluster as the score of that row cluster, and select the row cluster with the highest score as the candidate row cluster;
likewise, cluster the columns of the spreadsheet to be processed according to their scores, take the average of the scores of the columns in each column cluster as the score of that column cluster, and select the column cluster with the highest score as the candidate column cluster;
then proceed to step G;
Step G: from the rows in the candidate row cluster, select the row with the highest score, and from the score of that row obtain the average score of its non-empty cells as the row cell average score;
likewise, from the columns in the candidate column cluster, select the column with the highest score, and from the score of that column obtain the average score of its non-empty cells as the column cell average score;
then proceed to step H;
Step H: if the row cell average score is greater than the column cell average score, the rows in the candidate row cluster are the header rows of the spreadsheet to be processed; obtain each header item and proceed to step J;
if the row cell average score is less than the column cell average score, the columns in the candidate column cluster are the header columns of the spreadsheet to be processed; obtain each header item and proceed to step J;
Step J: read each data item in the spreadsheet to be processed according to its header items, thereby structuring the table data.
As a preferred technical scheme of the invention: in step A, after the dictionary table has been constructed, the dictionary table is updated through the following steps I to II before proceeding to step B;
Step I: obtain the maximum of the numbers corresponding to the objects in the dictionary table, and proceed to step II;
Step II: for each object in the dictionary table, perform the following steps II-1 to II-2 to update the number corresponding to the object, thereby updating the dictionary table;
Step II-1: determine whether the object belongs to a preset header item set; if so, set the number corresponding to the object to the maximum number value; otherwise proceed to step II-2;
Step II-2: determine whether the object belongs to a preset data item set; if so, set the number corresponding to the object to 0; otherwise leave the number corresponding to the object unchanged.
As a preferred technical scheme of the invention: in step F, the rows of the spreadsheet to be processed are clustered according to their scores through the following steps Fa-1 to Fa-3;
Step Fa-1: obtain the minimum row score and the maximum row score among the scores of the rows in the spreadsheet to be processed, and proceed to step Fa-2;
Step Fa-2: divide the span from the minimum row score to the maximum row score according to the preset row score grades to obtain the row score intervals, and proceed to step Fa-3;
Step Fa-3: assign each row of the spreadsheet to be processed to a row score interval according to its score; each row score interval that contains at least one row is a row cluster;
likewise, the columns of the spreadsheet to be processed are clustered according to their scores through the following steps Fb-1 to Fb-3;
Step Fb-1: obtain the minimum column score and the maximum column score among the scores of the columns in the spreadsheet to be processed, and proceed to step Fb-2;
Step Fb-2: divide the span from the minimum column score to the maximum column score according to the preset column score grades to obtain the column score intervals, and proceed to step Fb-3;
Step Fb-3: assign each column of the spreadsheet to be processed to a column score interval according to its score; each column score interval that contains at least one column is a column cluster.
Compared with the prior art, the table data structuring method based on machine learning has the following technical effects:
The method counts the objects in the cells of a large number of sample spreadsheets to form a dictionary table, and obtains the score of each cell in the spreadsheet to be processed by combining the number of times the object in that cell occurs in the spreadsheet with the number recorded for that object in the dictionary table. With the cell score as the smallest unit, rows and columns are compared to identify the header rows or header columns of the spreadsheet to be processed and hence each header item, and the data items are then extracted and structured according to those header items. This overcomes the defects of the prior art, which relies on rules, recognizes only horizontal headers, and cannot recognize multiple headers, and it allows spreadsheet data to be structured accurately and efficiently.
Drawings
FIG. 1 is a schematic diagram of the table data structuring method based on machine learning designed by the present invention.
Detailed Description
The following description will explain embodiments of the present invention in further detail with reference to the accompanying drawings.
The invention provides a table data structuring method based on machine learning for structuring the data items in a spreadsheet to be processed. In practical application, the following steps A to J are executed.
Step A: count the objects in each cell of a preset number of sample spreadsheets to obtain each object and its corresponding number, construct a dictionary table, update the dictionary table through the following steps I to II, and then proceed to step B.
Step I: obtain the maximum of the numbers corresponding to the objects in the dictionary table, and proceed to step II.
Step II: for each object in the dictionary table, perform the following steps II-1 to II-2 to update the number corresponding to the object, thereby updating the dictionary table.
Step II-1: determine whether the object belongs to a preset header item set; if so, set the number corresponding to the object to the maximum number value; otherwise proceed to step II-2;
Step II-2: determine whether the object belongs to a preset data item set; if so, set the number corresponding to the object to 0; otherwise leave the number corresponding to the object unchanged.
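Step A together with steps I and II can be sketched as below. This is an illustrative sketch under stated assumptions, not the patent's implementation: sample spreadsheets are assumed to be already loaded as lists of rows, empty cells (`None` or `""`) are skipped when counting, and the names `build_dictionary` and `update_dictionary` are hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable

Sheet = list[list[Hashable]]


def build_dictionary(samples: Iterable[Sheet]) -> Counter:
    """Step A: count how often each cell object occurs across the samples."""
    counts: Counter = Counter()
    for sheet in samples:
        for row in sheet:
            for cell in row:
                # Assumption: empty cells carry no object and are not counted.
                if cell is not None and cell != "":
                    counts[cell] += 1
    return counts


def update_dictionary(counts: Counter, header_items: set, data_items: set) -> Counter:
    """Steps I and II: raise known header items to the maximum count and
    set known data items to 0; leave every other object unchanged."""
    if not counts:
        return Counter()
    max_count = max(counts.values())  # step I
    updated: Counter = Counter()
    for obj, n in counts.items():     # step II
        if obj in header_items:       # step II-1
            updated[obj] = max_count
        elif obj in data_items:       # step II-2
            updated[obj] = 0
        else:
            updated[obj] = n
    return updated


samples = [
    [["name", "age"], ["Ann", 30], ["Bob", 30]],
    [["name", "city"], ["Cy", "Nanjing"]],
]
counts = build_dictionary(samples)
updated = update_dictionary(counts, header_items={"age"}, data_items={30})
```

Here `"age"` occurs only once in the samples but is lifted to the maximum count because it is in the preset header item set, while the frequent value `30` is forced to 0 because it is in the preset data item set.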
Step B: for each cell in the spreadsheet to be processed, count the number of times, count, that the object in the cell appears in the spreadsheet to be processed, and proceed to step C.
Step C: for each cell in the spreadsheet to be processed, obtain the number c recorded in the dictionary table for the object in the cell, where c is 0 if the dictionary table does not contain that object, and proceed to step D.
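Steps B and C produce two numbers per cell, count and c. A minimal sketch follows, assuming the spreadsheet is a list of rows with hashable cell values and the dictionary table is a plain mapping; the function name `count_and_lookup` is hypothetical.

```python
from collections import Counter
from typing import Hashable, Mapping


def count_and_lookup(
    sheet: list[list[Hashable]], dictionary: Mapping[Hashable, int]
) -> list[list[tuple[int, int]]]:
    """Return, for every cell, the pair (count, c).

    count: occurrences of the cell's object within this sheet (step B).
    c:     number recorded for the object in the dictionary table,
           or 0 when the object is absent from it (step C).
    """
    occurrences = Counter(cell for row in sheet for cell in row)
    return [
        [(occurrences[cell], dictionary.get(cell, 0)) for cell in row]
        for row in sheet
    ]


sheet = [["name", "age"], ["Ann", 30], ["Bob", 30]]
dictionary = {"name": 5, "age": 5, 30: 0}
pairs = count_and_lookup(sheet, dictionary)
```

A header-like object such as `"name"` tends to have a low count in the sheet and a high c, whereas a repeated data value such as `30` has a higher count and a low c.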
Step D: for each cell in the spreadsheet to be processed, according to the following formula:
obtain the score corresponding to the cell, and proceed to step E.
Step E: for each row in the spreadsheet to be processed, take the sum of the scores of the cells in the row as the score of the row;
likewise, for each column in the spreadsheet to be processed, take the sum of the scores of the cells in the column as the score of the column;
once the scores of every row and every column in the spreadsheet to be processed have been obtained, proceed to step F.
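Step E reduces to row sums and column sums over the grid of cell scores. The sketch below assumes the cell scores from step D are already available as a rectangular list of lists; the function name `row_and_column_scores` and the sample values are hypothetical.

```python
def row_and_column_scores(
    cell_scores: list[list[float]],
) -> tuple[list[float], list[float]]:
    """Step E: sum cell scores along each row and along each column.

    Assumes a rectangular grid (every row has the same number of cells).
    """
    row_scores = [sum(row) for row in cell_scores]
    col_scores = [sum(col) for col in zip(*cell_scores)]
    return row_scores, col_scores


cell_scores = [
    [3.0, 2.0],
    [0.5, 0.5],
    [0.0, 1.0],
]
row_scores, col_scores = row_and_column_scores(cell_scores)
```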
Step F: cluster the rows of the spreadsheet to be processed according to their scores through the following steps Fa-1 to Fa-3, take the average of the scores of the rows in each row cluster as the score of that row cluster, and select the row cluster with the highest score as the candidate row cluster.
Step Fa-1: obtain the minimum row score and the maximum row score among the scores of the rows in the spreadsheet to be processed, and proceed to step Fa-2;
Step Fa-2: divide the span from the minimum row score to the maximum row score according to the preset row score grades to obtain the row score intervals, and proceed to step Fa-3;
Step Fa-3: assign each row of the spreadsheet to be processed to a row score interval according to its score; each row score interval that contains at least one row is a row cluster.
Likewise, cluster the columns of the spreadsheet to be processed according to their scores through the following steps Fb-1 to Fb-3, take the average of the scores of the columns in each column cluster as the score of that column cluster, and select the column cluster with the highest score as the candidate column cluster.
Step Fb-1: obtain the minimum column score and the maximum column score among the scores of the columns in the spreadsheet to be processed, and proceed to step Fb-2;
Step Fb-2: divide the span from the minimum column score to the maximum column score according to the preset column score grades to obtain the column score intervals, and proceed to step Fb-3;
Step Fb-3: assign each column of the spreadsheet to be processed to a column score interval according to its score; each column score interval that contains at least one column is a column cluster.
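The interval-based clustering of step F can be sketched as follows. The source only says the span is divided "according to the preset score grades"; this sketch assumes that means a fixed number of equal-width intervals, passed as the hypothetical parameter `levels`. The same functions serve rows (Fa-1 to Fa-3) and columns (Fb-1 to Fb-3); the names `cluster_by_score` and `best_cluster` are hypothetical.

```python
def cluster_by_score(scores: list[float], levels: int) -> list[list[int]]:
    """Group line indices into equal-width score intervals.

    Only intervals that contain at least one line become clusters.
    """
    if not scores:
        return []
    lo, hi = min(scores), max(scores)          # Fa-1 / Fb-1
    if hi == lo:
        return [list(range(len(scores)))]      # all lines share one score
    width = (hi - lo) / levels                 # Fa-2 / Fb-2 (assumed equal width)
    buckets: dict[int, list[int]] = {}
    for idx, score in enumerate(scores):       # Fa-3 / Fb-3
        # The maximum score belongs to the last interval, hence the clamp.
        k = min(int((score - lo) / width), levels - 1)
        buckets.setdefault(k, []).append(idx)
    return [buckets[k] for k in sorted(buckets)]


def best_cluster(scores: list[float], levels: int) -> list[int]:
    """Return the cluster whose member scores have the highest average."""
    clusters = cluster_by_score(scores, levels)
    return max(clusters, key=lambda c: sum(scores[i] for i in c) / len(c))


row_scores = [9.0, 8.5, 1.0, 1.2, 0.8]
clusters = cluster_by_score(row_scores, levels=3)
candidate_rows = best_cluster(row_scores, levels=3)
```

With three levels the two high-scoring rows fall into the top interval and the remaining rows into the bottom one; the middle interval is empty and therefore does not form a cluster.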
After the candidate row cluster and the candidate column cluster have been selected in step F, proceed to step G.
Step G: from the rows in the candidate row cluster, select the row with the highest score, and from the score of that row obtain the average score of its non-empty cells as the row cell average score;
likewise, from the columns in the candidate column cluster, select the column with the highest score, and from the score of that column obtain the average score of its non-empty cells as the column cell average score;
then proceed to step H.
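Step G can be sketched with a single helper used for both rows and columns. The sketch assumes that a cell is empty when it is `None` or `""`, and that the average is the line's score divided by its number of non-empty cells; the name `best_line_cell_average` and the sample values are hypothetical.

```python
from typing import Any


def best_line_cell_average(
    candidate: list[int], line_scores: list[float], lines: list[list[Any]]
) -> float:
    """Pick the highest-scoring line in the candidate cluster and return its
    score divided by the number of its non-empty cells."""
    best = max(candidate, key=lambda i: line_scores[i])
    # Assumption: None and "" mark empty cells.
    non_empty = sum(1 for cell in lines[best] if cell is not None and cell != "")
    return line_scores[best] / non_empty if non_empty else 0.0


rows = [["name", "age", ""], ["Ann", 30, "x"]]
row_scores = [6.0, 1.5]
row_cell_avg = best_line_cell_average([0], row_scores, rows)

# For columns, pass the transposed sheet and the column scores.
cols = [list(col) for col in zip(*rows)]
col_scores = [4.0, 3.0, 0.5]
col_cell_avg = best_line_cell_average([0, 1], col_scores, cols)
```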
Step H: if the row cell average score is greater than the column cell average score, the rows in the candidate row cluster are the header rows of the spreadsheet to be processed; obtain each header item and proceed to step J;
if the row cell average score is less than the column cell average score, the columns in the candidate column cluster are the header columns of the spreadsheet to be processed; obtain each header item and proceed to step J;
Step J: read each data item in the spreadsheet to be processed according to its header items, thereby structuring the table data.
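Steps H and J can be sketched as a comparison followed by a read-out. The source does not say how data lines are paired with several header lines or what happens when the two averages are equal, so this sketch makes two labeled assumptions: each header line governs the data lines that follow it until the next header line, and a tie raises an error. The name `structure_sheet` is hypothetical.

```python
from typing import Any


def structure_sheet(
    sheet: list[list[Any]],
    header_rows: list[int],
    header_cols: list[int],
    row_cell_avg: float,
    col_cell_avg: float,
) -> list[dict[Any, Any]]:
    """Step H: choose row or column headers; step J: map data onto them."""
    if row_cell_avg == col_cell_avg:
        # Assumption: the source does not cover ties, so refuse to guess.
        raise ValueError("row and column cell averages are equal")
    if row_cell_avg > col_cell_avg:
        lines, header_idx = sheet, set(header_rows)
    else:
        # Transpose so that header columns can be handled like header rows.
        lines, header_idx = [list(col) for col in zip(*sheet)], set(header_cols)
    records: list[dict[Any, Any]] = []
    header = None
    for i, line in enumerate(lines):
        if i in header_idx:
            header = line                  # a new header takes effect here
        elif header is not None:
            records.append(dict(zip(header, line)))
    return records


vertical = [
    ["name", "Ann", "Bob"],
    ["age", 30, 41],
]
records = structure_sheet(
    vertical, header_rows=[0], header_cols=[0], row_cell_avg=1.0, col_cell_avg=3.0
)
```

Because the column cell average is higher in this example, the first column is treated as the header and each remaining column becomes one record.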
In the table data structuring method based on machine learning designed by the above technical scheme, the objects in the cells of a large number of sample spreadsheets are counted to form a dictionary table, and the score of each cell in the spreadsheet to be processed is obtained by combining the number of times the object in that cell occurs in the spreadsheet with the number recorded for that object in the dictionary table. With the cell score as the smallest unit, rows and columns are compared to identify the header rows or header columns of the spreadsheet to be processed and hence each header item, and the data items are then extracted and structured according to those header items. This overcomes the defects of the prior art, which relies on rules, recognizes only horizontal headers, and cannot recognize multiple headers, and it allows spreadsheet data to be structured accurately and efficiently.
The embodiments of the present invention have been described in detail with reference to the drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the gist of the present invention.
Claims (3)
1. A table data structuring method based on machine learning, used for structuring the data items in a spreadsheet to be processed, characterized by comprising the following steps:
Step A: count the objects in each cell of a preset number of sample spreadsheets to obtain each object and its corresponding number, construct a dictionary table, and proceed to step B;
Step B: for each cell in the spreadsheet to be processed, count the number of times, count, that the object in the cell appears in the spreadsheet to be processed, and proceed to step C;
Step C: for each cell in the spreadsheet to be processed, obtain the number c recorded in the dictionary table for the object in the cell, where c is 0 if the dictionary table does not contain that object, and proceed to step D;
Step D: for each cell in the spreadsheet to be processed, according to the following formula:
obtain the score corresponding to the cell, and proceed to step E;
Step E: for each row in the spreadsheet to be processed, take the sum of the scores of the cells in the row as the score of the row;
likewise, for each column in the spreadsheet to be processed, take the sum of the scores of the cells in the column as the score of the column;
once the scores of every row and every column in the spreadsheet to be processed have been obtained, proceed to step F;
Step F: cluster the rows of the spreadsheet to be processed according to their scores, take the average of the scores of the rows in each row cluster as the score of that row cluster, and select the row cluster with the highest score as the candidate row cluster;
likewise, cluster the columns of the spreadsheet to be processed according to their scores, take the average of the scores of the columns in each column cluster as the score of that column cluster, and select the column cluster with the highest score as the candidate column cluster;
then proceed to step G;
Step G: from the rows in the candidate row cluster, select the row with the highest score, and from the score of that row obtain the average score of its non-empty cells as the row cell average score;
likewise, from the columns in the candidate column cluster, select the column with the highest score, and from the score of that column obtain the average score of its non-empty cells as the column cell average score;
then proceed to step H;
Step H: if the row cell average score is greater than the column cell average score, the rows in the candidate row cluster are the header rows of the spreadsheet to be processed; obtain each header item and proceed to step J;
if the row cell average score is less than the column cell average score, the columns in the candidate column cluster are the header columns of the spreadsheet to be processed; obtain each header item and proceed to step J;
Step J: read each data item in the spreadsheet to be processed according to its header items, thereby structuring the table data.
2. The table data structuring method based on machine learning according to claim 1, characterized in that: in step A, after the dictionary table has been constructed, the dictionary table is updated through the following steps I to II before proceeding to step B;
Step I: obtain the maximum of the numbers corresponding to the objects in the dictionary table, and proceed to step II;
Step II: for each object in the dictionary table, perform the following steps II-1 to II-2 to update the number corresponding to the object, thereby updating the dictionary table;
Step II-1: determine whether the object belongs to a preset header item set; if so, set the number corresponding to the object to the maximum number value; otherwise proceed to step II-2;
Step II-2: determine whether the object belongs to a preset data item set; if so, set the number corresponding to the object to 0; otherwise leave the number corresponding to the object unchanged.
3. The table data structuring method based on machine learning according to claim 1, characterized in that: in step F, the rows of the spreadsheet to be processed are clustered according to their scores through the following steps Fa-1 to Fa-3;
Step Fa-1: obtain the minimum row score and the maximum row score among the scores of the rows in the spreadsheet to be processed, and proceed to step Fa-2;
Step Fa-2: divide the span from the minimum row score to the maximum row score according to the preset row score grades to obtain the row score intervals, and proceed to step Fa-3;
Step Fa-3: assign each row of the spreadsheet to be processed to a row score interval according to its score; each row score interval that contains at least one row is a row cluster;
likewise, the columns of the spreadsheet to be processed are clustered according to their scores through the following steps Fb-1 to Fb-3;
Step Fb-1: obtain the minimum column score and the maximum column score among the scores of the columns in the spreadsheet to be processed, and proceed to step Fb-2;
Step Fb-2: divide the span from the minimum column score to the maximum column score according to the preset column score grades to obtain the column score intervals, and proceed to step Fb-3;
Step Fb-3: assign each column of the spreadsheet to be processed to a column score interval according to its score; each column score interval that contains at least one column is a column cluster.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910623601.0A CN110362620B (en) | 2019-07-11 | 2019-07-11 | Table data structuring method based on machine learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910623601.0A CN110362620B (en) | 2019-07-11 | 2019-07-11 | Table data structuring method based on machine learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110362620A (en) | 2019-10-22 |
CN110362620B (en) | 2021-04-06 |
Family
ID=68218702
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910623601.0A Active CN110362620B (en) | 2019-07-11 | 2019-07-11 | Table data structuring method based on machine learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110362620B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US12038982B2 (en) | 2021-10-08 | 2024-07-16 | Beijing Baidu Netcom Science Technology Co., Ltd. | Method of extracting table information, electronic device, and storage medium |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111523420B (en) * | 2020-04-14 | 2023-07-07 | 南京烽火星空通信发展有限公司 | Header classification and header column semantic recognition method based on multi-task deep neural network |
CN113010503A (en) * | 2021-03-01 | 2021-06-22 | 广州智筑信息技术有限公司 | Engineering cost data intelligent analysis method and system based on deep learning |
CN113901214B (en) * | 2021-10-08 | 2023-11-17 | 北京百度网讯科技有限公司 | Method and device for extracting form information, electronic equipment and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1741020A (en) * | 2005-09-29 | 2006-03-01 | 北京勤哲软件技术有限责任公司 | Method for storing electronic table unit lattice content with relational data base |
CN102799574A (en) * | 2012-06-29 | 2012-11-28 | 无锡永中软件有限公司 | Data partitioning and merging method for electronic forms |
CN106156239A (en) * | 2015-04-27 | 2016-11-23 | ***通信集团公司 | A kind of form abstracting method and device |
CN108009264A (en) * | 2017-12-14 | 2018-05-08 | 北京航天测控技术有限公司 | A kind of comparative approach of versions of data for Excel format files |
CN109522452A (en) * | 2018-11-13 | 2019-03-26 | 南京烽火星空通信发展有限公司 | A kind of processing method of magnanimity semi-structured data |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040107205A1 (en) * | 2002-12-03 | 2004-06-03 | Lockheed Martin Corporation | Boolean rule-based system for clustering similar records |
Also Published As
Publication number | Publication date |
---|---|
CN110362620A (en) | 2019-10-22 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN110362620B (en) | Table data structuring method based on machine learning | |
US9785830B2 (en) | Methods for automatic structured extraction of data in OCR documents having tabular data | |
CN109101477B (en) | Enterprise field classification and enterprise keyword screening method | |
CN105260751B (en) | A kind of character recognition method and its system | |
CN105261109A (en) | Identification method of prefix letter of banknote | |
CN110134777B (en) | Question duplication eliminating method and device, electronic equipment and computer readable storage medium | |
CN107704539A (en) | The method and device of extensive text message batch structuring | |
CN100390815C (en) | Template optimized character recognition method and system | |
CN107526721B (en) | Ambiguity elimination method and device for comment vocabularies of e-commerce products | |
CN112651323B (en) | Chinese handwriting recognition method and system based on text line detection | |
CN100501764C (en) | Character recognition system and method | |
CN112395881B (en) | Material label construction method and device, readable storage medium and electronic equipment | |
CN111340020A (en) | Formula identification method, device, equipment and storage medium | |
CN112016481A (en) | Financial statement information detection and identification method based on OCR | |
CN110287493B (en) | Risk phrase identification method and device, electronic equipment and storage medium | |
CN102221976A (en) | Method for quickly inputting words based on incomplete identification | |
CN112084308A (en) | Method, system and storage medium for text type data recognition | |
CN113807158A (en) | PDF content extraction method, device and equipment | |
Joseph et al. | Feature extraction and classification techniques of MODI script character recognition | |
CN111340032A (en) | Character recognition method based on application scene in financial field | |
CN109472020B (en) | Feature alignment Chinese word segmentation method | |
CN1084502C (en) | Method and device for recognition of similar writing | |
CN114511857A (en) | OCR recognition result processing method, device, equipment and storage medium | |
CN113723501A (en) | Maximum diversity clustering construction method of pathogenic microorganism reference knowledge base | |
CN113361666A (en) | Handwritten character recognition method, system and medium |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |