CN109359108B - Report extraction method and device, storage medium and electronic equipment - Google Patents

Report extraction method and device, storage medium and electronic equipment

Info

Publication number
CN109359108B
Authority
CN
China
Prior art keywords
report
area
target
data type
target report
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810918482.7A
Other languages
Chinese (zh)
Other versions
CN109359108A (en
Inventor
吴擒龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Neusoft Corp
Original Assignee
Neusoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Neusoft Corp filed Critical Neusoft Corp
Priority to CN201810918482.7A priority Critical patent/CN109359108B/en
Publication of CN109359108A publication Critical patent/CN109359108A/en
Application granted granted Critical
Publication of CN109359108B publication Critical patent/CN109359108B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure relates to a report extraction method, a report extraction device, a storage medium and an electronic device. The method comprises the following steps: marking a target report according to preset databases of different data types to obtain each area of the target report and a data type identifier of each area, wherein the target report is any report in a report set to be extracted; comparing the relative positions of the areas of other reports in the report set with the target report, taking the relative position relationship between the areas with different data type identifiers in the target report as a reference, to obtain a comparison result; and extracting, from the report set according to the comparison result, the reports that match the relative position relationship of the areas of the target report. Compared with the prior art, in which reports are classified and extracted by manually marking the areas of different data types in each report, the technical solution of the disclosure can improve efficiency and accuracy.

Description

Report extraction method and device, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a report extraction method and apparatus, a storage medium, and an electronic device.
Background
Reports are tables, charts and other formats used to display data dynamically; they can be used to compile preliminary statistics on events and to present the relevant information of those events vividly. A large number of reports are usually needed in daily work, and because the reports usually fall into different categories, they are difficult to search in a uniform way.
In the prior art, in order to facilitate uniform retrieval of all reports, it is usually necessary to manually mark areas of different data types, such as the subject, statistical index, unit, dimension, and the like of each report, and classify and extract the reports according to the areas of different data types of each report.
Disclosure of Invention
In order to overcome the problems in the prior art, the present disclosure provides a report extraction method, apparatus, storage medium and electronic device.
In order to achieve the above object, the present disclosure provides a report extraction method, including:
marking a target report according to preset databases of different data types to obtain each area of the target report and a data type identifier of each area, wherein the target report is any report in a report set to be extracted;
comparing the relative positions of the areas of other reports in the report set with the target report, taking the relative position relationship between the areas with different data type identifiers in the target report as a reference, to obtain a comparison result;
and extracting the report matched with the regional relative position relation of the target report from the report set according to the comparison result.
Optionally, the marking a target report according to preset databases of different data types to obtain each area of the target report and a data type identifier of each area includes:
for each area of the target report, querying, among the preset databases of different data types, the database to which the elements of the area belong, and taking the data type preset for that database as the data type identifier of the area.
Optionally, the comparing the relative positions of the areas of other reports in the report set with the target report, taking the relative position relationship between the areas with different data type identifiers in the target report as a reference, includes:
sequentially dividing areas from any other report in the report set according to the relative position relationship of the areas of the target report, and, each time an area is obtained through division, judging whether that area and the area at the same position in the target report belong to the same data type.
Optionally, the method further comprises:
if the data type of the area newly divided from the report is different from the data type of the area at the same position in the target report, stopping the area division of the report;
if the data type of the area newly divided from the report is the same as the data type of the area at the same position in the target report, continuing to divide the next area from the report;
the extracting of the report matched with the regional relative position relationship of the target report from the report set according to the comparison result comprises:
extracting, from the report set according to the comparison result, the reports whose areas have the same relative position relationship as the areas of the target report.
Optionally, the comparing the relative positions of the areas of other reports in the report set with the target report, taking the relative position relationship between the areas with different data type identifiers in the target report as a reference, includes:
performing area division on any other report in the report set according to the relative position relationship of the areas of the target report;
after the areas of the report are obtained, comparing whether the data type of each area in the report is the same as the data type of the area at the same position in the target report, to obtain a comparison result;
the extracting of the report matched with the regional relative position relationship of the target report from the report set according to the comparison result includes:
determining the matching degree of the other reports and the target report according to the comparison result;
and extracting the report of which the matching degree with the target report reaches a threshold value in the report set.
Optionally, after the target report is marked according to the preset databases of different data types, the method further includes:
for each area, querying whether the area contains an element that does not exist in the database of the data type to which the area belongs;
if the area contains an element that does not exist in the database, updating the database with the element.
The present disclosure also provides a report extraction device, including:
a marking module configured to mark a target report according to preset databases of different data types to obtain each area of the target report and a data type identifier of each area, wherein the target report is any report in a report set to be extracted;
a comparison module configured to compare the relative positions of the areas of other reports in the report set with the target report, taking the relative position relationship between the areas with different data type identifiers in the target report as a reference, to obtain a comparison result;
and an extraction module configured to extract, from the report set according to the comparison result, the reports that match the relative position relationship of the areas of the target report.
Optionally, the marking module comprises:
a marking sub-module configured to query, for each area of the target report, the database to which the elements of the area belong among the preset databases of different data types, and to take the data type preset for that database as the data type identifier of the area.
Optionally, the comparison module comprises:
a first comparison sub-module configured to sequentially divide areas from any other report in the report set according to the relative position relationship of the areas of the target report and, each time an area is obtained through division, to judge whether that area and the area at the same position in the target report belong to the same data type.
Optionally, the apparatus further comprises:
a first dividing module configured to stop the area division of the report when the data type of the area newly divided from the report is different from the data type of the area at the same position in the target report;
a second dividing module configured to continue dividing the next area from the report when the data type of the area newly divided from the report is the same as the data type of the area at the same position in the target report;
the extraction module comprises:
a first extraction sub-module configured to extract, from the report set according to the comparison result, the reports whose areas have the same relative position relationship as the areas of the target report.
Optionally, the comparison module comprises:
the third division submodule is configured to perform area division on any other report in the report set according to the area relative position relation of the target report;
the second comparison submodule is configured to compare whether the data type of each area in any report is the same as the data type of the area at the same position in the target report after the area of any report is obtained, and obtain a comparison result;
the extraction module comprises:
the matching degree determining sub-module is configured to determine the matching degree of the other reports and the target report according to the comparison result;
and the second extraction sub-module is configured to extract the report of which the matching degree with the target report reaches a threshold value in the report set.
Optionally, the apparatus further comprises:
a query module configured to query, for each area after the target report is marked according to the preset databases of different data types, whether the area contains an element that does not exist in the database of the data type to which the area belongs;
an update module configured to update the database with the element when the area contains an element that does not exist in the database.
The present disclosure also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the report extraction method provided by the present disclosure.
The present disclosure also provides an electronic device, comprising:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to implement the steps of the report extraction method provided by the present disclosure.
According to the above technical solution, the target report is marked according to the preset databases of different data types to obtain each area of the target report and a data type identifier of each area. Taking the relative position relationship between the areas with different data type identifiers in the target report as a reference, the relative positions of the areas of other reports in the report set are compared with the target report, and the reports that match the relative position relationship of the areas of the target report are extracted from the report set according to the comparison result. In this way, reports are automatically divided into areas based on the databases, and the classification and extraction of reports is thereby achieved.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure without limiting the disclosure. In the drawings:
FIG. 1 is a flow chart illustrating a report extraction method according to an exemplary embodiment of the present disclosure;
FIG. 2 is a schematic diagram illustrating a tagged target report according to an exemplary embodiment of the present disclosure;
FIG. 3 is a schematic diagram illustrating a tagged target report according to another exemplary embodiment of the present disclosure;
FIG. 4 is a schematic diagram illustrating a report set in which any report is divided according to an exemplary embodiment of the present disclosure;
FIG. 5 is a diagram illustrating a report matching the regional relative position relationship of the target report shown in FIG. 2, according to an illustrative embodiment of the present disclosure;
FIG. 6 is a diagram illustrating a report matching the regional relative position relationship of the target report shown in FIG. 3, according to an illustrative embodiment of the present disclosure;
FIG. 7 is a flowchart illustrating a report extraction method according to another exemplary embodiment of the present disclosure;
FIG. 8 is a block diagram illustrating a report extraction apparatus according to an exemplary embodiment of the present disclosure;
FIG. 9 is a block diagram illustrating a report extraction apparatus according to another exemplary embodiment of the present disclosure;
FIG. 10 is a block diagram illustrating an electronic device according to an exemplary embodiment of the present disclosure.
Detailed Description
The following detailed description of specific embodiments of the present disclosure is provided in connection with the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present disclosure, are given by way of illustration and explanation only, not limitation.
It is noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
In addition, in the present disclosure, the use of directional terms such as "upper, lower, left, right" generally means upper, lower, left, right as labeled in the corresponding drawings, unless otherwise specified.
A report is generally divided into a title, a subtitle, a header, a data area and other parts, and each part can be divided into different areas according to the data types of its elements, where the data types can include a subject, a unit, a statistical index, a dimension and the like.
In order to make the technical solution of the embodiments of the present disclosure easier to understand for those skilled in the art, the related terms of the report are briefly introduced.
Data area: a continuous area of numerical values, usually located in the middle of the report.
Header: reflects the outline of an event; it is typically located to the left of and above the data area and comprises at least one row and/or column of elements.
Title: typically at the uppermost position of the report.
Subtitle: typically between the title and the header.
Subject: a summary description of the contents of the report, typically derived from the title of the report.
Unit: the standard quantity used for comparison when measuring a certain physical quantity; together with the statistical index, it describes the meaning of a piece of data. Examples are yuan, ten thousand yuan, household and head. It is usually derived from the header and/or subtitle of the report.
Statistical index: a quantitative measure, or the unit or method, used for measuring the degree of development of things, such as area, population, density and yield. Each statistical index corresponds to a unit, and it is usually derived from the header of the report.
Dimension: an attribute of the data, or a characteristic of a thing or phenomenon, such as gender, region or time. It is typically derived from the header of the report.
Fig. 1 is a flowchart illustrating a report extraction method according to an exemplary embodiment of the present disclosure, as shown in fig. 1, the method includes the following steps:
In step S101, a target report is marked according to preset databases of different data types to obtain each area of the target report and a data type identifier of each area, where the target report is any report in a report set to be extracted.
In embodiments of the present disclosure, the databases of different data types include, but are not limited to: a statistical index database, which stores measures of data such as area, population, density and yield; a dimension database, which stores attributes of data, including attributes describing administrative regions (e.g., Nanjing, Wuxi), attributes describing time (e.g., 2012, 2013), and attributes describing categories (e.g., grains, potatoes, low-income households, medium-income households); and a unit database, which stores common units such as yuan, ten thousand yuan, households, ten thousand households and heads. Correspondingly, the data type identifiers obtained by marking the report areas based on the databases may include: subject, data area, statistical index, dimension, unit, and the like.
In an embodiment, the target report may be divided into a plurality of areas according to the relative position relationship between the header and the data area of the report and the distribution rules of the areas of different data types, such as the subject, the data area, the statistical index, the dimension, and the unit, and each area may be marked. For example, the data type of the continuous numerical region may be marked as a data area, and the data type of the uppermost region of the target report may be marked as a subject. For any other area of the target report, the database to which the element of the area belongs can be searched according to the preset databases of different data types, and the data type preset by the database is used as the data type identifier of the area.
For example, taking the target report 100 shown in fig. 2, the target report 100 may first be divided into a plurality of areas 110 to 180; the continuous numerical areas 110 and 120 are marked as data areas, and the area 180 is marked as the subject. Then, for the remaining areas 130 to 170, if the query shows that the elements in the areas 130 and 140 belong to the dimension database, the elements in the areas 150 and 170 belong to the unit database, and the elements in the area 160 belong to the statistical index database, then the data types of the areas 130 and 140 are marked as dimension, the data types of the areas 150 and 170 are marked as unit, and the data type of the area 160 is marked as statistical index.
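The database lookup used for marking an area can be sketched as follows. This is only a minimal illustration, not the patented implementation: the contents of `PRESET_DATABASES`, the region names in `sample_regions`, and the rule of picking the database that contains the most elements of the area are all assumptions introduced here.

```python
from typing import Dict, List, Optional, Set

# Hypothetical preset databases: one set of known elements per data type.
PRESET_DATABASES: Dict[str, Set[str]] = {
    "statistical_index": {"area", "population", "density", "yield"},
    "dimension": {"Nanjing", "Wuxi", "2012", "2013", "grains", "potatoes"},
    "unit": {"households", "ten thousand households", "heads"},
}


def mark_region(elements: List[str], databases: Dict[str, Set[str]]) -> Optional[str]:
    """Return the data type whose database contains the most elements of the region.

    Returns None when no element of the region is found in any database.
    The "most hits wins" rule is an assumption made for this sketch.
    """
    best_type: Optional[str] = None
    best_hits = 0
    for data_type, known in databases.items():
        hits = sum(1 for element in elements if element in known)
        if hits > best_hits:
            best_type, best_hits = data_type, hits
    return best_type


# Invented sample areas; they do not reproduce the contents of fig. 2.
sample_regions = {
    "left_header": ["Nanjing", "Wuxi"],
    "top_header": ["population", "yield"],
    "unit_row": ["households", "heads"],
    "unknown": ["foo"],
}
marks = {name: mark_region(cells, PRESET_DATABASES) for name, cells in sample_regions.items()}
print(marks)
```

An area whose elements appear in no database is left unmarked (`None`) in this sketch; how such areas are handled in practice is not specified by the text above.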
It should be noted that fig. 2 only illustrates one way of dividing the target report into areas, in this embodiment, other various ways of dividing the target report into areas may be adopted, and this is not limited in the embodiment of the present disclosure.
In another embodiment, the title and the continuous value area of the target report may be identified first, the title is marked as the subject and the data type of the continuous value area is marked as the data area. Then, for any other element in the target report, querying a database to which the element belongs, taking a data type preset by the database as a data type identifier of the element, and dividing the element with the same data type identifier into an area, thereby marking the subject, the statistical index, the dimension, the unit, the data area and the like of the target report.
For example, taking the target report 200 shown in fig. 3, the title (i.e., the area 250) of the target report 200 is marked as the subject, and the data type of the continuous numerical area 210 is marked as data area. For the remaining elements in the target report 200, if the query shows that the elements adjacent to the left side of the data area 210 all belong to the unit database, these elements are divided into an area 220, and unit is taken as the data type identifier of the area 220. Similarly, according to the query results, the elements adjacent to the left side of the area 220 are divided into an area 230 with statistical index as its data type identifier, and the elements adjacent to the upper side of the data area 210 are divided into an area 240 with dimension as its data type identifier.
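The element-by-element variant can be sketched in the same spirit: each cell is typed first, and cells with the same type are then grouped into one area. The grid contents, the `DATABASES` dictionary, and the choice of representing an area by the bounding box of its cells are assumptions made for illustration only.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical preset databases, keyed by data type.
DATABASES: Dict[str, Set[str]] = {
    "statistical_index": {"population", "yield"},
    "dimension": {"2012", "2013"},
    "unit": {"households", "heads"},
}

Box = Tuple[int, int, int, int]  # (first_row, first_col, last_row, last_col)


def classify(cell: str, databases: Dict[str, Set[str]]) -> Optional[str]:
    """Type a single cell: database lookup first, then a numeric check."""
    for data_type, known in databases.items():
        if cell in known:
            return data_type
    try:
        float(cell)
    except ValueError:
        return None  # blank or unknown cell
    return "data_area"


def group_into_regions(grid: List[List[str]], databases: Dict[str, Set[str]]) -> Dict[str, Box]:
    """Group cells of the same data type into one bounding box per type."""
    boxes: Dict[str, Box] = {}
    for r, row in enumerate(grid):
        for c, cell in enumerate(row):
            data_type = classify(cell, databases)
            if data_type is None:
                continue
            r0, c0, r1, c1 = boxes.get(data_type, (r, c, r, c))
            boxes[data_type] = (min(r0, r), min(c0, c), max(r1, r), max(c1, c))
    return boxes


# Invented grid: statistical index column, unit column, then the values,
# with a dimension row above the values.
grid = [
    ["", "", "2012", "2013"],
    ["population", "households", "10", "12"],
    ["yield", "heads", "3.5", "4.1"],
]
regions = group_into_regions(grid, DATABASES)
print(regions)
```

The database lookup runs before the numeric check so that a header such as "2012" is typed as a dimension rather than as part of the data area.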
It is worth noting that in the embodiment of the present disclosure, one report may be located in one spreadsheet; all reports in the report set may also be located in the same spreadsheet. For the latter, since reports are usually separated from one another by blank cells in a spreadsheet, different reports in the spreadsheet can be distinguished based thereon.
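As a rough sketch of how reports sharing one spreadsheet might be told apart, the snippet below splits a sheet at fully blank rows. The text only says that reports are separated by blank cells; treating a whole blank row as the separator, and the function name `split_reports`, are assumptions of this example.

```python
from typing import Any, List

Row = List[Any]


def is_blank(row: Row) -> bool:
    """A row is blank when every cell is None or an empty/whitespace string."""
    return all(cell is None or str(cell).strip() == "" for cell in row)


def split_reports(rows: List[Row]) -> List[List[Row]]:
    """Split the rows of one sheet into reports at blank rows."""
    reports: List[List[Row]] = []
    current: List[Row] = []
    for row in rows:
        if is_blank(row):
            if current:
                reports.append(current)
                current = []
        else:
            current.append(row)
    if current:
        reports.append(current)
    return reports


# Invented sheet holding two small reports separated by one blank row.
sheet = [
    ["Report A", None],
    ["Nanjing", 10],
    [None, ""],
    ["Report B", None],
    ["Wuxi", 12],
]
reports = split_reports(sheet)
print(len(reports))
```

Reports placed side by side would additionally need a split at blank columns, which this sketch does not attempt.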
In step S102, taking the relative position relationship between the areas with different data type identifiers in the target report as a reference, comparing the relative positions of the areas of other reports in the report set with the target report to obtain a comparison result.
In step S103, a report matching the relative position of the area of the target report is extracted from the report set according to the comparison result.
In the first implementation manner, any other report in the report set may be divided into areas according to the relative position relationship of the areas of the target report; after the areas of the report are obtained, the data type of each area in the report is compared with the data type of the area at the same position in the target report to obtain a comparison result. For each area in the report, it can be queried whether the elements in the area belong to the database of the data type of the area at the same position in the target report; if the proportion of elements in the area that belong to the database exceeds a preset proportion, the area is considered to have the same data type as the area at the same position in the target report.
For example, as shown in fig. 4, the report 100 shown in fig. 2 is used as a target report, and the report 310 can be divided into areas according to the relative position relationship of each area of the target report 100, so as to obtain areas 311 to 318. The data type of region 318 may be marked as subject first; then, if the areas 311 and 312 are consecutive numerical areas, it can be determined that the data type of the area 311 is the same as that of the area 110 and the data type of the area 312 is the same as that of the area 120 of the target report; for the area 313, if it is found that the element of the area 313 exists in the dimension database, it can be determined that the data type of the area 313 is the same as that of the area 130 of the target report 100. Similarly, the data types of the areas 315-317 can be determined to be the same as the areas at the same location in the target report 100.
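The preset-proportion test for a single area can be sketched as below. The function name `region_matches`, the default proportion of 0.8, and the sample elements (including "Suzhou") are hypothetical; the text does not give a concrete value for the preset proportion.

```python
from typing import List, Set


def region_matches(elements: List[str], database: Set[str], min_ratio: float = 0.8) -> bool:
    """Return True when the share of elements found in the database exceeds min_ratio."""
    if not elements:
        return False
    hits = sum(1 for element in elements if element in database)
    return hits / len(elements) > min_ratio


# Hypothetical dimension database and candidate area.
dimension_db = {"Nanjing", "Wuxi"}
candidate_area = ["Nanjing", "Wuxi", "Suzhou"]

print(region_matches(candidate_area, dimension_db, min_ratio=0.6))
print(region_matches(candidate_area, dimension_db, min_ratio=0.8))
```

Two of the three elements are known, so the area is accepted as a dimension area at a proportion of 0.6 but rejected at 0.8.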
Correspondingly, the matching degree between each of the other reports in the report set and the target report can be determined according to the comparison result, and the reports whose matching degree with the target report reaches a threshold value are extracted from the report set. The matching degree between any other report and the target report can be determined as the ratio of the number of areas in that report whose data type is the same as that of the area at the same position in the target report to the total number of areas of that report.
For example, as shown in fig. 5, with the report 100 shown in fig. 2 as a target report, reports 320 to 350 matching the regional relative position relationship of the target report 100 can be extracted from the report set; as shown in fig. 6, with the report 200 shown in fig. 3 as a target report, reports 360 to 380 that match the regional relative position relationship of the target report 200 can be extracted from the report set.
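The matching degree and threshold-based extraction can be sketched as follows, under the assumption that each report has already been reduced to a mapping from area position to data type. The position names, report names and the threshold of 0.75 are invented for this example.

```python
from typing import Dict, List

Layout = Dict[str, str]  # area position -> data type identifier


def matching_degree(target: Layout, candidate: Layout) -> float:
    """Share of the candidate's areas whose type equals the target's type at the same position."""
    if not candidate:
        return 0.0
    same = sum(1 for position, data_type in candidate.items() if target.get(position) == data_type)
    return same / len(candidate)


def extract_matching(target: Layout, others: Dict[str, Layout], threshold: float) -> List[str]:
    """Names of the reports whose matching degree reaches the threshold."""
    return [name for name, layout in others.items() if matching_degree(target, layout) >= threshold]


target = {"top": "subject", "left": "dimension", "upper": "statistical_index", "center": "data_area"}
others = {
    "report_a": dict(target),
    "report_b": {**target, "left": "unit"},
    "report_c": {"top": "subject", "left": "unit", "upper": "dimension", "center": "unit"},
}
extracted = extract_matching(target, others, threshold=0.75)
print(extracted)
```

Here `report_a` scores 1.0, `report_b` scores 0.75 and `report_c` scores 0.25, so only the first two reach the threshold.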
In the second embodiment, when the relative positions of the areas of other reports in the report set are compared with the target report, areas may be divided sequentially from any other report in the report set according to the relative position relationship of the areas of the target report, and each time an area is obtained through division, it is judged whether that area and the area at the same position in the target report belong to the same data type. If the data type of the newly divided area is different from the data type of the area at the same position in the target report, the area division of the report is stopped; if it is the same, the next area is divided from the report. Correspondingly, according to the comparison result, the reports whose areas have the same relative position relationship as the areas of the target report can be extracted from the report set. In this embodiment, a report whose area layout is not completely the same as that of the target report is not divided into all of its areas, which further improves the efficiency of the classification and extraction of reports.
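The early-stopping behaviour of this sequential division can be sketched as below. The ordered `target_layout`, the callback `classify_candidate`, and the choice to report the number of matched areas are assumptions of the sketch; the order in which areas are divided is not fixed by the text.

```python
from typing import Callable, List, Optional, Tuple


def divide_until_mismatch(
    target_layout: List[Tuple[str, str]],
    classify_region: Callable[[str], Optional[str]],
) -> Tuple[int, bool]:
    """Divide areas one at a time and stop at the first data type mismatch.

    Returns (number of matched areas, whether every area matched).
    """
    matched = 0
    for position, expected_type in target_layout:
        if classify_region(position) != expected_type:
            return matched, False  # stop: remaining areas are never divided
        matched += 1
    return matched, True


# Hypothetical target layout, in the order in which areas are divided.
target_layout = [
    ("top", "subject"),
    ("left", "dimension"),
    ("upper", "statistical_index"),
    ("center", "data_area"),
]
candidate = {"top": "subject", "left": "unit", "upper": "statistical_index", "center": "data_area"}

inspected: List[str] = []  # records which areas were actually divided


def classify_candidate(position: str) -> Optional[str]:
    inspected.append(position)
    return candidate.get(position)


result = divide_until_mismatch(target_layout, classify_candidate)
print(result, inspected)
```

Only the first two areas of the candidate are inspected, which is where the efficiency gain over dividing every report completely comes from.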
In another exemplary embodiment of the disclosure, as shown in fig. 7, after the target report is marked according to the preset databases of different data types, it may further be queried, for each area of the target report, whether the area contains an element that does not exist in the database of the data type to which the area belongs; if the area contains such an element, the database may be updated with the element.
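A minimal sketch of this database update, assuming each database is held as an in-memory set; the function name `update_database` and the sample element "Suzhou" are hypothetical.

```python
from typing import List, Set


def update_database(region_elements: List[str], database: Set[str]) -> List[str]:
    """Add the elements of a marked area that the database lacks; return the added elements."""
    missing = [element for element in region_elements if element not in database]
    database.update(missing)
    return missing


# Hypothetical dimension database and an area already marked as "dimension".
dimension_db = {"Nanjing", "Wuxi"}
added = update_database(["Nanjing", "Suzhou"], dimension_db)
print(added, sorted(dimension_db))
```

Because the databases grow as reports are processed, later reports containing the same elements can be marked without further manual input.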
In addition, after the report matched with the regional relative position relationship of the target report is extracted from the report set, for the remaining reports in the report set, one of the reports can be selected as a new target report, and the report extraction method shown in fig. 1 is repeated to perform a new round of report extraction.
Optionally, based on the first implementation manner, the report with the highest matching degree with the target report may be used as a new target report according to the matching degree between each report in the remaining reports and the target report. Accordingly, when the report extraction method shown in fig. 1 is repeated to perform the next round of report extraction, the area marked with the data type in the new target report (i.e. the area with the same data type as the area at the same position in the target report of the previous round) may be reserved, and the unmarked area in the new target report (i.e. the area with the data type different from the area at the same position in the target report of the previous round) may be marked according to the preset database with different data types.
Optionally, based on the second embodiment, the report with the largest number of divided areas in the remaining reports may be used as a new target report. Correspondingly, when the report extraction method shown in fig. 1 is repeated to perform the next round of report extraction, the area in the new target report, which has the same data type as the area at the same position in the previous round of target report, may be reserved, and other areas in the new target report may be marked according to the preset database with different data types.
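The repeated rounds of extraction can be sketched as a loop that removes each extracted group from the set and picks the next target from what remains. The sketch follows the matching-degree variant; representing each report as a position-to-type mapping, the report names, and the threshold of 1.0 are assumptions, and the reuse of already marked areas in the new target is not modelled.

```python
from typing import Dict, List

Layout = Dict[str, str]  # area position -> data type identifier


def matching_degree(target: Layout, candidate: Layout) -> float:
    """Share of the candidate's areas whose type equals the target's type at the same position."""
    if not candidate:
        return 0.0
    same = sum(1 for pos, data_type in candidate.items() if target.get(pos) == data_type)
    return same / len(candidate)


def extract_in_rounds(reports: Dict[str, Layout], threshold: float) -> List[List[str]]:
    """Repeat extraction until no report is left; each round yields one group."""
    remaining = dict(reports)
    groups: List[List[str]] = []
    target_name = next(iter(remaining), None)
    while target_name is not None:
        target = remaining.pop(target_name)
        scores = {name: matching_degree(target, layout) for name, layout in remaining.items()}
        matched = [name for name, score in scores.items() if score >= threshold]
        for name in matched:
            del remaining[name]
        groups.append([target_name] + matched)
        # Next target: the remaining report that matched the previous target best.
        target_name = max(remaining, key=scores.__getitem__) if remaining else None
    return groups


layout_x = {"top": "subject", "left": "dimension", "upper": "statistical_index"}
layout_y = {"top": "subject", "left": "statistical_index", "upper": "dimension"}
reports = {"r1": layout_x, "r2": dict(layout_x), "r3": layout_y, "r4": dict(layout_y)}
groups = extract_in_rounds(reports, threshold=1.0)
print(groups)
```

With two distinct layouts in the set, two rounds are enough: the first round groups `r1` with `r2`, and the second groups `r3` with `r4`.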
By the report extraction method disclosed by the embodiment of the disclosure, automatic area division of the reports based on the database can be realized, and further classification extraction of the reports is realized.
Fig. 8 is a block diagram illustrating a report extraction apparatus according to an exemplary embodiment of the present disclosure. As shown in fig. 8, the apparatus 700 includes: a marking module 701, a comparison module 702, and an extraction module 703.
The marking module 701 is configured to mark a target report according to preset databases of different data types to obtain data type identifiers of each area and each area of the target report, wherein the target report is any report in a report set to be extracted;
the comparison module 702 is configured to compare the relative positions of the areas of other reports in the report set with the target report, taking the relative position relationship between the areas with different data type identifiers in the target report as a reference, to obtain a comparison result;
the extraction module 703 is configured to extract a report matching the regional relative position relationship of the target report from the report set according to the comparison result.
In another embodiment, as shown in fig. 9, the marking module 701 includes:
the marking sub-module 711 is configured to, according to preset databases of different data types, query, for each area of the target report, a database to which an element of the area belongs, and use a data type preset by the database as a data type identifier of the area.
In another embodiment, as shown in fig. 9, the comparison module 702 includes:
the first comparison sub-module 721 is configured to sequentially divide areas from any other report in the report set according to the relative position relationship of the areas of the target report and, each time an area is obtained through division, to judge whether that area and the area at the same position in the target report belong to the same data type.
In another embodiment, as shown in fig. 9, the apparatus 700 further comprises:
a first dividing module 704, configured to stop the area division of the report when the data type of the area newly divided from the report is different from the data type of the area at the same position in the target report;
a second dividing module 705, configured to continue dividing a next area for any report if the data type of the area obtained by newly dividing any report is the same as the data type of the area at the same position in the target report;
the extraction module 703 includes:
the first extraction sub-module 731 is configured to extract, from the report set according to the comparison result, the reports whose areas have the same relative position relationship as the areas of the target report.
In another embodiment, as shown in fig. 9, the comparison module 702 includes:
the division submodule 722 is configured to perform area division on any other report in the report set according to the area relative position relationship of the target report;
the second comparison submodule 723, configured to, after obtaining the area of any report, compare whether the data type of each area in any report is the same as the data type of the area at the same position in the target report, and obtain a comparison result;
the extraction module 703 includes:
a matching degree determining sub-module 732 configured to determine the matching degree of the other report and the target report according to the comparison result;
the second extraction sub-module 733 is configured to extract a report of which the matching degree with the target report reaches a threshold value in the report set.
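The threshold-based variant can be sketched as follows. The sketch assumes the matching degree is the fraction of target areas whose data type agrees at the same position; the function names, the position keys, and the default threshold are hypothetical and are not specified in this passage.

```python
def matching_degree(target_types, other_types):
    """Fraction of target areas whose data type agrees at the same position.

    Both arguments map a position key to a data type identifier.
    """
    if not target_types:
        return 0.0
    same = sum(1 for pos, dtype in target_types.items()
               if other_types.get(pos) == dtype)
    return same / len(target_types)


def extract_by_threshold(target_types, candidates, threshold=0.8):
    """Return names of candidate reports whose matching degree reaches the threshold."""
    return [name for name, types in candidates.items()
            if matching_degree(target_types, types) >= threshold]
```

Unlike the early-stopping variant, every area is compared, so reports that differ from the target in only a few areas can still be extracted.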
In another embodiment, as shown in fig. 9, the apparatus 700 further comprises:
the query module 706 is configured to, after the target report is marked according to the preset databases of different data types, query, for each area, whether the area contains an element that does not exist in the database of the data type to which the area belongs;
the update module 707 is configured to, when an area contains an element that does not exist in the database, update the database with that element.
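The query-and-update step can be sketched as follows, again assuming each database is a set of elements keyed by data type. The function name `update_database` and the in-memory representation are hypothetical.

```python
def update_database(region_elements, data_type, databases):
    """Add elements of a marked area that the database of its data type lacks.

    Returns the newly added elements in first-seen order.
    """
    known = databases.setdefault(data_type, set())
    # dict.fromkeys removes duplicates while preserving order.
    missing = [e for e in dict.fromkeys(region_elements) if e not in known]
    known.update(missing)
    return missing
```

Updating the database in this way lets later reports containing the same new elements be marked without manual intervention.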
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Through the report extraction device, automatic area division can be performed on the reports based on the database, and then classification extraction of the reports is achieved.
Fig. 10 is a block diagram illustrating an electronic device 900 in accordance with an example embodiment. As shown in fig. 10, the electronic device 900 may include: a processor 901 and a memory 902. The electronic device 900 may also include one or more of a multimedia component 903, an input/output (I/O) interface 904, and a communication component 905.
The processor 901 is configured to control the overall operation of the electronic device 900 so as to complete all or part of the steps of the report extraction method. The memory 902 is used to store various types of data to support operation of the electronic device 900, such as instructions for any application or method operating on the electronic device 900, as well as application-related data such as contact data, transmitted and received messages, pictures, audio, and video. The memory 902 may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk. The multimedia component 903 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signal may further be stored in the memory 902 or transmitted through the communication component 905. The audio component also includes at least one speaker for outputting audio signals. The I/O interface 904 provides an interface between the processor 901 and other interface modules, such as a keyboard, a mouse, or buttons. These buttons may be virtual buttons or physical buttons. The communication component 905 is used for wired or wireless communication between the electronic device 900 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, or 4G, or a combination of one or more of them; accordingly, the communication component 905 may include a Wi-Fi module, a Bluetooth module, and an NFC module.
In an exemplary embodiment, the electronic device 900 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the above-described report extraction method.
In another exemplary embodiment, there is also provided a computer readable storage medium comprising program instructions which, when executed by a processor, implement the steps of the report extraction method described above. For example, the computer readable storage medium may be the memory 902 described above comprising program instructions executable by the processor 901 of the electronic device 900 to perform the report extraction method described above.
The preferred embodiments of the present disclosure are described in detail above with reference to the accompanying drawings. However, the present disclosure is not limited to the specific details of the above embodiments. Various simple modifications may be made to the technical solution of the present disclosure within its technical idea, and all such simple modifications fall within the protection scope of the present disclosure.
It should be noted that the various features described in the above embodiments may be combined in any suitable manner without departing from the scope of the invention. In order to avoid unnecessary repetition, various possible combinations will not be separately described in this disclosure.
In addition, the various embodiments of the present disclosure may be combined in any manner, and such combinations should likewise be regarded as content disclosed herein, as long as they do not depart from the spirit of the present disclosure.

Claims (10)

1. A report extraction method is characterized by comprising the following steps:
marking a target report according to preset databases of different data types to obtain each area of the target report and a data type identifier of each area, wherein the target report is any report in a report set to be extracted;
comparing the relative positions of the areas of the other reports in the report set with the target report, taking the relative position relationship between the areas of the target report having different data type identifiers as a reference, to obtain a comparison result;
and extracting, from the report set according to the comparison result, the report matching the relative position relationship of the areas of the target report.
2. The method according to claim 1, wherein the marking a target report according to preset databases of different data types to obtain each area of the target report and a data type identifier of each area comprises:
according to the preset databases of different data types, querying, for each area of the target report, the database to which the elements of the area belong, and taking the data type preset for that database as the data type identifier of the area.
3. The method according to claim 1, wherein comparing the relative position of the areas of the other reports in the report set with the target report on the basis of the relative position relationship between the areas of the target report having different data type identifiers comprises:
dividing any other report in the report set into areas one at a time according to the relative position relationship of the areas of the target report and, after each area is obtained, judging whether that area and the area at the same position in the target report belong to the same data type.
4. The method of claim 3, further comprising:
if the data type of the most recently divided area of the other report differs from the data type of the area at the same position in the target report, stopping dividing that report into areas;
if the data type of the most recently divided area of the other report is the same as the data type of the area at the same position in the target report, continuing to divide the next area of that report;
the extracting of the report matched with the regional relative position relationship of the target report from the report set according to the comparison result comprises:
extracting, from the report set according to the comparison result, the report whose areas have the same relative position relationship as the areas of the target report.
5. The method according to claim 1 or 2, wherein comparing the relative position of the areas of the other reports in the report set with the target report on the basis of the relative position relationship between the areas of the target report having different data type identifiers comprises:
dividing any other report in the report set into areas according to the relative position relationship of the areas of the target report;
after the areas of the other report are obtained, comparing whether the data type of each area in that report is the same as the data type of the area at the same position in the target report, to obtain a comparison result;
the extracting of the report matched with the regional relative position relationship of the target report from the report set according to the comparison result includes:
determining the matching degree of the other reports and the target report according to the comparison result;
and extracting the report of which the matching degree with the target report reaches a threshold value in the report set.
6. The method according to claim 1 or 2, wherein after the target report is marked according to the preset database with different data types, the method further comprises:
for each region, inquiring whether an element which is not in a database of the data type of the region exists in the region;
if the region has an element that does not exist in the database, the database is updated with the element.
7. A report extraction apparatus, comprising:
a marking module configured to mark a target report according to preset databases of different data types to obtain each area of the target report and a data type identifier of each area, wherein the target report is any report in a report set to be extracted;
the comparison module is configured to compare the relative positions of the areas of other reports in the report set with the target report by taking the relative position relationship between the areas with different data type identifications in the target report as a reference to obtain a comparison result;
and the extraction module is configured to extract the report matched with the regional relative position relation of the target report from the report set according to the comparison result.
8. The apparatus of claim 7, wherein the marking module comprises:
a marking sub-module configured to query, for each area of the target report and according to the preset databases of different data types, the database to which the elements of the area belong, and take the data type preset for that database as the data type identifier of the area.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.
10. An electronic device, comprising:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to carry out the steps of the method of any one of claims 1 to 6.
CN201810918482.7A 2018-08-13 2018-08-13 Report extraction method and device, storage medium and electronic equipment Active CN109359108B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810918482.7A CN109359108B (en) 2018-08-13 2018-08-13 Report extraction method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810918482.7A CN109359108B (en) 2018-08-13 2018-08-13 Report extraction method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN109359108A (en) 2019-02-19
CN109359108B (en) 2020-12-25

Family

ID=65349952

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810918482.7A Active CN109359108B (en) 2018-08-13 2018-08-13 Report extraction method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN109359108B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111159234A (en) * 2019-12-19 2020-05-15 中国建设银行股份有限公司 Method and device for comparing reports
CN112767013A (en) * 2021-01-05 2021-05-07 北京锐安科技有限公司 Business report splitting method, device, server and storage medium
CN114611478B (en) * 2022-03-22 2022-11-11 广西电网有限责任公司 Information processing method and system based on artificial intelligence and cloud platform

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104252443A (en) * 2013-06-28 2014-12-31 腾讯科技(深圳)有限公司 Report generation method and device
CN105843784A (en) * 2016-03-18 2016-08-10 中国银行股份有限公司 Statement generation method and device
CN106528511A (en) * 2016-09-30 2017-03-22 东软集团股份有限公司 Form analysis method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10180932B2 (en) * 2015-06-30 2019-01-15 Datawatch Corporation Systems and methods for automatically creating tables using auto-generated templates

Also Published As

Publication number Publication date
CN109359108A (en) 2019-02-19

Similar Documents

Publication Publication Date Title
CN109359108B (en) Report extraction method and device, storage medium and electronic equipment
CN107909330B (en) Workflow data processing method and device, storage medium and computer equipment
JP6741216B2 (en) Log analysis system, method and program
CN107085568B (en) Text similarity distinguishing method and device
CN109478311A (en) A kind of image-recognizing method and terminal
CN111078512A (en) Alarm record generation method and device, alarm equipment and storage medium
CN103324742A (en) Method and equipment for recommending keywords
US9213759B2 (en) System, apparatus, and method for executing a query including boolean and conditional expressions
CN108021713B (en) Document clustering method and device
CN104063432A (en) Information searching method and information searching device
CN108073678B (en) Document analysis processing method, system and device applied to big data analysis
CN110619061A (en) Video classification method and device, electronic equipment and readable medium
US20180329926A1 (en) Image-based semantic accommodation search
CN112836124A (en) Image data acquisition method and device, electronic equipment and storage medium
CN107943912A (en) A kind of response type Resource TOC data visualization management method, terminal and device
CN104240107A (en) Community data screening system and method thereof
CN111143724A (en) Data processing method, device, equipment and medium
JP6756378B2 (en) Anomaly detection methods, systems and programs
CN111488327B (en) Data standard management method and system
CN113204579B (en) Content association method, system, device, electronic equipment and storage medium
CN110442615B (en) Resource information processing method and device, electronic equipment and storage medium
WO2017081866A1 (en) Log analysis system, method, and program
KR20150142459A (en) Automated system and method of instrument index
CN109299224A (en) Solution querying method based on Zabbix, device, computer equipment
CN110737823B (en) Access intention mining method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant