CN113962210A - Intelligent report compiling method based on NLP technology - Google Patents
Intelligent report compiling method based on NLP technology
- Publication number
- CN113962210A
- Authority
- CN
- China
- Prior art keywords
- report
- compiling
- algorithm
- data
- library
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/166—Editing, e.g. inserting or deleting
- G06F40/186—Templates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Abstract
The invention discloses an intelligent report compiling method based on NLP technology, which comprises the following steps: S1, collecting industry report compiling standards and specifications, and constructing a report template library; S2, collecting the data required for compiling industry reports, and constructing a report material library; S3, collecting the latest Internet data to form an Internet resource library; S4, selecting a report compiling template, inputting or selecting keywords, and completing the main body of the report; and S5, assisting in completing the report by means of knowledge graph technology. The advantage of the invention is that the method draws on the report material library and the Internet resource library to recommend materials to report compilers and insert them into the report automatically, which improves the quality and efficiency of report compilation; its knowledge association function helps ensure the creativity, comprehensiveness and rigor of the report and compensates for shortcomings caused by human factors.
Description
Technical Field
The invention relates to the field of professional technical report compilation, and in particular to an intelligent report compiling method based on NLP technology.
Background
Professional technical reports mainly include bidding documents, project recommendations, feasibility study reports, preliminary design reports, special subject reports, implementation schemes and the like. At present, such reports are generally compiled by hand with direct reference to similar professional technical reports, paper documents, basic data and the like, supplemented by Internet information retrieval, to produce the final report.
However, this way of compiling reports has the following disadvantages: 1) efficiency is low: the materials and Internet information resources required for a report cannot be obtained efficiently, compilation relies on manual work, and automated, intelligent tools are lacking; 2) human factors have a large influence: compilers differ in skill level, experience and writing habits, so report quality varies and reports of a uniform standard and high quality cannot be produced; 3) report-related data is scattered across the personal computers of individual compilers, so resources cannot be shared effectively and no effective accumulation of knowledge or retention of experience takes place.
With the rapid development of information technology, natural language processing (NLP) technology has matured and is widely used in fields such as news and literature, but it is rarely applied to the intelligent compilation of professional technical reports.
Therefore, it is necessary to establish an intelligent report compiling method based on NLP technology to address the low degree of automation and intelligence in current report compilation and to improve the quality and efficiency of report compilation.
Disclosure of Invention
The invention aims to provide an intelligent report compiling method based on an NLP technology.
In order to achieve the purpose, the invention adopts the following technical scheme:
The invention discloses an intelligent report compiling method based on NLP technology, which comprises the following steps:
S1, collecting industry report compiling standards and specifications, and constructing a report template library;
S2, collecting the data required for compiling industry reports, and constructing a report material library;
S3, collecting the latest Internet data to form an Internet resource library;
S4, selecting a report compiling template, inputting or selecting keywords, and completing the main body of the report;
and S5, assisting in completing the report by means of knowledge graph technology.
Further, in step S1, the industry reports include bidding documents, project recommendations, feasibility study reports, preliminary design reports, topic reports, and implementation schemes, and the set of report types can be adjusted in real time according to the industry field;
the template includes basic information such as the title, table of contents, and chapters.
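One way to picture the template library of step S1 is as a keyed collection of templates, each holding a title and a chapter list. The following is a minimal sketch under that assumption; the class names, field names and sample chapters are illustrative and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ReportTemplate:
    """One entry in the template library (field names are illustrative)."""
    report_type: str
    title: str
    chapters: List[str] = field(default_factory=list)

    def table_of_contents(self) -> List[str]:
        # Number the chapters to produce the template's table of contents.
        return [f"{i}. {name}" for i, name in enumerate(self.chapters, start=1)]


class TemplateLibrary:
    """Templates keyed by report type, so types can be added or replaced at any time."""

    def __init__(self) -> None:
        self._templates: Dict[str, ReportTemplate] = {}

    def add(self, template: ReportTemplate) -> None:
        self._templates[template.report_type] = template

    def get(self, report_type: str) -> ReportTemplate:
        return self._templates[report_type]


library = TemplateLibrary()
library.add(ReportTemplate(
    report_type="feasibility study report",
    title="Feasibility Study Report",
    chapters=["Overview", "Demand Analysis", "Technical Scheme"],
))
toc = library.get("feasibility study report").table_of_contents()
```

Keying by report type is one simple way to support the real-time adjustment the step describes: adding a new industry's report type is a single `add` call.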
Further, in step S2, the data includes similar reports, related paper documents and related basic data;
the report material library is constructed by applying NLP processing algorithms to the collected data, performing paragraph segmentation, sentence segmentation, word segmentation and keyword extraction to form data segments;
the NLP processing algorithms comprise a paragraph segmentation algorithm, a sentence segmentation algorithm, a word segmentation algorithm, a keyword extraction algorithm, a clustering algorithm, a recommendation algorithm and a knowledge graph construction algorithm.
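The paragraph and sentence segmentation that turns collected documents into data segments could look roughly like the sketch below. It assumes paragraphs are separated by blank lines and sentences end in Western or Chinese end punctuation; the function names and the segment record layout are hypothetical, and the patent does not specify the actual segmentation rules.

```python
import re


def split_paragraphs(text):
    # Paragraphs are assumed to be separated by one or more blank lines.
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(paragraph):
    # A sentence is a run of text ending in Western or Chinese end punctuation.
    # This naive rule also splits at decimal points and abbreviations.
    parts = re.findall(r"[^.!?。！？]+[.!?。！？]?", paragraph)
    return [s.strip() for s in parts if s.strip()]


def build_segments(text):
    # Each data segment records where it came from so it can be traced back.
    segments = []
    for p_idx, paragraph in enumerate(split_paragraphs(text)):
        for s_idx, sentence in enumerate(split_sentences(paragraph)):
            segments.append({"paragraph": p_idx, "sentence": s_idx, "text": sentence})
    return segments


sample = "The site is flat. Access roads exist.\n\nThe budget is fixed."
segments = build_segments(sample)
```

Word segmentation and keyword extraction would then run on each segment's `text` field.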
Further, step S2 specifically includes the following steps:
S2.1, performing word segmentation using a conditional random field (CRF) model and a user-defined dictionary;
and S2.2, extracting keywords from the data using the term frequency-inverse document frequency (TF-IDF) algorithm and the TextRank algorithm.
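To illustrate the TF-IDF half of step S2.2, the sketch below scores each word by its frequency within a document, weighted down by how many documents contain it. It assumes the documents are already tokenized (in the patent, by the CRF word segmenter of step S2.1, which is not reproduced here) and uses a smoothed IDF, which is one common variant rather than anything the patent prescribes.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=3):
    """Return the top_k highest-scoring words for each tokenized document."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each word once per document

    keywords = []
    for tokens in docs:
        term_freq = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        scores = {
            # Smoothed IDF: log((1 + N) / (1 + df)) + 1 (an assumed variant).
            word: (count / total) * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1)
            for word, count in term_freq.items()
        }
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        keywords.append(ranked[:top_k])
    return keywords


docs = [
    ["pump", "station", "pump", "design"],
    ["station", "design", "budget"],
    ["station", "survey"],
]
top_words = tfidf_keywords(docs, top_k=1)
```

In this toy corpus "station" appears in every document, so it is never the top keyword; the word that is frequent locally but rare globally wins in each document.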
Further, step S3 specifically includes the following steps:
S3.1, performing word segmentation on the keywords input or selected by the report compilers, and acquiring Internet data based on the word segmentation results;
S3.2, collecting Internet data using web crawler technology and robotic process automation (RPA);
and S3.3, screening the Internet data and storing the latest Internet data to form the Internet resource library, which assists the report compilers in compiling reports.
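The screening in step S3.3 is not specified in detail. A minimal sketch, assuming screening means keeping recent, keyword-matching records and dropping duplicate URLs, is shown below; the record fields (`url`, `published`, `text`), the one-year age limit and the example URLs are all hypothetical, and the crawling itself is left out.

```python
from datetime import date, timedelta


def screen_records(records, keywords, today, max_age_days=365):
    """Keep recent, keyword-matching records, one per URL, newest first."""
    cutoff = today - timedelta(days=max_age_days)
    lowered = [k.lower() for k in keywords]
    seen_urls = set()
    kept = []
    # Visit the newest records first so the latest copy of each URL is kept.
    for record in sorted(records, key=lambda r: r["published"], reverse=True):
        if record["published"] < cutoff or record["url"] in seen_urls:
            continue
        text = record["text"].lower()
        if any(k in text for k in lowered):
            seen_urls.add(record["url"])
            kept.append(record)
    return kept


crawled = [
    {"url": "https://example.com/a", "published": date(2021, 10, 1), "text": "New flood control standard issued"},
    {"url": "https://example.com/a", "published": date(2021, 9, 1), "text": "Flood control standard draft"},
    {"url": "https://example.com/b", "published": date(2015, 1, 1), "text": "Flood control history"},
    {"url": "https://example.com/c", "published": date(2021, 11, 1), "text": "Sports results"},
]
resource_library = screen_records(crawled, ["flood control"], today=date(2021, 11, 24))
```

Only the newest copy of the first URL survives: the older duplicate, the stale record and the off-topic record are all filtered out.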
Further, step S4 specifically includes the following steps:
S4.1, recommending data segments according to the input or selected keywords;
S4.2, clustering the text data by type using the K-means clustering algorithm, and recommending data materials from the same cluster;
and S4.3, automatically importing the recommended data into the report.
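Step S4.2 can be sketched as follows: represent each data segment as a word-count vector, cluster the vectors with K-means, and recommend the other members of the selected segment's cluster. The vectorization, the deterministic initialization (first k points as centroids) and the helper names are assumptions made to keep the example small and repeatable; the patent does not state how features or initial centroids are chosen.

```python
def to_vectors(token_lists):
    # Bag-of-words counts over a shared, sorted vocabulary.
    vocab = sorted({w for tokens in token_lists for w in tokens})
    return [[tokens.count(w) for w in vocab] for tokens in token_lists]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    # Assumed initialization: the first k points serve as starting centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def recommend_same_cluster(index, labels):
    # Recommend every other segment that shares the selected segment's cluster.
    return [i for i, label in enumerate(labels) if label == labels[index] and i != index]


segment_tokens = [
    ["flood", "levee", "flood"],
    ["budget", "cost"],
    ["levee", "flood"],
    ["cost", "budget", "cost"],
]
labels = kmeans(to_vectors(segment_tokens), k=2)
recommended = recommend_same_cluster(0, labels)
```

Here the two flood-related segments land in one cluster and the two cost-related segments in the other, so selecting segment 0 recommends segment 2.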
Further, step S5 specifically includes the following steps:
S5.1, performing keyword search association, using the search association algorithm of a search engine, on the keywords input or selected in step S4, establishing connections between the keywords and the associated texts, and constructing a knowledge graph;
and S5.2, using the associated words in the knowledge graph to quickly retrieve further recommended data from the report material library and the Internet resource library, thereby assisting report compilation.
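Step S5 can be pictured as a two-stage lookup: first expand a keyword into its associated words through the graph, then use those words to pull further segments from the libraries. The sketch below assumes the keyword-to-association pairs have already been obtained (the patent takes them from a search engine's association algorithm, which is not modeled here) and stores the graph as an undirected adjacency map; all names and sample data are hypothetical.

```python
from collections import defaultdict


def build_graph(associations):
    # Undirected adjacency map: each pair links a keyword and an associated word.
    graph = defaultdict(set)
    for keyword, related in associations:
        graph[keyword].add(related)
        graph[related].add(keyword)
    return graph


def expand_and_retrieve(keyword, graph, segments):
    # Stage 1: look up the words directly associated with the keyword.
    terms = sorted(graph.get(keyword, set()))
    # Stage 2: retrieve segments that mention any associated word.
    hits = [s for s in segments if any(t in s for t in terms)]
    return terms, hits


associations = [
    ("flood control", "levee"),
    ("flood control", "drainage"),
    ("levee", "embankment"),
]
library_segments = [
    "The levee was raised in 2019.",
    "Budget summary.",
    "Urban drainage capacity is limited.",
]
graph = build_graph(associations)
terms, hits = expand_and_retrieve("flood control", graph, library_segments)
```

Neither retrieved segment contains the phrase "flood control" itself, which is the point of the second pass: the associated words surface material that a plain keyword match would miss.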
The advantage of the invention is that the intelligent report compiling method based on NLP technology draws on the report material library and the Internet resource library to recommend materials to report compilers and insert them into the report automatically, which improves the quality and efficiency of report compilation; its knowledge association function helps ensure the creativity, comprehensiveness and rigor of the report and compensates for shortcomings caused by human factors.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a diagram of a report template library according to an embodiment of the present invention.
FIG. 3 is a diagram of a report materials library according to an embodiment of the present invention.
FIG. 4 is a schematic diagram of an Internet resource library according to an embodiment of the present invention.
FIG. 5 is a schematic diagram of a knowledge-graph association in accordance with an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in FIG. 1, the intelligent report compiling method based on NLP technology includes the following steps:
S1, collecting industry report compiling standards and specifications, and constructing a report template library, as shown in FIG. 2.
The industry reports include bidding documents, project recommendations, feasibility study reports, preliminary design reports, special subject reports, implementation schemes and the like, and the set of report types can be adjusted in real time according to the industry field;
the template includes basic information such as the title, table of contents, and chapters;
S2, collecting the data required for compiling industry reports, and constructing a report material library, as shown in FIG. 3.
The data includes similar reports, related paper documents, related basic data and the like;
the report material library is constructed by applying natural language processing (NLP) algorithms to the collected data, performing paragraph segmentation, sentence segmentation, word segmentation and keyword extraction to form data segments;
the NLP algorithms comprise a paragraph segmentation algorithm, a sentence segmentation algorithm, a word segmentation algorithm, a keyword extraction algorithm, a clustering algorithm, a recommendation algorithm and a knowledge graph construction algorithm;
step S2 specifically includes the following steps:
S2.1, performing word segmentation using a conditional random field (CRF) model and a user-defined dictionary;
S2.2, extracting keywords from the data using the term frequency-inverse document frequency (TF-IDF) algorithm and the TextRank algorithm;
S3, collecting the latest Internet data to form an Internet resource library, as shown in FIG. 4.
Step S3 specifically includes the following steps:
S3.1, performing word segmentation on the keywords input or selected by the report compilers, and acquiring Internet data based on the word segmentation results;
S3.2, collecting Internet data using web crawler technology and robotic process automation (RPA);
S3.3, screening the Internet data and storing the latest Internet data to form the Internet resource library, which assists the report compilers in compiling reports;
S4, selecting a report compiling template, inputting or selecting keywords, and completing the main body of the report; step S4 specifically includes the following steps:
S4.1, recommending data segments according to the input or selected keywords;
S4.2, clustering the text data by type using the K-means clustering algorithm, and recommending data materials from the same cluster;
S4.3, automatically importing the recommended data into the report;
S5, assisting in completing the report by means of knowledge graph technology, as shown in FIG. 5.
Step S5 specifically includes the following steps:
S5.1, performing keyword search association, using the search association algorithm of a search engine, on the keywords input or selected in step S4, establishing connections between the keywords and the associated texts, and constructing a knowledge graph;
and S5.2, using the associated words in the knowledge graph to quickly retrieve further recommended data from the report material library and the Internet resource library, thereby assisting report compilation.
The intelligent report compiling method based on NLP technology draws on the report material library and the Internet resource library to recommend materials to report compilers and insert them into the report automatically, which improves the quality and efficiency of report compilation; its knowledge association function helps ensure the creativity, comprehensiveness and rigor of the report. The method is applied to the compilation of professional technical reports.
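Step S2.2 names TextRank alongside TF-IDF. As a rough illustration of the idea, the sketch below links words that appear next to each other, then runs a PageRank-style iteration so that words connected to many other words score highest. The window size, damping factor, iteration count and function name are assumed values for this example, not parameters taken from the patent.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=3):
    """Rank words by a PageRank-style score over a co-occurrence graph."""
    # Link each word to the words that follow it within the window
    # (window=2 links adjacent words only).
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Iterate the score update; each word shares its score among its neighbors.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[v] / len(neighbors[v]) for v in neighbors[word])
            for word in neighbors
        }

    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


tokens = ["flood", "control", "flood", "risk", "flood", "plan"]
top_keyword = textrank_keywords(tokens, top_k=1)
```

Unlike TF-IDF, this ranking needs only a single document, which may be why the two are used together: one measures distinctiveness across the library, the other centrality within a text.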
Claims (7)
1. An intelligent report compiling method based on NLP technology, characterized in that the method comprises the following steps:
S1, collecting industry report compiling standards and specifications, and constructing a report template library;
S2, collecting the data required for compiling industry reports, and constructing a report material library;
S3, collecting the latest Internet data to form an Internet resource library;
S4, selecting a report compiling template, inputting or selecting keywords, and completing the main body of the report;
and S5, assisting in completing the report by means of knowledge graph technology.
2. The method of claim 1, wherein in step S1, the industry reports include bid documents, project recommendations, feasibility study reports, preliminary design reports, topic reports, and implementation schemes, and the set of report types can be adjusted in real time according to the industry field;
the template includes basic information comprising the title, table of contents, and chapters.
3. The method of claim 1, wherein in step S2, the data includes similar reports, related paper documents and related basic data;
the report material library is constructed by applying NLP processing algorithms to the collected data, performing paragraph segmentation, sentence segmentation, word segmentation and keyword extraction to form data segments;
the NLP processing algorithms comprise a paragraph segmentation algorithm, a sentence segmentation algorithm, a word segmentation algorithm, a keyword extraction algorithm, a clustering algorithm, a recommendation algorithm and a knowledge graph construction algorithm.
4. The method of claim 1, wherein step S2 specifically includes the following:
S2.1, performing word segmentation using a conditional random field model and a user-defined dictionary;
S2.2, extracting keywords from the data using the term frequency-inverse document frequency algorithm and the TextRank algorithm.
5. The method of claim 1, wherein step S3 specifically includes the following:
S3.1, performing word segmentation on the keywords input or selected by the report compilers, and acquiring the Internet data based on the word segmentation results;
S3.2, collecting Internet data using web crawler technology and robotic process automation technology;
and S3.3, screening the Internet data and storing the latest Internet data to form the Internet resource library, which assists the report compilers in compiling reports.
6. The method of claim 1, wherein step S4 specifically includes the following:
S4.1, recommending the data segments according to the input or selected keywords;
S4.2, clustering the text data by type using the K-means clustering algorithm, and recommending the data materials from the same cluster;
and S4.3, automatically importing the recommended data into the report.
7. The method of claim 1, wherein step S5 specifically includes the following:
S5.1, performing keyword search association, using the search association algorithm of a search engine, on the keywords input or selected in step S4, establishing connections between the keywords and the associated texts, and constructing the knowledge graph;
and S5.2, using the associated words in the knowledge graph to quickly retrieve further recommended data from the report material library and the Internet resource library, thereby assisting report compilation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111403752.9A CN113962210A (en) | 2021-11-24 | 2021-11-24 | Intelligent report compiling method based on NLP technology |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113962210A (en) | 2022-01-21 |
Family
ID=79471833
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111403752.9A Pending CN113962210A (en) | 2021-11-24 | 2021-11-24 | Intelligent report compiling method based on NLP technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113962210A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116415559A (en) * | 2022-10-19 | 2023-07-11 | 国网浙江省电力有限公司开化县供电公司 | Online intelligent report writing and generating system and method |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101334784A (en) * | 2008-07-30 | 2008-12-31 | 施章祖 | Computer auxiliary report and knowledge base generation method |
CN104102713A (en) * | 2014-07-16 | 2014-10-15 | 百度在线网络技术(北京)有限公司 | Method and device for displaying recommendation results |
CN106649223A (en) * | 2016-12-23 | 2017-05-10 | 北京文因互联科技有限公司 | Financial report automatic generation method based on natural language processing |
CN109446344A (en) * | 2018-11-14 | 2019-03-08 | 同方知网(北京)技术有限公司 | A kind of intellectual analysis report automatic creation system based on big data |
CN110148043A (en) * | 2019-03-01 | 2019-08-20 | 安徽省优质采科技发展有限责任公司 | The bid and purchase information recommendation system and recommended method of knowledge based map |
CN111694940A (en) * | 2020-05-14 | 2020-09-22 | 平安科技(深圳)有限公司 | User report generation method and terminal equipment |
CN112199931A (en) * | 2020-09-24 | 2021-01-08 | 联合赤道环境评价有限公司 | Environment-friendly consultation report intelligent generation method based on big data |
CN113254574A (en) * | 2021-03-15 | 2021-08-13 | 河北地质大学 | Method, device and system for auxiliary generation of customs official documents |
CN113268971A (en) * | 2021-06-23 | 2021-08-17 | 中国平安人寿保险股份有限公司 | Intelligent generation method and device of demonstration report, computer equipment and storage medium |
CN113298435A (en) * | 2021-06-21 | 2021-08-24 | 中交第二航务工程局有限公司 | Intelligent construction scheme compiling method and system for building industry |
CN113569543A (en) * | 2021-07-13 | 2021-10-29 | 上海核工程研究设计院有限公司 | Implementation method of nuclear power engineering automatic report generation technology |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||