CN113962210A - Intelligent report compiling method based on NLP technology - Google Patents

Intelligent report compiling method based on NLP technology

Info

Publication number
CN113962210A
Authority
CN
China
Prior art keywords
report
compiling
algorithm
data
library
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111403752.9A
Other languages
Chinese (zh)
Inventor
谢遵党
杨顺群
王楠
蔺志刚
王美斋
邹琮
常学军
王陆
陶玉波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yellow River Engineering Consulting Co Ltd
Original Assignee
Yellow River Engineering Consulting Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-11-24
Filing date
2021-11-24
Publication date
2022-01-21
Application filed by Yellow River Engineering Consulting Co Ltd
Priority to CN202111403752.9A
Publication of CN113962210A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/186 Templates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Animal Behavior & Ethology (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses an intelligent report compiling method based on NLP technology, comprising the following steps: S1, collecting industry report compiling standards and specifications, and constructing a report template library; S2, collecting the data required for compiling industry reports, and constructing a report material library; S3, collecting the latest Internet data to form an Internet resource library; S4, selecting a report compiling template, inputting or selecting keywords, and completing the main body of the report; and S5, using knowledge graph technology to assist in completing the report. The advantages of the invention are that the method intelligently recommends materials to report compilers based on the report material library and the Internet resource library and automatically inserts them into the report, improving the quality and efficiency of report compilation; by means of knowledge association, it helps ensure the creativity, comprehensiveness and rigor of the report and compensates for the shortcomings introduced by human factors.

Description

Intelligent report compiling method based on NLP technology
Technical Field
The invention relates to the field of professional technical report compilation, and in particular to an intelligent report compiling method based on NLP technology.
Background
Professional technical reports mainly include bidding documents, project proposals, feasibility study reports, preliminary design reports, special subject reports, implementation schemes and the like. At present, such reports are generally compiled by hand with reference to similar professional technical reports, paper documents, basic data and the like, supplemented by Internet information retrieval, to produce the final report.
However, this approach has the following disadvantages: 1) report compilation is inefficient: the materials and Internet information resources required cannot be obtained efficiently, and compilation relies on manual work without automated, intelligent tools; 2) human factors have a large influence: because compilers differ in skill level, experience and writing habits, report quality varies, and reports of uniformly high quality to a common standard cannot be produced; 3) report-related data is scattered across the personal computers of individual compilers, so resources cannot be shared effectively and knowledge and experience cannot be accumulated and retained.
With the rapid development of information technology, natural language processing (NLP) technology is maturing and has been applied in many fields such as news and literature, but it is rarely applied to the intelligent compilation of professional technical reports.
Therefore, it is necessary to establish an intelligent report compiling method based on NLP technology to address the low degree of automation and intelligence in current report compilation and to improve its quality and efficiency.
Disclosure of Invention
The invention aims to provide an intelligent report compiling method based on an NLP technology.
In order to achieve the purpose, the invention adopts the following technical scheme:
The intelligent report compiling method based on NLP technology of the invention comprises the following steps:
S1, collecting industry report compiling standards and specifications, and constructing a report template library;
S2, collecting the data required for compiling industry reports, and constructing a report material library;
S3, collecting the latest Internet data to form an Internet resource library;
S4, selecting a report compiling template, inputting or selecting keywords, and completing the main body of the report;
and S5, using knowledge graph technology to assist in completing the report.
Further, in step S1, the industry reports include bidding documents, project proposals, feasibility study reports, preliminary design reports, special subject reports and implementation schemes, and the list can be adjusted in real time according to the industry field;
the template includes basic information such as the title, table of contents and chapters.
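As an illustration only, a template library of this kind could be held in a small in-memory structure. The sketch below is not taken from the patent: the class names `ReportTemplate` and `TemplateLibrary`, the field names, and the choice to derive the table of contents (directory) from the chapter list are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ReportTemplate:
    """One entry in the report template library (hypothetical schema)."""
    report_type: str  # e.g. "feasibility study report"
    title: str        # title pattern for the report
    chapters: list[str] = field(default_factory=list)  # ordered chapter headings

    def table_of_contents(self) -> list[str]:
        # Assumption: the table of contents is derived from the chapter list.
        return [f"{i}. {name}" for i, name in enumerate(self.chapters, start=1)]


class TemplateLibrary:
    """In-memory template library keyed by report type (hypothetical)."""

    def __init__(self) -> None:
        self._templates: dict[str, ReportTemplate] = {}

    def add(self, template: ReportTemplate) -> None:
        # Re-adding a report type replaces the old template, which is one
        # simple way to adjust the library as the industry field changes.
        self._templates[template.report_type] = template

    def get(self, report_type: str) -> ReportTemplate:
        return self._templates[report_type]

    def report_types(self) -> list[str]:
        return sorted(self._templates)
```

A compiler-facing tool would then call `get("feasibility study report")` to obtain the chapter skeleton before any material is recommended.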
Further, in step S2, the data includes similar reports, related paper documents and related basic data;
the report material library is constructed by applying NLP processing algorithms to the collected data to perform paragraph segmentation, sentence segmentation, word segmentation and keyword extraction, forming data segments;
the NLP processing algorithms comprise a paragraph segmentation algorithm, a sentence segmentation algorithm, a word segmentation algorithm, a keyword extraction algorithm, a clustering algorithm, a recommendation algorithm and a knowledge graph construction algorithm.
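The segmentation chain described for the material library (paragraphs, then sentences, then words) can be sketched roughly as follows. This is a minimal illustration under stated assumptions: paragraphs are taken to be separated by blank lines, sentence boundaries are detected with simple punctuation rules, and `tokenize` is a plain regex stand-in rather than the CRF word segmenter the text names. All function names and the segment record layout are hypothetical.

```python
import re


def split_paragraphs(text: str) -> list[str]:
    # Assumption: paragraphs are separated by one or more blank lines.
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(paragraph: str) -> list[str]:
    # Split after ". ", "! " or "? " (whitespace required, so "3.5" stays whole)
    # and directly after Chinese full stops, exclamation or question marks.
    parts = re.split(r"(?<=[.!?])\s+|(?<=[。！？])", paragraph)
    return [s.strip() for s in parts if s.strip()]


def tokenize(sentence: str) -> list[str]:
    # Stand-in for a real word segmenter: lowercase alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", sentence.lower())


def build_segments(doc_id: str, text: str) -> list[dict]:
    """Turn one source document into a list of data-segment records."""
    segments = []
    for p_idx, paragraph in enumerate(split_paragraphs(text)):
        for s_idx, sentence in enumerate(split_sentences(paragraph)):
            segments.append({
                "doc": doc_id,
                "paragraph": p_idx,
                "sentence": s_idx,
                "text": sentence,
                "tokens": tokenize(sentence),
            })
    return segments
```

Keeping the paragraph and sentence indices on each record is a design choice made here so that a recommended segment can be traced back to its place in the source report.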
Further, step S2 specifically includes the following steps:
S2.1, performing word segmentation using a conditional random field (CRF) model and a user-defined dictionary;
and S2.2, extracting keywords from the data using the term frequency-inverse document frequency (TF-IDF) algorithm and the TextRank algorithm.
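To make the TF-IDF part of step S2.2 concrete, here is a minimal sketch. It assumes the documents have already been segmented into word lists (step S2.1) and uses the plain formulation tf = count / document length and idf = log(N / df); the patent does not say which TF-IDF variant is used, and the function name `tfidf_keywords` is hypothetical.

```python
import math
from collections import Counter


def tfidf_keywords(docs: list[list[str]], top_k: int = 3) -> list[list[str]]:
    """Return the top_k highest-scoring terms for each tokenized document."""
    n_docs = len(docs)
    # Document frequency: the number of documents in which each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    results = []
    for doc in docs:
        counts = Counter(doc)
        scores = {
            # idf = log(N / df), so a term found in every document scores 0.
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
        # Highest score first; ties are broken alphabetically for stability.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_k])
    return results
```

In this toy setting a word such as "report" that occurs in every document receives a score of zero, which is the behaviour that makes TF-IDF useful for separating topic words from boilerplate vocabulary.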
Further, step S3 specifically includes the following steps:
S3.1, performing word segmentation on the keywords input or selected by the report compiler, and acquiring Internet data based on the word segmentation results;
S3.2, collecting the Internet data using web crawler technology and robotic process automation (RPA);
and S3.3, screening the Internet data and storing the latest data to form the Internet resource library, which assists the report compiler in compiling the report.
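The patent does not specify how crawled pages are parsed or screened. The sketch below shows one plausible shape for the post-download part of step S3, using only the Python standard library: text is extracted from already-fetched HTML, and pages are then screened by keyword, age and URL. The `Page` record, the `max_age_days` threshold and the one-page-per-URL rule are assumptions, and fetching itself (crawler or RPA) is deliberately left out.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> content."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


@dataclass
class Page:  # hypothetical record for one downloaded page
    url: str
    published: date
    text: str


def screen_pages(pages, keywords, today, max_age_days=365):
    """Keep recent pages that mention a keyword: newest first, one per URL."""
    cutoff = today - timedelta(days=max_age_days)
    newest_by_url = {}
    for page in pages:
        if page.published < cutoff:
            continue  # too old to count as the latest data
        if not any(k.lower() in page.text.lower() for k in keywords):
            continue  # unrelated to the compiler's keywords
        kept = newest_by_url.get(page.url)
        if kept is None or page.published > kept.published:
            newest_by_url[page.url] = page
    return sorted(newest_by_url.values(), key=lambda p: p.published, reverse=True)
```

The list returned by `screen_pages` is what would be written to the Internet resource library in this sketch.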
Further, step S4 specifically includes the following steps:
S4.1, recommending data segments according to the input or selected keywords;
S4.2, clustering the text data by type using the K-means clustering algorithm, and recommending data from the same cluster;
and S4.3, automatically importing the recommended data into the report.
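Step S4.2 can be illustrated with a small pure-Python K-means (Lloyd's algorithm) over bag-of-words vectors. This is a sketch only: the patent does not state the text representation, the distance measure or the initialisation, so count vectors, squared Euclidean distance and caller-supplied initial centroids are assumptions, as are all function names.

```python
from collections import Counter


def vectorize(token_lists):
    """Bag-of-words count vectors over a shared, sorted vocabulary."""
    vocab = sorted({t for tokens in token_lists for t in tokens})
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append([float(counts[t]) for t in vocab])
    return vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, initial_centroids, max_iter=100):
    """Lloyd's algorithm; returns one cluster label per input vector."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach each vector to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: squared_distance(v, centroids[j]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def recommend_same_cluster(labels, query_index):
    """Indices of the other segments that share the query segment's cluster."""
    target = labels[query_index]
    return [i for i, lab in enumerate(labels) if lab == target and i != query_index]
```

Given a segment the compiler has already accepted, `recommend_same_cluster` returns the other segments in its cluster as candidates for insertion. Passing the initial centroids explicitly keeps the example deterministic; a production system would more likely use a seeded random or k-means++ initialisation.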
Further, step S5 specifically includes the following steps:
S5.1, performing keyword search association, using the search association algorithm of a search engine, on the keywords input or selected in step S4, establishing links between the keywords and the associated texts, and constructing a knowledge graph;
and S5.2, using the associated words in the knowledge graph to quickly retrieve further recommended data from the report material library and the Internet resource library, assisting report compilation.
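The text relies on a search engine's association algorithm, which is not described. As a stand-in, the sketch below builds the keyword graph from co-occurrence of keywords within the same data segment, then uses a keyword's strongest neighbours to pull in segments that the original keyword alone would miss. The co-occurrence rule, the function names and the `top_k` cut-off are assumptions, not the patent's method.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_graph(keyword_sets):
    """Undirected weighted graph; an edge weight counts the segments in
    which the two keywords occur together."""
    graph = defaultdict(Counter)
    for keywords in keyword_sets:
        for a, b in combinations(sorted(set(keywords)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def associated_terms(graph, keyword, top_k=3):
    """Strongest neighbours of a keyword; ties are broken alphabetically."""
    neighbours = graph.get(keyword, {})
    return sorted(neighbours, key=lambda t: (-neighbours[t], t))[:top_k]


def expand_and_retrieve(graph, keyword_sets, keyword, top_k=3):
    """Indices of segments that contain an associated term but not the
    original keyword, i.e. material found only through the graph."""
    related = set(associated_terms(graph, keyword, top_k))
    return [
        i for i, keywords in enumerate(keyword_sets)
        if keyword not in keywords and related & set(keywords)
    ]
```

Here `keyword_sets` holds the extracted keywords of each segment in the material library or Internet resource library; the same lookup could be run against both.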
The advantages of the invention are that the intelligent report compiling method based on NLP technology intelligently recommends materials to report compilers based on the report material library and the Internet resource library and automatically inserts them into the report, improving the quality and efficiency of report compilation; by means of knowledge association, it helps ensure the creativity, comprehensiveness and rigor of the report and compensates for the shortcomings introduced by human factors.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a diagram of a report template library according to an embodiment of the present invention.
FIG. 3 is a diagram of a report materials library according to an embodiment of the present invention.
FIG. 4 is a schematic diagram of an Internet resource library according to an embodiment of the present invention.
FIG. 5 is a schematic diagram of a knowledge-graph association in accordance with an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in FIG. 1, the intelligent report compiling method based on NLP technology includes the following steps:
S1, collecting industry report compiling standards and specifications, and constructing a report template library, as shown in FIG. 2.
The industry reports include bidding documents, project proposals, feasibility study reports, preliminary design reports, special subject reports, implementation schemes and the like, and the list can be adjusted in real time according to the industry field;
the template includes basic information such as the title, table of contents and chapters;
S2, collecting the data required for compiling industry reports, and constructing a report material library, as shown in FIG. 3.
The data includes similar reports, related paper documents, related basic data and the like;
the report material library is constructed by applying natural language processing (NLP) algorithms to the collected data to perform paragraph segmentation, sentence segmentation, word segmentation and keyword extraction, forming data segments;
the NLP algorithms comprise a paragraph segmentation algorithm, a sentence segmentation algorithm, a word segmentation algorithm, a keyword extraction algorithm, a clustering algorithm, a recommendation algorithm and a knowledge graph construction algorithm;
step S2 specifically includes the following steps:
S2.1, performing word segmentation using a conditional random field (CRF) model and a user-defined dictionary;
S2.2, extracting keywords from the data using the term frequency-inverse document frequency (TF-IDF) algorithm and the TextRank algorithm;
S3, collecting the latest Internet data to form an Internet resource library, as shown in FIG. 4.
Step S3 specifically includes the following steps:
S3.1, performing word segmentation on the keywords input or selected by the report compiler, and acquiring Internet data based on the word segmentation results;
S3.2, collecting the Internet data using web crawler technology and robotic process automation (RPA);
S3.3, screening the Internet data and storing the latest data to form the Internet resource library, which assists the report compiler in compiling the report;
S4, selecting a report compiling template, inputting or selecting keywords, and completing the main body of the report; step S4 specifically includes the following steps:
S4.1, recommending data segments according to the input or selected keywords;
S4.2, clustering the text data by type using the K-means clustering algorithm, and recommending data from the same cluster;
S4.3, automatically importing the recommended data into the report;
S5, using knowledge graph technology to assist in completing the report, as shown in FIG. 5.
Step S5 specifically includes the following steps:
S5.1, performing keyword search association, using the search association algorithm of a search engine, on the keywords input or selected in step S4, establishing links between the keywords and the associated texts, and constructing a knowledge graph;
and S5.2, using the associated words in the knowledge graph to quickly retrieve further recommended data from the report material library and the Internet resource library, assisting report compilation.
The intelligent report compiling method based on NLP technology intelligently recommends materials to report compilers based on the report material library and the Internet resource library and automatically inserts them into the report, improving the quality and efficiency of report compilation; by means of knowledge association, it helps ensure the creativity, comprehensiveness and rigor of the report. The method is applied to the compilation of professional technical reports.
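Step S2.2 above names TextRank alongside TF-IDF for keyword extraction. For reference, a minimal unweighted TextRank can be sketched as PageRank over a word co-occurrence graph. The window size, damping factor, fixed iteration count and the absence of part-of-speech filtering are assumptions made for brevity, and the function name is hypothetical.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=3):
    """Rank words by PageRank over an undirected co-occurrence graph."""
    # Link each word to the words that follow it within `window` positions.
    neighbours = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                neighbours[word].add(other)
                neighbours[other].add(word)
    words = sorted(neighbours)
    if not words:
        return []
    score = {w: 1.0 for w in words}
    for _ in range(iterations):
        # Each word receives an equal share of every neighbour's score.
        score = {
            w: (1 - damping)
            + damping * sum(score[n] / len(neighbours[n]) for n in neighbours[w])
            for w in words
        }
    # Highest score first; ties are broken alphabetically.
    return sorted(words, key=lambda w: (-score[w], w))[:top_k]
```

Unlike TF-IDF, this ranking needs only a single document, which may be why the two are used together: one scores a word against the whole library, the other against its own context.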

Claims (7)

1. An intelligent report compiling method based on NLP technology, characterized in that the method comprises the following steps:
S1, collecting industry report compiling standards and specifications, and constructing a report template library;
S2, collecting the data required for compiling industry reports, and constructing a report material library;
S3, collecting the latest Internet data to form an Internet resource library;
S4, selecting a report compiling template, inputting or selecting keywords, and completing the main body of the report;
and S5, using knowledge graph technology to assist in completing the report.
2. The method of claim 1, characterized in that: in step S1, the industry reports include bidding documents, project proposals, feasibility study reports, preliminary design reports, special subject reports and implementation schemes, and the list can be adjusted in real time according to the industry field;
the template includes basic information such as the title, table of contents and chapters.
3. The method of claim 1, characterized in that: in step S2, the data includes similar reports, related paper documents and related basic data;
the report material library is constructed by applying NLP processing algorithms to the collected data to perform paragraph segmentation, sentence segmentation, word segmentation and keyword extraction, forming data segments;
the NLP processing algorithms comprise a paragraph segmentation algorithm, a sentence segmentation algorithm, a word segmentation algorithm, a keyword extraction algorithm, a clustering algorithm, a recommendation algorithm and a knowledge graph construction algorithm.
4. The method of claim 1, characterized in that step S2 specifically includes the following:
S2.1, performing word segmentation using a conditional random field (CRF) model and a user-defined dictionary;
S2.2, extracting keywords from the data using the term frequency-inverse document frequency (TF-IDF) algorithm and the TextRank algorithm.
5. The method of claim 1, characterized in that step S3 specifically includes the following:
S3.1, performing word segmentation on the keywords input or selected by the report compiler, and acquiring the Internet data based on the word segmentation results;
S3.2, collecting the Internet data using web crawler technology and robotic process automation (RPA) technology;
and S3.3, screening the Internet data and storing the latest data to form the Internet resource library, which assists the report compiler in compiling the report.
6. The method of claim 1, characterized in that step S4 specifically includes the following:
S4.1, recommending the data segments according to the input or selected keywords;
S4.2, clustering the text data by type using the K-means clustering algorithm, and recommending data from the same cluster;
and S4.3, automatically importing the recommended data into the report.
7. The method of claim 1, characterized in that step S5 specifically includes the following:
S5.1, performing keyword search association, using the search association algorithm of a search engine, on the keywords input or selected in step S4, establishing links between the keywords and the associated texts, and constructing the knowledge graph;
and S5.2, using the associated words in the knowledge graph to quickly retrieve further recommended data from the report material library and the Internet resource library, assisting report compilation.
CN202111403752.9A 2021-11-24 2021-11-24 Intelligent report compiling method based on NLP technology Pending CN113962210A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111403752.9A CN113962210A (en) 2021-11-24 2021-11-24 Intelligent report compiling method based on NLP technology

Publications (1)

Publication Number Publication Date
CN113962210A true CN113962210A (en) 2022-01-21

Family

ID=79471833

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111403752.9A Pending CN113962210A (en) 2021-11-24 2021-11-24 Intelligent report compiling method based on NLP technology

Country Status (1)

Country Link
CN (1) CN113962210A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116415559A (en) * 2022-10-19 2023-07-11 国网浙江省电力有限公司开化县供电公司 Online intelligent report writing and generating system and method

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101334784A (en) * 2008-07-30 2008-12-31 施章祖 Computer auxiliary report and knowledge base generation method
CN104102713A (en) * 2014-07-16 2014-10-15 百度在线网络技术(北京)有限公司 Method and device for displaying recommendation results
CN106649223A (en) * 2016-12-23 2017-05-10 北京文因互联科技有限公司 Financial report automatic generation method based on natural language processing
CN109446344A (en) * 2018-11-14 2019-03-08 同方知网(北京)技术有限公司 A kind of intellectual analysis report automatic creation system based on big data
CN110148043A (en) * 2019-03-01 2019-08-20 安徽省优质采科技发展有限责任公司 The bid and purchase information recommendation system and recommended method of knowledge based map
CN111694940A (en) * 2020-05-14 2020-09-22 平安科技(深圳)有限公司 User report generation method and terminal equipment
CN112199931A (en) * 2020-09-24 2021-01-08 联合赤道环境评价有限公司 Environment-friendly consultation report intelligent generation method based on big data
CN113254574A (en) * 2021-03-15 2021-08-13 河北地质大学 Method, device and system for auxiliary generation of customs official documents
CN113268971A (en) * 2021-06-23 2021-08-17 中国平安人寿保险股份有限公司 Intelligent generation method and device of demonstration report, computer equipment and storage medium
CN113298435A (en) * 2021-06-21 2021-08-24 中交第二航务工程局有限公司 Intelligent construction scheme compiling method and system for building industry
CN113569543A (en) * 2021-07-13 2021-10-29 上海核工程研究设计院有限公司 Implementation method of nuclear power engineering automatic report generation technology

Similar Documents

Publication Publication Date Title
CN109189942B (en) Construction method and device of patent data knowledge graph
CN101404015B (en) Automatically generating a hierarchy of terms
CN100458795C (en) Intelligent word input method and input method system and updating method thereof
CN102184262A (en) Web-based text classification mining system and web-based text classification mining method
CN103092943B (en) A kind of method of advertisement scheduling and advertisement scheduling server
CN112352232A (en) Classification tree generation
US10740406B2 (en) Matching of an input document to documents in a document collection
CN104281702A (en) Power keyword segmentation based data retrieval method and device
CN110188349A (en) A kind of automation writing method based on extraction-type multiple file summarization method
CN102609427A (en) Public opinion vertical search analysis system and method
CA3166094A1 (en) Commodity short title generation method and apparatus
CN109325146A (en) A kind of video recommendation method, device, storage medium and server
CN112334890A (en) Subject set refinement
CN105760524A (en) Multi-level and multi-class classification method for science news headlines
Sivakumar Effectual web content mining using noise removal from web pages
Leonandya et al. A semi-supervised algorithm for Indonesian named entity recognition
CN113962210A (en) Intelligent report compiling method based on NLP technology
CN103488741A (en) Online semantic excavation system of Chinese polysemic words and based on uniform resource locator (URL)
CN104462552A (en) Question and answer page core word extracting method and device
Costa et al. Semantic enrichment of product data supported by machine learning techniques
CN111401056A (en) Method for extracting keywords from various texts
CN102103604B (en) Method and device for determining core weight of term
CN116304347A (en) Git command recommendation method based on crowd-sourced knowledge
Gupta et al. Tools of opinion mining
Moumtzidou et al. Discovery of environmental nodes in the web

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination