CN113836884A

CN113836884A - Official document template recommendation method and system

Info

Publication number: CN113836884A
Application number: CN202111105392.4A
Authority: CN
Inventors: 周剑明; 林俊德; 陈立峰; 林诚汉
Original assignee: Fujian Newland Software Engineering Co ltd
Current assignee: Fujian Newland Software Engineering Co ltd
Priority date: 2021-09-22
Filing date: 2021-09-22
Publication date: 2021-12-24

Abstract

The invention provides an official document template recommendation method and system in the technical field of official document processing, wherein the method comprises the following steps: step S10, obtaining official document historical data and preprocessing the official document historical data; step S20, creating an official document template extraction model, and extracting a plurality of official document templates from the official document historical data by using the official document template extraction model; step S30, creating and training an official document template matching model for matching official document templates with official document titles; and step S40, acquiring a new official document title input by the user, matching a corresponding official document template based on the new official document title and the official document template matching model, and completing official document template recommendation. The invention has the advantage that the efficiency and quality of official document writing are greatly improved.

Description

Official document template recommendation method and system

Technical Field

The invention relates to the technical field of official document processing, in particular to an official document template recommendation method and system.

Background

Official document writing has its own distinctive format and content. A rigorous, professional format reflects the writer's familiarity with the business, and well-written content allows the document to convey information effectively. To ensure the quality of what is written, writers must keep learning the latest developments in the relevant fields and keep practicing writing over a long period.

At present, official document writing suffers from pain points such as tedious format adjustment, difficulty in finding writing materials, and omissions that easily slip through proofreading. Writers often spend a large amount of time on typesetting and content writing, which lowers official document processing efficiency, delays the handling of work matters, reduces the overall operating efficiency of the enterprise or organization, and increases its labor cost.

Therefore, how to provide an official document template recommendation method and system that improves the efficiency and quality of official document writing has become an urgent problem to be solved.

Disclosure of Invention

The technical problem to be solved by the invention is to provide an official document template recommendation method and system, so that the efficiency and quality of official document writing are improved.

In a first aspect, the present invention provides a method for recommending an official document template, comprising the following steps:

step S10, obtaining official document historical data and preprocessing the official document historical data;

step S20, creating an official document template extraction model, and extracting a plurality of official document templates from official document historical data by using the official document template extraction model;

step S30, creating and training an official document template matching model for matching the official document template and the official document title;

and step S40, acquiring a new official document title input by the user, matching a corresponding official document template based on the new official document title and the official document template matching model, and completing official document template recommendation.

Further, the step S10 is specifically:

acquiring official document historical data from an official document database, and carrying out preprocessing including data cleaning and data standardization on the official document historical data;

the official document historical data at least comprises an official document ID, a department ID, an official document title, a text, a sender, a receiver, receiving time and processing time;

the data cleaning specifically comprises performing a validity check on the official document historical data, removing illegal characters and removing duplicate official documents;

the data standardization specifically comprises unifying the official document encoding, official document format and official document type of the official document historical data.
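The cleaning and standardization described for step S10 can be sketched as follows. This is a minimal illustration, not the patented implementation: the field names in `REQUIRED_FIELDS`, the definition of "illegal characters" as control characters, the use of Unicode NFKC normalization as the encoding standardization, and de-duplication on the (title, text) pair are all assumptions made for the example.

```python
import re
import unicodedata

# Hypothetical field keys; the source lists the fields but not their names.
REQUIRED_FIELDS = ("doc_id", "dept_id", "title", "text")

# Treating control characters as the "illegal characters" is an assumption.
_ILLEGAL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def clean_text(value: str) -> str:
    """Normalize the Unicode form, drop control characters, collapse whitespace."""
    value = unicodedata.normalize("NFKC", value)
    value = _ILLEGAL.sub("", value)
    return re.sub(r"\s+", " ", value).strip()


def preprocess(records):
    """Clean, validate, and de-duplicate official document records."""
    seen = set()
    result = []
    for rec in records:
        cleaned = {
            key: clean_text(val) if isinstance(val, str) else val
            for key, val in rec.items()
        }
        # Validity check: every required field must be present and non-empty.
        if any(not cleaned.get(field) for field in REQUIRED_FIELDS):
            continue
        # De-duplicate on the cleaned (title, text) pair (an assumption).
        dedup_key = (cleaned["title"], cleaned["text"])
        if dedup_key in seen:
            continue
        seen.add(dedup_key)
        result.append(cleaned)
    return result
```

Cleaning before the validity check means a title made up only of illegal characters is rejected rather than stored as an empty string.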

Further, the step S20 is specifically:

creating an official document template extraction model based on a neural network, and training the official document template extraction model under the pre-training and fine-tuning paradigm by using the preprocessed official document historical data;

and extracting a plurality of official document templates from official document historical data by using the trained official document template extraction model, and storing each official document template into a template database.

Further, the step S30 is specifically:

creating, based on a neural network, an official document template matching model for matching official document templates with official document titles, extracting a plurality of official document titles from the official document historical data to form an official document title training set, and training the official document template matching model under the pre-training and fine-tuning paradigm by using the official document title training set.

Further, the step S40 is specifically:

acquiring a new official document title input by a user, matching the new official document title against the official document titles in the official document historical data by using the BM25 algorithm, screening out the N official document titles with the highest similarity, inputting each of these N official document titles into the official document template matching model to match the corresponding official document templates, and completing official document template recommendation.

In a second aspect, the present invention provides a document template recommendation system, including the following modules:

the official document historical data preprocessing module is used for acquiring official document historical data and preprocessing the official document historical data;

the official document template generating module is used for creating an official document template extraction model and extracting a plurality of official document templates from official document historical data by using the official document template extraction model;

the official document template matching model creating module is used for creating and training an official document template matching model used for matching the official document template with the official document title;

and the official document template recommendation module is used for acquiring a new official document title input by a user, matching a corresponding official document template based on the new official document title and the official document template matching model, and completing official document template recommendation.

Further, the official document historical data preprocessing module specifically comprises:

Further, the official document template generation module specifically is:

Further, the official document template matching model creation module specifically is:

Further, the official document template recommendation module specifically comprises:

The invention has the advantages that:

An official document template extraction model and an official document template matching model are created, and both are trained under the pre-training and fine-tuning paradigm. As a result, the official document template extraction model identifies entity information more accurately and therefore generates high-quality official document templates, and the official document template matching model matches official document titles with official document templates more reliably. The BM25 algorithm matches the new official document title input by the user against the official document titles in the official document historical data, and the official document titles with the highest similarity are input into the official document template matching model, so that the corresponding official document template is matched immediately. This greatly improves the efficiency and quality of official document template recommendation. A writer can write quickly using the recommended high-quality official document template, which greatly improves the efficiency and quality of official document writing, and in turn improves official document processing efficiency, raises the overall operating efficiency of the enterprise or organization, and reduces its labor cost.

Drawings

The invention is further described below through embodiments and with reference to the accompanying drawings.

FIG. 1 is a flowchart of a document template recommendation method according to the present invention.

FIG. 2 is a schematic structural diagram of an official document template recommendation system according to the present invention.

Detailed Description

The general idea of the technical scheme in the embodiments of the application is as follows: an official document template extraction model is created to extract official document templates from official document historical data; an official document template matching model is created to match official document templates with official document titles; the BM25 algorithm is used to match a new official document title input by a user against the official document titles in the official document historical data; the official document titles with the highest similarity are screened out and input into the official document template matching model to match the corresponding official document templates; and the matched official document templates are recommended. This improves the efficiency and quality of official document template recommendation and, in turn, the efficiency and quality of official document writing.

Referring to FIG. 1 and FIG. 2, a preferred embodiment of the official document template recommendation method according to the present invention includes the following steps:

step S30, creating and training an official document template matching model for matching the official document template and the official document title; performing semantic matching on the official document template and the official document title through the official document template matching model;

That is, a user only needs to input a new official document title to immediately obtain a high-quality recommended official document template, and can then use that template to write quickly.

The step S10 specifically includes:

The step S20 specifically includes:

The official document template extraction model identifies entity information in the text, including person names, place names, organization names and proper nouns, replaces the identified entity information with blank spaces, and then merges and normalizes the text to generate an official document template; the merging rules include merging consecutive blank spaces, merging invalid words, HTML tag rules and paragraph tag rules.

Pre-training refers either to a model that has been trained in advance or to the process of training such a model; fine-tuning refers to applying a pre-trained model to one's own data set and adjusting its parameters to fit that data set.

Because most practitioners lack a sufficiently large data set when training a model, and training from scratch easily leads to overfitting, the pre-training and fine-tuning approach is adopted; it saves time and computing resources and reaches a good result quickly.
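The replace-and-merge step that follows entity recognition can be sketched as below. The neural entity recognizer itself is not shown: `entity_spans` is a hypothetical list of (start, end) character offsets standing in for its output. The visible `BLANK` marker is also an assumption made for readability, standing in for the blank space the text describes, and only the consecutive-blank merging rule is illustrated.

```python
import re

# Visible placeholder used in place of a plain space (an assumption).
BLANK = "____"


def make_template(text, entity_spans):
    """Replace each (start, end) entity span with a blank, then merge blanks."""
    pieces = []
    cursor = 0
    for start, end in sorted(entity_spans):
        if start < cursor:
            # Skip spans that overlap one already replaced.
            continue
        pieces.append(text[cursor:start])
        pieces.append(BLANK)
        cursor = end
    pieces.append(text[cursor:])
    template = "".join(pieces)
    # Merge blanks that are adjacent or separated only by whitespace.
    template = re.sub(rf"{BLANK}(?:\s*{BLANK})+", BLANK, template)
    # Collapse any repeated whitespace left behind by the replacement.
    return re.sub(r"\s+", " ", template).strip()
```

For example, with the spans of "Zhang San" and "Fuzhou Bureau" marked in "Zhang San of Fuzhou Bureau shall report", the function returns "____ of ____ shall report".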

The step S30 specifically includes:

The official document titles in the official document historical data are extracted in the format 'title\tsim_title\tdissim_title' to form the official document title training set, wherein title, sim_title and dissim_title are official document titles that have been segmented into space-separated words by the official document template extraction model, and the three fields are separated by the tab character '\t'; sim_title represents a positive example similar to title, and dissim_title represents a random negative example dissimilar to title. The 'pre-training + similar fine-tuning' paradigm is then used to train a short-text semantic matching model (SimNet model) dedicated to the official document title domain, which serves as the official document template matching model.
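Building training lines in the 'title\tsim_title\tdissim_title' format can be sketched as follows. The source does not say how similar titles are identified, so this example assumes that titles sharing the same template are similar and that titles from any other template are dissimilar; the function name `build_triplets` and the `titles_by_template` mapping are hypothetical.

```python
import random


def build_triplets(titles_by_template, seed=0):
    """Return lines in the form 'title<TAB>sim_title<TAB>dissim_title'.

    titles_by_template maps a template ID to a list of space-segmented titles.
    Treating titles that share a template as similar is an assumption.
    """
    rng = random.Random(seed)
    lines = []
    template_ids = sorted(titles_by_template)
    for tid in template_ids:
        group = titles_by_template[tid]
        # Candidate negatives: every title that belongs to another template.
        others = [
            title
            for other in template_ids
            if other != tid
            for title in titles_by_template[other]
        ]
        if len(group) < 2 or not others:
            # A positive and a negative example are both required.
            continue
        for i, title in enumerate(group):
            sim_title = rng.choice(group[:i] + group[i + 1:])
            dissim_title = rng.choice(others)  # random negative example
            lines.append("\t".join((title, sim_title, dissim_title)))
    return lines
```

Fixing the random seed keeps the generated training set reproducible between runs.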

The step S40 specifically includes:

acquiring a new official document title input by a user, matching the new official document title against the official document titles in the official document historical data by using the BM25 algorithm, screening out the N official document titles with the highest similarity, inputting each of these N official document titles into the official document template matching model to match the corresponding official document templates, and returning the matched official document templates by official document ID to complete official document template recommendation. That is, the BM25 algorithm performs a coarse recall: for the new official document title, the top-N similar official document titles are quickly retrieved from the official document historical data and returned.

A preferred embodiment of the official document template recommendation system according to the present invention includes the following modules:
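The coarse recall stage can be sketched with a small from-scratch Okapi BM25 scorer over pre-tokenized titles. The parameter values `k1=1.5` and `b=0.75` are common defaults rather than values given in the source, the IDF uses the widely used non-negative variant, and the class and method names are hypothetical. The second stage, passing the recalled titles to the template matching model, is not shown.

```python
import math
from collections import Counter


class BM25:
    """Okapi BM25 over pre-tokenized titles (k1 and b are assumed defaults)."""

    def __init__(self, corpus, k1=1.5, b=0.75):
        self.corpus = corpus
        self.k1 = k1
        self.b = b
        self.doc_len = [len(doc) for doc in corpus]
        self.avgdl = sum(self.doc_len) / len(corpus)
        self.tf = [Counter(doc) for doc in corpus]
        # Document frequency: number of titles containing each term.
        df = Counter(term for doc in corpus for term in set(doc))
        n = len(corpus)
        # Adding 1 inside the log keeps every IDF value non-negative.
        self.idf = {
            term: math.log((n - freq + 0.5) / (freq + 0.5) + 1.0)
            for term, freq in df.items()
        }

    def score(self, query, index):
        """BM25 score of the tokenized query against corpus[index]."""
        tf = self.tf[index]
        norm = self.k1 * (1 - self.b + self.b * self.doc_len[index] / self.avgdl)
        return sum(
            self.idf[term] * tf[term] * (self.k1 + 1) / (tf[term] + norm)
            for term in query
            if term in tf
        )

    def top_n(self, query, n=3):
        """Indices of the n highest-scoring titles, best first."""
        scored = [(self.score(query, i), i) for i in range(len(self.corpus))]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [index for _, index in scored[:n]]
```

In use, the historical titles would be tokenized once to build the index, and each new title would be tokenized the same way before calling `top_n`; the returned indices map back to official document IDs.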

the official document template matching model creating module is used for creating and training an official document template matching model used for matching the official document template with the official document title; performing semantic matching on the official document template and the official document title through the official document template matching model;

The official document historical data preprocessing module specifically comprises:

The official document template generation module specifically comprises:

The official document template matching model creating module specifically comprises:

The official document template recommendation module specifically comprises:

In summary, the invention has the advantages that:

Although specific embodiments of the invention have been described above, it will be understood by those skilled in the art that the specific embodiments described are illustrative only and are not limiting upon the scope of the invention, and that equivalent modifications and variations can be made by those skilled in the art without departing from the spirit of the invention, which is to be limited only by the appended claims.

Claims

1. An official document template recommendation method is characterized in that: the method comprises the following steps:

2. The official document template recommendation method of claim 1, characterized in that: the step S10 specifically includes:

3. The official document template recommendation method of claim 1, characterized in that: the step S20 specifically includes:

4. The official document template recommendation method of claim 1, characterized in that: the step S30 specifically includes:

5. The official document template recommendation method of claim 1, characterized in that: the step S40 specifically includes:

6. An official document template recommendation system is characterized in that: the system comprises the following modules:

7. The official document template recommendation system of claim 6, characterized in that: the official document historical data preprocessing module specifically comprises:

8. The official document template recommendation system of claim 6, characterized in that: the official document template generation module specifically comprises:

9. The official document template recommendation system of claim 6, characterized in that: the official document template matching model creating module specifically comprises:

10. The official document template recommendation system of claim 6, characterized in that: the official document template recommendation module specifically comprises: