CN117992601A - Document generation method and device based on artificial intelligence - Google Patents

Document generation method and device based on artificial intelligence

Info

Publication number
CN117992601A
Authority
CN
China
Prior art keywords
writing
document
model
text
logic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410398569.1A
Other languages
Chinese (zh)
Inventor
史延莹
赵元杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zijincheng Credit Investigation Co ltd
Original Assignee
Zijincheng Credit Investigation Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zijincheng Credit Investigation Co ltd
Priority to CN202410398569.1A
Publication of CN117992601A
Legal status: Pending

Landscapes

  • Machine Translation (AREA)

Abstract

The application provides a document generation method and device based on artificial intelligence, and relates to the field of artificial intelligence, wherein the method comprises the following steps: encoding historical document data through a preset natural semantic recognition model, determining corresponding text semantic representation, extracting key information from the text semantic representation, determining a corresponding text logic structure according to the extracted key information, and determining corresponding writing logic and writing context according to the key information and the text logic structure; determining the relation between entities in a preset knowledge graph according to the writing logic and the writing context, constructing to obtain the knowledge graph, taking an expansion result obtained according to the knowledge graph as a model training set, and carrying out model training on a preset training model to obtain a document writing model; receiving a document writing request sent by a user, and obtaining a corresponding target document according to the document writing request and the document writing model; the application can effectively improve the accuracy and efficiency of the document generation.

Description

Document generation method and device based on artificial intelligence
Technical Field
The application relates to the field of artificial intelligence, in particular to a document generation method and device based on artificial intelligence.
Background
Staff currently encounter the following problems when writing documents:
1. Complex format specifications:
Documents have strict format requirements: the title, primary addressee, body structure, font size, date of issue, and so on each follow a specific format specification. In practice, writers easily introduce format errors or omissions.
2. Difficulty in writing content:
The content of a document must be rigorous, accurate and logically clear, and its wording must follow specific language conventions and terminology standards. In addition, different document types (such as notices, decisions and reports) call for different writing conventions and expression strategies.
3. Information retrieval and integration:
Writing often requires consulting a large number of related documents, laws and regulations, and historical data, and then organizing and distilling the key information. This process is time consuming and demands strong information screening and summarization skills.
As a result, staff must complete high-quality documents in a short time while also collecting a large amount of related files, laws and regulations, and historical information, so the efficiency and accuracy of document writing are low.
Disclosure of Invention
Aiming at the problems in the prior art, the application provides a document generation method and device based on artificial intelligence, which can effectively improve the accuracy and efficiency of document generation.
In order to solve at least one of the problems, the application provides the following technical scheme:
In a first aspect, the present application provides an artificial intelligence based document generation method, including:
encoding historical document data through a preset natural semantic recognition model, determining corresponding text semantic representation, extracting key information from the text semantic representation, determining a corresponding text logic structure according to the extracted key information, and determining corresponding writing logic and writing context according to the key information and the text logic structure;
determining the relation between entities in a preset knowledge graph according to the writing logic and the writing context, constructing to obtain the knowledge graph, taking an expansion result obtained according to the knowledge graph as a model training set, and carrying out model training on a preset training model to obtain a document writing model;
receiving a document writing request sent by a user, and obtaining a corresponding target document according to the document writing request and the document writing model.
Further, before encoding the historical document data through a preset natural semantic recognition model and determining the corresponding text semantic representation, the method comprises the following steps:
removing noise and extraneous characters from the historical document data, and annotating the cleaned historical document data with text labels for events, entities and relationships between entities;
converting the annotated historical document data into a vector representation.
Further, the extracting key information from the text semantic representation, and determining a corresponding text logic structure according to the extracted key information, includes:
extracting at least one of an event, an entity and a relationship between entities from the text semantic representation;
determining a corresponding text logic structure according to the text structure corresponding to the text semantic representation and at least one of the extracted events, entities and relationships between entities.
Further, the determining corresponding writing logic and writing context according to the key information and the text logic structure comprises the following steps:
determining a corresponding writing purpose according to the key information, and determining a corresponding text paragraph structure according to the text logic structure;
determining corresponding writing logic and writing context according to the text paragraph structure and the writing style corresponding to the writing purpose.
Further, the determining the relationship between the entities in the preset knowledge graph according to the writing logic and the writing context and constructing to obtain the knowledge graph includes:
setting the writing logic and the writing context as entities in a preset knowledge graph and determining the relationship between the entities;
And carrying out entity expansion according to the entity information of the entity, and carrying out relationship expansion according to the relationship path of the relationship between the entities to obtain a knowledge graph after the entity expansion and the relationship expansion.
Further, taking the expansion result obtained according to the knowledge graph as a model training set and carrying out model training on a preset training model to obtain a document writing model includes:
converting the format of the expansion result obtained by the knowledge graph according to the data format of the model training set, and setting a corresponding label for each data sample to obtain the model training set, wherein the label comprises the affiliated writing logic or the affiliated writing context;
and inputting the model training set into a preset training model to perform model training to obtain a document writing model.
Further, after the model training is performed on the preset training model to obtain the document writing model, the method comprises the following steps:
Taking an expansion result obtained by the knowledge graph as a verification set to carry out model evaluation on the document writing model;
and carrying out parameter adjustment on the document writing model according to the model evaluation result to obtain an updated document writing model.
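The evaluation and parameter adjustment steps above can be illustrated with a minimal sketch. It assumes a toy stand-in for the document writing model (a single-threshold classifier over a hypothetical "formality score") and a hand-written validation set; the names `accuracy`, `make_classifier` and `tune_threshold` are illustrative and do not come from the application.

```python
from typing import Callable, Sequence

# Assumption: each validation sample is (hypothetical formality score, expected style label).
Sample = tuple[float, str]

def accuracy(predict: Callable[[float], str], samples: Sequence[Sample]) -> float:
    """Fraction of validation samples whose predicted label matches the expected one."""
    if not samples:
        return 0.0
    return sum(predict(x) == y for x, y in samples) / len(samples)

def make_classifier(threshold: float) -> Callable[[float], str]:
    """Toy stand-in for the document writing model with one tunable parameter."""
    return lambda score: "formal" if score >= threshold else "informal"

def tune_threshold(samples: Sequence[Sample], candidates: Sequence[float]) -> float:
    """Return the candidate threshold with the highest validation accuracy."""
    return max(candidates, key=lambda t: accuracy(make_classifier(t), samples))

validation_set = [(0.9, "formal"), (0.7, "formal"), (0.4, "informal"), (0.2, "informal")]
best = tune_threshold(validation_set, [0.1, 0.5, 0.8])
print(best, accuracy(make_classifier(best), validation_set))
```

A real system would adjust many parameters with an optimizer rather than a small candidate list; the sketch only shows the evaluate-then-adjust loop on a held-out set.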
In a second aspect, the present application provides an artificial intelligence based document generation apparatus, comprising:
The historical document data analysis module is used for encoding the historical document data through a preset natural semantic recognition model, determining corresponding text semantic representation, extracting key information from the text semantic representation, determining a corresponding text logic structure according to the extracted key information, and determining corresponding writing logic and writing context according to the key information and the text logic structure;
The document writing model construction module is used for determining the relation between the entities in the preset knowledge graph according to the writing logic and the writing context, constructing and obtaining the knowledge graph, taking an expansion result obtained according to the knowledge graph as a model training set, and carrying out model training on a preset training model to obtain a document writing model;
And the automatic document writing module is used for receiving a document writing request sent by a user and obtaining a corresponding target document according to the document writing request and the document writing model.
In a third aspect, the present application provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the artificial intelligence based document generation method when the program is executed.
In a fourth aspect, the present application provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the artificial intelligence based document generation method.
In a fifth aspect, the present application provides a computer program product comprising computer programs/instructions which when executed by a processor implement the steps of the artificial intelligence based document generation method.
According to the technical scheme, the application provides an artificial intelligence-based document generation method and device, which are characterized in that historical document data is encoded through a preset natural semantic recognition model, corresponding text semantic representations are determined, key information is extracted from the text semantic representations, corresponding text logic structures are determined according to the extracted key information, and corresponding writing logic and writing context are determined according to the key information and the text logic structures; determining the relation between entities in a preset knowledge graph according to the writing logic and the writing context, constructing to obtain the knowledge graph, taking an expansion result obtained according to the knowledge graph as a model training set, and carrying out model training on a preset training model to obtain a document writing model; receiving a document writing request sent by a user, and obtaining a corresponding target document according to the document writing request and the document writing model, thereby effectively improving the accuracy and efficiency of document generation.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow diagram of an artificial intelligence based document generation method according to an embodiment of the present application;
FIG. 2 is a second schematic flow diagram of an artificial intelligence based document generation method according to an embodiment of the present application;
FIG. 3 is a third schematic flow diagram of an artificial intelligence based document generation method according to an embodiment of the present application;
FIG. 4 is a fourth schematic flow diagram of an artificial intelligence based document generation method according to an embodiment of the present application;
FIG. 5 is a fifth schematic flow diagram of an artificial intelligence based document generation method according to an embodiment of the present application;
FIG. 6 is a sixth schematic flow diagram of an artificial intelligence based document generation method according to an embodiment of the present application;
FIG. 7 is a seventh schematic flow diagram of an artificial intelligence based document generation method according to an embodiment of the present application;
FIG. 8 is a block diagram of an artificial intelligence based document generating device in an embodiment of the present application;
FIG. 9 is a schematic structural diagram of an electronic device in an embodiment of the application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The acquisition, storage, use and processing of data in the technical scheme of the application all comply with the relevant national laws and regulations.
In the prior art, writing a document often requires consulting a large number of related files, laws and regulations, and historical data, and organizing and distilling the key information. In view of this, the application provides a method and a device for generating a document based on artificial intelligence, which encode historical document data through a preset natural semantic recognition model, determine a corresponding text semantic representation, extract key information from the text semantic representation, determine a corresponding text logic structure according to the extracted key information, and determine corresponding writing logic and writing context according to the key information and the text logic structure; determine the relation between entities in a preset knowledge graph according to the writing logic and the writing context, construct the knowledge graph, take an expansion result obtained according to the knowledge graph as a model training set, and carry out model training on a preset training model to obtain a document writing model; and receive a document writing request sent by a user and obtain a target document according to the document writing request and the document writing model, so that document generation efficiency and accuracy can be effectively improved.
In order to effectively improve the accuracy and efficiency of document generation, the application provides an embodiment of an artificial intelligence-based document generation method, referring to fig. 1, wherein the artificial intelligence-based document generation method specifically comprises the following steps:
Step S101: encoding the historical document data through a preset natural semantic recognition model, determining corresponding text semantic representation, extracting key information from the text semantic representation, determining a corresponding text logic structure according to the extracted key information, and determining corresponding writing logic and writing context according to the key information and the text logic structure.
Optionally, in this embodiment, a natural semantic recognition model suitable for text encoding, for example BERT or GPT, may be selected. The historical document data is input into the preset natural semantic recognition model to obtain a semantic representation of the text, and key semantic features, which may include word embeddings, sentence embeddings, or vector representations of text segments, are extracted from the model output. The model output may then be used for entity recognition, extracting key entity information in the text (for example person names, place names and organization names) as well as the relationships between those entities.
Optionally, in this embodiment, the semantic representation output by the natural semantic recognition model may be used to perform syntactic analysis and determine the subject-predicate structure, clause relationships, and the like in the text. Paragraph divisions are then determined from the semantic representation and the syntactic analysis by locating natural paragraphs and topic transition points, which yields the text logic structure.
Optionally, in this embodiment, the key information extracted in the foregoing steps may be integrated into a comprehensive semantic representation of the text. Logical reasoning over this integrated representation determines the logical structure and logical relationships of the text, and the theme of the text is determined from the key information and the logical structure. The writing context is then set with the context information in the text taken into account, so that the generated text conforms to the style and wording of the historical documents. The writing logic and the writing context are determined in this way.
By converting the historical document data into semantic representations and extracting key information from them, the model can understand the structure and content of the text. Determining the logical structure and the writing context makes the generated text more logical and contextually consistent. This process combines semantic analysis with logical reasoning so that the generated text better meets the writing requirements.
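The paragraph division step in S101, which locates topic transition points from a semantic representation, can be sketched as follows. This is a minimal illustration under stated assumptions: a bag-of-words count vector stands in for the output of the natural semantic recognition model, and the similarity threshold of 0.2 is a hypothetical value; neither comes from the application.

```python
import math
import re
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Toy stand-in for a semantic representation: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def topic_transitions(paragraphs: list[str], threshold: float = 0.2) -> list[int]:
    """Return each index i where paragraph i is dissimilar to paragraph i - 1,
    i.e. where their cosine similarity falls below the (hypothetical) threshold."""
    vectors = [bag_of_words(p) for p in paragraphs]
    return [i for i in range(1, len(vectors)) if cosine(vectors[i - 1], vectors[i]) < threshold]

paragraphs = [
    "The new policy covers credit reporting for small enterprises.",
    "The policy requires enterprises to submit credit reporting data.",
    "Implementation begins next quarter under the supervision office.",
]
print(topic_transitions(paragraphs))
```

With embeddings from a model such as BERT, the same comparison between adjacent segments would apply; only the `bag_of_words` stand-in would change.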
Step S102: determining the relation between entities in a preset knowledge graph according to the writing logic and the writing context, constructing the knowledge graph, taking an expansion result obtained according to the knowledge graph as a model training set, and carrying out model training on a preset training model to obtain a document writing model.
Optionally, in this embodiment, key entities, such as specific nouns, organizations, places and persons, may be determined according to the writing logic and the writing context, together with the relationships among them, which may include superior-subordinate relationships, attribution relationships, action relationships, and the like. The determined entities are then used as the nodes of the knowledge graph, each node representing one key entity, and the constructed edges represent the relationships between entities; the type and weight of each edge may be determined according to the actual relationship.
Optionally, in this embodiment, the knowledge graph may be expanded through graph inference or queries guided by the writing logic and writing context: related entities and relationships are added, along with entity attribute information such as the characteristics, attributes and historical background of the entities.
Optionally, in this embodiment, the expanded knowledge graph result may be used as a part of a model training set, where each entity and relationship become a training sample, and a label is set for each training sample to indicate the correct class or relationship of the sample, so as to supervise model learning.
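The graph construction, relationship expansion and sample labeling described above can be sketched as follows. The sketch assumes that relationship expansion means deriving one new edge per two-hop relationship path, which is only one possible reading; the class `KnowledgeGraph`, the function `to_training_samples`, and the example entities are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class KnowledgeGraph:
    # Each edge is a (head, relation, tail) triple; nodes are implied by the edges.
    edges: set[tuple[str, str, str]] = field(default_factory=set)

    def add(self, head: str, relation: str, tail: str) -> None:
        self.edges.add((head, relation, tail))

    def expand_by_paths(self) -> set[tuple[str, str, str]]:
        """Derive one edge per two-hop path a -r1-> b -r2-> c and return the new edges."""
        derived = {
            (a, f"{r1}/{r2}", c)
            for (a, r1, b) in self.edges
            for (b2, r2, c) in self.edges
            if b == b2 and a != c
        }
        new_edges = derived - self.edges
        self.edges |= new_edges
        return new_edges

def to_training_samples(graph: KnowledgeGraph, label: str) -> list[dict]:
    """Turn every edge into one labeled training sample (sorted for determinism)."""
    return [
        {"head": h, "relation": r, "tail": t, "label": label}
        for (h, r, t) in sorted(graph.edges)
    ]

kg = KnowledgeGraph()
kg.add("notice", "opens_with", "background")
kg.add("background", "followed_by", "main content")
new_edges = kg.expand_by_paths()
samples = to_training_samples(kg, label="writing_logic")
print(new_edges, len(samples))
```

Here the label records which writing logic or writing context a sample belongs to, as in the labeling step of the method; a production pipeline would likely assign labels per sample rather than per graph.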
Optionally, in this embodiment, a pre-trained model suitable for graph learning, such as a graph neural network (GNN) or an attention-based model, may be selected. The constructed knowledge graph is input into the graph neural network for training so as to learn the complex relationships and context information among entities.
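To give a sense of how a graph neural network propagates context between entities, the following sketch performs a single round of mean-aggregation message passing in plain Python. It is an assumption-laden illustration: it has no learned weights and no training loop, the node features are made-up two-dimensional vectors, and the function name `message_passing_step` is hypothetical.

```python
def message_passing_step(
    features: dict[str, list[float]],
    edges: list[tuple[str, str]],
) -> dict[str, list[float]]:
    """One round of mean aggregation: each node's new vector is the mean of
    its own vector and the vectors of its neighbours (edges are undirected)."""
    neighbours = {node: [node] for node in features}  # every node includes itself
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    updated = {}
    for node, group in neighbours.items():
        dim = len(features[node])
        updated[node] = [sum(features[n][d] for n in group) / len(group) for d in range(dim)]
    return updated

# Hypothetical entity features and relationships.
features = {"org": [1.0, 0.0], "policy": [0.0, 1.0], "date": [0.0, 0.0]}
edges = [("org", "policy"), ("policy", "date")]
updated = message_passing_step(features, edges)
print(updated)
```

A trained graph neural network would apply learned weight matrices and nonlinearities around this aggregation and repeat it over several layers.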
Step S103: receiving a document writing request sent by a user, and obtaining a corresponding target document according to the document writing request and the document writing model.
Optionally, in this embodiment, a document writing request sent by a user is received; the request includes the requested document type, theme, key information, format requirements, and the like. The request is parsed, the key information is extracted, and the type, theme and other necessary elements of the document are determined. The parsed request information is then input into the previously trained document writing model, which is invoked to generate the target document. The model generates a target document with the corresponding format and content according to the user request and the writing logic.
Therefore, this embodiment can use the trained model to generate a target document that meets the user's requirements according to the user's document writing request. The system may also be continuously monitored and optimized so that it adapts to document requirements of different types and styles.
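The request handling in step S103 can be sketched as parsing a request into its elements and passing them to a model. In this sketch the field names (`doc_type`, `theme`, `key_info`, `format`) are assumptions about the request layout, and `template_model` is a trivial placeholder for the trained document writing model, not the model itself.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class WritingRequest:
    doc_type: str
    theme: str
    key_info: dict[str, str] = field(default_factory=dict)
    format_requirements: dict[str, str] = field(default_factory=dict)

def parse_request(raw: dict) -> WritingRequest:
    """Validate the raw request and extract the document type, theme and other elements."""
    missing = [key for key in ("doc_type", "theme") if not raw.get(key)]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return WritingRequest(
        doc_type=raw["doc_type"].strip().lower(),
        theme=raw["theme"].strip(),
        key_info=dict(raw.get("key_info", {})),
        format_requirements=dict(raw.get("format", {})),
    )

def template_model(request: WritingRequest) -> str:
    """Placeholder for the trained document writing model (assumption)."""
    title = f"{request.doc_type.capitalize()} on {request.theme}"
    body = "\n".join(f"{key}: {value}" for key, value in request.key_info.items())
    return f"{title}\n{body}"

def generate_document(raw: dict, model: Callable[[WritingRequest], str] = template_model) -> str:
    """Parse the user request, then invoke the model on the parsed request."""
    return model(parse_request(raw))

document = generate_document(
    {"doc_type": "Notice", "theme": "the new policy", "key_info": {"Issuer": "Organization A"}}
)
print(document)
```

Keeping the model behind a callable lets the parsing logic stay unchanged when the placeholder is swapped for a trained model.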
From the above description, it can be seen that, according to the artificial intelligence-based document generation method provided by the embodiment of the present application, the historical document data can be encoded by a preset natural semantic recognition model, corresponding text semantic representations are determined, key information is extracted from the text semantic representations, corresponding text logic structures are determined according to the extracted key information, and corresponding writing logic and writing context are determined according to the key information and the text logic structures; determining the relation between entities in a preset knowledge graph according to the writing logic and the writing context, constructing to obtain the knowledge graph, taking an expansion result obtained according to the knowledge graph as a model training set, and carrying out model training on a preset training model to obtain a document writing model; receiving a document writing request sent by a user, and obtaining a target document according to the document writing request and a document writing model, so that document generation efficiency and accuracy can be effectively improved.
In an embodiment of the artificial intelligence based document generating method of the present application, referring to fig. 2, the following may be further specifically included:
Step S201: removing noise and extraneous characters from the historical document data, and annotating the cleaned historical document data with text labels for events, entities and relationships between entities;
Step S202: converting the annotated historical document data into a vector representation.
Optionally, in this embodiment, when processing the historical document data, noise and extraneous characters are first cleaned to ensure the purity and consistency of the text. The goal of this step is to eliminate interference so that the text is easier to understand and analyze. This embodiment is implemented through the following steps:
First step: noise and character removal:
Text cleaning: non-text content, such as HTML tags and special symbols, is removed using tools such as regular expressions.
Character removal: unnecessary characters, such as punctuation marks and numbers, are removed to ensure consistency of the text body.
This stage of processing aims at normalizing the text to make it more suitable for subsequent semantic analysis.
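The cleaning stage above can be sketched with Python's standard `re` module. The function name `clean_text` and the exact patterns are illustrative assumptions; a real pipeline would decide per document type which characters count as noise.

```python
import re

def clean_text(raw: str) -> str:
    """Remove HTML tags, punctuation and digits, then normalize whitespace."""
    text = re.sub(r"<[^>]+>", " ", raw)          # text cleaning: drop HTML tags
    text = re.sub(r"[^\w\s]|[\d_]", " ", text)   # character removal: punctuation, digits, underscores
    return re.sub(r"\s+", " ", text).strip()     # collapse the gaps left behind

cleaned = clean_text("<p>Notice No. 12: Organization A issues a new policy!</p>")
print(cleaned)
```

Because `\w` is Unicode-aware for `str` patterns in Python 3, Chinese characters are preserved while full-width punctuation is removed.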
Second step: text annotation of events, entities, and relationships between entities:
The cleaned text needs to be further annotated so that the machine can understand the events, entities and relationships between entities contained in the text. This step includes:
Event labeling: events in the text are identified through natural language processing technology, and key information about each event, such as event type and time of occurrence, is labeled.
Entity labeling: entities in the text, such as person names, place names and organizations, are identified and labeled.
Relationship labeling: relationships among entities in the text, such as relationships between persons and organizational attribution relationships, are identified and labeled accordingly.
This step provides critical semantic information for the machine to understand the text, and lays a foundation for subsequent information extraction.
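One possible data layout for the annotations described above is character-offset spans plus relations between spans. The sketch below is an assumption: the label names, the `Span` and `RelationLabel` classes, and the dictionary-lookup `annotate` function (which stands in for a real NER or event detection model and finds only the first occurrence of each surface form) are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Span:
    start: int   # inclusive character offset
    end: int     # exclusive character offset
    label: str   # e.g. "ORG", "EVENT", "POLICY" (hypothetical label set)

@dataclass(frozen=True)
class RelationLabel:
    head: Span
    relation: str
    tail: Span

def annotate(text: str, gazetteer: dict[str, str]) -> list[Span]:
    """Label the first occurrence of each known surface form, ordered by position."""
    spans = []
    for surface, label in gazetteer.items():
        start = text.find(surface)
        if start != -1:
            spans.append(Span(start, start + len(surface), label))
    return sorted(spans, key=lambda span: span.start)

text = "Organization A issues a new policy"
spans = annotate(text, {"Organization A": "ORG", "new policy": "POLICY", "issues": "EVENT"})
relation = RelationLabel(head=spans[0], relation="issue", tail=spans[2])
print(spans, relation.relation)
```

Storing offsets rather than copied strings keeps annotations aligned with the cleaned text when several labels overlap the same sentence.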
Third step: conversion to a vector representation:
The annotated text needs to be converted into a vector representation so that the computer can better understand and process it. This step includes:
Word embedding model: the cleaned and annotated text is converted into a vector representation using a pre-trained word embedding model, such as Word2Vec or GloVe.
Context representation: the text context is taken into account so that the vector representations of entities and events carry rich semantic information; this may be implemented through a context window or other methods.
The text can thereby be understood by a computer in numerical form and provided as input to a subsequent machine learning model.
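The context window idea above can be sketched by averaging the embedding of a token with those of its neighbours. The two-dimensional embedding table below is made up for illustration and stands in for vectors from a pre-trained model such as Word2Vec or GloVe; the function name `context_vector` is hypothetical.

```python
def context_vector(
    tokens: list[str],
    index: int,
    embeddings: dict[str, list[float]],
    window: int = 1,
) -> list[float]:
    """Average the embeddings of the token at `index` and its neighbours within
    `window` positions; tokens missing from the table are skipped."""
    lo, hi = max(0, index - window), min(len(tokens), index + window + 1)
    vectors = [embeddings[t] for t in tokens[lo:hi] if t in embeddings]
    if not vectors:
        return []
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

# Toy embedding table (assumption): real vectors would have hundreds of dimensions.
embeddings = {"organization": [1.0, 0.0], "issues": [0.0, 1.0], "policy": [1.0, 1.0]}
tokens = ["organization", "issues", "policy"]
first = context_vector(tokens, 0, embeddings)
print(first)
```

A larger `window` mixes in more of the surrounding sentence, trading specificity of the token for broader context.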
In summary, removing noise and extraneous characters from the historical document data, annotating events and entities, and converting the text into vector representations extracts more meaningful information from the original text and supports further analysis and application. The process combines natural language processing, machine learning, text mining and other techniques, which makes the historical document data easier to analyze and use.
In an embodiment of the artificial intelligence based document generating method of the present application, referring to fig. 3, the following may be further specifically included:
Step S301: extracting at least one of an event, an entity and a relationship between entities from the text semantic representation;
Step S302: and determining a corresponding text logic structure according to the text semantic representation corresponding text structure and at least one of the extracted event, entity and relationship among the entities.
Optionally, in this embodiment, in a scenario of automatic writing based on historical document data, it is important to extract events, entities and relationships between entities from the text semantic representation. First, this embodiment uses an embedding model to encode the historical document data and obtain a semantic representation of the text that captures the key events, entities and relationships between entities.
Extracting events, entities and relationships among the entities: in this application scenario, this embodiment extracts information about decisions, actions and related entities from the document. Using techniques such as named entity recognition (NER) and relation extraction, this embodiment recognizes the key entities involved in the text, such as persons, organizations and times, and captures the events that relate them, such as decision events and action events. For example, in the historical document sentence "organization A issues a new policy", "organization A" and "new policy" are the key entities, and the relationship between them is "issue".
Determining a text logic structure: with the extracted events, entities and relationships among the entities, this embodiment can determine the logical structure of the text to better understand the content of the document. In this step, this embodiment considers the associations between events and entities in the document and maps these associations onto logical structures. For example, when processing a document describing a decision of an organization, this embodiment determines that the logical structure is a "decision event" comprising two core elements, the "decision body" and the "decision content".
Fusing text structure and semantic information: in order to fully understand the document, this embodiment combines the text structure with the extracted semantic information to construct a more complete text understanding model. In this process, this embodiment may use tools such as a syntactic parser and semantic role labeling to further understand sentence structure and context. Such fusion helps to build a more detailed and accurate logical structure.
In the scenario of automatic writing based on historical document data, this process provides a key semantic basis for subsequent intelligent writing. By deeply mining the semantic information of historical documents, this embodiment can understand their meaning more accurately, which lays a solid foundation for generating new documents that are of high quality and logically consistent.
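The "organization A issues a new policy" example above can be sketched as extracting a triple and mapping it onto a decision-event structure. This is a simplified illustration: a regular expression with a small hypothetical verb list stands in for NER and relation extraction models, and the function names and dictionary keys are assumptions.

```python
import re
from typing import Optional

# Hypothetical pattern: "<decision body> <action verb> <decision content>".
DECISION_PATTERN = re.compile(
    r"^(?P<body>.+?) (?P<action>issues|approves|decides on) (?P<content>.+)$"
)

def extract_triple(sentence: str) -> Optional[tuple[str, str, str]]:
    """Return (entity, relation, entity) for a matching sentence, else None."""
    match = DECISION_PATTERN.match(sentence.strip().rstrip("."))
    if match is None:
        return None
    return match["body"], match["action"], match["content"]

def to_logic_structure(triple: tuple[str, str, str]) -> dict:
    """Map a triple onto a decision event with its two core elements."""
    body, action, content = triple
    return {
        "decision_event": {
            "decision_body": body,
            "action": action,
            "decision_content": content,
        }
    }

triple = extract_triple("Organization A issues a new policy.")
structure = to_logic_structure(triple)
print(structure)
```

Sentences that do not fit the pattern return `None`, which marks the point where a statistical extractor would be needed instead of rules.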
In an embodiment of the artificial intelligence based document generating method of the present application, referring to fig. 4, the following may be further specifically included:
Step S401: determining a corresponding writing purpose according to the key information, and determining a corresponding text paragraph structure according to the text logic structure;
step S402: and determining corresponding writing logic and writing context according to the text paragraph structure and the writing style corresponding to the writing purpose.
Optionally, in this embodiment, in the scenario of automatic writing based on historical document data, proceeding from the key information to the writing purpose, the text paragraph structure, the writing logic and the writing context is a critical link in the whole automatic writing process. The method specifically comprises the following steps:
Determining the writing purpose according to the key information: on the basis of the extracted key information, this embodiment can identify the key issues, topics or decision events in the text. This key information helps this embodiment accurately define the purpose of writing, that is, the underlying goal of writing the document. For example, if the key information "new policy issue" is extracted from the historical document, this embodiment can determine that the purpose of writing is to introduce and interpret the new policy.
Determining paragraph structure according to the text logic structure: by analyzing the text logic structure, this embodiment can understand the relationships between different events and entities in the text and construct a reasonable text paragraph structure. Each paragraph may be developed around a particular event, decision or topic, ensuring logical coherence and structural clarity. For example, when composing a document regarding a new policy, different paragraphs may respectively introduce the policy background, the main content, the implementation plan, and so on.
Determining writing logic and writing context: based on the text paragraph structure and the writing purpose, this embodiment further determines the overall writing logic and writing context. This includes selecting an appropriate manner of narration, tone, wording style, and so on, so that the overall style of the document is consistent with the purpose of writing. For example, if the purpose of writing is to introduce a policy, the writing logic may proceed gradually from the background to the specific policy content, and the writing context may favor more formal terms and expressions.
Fusing text structure and writing style: finally, in the whole automatic writing process, the embodiment comprehensively considers the text structure, the writing logic and the writing context, ensures that the text structure, the writing logic and the writing context are coordinated with each other, and forms a complete document with uniform style. This involves careful information organization, sentence concatenation, and paragraph conversion, so that the entire text appears natural and smooth both logically and contextually.
In summary, by gradually determining the writing purpose, the text paragraph structure, the writing logic and the writing context from the key information, the embodiment can build an automatic writing system, generate new documents according with logic and style according to the historical document data, and provide an efficient and reliable solution for document writing.
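The progression from key information to writing purpose, paragraph structure, writing logic and writing context can be sketched as follows. This is only a minimal illustration under stated assumptions: the lookup tables, the `WritingPlan` class and the `plan_document` function are hypothetical names that do not appear in the application, and a real system would derive these mappings from the analyzed historical documents rather than from hard-coded dictionaries.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical lookup tables; the application does not specify these values.
PURPOSE_BY_KEY_INFO = {
    "new policy issue": "introduce and interpret the new policy",
}
OUTLINE_BY_PURPOSE = {
    "introduce and interpret the new policy": [
        "policy background",
        "main content",
        "implementation plan",
    ],
}
STYLE_BY_PURPOSE = {
    "introduce and interpret the new policy": "formal",
}


@dataclass
class WritingPlan:
    purpose: str                                          # derived from key information
    paragraphs: List[str] = field(default_factory=list)   # text paragraph structure
    logic: str = ""                                       # order in which content unfolds
    context: str = ""                                     # register / style of wording


def plan_document(key_info: str) -> WritingPlan:
    """Map extracted key information to a writing plan (illustrative only)."""
    purpose = PURPOSE_BY_KEY_INFO.get(key_info, "general notice")
    paragraphs = OUTLINE_BY_PURPOSE.get(purpose, ["body"])
    logic = " -> ".join(paragraphs)          # e.g. background first, then details
    context = STYLE_BY_PURPOSE.get(purpose, "neutral")
    return WritingPlan(purpose, paragraphs, logic, context)


plan = plan_document("new policy issue")
print(plan.logic)
```

Unknown key information falls back to a single-paragraph neutral plan, which keeps the sketch total over all inputs.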
In an embodiment of the artificial intelligence based document generating method of the present application, referring to fig. 5, the following may be further specifically included:
Step S501: setting the writing logic and the writing context as entities in a preset knowledge graph and determining the relationship between the entities;
Step S502: carrying out entity expansion according to the entity information of the entity, and carrying out relationship expansion according to the relationship path of the relationship between the entities, to obtain a knowledge graph after the entity expansion and the relationship expansion.
Optionally, in this embodiment, during the automated process of writing the document, the writing logic and the writing context are set as entities in the preset knowledge graph, and the relationship between the entities is determined to further expand the knowledge graph, so as to provide more relevant information. The method specifically comprises the following steps:
Setting writing logic and writing context as a knowledge graph entity: the writing logic and the writing context are set as the entities in the knowledge graph to establish corresponding nodes in the knowledge graph so as to expand the entities and the relations subsequently. This can be viewed as converting abstract concepts of authoring purposes, paragraph structures, etc., into computer-understandable entities.
Determining the relationship between entities: based on the text logic structure and the writing context, a relationship between entities is determined, such as an association between "writing logic" and "writing purpose", or a relationship between "writing context" and "text paragraph structure". These relationships may be represented as edges in the knowledge graph, connecting the corresponding nodes.
Expanding the entities and the relationships: the knowledge graph is expanded based on the set entities and relationships. This involves two main aspects:
Entity expansion: with the writing logic and the writing context as entities, entity expansion is carried out to find more information related to them. For example, if the writing logic is "introduce new policies", then details of the new policies, relevant regulations, and the like may be added.
Relationship expansion: the relationships among the entities are expanded along the set relationship paths. By searching for other entities in the knowledge graph, further association relationships can be discovered. For example, by associating "writing logic" with "writing purpose", the relationship between "writing purpose" and "text paragraph structure" can be reached.
Finally, through the expansion of the entity and the relationship, the embodiment obtains a richer and more detailed knowledge graph. The knowledge graph contains various entities and relations related to the writing logic and the writing context, and provides more background information and associated knowledge for the subsequent writing process.
By converting the writing logic and the writing context into the entities in the knowledge graph and expanding the entities and the relations, the embodiment can build a more comprehensive context in the knowledge graph, and provide richer and deeper information support for automatic document writing. This helps ensure that the generated document is both logical and rich enough in information.
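The entity expansion and relationship expansion described above can be sketched with a small in-memory graph. This is a sketch under assumptions, not the applicant's implementation: the `KnowledgeGraph` class, the relation names (`serves`, `shapes`, `has_detail`) and the hop limit are all hypothetical, and relationship expansion is modelled here simply as a breadth-first walk along relationship paths.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Tiny directed graph: head entity -> set of (relation, tail entity)."""

    def __init__(self):
        self.edges = defaultdict(set)

    def add_relation(self, head, relation, tail):
        self.edges[head].add((relation, tail))

    def expand_entity(self, entity, details):
        # Entity expansion: attach additional information to an entity.
        for detail in details:
            self.add_relation(entity, "has_detail", detail)

    def expand_relations(self, start, max_hops=2):
        # Relationship expansion: follow relationship paths from `start`
        # and collect every entity reachable within `max_hops` edges.
        seen = {start}
        frontier = {start}
        for _ in range(max_hops):
            next_frontier = set()
            for node in frontier:
                for _relation, tail in self.edges.get(node, ()):
                    if tail not in seen:
                        seen.add(tail)
                        next_frontier.add(tail)
            frontier = next_frontier
        return seen - {start}


kg = KnowledgeGraph()
kg.add_relation("writing logic", "serves", "writing purpose")
kg.add_relation("writing purpose", "shapes", "text paragraph structure")
kg.expand_entity("writing logic", ["policy details", "relevant regulations"])

one_hop = kg.expand_relations("writing logic", max_hops=1)
two_hops = kg.expand_relations("writing logic", max_hops=2)
```

With one hop only the directly attached entities are returned; with two hops the walk also reaches "text paragraph structure" through "writing purpose", mirroring the example in the text.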
In an embodiment of the artificial intelligence based document generating method of the present application, referring to fig. 6, the following may be further specifically included:
Step S601: converting the format of the expansion result obtained by the knowledge graph according to the data format of the model training set, and setting a corresponding label for each data sample to obtain the model training set, wherein the label comprises the affiliated writing logic or the affiliated writing context;
Step S602: inputting the model training set into a preset training model to perform model training to obtain a document writing model.
Optionally, in this embodiment, after the expansion result of the knowledge graph is obtained, the expansion result needs to be converted into a model training set, a corresponding label is set for each sample, and a preset pre-training model is used for model training. The method specifically comprises the following steps:
Format conversion and label setting: format conversion is carried out on the expansion result of the knowledge graph according to the data format of the model training set. For example, this includes mapping nodes and edges in the knowledge graph to data samples in the model training set, ensuring that each node and edge has a corresponding feature representation. At the same time, a corresponding label is set for each data sample, which represents the writing logic or writing context to which the sample belongs.
Constructing a model training set: the data after format conversion and label setting are assembled into a model training set. Each data sample contains a portion of the information in the knowledge graph; its features are entities and relationships, and its label represents the writing logic or writing context to which it belongs. This training set will be used to train the document writing model.
Inputting a pre-training model for training: and inputting the constructed model training set into a preset pre-training model to perform model training. During the training process, the model will learn the representation of the entities and relationships in the knowledge graph and attempt to predict the authoring logic or context to which each data sample belongs. This process utilizes a priori knowledge of the natural language understanding of the pre-trained model and adapts to the needs of the particular task through iterative optimization.
After model training, the embodiment obtains a document writing model capable of understanding the context and logic in the knowledge graph. The model can generate a document meeting requirements according to the input writing logic and the writing context. The training of the model enables understanding and application of key information, logical structures and contexts, enabling automated writing of documents meeting expectations.
Through the steps, the embodiment completes the construction process from the knowledge graph expansion result to the document writing model. The model can ensure that the generated document has logic consistency and accords with specific writing purposes and contexts by means of rich information in the knowledge graph when writing the document.
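The format conversion and label setting step can be illustrated as below. The application does not define the data format of the model training set, so this sketch assumes, purely for illustration, that the expansion result is a list of (head, relation, tail) triples and that the training set is stored as JSON Lines; the function name `triples_to_samples` and the label strings are hypothetical.

```python
import json


def triples_to_samples(triples, label_of):
    """Turn knowledge-graph triples into labelled training samples.

    `label_of` maps a head entity to the writing logic or writing
    context it belongs to; unknown heads are marked "unlabeled".
    """
    samples = []
    for head, relation, tail in triples:
        samples.append({
            "features": {"head": head, "relation": relation, "tail": tail},
            "label": label_of.get(head, "unlabeled"),
        })
    return samples


# Illustrative expansion result and label assignment (assumed values).
triples = [
    ("introduce new policy", "has_detail", "relevant regulations"),
    ("formal register", "applies_to", "policy background paragraph"),
]
label_of = {
    "introduce new policy": "writing_logic",
    "formal register": "writing_context",
}

samples = triples_to_samples(triples, label_of)
# One JSON object per line, a common on-disk layout for training sets.
jsonl = "\n".join(json.dumps(s, ensure_ascii=False) for s in samples)
```

Keeping the features and the label in separate fields makes it straightforward to feed the same samples to different training back ends.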
In an embodiment of the artificial intelligence based document generating method of the present application, referring to fig. 7, the following may be further specifically included:
Step S701: taking an expansion result obtained by the knowledge graph as a verification set to carry out model evaluation on the document writing model;
Step S702: carrying out parameter adjustment on the document writing model according to the model evaluation result to obtain an updated document writing model.
Optionally, in this embodiment, the knowledge graph expansion result is used as a verification set for evaluating the trained document writing model. In the evaluation process, the model generates corresponding documents, and the generated documents are compared with the real labels in the verification set. The evaluation indexes may include the logical consistency of the generated text, grammatical correctness, consistency with associated information in the knowledge graph, and the like.
According to the result of the model evaluation, parameter adjustment of the document writing model is carried out. This includes, for example, adjusting the hyper-parameters of the model, optimizing the learning rate of the algorithm, and the like. The aim is to improve the performance of the model, so that it is more accurate and meets the expected standard when generating documents.
And obtaining an updated document writing model through parameter adjustment. This model, after evaluation by the validation set, is more adapted to generate documents that conform to logic, structure, and context. The process is an iterative optimization process, and parameters of the model are continuously adjusted to gradually optimize the model, so that the requirements of specific writing tasks are better met.
In summary, by taking the expansion result of the knowledge graph as a verification set and evaluating and adjusting the parameters of the document writing model, the performance of the model can be continuously improved, so that the model is better suited to actual application scenarios. This process may be repeated to continually optimize the document writing model.
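The evaluate-then-adjust loop of steps S701 and S702 can be sketched as a simple search over candidate parameter settings. Everything here is an assumption made for illustration: the application does not name a metric or a search strategy, `accuracy` and `select_best` are hypothetical helpers, and `toy_evaluate` merely stands in for training the model with the given parameters and scoring it on the verification set.

```python
def accuracy(predicted, expected):
    """Fraction of verification samples whose predicted label matches."""
    if not expected:
        return 0.0
    hits = sum(p == e for p, e in zip(predicted, expected))
    return hits / len(expected)


def select_best(candidates, evaluate):
    """Evaluate every candidate parameter set and keep the best one."""
    best_params, best_score = None, float("-inf")
    for params in candidates:
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_evaluate(params):
    # Placeholder score that peaks at a learning rate of 1e-3; a real
    # evaluation would train, generate documents, and score them.
    return 1.0 - abs(params["learning_rate"] - 1e-3) * 100


candidates = [{"learning_rate": lr} for lr in (1e-4, 1e-3, 1e-2)]
best_params, best_score = select_best(candidates, toy_evaluate)
```

Repeating `select_best` with a refined candidate list around the current best setting gives the iterative optimization described in the text.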
In order to effectively improve the accuracy and efficiency of document generation, the present application provides an embodiment of an artificial intelligence based document generation device for implementing all or part of the content of the artificial intelligence based document generation method, referring to fig. 8, the artificial intelligence based document generation device specifically includes the following contents:
The historical document data analysis module 10 is configured to encode historical document data by presetting a natural semantic recognition model, determine a corresponding text semantic representation, extract key information from the text semantic representation, determine a corresponding text logic structure according to the extracted key information, and determine a corresponding writing logic and a writing context according to the key information and the text logic structure;
The document composition model construction module 20 is configured to determine a relationship between entities in a preset knowledge graph according to the composition logic and the composition context, construct a knowledge graph, and perform model training on a preset pre-training model by using an expansion result obtained according to the knowledge graph as a model training set to obtain a document composition model;
The automatic document writing module 30 is configured to receive a document writing request sent by a user, and obtain a corresponding target document according to the document writing request and the document writing model.
As can be seen from the above description, the artificial intelligence-based document generation device provided by the embodiment of the present application is capable of encoding historical document data by presetting a natural semantic recognition model, determining a corresponding text semantic representation, extracting key information from the text semantic representation, determining a corresponding text logic structure according to the extracted key information, and determining a corresponding writing logic and a writing context according to the key information and the text logic structure; determining the relation between entities in a preset knowledge graph according to the writing logic and the writing context, constructing to obtain the knowledge graph, taking an expansion result obtained according to the knowledge graph as a model training set, and carrying out model training on a preset training model to obtain a document writing model; receiving a document writing request sent by a user, and obtaining a target document according to the document writing request and a document writing model, so that document generation efficiency and accuracy can be effectively improved.
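The division of the device into modules 10, 20 and 30 can be pictured as three cooperating components. The sketch below is only one possible arrangement under stated assumptions: the class name `DocumentGenerationDevice`, its method names, and the stub callables passed in are hypothetical, and the real analysis and model construction are replaced by trivial stand-ins.

```python
class DocumentGenerationDevice:
    """Illustrative composition of the three modules described above."""

    def __init__(self, analyze, build_model):
        self.analyze = analyze          # historical document data analysis module 10
        self.build_model = build_model  # document writing model construction module 20
        self.model = None

    def fit(self, historical_documents):
        # Module 10 produces writing logic/context; module 20 builds the model.
        analysis = self.analyze(historical_documents)
        self.model = self.build_model(analysis)

    def write(self, request):
        # Automatic document writing module 30: serve a writing request.
        if self.model is None:
            raise RuntimeError("call fit() before write()")
        return self.model(request)


# Stub components standing in for the real analysis and training steps.
device = DocumentGenerationDevice(
    analyze=lambda docs: {"n_docs": len(docs)},
    build_model=lambda analysis: (
        lambda request: f"[draft for: {request}] trained on {analysis['n_docs']} documents"
    ),
)
device.fit(["doc a", "doc b"])
draft = device.write("notice on new policy")
```

Passing the analysis and model-building steps in as callables keeps the sketch independent of any particular semantic recognition model or training framework.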
In order to effectively improve the accuracy and efficiency of document generation from a hardware level, the application provides an embodiment of an electronic device for implementing all or part of contents in the artificial intelligence-based document generation method, wherein the electronic device specifically comprises the following contents:
A processor, a memory, a communication interface, and a bus; the processor, the memory and the communication interface communicate with each other through the bus; the communication interface is used for realizing information transmission between the artificial intelligence-based document generating device and related equipment such as a core service system, a user terminal, a related database and the like; the electronic device may be a desktop computer, a tablet computer, a mobile terminal, etc., and the embodiment is not limited thereto. In this embodiment, the electronic device may be implemented with reference to the embodiment of the artificial intelligence-based document generating method and the embodiment of the artificial intelligence-based document generating device, the contents of which are incorporated herein and are not repeated here.
It is understood that the user terminal may include a smart phone, a tablet electronic device, a network set top box, a portable computer, a desktop computer, a Personal Digital Assistant (PDA), a vehicle-mounted device, a smart wearable device, etc. Wherein, intelligent wearing equipment can include intelligent glasses, intelligent wrist-watch, intelligent bracelet etc..
In practical applications, part of the document generation method based on artificial intelligence can be executed on the electronic device side as described above, or all operations can be completed in the client device. Specifically, the selection may be made according to the processing capability of the client device, and restrictions of the use scenario of the user. The application is not limited in this regard. If all operations are performed in the client device, the client device may further include a processor.
The client device may have a communication module (i.e. a communication unit) and may be connected to a remote server in a communication manner, so as to implement data transmission with the server. The server may include a server on the side of the task scheduling center, and in other implementations may include a server of an intermediate platform, such as a server of a third party server platform having a communication link with the task scheduling center server. The server may include a single computer device, a server cluster formed by a plurality of servers, or a server structure of a distributed device.
Fig. 9 is a schematic structural diagram of an electronic device in an embodiment of the application. As shown in fig. 9, the electronic device 9600 may include a central processor 9100 and a memory 9140; the memory 9140 is coupled to the central processor 9100. Notably, this fig. 9 is exemplary; other types of structures may also be used in addition to or in place of the structures to implement telecommunications functions or other functions.
In one embodiment, the artificial intelligence based document generation method functionality may be integrated into the central processor 9100. The central processor 9100 may be configured to perform the following control:
Step S101: encoding the historical document data through a preset natural semantic recognition model, determining corresponding text semantic representation, extracting key information from the text semantic representation, determining a corresponding text logic structure according to the extracted key information, and determining corresponding writing logic and writing context according to the key information and the text logic structure.
Step S102: determining the relation between entities in a preset knowledge graph according to the writing logic and the writing context, constructing the knowledge graph, taking an expansion result obtained according to the knowledge graph as a model training set, and carrying out model training on a preset training model to obtain a document writing model.
Step S103: receiving a document writing request sent by a user, and obtaining a corresponding target document according to the document writing request and the document writing model.
As can be seen from the above description, the electronic device provided by the embodiment of the present application encodes the historical document data by presetting a natural semantic recognition model, determines a corresponding text semantic representation, extracts key information from the text semantic representation, determines a corresponding text logic structure according to the extracted key information, and determines a corresponding writing logic and a writing context according to the key information and the text logic structure; determining the relation between entities in a preset knowledge graph according to the writing logic and the writing context, constructing to obtain the knowledge graph, taking an expansion result obtained according to the knowledge graph as a model training set, and carrying out model training on a preset training model to obtain a document writing model; receiving a document writing request sent by a user, and obtaining a target document according to the document writing request and a document writing model, so that document generation efficiency and accuracy can be effectively improved.
In another embodiment, the artificial intelligence based document generating apparatus may be configured separately from the central processor 9100, for example, the artificial intelligence based document generating apparatus may be configured as a chip connected to the central processor 9100, and the artificial intelligence based document generating method function is implemented by control of the central processor.
As shown in fig. 9, the electronic device 9600 may further include: a communication module 9110, an input unit 9120, an audio processor 9130, a display 9160, and a power supply 9170. It is noted that the electronic device 9600 need not include all of the components shown in fig. 9; in addition, the electronic device 9600 may further include components not shown in fig. 9, and reference may be made to the related art.
As shown in fig. 9, the central processor 9100, sometimes referred to as a controller or operational control, may include a microprocessor or other processor device and/or logic device, which central processor 9100 receives inputs and controls the operation of the various components of the electronic device 9600.
The memory 9140 may be, for example, one or more of a buffer, a flash memory, a hard drive, a removable medium, a volatile memory, a non-volatile memory, or other suitable device. It may store the relevant information as well as a program for processing that information. The central processor 9100 can execute the program stored in the memory 9140 to realize information storage, processing, and the like.
The input unit 9120 provides input to the central processor 9100. The input unit 9120 is, for example, a key or a touch input device. The power supply 9170 is used to provide power to the electronic device 9600. The display 9160 is used for displaying display objects such as images and characters. The display may be, for example, but not limited to, an LCD display.
The memory 9140 may be a solid state memory such as Read Only Memory (ROM), Random Access Memory (RAM), SIM card, etc. It may also be a memory which holds information even when powered down and which can be selectively erased and provided with further data, an example of which is sometimes referred to as EPROM or the like. The memory 9140 may also be some other type of device. The memory 9140 includes a buffer memory 9141 (sometimes referred to as a buffer). The memory 9140 may include an application/function storage portion 9142, the application/function storage portion 9142 storing application programs and function programs or a flow for executing operations of the electronic device 9600 by the central processor 9100.
The memory 9140 may also include a data store 9143, the data store 9143 for storing data, such as contacts, digital data, pictures, sounds, and/or any other data used by an electronic device. The driver storage portion 9144 of the memory 9140 may include various drivers of the electronic device for communication functions and/or for performing other functions of the electronic device (e.g., messaging applications, address book applications, etc.).
The communication module 9110 is a transmitter/receiver 9110 that transmits and receives signals via an antenna 9111. A communication module (transmitter/receiver) 9110 is coupled to the central processor 9100 to provide input signals and receive output signals, as in the case of conventional mobile communication terminals.
Based on different communication technologies, a plurality of communication modules 9110, such as a cellular network module, a bluetooth module, and/or a wireless local area network module, etc., may be provided in the same electronic device. The communication module (transmitter/receiver) 9110 is also coupled to a speaker 9131 and a microphone 9132 via an audio processor 9130 to provide audio output via the speaker 9131 and to receive audio input from the microphone 9132 to implement usual telecommunications functions. The audio processor 9130 can include any suitable buffers, decoders, amplifiers and so forth. In addition, the audio processor 9130 is also coupled to the central processor 9100 so that sound can be recorded locally through the microphone 9132 and sound stored locally can be played through the speaker 9131.
An embodiment of the present application further provides a computer-readable storage medium capable of implementing all steps of the artificial intelligence-based document generation method of the above embodiments, in which the execution subject is a server or a client. The computer-readable storage medium stores a computer program which, when executed by a processor, implements all steps of that method; for example, the processor implements the following steps when executing the computer program:
Step S101: encoding the historical document data through a preset natural semantic recognition model, determining corresponding text semantic representation, extracting key information from the text semantic representation, determining a corresponding text logic structure according to the extracted key information, and determining corresponding writing logic and writing context according to the key information and the text logic structure.
Step S102: determining the relation between entities in a preset knowledge graph according to the writing logic and the writing context, constructing the knowledge graph, taking an expansion result obtained according to the knowledge graph as a model training set, and carrying out model training on a preset training model to obtain a document writing model.
Step S103: receiving a document writing request sent by a user, and obtaining a corresponding target document according to the document writing request and the document writing model.
As can be seen from the above description, the computer readable storage medium provided by the embodiment of the present application encodes the historical document data by presetting a natural semantic recognition model, determines a corresponding text semantic representation, extracts key information from the text semantic representation, determines a corresponding text logic structure according to the extracted key information, and determines a corresponding writing logic and a writing context according to the key information and the text logic structure; determining the relation between entities in a preset knowledge graph according to the writing logic and the writing context, constructing to obtain the knowledge graph, taking an expansion result obtained according to the knowledge graph as a model training set, and carrying out model training on a preset training model to obtain a document writing model; receiving a document writing request sent by a user, and obtaining a target document according to the document writing request and a document writing model, so that document generation efficiency and accuracy can be effectively improved.
Embodiments of the present application further provide a computer program product capable of implementing all the steps of the artificial intelligence-based document generating method of the above embodiments, in which the execution subject is a server or a client. The computer program product comprises a computer program/instructions which, when executed by a processor, implement the steps of that method; for example, the computer program/instructions implement the following steps:
Step S101: encoding the historical document data through a preset natural semantic recognition model, determining corresponding text semantic representation, extracting key information from the text semantic representation, determining a corresponding text logic structure according to the extracted key information, and determining corresponding writing logic and writing context according to the key information and the text logic structure.
Step S102: determining the relation between entities in a preset knowledge graph according to the writing logic and the writing context, constructing the knowledge graph, taking an expansion result obtained according to the knowledge graph as a model training set, and carrying out model training on a preset training model to obtain a document writing model.
Step S103: receiving a document writing request sent by a user, and obtaining a corresponding target document according to the document writing request and the document writing model.
As can be seen from the above description, the computer program product provided by the embodiment of the present application encodes the historical document data by presetting a natural semantic recognition model, determines a corresponding text semantic representation, extracts key information from the text semantic representation, determines a corresponding text logic structure according to the extracted key information, and determines a corresponding writing logic and a writing context according to the key information and the text logic structure; determining the relation between entities in a preset knowledge graph according to the writing logic and the writing context, constructing to obtain the knowledge graph, taking an expansion result obtained according to the knowledge graph as a model training set, and carrying out model training on a preset training model to obtain a document writing model; receiving a document writing request sent by a user, and obtaining a target document according to the document writing request and a document writing model, so that document generation efficiency and accuracy can be effectively improved.
It will be apparent to those skilled in the art that embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (devices), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The principles and embodiments of the present invention have been described in detail with reference to specific examples, which are provided only to facilitate understanding of the method and core ideas of the present invention. Meanwhile, those skilled in the art may, in accordance with the ideas of the present invention, make changes to the specific embodiments and the scope of application. In view of the above, this description should not be construed as limiting the present invention.

Claims (8)

1. An artificial intelligence based document generation method, the method comprising:
encoding historical document data through a preset natural semantic recognition model, determining corresponding text semantic representation, extracting key information from the text semantic representation, determining a corresponding text logic structure according to the extracted key information, and determining corresponding writing logic and writing context according to the key information and the text logic structure;
determining the relation between entities in a preset knowledge graph according to the writing logic and the writing context, constructing to obtain the knowledge graph, taking an expansion result obtained according to the knowledge graph as a model training set, and carrying out model training on a preset training model to obtain a document writing model;
and receiving a document writing request sent by a user, and obtaining a corresponding target document according to the document writing request and the document writing model.
2. The artificial intelligence based document generation method according to claim 1, wherein, before the encoding of the historical document data through the preset natural semantic recognition model to determine the corresponding text semantic representation, the method further comprises:
removing noise and characters from the historical document data, and performing text annotation of events, entities, and relationships between entities on the historical document data from which the noise and characters have been removed;
and converting the historical document data subjected to the text annotation into vector representation.
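The preprocessing steps of claim 2 can be illustrated with a minimal Python sketch. It is only one possible reading and not the applicant's implementation: the function names `clean_text`, `annotate`, and `vectorize`, the lexicon-based annotation, and the bag-of-words vector are all assumptions chosen to keep the example self-contained.

```python
import re
from collections import Counter


def clean_text(raw: str) -> str:
    """Remove markup-like tags and control characters, then collapse whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", raw)
    no_ctrl = re.sub(r"[\x00-\x1f\x7f]", " ", no_tags)
    return re.sub(r"\s+", " ", no_ctrl).strip()


def annotate(text: str, lexicon: dict[str, str]) -> list[tuple[str, str, int]]:
    """Return (surface form, label, start offset) for each lexicon match.

    The lexicon is a hypothetical stand-in for a real annotation step.
    """
    spans = []
    for surface, label in lexicon.items():
        for match in re.finditer(re.escape(surface), text):
            spans.append((surface, label, match.start()))
    return sorted(spans, key=lambda span: span[2])


def vectorize(text: str, vocabulary: list[str]) -> list[int]:
    """Convert text to a bag-of-words count vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


cleaned = clean_text("<p>Acme  Corp\tsigned\x00 a contract</p>")
annotations = annotate(cleaned, {"Acme Corp": "ENTITY", "signed": "EVENT"})
vector = vectorize(cleaned, ["acme", "contract", "loan"])
```

A production system would more plausibly use a trained annotator and learned embeddings; the count vector here only shows where the vector conversion sits in the sequence.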
3. The artificial intelligence based document generation method of claim 1, wherein the extracting key information from the text semantic representation and determining a corresponding text logic structure according to the extracted key information comprises:
extracting at least one of an event, an entity and a relationship between entities from the text semantic representation;
and determining a corresponding text logic structure according to the text structure corresponding to the text semantic representation and at least one of the extracted events, entities, and relationships between entities.
4. The artificial intelligence based document generation method of claim 1, wherein the determining corresponding writing logic and writing context according to the key information and the text logic structure comprises:
determining a corresponding writing purpose according to the key information, and determining a corresponding text paragraph structure according to the text logic structure;
and determining corresponding writing logic and writing context according to the text paragraph structure and the writing style corresponding to the writing purpose.
5. The artificial intelligence based document generation method according to claim 1, wherein the determining the relationship between the entities in the preset knowledge graph according to the writing logic and the writing context and constructing the knowledge graph comprises:
setting the writing logic and the writing context as entities in a preset knowledge graph and determining the relationship between the entities;
and performing entity expansion according to entity information of the entities, and performing relationship expansion according to relationship paths of the relationships between the entities, to obtain the knowledge graph after the entity expansion and the relationship expansion.
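The entity expansion and relationship expansion of claim 5 can be sketched as follows. This is an illustrative assumption, not the claimed implementation: the `KnowledgeGraph` class, the triple representation, the `logic:` and `context:` naming of entities, and the rule that every two-hop path yields a derived relation are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    # Each triple is (head entity, relation, tail entity).
    triples: set[tuple[str, str, str]] = field(default_factory=set)

    def add(self, head: str, relation: str, tail: str) -> None:
        self.triples.add((head, relation, tail))

    def expand_entities(self, entity_info: dict[str, dict[str, str]]) -> None:
        """Entity expansion: attach each known attribute as a new triple."""
        for entity, attributes in entity_info.items():
            for attribute, value in attributes.items():
                self.add(entity, attribute, value)

    def expand_relations(self) -> None:
        """Relationship expansion: derive one edge per two-hop relation path."""
        derived = set()
        for head, first, middle in self.triples:
            for start, second, tail in self.triples:
                if middle == start and head != tail:
                    derived.add((head, f"{first}/{second}", tail))
        self.triples |= derived


kg = KnowledgeGraph()
# Writing logic and writing context are set as entities, with a relation between them.
kg.add("logic:problem-solution", "used_in", "context:credit-report")
kg.expand_entities({"context:credit-report": {"style": "formal"}})
kg.expand_relations()
```

After the calls above, the graph holds the original relation, the attribute added by entity expansion, and one derived relation that follows the path from the writing logic through the writing context to its style.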
6. The artificial intelligence based document generation method according to claim 1, wherein the taking of the expansion result obtained according to the knowledge graph as a model training set and carrying out model training on the preset training model to obtain a document writing model comprises:
converting the format of the expansion result obtained from the knowledge graph according to the data format of the model training set, and setting a corresponding label for each data sample to obtain the model training set, wherein the label indicates the writing logic or the writing context to which the data sample belongs;
and inputting the model training set into a preset training model to perform model training to obtain a document writing model.
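The format conversion and labelling of claim 6 might look like the sketch below. The claim does not specify a data format, so the JSON Lines layout, the `text` and `label` field names, and the convention that a sample is labelled by its head entity (a `logic:` or `context:` entity) are assumptions made for illustration.

```python
import json


def to_training_set(triples: set[tuple[str, str, str]]) -> list[dict[str, str]]:
    """Turn each expanded triple into a labelled sample.

    The label is the head entity, assumed here to name the writing logic
    or writing context that the sample belongs to.
    """
    samples = []
    for head, relation, tail in sorted(triples):
        samples.append({"text": f"{head} {relation} {tail}", "label": head})
    return samples


def to_jsonl(samples: list[dict[str, str]]) -> str:
    """Serialize samples as JSON Lines, one sample per line."""
    return "\n".join(json.dumps(sample, ensure_ascii=False) for sample in samples)


expanded = {
    ("logic:problem-solution", "used_in", "context:credit-report"),
    ("context:credit-report", "style", "formal"),
}
training_set = to_training_set(expanded)
serialized = to_jsonl(training_set)
```

Sorting the triples makes the output deterministic, which is convenient when the same expansion result is later reused as a verification set.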
7. The artificial intelligence based document generation method according to claim 1, wherein, after the model training is carried out on the preset training model to obtain the document writing model, the method further comprises:
performing model evaluation on the document writing model, using an expansion result obtained from the knowledge graph as a verification set;
and carrying out parameter adjustment on the document writing model according to the model evaluation result to obtain an updated document writing model.
8. An artificial intelligence based document generation device, the device comprising:
a historical document data analysis module, configured to encode historical document data through a preset natural semantic recognition model, determine a corresponding text semantic representation, extract key information from the text semantic representation, determine a corresponding text logic structure according to the extracted key information, and determine corresponding writing logic and writing context according to the key information and the text logic structure;
a document writing model construction module, configured to determine the relationships between entities in a preset knowledge graph according to the writing logic and the writing context, construct the knowledge graph, take an expansion result obtained according to the knowledge graph as a model training set, and carry out model training on a preset training model to obtain a document writing model; and
an automatic document writing module, configured to receive a document writing request sent by a user and obtain a corresponding target document according to the document writing request and the document writing model.
CN202410398569.1A 2024-04-03 2024-04-03 Document generation method and device based on artificial intelligence Pending CN117992601A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410398569.1A CN117992601A (en) 2024-04-03 2024-04-03 Document generation method and device based on artificial intelligence

Publications (1)

Publication Number Publication Date
CN117992601A true CN117992601A (en) 2024-05-07

Family

ID=90895672

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410398569.1A Pending CN117992601A (en) 2024-04-03 2024-04-03 Document generation method and device based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN117992601A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200218988A1 (en) * 2019-01-08 2020-07-09 International Business Machines Corporation Generating free text representing semantic relationships between linked entities in a knowledge graph
CN111832275A (en) * 2020-09-21 2020-10-27 北京百度网讯科技有限公司 Text creation method, device, equipment and storage medium
CN113626614A (en) * 2021-08-19 2021-11-09 车智互联(北京)科技有限公司 Method, device, equipment and storage medium for constructing information text generation model
CN113919336A (en) * 2021-10-20 2022-01-11 平安科技(深圳)有限公司 Article generation method and device based on deep learning and related equipment
CN117521813A (en) * 2023-11-20 2024-02-06 中诚华隆计算机技术有限公司 Scenario generation method, device, equipment and chip based on knowledge graph

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
RIK KONCEL-KEDZIORSKI: "text generation knowledge graph entity model paragraph semantic context", 《COMPUTATION AND LANGUAGE》, 22 March 2022 (2022-03-22), pages 1 - 10 *
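FENG CHEN (冯晨): "Research and Implementation of Automatic Text Generation Technology Based on Knowledge Graphs" (基于知识图谱的文本自动生成技术的研究与实现), 《China Master's Theses Full-text Database, Information Science and Technology》 (中国优秀硕士学位论文全文数据库 信息科技), 15 July 2020 (2020-07-15), pages 138 - 1606 *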

Similar Documents

Publication Publication Date Title
CN111241237B (en) Intelligent question-answer data processing method and device based on operation and maintenance service
CN111708869B (en) Processing method and device for man-machine conversation
US11762926B2 (en) Recommending web API's and associated endpoints
CN111417949A (en) Content-based transformation of digital documents
CN112948534A (en) Interaction method and system for intelligent man-machine conversation and electronic equipment
KR102294364B1 (en) System for automatically converting document based on artificial intelligence and method thereof
CN111767394A (en) Abstract extraction method and device based on artificial intelligence expert system
CN111798118B (en) Enterprise operation risk monitoring method and device
CN111553138B (en) Auxiliary writing method and device for standardizing content structure document
CN112463942A (en) Text processing method and device, electronic equipment and computer readable storage medium
CN113342948A (en) Intelligent question and answer method and device
CN115115984A (en) Video data processing method, apparatus, program product, computer device, and medium
CN114911893A (en) Method and system for automatically constructing knowledge base based on knowledge graph
CN113220951B (en) Medical clinic support method and system based on intelligent content
CN112582073B (en) Medical information acquisition method, device, electronic equipment and medium
CN114282498A (en) Data knowledge processing system applied to electric power transaction
CN112199954B (en) Disease entity matching method and device based on voice semantics and computer equipment
CN113672699A (en) Knowledge graph-based NL2SQL generation method
CN112925895A (en) Natural language software operation and maintenance method and device
CN117473054A (en) Knowledge graph-based general intelligent question-answering method and device
CN117992601A (en) Document generation method and device based on artificial intelligence
CN114842982A (en) Knowledge expression method, device and system for medical information system
CN113887244A (en) Text processing method and device
CN113158635A (en) Electronic report generation method and device
CN107220249A (en) Full-text search based on classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination