CN107622042A - Document generation method and device, storage medium and electronic equipment - Google Patents


Info

Publication number
CN107622042A
Authority
CN
China
Prior art keywords
document
sentence
knowledge
nodes
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710758045.9A
Other languages
Chinese (zh)
Other versions
CN107622042B (en)
Inventor
师玉娇
李宝善
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Iflytek Shanghai Mdt Infotech Ltd
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by iFlytek Co Ltd
Priority to CN201710758045.9A
Publication of CN107622042A
Application granted
Publication of CN107622042B
Legal status: Active
Anticipated expiration


Landscapes

  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a document generation method, a document generation device, a storage medium and electronic equipment. The method comprises the following steps: performing information extraction on the related data of a document to be generated to obtain the content information needed for document generation; performing knowledge representation on the content information; and automatically generating the document based on the knowledge representation of the content information. The method and the device address the problems of the prior art, such as being time-consuming, labor-intensive and inefficient, that arise because documents whose content structure is not fixed can only be generated manually.

Description

Document generation method, device, storage medium and electronic equipment
Technical field
The present invention relates to the field of document generation, and more particularly to a document generation method, device, storage medium and electronic equipment.
Background
Natural language generation (NLG) is a very active field within artificial intelligence (AI) and is widely applied across industries, for example in the generation of military documents and of judicial documents. Taking judicial applications as an example, judicial case handling requires numerous documents. Document processing is an important part of the case-handling process, and its efficiency directly affects the efficiency of case handling.
Existing judicial document generation mainly uses the following two methods:
1. Manual document generation: staff in the field the document concerns manually produce the required document, relying on professional knowledge, experience and the related data of the document to be generated.
2. Template-based document generation: several templates are first constructed in advance for the situations expected to arise, each template containing some constants and some variables. After the user enters certain information, a text generator embeds that information into the template as character strings in place of the variables, generating the text.
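The template-based approach described in item 2 can be sketched in a few lines. This is only an illustration using Python's standard `string.Template`; the template text and the field names (`name`, `birth_date`, `charge`) are hypothetical and do not come from the patent.

```python
from string import Template

# Hypothetical template: the literal text plays the role of the constants,
# and the $-prefixed names play the role of the variables.
JUDGMENT_OPENING = Template(
    "Defendant $name, born $birth_date, is charged with $charge."
)


def fill_template(template: Template, fields: dict) -> str:
    """Embed user-supplied strings into the template in place of its variables."""
    return template.substitute(fields)


text = fill_template(
    JUDGMENT_OPENING,
    {"name": "Zhang San", "birth_date": "1 January 1980", "charge": "theft"},
)
print(text)
```

A template of this kind only works when every variable is known in advance and the surrounding text never changes, which is the limitation the following paragraph describes.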
However, the above existing schemes have the following deficiencies. Manual document generation is time-consuming and labor-intensive and affects overall case-handling efficiency; with the number of cases steadily increasing, document generation places a very heavy workload on judicial staff. Template-based document generation reduces the workload of judicial staff to some extent compared with the purely manual method, but it can only handle the fixed-structure parts of a document, so its generality is poor.
Summary of the invention
To overcome the above deficiencies of the prior art, the purpose of the present invention is to provide a document generation method, device, storage medium and electronic equipment, so as to solve the problems, such as being time-consuming, labor-intensive and inefficient, that arise because the prior art can only use manual methods to generate documents whose content structure is not fixed.
To achieve the above purpose, the technical scheme provided by the present invention is as follows:
A document generation method, comprising the following steps:
Step 1: performing information extraction on the related data of the document to be generated, obtaining the content information needed for document generation;
Step 2: performing knowledge representation on the content information;
Step 3: automatically generating the document based on the knowledge representation of the content information.
Optionally, step 2 further comprises:
determining a knowledge representation structure for the content information;
performing grammatical and component analysis on the specific content corresponding to each node in the knowledge representation structure and filling each node of the knowledge representation structure, obtaining a preliminary knowledge representation structure;
aggregating and reorganizing the nodes in the preliminary knowledge representation structure, obtaining the knowledge representation of the content information.
Optionally, step 1 further comprises:
annotating each annotation unit in the related data of the document to be generated;
extracting, according to the type of the document to be generated, the information needed for document generation from the annotated data.
Optionally, step 1 further includes: performing similarity analysis on the extracted content and rejecting content information with low similarity.
Optionally, the step of aggregating and reorganizing the nodes in the preliminary knowledge representation further comprises:
choosing one document as the reference document, and selecting in turn each last-level node of the reference document's preliminary knowledge representation structure;
traversing the last-level nodes of the preliminary knowledge representation structures of the other documents, and comparing the semantic similarity of the sentences corresponding to the two nodes;
aggregating and reorganizing the two nodes according to the comparison result.
Optionally, the step of aggregating and reorganizing the two nodes according to the comparison result is:
if the comparison result is that the sentences corresponding to the two nodes are semantically related, merging, according to grammatical rules, the original node in the reference document's knowledge representation structure with the compared node of the other document;
if the comparison result is that the sentences corresponding to the two nodes are semantically identical, deleting the semantically identical node in the other document;
if the comparison result is that the sentences corresponding to the two nodes are semantically unrelated, adding the node of the other document, along with its superior nodes, to the knowledge representation structure of the reference document.
Optionally, the grammatical rules apply the corresponding merge processing to the two nodes according to whether the predicates of the sentences corresponding to the two nodes are identical.
Optionally, the step of applying the corresponding merge processing to the two nodes according to whether the predicates of the corresponding sentences are identical specifically includes:
if the predicates of the sentences corresponding to the two nodes are identical, determining whether the subjects of the two sentences are consistent;
if the subjects of the two sentences are consistent, merging them into one sentence; if the subjects of the two sentences are inconsistent but the other parts are identical, merging the two sentences into one sentence with a compound component.
Optionally, the step of applying the corresponding merge processing to the two nodes according to whether the predicates of the corresponding sentences are identical specifically includes:
if the predicates of the sentences corresponding to the two nodes differ, determining whether the subjects of the two sentences are identical;
if the subjects of the two sentences are identical, omitting the subject of the sentence corresponding to the other document's node and merging.
Optionally, if the subjects of the two sentences are identical, determining whether the adverbials of the two sentences are identical;
if the adverbials of the two sentences are also identical, omitting the adverbial of the sentence corresponding to the other document's node and merging.
Optionally, when the predicates of the sentences corresponding to the two nodes differ, if the agent of the sentence corresponding to the reference document's node is identical to a modifier component of the sentence corresponding to the other document's node, the corresponding component of the latter sentence is replaced with a referring word.
The present invention also provides a document generating device, including:
an information extracting unit, for performing information extraction on the related data of the document to be generated, obtaining the content information needed for document generation;
a knowledge representation unit, for performing knowledge representation on the content information;
a document generation unit, for automatically generating the document based on the knowledge representation of the content information.
Optionally, the knowledge representation unit further comprises:
a knowledge representation structure determining unit, for determining a knowledge representation structure for the content information;
an analysis unit, for performing grammatical and component analysis on the specific content corresponding to each node in the knowledge representation structure and filling each node of the knowledge representation structure, obtaining a preliminary knowledge representation structure;
an aggregation and reorganization unit, for aggregating and reorganizing the nodes in the preliminary knowledge representation structure, obtaining the knowledge representation of the content information.
Optionally, the aggregation and reorganization unit includes:
a node selection unit, for choosing one document as the reference document and selecting in turn each last-level node of the reference document's preliminary knowledge representation structure;
a traversal and comparison unit, for traversing the last-level nodes of the preliminary knowledge representations of the other documents and comparing the semantic similarity of the sentences corresponding to the two nodes;
a comparison result processing unit, for performing aggregation and reorganization according to the comparison result.
The present invention also provides a storage medium in which a plurality of instructions are stored; the instructions are loaded by a processor to perform the steps of the method described above.
The present invention also provides electronic equipment, the electronic equipment including:
a storage medium storing a plurality of instructions, the instructions being loaded by a processor to perform the steps of the method described above; and
a processor, for executing the instructions in the storage medium.
The document generation method, device, storage medium and electronic equipment of the present invention extract the content information needed for document generation from the related data of the document to be generated, perform knowledge representation on the content information, and automatically generate the document. This achieves automatic generation of documents whose content structure is not fixed and solves the problems, such as being time-consuming, labor-intensive and inefficient, that arise because the prior art can only generate such documents manually.
Brief description of the drawings
Fig. 1 is a flowchart of the steps of one embodiment of the document generation method of the present invention;
Fig. 2 is a detailed flowchart of step 102 in a specific embodiment of the present invention;
Fig. 3 is a diagram of the knowledge representation structure of each document (each extracted content information item) in a specific embodiment of the present invention;
Fig. 4 is a detailed flowchart of step S3 in a specific embodiment of the present invention;
Fig. 5 is a structural diagram of one embodiment of the document generating device of the present invention;
Fig. 6 is a detailed structural diagram of the knowledge representation unit in a specific embodiment of the present invention;
Fig. 7 is a structural diagram of the electronic equipment used for the document generation method of the present invention.
Detailed description
To explain the embodiments of the present invention and the technical schemes of the prior art more clearly, specific embodiments of the present invention are described below with reference to the drawings. Obviously, the drawings in the following description show only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings, and other embodiments, from these drawings without creative effort.
To keep the figures simple, each figure shows only the parts relevant to the present invention, and only schematically; the figures do not represent the actual structure of a product. In addition, to keep the figures simple and easy to understand, where several parts in a figure have the same structure or function, only one of them is depicted or labeled. Herein, "one" does not only mean "only this one"; it can also mean "more than one".
In one embodiment of the present invention, as shown in Fig. 1, a document generation method of the present invention comprises the following steps:
Step 101: perform information extraction on the related data of the document to be generated, obtaining the content information needed for document generation;
To generate the document, the related data of the document to be generated must first be obtained, so this data needs to be acquired before information extraction is performed. The related data is typically collected by electronic equipment. As one example, the related data can be captured by the camera of a smart device and then converted by OCR (Optical Character Recognition), where the smart device can be a mobile phone, a personal computer, a tablet computer and so on. The related data can of course also be entered manually through a smart device; the present invention does not limit this.
In the disclosed scheme, the document to be generated can be a document whose content and structure are not fixed, such as a judicial document, a statistical document, an economic document or a military document. Taking the generation of judicial documents as an example, the document to be generated can be a case description, a defendant's criminal history, a trial process and the like, none of which has fixed content or structure. The related data of the document to be generated refers to the data used to generate the document; for a judicial document it may include hearing records, evidentiary material, court trial records, complaints, appeals and so on.
It should be noted here that the present invention mainly aims to solve the generation of documents whose content structure is not fixed. In practice a document may consist of parts with several content-structure types, for example parts whose content and structure are both fixed, or parts whose structure is fixed but whose content is not. The generation of those parts can be realized using the prior art, and the present invention does not limit this.
As one example, to obtain the content information needed for document generation, in this step the related data is annotated after it has been obtained. Specifically, each annotation unit in the related data, such as a sentence, a sentence group or a paragraph, is annotated, and the content information needed for document generation is then extracted from the annotated data according to the type of the document to be generated.
Taking a judicial document as an example, suppose the related data of the document to be generated includes parts such as plaintiff and defendant information, cause of action, claims, and facts and reasons; each part is then annotated separately. The annotation can be done manually by staff, or automatically through semantic understanding based on a neural network; the present invention does not limit this.
Because different document types require different content information, after the document-related data has been annotated the content information needed for document generation must still be extracted from the annotated data according to the type of the document to be generated. For example, if the document to be generated is a criminal judgment, the content a criminal judgment requires is extracted from the annotated related data. Specifically, if it is known from experience that a criminal judgment needs to contain defendant information, case details, grounds of prosecution, the fact-finding process, the verdict and so on, corresponding labels can be set in advance, and the content information the document needs is extracted from the annotated document-related data according to those labels.
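The label-driven extraction just described can be sketched as a simple filter. The label names, the `REQUIRED_LABELS` table and the `(label, text)` representation of an annotated unit are assumptions made for illustration; the patent does not specify a data format.

```python
# Hypothetical mapping from document type to the labels that type requires.
REQUIRED_LABELS = {
    "criminal_judgment": [
        "defendant_info",
        "case_details",
        "prosecution_grounds",
        "fact_finding",
        "verdict",
    ],
}


def extract_content(annotated_units, document_type):
    """Keep only the annotated units whose label the document type needs.

    annotated_units is a list of (label, text) pairs produced by the
    annotation step, whether manual or automatic.
    """
    wanted = set(REQUIRED_LABELS[document_type])
    return [(label, text) for label, text in annotated_units if label in wanted]


annotated = [
    ("defendant_info", "Zhang San, male, born 1980."),
    ("claims", "The plaintiff requests compensation."),
    ("case_details", "Zhang San stole one pearl necklace."),
]
extracted = extract_content(annotated, "criminal_judgment")
print(extracted)
```

Here the unit labeled `claims` is dropped because the hypothetical criminal-judgment label set does not include it.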
As another example, to improve the accuracy of the extracted information, and considering that annotations obtained by existing semantic understanding (for example, annotations produced by a neural network) may contain errors, similarity analysis can additionally be performed on the extracted content and information with low similarity rejected. In general, low similarity indicates that the content is unrelated to the content the document to be generated needs, so it should be rejected to improve accuracy. Specifically, the present invention can use LSA (Latent Semantic Analysis) to perform similarity analysis on the extracted content and reject information with low similarity (that is, information unrelated to the required content). The present invention is of course not limited to this, and details are not repeated here.
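The rejection step can be sketched as follows. Note the substitution: the patent names LSA, whereas this sketch uses a plain bag-of-words cosine similarity as a dependency-free stand-in, and it compares each item against a hypothetical reference description of the required content. The threshold value is likewise an assumption.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts; a crude stand-in for an LSA vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def reject_dissimilar(items, reference, threshold=0.2):
    """Drop extracted items whose similarity to the reference is below threshold."""
    ref = bag_of_words(reference)
    return [t for t in items if cosine(bag_of_words(t), ref) >= threshold]


reference = "defendant stole necklace theft"  # hypothetical description of needed content
items = [
    "The defendant stole a pearl necklace",
    "The weather was sunny that day",
]
kept = reject_dissimilar(items, reference)
print(kept)
```

A real LSA implementation would first project the term vectors into a lower-dimensional latent space before taking the cosine; the filtering logic around it would stay the same.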
Step 102: perform knowledge representation on the content information;
Fig. 2 is a detailed flowchart of step 102 in a specific embodiment of the present invention. As shown in Fig. 2, step 102 specifically includes:
Step S1: determine a knowledge representation structure for the content information;
As one example, in step S1 the knowledge representation structure is determined for the extracted content information with reference to the Schema pattern. The Schema pattern describes the rules of text structure mainly by means of rhetorical predicates and certain operators. The knowledge representation structure includes: the Root node, which represents one document and has n Schema child nodes; Schema child nodes, each of which represents one paragraph and whose child nodes are sentence group nodes or sentence nodes; sentence group nodes, each of which contains n sentence nodes; and sentence nodes (Predicate). The sentence node is thus the last-level, smallest unit node of the knowledge representation structure. The sentence node Predicate is the basic unit of a document; each BSR semantic component of a sentence is represented by an Argument, and if an Argument has a modifier component, that component is represented by Modify, as explained further below.
Fig. 3 shows the knowledge representation structure of each document (each extracted content information item) in a specific embodiment of the present invention. Root is the root node and represents one document; below it are n child nodes Schema_i (i = [1, n]), each of which represents one paragraph. The child nodes below Schema_i can be sentence group nodes Schema_i.j (j = [1, m]) or sentence nodes (for example Predicate_1 to Predicate_h, representing h sentences). The child nodes below a sentence group node Schema_i.j are individual sentence nodes (Predicate_i.j.1 to Predicate_i.j.g, representing g sentences; that is, the sentence group node Schema_i.j contains g sentences).
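The tree of Fig. 3 can be modeled with a few small classes. This is a minimal sketch: the class names follow the node names in the text (Root, Schema, sentence group, Predicate), but the field names and the `sentence_nodes` helper are assumptions introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Predicate:
    """Sentence node: the last-level, smallest unit of the structure."""
    text: str


@dataclass
class SentenceGroup:
    """Sentence group node (Schema_i.j): holds several sentence nodes."""
    sentences: List[Predicate] = field(default_factory=list)


@dataclass
class Schema:
    """Paragraph node (Schema_i): children are sentence groups or sentences."""
    children: List[Union[SentenceGroup, Predicate]] = field(default_factory=list)


@dataclass
class Root:
    """Root node: one document made of paragraph nodes."""
    paragraphs: List[Schema] = field(default_factory=list)


def sentence_nodes(root: Root) -> List[Predicate]:
    """Collect every last-level sentence node in document order."""
    leaves: List[Predicate] = []
    for schema in root.paragraphs:
        for child in schema.children:
            if isinstance(child, SentenceGroup):
                leaves.extend(child.sentences)
            else:
                leaves.append(child)
    return leaves


doc = Root([Schema([SentenceGroup([Predicate("s1"), Predicate("s2")]), Predicate("s3")])])
print([p.text for p in sentence_nodes(doc)])
```

Collecting the last-level nodes this way is what the later aggregation step needs, since it compares documents sentence node by sentence node.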
Step S2: perform grammatical and component analysis on the specific content corresponding to each node in the knowledge representation structure, and fill each node of the knowledge representation structure according to the analysis result, obtaining the preliminary knowledge representation structure;
As described in step S1, the knowledge representation structure in this embodiment uses the sentence as its smallest unit node, so the grammatical and component analysis of the content corresponding to each node is an analysis of sentences.
In this embodiment, grammatical analysis covers sentences and the phrases and words within them. For sentences, the sentence pattern, tense and mood are analyzed according to pre-established analysis rules for sentence-pattern structure, temporal features and mood types. In general, sentence-pattern structures include subject-predicate sentences, non-subject-predicate sentences and special sentence patterns; the special patterns include "you" (have) sentences, "bei" (passive) sentences, serial-verb sentences, pivotal sentences and so on, and subject-predicate sentences are further divided into active and passive. Temporal features include the past, present and future tenses. Mood types include declarative, interrogative, imperative and exclamatory sentences. For phrases, the function, structure and fixed type are analyzed according to three pre-established feature-analysis rules: by function, phrases can be divided into nominal, verbal and adjectival phrases; by structure, into subject-predicate, verb-object, coordinate and modifier-head types; fixed types include symmetrical structures, four-character idioms and so on. For words, pre-established word-analysis rules divide words into content words and function words and record information such as whether a verb is transitive or intransitive and whether a noun is singular or plural. The analysis itself can use existing grammatical analysis methods such as systemic functional grammar (SFG) or functional unification grammar (FUG); the present invention does not limit this.
Sentence component analysis means analyzing the function or role of each sentence component from the relational meaning of the syntactic structure. In this embodiment, the components of a sentence mainly comprise BSR semantic components and modifier components; a BSR semantic component is represented by Argument, and a modifier of an Argument is marked with Modify. For example, for the sentence "Zhang San stole one pearl necklace", the predicate is "stole", Argument 1 is "Zhang San", Argument 2 is "necklace", Modify 1 is "pearl" and Modify 2 is "one". Sentence component analysis can be realized with existing semantic understanding methods or by manual annotation; the present invention does not elaborate on this.
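The filled sentence node from the example above can be written down as data. The sketch below is one possible in-memory form; the class and field names are hypothetical, and attaching both modifiers to Argument 2 is an interpretation of the example, since "pearl" and "one" both describe the necklace.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Argument:
    """A BSR semantic component, with any Modify components attached."""
    text: str
    modifiers: List[str] = field(default_factory=list)


@dataclass
class PredicateNode:
    """A filled sentence node: the predicate plus its Argument components."""
    predicate: str
    arguments: List[Argument] = field(default_factory=list)


# "Zhang San stole one pearl necklace"
node = PredicateNode(
    predicate="stole",
    arguments=[
        Argument("Zhang San"),                   # Argument 1
        Argument("necklace", ["pearl", "one"]),  # Argument 2 with Modify 1 and Modify 2
    ],
)
print(node)
```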
Step S3: aggregate and reorganize the nodes in the preliminary knowledge representation structure, obtaining the knowledge representation of the content information. That is, after aggregation and reorganization, the paragraphs that each content information item contributes to the document to be generated, and the sentences contained in each paragraph, are obtained.
Specifically, as shown in Fig. 4, step S3 further comprises the following steps:
Step S31: choose one document as the reference document, and select in turn each last-level node (that is, each sentence node) of the reference document's preliminary knowledge representation structure;
Step S32: traverse the last-level nodes of the preliminary knowledge representation structures of the other documents, and compare the semantic similarity of the sentences corresponding to the two nodes; specifically, the comparison can be made by calculating the similarity of the semantic vectors of the two sentences;
Step S33: aggregate and reorganize the two nodes according to the comparison result. Specifically, when the comparison result is that the sentences corresponding to the two nodes are semantically related, for example when the semantic vector similarity is greater than a preset threshold, the original node in the reference document's knowledge representation is merged, according to grammatical rules, with the compared node of the other document. When the sentences corresponding to the two nodes are semantically identical, the semantically identical node in the other document is deleted. When the two nodes are semantically unrelated, for example when the semantic vector similarity is less than the preset threshold, the node of the other document is added, along with its superior nodes, to the knowledge representation structure of the reference document. For example, if the two nodes correspond to reference document 1 - paragraph 1 - sentence group 1 - sentence 1 and other document 2 - paragraph 2 - sentence group 2 - sentence 2, and sentence 1 and sentence 2 are semantically unrelated, then other document 2 - paragraph 2 - sentence group 2 - sentence 2 is added to the knowledge representation structure of reference document 1.
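The comparison in step S32 and the three-way decision in step S33 can be sketched as below. The patent mentions a preset threshold for "related" versus "unrelated" but does not say how "identical" is detected, so the second threshold (`same`) and both numeric values are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two semantic vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def decide(similarity, related=0.5, same=0.95):
    """Map a similarity score to the action taken on the other document's node.

    'delete': the sentences are treated as semantically identical
    'merge' : the sentences are related, so merge them by grammatical rules
    'add'   : the sentences are unrelated, so add the node to the reference tree
    Both thresholds are hypothetical values chosen for this sketch.
    """
    if similarity >= same:
        return "delete"
    if similarity > related:
        return "merge"
    return "add"


print(decide(cosine([1.0, 0.0], [1.0, 0.0])))  # same direction
print(decide(cosine([1.0, 1.0], [1.0, 0.0])))  # partly overlapping
print(decide(cosine([1.0, 0.0], [0.0, 1.0])))  # orthogonal
```

How the semantic vectors themselves are produced is outside this sketch; any sentence-embedding method that yields fixed-length vectors would fit.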
Specifically, when the sentences corresponding to the two nodes are semantically related, the original node in the reference document's knowledge representation must be merged, according to grammatical rules, with the compared node of the other document. Because the knowledge representation structure of the present invention is obtained from the Schema pattern, and the Schema pattern describes the rules of text structure by means of rhetorical predicates and certain operators, the grammatical rules here mainly apply the corresponding merge processing according to whether the predicates of the sentences corresponding to the two nodes are identical. The specific steps are as follows:
If the predicates of the sentences corresponding to the two nodes are identical, determine whether the subjects of the two sentences are consistent. If the subjects are consistent, the two sentences are merged into one sentence; for example, "Zhang San stole one pearl necklace" and "Zhang San stole one mobile phone" can be merged into "Zhang San stole one pearl necklace and one mobile phone". If the sentences also contain other components, the repeated parts are omitted; for example, "Zhang San stole one pearl necklace of Ms. Wang's" and "Zhang San stole one mobile phone of Ms. Wang's" become, with the repeated part omitted, "Zhang San stole one pearl necklace and one mobile phone of Ms. Wang's". If the subjects of the two sentences are inconsistent but the other parts are identical, the two sentences are merged into one sentence with a compound component; for example, "Zhang San stole one pearl necklace of Ms. Wang's" and "Li Si stole one pearl necklace of Ms. Wang's" are merged into "Zhang San and Li Si each stole one pearl necklace of Ms. Wang's";
If the predicates of the sentences corresponding to the two nodes differ, determine whether the subjects of the two sentences are identical. If the subjects are identical, the subject of the second sentence (that is, the sentence corresponding to the other document's node) is omitted to avoid repetition; for example, "Zhang San stole one pearl necklace" and "Zhang San robbed one mobile phone" are merged, with the second subject omitted, into "Zhang San stole one pearl necklace and robbed one mobile phone". Preferably, if both the subjects and the adverbials of the two sentences are identical, the subject and the adverbial of the second sentence (that is, the sentence corresponding to the other document's node) are both omitted to avoid repetition; for example, "Zhang San stole one pearl necklace at Ms. Wang's home" and "Zhang San robbed one mobile phone at Ms. Wang's home" become, with the second adverbial and subject omitted, "Zhang San stole one pearl necklace and robbed one mobile phone at Ms. Wang's home".
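The predicate-driven merge rules above can be sketched on pre-analyzed sentences. Everything about the representation here is an assumption: the `Sentence` fields, the comma-joined rendering, and the choice to place the adverbial between subject and predicate (which mirrors the word order of the source examples rather than natural English). The sketch returns `None` when none of the described rules applies.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sentence:
    subject: str
    predicate: str
    objects: List[str]
    adverbial: Optional[str] = None


def clause(s: Sentence, subject: bool = True, adverbial: bool = True) -> str:
    """Render a sentence, optionally omitting its subject and/or adverbial."""
    parts = []
    if subject:
        parts.append(s.subject)
    if adverbial and s.adverbial:
        parts.append(s.adverbial)
    parts.append(s.predicate)
    parts.append(", ".join(s.objects))
    return " ".join(parts)


def merge(base: Sentence, other: Sentence) -> Optional[str]:
    """Merge a reference-document sentence with a related sentence."""
    if base.predicate == other.predicate:
        if base.adverbial != other.adverbial:
            return None  # case not covered by the rules sketched here
        if base.subject == other.subject:
            # Same predicate and subject: one sentence listing all objects.
            extra = [o for o in other.objects if o not in base.objects]
            return clause(Sentence(base.subject, base.predicate,
                                   base.objects + extra, base.adverbial))
        if base.objects == other.objects:
            # Same predicate, different subjects, same rest: compound subject.
            both = f"{base.subject}, {other.subject} each"
            return clause(Sentence(both, base.predicate, base.objects, base.adverbial))
        return None
    if base.subject == other.subject:
        # Different predicates, same subject: omit the second subject, and
        # the second adverbial too when it repeats the first.
        same_adverbial = base.adverbial == other.adverbial
        second = clause(other, subject=False, adverbial=not same_adverbial)
        return f"{clause(base)}, {second}"
    return None


a = Sentence("Zhang San", "stole", ["one pearl necklace"])
b = Sentence("Zhang San", "stole", ["one mobile phone"])
c = Sentence("Li Si", "stole", ["one pearl necklace"])
d = Sentence("Zhang San", "robbed", ["one mobile phone"])
print(merge(a, b))
print(merge(a, c))
print(merge(a, d))
```

Working on structured components rather than raw strings keeps the rules independent of how the sentence was originally worded, which is why the earlier grammatical and component analysis is a prerequisite for this step.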
Preferably, when the predicates of the sentences corresponding to the two nodes differ, if the agent (the performer of the action) of the first sentence (that is, the sentence corresponding to the reference document's node) is identical to a modifier component of the second sentence (that is, the sentence corresponding to the other document's node), the corresponding component of the second sentence can be replaced with a referring word. As one example, the agent of the first sentence is its subject, as in "Theft gang A robbed a bank", and the modifier component of the second sentence is an attributive, as in "The boss of theft gang A absconded abroad". When the two are identical, the corresponding component of the second sentence can be replaced with a referring word such as "its", "his" or "her", so the two sentences can be merged into "Theft gang A robbed a bank, and its boss absconded abroad".
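The referring-word substitution can be sketched as a small function. The split of each sentence into an agent or modifier plus a remainder, the default pronoun and the "and" joiner are all assumptions for illustration; choosing the right pronoun in practice would need information the patent does not describe.

```python
from typing import Optional


def merge_with_reference(
    first_agent: str,
    first_rest: str,
    second_modifier: str,
    second_rest: str,
    pronoun: str = "its",
) -> Optional[str]:
    """Merge two sentences, replacing a repeated modifier with a pronoun.

    Applies only when the agent of the first sentence is identical to the
    modifier component of the second; otherwise returns None.
    """
    if first_agent != second_modifier:
        return None
    return f"{first_agent} {first_rest}, and {pronoun} {second_rest}"


merged = merge_with_reference(
    "Theft gang A", "robbed a bank",
    "Theft gang A", "boss absconded abroad",
)
print(merged)
```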
Step 103: automatically generate the document based on the knowledge representation of the content information.
Each extracted content information item represents content the document needs. The knowledge representation of each item contains the specific content of the extracted information, namely its paragraphs, sentence groups and sentences, together with the grammatical and component information of the sentences. Once this information is known, the specific document can be generated automatically.
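The final rendering step is not detailed in the text, so the following is only a guess at its simplest form: once aggregation has produced the sentences of each paragraph, the document is the paragraphs written out in order. The nested-list input and the punctuation handling are assumptions.

```python
from typing import List


def render(paragraphs: List[List[str]]) -> str:
    """Write out merged sentences paragraph by paragraph.

    paragraphs is a list of paragraphs, each a list of already merged
    sentences taken from the aggregated knowledge representation.
    """
    blocks = []
    for sentences in paragraphs:
        blocks.append(" ".join(s.rstrip(".") + "." for s in sentences))
    return "\n\n".join(blocks)


document = render([
    ["Zhang San stole one pearl necklace, robbed one mobile phone"],
    ["Theft gang A robbed a bank, and its boss absconded abroad"],
])
print(document)
```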
The document generation method, device, storage medium and electronic equipment of the present invention extract the content information needed for document generation from the related data of the document to be generated, perform knowledge representation on the content information, and automatically generate the document. This achieves automatic generation of documents whose content structure is not fixed and solves the problems, such as being time-consuming, labor-intensive and inefficient, that arise because the prior art can only generate such documents manually.
In another embodiment of the present invention, as shown in Fig. 5, a document generation device of the present invention includes: an information extraction unit 51, a knowledge representation unit 52, and a document generation unit 53.
The information extraction unit 51 is configured to perform information extraction on the related data of the document to be generated, and obtain the content information needed for document generation;
Specifically, to generate the document, the information extraction unit 51 first obtains the related data of the document to be generated, then annotates that data, and then extracts the content information needed for document generation from the annotated data according to the type of the document to be generated.
The knowledge representation unit 52 is configured to perform knowledge representation on the content information;
Fig. 6 is a detailed structural diagram of the knowledge representation unit 52 in a specific embodiment of the present invention. As shown in Fig. 6, the knowledge representation unit 52 specifically includes:
A knowledge representation structure determining unit 521, configured to determine a knowledge representation structure for the content information;
An analysis unit 522, configured to perform grammar and constituent analysis on the specific content corresponding to each node in the knowledge representation structure, and fill each node in the knowledge representation structure according to the analysis result, to obtain the preliminary knowledge representation structure of each node;
An aggregation and reorganization unit 523, configured to aggregate and reorganize the nodes in the obtained preliminary knowledge representation structure, to obtain the knowledge representation of each piece of extracted content information. That is, after aggregation and reorganization, the paragraphs contained in each piece of content information of the text to be generated, and the sentences contained in each paragraph, are obtained.
Specifically, the aggregation and reorganization unit 523 is configured to:
Select one document as the base document, and select in turn each of the last-level nodes (i.e. sentence nodes) in the preliminary knowledge representation structure of the base document;
Traverse the last-level nodes of the preliminary knowledge representation structures of the other documents, and compare the semantic similarity of the sentences corresponding to the two nodes;
Aggregate and reorganize the two nodes according to the comparison result. Specifically, when the comparison result is that the sentences corresponding to the two nodes are semantically associated, the original node is merged with the compared node of the other document in the knowledge representation structure of the base document according to grammar rules; when the comparison result is that the sentences corresponding to the two nodes are semantically identical, the node with the identical semantics is deleted from the other document; when the comparison result is that the sentences corresponding to the two nodes are semantically unrelated, the superior nodes corresponding to that node in the other document are added to the knowledge representation structure of the base document.
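The three-way handling of the comparison result can be sketched as follows. This is a simplified illustration under stated assumptions: sentences are held in flat lists rather than in a tree, `difflib.SequenceMatcher` stands in for a real semantic similarity measure, the thresholds 0.95 and 0.5 are arbitrary, and the default `merge` simply joins the two texts instead of applying grammar rules.

```python
from difflib import SequenceMatcher


def classify_pair(a, b, same=0.95, related=0.5):
    """Stand-in for semantic comparison: string similarity with two
    hypothetical thresholds. A real system would compare meanings."""
    score = SequenceMatcher(None, a, b).ratio()
    if score >= same:
        return "identical"
    if score >= related:
        return "associated"
    return "unrelated"


def aggregate(base, others, classify=classify_pair,
              merge=lambda a, b: f"{a}; {b}"):
    """Fold the sentences of other documents into the base document.

    identical  -> the other document's sentence is dropped
    associated -> it is merged into the matching base sentence
    unrelated  -> it is added to the base structure (appended here)
    """
    result = list(base)
    for sentence in others:
        relations = [classify(kept, sentence) for kept in result]
        if "identical" in relations:
            continue
        if "associated" in relations:
            i = relations.index("associated")
            result[i] = merge(result[i], sentence)
        else:
            result.append(sentence)
    return result
```

Passing `classify` and `merge` as parameters keeps the control flow separate from the similarity measure and from the grammar-based merging rules, both of which the description treats as distinct steps.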
When the sentences corresponding to the two nodes are semantically associated, the original node must be merged with the compared node of the other document in the knowledge representation of the base document according to grammar rules. Under these grammar rules, the two nodes are merged according to whether the predicates of their corresponding sentences are the same, specifically:
If the predicates of the sentences corresponding to the two nodes are the same, determine whether the subjects of the two sentences are consistent. If the subjects are consistent, merge the two sentences into one sentence, and if the sentences contain other constituents, omit the repeated parts. If the subjects are inconsistent and the other parts are identical, merge the two sentences into a sentence with compound constituents;
If the predicates of the sentences corresponding to the two nodes differ, determine whether the subjects of the two sentences are the same. If the subjects are the same, omit the subject of the second sentence (i.e. the sentence corresponding to the other document's node) to avoid repetition. Preferably, if both the subjects and the adverbials of the two sentences are the same, omit the subject and the adverbial of the second sentence (i.e. the sentence corresponding to the other document's node) to avoid repetition.
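The predicate-based merging rules above can be sketched in Python as follows. The `Clause` fields, the English word order, and the use of "and" as the connective are assumptions for illustration; cases the rules do not cover return `None` rather than guessing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Clause:
    """Hypothetical result of constituent analysis for one sentence."""
    subject: str
    predicate: str
    rest: str = ""        # object or complement
    adverbial: str = ""   # e.g. time or place, placed first when rendered


def text(c, subject=True, adverbial=True):
    """Render a clause, optionally omitting its subject and/or adverbial."""
    parts = [c.adverbial if adverbial else "",
             c.subject if subject else "",
             c.predicate, c.rest]
    return " ".join(p for p in parts if p)


def merge(first, second) -> Optional[str]:
    if first.predicate == second.predicate:
        if first.adverbial != second.adverbial:
            return None  # not handled in this sketch
        if first.subject == second.subject:
            # Same subject and predicate: one sentence, repeated parts omitted.
            rest = " and ".join(dict.fromkeys(r for r in (first.rest, second.rest) if r))
            return text(Clause(first.subject, first.predicate, rest, first.adverbial))
        if first.rest == second.rest:
            # Different subjects, other parts identical: compound subject.
            both = f"{first.subject} and {second.subject}"
            return text(Clause(both, first.predicate, first.rest, first.adverbial))
        return None
    if first.subject == second.subject:
        # Different predicates, same subject: omit the second subject,
        # and the second adverbial too when it repeats the first.
        same_adverbial = first.adverbial == second.adverbial
        tail = text(second, subject=False, adverbial=not same_adverbial)
        return f"{text(first)} and {tail}"
    return None
```

For example, `merge(Clause("A", "robbed", "the bank"), Clause("A", "fled", "abroad"))` yields "A robbed the bank and fled abroad".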
Preferably, when the predicates of the sentences corresponding to the two nodes differ, if the agent (the performer of the action) of the first sentence (i.e. the sentence corresponding to the base document node) is identical to a modifying constituent of the second sentence (i.e. the sentence corresponding to the other document's node), the corresponding constituent of the second sentence can be replaced with a referring word.
Fig. 7 shows a structural diagram of electronic equipment 300 used for the document generation method of the present invention. Referring to Fig. 7, the electronic equipment 300 includes a processing component 301, which further includes one or more processors, and storage resources represented by a storage medium 302 for storing instructions executable by the processing component 301, such as application programs. The application programs stored in the storage medium 302 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 301 is configured to execute the instructions so as to perform the steps of the above document generation method.
The electronic equipment 300 may also include a power supply component 303, configured to perform power management of the electronic equipment 300; a wired or wireless network interface 304, configured to connect the electronic equipment 300 to a network; and an input/output (I/O) interface 305. The electronic equipment 300 can operate based on an operating system stored in the storage medium 302, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In summary, the document generation method and device, storage medium, and electronic equipment of the present invention extract the related content needed for document generation from the related data of the text to be generated, and perform preliminary knowledge representation and integration on the extracted content. This achieves automatic generation of documents whose content structure is not fixed, and solves the problems of the prior art, in which such documents could only be generated manually, which is time-consuming, laborious, and inefficient.
It should be noted that the above embodiments can be freely combined as needed. The above is only the preferred embodiment of the present invention. It should be pointed out that those skilled in the art can make improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (16)

1. A document generation method, comprising:
Step 1: performing information extraction on the related data of a document to be generated, to obtain the content information needed for document generation;
Step 2: performing knowledge representation on the content information;
Step 3: automatically generating the document based on the knowledge representation of the content information.
2. The document generation method as claimed in claim 1, characterized in that step 2 further comprises:
determining a knowledge representation structure for the content information;
filling each node in the knowledge representation structure by performing grammar and constituent analysis on the specific content corresponding to each node in the knowledge representation structure, to obtain a preliminary knowledge representation structure;
aggregating and reorganizing the nodes in the obtained preliminary knowledge representation structure, to obtain the knowledge representation of the content information.
3. The document generation method as claimed in claim 1, characterized in that step 1 further comprises:
annotating each annotation unit in the related data of the document to be generated;
extracting the information needed for document generation from the annotated data according to the type of the document to be generated.
4. The document generation method as claimed in claim 3, characterized in that step 1 further comprises: performing similarity analysis on the extracted content, and rejecting content information with low similarity.
5. The document generation method as claimed in claim 2, characterized in that the step of aggregating and reorganizing the nodes in the obtained preliminary knowledge representation structure further comprises:
selecting one document as the base document, and selecting in turn each of the last-level nodes in the preliminary knowledge representation structure of the base document;
traversing the last-level nodes of the preliminary knowledge representation structures of the other documents, and comparing the semantic similarity of the sentences corresponding to the two nodes;
aggregating and reorganizing the two nodes according to the comparison result.
6. The document generation method as claimed in claim 5, characterized in that the step of aggregating and reorganizing according to the comparison result is:
if the comparison result is that the sentences corresponding to the two nodes are semantically associated, merging the original node with the compared node of the other document in the knowledge representation structure of the base document according to grammar rules;
if the comparison result is that the sentences corresponding to the two nodes are semantically identical, deleting the node with the identical semantics from the other document;
if the comparison result is that the sentences corresponding to the two nodes are semantically unrelated, adding the superior nodes corresponding to that node in the other document to the knowledge representation structure of the base document.
7. The document generation method as claimed in claim 6, characterized in that: according to the grammar rules, the two nodes are merged according to whether the predicates of the sentences corresponding to the two nodes are the same.
8. The document generation method as claimed in claim 7, characterized in that the step of merging the two nodes according to whether the predicates of the sentences corresponding to the two nodes are the same specifically comprises:
if the predicates of the sentences corresponding to the two nodes are the same, determining whether the subjects of the two sentences are consistent;
if the subjects of the two sentences are consistent, merging them into one sentence; if the subjects of the two sentences are inconsistent and the other parts are identical, merging the two sentences into a sentence with compound constituents.
9. The document generation method as claimed in claim 7, characterized in that the step of merging the two nodes according to whether the predicates of the sentences corresponding to the two nodes are the same specifically comprises:
if the predicates corresponding to the two nodes differ, determining whether the subjects of the two sentences are the same;
if the subjects of the two sentences are the same, omitting the subject of the sentence corresponding to the other document's node and merging.
10. The document generation method as claimed in claim 9, characterized in that:
if the subjects of the two sentences are the same, determining whether the adverbials of the two sentences are the same;
if the adverbials of the two sentences are also the same, omitting the adverbial of the sentence corresponding to the other document's node and merging.
11. The document generation method as claimed in claim 7, characterized in that: when the predicates corresponding to the two nodes differ, if the agent of the sentence corresponding to the base document node is identical to a modifying constituent of the sentence corresponding to the other document's node, the corresponding constituent of the sentence corresponding to the other document's node is replaced with a referring word.
12. A document generation device, comprising:
an information extraction unit, configured to perform information extraction on the related data of a document to be generated, to obtain the content information needed for document generation;
a knowledge representation unit, configured to perform knowledge representation on the content information;
a document generation unit, configured to automatically generate the document based on the knowledge representation of the content information.
13. The document generation device as claimed in claim 12, characterized in that the knowledge representation unit further comprises:
a knowledge representation structure determining unit, configured to determine a knowledge representation structure for the content information;
an analysis unit, configured to fill each node in the knowledge representation structure by performing grammar and constituent analysis on the specific content corresponding to each node in the knowledge representation structure, to obtain a preliminary knowledge representation structure;
an aggregation and reorganization unit, configured to aggregate and reorganize the nodes in the obtained preliminary knowledge representation structure, to obtain the knowledge representation of the content information.
14. The document generation device as claimed in claim 13, characterized in that the aggregation and reorganization unit comprises:
a node selection unit, configured to select one document as the base document, and select in turn each of the last-level nodes in the preliminary knowledge representation structure of the base document;
a traversal comparison unit, configured to traverse the last-level nodes of the preliminary knowledge representations of the other documents, and compare the semantic similarity of the sentences corresponding to the two nodes;
a comparison result processing unit, configured to perform aggregation and reorganization according to the comparison result.
15. A storage medium storing a plurality of instructions, characterized in that the instructions are loaded by a processor to perform the steps of the method as claimed in any one of claims 1 to 11.
16. Electronic equipment, characterized in that the electronic equipment comprises:
a storage medium storing a plurality of instructions, the instructions being loaded by a processor to perform the steps of the method as claimed in any one of claims 1 to 11; and
a processor, configured to execute the instructions in the storage medium.
CN201710758045.9A 2017-08-29 2017-08-29 Document generation method and device, storage medium and electronic equipment Active CN107622042B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710758045.9A CN107622042B (en) 2017-08-29 2017-08-29 Document generation method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710758045.9A CN107622042B (en) 2017-08-29 2017-08-29 Document generation method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN107622042A true CN107622042A (en) 2018-01-23
CN107622042B CN107622042B (en) 2021-07-06

Family

ID=61089243

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710758045.9A Active CN107622042B (en) 2017-08-29 2017-08-29 Document generation method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN107622042B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110245210A (en) * 2019-06-25 2019-09-17 北京市律典通科技有限公司 A kind of element fusion method and system
CN110895568A (en) * 2018-09-13 2020-03-20 阿里巴巴集团控股有限公司 Method and system for processing court trial records
CN112348714A (en) * 2020-11-05 2021-02-09 科大讯飞股份有限公司 Evidence chain construction method, electronic device and storage medium
CN113689176A (en) * 2021-07-15 2021-11-23 东风汽车集团股份有限公司 Method and system for establishing vehicle function safety management process
CN113850570A (en) * 2021-09-30 2021-12-28 中国建筑第七工程局有限公司 AI-based professional scheme aided decision-making expert system construction method
CN113689176B (en) * 2021-07-15 2024-08-02 东风汽车集团股份有限公司 Method and system for establishing vehicle function safety management flow

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104050615A (en) * 2014-07-10 2014-09-17 首都医科大学附属北京佑安医院 System for generating structured electronic medical record
CN104090863A (en) * 2014-07-24 2014-10-08 高德良 Intelligent legal instrument generating method and system
US20150081715A1 (en) * 2013-09-17 2015-03-19 Fujitsu Limited Retrieval device and method
CN104699758A (en) * 2015-02-04 2015-06-10 中国人民解放军装甲兵工程学院 Intelligent generation system and method of graphic library-associated command document
CN106227722A (en) * 2016-09-12 2016-12-14 中山大学 A kind of extraction method based on listed company's bulletin summary

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
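ANJU KUNDU et al.: "Semantic Similarity between Documents Using Tree View Ontology", 《INTERNATIONAL RESEARCH JOURNAL OF ADVANCED ENGINEERING AND SCIENCE》 *
Guo Zhongwei: "Research on the Theory and Methods of Automatic Generation of Combat Documents", 《China Doctoral and Master's Dissertations Full-text Database (Doctoral), Engineering Science and Technology II》 *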

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110895568A (en) * 2018-09-13 2020-03-20 阿里巴巴集团控股有限公司 Method and system for processing court trial records
CN110895568B (en) * 2018-09-13 2023-07-21 阿里巴巴集团控股有限公司 Method and system for processing court trial records
CN110245210A (en) * 2019-06-25 2019-09-17 北京市律典通科技有限公司 A kind of element fusion method and system
CN112348714A (en) * 2020-11-05 2021-02-09 科大讯飞股份有限公司 Evidence chain construction method, electronic device and storage medium
CN112348714B (en) * 2020-11-05 2024-07-05 科大讯飞股份有限公司 Evidence chain construction method, electronic device and storage medium
CN113689176A (en) * 2021-07-15 2021-11-23 东风汽车集团股份有限公司 Method and system for establishing vehicle function safety management process
CN113689176B (en) * 2021-07-15 2024-08-02 东风汽车集团股份有限公司 Method and system for establishing vehicle function safety management flow
CN113850570A (en) * 2021-09-30 2021-12-28 中国建筑第七工程局有限公司 AI-based professional scheme aided decision-making expert system construction method

Also Published As

Publication number Publication date
CN107622042B (en) 2021-07-06


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
Effective date of registration: 20190515
Address after: 230088 666 Wangjiang West Road, Hefei hi tech Development Zone, Anhui
Applicant after: Iflytek Co., Ltd.
Applicant after: IFLYTEK Shanghai Mdt InfoTech Ltd
Address before: 230000 666 Wangjiang West Road, Hefei hi tech Development Zone, Anhui
Applicant before: Iflytek Co., Ltd.
GR01 Patent grant