CN110489755A - Document creation method and device - Google Patents

Document creation method and device

Info

Publication number
CN110489755A
Authority
CN
China
Prior art keywords
vector
entity
attribute
text
attribute value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910775353.1A
Other languages
Chinese (zh)
Inventor
吴智东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Shiyuan Electronics Thecnology Co Ltd
Original Assignee
Guangzhou Shiyuan Electronics Thecnology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Shiyuan Electronics Thecnology Co Ltd filed Critical Guangzhou Shiyuan Electronics Thecnology Co Ltd
Priority to CN201910775353.1A priority Critical patent/CN110489755A/en
Publication of CN110489755A publication Critical patent/CN110489755A/en
Priority to PCT/CN2019/126797 priority patent/WO2021031480A1/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval of unstructured textual data
    • G06F 16/35: Clustering; Classification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval of unstructured textual data
    • G06F 16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G06F 16/367: Ontology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a document creation method and device. The method includes: selecting the target knowledge graph of a target entity from a knowledge graph set, where the knowledge graph set characterizes the attribute values of at least one entity on preset attributes and the target entity is the object to be evaluated; determining an entity vector, an attribute vector, and an attribute-value vector of the target entity based on the target knowledge graph, where the entity vector, attribute vector, and attribute-value vector are represented as a triple vector; and generating text matching the target entity from the entity vector, attribute vector, and attribute-value vector. The invention solves the technical problem in the related art that text generated purely by deep learning algorithms lacks personalized evaluation of an entity, so that the text poorly matches the entity's actual performance.

Description

Document creation method and device
Technical field
The present invention relates to the field of natural language processing, and in particular to a document creation method and device.
Background technique
Text generation technology is an important research direction in the field of natural language processing (NLP). It aims to automatically generate, by means of rules, algorithms, and the like, sentences that conform to the rules of human language and contain no grammatical errors.
Text generation technology has many applications. In education, for example, at the end of each term a teacher needs to write a descriptive, suggestive comment on each student's performance based on the student's everyday behavior. Traditionally, the comment for each student is written by hand, which not only consumes a great deal of the teacher's time but also assumes that the teacher accurately remembers the everyday behavior of every student. A relatively mature existing solution is therefore to compute the similarity between the input student information and a set of manually constructed comment templates, and to select the template with the highest similarity as the generated comment.
However, the comments produced by this method are manually constructed rather than generated by an algorithm, so it cannot generate distinct, personalized comments for each student in a batch, intelligent manner. Moreover, because the comment is obtained by computing the similarity between student information and comment templates, only surface character-level information is considered, not the semantic-level information of the comment text. To solve this problem, deep learning algorithms model the statistical distribution of text along multiple dimensions and generate comments probabilistically. But a deep learning algorithm lacks education-domain information, has a weak ability to learn the latent relationship between a particular student's everyday behavior and the comment, and lacks the ability to generate a personalized comment for a particular student, so the comments it generates match the student's actual performance poorly and inaccurately.
For the technical problem in the related art that text generated purely by deep learning algorithms lacks personalized evaluation of an entity, so that the text poorly matches the entity's actual performance, no effective solution has yet been proposed.
Summary of the invention
The present invention provides a document creation method and device, at least to solve the technical problem in the related art that text generated purely by deep learning algorithms lacks personalized evaluation of a student, so that the text poorly matches the student's actual performance.
According to one aspect of an embodiment of the present invention, a document creation method is provided, including: selecting the target knowledge graph of a target entity from a knowledge graph set, where the knowledge graph set characterizes the attribute values of at least one entity on preset attributes and the target entity is the object to be evaluated; determining an entity vector, an attribute vector, and an attribute-value vector of the target entity based on the target knowledge graph, where the entity vector, attribute vector, and attribute-value vector are represented as a triple vector; and generating text matching the target entity from the entity vector, attribute vector, and attribute-value vector.
Optionally, before selecting the target knowledge graph of the target entity from the knowledge graph set, the method further includes generating the knowledge graph set, where generating the knowledge graph set includes: constructing the schema layer of the knowledge graph set, the schema layer including at least entity types, attribute types, and attribute-value types; acquiring record information, the record information including the attribute values of at least one entity on preset attributes; and inputting the record information into the schema layer to generate the knowledge graph set.
Optionally, before inputting the record information into the schema layer, the method further includes preprocessing the record information to obtain processed record information, where the preprocessing includes at least one of: entity extraction, attribute extraction, attribute-value extraction, and entity disambiguation.
Optionally, determining the entity vector, attribute vector, and attribute-value vector of the target entity based on the target knowledge graph includes: extracting the entity information, attribute information, and attribute-value information of the target entity in the target knowledge graph; converting the entity information into a Boolean vector using a preset algorithm; and converting the attribute information and attribute-value information into high-dimensional numeric vectors using a preset model, to obtain the triple vector.
Optionally, generating text matching the target entity from the entity vector, attribute vector, and attribute-value vector includes: inputting the entity vector, attribute vector, and attribute-value vector into a text generation model, where the text generation model includes a deep neural network model trained on triple samples and text samples; and generating the text matching the target entity based on the text generation model.
Optionally, before inputting the entity vector, attribute vector, and attribute-value vector into the text generation model, the method further includes generating the text generation model, where generating the text generation model includes: acquiring triple samples and text samples; converting the entity samples in the triple samples into Boolean vectors using a preset algorithm, and converting the attribute samples and attribute-value samples in the triple samples into high-dimensional numeric vectors using a preset model, to obtain triple vector samples; and training the text generation model on the triple vector samples and text samples to obtain a trained text generation model.
Optionally, training the text generation model on the triple vector samples and text samples to obtain a trained text generation model includes: processing the triple vector samples and text samples with an encoder incorporating an attention mechanism to obtain a context vector; processing the context vector with a decoder incorporating the attention mechanism to obtain text information; and, based on the text information, training the text generation model by minimizing a loss function.
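As a small, hedged illustration of the attention step described above (an instance of dot-product attention, not the patent's exact implementation), the context vector at each decoding step can be computed as a softmax-weighted sum of encoder states:

```python
import math

def softmax(scores):
    """Normalize raw alignment scores into attention weights."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def context_vector(decoder_state, encoder_states):
    """Dot-product attention: score each encoder state against the
    current decoder state, normalize the scores, and take the
    weighted sum of the encoder states."""
    scores = [sum(d * h for d, h in zip(decoder_state, state))
              for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    return [sum(w * state[i] for w, state in zip(weights, encoder_states))
            for i in range(dim)]

# Toy example: two encoder states, one decoder state aligned with the first
enc = [[1.0, 0.0], [0.0, 1.0]]
dec = [1.0, 0.0]
ctx = context_vector(dec, enc)
```

Because the decoder state aligns with the first encoder state, the context vector leans toward it; the weights always sum to one by construction.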
According to another aspect of an embodiment of the present invention, a document creation method is further provided, including: receiving a selection instruction, where the selection instruction is used to select a target entity to be evaluated; and displaying text matching the target entity, where the text is generated from the entity vector, attribute vector, and attribute-value vector of the target entity determined according to the target knowledge graph of the target entity, the target knowledge graph comes from a knowledge graph set, the knowledge graph set characterizes the attribute values of at least one entity on preset attributes, and the entity vector, attribute vector, and attribute-value vector are represented as a triple vector.
According to another aspect of an embodiment of the present invention, a text generating apparatus is further provided, including: a selecting module for selecting the target knowledge graph of a target entity from a knowledge graph set, where the knowledge graph set characterizes the attribute values of at least one entity on preset attributes and the target entity is the object to be evaluated; a determining module for determining the entity vector, attribute vector, and attribute-value vector of the target entity based on the target knowledge graph, where the entity vector, attribute vector, and attribute-value vector are represented as a triple vector; and a text generation module for generating text matching the target entity from the entity vector, attribute vector, and attribute-value vector.
According to another aspect of an embodiment of the present invention, a storage medium is further provided. The storage medium includes a stored program, and when the program runs it controls the device where the storage medium is located to execute any one of the above document creation methods.
According to another aspect of an embodiment of the present invention, a processor is further provided. The processor is used to run a program, and when the program runs it executes any one of the above document creation methods.
In embodiments of the present invention, the target knowledge graph of a target entity is selected from a knowledge graph set, where the knowledge graph set characterizes the attribute values of at least one entity on preset attributes and the target entity is the object to be evaluated; the entity vector, attribute vector, and attribute-value vector of the target entity are determined based on the target knowledge graph and represented as a triple vector; and text matching the target entity is generated from the entity vector, attribute vector, and attribute-value vector. Compared with the prior art, the present application builds a knowledge graph set from the everyday performance of multiple entities, extracts the triple vector of the target knowledge graph from it, and then generates the comment in combination with a deep learning algorithm. By combining the knowledge graph with deep learning, the scheme lets the deep learning algorithm perceive all attributes of the entity, thereby solving the technical problem in the related art that text generated purely by deep learning algorithms lacks personalized evaluation of an entity and poorly matches the entity's actual performance, achieving the purpose of generating, to the greatest extent, comments that fit the entity's everyday performance, and improving the matching degree of the comments.
Detailed description of the invention
The drawings described herein are provided to give a further understanding of the present invention and constitute a part of this application. The illustrative embodiments of the present invention and their descriptions serve to explain the present invention and do not improperly limit it. In the drawings:
Fig. 1 is a flowchart of a document creation method according to Embodiment 1 of the present application;
Fig. 2 is a block diagram of the basic principle of a comment generation method according to Embodiment 1 of the present application;
Fig. 3 is a detailed schematic diagram of the basic principle of the comment generation method shown in Fig. 2;
Fig. 4 is a flowchart of a document creation method according to Embodiment 2 of the present application;
Fig. 5 is a structural schematic diagram of a text generating apparatus according to Embodiment 3 of the present application; and
Fig. 6 is a structural schematic diagram of a text generating apparatus according to Embodiment 4 of the present application.
Specific embodiment
To enable those skilled in the art to better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the scope of protection of the present invention.
It should be noted that the terms "first", "second", and the like in the specification, the claims, and the above drawings are used to distinguish similar objects, not to describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the present invention described herein can be implemented in orders other than those illustrated or described herein. In addition, the terms "comprising" and "having" and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units explicitly listed, and may include other steps or units that are not explicitly listed or that are inherent to the process, method, product, or device.
Embodiment 1
According to an embodiment of the present invention, an embodiment of a document creation method is provided. It should be noted that the steps illustrated in the flowchart of the drawings may be executed in a computer system such as a set of computer-executable instructions, and although a logical order is shown in the flowchart, in some cases the steps shown or described may be executed in an order different from that described herein.
Fig. 1 is a flowchart of a document creation method according to an embodiment of the present invention. As shown in Fig. 1, the method includes the following steps:
Step S102: select the target knowledge graph of a target entity from a knowledge graph set, where the knowledge graph set characterizes the attribute values of at least one entity on preset attributes and the target entity is the object to be evaluated.
In an optional implementation, the entity may be any object that needs evaluation, such as a student, an institution, or a company employee. For a student, the preset attributes may be classroom performance, personal image, social performance, emotional expression, weekly test results, final grades, and so on, with corresponding attribute values such as active, tidy, enthusiastic, stable, fluctuating, or excellent. For an institution, the preset attributes may be brand image, number of granted patents, annual revenue, public welfare activity, and so on, with corresponding attribute values such as highly influential, more than 100, 200 million, or active.
A knowledge graph (KG), as a new knowledge organization and retrieval technique of the big-data era, is used to describe concepts in the physical world and the relations among them in symbolic form. A knowledge graph set collects the knowledge graphs of multiple entities; the knowledge graph of each entity records that entity's everyday behavior, and since each entity is an independent individual, the knowledge graph of each entity naturally differs. When some entity needs to be evaluated, i.e. the target entity, the target knowledge graph of that entity is selected from the knowledge graph set.
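A minimal sketch of this selection step, under the assumption that the knowledge graph set is held in memory as a mapping from entity identifiers to lists of (entity, attribute, value) triples (all names here are illustrative, not from the patent):

```python
# A knowledge graph set: one list of (entity, attribute, value)
# triples per entity. Entity names and attributes are hypothetical.
knowledge_graph_set = {
    "student_A": [
        ("student_A", "classroom_performance", "active"),
        ("student_A", "final_grade", "A"),
    ],
    "student_B": [
        ("student_B", "social_performance", "not active"),
        ("student_B", "final_grade", "B"),
    ],
}

def select_target_graph(graph_set, target_entity):
    """Step S102: pick the target entity's knowledge graph from the set."""
    if target_entity not in graph_set:
        raise KeyError(f"no knowledge graph for {target_entity!r}")
    return graph_set[target_entity]

target_graph = select_target_graph(knowledge_graph_set, "student_A")
```

In a deployment the set would live in a graph database rather than a dictionary, but the lookup semantics are the same.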
Taking students as an example: when a summary comment needs to be generated for student A, the knowledge graph of student A is extracted from the knowledge graph set. This knowledge graph records the attribute values of student A on all attributes, i.e. the everyday behavior of student A in all aspects.
Step S104: determine the entity vector, attribute vector, and attribute-value vector of the target entity based on the target knowledge graph, where the entity vector, attribute vector, and attribute-value vector are represented as a triple vector.
In this step, the entity information, attribute information, and attribute-value information of the target entity are extracted from the target knowledge graph and converted into the entity vector, attribute vector, and attribute-value vector that are easy for the text generation model to process, which can greatly improve how well the generated text matches.
It should be noted that the triple is a common representation of a knowledge graph; this embodiment uses triples only as an example, which does not limit the application.
Step S106: generate text matching the target entity from the entity vector, attribute vector, and attribute-value vector.
In an optional implementation, the text generation model that generates the text may be a deep neural network model.
Deep neural networks are an interdisciplinary subject combining mathematics and computer science. Unlike traditional machine learning, deep neural networks can achieve end-to-end high-dimensional feature extraction and abstraction, solving the problem that features are hard to extract in machine learning. Typical examples include the Seq2Seq model and the generative adversarial network model.
Seq2Seq is a model with an Encoder-Decoder structure. Its basic idea is to use two recurrent neural networks, one as the encoder and one as the decoder: the encoder turns a variable-length input sequence into a fixed-length vector, which is regarded as the semantics of the sequence, and the decoder decodes this fixed-length vector into a variable-length output sequence. A generative adversarial network (GAN) model contains at least two modules, a generative model and an adversarial model, which learn through playing a game against each other so that the two models produce fairly good output. Applying these two kinds of deep neural network algorithms to the field of comment generation can therefore achieve more accurate and more robust results than machine learning methods.
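A deliberately tiny, non-neural sketch of the Encoder-Decoder data flow described above (an illustration of the idea only, not of the patent's actual Seq2Seq model): the "encoder" compresses a variable-length token sequence into one fixed-length vector, and the "decoder" maps that vector back to a token sequence.

```python
EMBED = {"active": [1.0, 0.0], "tidy": [0.0, 1.0]}  # toy token embeddings

def encode(tokens):
    """'Encoder': average the token embeddings into one fixed-length
    vector that stands in for the semantics of the sequence."""
    dim = 2
    vec = [0.0] * dim
    for t in tokens:
        for i, x in enumerate(EMBED[t]):
            vec[i] += x
    return [x / len(tokens) for x in vec]

def decode(vec, max_len=2):
    """'Decoder': greedily emit the token whose embedding is closest
    to the remaining semantic vector, subtracting its contribution."""
    out = []
    residual = list(vec)
    for _ in range(max_len):
        best = min(EMBED, key=lambda t: sum(
            (r - e) ** 2 for r, e in zip(residual, EMBED[t])))
        out.append(best)
        residual = [r - e / max_len for r, e in zip(residual, EMBED[best])]
    return out

semantic = encode(["active", "tidy"])  # one fixed-length vector
```

A real Seq2Seq model replaces the averaging and nearest-neighbor steps with trained recurrent networks, but the fixed-length bottleneck in the middle is the same.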
In this step, the triple vector composed of the entity vector, attribute vector, and attribute-value vector determined from the target knowledge graph is input into the deep neural network model, which can generate a comment text matching the everyday behavior of the target entity.
It is worth noting that in the existing text generation field, even knowledge-graph-based text generation does not fully use the entity information, attribute information, and attribute values of the knowledge graph itself; instead, the knowledge graph serves as an intermediary, and suitable text is found by search or by computing similarity. The present invention, by contrast, combines the knowledge graph with a deep neural network and takes the everyday behavior of the target entity into account, so that for different entities it can automatically generate comments that fit each entity's actual performance, improving the matching degree and accuracy of the comments.
Still taking students as an example: before the winter and summer vacations, a teacher needs to write a summary comment for every student. With a mouse click, the teacher can extract the knowledge graph of the student to be evaluated from the knowledge graph set; this knowledge graph records the student's everyday performance, such as classroom performance, personal image, social performance, emotional expression, and final grades. The terminal executing the method of this embodiment determines the triple vector of the student based on the student's knowledge graph and inputs it into the deep neural network model, and the terminal's display then automatically presents a comment matching the student's everyday performance. This scheme greatly saves the teacher's time and effort and avoids the problem that comments poorly match the student because the teacher misremembers, or remembers only incompletely, the student's everyday behavior.
Based on the scheme provided by the above embodiments of the present application, the target knowledge graph of a target entity is selected from a knowledge graph set, where the knowledge graph set characterizes the attribute values of at least one entity on preset attributes and the target entity is the object to be evaluated; the entity vector, attribute vector, and attribute-value vector of the target entity are determined based on the target knowledge graph and represented as a triple vector; and text matching the target entity is generated from these vectors. Compared with the prior art, the present application builds a knowledge graph set from the everyday performance of multiple entities, extracts the triple vector of the target knowledge graph from it, and then generates the comment in combination with a deep learning algorithm. By combining the knowledge graph with deep learning, the deep learning algorithm perceives all attributes of the entity, solving the technical problem in the related art that text generated purely by deep learning lacks personalized evaluation of the entity and poorly matches its actual performance, achieving the purpose of generating comments that best fit the entity's everyday performance, and improving the matching degree of the comments.
Optionally, before step S102 of selecting the target knowledge graph of the target entity from the knowledge graph set, the method may further include step S101: generate the knowledge graph set, where generating the knowledge graph set may specifically include the following steps:
Step S1012: construct the schema layer of the knowledge graph set, where the schema layer includes at least entity types, attribute types, and attribute-value types.
In an optional implementation, the schema layer can be edited with the ontology construction tool Protégé. Protégé is ontology-editing and knowledge-acquisition software developed in the Java language; the user only needs to build the ontology model at the conceptual level, so it is simple to operate.
The schema layer is equivalent to the skeleton of the knowledge graph. It includes at least entity types, attribute types, and attribute-value types, and may of course also include information such as time.
Step S1014: acquire record information, where the record information includes the attribute values of at least one entity on preset attributes.
In an optional implementation, the record information can be entered manually into the computer terminal that executes the method of this embodiment. For example: Li Ming's classroom performance is active, his image is good, and his final grade is A; another student loves to doze in class, is not active socially, and has a final grade of B. In this way, the everyday behavior of the target entity can be considered comprehensively when generating its text, avoiding missed features.
Step S1016: input the record information into the schema layer to generate the knowledge graph set.
In this step, the entity information, attribute information, and attribute-value information obtained in step S1014 are filled into the corresponding entity types, attribute types, and attribute-value types of the schema layer constructed in step S1012. The knowledge graphs of all entities are constructed in this way and stored in the graph database Neo4j.
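One way this population step might look in code, as a sketch under the assumption that records arrive as per-entity attribute dictionaries (the Neo4j storage step is omitted, and the record contents are hypothetical):

```python
def records_to_triples(records):
    """Fill record information into the graph: emit one
    (entity, attribute, attribute_value) triple per record field."""
    triples = []
    for entity, attributes in records.items():
        for attribute, value in attributes.items():
            triples.append((entity, attribute, value))
    return triples

# Hypothetical record information entered at the terminal
records = {
    "Li Ming": {"classroom_performance": "active", "final_grade": "A"},
}
graph = records_to_triples(records)
```

In practice each triple would then be written to Neo4j as a node-relationship-node (or node-property) pattern rather than kept in a Python list.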
Optionally, before step S1016 of inputting the record information into the schema layer, the method may further include step S1015: preprocess the record information to obtain processed record information, where the preprocessing includes at least one of: entity extraction, attribute extraction, attribute-value extraction, and entity disambiguation.
In an optional implementation, the entity extraction, attribute extraction, and attribute-value extraction may be entity recognition, attribute recognition, and attribute-value recognition, including the detection and classification of entities, attributes, and attribute values.
It should be noted that entity disambiguation distinguishes the case where two different names refer to the same entity from the case where the same name refers to two different entities.
Optionally, step S104 of determining the entity vector, attribute vector, and attribute-value vector of the target entity based on the target knowledge graph may specifically include the following steps:
Step S1042: extract the entity information, attribute information, and attribute-value information of the target entity in the target knowledge graph.
Step S1044: convert the entity information into a Boolean vector using a preset algorithm, and convert the attribute information and attribute-value information into high-dimensional numeric vectors using a preset model, to obtain the triple vector.
In an optional implementation, the preset algorithm may be the one-hot (OneHot) algorithm, and the preset model may be a BERT model or a Word2Vector model. The BERT model, based on bidirectional Transformer encoder representations, is suited to building state-of-the-art models for a wide range of tasks.
When representing the triples of the target knowledge graph, the entity information, attribute information, and attribute-value information are converted into numeric vectors that are easy for the neural network model to process, so that the neural network model perceives all attributes of the target entity and can then extract high-dimensional attribute-vector features. Specifically, the multiple triples (e_i, p_ij, v_ij) of the target entity in the target knowledge graph are extracted, where e_i, p_ij, and v_ij denote the i-th entity, the j-th attribute of the i-th entity, and the j-th attribute value of the i-th entity respectively; e_i, p_ij, and v_ij are then represented as the vectors V_ei, V_pi, and V_vi respectively.
In an alternative embodiment, the entity e_i is represented as a Boolean vector using the OneHot algorithm, and the attribute p_ij and attribute value v_ij are represented as high-dimensional numeric vectors using the BERT model, i.e.
V_ei = t(e_i), V_pi = s(p_ij), V_vi = s(v_ij),
where t and s denote the feature-extraction mapping function and the mapping function of a neural network structure, respectively.
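A hedged sketch of this vectorization step: the one-hot entity encoding t(.) is shown literally, while the neural mapping s(.) is replaced by a stand-in lookup table, since a real BERT or Word2Vector model is out of scope here (all vocabulary and vector values are invented for illustration):

```python
def one_hot(entity, vocabulary):
    """t(e_i): Boolean vector with a single 1 at the entity's index."""
    return [1 if entity == v else 0 for v in vocabulary]

# Stand-in for s(.): in practice a BERT or Word2Vector model would
# map attribute / attribute-value text to dense numeric vectors.
DENSE = {
    "classroom_performance": [0.2, 0.7],
    "active": [0.9, 0.1],
}

def triple_vector(triple, entity_vocab):
    """Build (V_ei, V_pi, V_vi) for one (entity, attribute, value) triple."""
    e, p, v = triple
    return one_hot(e, entity_vocab), DENSE[p], DENSE[v]

vocab = ["student_A", "student_B"]
V_e, V_p, V_v = triple_vector(
    ("student_A", "classroom_performance", "active"), vocab)
```

The Boolean vector identifies which entity is being evaluated, while the dense vectors carry the semantics of the attribute and its value into the downstream model.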
Optionally, step S106 of generating text matching the target entity from the entity vector, attribute vector, and attribute-value vector may specifically include the following steps:
Step S1062: input the entity vector, attribute vector, and attribute-value vector into a text generation model, where the text generation model includes a deep neural network model trained on triple samples and text samples.
As previously mentioned, above-mentioned deep neural network model can be Seq2Seq model, generate and fight network model etc..
Step S1064 is generated and target entity matched text based on text generation model.
In the above steps, the entity vector V_ei, attribute vector V_pij and attribute value vector V_vij are input into the text generation model, which produces a summary comment text y* about the target entity.
In an optional scheme, the summary comment text y* may be represented as an output sequence y_1, ..., y_T', where y_t' denotes the output character at time t', i.e.

y_t' = arg max P(y_t' | y_1, ..., y_{t'-1}, c_t')

In the above formula, t' ∈ {1, ..., T'}, c_t' denotes the context vector at time t', P(y_t' | y_1, ..., y_{t'-1}, c_t') denotes the probability vector over all candidate texts at time t', and arg max denotes choosing, among the generated candidate texts, the one with the largest probability value.
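The arg-max selection above amounts to a greedy decoding loop, sketched below. The step-probability function is a toy stand-in for the trained decoder's P(y_t' | y_1, ..., y_{t'-1}, c_t'); the vocabulary and its behavior are invented for illustration only.

```python
import numpy as np

VOCAB = ["<eos>", "performs", "actively", "in", "class"]

def toy_step_probs(prefix):
    """Stand-in for P(y_t' | y_1, ..., y_{t'-1}, c_t'): one probability per word."""
    probs = np.full(len(VOCAB), 0.05)
    nxt = len(prefix) + 1
    probs[nxt if nxt < len(VOCAB) else 0] = 0.8  # favor the next word, then <eos>
    return probs / probs.sum()

def greedy_decode(max_len=10):
    """Emit y_t' = arg max over the candidate vocabulary at each step."""
    out = []
    while len(out) < max_len:
        word = VOCAB[int(np.argmax(toy_step_probs(out)))]
        if word == "<eos>":
            break
        out.append(word)
    return out
```

A real system would replace `toy_step_probs` by the decoder described in steps S106131–S106133; the greedy arg-max loop itself is unchanged.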
Optionally, before step S1062 of inputting the entity vector, attribute vector and attribute value vector into the text generation model, the above method may further include step S1061 of generating the text generation model, which may include the following steps:
In step S10611, triple samples and text samples are obtained.
In an optional scheme, the triple samples and text samples may form an aligned corpus, expressed as {((e, p, v), y)} = {((e_1, p_1, v_1), y_1), ..., ((e_i, p_i, v_i), y_i)}.
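The aligned corpus can be held as a simple list of (triple, comment) pairs, as sketched below; the student name and comments are invented examples, not data from the patent.

```python
# Each element pairs a triple sample (e, p, v) with its comment text y.
aligned_corpus = [
    (("Li Ming", "classroom performance", "active"),
     "Li Ming participates actively in class."),
    (("Li Ming", "final grade", "A"),
     "Li Ming achieved an excellent final grade."),
]

def triple_samples(corpus):
    """Return just the (e, p, v) side of the aligned corpus."""
    return [pair[0] for pair in corpus]

def text_samples(corpus):
    """Return just the comment side y."""
    return [pair[1] for pair in corpus]
```

Keeping the two sides aligned by index is what lets step S10613 train the model on matched (triple vector, text) pairs.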
In step S10612, the entity samples in the triple samples are converted into Boolean vectors using the preset algorithm, and the attribute samples and attribute value samples in the triple samples are all converted into high-dimensional numeric vectors using the preset model, obtaining triple vector samples.
As mentioned above, the preset algorithm may be the one-hot algorithm, and the preset model may be a Transformer-based bidirectional encoder representation model; the process of converting the triple samples into triple vector samples is similar to step S1044 and is not repeated here.
In step S10613, the text generation model is trained based on the triple vector samples and text samples, obtaining a trained text generation model.
Once the aligned corpus of triples and comments has been constructed, the text generation model can be trained on it using a deep neural network algorithm. Since the text generation model uses the collected daily behavior data of all entities as its training corpus, the above scheme can generate a summary comment that fits a specific entity according to that entity's daily behavior.
In an optional embodiment, step S10613 of training the text generation model based on the triple vector samples and text samples to obtain the trained text generation model may specifically include the following steps:
In step S106131, the triple vector samples and text samples are processed by an encoder combined with an attention mechanism, obtaining a context vector.
An Encoder-Decoder model contains two recurrent neural networks: one serves as the encoder and one as the decoder. The encoder turns a variable-length input sequence into a fixed-length vector, which can be regarded as the semantics of the sequence, and the decoder decodes this fixed-length vector into a variable-length output sequence. However, when the input sequence is very long, a single fixed-length vector performs rather poorly; an encoder combined with an attention (Attention) mechanism can solve this problem. Specifically, the context vector encoded by the encoder combined with the attention mechanism is:
c_t' = f(h_t, y_{t'-1}, s_{t'-1}, c_{t'-1})

where f denotes the encoding function, and h_t, y_{t'-1}, s_{t'-1} and c_{t'-1} respectively denote the hidden-layer output of the encoder at time t, the decoder output at time t'-1, the decoder hidden state at time t'-1, and the context vector at time t'-1.
In step S106132, the context vector is processed by a decoder combined with the attention mechanism, obtaining text information.
Considering that the characteristic information carried by the final context vector extracted by the encoder is limited and local features of the input are hard to capture, the output of the attention mechanism in the encoder needs to be used as an input parameter of the decoder. Specifically, the output of the decoder combined with the attention mechanism is:
P(y_t' | y_1, ..., y_{t'-1}, c_t') = g(y_{t'-1}, s_t', c_t')

where g denotes the decoding function, and y_t', y_{t'-1}, s_t' and c_t' respectively denote the output at time t', the output at time t'-1, the decoder hidden state at time t', and the context vector at time t'.
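One attention step can be sketched in numpy as follows. This is a minimal shape-level illustration of f(·) and g(·), not the patent's trained model: the context vector c_t' is a weighted sum of encoder hidden states h_t, with weights derived from the previous decoder state via a simplified dot-product score (real Bahdanau-style attention uses a learned alignment network), and the decoder's hidden state doubles as toy vocabulary logits.

```python
import numpy as np

def attention_context(H, s_prev):
    """c_t' = sum_t alpha_t * h_t, with alpha = softmax of dot-product scores."""
    scores = H @ s_prev                    # one score per encoder position
    alpha = np.exp(scores - scores.max())  # numerically stable softmax
    alpha /= alpha.sum()
    return alpha @ H                       # context vector, shape (hidden,)

def decoder_step(y_prev, s_prev, c, W):
    """g(y_{t'-1}, s_t', c_t'): new state plus a probability vector (toy 4-word vocab)."""
    s = np.tanh(W @ np.concatenate([y_prev, s_prev, c]))  # new decoder state s_t'
    e = np.exp(s - s.max())                # toy: state doubles as vocab logits
    return e / e.sum(), s

rng = np.random.default_rng(0)
H = rng.standard_normal((6, 4))    # 6 encoder hidden states h_t, dim 4
s_prev = rng.standard_normal(4)    # decoder state s_{t'-1}
y_prev = rng.standard_normal(4)    # embedding of previous output y_{t'-1}
c = attention_context(H, s_prev)   # context vector c_t'
W = rng.standard_normal((4, 12))   # toy decoder weights
probs, s_new = decoder_step(y_prev, s_prev, c, W)
```

Because c_t' is recomputed from all encoder states at every step, the decoder is not limited to a single fixed-length summary of the input, which is exactly the weakness the attention mechanism addresses.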
In step S106133, the text generation model is trained based on the text information so as to minimize a loss function.
It should be noted that the goal of training the text generation model is to minimize its negative log-likelihood loss function:

L(θ) = -Σ_{i=1}^{I} log P(y_i | x_i; θ)

where x_i and y_i respectively denote the i-th input text and output text, i ∈ {1, ..., I}, and θ denotes the model parameters. The training result is that the generated text is strongly correlated with the original text and grammatical errors in the text are minimized.
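Minimizing a negative log-likelihood by gradient descent can be shown on a deliberately tiny model. The sketch below fits a one-layer logistic "model" by plain gradient descent so that the NLL demonstrably decreases; the real training minimizes the same kind of objective but backpropagates through the full encoder-decoder.

```python
import numpy as np

def nll(theta, X, y):
    """Negative log-likelihood -mean_i log P(y_i | x_i; theta), logistic toy model."""
    p = np.clip(1.0 / (1.0 + np.exp(-(X @ theta))), 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def grad(theta, X, y):
    """Gradient of the NLL with respect to theta."""
    p = 1.0 / (1.0 + np.exp(-(X @ theta)))
    return X.T @ (p - y) / len(y)

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))
y = (X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.standard_normal(50) > 0).astype(float)

theta = np.zeros(3)
losses = [nll(theta, X, y)]
for _ in range(200):
    theta -= 0.5 * grad(theta, X, y)  # plain gradient descent on the NLL
    losses.append(nll(theta, X, y))
```

The data and model here are invented for illustration; only the objective (driving -Σ log P(y|x;θ) down by gradient steps) mirrors the training target stated above.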
Optionally, the preset algorithm in step S1044 and step S10612 is the one-hot algorithm, and the preset model is a BERT model or a Word2Vector model.
Still taking students as an example, Fig. 2 is a block diagram of the basic principle of a comment generation method according to an embodiment of the present application. As shown in Fig. 2, the teacher's records of each student's daily behavior data are first collected and filled into the designed knowledge-graph planning layer, thereby constructing a knowledge-graph set of all the students' performance. When a comment needs to be generated for a student to be evaluated, the target knowledge graph of that student is extracted from the knowledge-graph set and input into the trained text generation model, which then automatically outputs a summary comment on the student's daily performance. The detailed principle is shown in Fig. 3: the students' daily behavior data include classroom performance, self-image, social performance, emotional expression and so on, and the planning layer of the knowledge graph specifies entity types, attribute types and attribute value types; when constructing the knowledge-graph set, the students' daily behavior data are preprocessed through operations such as entity extraction, attribute extraction, attribute value extraction and entity disambiguation, and then filled into the corresponding planning layer. When evaluating student A, the knowledge subgraph of student A is first extracted, the triple information is then extracted and characterized in the form of triple vectors, and finally input into the trained text generation model to generate a candidate student comment; the teacher then confirms whether the comment needs modification, obtaining the final student comment. The text generation model is obtained by training an Encoder-Decoder model combined with an attention mechanism on triple samples and comment samples.
From the above, in the embodiments of the present application, the target knowledge graph of the target entity is selected from the knowledge-graph set, where the knowledge-graph set characterizes attribute values of at least one entity on preset attributes and the target entity is the object to be evaluated; the entity vector, attribute vector and attribute value vector of the target entity are determined based on the target knowledge graph, where the entity vector, attribute vector and attribute value vector are characterized as a triple vector; and text matched with the target entity is generated according to the entity vector, attribute vector and attribute value vector. Compared with the prior art, the present application establishes a knowledge-graph set from the daily performance of multiple entities, extracts from it the triple vector of the target knowledge graph, and then generates comments in combination with a deep learning algorithm. By converting the entity information, attribute information and attribute value information into numeric vectors that are easy for a neural network model to process, the neural network model is connected to all attributes of the target entity and can extract high-dimensional attribute-vector features; the Encoder-Decoder model combined with the attention mechanism optimizes the quality of the text output. This solves the technical problem in the related art that text generated merely by deep learning algorithms lacks personalized comments on the entity, resulting in a low matching degree between the text and the entity's actual performance; it achieves the purpose of generating, to the greatest extent, comments that fit the entity's daily performance, and improves the matching degree of the comments.
Embodiment 2
According to an embodiment of the present invention, another embodiment of a text generation method is provided from the perspective of a display interface. It should be noted that the steps shown in the flowcharts of the accompanying drawings may be executed in a computer system such as a set of computer-executable instructions, and, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in an order different from the one herein.
Fig. 4 shows another text generation method according to an embodiment of the present invention. As shown in Fig. 4, the method includes the following steps:
In step S402, a selection instruction is received, where the selection instruction is used to select a target entity to be evaluated.
In an optional scheme, the selection instruction may be triggered by the teacher with a mouse click or through a touch screen; in another optional scheme, the target entity may be any object to be evaluated, such as a student, an institution or a company employee.
In step S404, text matched with the target entity is displayed, where the text is generated according to the entity vector, attribute vector and attribute value vector of the target entity determined from the target knowledge graph of the target entity; the target knowledge graph comes from a knowledge-graph set, the knowledge-graph set characterizes attribute values of at least one entity on preset attributes, and the entity vector, attribute vector and attribute value vector are characterized as a triple vector.
In an optional scheme, the entity may be any object requiring evaluation, such as a student, an institution or a company employee. For a student, the preset attributes may be classroom performance, self-image, social performance, emotional expression, weekly exam results, final grade, etc., with corresponding attribute values such as active, tidy, enthusiastic, stable, strongly fluctuating, excellent, etc.; for an institution, the preset attributes may be brand image, number of granted patents, annual profit, public welfare, etc., with corresponding attribute values such as highly influential, more than 100, 200 million, active, etc. The text generation model that generates the text may be a deep neural network model.
As a new knowledge organization and retrieval technique in the big-data era, the knowledge graph (Knowledge Graph, KG) is used to describe concepts in the physical world and their correlations in symbolic form. The knowledge-graph set gathers the knowledge graphs of multiple entities; the knowledge graph of each entity records that entity's daily behavior, and since each entity is an independent individual, the knowledge graph of each entity is naturally different. When an entity, i.e. the target entity, needs to be evaluated, the target knowledge graph of the target entity is selected from the knowledge-graph set.
By extracting the entity information, attribute information and attribute value information of the target entity from the target knowledge graph and converting them into an entity vector, attribute vector and attribute value vector that are easy for the text generation model to process, the matching degree of the generated text can be greatly improved.
It should be noted that deep neural networks are an interdisciplinary field combining mathematics and computer science. Unlike conventional machine learning, a deep neural network can realize end-to-end high-dimensional feature extraction and abstraction from data, solving the problem in machine learning that features are difficult to extract. Typical examples include the Seq2Seq model, generative adversarial network models, etc.
Seq2Seq is a model with an Encoder-Decoder structure. Its basic idea is to use two recurrent neural networks, one as the encoder and one as the decoder: the encoder turns a variable-length input sequence into a fixed-length vector, which can be regarded as the semantics of the sequence, and the decoder decodes this fixed-length vector into a variable-length output sequence. A generative adversarial network (Generative Adversarial Networks, GAN) model includes at least two modules, a generative model and an adversarial model, which learn through mutual gaming until both produce fairly good output. Applying these two kinds of deep neural network algorithms to the field of comment generation can therefore achieve more accurate and more robust results than machine learning methods.
In the above step, after the terminal detects, on the display interface, the selection instruction of clicking the target entity, the comment text matched with the target entity can be displayed on the display interface.
It is easy to notice that in the existing text generation field, even knowledge-graph-based text generation does not fully use the entity information, attribute information and attribute values of the knowledge graph itself; instead, the knowledge graph serves as an intermediary, and suitable text is then found by searching or by computing similarity. In contrast, the present invention combines the knowledge graph with a deep neural network and takes the daily behavior of the target entity into account, so that for different entities it can automatically generate comments that fit each entity's actual performance, improving the matching degree and accuracy of the comments.
Based on the scheme provided by the above embodiments of the present application, a selection instruction is first received, where the selection instruction is used to select the target entity to be evaluated; then text matched with the target entity is displayed, where the text is generated according to the entity vector, attribute vector and attribute value vector of the target entity determined from the target knowledge graph; the target knowledge graph comes from a knowledge-graph set, the knowledge-graph set characterizes attribute values of at least one entity on preset attributes, and the entity vector, attribute vector and attribute value vector are characterized as a triple vector. Compared with the prior art, the present application establishes a knowledge-graph set from the daily performance of multiple entities, extracts from it the triple vector of the target knowledge graph, and then generates comments in combination with a deep learning algorithm. By combining the knowledge graph with deep learning, this scheme connects the deep learning algorithm to all attributes of the entity, thereby solving the technical problem in the related art that text generated merely by deep learning algorithms lacks personalized comments on the entity, resulting in a low matching degree between the text and the entity's actual performance; it achieves the purpose of generating, to the greatest extent, comments that fit the entity's daily performance, and improves the matching degree of the comments.
Optionally, before step S404 of displaying the text matched with the target entity, the above method may further include step S403 of generating the knowledge-graph set, which may specifically include the following steps:
In step S4032, the planning layer of the knowledge-graph set is constructed, where the planning layer includes at least: entity types, attribute types and attribute value types.
In an optional scheme, the planning layer can be edited with the ontology construction tool Protégé. Protégé is ontology-editing and knowledge-acquisition software developed in the Java language; users only need to build the ontology model at the concept level, so it is simple to operate.
The planning layer is equivalent to the skeleton of the knowledge graph; it includes at least entity types, attribute types and attribute value types, and may of course also include information such as time.
In step S4034, record information is obtained, where the record information includes attribute values of at least one entity on the preset attributes.
In an optional scheme, the record information can be entered manually into the computer terminal executing the method of this embodiment, for example: Li Ming performs actively in class, has a good self-image, and has a final grade of A; another student tends to doze in class, is not active socially, and has a final grade of B. In this way, when generating the text of the target entity, the target entity's daily behavior can be considered comprehensively, and no features are missed.
In step S4036, the record information is input into the planning layer to generate the knowledge-graph set.
In the above step, the entity information, attribute information and attribute value information are filled into the corresponding entity types, attribute types and attribute value types of the constructed planning layer, thereby building the knowledge-graph set of all entities, which is stored in the graph database Neo4j.
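The fill-in step can be sketched as follows. This is an illustrative stand-in under stated assumptions: the planning layer is reduced to a dict of allowed types, a plain dict keyed by entity replaces the Neo4j store the patent uses, and the record values are invented examples.

```python
# Planning layer: fixes which attribute types records may fill in.
PLANNING_LAYER = {
    "entity_type": "student",
    "attribute_types": ["classroom performance", "final grade"],
}

def build_graph_set(records):
    """Fill (entity, attribute, value) records into per-entity knowledge graphs."""
    graphs = {}
    for entity, attribute, value in records:
        if attribute not in PLANNING_LAYER["attribute_types"]:
            continue  # reject attributes outside the planning layer
        graphs.setdefault(entity, []).append((entity, attribute, value))
    return graphs

records = [
    ("Li Ming", "classroom performance", "active"),
    ("Li Ming", "final grade", "A"),
    ("Wang Wei", "shoe size", "42"),  # not in the planning layer: filtered out
]
kg_set = build_graph_set(records)
```

The per-entity grouping is what later makes "select the target knowledge graph of the target entity" a simple lookup.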
Optionally, before step S4036 of inputting the record information into the planning layer, the above method may further include step S4035 of preprocessing the record information to obtain processed record information, where the preprocessing includes at least one of the following: entity extraction, attribute extraction, attribute value extraction and entity disambiguation.
In an optional scheme, the entity extraction, attribute extraction and attribute value extraction may be entity recognition, attribute recognition and attribute value recognition, including the detection and classification of entities, attributes and attribute values.
It should be noted that entity disambiguation handles the cases where two different names refer to the same entity, or the same name refers to two different entities.
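A minimal disambiguation sketch is shown below: two spellings of one student collapse to a single entity id, while a same-named student in another class gets a distinct id. The keying rule (normalized name plus class) is an assumption made for illustration; real systems use richer context features.

```python
def disambiguate(records):
    """Map (name, class) pairs to stable entity ids; same key -> same id."""
    ids, resolved = {}, []
    for name, klass, attribute, value in records:
        key = (name.strip().lower(), klass)  # assumed disambiguation key
        if key not in ids:
            ids[key] = f"E{len(ids)}"
        resolved.append((ids[key], attribute, value))
    return resolved

records = [
    ("Li Ming", "3A", "final grade", "A"),
    ("li ming ", "3A", "classroom performance", "active"),  # same student
    ("Li Ming", "4B", "final grade", "B"),                  # different student
]
resolved = disambiguate(records)
```

After this step, all records for one real-world entity share one id, so they land in the same knowledge graph when filled into the planning layer.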
Optionally, determining the entity vector, attribute vector and attribute value vector of the target entity from the target knowledge graph in step S404 may specifically include the following steps:
In step S4041, the entity information, attribute information and attribute value information of the target entity in the target knowledge graph are extracted.
In step S4042, the entity information is converted into a Boolean vector using a preset algorithm, and the attribute information and attribute value information are converted into high-dimensional numeric vectors using a preset model, obtaining a triple vector.
In an optional scheme, the preset algorithm may be the one-hot algorithm, and the preset model may be a BERT model or a Word2Vector model. The BERT model, i.e. Bidirectional Encoder Representations from Transformers, is suited to building state-of-the-art models for a wide range of tasks.
When expressing the information of the triples in the target knowledge graph, the entity information, attribute information and attribute value information are converted into numeric vectors that are easy for a neural network model to process; the neural network model is thus connected to all attributes of the target entity and can extract high-dimensional attribute-vector features. Specifically, multiple triples (e_i, p_ij, v_ij) of the target entity are extracted from the target knowledge graph, where e_i, p_ij and v_ij respectively denote the i-th entity, the j-th attribute of the i-th entity and the j-th attribute value of the i-th entity; e_i, p_ij and v_ij are then characterized as the vectors V_ei, V_pij and V_vij respectively.
In an optional embodiment, the OneHot algorithm is used to characterize the entity e_i as a Boolean vector, and the BERT model is used to characterize the attribute p_ij and attribute value v_ij as high-dimensional numeric vectors, i.e.

V_ei = t(e_i), V_pij = s(p_ij), V_vij = s(v_ij)

where t denotes the one-hot feature extraction function and s denotes the mapping function of a neural network structure.
Optionally, the step in S404 of generating the text according to the entity vector, attribute vector and attribute value vector may specifically include the following steps:
In step S4046, the entity vector, attribute vector and attribute value vector are input into a text generation model, where the text generation model includes a deep neural network model trained on triple samples and text samples.
As mentioned above, the deep neural network model may be a Seq2Seq model, a generative adversarial network model, or the like.
In step S4047, the text matched with the target entity is generated based on the text generation model.
In the above steps, the entity vector V_ei, attribute vector V_pij and attribute value vector V_vij are input into the text generation model, which produces a summary comment text y* about the target entity.
In an optional scheme, the summary comment text y* may be represented as an output sequence y_1, ..., y_T', where y_t' denotes the output character at time t', i.e.

y_t' = arg max P(y_t' | y_1, ..., y_{t'-1}, c_t')

In the above formula, t' ∈ {1, ..., T'}, c_t' denotes the context vector at time t', P(y_t' | y_1, ..., y_{t'-1}, c_t') denotes the probability vector over all candidate texts at time t', and arg max denotes choosing, among the generated candidate texts, the one with the largest probability value.
Optionally, before step S4046 of inputting the entity vector, attribute vector and attribute value vector into the text generation model, the above method may further include step S4045 of generating the text generation model, which may include the following steps:
In step S40451, triple samples and text samples are obtained.
In an optional scheme, the triple samples and text samples may form an aligned corpus, expressed as {((e, p, v), y)} = {((e_1, p_1, v_1), y_1), ..., ((e_i, p_i, v_i), y_i)}.
In step S40452, the entity samples in the triple samples are converted into Boolean vectors using the preset algorithm, and the attribute samples and attribute value samples in the triple samples are all converted into high-dimensional numeric vectors using the preset model, obtaining triple vector samples.
As mentioned above, the preset algorithm may be the one-hot algorithm, and the preset model may be a Transformer-based bidirectional encoder representation model; the process of converting the triple samples into triple vector samples is similar to step S1044 and is not repeated here.
In step S40453, the text generation model is trained based on the triple vector samples and text samples, obtaining a trained text generation model.
Once the aligned corpus of triples and comments has been constructed, the text generation model can be trained on it using a deep neural network algorithm. Since the text generation model uses the collected daily behavior data of all entities as its training corpus, the above scheme can generate a summary comment that fits a specific entity according to that entity's daily behavior.
In an optional embodiment, step S40453 of training the text generation model based on the triple vector samples and text samples to obtain the trained text generation model may specifically include the following steps:
In step S404531, the triple vector samples and text samples are processed by an encoder combined with an attention mechanism, obtaining a context vector.
An Encoder-Decoder model contains two recurrent neural networks: one serves as the encoder and one as the decoder. The encoder turns a variable-length input sequence into a fixed-length vector, which can be regarded as the semantics of the sequence, and the decoder decodes this fixed-length vector into a variable-length output sequence. However, when the input sequence is very long, a single fixed-length vector performs rather poorly; an encoder combined with an attention (Attention) mechanism can solve this problem. Specifically, the context vector encoded by the encoder combined with the attention mechanism is:
c_t' = f(h_t, y_{t'-1}, s_{t'-1}, c_{t'-1})

where f denotes the encoding function, and h_t, y_{t'-1}, s_{t'-1} and c_{t'-1} respectively denote the hidden-layer output of the encoder at time t, the decoder output at time t'-1, the decoder hidden state at time t'-1, and the context vector at time t'-1.
In step S404532, the context vector is processed by a decoder combined with the attention mechanism, obtaining text information.
Considering that the characteristic information carried by the final context vector extracted by the encoder is limited and local features of the input are hard to capture, the output of the attention mechanism in the encoder needs to be used as an input parameter of the decoder. Specifically, the output of the decoder combined with the attention mechanism is:
P(y_t' | y_1, ..., y_{t'-1}, c_t') = g(y_{t'-1}, s_t', c_t')

where g denotes the decoding function, and y_t', y_{t'-1}, s_t' and c_t' respectively denote the output at time t', the output at time t'-1, the decoder hidden state at time t', and the context vector at time t'.
In step S404533, the text generation model is trained based on the text information so as to minimize a loss function.
It should be noted that the goal of training the text generation model is to minimize its negative log-likelihood loss function:

L(θ) = -Σ_{i=1}^{I} log P(y_i | x_i; θ)

where x_i and y_i respectively denote the i-th input text and output text, i ∈ {1, ..., I}, and θ denotes the model parameters. The training result is that the generated text is strongly correlated with the original text and grammatical errors in the text are minimized.
Optionally, the preset algorithm in step S4042 and step S40452 is the one-hot algorithm, and the preset model is a BERT model or a Word2Vector model.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be realized by means of software plus the necessary general hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or the part contributing to the prior art, can be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, a magnetic disk or an optical disc) and includes instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, etc.) to execute the methods described in the embodiments of the present invention.
Embodiment 3
According to an embodiment of the present invention, a text generation device is provided. Fig. 5 is a schematic diagram of the text generation device according to an embodiment of the present application. As shown in Fig. 5, the device 500 includes a selecting module 502, a determining module 504 and a text generation module 506.
The selecting module 502 is configured to select the target knowledge graph of the target entity from the knowledge-graph set, where the knowledge-graph set characterizes attribute values of at least one entity on preset attributes and the target entity is the object to be evaluated; the determining module 504 is configured to determine the entity vector, attribute vector and attribute value vector of the target entity based on the target knowledge graph, where the entity vector, attribute vector and attribute value vector are characterized as a triple vector; and the text generation module 506 is configured to generate text matched with the target entity according to the entity vector, attribute vector and attribute value vector.
Optionally, the above device may further include a graph generation module configured to generate the knowledge-graph set before the target knowledge graph of the target entity is selected from the knowledge-graph set, where the graph generation module includes: a construction module configured to construct the planning layer of the knowledge-graph set, where the planning layer includes at least entity types, attribute types and attribute value types; a first acquisition module configured to obtain record information, where the record information includes attribute values of at least one entity on preset attributes; and a graph generation submodule configured to input the record information into the planning layer and generate the knowledge-graph set.
Optionally, the above device may further include a preprocessing module configured to preprocess the record information before it is input into the planning layer, obtaining processed record information, where the preprocessing includes at least one of the following: entity extraction, attribute extraction, attribute value extraction and entity disambiguation.
Optionally, the determining module includes: an extraction module configured to extract the entity information, attribute information and attribute value information of the target entity in the target knowledge graph; and a first conversion module configured to convert the entity information into a Boolean vector using the preset algorithm and convert the attribute information and attribute value information into high-dimensional numeric vectors using the preset model, obtaining the triple vector.
Optionally, the text generation module includes: an input module configured to input the entity vector, attribute vector and attribute value vector into the text generation model, where the text generation model includes a deep neural network model trained on triple samples and text samples; and a text generation submodule configured to generate the text matched with the target entity based on the text generation model.
Optionally, the above device may further include a model generation module configured to generate the text generation model before the entity vector, attribute vector and attribute value vector are input into the text generation model, where the model generation module includes: a second acquisition module configured to obtain the triple samples and text samples; a second conversion module configured to convert the entity samples in the triple samples into Boolean vectors using the preset algorithm and convert the attribute samples and attribute value samples in the triple samples into high-dimensional numeric vectors using the preset model, obtaining triple vector samples; and a training module configured to train the text generation model based on the triple vector samples and text samples, obtaining the trained text generation model.
Optionally, the training module includes: an encoding module configured to process the triple vector samples and text samples with an encoder combined with an attention mechanism, obtaining a context vector; a decoding module configured to process the context vector with a decoder combined with the attention mechanism, obtaining text information; and a training submodule configured to train the text generation model based on the text information so as to minimize the loss function.
Optionally, the above preset algorithm is a one-hot algorithm, and the preset model is a BERT model or a Word2Vector model.
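As a rough sketch of this vectorization (an editorial illustration, not part of the disclosure): the entity is one-hot encoded, while the attribute and attribute value are mapped to dense numeric vectors. The vocabulary, the dimensions and the stubbed embedding lookup standing in for a real BERT/Word2Vector model are all assumptions.

```python
import numpy as np

ENTITY_VOCAB = ["restaurant_a", "restaurant_b", "hotel_c"]  # assumed vocabulary

def one_hot(entity, vocab=ENTITY_VOCAB):
    """One-hot (Boolean) vector for the entity, as produced by the preset algorithm."""
    vec = np.zeros(len(vocab))
    vec[vocab.index(entity)] = 1.0
    return vec

def embed(token, dim=8):
    """Stand-in for a pretrained embedding lookup (BERT/Word2Vector):
    a deterministic pseudo-random dense vector keyed on the token."""
    rng = np.random.default_rng(sum(ord(c) for c in token))
    return rng.standard_normal(dim)

def triple_vector(entity, attribute, value):
    """Concatenate entity, attribute and attribute-value parts into one triple vector."""
    return np.concatenate([one_hot(entity), embed(attribute), embed(value)])

v = triple_vector("restaurant_a", "cuisine", "sichuan")
print(v.shape)  # (19,) = 3 (one-hot) + 8 (attribute) + 8 (value)
```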
It should be noted that the above selection module 502, determination module 504 and text generation module 506 correspond to steps S102 to S106 in Embodiment 1. The three modules implement the same examples and application scenarios as the corresponding steps, but are not limited to what is disclosed in Embodiment 1.
Embodiment 4
According to an embodiment of the present invention, another text generation device is provided. Fig. 6 is a schematic diagram of the text generation device according to an embodiment of the present application. As shown in Fig. 6, the device 600 includes a receiving module 602 and a display module 604.
The receiving module 602 is configured to receive a selection instruction, where the selection instruction is used to select a target entity to be evaluated. The display module 604 is configured to display text matching the target entity, where the text is generated according to the entity vector, attribute vector and attribute value vector of the target entity determined from the target knowledge graph of the target entity; the target knowledge graph comes from a knowledge graph set; the knowledge graph set is used to characterize attribute values of at least one entity on preset attributes; and the entity vector, attribute vector and attribute value vector are characterized by a triple vector.
Optionally, the above device may further include a graph generation module for generating the knowledge graph set before the text matching the target entity is displayed, where the graph generation module may include: a construction module for constructing a planning layer of the knowledge graph set, where the planning layer includes at least entity types, attribute types and attribute value types; a first acquisition module for acquiring record information, where the record information includes attribute values of at least one entity on preset attributes; and a graph generation submodule for inputting the record information into the planning layer to generate the knowledge graph set.
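A minimal sketch of this planning layer plus record ingestion (an editorial illustration, not part of the disclosure; the restaurant schema and all names are assumptions): the schema declares the allowed entity types, attribute types and attribute value types, and each record is validated against it before being stored in the knowledge graph set.

```python
# Planning layer (schema) of the knowledge graph set: allowed entity types,
# attribute types, and the expected Python type of each attribute value.
SCHEMA = {
    "entity_types": {"restaurant"},
    "attribute_types": {"cuisine", "price_level"},
    "attribute_value_types": {"cuisine": str, "price_level": int},
}

def ingest(record, graph):
    """Validate one record against the planning layer, then store it as an
    (entity, attribute, value) triple in the knowledge graph set."""
    entity, etype, attr, value = record
    if etype not in SCHEMA["entity_types"]:
        raise ValueError(f"unknown entity type: {etype}")
    if attr not in SCHEMA["attribute_types"]:
        raise ValueError(f"unknown attribute: {attr}")
    if not isinstance(value, SCHEMA["attribute_value_types"][attr]):
        raise TypeError(f"bad value type for {attr}")
    graph.setdefault(entity, {})[attr] = value
    return graph

kg = {}
ingest(("restaurant_a", "restaurant", "cuisine", "sichuan"), kg)
ingest(("restaurant_a", "restaurant", "price_level", 2), kg)
print(kg)  # {'restaurant_a': {'cuisine': 'sichuan', 'price_level': 2}}
```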
Optionally, the above device may further include a preprocessing module for preprocessing the record information before it is input into the planning layer, to obtain processed record information, where the preprocessing includes at least one of the following: entity extraction, attribute extraction, attribute value extraction and entity disambiguation.
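Two of these preprocessing steps, attribute-value extraction and entity disambiguation, can be sketched on a toy record format (an editorial illustration, not part of the disclosure; the "key: value" pattern and the alias table are assumptions):

```python
import re

# Toy alias table for entity disambiguation; contents are illustrative.
ALIASES = {"KFC": "Kentucky Fried Chicken"}

def preprocess(text):
    """Pull 'key: value' pairs out of a raw record line (attribute and
    attribute-value extraction), then resolve the entity alias to a
    canonical name (entity disambiguation)."""
    pairs = dict(re.findall(r"(\w+):\s*(\w+)", text))
    entity = pairs.pop("entity", None)
    entity = ALIASES.get(entity, entity)  # map alias to canonical name
    return entity, pairs

entity, attrs = preprocess("entity: KFC cuisine: fried_chicken")
print(entity, attrs)  # Kentucky Fried Chicken {'cuisine': 'fried_chicken'}
```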
Optionally, the display module further includes a determination module for determining the entity vector, the attribute vector and the attribute value vector of the target entity according to the target knowledge graph, where the determination module may include: an extraction module for extracting entity information, attribute information and attribute value information of the target entity in the target knowledge graph; and a first conversion module for converting the entity information into a Boolean vector using a preset algorithm, and converting the attribute information and the attribute value information into high-dimensional numeric vectors using a preset model, to obtain the triple vector.
Optionally, the display module further includes a text generation module for generating the text according to the entity vector, the attribute vector and the attribute value vector, where the text generation module may include: an input module for inputting the entity vector, the attribute vector and the attribute value vector into a text generation model, where the text generation model includes a deep neural network model trained on triple samples and text samples; and a text generation submodule for generating the text matching the target entity based on the text generation model.
Optionally, the above device may further include a model generation module for generating the text generation model before the entity vector, the attribute vector and the attribute value vector are input into it, where the model generation module may include: a second acquisition module for acquiring triple samples and text samples; a second conversion module for converting the entity samples in the triple samples into Boolean vectors using the preset algorithm, and converting the attribute samples and attribute value samples in the triple samples into high-dimensional numeric vectors using the preset model, to obtain triple vector samples; and a training module for training the text generation model based on the triple vector samples and the text samples, to obtain a trained text generation model.
Optionally, the training module may include: an encoding module for processing the triple vector samples and the text samples with an encoder that incorporates an attention mechanism, to obtain a context vector; a decoding module for processing the context vector with a decoder that incorporates an attention mechanism, to obtain text information; and a training submodule for training the text generation model based on the text information by minimizing a loss function.
Optionally, the above preset algorithm is a one-hot algorithm, and the preset model is a BERT model or a Word2Vector model.
It should be noted that the above receiving module 602 and display module 604 correspond to steps S402 to S404 in Embodiment 2. The two modules implement the same examples and application scenarios as the corresponding steps, but are not limited to what is disclosed in Embodiment 2.
Embodiment 5
According to an embodiment of the present invention, a storage medium is provided. The storage medium includes a stored program, and when the program runs, the device on which the storage medium is located is controlled to execute the text generation method of Embodiment 1 or 2.
Embodiment 6
According to an embodiment of the present invention, a processor is provided. The processor is configured to run a program, and when the program runs, the text generation method of Embodiment 1 or 2 is executed.
The serial numbers of the above embodiments of the invention are for description only and do not represent the relative merits of the embodiments.
In the above embodiments of the invention, each embodiment is described with its own emphasis. For parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed technical content may be realized in other ways. The device embodiments described above are merely illustrative. For example, the division of the units may be a division by logical function, and there may be other division manners in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, units or modules, and may be electrical or in other forms.
The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; they may be located in one place or distributed over multiple units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the various embodiments of the present invention may be integrated into one processing unit, or each unit may physically exist alone, or two or more units may be integrated into one unit. The above integrated unit may be realized in the form of hardware or in the form of a software functional unit.
If the integrated unit is realized in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the existing technology, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods of the various embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash disk, a read-only memory (ROM), a random access memory (RAM), a mobile hard disk, a magnetic disk or an optical disk.
The above are only preferred embodiments of the present invention. It should be noted that, for those of ordinary skill in the art, various improvements and modifications may be made without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (11)

1. A text generation method, characterized by comprising:
selecting a target knowledge graph of a target entity from a knowledge graph set, wherein the knowledge graph set is used to characterize an attribute value of at least one entity on a preset attribute, and the target entity is an object to be evaluated;
determining an entity vector, an attribute vector and an attribute value vector of the target entity based on the target knowledge graph, wherein the entity vector, the attribute vector and the attribute value vector are characterized by a triple vector;
generating text matching the target entity according to the entity vector, the attribute vector and the attribute value vector.
2. The method according to claim 1, characterized in that before the target knowledge graph of the target entity is selected from the knowledge graph set, the method further comprises: generating the knowledge graph set, wherein the step of generating the knowledge graph set comprises:
constructing a planning layer of the knowledge graph set, wherein the planning layer comprises at least: entity types, attribute types and attribute value types;
acquiring record information, wherein the record information comprises: an attribute value of at least one entity on a preset attribute;
inputting the record information into the planning layer to generate the knowledge graph set.
3. The method according to claim 2, characterized in that before the record information is input into the planning layer, the method further comprises:
preprocessing the record information to obtain processed record information, wherein the preprocessing comprises at least one of the following: entity extraction, attribute extraction, attribute value extraction and entity disambiguation.
4. The method according to claim 1, characterized in that determining the entity vector, the attribute vector and the attribute value vector of the target entity based on the target knowledge graph comprises:
extracting entity information, attribute information and attribute value information of the target entity in the target knowledge graph;
converting the entity information into a Boolean vector using a preset algorithm, and converting the attribute information and the attribute value information into high-dimensional numeric vectors using a preset model, to obtain the triple vector.
5. The method according to claim 1, characterized in that generating the text matching the target entity according to the entity vector, the attribute vector and the attribute value vector comprises:
inputting the entity vector, the attribute vector and the attribute value vector into a text generation model, wherein the text generation model comprises a deep neural network model, and the deep neural network model is obtained by training on triple samples and text samples;
generating the text matching the target entity based on the text generation model.
6. The method according to claim 5, characterized in that before the entity vector, the attribute vector and the attribute value vector are input into the text generation model, the method further comprises: generating the text generation model, wherein the step of generating the text generation model comprises:
acquiring the triple samples and the text samples;
converting entity samples in the triple samples into Boolean vectors using a preset algorithm, and converting attribute samples and attribute value samples in the triple samples into high-dimensional numeric vectors using a preset model, to obtain triple vector samples;
training the text generation model based on the triple vector samples and the text samples, to obtain a trained text generation model.
7. The method according to claim 5, characterized in that training the text generation model based on the triple vector samples and the text samples to obtain a trained text generation model comprises:
processing the triple vector samples and the text samples with an encoder that incorporates an attention mechanism, to obtain a context vector;
processing the context vector with a decoder that incorporates an attention mechanism, to obtain text information;
training the text generation model based on the text information by minimizing a loss function.
8. A text generation method, characterized by comprising:
receiving a selection instruction, wherein the selection instruction is used to select a target entity to be evaluated;
displaying text matching the target entity, wherein the text is generated according to an entity vector, an attribute vector and an attribute value vector of the target entity determined from a target knowledge graph of the target entity, the target knowledge graph comes from a knowledge graph set, the knowledge graph set is used to characterize an attribute value of at least one entity on a preset attribute, and the entity vector, the attribute vector and the attribute value vector are characterized by a triple vector.
9. A text generation device, characterized by comprising:
a selection module for selecting a target knowledge graph of a target entity from a knowledge graph set, wherein the knowledge graph set is used to characterize an attribute value of at least one entity on a preset attribute, and the target entity is an object to be evaluated;
a determination module for determining an entity vector, an attribute vector and an attribute value vector of the target entity based on the target knowledge graph, wherein the entity vector, the attribute vector and the attribute value vector are characterized by a triple vector;
a text generation module for generating text matching the target entity according to the entity vector, the attribute vector and the attribute value vector.
10. A storage medium, characterized in that the storage medium comprises a stored program, wherein when the program runs, a device on which the storage medium is located is controlled to execute the text generation method of claim 1 or 8.
11. A processor, characterized in that the processor is configured to run a program, wherein when the program runs, the text generation method of claim 1 or 8 is executed.
CN201910775353.1A 2019-08-21 2019-08-21 Document creation method and device Pending CN110489755A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910775353.1A CN110489755A (en) 2019-08-21 2019-08-21 Document creation method and device
PCT/CN2019/126797 WO2021031480A1 (en) 2019-08-21 2019-12-20 Text generation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910775353.1A CN110489755A (en) 2019-08-21 2019-08-21 Document creation method and device

Publications (1)

Publication Number Publication Date
CN110489755A true CN110489755A (en) 2019-11-22

Family

ID=68552697

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910775353.1A Pending CN110489755A (en) 2019-08-21 2019-08-21 Document creation method and device

Country Status (2)

Country Link
CN (1) CN110489755A (en)
WO (1) WO2021031480A1 (en)


Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113158189B (en) * 2021-04-28 2023-12-26 绿盟科技集团股份有限公司 Method, device, equipment and medium for generating malicious software analysis report
CN113239203A (en) * 2021-06-02 2021-08-10 北京金山数字娱乐科技有限公司 Knowledge graph-based screening method and device
CN113609291A (en) * 2021-07-27 2021-11-05 科大讯飞(苏州)科技有限公司 Entity classification method and device, electronic equipment and storage medium
CN113761167B (en) * 2021-09-09 2023-10-20 上海明略人工智能(集团)有限公司 Session information extraction method, system, electronic equipment and storage medium
CN116306925B (en) * 2023-03-14 2024-05-03 中国人民解放军总医院 Method and system for generating end-to-end entity link
CN116150929B (en) * 2023-04-17 2023-07-07 中南大学 Construction method of railway route selection knowledge graph
CN116452072B (en) * 2023-06-19 2023-08-29 华南师范大学 Teaching evaluation method, system, equipment and readable storage medium
CN117332282B (en) * 2023-11-29 2024-03-08 之江实验室 Knowledge graph-based event matching method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170039275A1 (en) * 2015-08-03 2017-02-09 International Business Machines Corporation Automated Article Summarization, Visualization and Analysis Using Cognitive Services
CN108763336A (en) * 2018-05-12 2018-11-06 北京无忧创新科技有限公司 A kind of visa self-help serving system
CN109684394A (en) * 2018-12-13 2019-04-26 北京百度网讯科技有限公司 Document creation method, device, equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11163836B2 (en) * 2018-02-12 2021-11-02 International Business Machines Corporation Extraction of information and smart annotation of relevant information within complex documents
CN108345690B (en) * 2018-03-09 2020-11-13 广州杰赛科技股份有限公司 Intelligent question and answer method and system
CN109189944A (en) * 2018-09-27 2019-01-11 桂林电子科技大学 Personalized recommending scenery spot method and system based on user's positive and negative feedback portrait coding
CN110489755A (en) * 2019-08-21 2019-11-22 广州视源电子科技股份有限公司 Document creation method and device


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
蔡圆媛 (Cai Yuanyuan): 《大数据环境下基于知识整合的语义计算技术与应用》 [Semantic Computing Technology and Applications Based on Knowledge Integration in a Big Data Environment], 31 August 2018, Beijing Institute of Technology Press *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021031480A1 (en) * 2019-08-21 2021-02-25 广州视源电子科技股份有限公司 Text generation method and device
CN111061152A (en) * 2019-12-23 2020-04-24 深圳供电局有限公司 Attack recognition method based on deep neural network and intelligent energy power control device
CN111209389B (en) * 2019-12-31 2023-08-11 天津外国语大学 Movie story generation method
CN111209389A (en) * 2019-12-31 2020-05-29 天津外国语大学 Movie story generation method
CN111897955A (en) * 2020-07-13 2020-11-06 广州视源电子科技股份有限公司 Comment generation method, device and equipment based on coding and decoding and storage medium
CN111897955B (en) * 2020-07-13 2024-04-09 广州视源电子科技股份有限公司 Comment generation method, device, equipment and storage medium based on encoding and decoding
CN111930959A (en) * 2020-07-14 2020-11-13 上海明略人工智能(集团)有限公司 Method and device for generating text by using map knowledge
CN111930959B (en) * 2020-07-14 2024-02-09 上海明略人工智能(集团)有限公司 Method and device for generating text by map knowledge
CN112036146A (en) * 2020-08-25 2020-12-04 广州视源电子科技股份有限公司 Comment generation method and device, terminal device and storage medium
CN112069781B (en) * 2020-08-27 2024-01-02 广州视源电子科技股份有限公司 Comment generation method and device, terminal equipment and storage medium
CN112069781A (en) * 2020-08-27 2020-12-11 广州视源电子科技股份有限公司 Comment generation method and device, terminal device and storage medium
CN113157941A (en) * 2021-04-08 2021-07-23 支付宝(杭州)信息技术有限公司 Service characteristic data processing method, service characteristic data processing device, text generating method, text generating device and electronic equipment
CN113111188B (en) * 2021-04-14 2022-08-09 清华大学 Text generation method and system
CN113111188A (en) * 2021-04-14 2021-07-13 清华大学 Text generation method and system
CN113488165A (en) * 2021-07-26 2021-10-08 平安科技(深圳)有限公司 Text matching method, device and equipment based on knowledge graph and storage medium
CN113488165B (en) * 2021-07-26 2023-08-22 平安科技(深圳)有限公司 Text matching method, device, equipment and storage medium based on knowledge graph
CN113569554A (en) * 2021-09-24 2021-10-29 北京明略软件***有限公司 Entity pair matching method and device in database, electronic equipment and storage medium

Also Published As

Publication number Publication date
WO2021031480A1 (en) 2021-02-25

Similar Documents

Publication Publication Date Title
CN110489755A (en) Document creation method and device
Bang et al. Explaining a black-box by using a deep variational information bottleneck approach
CN110750959B (en) Text information processing method, model training method and related device
CN111026842B (en) Natural language processing method, natural language processing device and intelligent question-answering system
CN109783657A (en) Multistep based on limited text space is from attention cross-media retrieval method and system
CN108984530A (en) A kind of detection method and detection system of network sensitive content
CN108536681A (en) Intelligent answer method, apparatus, equipment and storage medium based on sentiment analysis
CN110516245A (en) Fine granularity sentiment analysis method, apparatus, computer equipment and storage medium
CN107330011A (en) The recognition methods of the name entity of many strategy fusions and device
CN109543034B (en) Text clustering method and device based on knowledge graph and readable storage medium
CN108416065A (en) Image based on level neural network-sentence description generates system and method
CN107423398A (en) Exchange method, device, storage medium and computer equipment
CN108549658A (en) A kind of deep learning video answering method and system based on the upper attention mechanism of syntactic analysis tree
CN106897559A (en) A kind of symptom and sign class entity recognition method and device towards multi-data source
CN108491421A (en) A kind of method, apparatus, equipment and computer storage media generating question and answer
CN109446505A (en) A kind of model essay generation method and system
CN109710744A (en) A kind of data matching method, device, equipment and storage medium
CN109710769A (en) A kind of waterborne troops's comment detection system and method based on capsule network
CN110134954A (en) A kind of name entity recognition method based on Attention mechanism
CN110795565A (en) Semantic recognition-based alias mining method, device, medium and electronic equipment
CN115455171B (en) Text video mutual inspection rope and model training method, device, equipment and medium
CN117055724A (en) Generating type teaching resource system in virtual teaching scene and working method thereof
CN110532393A (en) Text handling method, device and its intelligent electronic device
CN110018823A (en) Processing method and system, the generation method and system of interactive application
CN111931503B (en) Information extraction method and device, equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191122