CN110489755A - Document creation method and device - Google Patents

Document creation method and device

Info

Publication number
CN110489755A
Authority
CN
China
Prior art keywords
vector
entity
attribute
text
attribute value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910775353.1A
Other languages
Chinese (zh)
Inventor
吴智东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Shiyuan Electronics Thecnology Co Ltd
Original Assignee
Guangzhou Shiyuan Electronics Thecnology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Shiyuan Electronics Thecnology Co Ltd filed Critical Guangzhou Shiyuan Electronics Thecnology Co Ltd
Priority to CN201910775353.1A priority Critical patent/CN110489755A/en
Publication of CN110489755A publication Critical patent/CN110489755A/en
Priority to PCT/CN2019/126797 priority patent/WO2021031480A1/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval of unstructured textual data
    • G06F 16/35: Clustering; Classification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval of unstructured textual data
    • G06F 16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G06F 16/367: Ontology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a document creation method and device. The method includes: selecting the target knowledge graph of a target entity from a knowledge graph set, where the knowledge graph set characterizes the attribute values of at least one entity on preset attributes and the target entity is the object to be evaluated; determining an entity vector, an attribute vector, and an attribute-value vector of the target entity based on the target knowledge graph, where the entity vector, attribute vector, and attribute-value vector are represented as a triple vector; and generating text matching the target entity from the entity vector, attribute vector, and attribute-value vector. The invention solves the technical problem in the related art that text generated purely by deep learning algorithms lacks personalized evaluation of an entity, so that the text poorly matches the entity's actual performance.

Description

Document creation method and device
Technical field
The present invention relates to the field of natural language processing, and in particular to a document creation method and device.
Background technique
Text generation technology is an important research direction in the field of natural language processing (NLP). It aims to automatically generate, by means of rules, algorithms, and the like, sentences that conform to the rules of human language and contain no grammatical errors.
Text generation technology has many applications. In education, for example, at the end of each term a teacher needs to write a descriptive, suggestive comment on each student's performance based on the student's everyday behavior. Traditionally, the comment for each student is written by hand, which not only consumes a great deal of the teacher's time but also assumes that the teacher accurately remembers the everyday behavior of every student. A relatively mature existing solution is therefore to compute the similarity between the input student information and a set of manually constructed comment templates, and to select the template with the highest similarity as the generated comment.
However, the comments produced by this method are manually constructed rather than generated by an algorithm, so it cannot generate distinct, personalized comments for each student in a batch, intelligent manner. Moreover, because the comment is obtained by computing the similarity between student information and comment templates, only surface character-level information is considered, not the semantic-level information of the comment text. To solve this problem, deep learning algorithms model the statistical distribution of text along multiple dimensions and generate comments probabilistically. But a deep learning algorithm lacks education-domain information, has a weak ability to learn the latent relationship between a particular student's everyday behavior and the comment, and lacks the ability to generate a personalized comment for a particular student, so the comments it generates match the student's actual performance poorly and inaccurately.
For the technical problem in the related art that text generated purely by deep learning algorithms lacks personalized evaluation of an entity, so that the text poorly matches the entity's actual performance, no effective solution has yet been proposed.
Summary of the invention
The present invention provides a document creation method and device, at least to solve the technical problem in the related art that text generated purely by deep learning algorithms lacks personalized evaluation of a student, so that the text poorly matches the student's actual performance.
According to one aspect of an embodiment of the present invention, a document creation method is provided, including: selecting the target knowledge graph of a target entity from a knowledge graph set, where the knowledge graph set characterizes the attribute values of at least one entity on preset attributes and the target entity is the object to be evaluated; determining an entity vector, an attribute vector, and an attribute-value vector of the target entity based on the target knowledge graph, where the entity vector, attribute vector, and attribute-value vector are represented as a triple vector; and generating text matching the target entity from the entity vector, attribute vector, and attribute-value vector.
Optionally, before selecting the target knowledge graph of the target entity from the knowledge graph set, the method further includes generating the knowledge graph set, where generating the knowledge graph set includes: constructing the schema layer of the knowledge graph set, the schema layer including at least entity types, attribute types, and attribute-value types; acquiring record information, the record information including the attribute values of at least one entity on preset attributes; and inputting the record information into the schema layer to generate the knowledge graph set.
Optionally, before inputting the record information into the schema layer, the method further includes preprocessing the record information to obtain processed record information, where the preprocessing includes at least one of: entity extraction, attribute extraction, attribute-value extraction, and entity disambiguation.
Optionally, determining the entity vector, attribute vector, and attribute-value vector of the target entity based on the target knowledge graph includes: extracting the entity information, attribute information, and attribute-value information of the target entity in the target knowledge graph; converting the entity information into a Boolean vector using a preset algorithm; and converting the attribute information and attribute-value information into high-dimensional numeric vectors using a preset model, to obtain the triple vector.
Optionally, generating text matching the target entity from the entity vector, attribute vector, and attribute-value vector includes: inputting the entity vector, attribute vector, and attribute-value vector into a text generation model, where the text generation model includes a deep neural network model trained on triple samples and text samples; and generating the text matching the target entity based on the text generation model.
Optionally, before inputting the entity vector, attribute vector, and attribute-value vector into the text generation model, the method further includes generating the text generation model, where generating the text generation model includes: acquiring triple samples and text samples; converting the entity samples in the triple samples into Boolean vectors using a preset algorithm, and converting the attribute samples and attribute-value samples in the triple samples into high-dimensional numeric vectors using a preset model, to obtain triple vector samples; and training the text generation model on the triple vector samples and text samples to obtain a trained text generation model.
Optionally, training the text generation model on the triple vector samples and text samples to obtain a trained text generation model includes: processing the triple vector samples and text samples with an encoder incorporating an attention mechanism to obtain a context vector; processing the context vector with a decoder incorporating the attention mechanism to obtain text information; and, based on the text information, training the text generation model by minimizing a loss function.
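As a small, hedged illustration of the attention step described above (an instance of dot-product attention, not the patent's exact implementation), the context vector at each decoding step can be computed as a softmax-weighted sum of encoder states:

```python
import math

def softmax(scores):
    """Normalize raw alignment scores into attention weights."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def context_vector(decoder_state, encoder_states):
    """Dot-product attention: score each encoder state against the
    current decoder state, normalize the scores, and take the
    weighted sum of the encoder states."""
    scores = [sum(d * h for d, h in zip(decoder_state, state))
              for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    return [sum(w * state[i] for w, state in zip(weights, encoder_states))
            for i in range(dim)]

# Toy example: two encoder states, one decoder state aligned with the first
enc = [[1.0, 0.0], [0.0, 1.0]]
dec = [1.0, 0.0]
ctx = context_vector(dec, enc)
```

Because the decoder state aligns with the first encoder state, the context vector leans toward it; the weights always sum to one by construction.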
According to another aspect of an embodiment of the present invention, a document creation method is further provided, including: receiving a selection instruction, where the selection instruction is used to select a target entity to be evaluated; and displaying text matching the target entity, where the text is generated from the entity vector, attribute vector, and attribute-value vector of the target entity determined according to the target knowledge graph of the target entity, the target knowledge graph comes from a knowledge graph set, the knowledge graph set characterizes the attribute values of at least one entity on preset attributes, and the entity vector, attribute vector, and attribute-value vector are represented as a triple vector.
According to another aspect of an embodiment of the present invention, a text generating apparatus is further provided, including: a selecting module for selecting the target knowledge graph of a target entity from a knowledge graph set, where the knowledge graph set characterizes the attribute values of at least one entity on preset attributes and the target entity is the object to be evaluated; a determining module for determining the entity vector, attribute vector, and attribute-value vector of the target entity based on the target knowledge graph, where the entity vector, attribute vector, and attribute-value vector are represented as a triple vector; and a text generation module for generating text matching the target entity from the entity vector, attribute vector, and attribute-value vector.
According to another aspect of an embodiment of the present invention, a storage medium is further provided. The storage medium includes a stored program, and when the program runs it controls the device where the storage medium is located to execute any one of the above document creation methods.
According to another aspect of an embodiment of the present invention, a processor is further provided. The processor is used to run a program, and when the program runs it executes any one of the above document creation methods.
In embodiments of the present invention, the target knowledge graph of a target entity is selected from a knowledge graph set, where the knowledge graph set characterizes the attribute values of at least one entity on preset attributes and the target entity is the object to be evaluated; the entity vector, attribute vector, and attribute-value vector of the target entity are determined based on the target knowledge graph and represented as a triple vector; and text matching the target entity is generated from the entity vector, attribute vector, and attribute-value vector. Compared with the prior art, the present application builds a knowledge graph set from the everyday performance of multiple entities, extracts the triple vector of the target knowledge graph from it, and then generates the comment in combination with a deep learning algorithm. By combining the knowledge graph with deep learning, the scheme lets the deep learning algorithm perceive all attributes of the entity, thereby solving the technical problem in the related art that text generated purely by deep learning algorithms lacks personalized evaluation of an entity and poorly matches the entity's actual performance, achieving the purpose of generating, to the greatest extent, comments that fit the entity's everyday performance, and improving the matching degree of the comments.
Detailed description of the invention
The drawings described herein are provided to give a further understanding of the present invention and constitute a part of this application. The illustrative embodiments of the present invention and their descriptions serve to explain the present invention and do not improperly limit it. In the drawings:
Fig. 1 is a flowchart of a document creation method according to Embodiment 1 of the present application;
Fig. 2 is a block diagram of the basic principle of a comment generation method according to Embodiment 1 of the present application;
Fig. 3 is a detailed schematic diagram of the basic principle of the comment generation method shown in Fig. 2;
Fig. 4 is a flowchart of a document creation method according to Embodiment 2 of the present application;
Fig. 5 is a structural schematic diagram of a text generating apparatus according to Embodiment 3 of the present application; and
Fig. 6 is a structural schematic diagram of a text generating apparatus according to Embodiment 4 of the present application.
Specific embodiment
To enable those skilled in the art to better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the scope of protection of the present invention.
It should be noted that the terms "first", "second", and the like in the specification, the claims, and the above drawings are used to distinguish similar objects, not to describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the present invention described herein can be implemented in orders other than those illustrated or described herein. In addition, the terms "comprising" and "having" and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units explicitly listed, and may include other steps or units that are not explicitly listed or that are inherent to the process, method, product, or device.
Embodiment 1
According to an embodiment of the present invention, an embodiment of a document creation method is provided. It should be noted that the steps illustrated in the flowchart of the drawings may be executed in a computer system such as a set of computer-executable instructions, and although a logical order is shown in the flowchart, in some cases the steps shown or described may be executed in an order different from that described herein.
Fig. 1 is a flowchart of a document creation method according to an embodiment of the present invention. As shown in Fig. 1, the method includes the following steps:
Step S102: select the target knowledge graph of a target entity from a knowledge graph set, where the knowledge graph set characterizes the attribute values of at least one entity on preset attributes and the target entity is the object to be evaluated.
In an optional implementation, the entity may be any object that needs evaluation, such as a student, an institution, or a company employee. For a student, the preset attributes may be classroom performance, personal image, social performance, emotional expression, weekly test results, final grades, and so on, with corresponding attribute values such as active, tidy, enthusiastic, stable, fluctuating, or excellent. For an institution, the preset attributes may be brand image, number of granted patents, annual revenue, public welfare activity, and so on, with corresponding attribute values such as highly influential, more than 100, 200 million, or active.
A knowledge graph (KG), as a new knowledge organization and retrieval technique of the big-data era, is used to describe concepts in the physical world and the relations among them in symbolic form. A knowledge graph set collects the knowledge graphs of multiple entities; the knowledge graph of each entity records that entity's everyday behavior, and since each entity is an independent individual, the knowledge graph of each entity naturally differs. When some entity needs to be evaluated, i.e. the target entity, the target knowledge graph of that entity is selected from the knowledge graph set.
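A minimal sketch of this selection step, under the assumption that the knowledge graph set is held in memory as a mapping from entity identifiers to lists of (entity, attribute, value) triples (all names here are illustrative, not from the patent):

```python
# A knowledge graph set: one list of (entity, attribute, value)
# triples per entity. Entity names and attributes are hypothetical.
knowledge_graph_set = {
    "student_A": [
        ("student_A", "classroom_performance", "active"),
        ("student_A", "final_grade", "A"),
    ],
    "student_B": [
        ("student_B", "social_performance", "not active"),
        ("student_B", "final_grade", "B"),
    ],
}

def select_target_graph(graph_set, target_entity):
    """Step S102: pick the target entity's knowledge graph from the set."""
    if target_entity not in graph_set:
        raise KeyError(f"no knowledge graph for {target_entity!r}")
    return graph_set[target_entity]

target_graph = select_target_graph(knowledge_graph_set, "student_A")
```

In a deployment the set would live in a graph database rather than a dictionary, but the lookup semantics are the same.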
Taking students as an example: when a summary comment needs to be generated for student A, the knowledge graph of student A is extracted from the knowledge graph set. This knowledge graph records the attribute values of student A on all attributes, i.e. the everyday behavior of student A in all aspects.
Step S104: determine the entity vector, attribute vector, and attribute-value vector of the target entity based on the target knowledge graph, where the entity vector, attribute vector, and attribute-value vector are represented as a triple vector.
In this step, the entity information, attribute information, and attribute-value information of the target entity are extracted from the target knowledge graph and converted into the entity vector, attribute vector, and attribute-value vector that are easy for the text generation model to process, which can greatly improve how well the generated text matches.
It should be noted that the triple is a common representation of a knowledge graph; this embodiment uses triples only as an example, which does not limit the application.
Step S106: generate text matching the target entity from the entity vector, attribute vector, and attribute-value vector.
In an optional implementation, the text generation model that generates the text may be a deep neural network model.
Deep neural networks are an interdisciplinary subject combining mathematics and computer science. Unlike traditional machine learning, deep neural networks can achieve end-to-end high-dimensional feature extraction and abstraction, solving the problem that features are hard to extract in machine learning. Typical examples include the Seq2Seq model and the generative adversarial network model.
Seq2Seq is a model with an Encoder-Decoder structure. Its basic idea is to use two recurrent neural networks, one as the encoder and one as the decoder: the encoder turns a variable-length input sequence into a fixed-length vector, which is regarded as the semantics of the sequence, and the decoder decodes this fixed-length vector into a variable-length output sequence. A generative adversarial network (GAN) model contains at least two modules, a generative model and an adversarial model, which learn through playing a game against each other so that the two models produce fairly good output. Applying these two kinds of deep neural network algorithms to the field of comment generation can therefore achieve more accurate and more robust results than machine learning methods.
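A deliberately tiny, non-neural sketch of the Encoder-Decoder data flow described above (an illustration of the idea only, not of the patent's actual Seq2Seq model): the "encoder" compresses a variable-length token sequence into one fixed-length vector, and the "decoder" maps that vector back to a token sequence.

```python
EMBED = {"active": [1.0, 0.0], "tidy": [0.0, 1.0]}  # toy token embeddings

def encode(tokens):
    """'Encoder': average the token embeddings into one fixed-length
    vector that stands in for the semantics of the sequence."""
    dim = 2
    vec = [0.0] * dim
    for t in tokens:
        for i, x in enumerate(EMBED[t]):
            vec[i] += x
    return [x / len(tokens) for x in vec]

def decode(vec, max_len=2):
    """'Decoder': greedily emit the token whose embedding is closest
    to the remaining semantic vector, subtracting its contribution."""
    out = []
    residual = list(vec)
    for _ in range(max_len):
        best = min(EMBED, key=lambda t: sum(
            (r - e) ** 2 for r, e in zip(residual, EMBED[t])))
        out.append(best)
        residual = [r - e / max_len for r, e in zip(residual, EMBED[best])]
    return out

semantic = encode(["active", "tidy"])  # one fixed-length vector
```

A real Seq2Seq model replaces the averaging and nearest-neighbor steps with trained recurrent networks, but the fixed-length bottleneck in the middle is the same.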
In this step, the triple vector composed of the entity vector, attribute vector, and attribute-value vector determined from the target knowledge graph is input into the deep neural network model, which can generate a comment text matching the everyday behavior of the target entity.
It is worth noting that in the existing text generation field, even knowledge-graph-based text generation does not fully use the entity information, attribute information, and attribute values of the knowledge graph itself; instead, the knowledge graph serves as an intermediary, and suitable text is found by search or by computing similarity. The present invention, by contrast, combines the knowledge graph with a deep neural network and takes the everyday behavior of the target entity into account, so that for different entities it can automatically generate comments that fit each entity's actual performance, improving the matching degree and accuracy of the comments.
Still taking students as an example: before the winter and summer vacations, a teacher needs to write a summary comment for every student. With a mouse click, the teacher can extract the knowledge graph of the student to be evaluated from the knowledge graph set; this knowledge graph records the student's everyday performance, such as classroom performance, personal image, social performance, emotional expression, and final grades. The terminal executing the method of this embodiment determines the triple vector of the student based on the student's knowledge graph and inputs it into the deep neural network model, and the terminal's display then automatically presents a comment matching the student's everyday performance. This scheme greatly saves the teacher's time and effort and avoids the problem that comments poorly match the student because the teacher misremembers, or remembers only incompletely, the student's everyday behavior.
Based on the scheme provided by the above embodiments of the present application, the target knowledge graph of a target entity is selected from a knowledge graph set, where the knowledge graph set characterizes the attribute values of at least one entity on preset attributes and the target entity is the object to be evaluated; the entity vector, attribute vector, and attribute-value vector of the target entity are determined based on the target knowledge graph and represented as a triple vector; and text matching the target entity is generated from these vectors. Compared with the prior art, the present application builds a knowledge graph set from the everyday performance of multiple entities, extracts the triple vector of the target knowledge graph from it, and then generates the comment in combination with a deep learning algorithm. By combining the knowledge graph with deep learning, the deep learning algorithm perceives all attributes of the entity, solving the technical problem in the related art that text generated purely by deep learning lacks personalized evaluation of the entity and poorly matches its actual performance, achieving the purpose of generating comments that best fit the entity's everyday performance, and improving the matching degree of the comments.
Optionally, before step S102 of selecting the target knowledge graph of the target entity from the knowledge graph set, the method may further include step S101: generate the knowledge graph set, where generating the knowledge graph set may specifically include the following steps:
Step S1012: construct the schema layer of the knowledge graph set, where the schema layer includes at least entity types, attribute types, and attribute-value types.
In an optional implementation, the schema layer can be edited with the ontology construction tool Protégé. Protégé is ontology-editing and knowledge-acquisition software developed in the Java language; the user only needs to build the ontology model at the conceptual level, so it is simple to operate.
The schema layer is equivalent to the skeleton of the knowledge graph. It includes at least entity types, attribute types, and attribute-value types, and may of course also include information such as time.
Step S1014: acquire record information, where the record information includes the attribute values of at least one entity on preset attributes.
In an optional implementation, the record information can be entered manually into the computer terminal that executes the method of this embodiment. For example: Li Ming's classroom performance is active, his image is good, and his final grade is A; another student loves to doze in class, is not active socially, and has a final grade of B. In this way, the everyday behavior of the target entity can be considered comprehensively when generating its text, avoiding missed features.
Step S1016: input the record information into the schema layer to generate the knowledge graph set.
In this step, the entity information, attribute information, and attribute-value information obtained in step S1014 are filled into the corresponding entity types, attribute types, and attribute-value types of the schema layer constructed in step S1012. The knowledge graphs of all entities are constructed in this way and stored in the graph database Neo4j.
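One way this population step might look in code, as a sketch under the assumption that records arrive as per-entity attribute dictionaries (the Neo4j storage step is omitted, and the record contents are hypothetical):

```python
def records_to_triples(records):
    """Fill record information into the graph: emit one
    (entity, attribute, attribute_value) triple per record field."""
    triples = []
    for entity, attributes in records.items():
        for attribute, value in attributes.items():
            triples.append((entity, attribute, value))
    return triples

# Hypothetical record information entered at the terminal
records = {
    "Li Ming": {"classroom_performance": "active", "final_grade": "A"},
}
graph = records_to_triples(records)
```

In practice each triple would then be written to Neo4j as a node-relationship-node (or node-property) pattern rather than kept in a Python list.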
Optionally, before step S1016 of inputting the record information into the schema layer, the method may further include step S1015: preprocess the record information to obtain processed record information, where the preprocessing includes at least one of: entity extraction, attribute extraction, attribute-value extraction, and entity disambiguation.
In an optional implementation, the entity extraction, attribute extraction, and attribute-value extraction may be entity recognition, attribute recognition, and attribute-value recognition, including the detection and classification of entities, attributes, and attribute values.
It should be noted that entity disambiguation distinguishes the case where two different names refer to the same entity from the case where the same name refers to two different entities.
Optionally, step S104 of determining the entity vector, attribute vector, and attribute-value vector of the target entity based on the target knowledge graph may specifically include the following steps:
Step S1042: extract the entity information, attribute information, and attribute-value information of the target entity in the target knowledge graph.
Step S1044: convert the entity information into a Boolean vector using a preset algorithm, and convert the attribute information and attribute-value information into high-dimensional numeric vectors using a preset model, to obtain the triple vector.
In an optional implementation, the preset algorithm may be the one-hot (OneHot) algorithm, and the preset model may be a BERT model or a Word2Vector model. The BERT model, based on bidirectional Transformer encoder representations, is suited to building state-of-the-art models for a wide range of tasks.
When representing the triples of the target knowledge graph, the entity information, attribute information, and attribute-value information are converted into numeric vectors that are easy for the neural network model to process, so that the neural network model perceives all attributes of the target entity and can then extract high-dimensional attribute-vector features. Specifically, the multiple triples (e_i, p_ij, v_ij) of the target entity in the target knowledge graph are extracted, where e_i, p_ij, and v_ij denote the i-th entity, the j-th attribute of the i-th entity, and the j-th attribute value of the i-th entity respectively; e_i, p_ij, and v_ij are then represented as the vectors V_ei, V_pi, and V_vi respectively.
In an alternative embodiment, the entity e_i is represented as a Boolean vector using the OneHot algorithm, and the attribute p_ij and attribute value v_ij are represented as high-dimensional numeric vectors using the BERT model, i.e.
V_ei = t(e_i), V_pi = s(p_ij), V_vi = s(v_ij),
where t and s denote the feature-extraction mapping function and the mapping function of a neural network structure, respectively.
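A hedged sketch of this vectorization step: the one-hot entity encoding t(.) is shown literally, while the neural mapping s(.) is replaced by a stand-in lookup table, since a real BERT or Word2Vector model is out of scope here (all vocabulary and vector values are invented for illustration):

```python
def one_hot(entity, vocabulary):
    """t(e_i): Boolean vector with a single 1 at the entity's index."""
    return [1 if entity == v else 0 for v in vocabulary]

# Stand-in for s(.): in practice a BERT or Word2Vector model would
# map attribute / attribute-value text to dense numeric vectors.
DENSE = {
    "classroom_performance": [0.2, 0.7],
    "active": [0.9, 0.1],
}

def triple_vector(triple, entity_vocab):
    """Build (V_ei, V_pi, V_vi) for one (entity, attribute, value) triple."""
    e, p, v = triple
    return one_hot(e, entity_vocab), DENSE[p], DENSE[v]

vocab = ["student_A", "student_B"]
V_e, V_p, V_v = triple_vector(
    ("student_A", "classroom_performance", "active"), vocab)
```

The Boolean vector identifies which entity is being evaluated, while the dense vectors carry the semantics of the attribute and its value into the downstream model.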
Optionally, step S106 of generating text matching the target entity from the entity vector, attribute vector, and attribute-value vector may specifically include the following steps:
Step S1062: input the entity vector, attribute vector, and attribute-value vector into a text generation model, where the text generation model includes a deep neural network model trained on triple samples and text samples.
As previously mentioned, above-mentioned deep neural network model can be Seq2Seq model, generate and fight network model etc..
Step S1064 is generated and target entity matched text based on text generation model.
In the above steps, the entity vector V_ei, attribute vector V_pij and attribute value vector V_vij are input into the text generation model, which produces a summary comment text y* about the target entity.
In an optional scheme, the summary comment text y* may be represented as an output sequence y_1, ..., y_T', where y_t' denotes the output character at time t', i.e.

y_t' = arg max P(y_t' | y_1, ..., y_{t'-1}, c_t')

In the above formula, t' ∈ {1, ..., T'}, c_t' denotes the context vector at time t', P(y_t' | y_1, ..., y_{t'-1}, c_t') denotes the probability vector over all candidate texts at time t', and arg max denotes choosing, among the generated candidate texts, the one with the largest probability value.
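The arg-max selection above amounts to a greedy decoding loop, sketched below. The step-probability function is a toy stand-in for the trained decoder's P(y_t' | y_1, ..., y_{t'-1}, c_t'); the vocabulary and its behavior are invented for illustration only.

```python
import numpy as np

VOCAB = ["<eos>", "performs", "actively", "in", "class"]

def toy_step_probs(prefix):
    """Stand-in for P(y_t' | y_1, ..., y_{t'-1}, c_t'): one probability per word."""
    probs = np.full(len(VOCAB), 0.05)
    nxt = len(prefix) + 1
    probs[nxt if nxt < len(VOCAB) else 0] = 0.8  # favor the next word, then <eos>
    return probs / probs.sum()

def greedy_decode(max_len=10):
    """Emit y_t' = arg max over the candidate vocabulary at each step."""
    out = []
    while len(out) < max_len:
        word = VOCAB[int(np.argmax(toy_step_probs(out)))]
        if word == "<eos>":
            break
        out.append(word)
    return out
```

A real system would replace `toy_step_probs` by the decoder described in steps S106131–S106133; the greedy arg-max loop itself is unchanged.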
Optionally, before step S1062 of inputting the entity vector, attribute vector and attribute value vector into the text generation model, the above method may further include step S1061 of generating the text generation model, which may include the following steps:
In step S10611, triple samples and text samples are obtained.
In an optional scheme, the triple samples and text samples may form an aligned corpus, expressed as {((e, p, v), y)} = {((e_1, p_1, v_1), y_1), ..., ((e_i, p_i, v_i), y_i)}.
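The aligned corpus can be held as a simple list of (triple, comment) pairs, as sketched below; the student name and comments are invented examples, not data from the patent.

```python
# Each element pairs a triple sample (e, p, v) with its comment text y.
aligned_corpus = [
    (("Li Ming", "classroom performance", "active"),
     "Li Ming participates actively in class."),
    (("Li Ming", "final grade", "A"),
     "Li Ming achieved an excellent final grade."),
]

def triple_samples(corpus):
    """Return just the (e, p, v) side of the aligned corpus."""
    return [pair[0] for pair in corpus]

def text_samples(corpus):
    """Return just the comment side y."""
    return [pair[1] for pair in corpus]
```

Keeping the two sides aligned by index is what lets step S10613 train the model on matched (triple vector, text) pairs.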
In step S10612, the entity samples in the triple samples are converted into Boolean vectors using the preset algorithm, and the attribute samples and attribute value samples in the triple samples are all converted into high-dimensional numeric vectors using the preset model, obtaining triple vector samples.
As mentioned above, the preset algorithm may be the one-hot algorithm, and the preset model may be a Transformer-based bidirectional encoder representation model; the process of converting the triple samples into triple vector samples is similar to step S1044 and is not repeated here.
In step S10613, the text generation model is trained based on the triple vector samples and text samples, obtaining a trained text generation model.
Once the aligned corpus of triples and comments has been constructed, the text generation model can be trained on it using a deep neural network algorithm. Since the text generation model uses the collected daily behavior data of all entities as its training corpus, the above scheme can generate a summary comment that fits a specific entity according to that entity's daily behavior.
In an optional embodiment, step S10613 of training the text generation model based on the triple vector samples and text samples to obtain the trained text generation model may specifically include the following steps:
In step S106131, the triple vector samples and text samples are processed by an encoder combined with an attention mechanism, obtaining a context vector.
An Encoder-Decoder model contains two recurrent neural networks: one serves as the encoder and one as the decoder. The encoder turns a variable-length input sequence into a fixed-length vector, which can be regarded as the semantics of the sequence, and the decoder decodes this fixed-length vector into a variable-length output sequence. However, when the input sequence is very long, a single fixed-length vector performs rather poorly; an encoder combined with an attention (Attention) mechanism can solve this problem. Specifically, the context vector encoded by the encoder combined with the attention mechanism is:
c_t' = f(h_t, y_{t'-1}, s_{t'-1}, c_{t'-1})

where f denotes the encoding function, and h_t, y_{t'-1}, s_{t'-1} and c_{t'-1} respectively denote the hidden-layer output of the encoder at time t, the decoder output at time t'-1, the decoder hidden state at time t'-1, and the context vector at time t'-1.
In step S106132, the context vector is processed by a decoder combined with the attention mechanism, obtaining text information.
Considering that the characteristic information carried by the final context vector extracted by the encoder is limited and local features of the input are hard to capture, the output of the attention mechanism in the encoder needs to be used as an input parameter of the decoder. Specifically, the output of the decoder combined with the attention mechanism is:
P(y_t' | y_1, ..., y_{t'-1}, c_t') = g(y_{t'-1}, s_t', c_t')

where g denotes the decoding function, and y_t', y_{t'-1}, s_t' and c_t' respectively denote the output at time t', the output at time t'-1, the decoder hidden state at time t', and the context vector at time t'.
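One attention step can be sketched in numpy as follows. This is a minimal shape-level illustration of f(·) and g(·), not the patent's trained model: the context vector c_t' is a weighted sum of encoder hidden states h_t, with weights derived from the previous decoder state via a simplified dot-product score (real Bahdanau-style attention uses a learned alignment network), and the decoder's hidden state doubles as toy vocabulary logits.

```python
import numpy as np

def attention_context(H, s_prev):
    """c_t' = sum_t alpha_t * h_t, with alpha = softmax of dot-product scores."""
    scores = H @ s_prev                    # one score per encoder position
    alpha = np.exp(scores - scores.max())  # numerically stable softmax
    alpha /= alpha.sum()
    return alpha @ H                       # context vector, shape (hidden,)

def decoder_step(y_prev, s_prev, c, W):
    """g(y_{t'-1}, s_t', c_t'): new state plus a probability vector (toy 4-word vocab)."""
    s = np.tanh(W @ np.concatenate([y_prev, s_prev, c]))  # new decoder state s_t'
    e = np.exp(s - s.max())                # toy: state doubles as vocab logits
    return e / e.sum(), s

rng = np.random.default_rng(0)
H = rng.standard_normal((6, 4))    # 6 encoder hidden states h_t, dim 4
s_prev = rng.standard_normal(4)    # decoder state s_{t'-1}
y_prev = rng.standard_normal(4)    # embedding of previous output y_{t'-1}
c = attention_context(H, s_prev)   # context vector c_t'
W = rng.standard_normal((4, 12))   # toy decoder weights
probs, s_new = decoder_step(y_prev, s_prev, c, W)
```

Because c_t' is recomputed from all encoder states at every step, the decoder is not limited to a single fixed-length summary of the input, which is exactly the weakness the attention mechanism addresses.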
In step S106133, the text generation model is trained based on the text information so as to minimize a loss function.
It should be noted that the goal of training the text generation model is to minimize its negative log-likelihood loss function:

L(θ) = -Σ_{i=1}^{I} log P(y_i | x_i; θ)

where x_i and y_i respectively denote the i-th input text and output text, i ∈ {1, ..., I}, and θ denotes the model parameters. The training result is that the generated text is strongly correlated with the original text and grammatical errors in the text are minimized.
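Minimizing a negative log-likelihood by gradient descent can be shown on a deliberately tiny model. The sketch below fits a one-layer logistic "model" by plain gradient descent so that the NLL demonstrably decreases; the real training minimizes the same kind of objective but backpropagates through the full encoder-decoder.

```python
import numpy as np

def nll(theta, X, y):
    """Negative log-likelihood -mean_i log P(y_i | x_i; theta), logistic toy model."""
    p = np.clip(1.0 / (1.0 + np.exp(-(X @ theta))), 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def grad(theta, X, y):
    """Gradient of the NLL with respect to theta."""
    p = 1.0 / (1.0 + np.exp(-(X @ theta)))
    return X.T @ (p - y) / len(y)

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))
y = (X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.standard_normal(50) > 0).astype(float)

theta = np.zeros(3)
losses = [nll(theta, X, y)]
for _ in range(200):
    theta -= 0.5 * grad(theta, X, y)  # plain gradient descent on the NLL
    losses.append(nll(theta, X, y))
```

The data and model here are invented for illustration; only the objective (driving -Σ log P(y|x;θ) down by gradient steps) mirrors the training target stated above.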
Optionally, the preset algorithm in step S1044 and step S10612 is the one-hot algorithm, and the preset model is a BERT model or a Word2Vector model.
Still taking students as an example, Fig. 2 is a block diagram of the basic principle of a comment generation method according to an embodiment of the present application. As shown in Fig. 2, the teacher's records of each student's daily behavior data are first collected and filled into the designed knowledge-graph planning layer, thereby constructing a knowledge-graph set of all the students' performance. When a comment needs to be generated for a student to be evaluated, the target knowledge graph of that student is extracted from the knowledge-graph set and input into the trained text generation model, which then automatically outputs a summary comment on the student's daily performance. The detailed principle is shown in Fig. 3: the students' daily behavior data include classroom performance, self-image, social performance, emotional expression and so on, and the planning layer of the knowledge graph specifies entity types, attribute types and attribute value types; when constructing the knowledge-graph set, the students' daily behavior data are preprocessed through operations such as entity extraction, attribute extraction, attribute value extraction and entity disambiguation, and then filled into the corresponding planning layer. When evaluating student A, the knowledge subgraph of student A is first extracted, the triple information is then extracted and characterized in the form of triple vectors, and finally input into the trained text generation model to generate a candidate student comment; the teacher then confirms whether the comment needs modification, obtaining the final student comment. The text generation model is obtained by training an Encoder-Decoder model combined with an attention mechanism on triple samples and comment samples.
From the above, in the embodiments of the present application, the target knowledge graph of the target entity is selected from the knowledge-graph set, where the knowledge-graph set characterizes attribute values of at least one entity on preset attributes and the target entity is the object to be evaluated; the entity vector, attribute vector and attribute value vector of the target entity are determined based on the target knowledge graph, where the entity vector, attribute vector and attribute value vector are characterized as a triple vector; and text matched with the target entity is generated according to the entity vector, attribute vector and attribute value vector. Compared with the prior art, the present application establishes a knowledge-graph set from the daily performance of multiple entities, extracts from it the triple vector of the target knowledge graph, and then generates comments in combination with a deep learning algorithm. By converting the entity information, attribute information and attribute value information into numeric vectors that are easy for a neural network model to process, the neural network model is connected to all attributes of the target entity and can extract high-dimensional attribute-vector features; the Encoder-Decoder model combined with the attention mechanism optimizes the quality of the text output. This solves the technical problem in the related art that text generated merely by deep learning algorithms lacks personalized comments on the entity, resulting in a low matching degree between the text and the entity's actual performance; it achieves the purpose of generating, to the greatest extent, comments that fit the entity's daily performance, and improves the matching degree of the comments.
Embodiment 2
According to an embodiment of the present invention, another embodiment of a text generation method is provided from the perspective of a display interface. It should be noted that the steps shown in the flowcharts of the accompanying drawings may be executed in a computer system such as a set of computer-executable instructions, and, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in an order different from the one herein.
Fig. 4 shows another text generation method according to an embodiment of the present invention. As shown in Fig. 4, the method includes the following steps:
In step S402, a selection instruction is received, where the selection instruction is used to select a target entity to be evaluated.
In an optional scheme, the selection instruction may be triggered by the teacher with a mouse click or through a touch screen; in another optional scheme, the target entity may be any object to be evaluated, such as a student, an institution or a company employee.
In step S404, text matched with the target entity is displayed, where the text is generated according to the entity vector, attribute vector and attribute value vector of the target entity determined from the target knowledge graph of the target entity; the target knowledge graph comes from a knowledge-graph set, the knowledge-graph set characterizes attribute values of at least one entity on preset attributes, and the entity vector, attribute vector and attribute value vector are characterized as a triple vector.
In an optional scheme, the entity may be any object requiring evaluation, such as a student, an institution or a company employee. For a student, the preset attributes may be classroom performance, self-image, social performance, emotional expression, weekly exam results, final grade, etc., with corresponding attribute values such as active, tidy, enthusiastic, stable, strongly fluctuating, excellent, etc.; for an institution, the preset attributes may be brand image, number of granted patents, annual profit, public welfare, etc., with corresponding attribute values such as highly influential, more than 100, 200 million, active, etc. The text generation model that generates the text may be a deep neural network model.
As a new knowledge organization and retrieval technique in the big-data era, the knowledge graph (Knowledge Graph, KG) is used to describe concepts in the physical world and their correlations in symbolic form. The knowledge-graph set gathers the knowledge graphs of multiple entities; the knowledge graph of each entity records that entity's daily behavior, and since each entity is an independent individual, the knowledge graph of each entity is naturally different. When an entity, i.e. the target entity, needs to be evaluated, the target knowledge graph of the target entity is selected from the knowledge-graph set.
By extracting the entity information, attribute information and attribute value information of the target entity from the target knowledge graph and converting them into an entity vector, attribute vector and attribute value vector that are easy for the text generation model to process, the matching degree of the generated text can be greatly improved.
It should be noted that deep neural networks are an interdisciplinary field combining mathematics and computer science. Unlike conventional machine learning, a deep neural network can realize end-to-end high-dimensional feature extraction and abstraction from data, solving the problem in machine learning that features are difficult to extract. Typical examples include the Seq2Seq model, generative adversarial network models, etc.
Seq2Seq is a model with an Encoder-Decoder structure. Its basic idea is to use two recurrent neural networks, one as the encoder and one as the decoder: the encoder turns a variable-length input sequence into a fixed-length vector, which can be regarded as the semantics of the sequence, and the decoder decodes this fixed-length vector into a variable-length output sequence. A generative adversarial network (Generative Adversarial Networks, GAN) model includes at least two modules, a generative model and an adversarial model, which learn through mutual gaming until both produce fairly good output. Applying these two kinds of deep neural network algorithms to the field of comment generation can therefore achieve more accurate and more robust results than machine learning methods.
In the above step, after the terminal detects, on the display interface, the selection instruction of clicking the target entity, the comment text matched with the target entity can be displayed on the display interface.
It is easy to notice that in the existing text generation field, even knowledge-graph-based text generation does not fully use the entity information, attribute information and attribute values of the knowledge graph itself; instead, the knowledge graph serves as an intermediary, and suitable text is then found by searching or by computing similarity. In contrast, the present invention combines the knowledge graph with a deep neural network and takes the daily behavior of the target entity into account, so that for different entities it can automatically generate comments that fit each entity's actual performance, improving the matching degree and accuracy of the comments.
Based on the scheme provided by the above embodiments of the present application, a selection instruction is first received, where the selection instruction is used to select the target entity to be evaluated; then text matched with the target entity is displayed, where the text is generated according to the entity vector, attribute vector and attribute value vector of the target entity determined from the target knowledge graph; the target knowledge graph comes from a knowledge-graph set, the knowledge-graph set characterizes attribute values of at least one entity on preset attributes, and the entity vector, attribute vector and attribute value vector are characterized as a triple vector. Compared with the prior art, the present application establishes a knowledge-graph set from the daily performance of multiple entities, extracts from it the triple vector of the target knowledge graph, and then generates comments in combination with a deep learning algorithm. By combining the knowledge graph with deep learning, this scheme connects the deep learning algorithm to all attributes of the entity, thereby solving the technical problem in the related art that text generated merely by deep learning algorithms lacks personalized comments on the entity, resulting in a low matching degree between the text and the entity's actual performance; it achieves the purpose of generating, to the greatest extent, comments that fit the entity's daily performance, and improves the matching degree of the comments.
Optionally, before step S404 of displaying the text matched with the target entity, the above method may further include step S403 of generating the knowledge-graph set, which may specifically include the following steps:
In step S4032, the planning layer of the knowledge-graph set is constructed, where the planning layer includes at least: entity types, attribute types and attribute value types.
In an optional scheme, the planning layer can be edited with the ontology construction tool Protégé. Protégé is ontology-editing and knowledge-acquisition software developed in the Java language; users only need to build the ontology model at the concept level, so it is simple to operate.
The planning layer is equivalent to the skeleton of the knowledge graph; it includes at least entity types, attribute types and attribute value types, and may of course also include information such as time.
In step S4034, record information is obtained, where the record information includes attribute values of at least one entity on the preset attributes.
In an optional scheme, the record information can be entered manually into the computer terminal executing the method of this embodiment, for example: Li Ming performs actively in class, has a good self-image, and has a final grade of A; another student tends to doze in class, is not active socially, and has a final grade of B. In this way, when generating the text of the target entity, the target entity's daily behavior can be considered comprehensively, and no features are missed.
In step S4036, the record information is input into the planning layer to generate the knowledge-graph set.
In the above step, the entity information, attribute information and attribute value information are filled into the corresponding entity types, attribute types and attribute value types of the constructed planning layer, thereby building the knowledge-graph set of all entities, which is stored in the graph database Neo4j.
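The fill-in step can be sketched as follows. This is an illustrative stand-in under stated assumptions: the planning layer is reduced to a dict of allowed types, a plain dict keyed by entity replaces the Neo4j store the patent uses, and the record values are invented examples.

```python
# Planning layer: fixes which attribute types records may fill in.
PLANNING_LAYER = {
    "entity_type": "student",
    "attribute_types": ["classroom performance", "final grade"],
}

def build_graph_set(records):
    """Fill (entity, attribute, value) records into per-entity knowledge graphs."""
    graphs = {}
    for entity, attribute, value in records:
        if attribute not in PLANNING_LAYER["attribute_types"]:
            continue  # reject attributes outside the planning layer
        graphs.setdefault(entity, []).append((entity, attribute, value))
    return graphs

records = [
    ("Li Ming", "classroom performance", "active"),
    ("Li Ming", "final grade", "A"),
    ("Wang Wei", "shoe size", "42"),  # not in the planning layer: filtered out
]
kg_set = build_graph_set(records)
```

The per-entity grouping is what later makes "select the target knowledge graph of the target entity" a simple lookup.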
Optionally, before step S4036 of inputting the record information into the planning layer, the above method may further include step S4035 of preprocessing the record information to obtain processed record information, where the preprocessing includes at least one of the following: entity extraction, attribute extraction, attribute value extraction and entity disambiguation.
In an optional scheme, the entity extraction, attribute extraction and attribute value extraction may be entity recognition, attribute recognition and attribute value recognition, including the detection and classification of entities, attributes and attribute values.
It should be noted that entity disambiguation handles the cases where two different names refer to the same entity, or the same name refers to two different entities.
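A minimal disambiguation sketch is shown below: two spellings of one student collapse to a single entity id, while a same-named student in another class gets a distinct id. The keying rule (normalized name plus class) is an assumption made for illustration; real systems use richer context features.

```python
def disambiguate(records):
    """Map (name, class) pairs to stable entity ids; same key -> same id."""
    ids, resolved = {}, []
    for name, klass, attribute, value in records:
        key = (name.strip().lower(), klass)  # assumed disambiguation key
        if key not in ids:
            ids[key] = f"E{len(ids)}"
        resolved.append((ids[key], attribute, value))
    return resolved

records = [
    ("Li Ming", "3A", "final grade", "A"),
    ("li ming ", "3A", "classroom performance", "active"),  # same student
    ("Li Ming", "4B", "final grade", "B"),                  # different student
]
resolved = disambiguate(records)
```

After this step, all records for one real-world entity share one id, so they land in the same knowledge graph when filled into the planning layer.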
Optionally, determining the entity vector, attribute vector and attribute value vector of the target entity from the target knowledge graph in step S404 may specifically include the following steps:
In step S4041, the entity information, attribute information and attribute value information of the target entity in the target knowledge graph are extracted.
In step S4042, the entity information is converted into a Boolean vector using a preset algorithm, and the attribute information and attribute value information are converted into high-dimensional numeric vectors using a preset model, obtaining a triple vector.
In an optional scheme, the preset algorithm may be the one-hot algorithm, and the preset model may be a BERT model or a Word2Vector model. The BERT model, i.e. Bidirectional Encoder Representations from Transformers, is suited to building state-of-the-art models for a wide range of tasks.
When expressing the information of the triples in the target knowledge graph, the entity information, attribute information and attribute value information are converted into numeric vectors that are easy for a neural network model to process; the neural network model is thus connected to all attributes of the target entity and can extract high-dimensional attribute-vector features. Specifically, multiple triples (e_i, p_ij, v_ij) of the target entity are extracted from the target knowledge graph, where e_i, p_ij and v_ij respectively denote the i-th entity, the j-th attribute of the i-th entity and the j-th attribute value of the i-th entity; e_i, p_ij and v_ij are then characterized as the vectors V_ei, V_pij and V_vij respectively.
In an optional embodiment, the OneHot algorithm is used to characterize the entity e_i as a Boolean vector, and the BERT model is used to characterize the attribute p_ij and attribute value v_ij as high-dimensional numeric vectors, i.e.

V_ei = t(e_i), V_pij = s(p_ij), V_vij = s(v_ij)

where t denotes the one-hot feature extraction function and s denotes the mapping function of a neural network structure.
Optionally, the step in S404 of generating the text according to the entity vector, attribute vector and attribute value vector may specifically include the following steps:
In step S4046, the entity vector, attribute vector and attribute value vector are input into a text generation model, where the text generation model includes a deep neural network model trained on triple samples and text samples.
As mentioned above, the deep neural network model may be a Seq2Seq model, a generative adversarial network model, or the like.
In step S4047, the text matched with the target entity is generated based on the text generation model.
In the above steps, the entity vector V_ei, attribute vector V_pij and attribute value vector V_vij are input into the text generation model, which produces a summary comment text y* about the target entity.
In an optional scheme, the summary comment text y* may be represented as an output sequence y_1, ..., y_T', where y_t' denotes the output character at time t', i.e.

y_t' = arg max P(y_t' | y_1, ..., y_{t'-1}, c_t')

In the above formula, t' ∈ {1, ..., T'}, c_t' denotes the context vector at time t', P(y_t' | y_1, ..., y_{t'-1}, c_t') denotes the probability vector over all candidate texts at time t', and arg max denotes choosing, among the generated candidate texts, the one with the largest probability value.
Optionally, before step S4046 of inputting the entity vector, attribute vector and attribute value vector into the text generation model, the above method may further include step S4045 of generating the text generation model, which may include the following steps:
In step S40451, triple samples and text samples are obtained.
In an optional scheme, the triple samples and text samples may form an aligned corpus, expressed as {((e, p, v), y)} = {((e_1, p_1, v_1), y_1), ..., ((e_i, p_i, v_i), y_i)}.
In step S40452, the entity samples in the triple samples are converted into Boolean vectors using the preset algorithm, and the attribute samples and attribute value samples in the triple samples are all converted into high-dimensional numeric vectors using the preset model, obtaining triple vector samples.
As mentioned above, the preset algorithm may be the one-hot algorithm, and the preset model may be a Transformer-based bidirectional encoder representation model; the process of converting the triple samples into triple vector samples is similar to step S1044 and is not repeated here.
In step S40453, the text generation model is trained based on the triple vector samples and text samples, obtaining a trained text generation model.
Once the aligned corpus of triples and comments has been constructed, the text generation model can be trained on it using a deep neural network algorithm. Since the text generation model uses the collected daily behavior data of all entities as its training corpus, the above scheme can generate a summary comment that fits a specific entity according to that entity's daily behavior.
In an optional embodiment, step S40453 of training the text generation model based on the triple vector samples and text samples to obtain the trained text generation model may specifically include the following steps:
In step S404531, the triple vector samples and text samples are processed by an encoder combined with an attention mechanism, obtaining a context vector.
An Encoder-Decoder model contains two recurrent neural networks: one serves as the encoder and one as the decoder. The encoder turns a variable-length input sequence into a fixed-length vector, which can be regarded as the semantics of the sequence, and the decoder decodes this fixed-length vector into a variable-length output sequence. However, when the input sequence is very long, a single fixed-length vector performs rather poorly; an encoder combined with an attention (Attention) mechanism can solve this problem. Specifically, the context vector encoded by the encoder combined with the attention mechanism is:
c_t' = f(h_t, y_{t'-1}, s_{t'-1}, c_{t'-1})

where f denotes the encoding function, and h_t, y_{t'-1}, s_{t'-1} and c_{t'-1} respectively denote the hidden-layer output of the encoder at time t, the decoder output at time t'-1, the decoder hidden state at time t'-1, and the context vector at time t'-1.
In step S404532, the context vector is processed by a decoder combined with the attention mechanism, obtaining text information.
Considering that the characteristic information carried by the final context vector extracted by the encoder is limited and local features of the input are hard to capture, the output of the attention mechanism in the encoder needs to be used as an input parameter of the decoder. Specifically, the output of the decoder combined with the attention mechanism is:
P(y_t' | y_1, ..., y_{t'-1}, c_t') = g(y_{t'-1}, s_t', c_t')

where g denotes the decoding function, and y_t', y_{t'-1}, s_t' and c_t' respectively denote the output at time t', the output at time t'-1, the decoder hidden state at time t', and the context vector at time t'.
In step S404533, the text generation model is trained based on the text information so as to minimize a loss function.
It should be noted that the goal of training the text generation model is to minimize its negative log-likelihood loss function:

L(θ) = -Σ_{i=1}^{I} log P(y_i | x_i; θ)

where x_i and y_i respectively denote the i-th input text and output text, i ∈ {1, ..., I}, and θ denotes the model parameters. The training result is that the generated text is strongly correlated with the original text and grammatical errors in the text are minimized.
Optionally, the preset algorithm in step S4042 and step S40452 is the one-hot algorithm, and the preset model is a BERT model or a Word2Vector model.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be realized by means of software plus the necessary general hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or the part contributing to the prior art, can be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, a magnetic disk or an optical disc) and includes instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, etc.) to execute the methods described in the embodiments of the present invention.
Embodiment 3
According to an embodiment of the present invention, a text generation device is provided. Fig. 5 is a schematic diagram of the text generation device according to an embodiment of the present application. As shown in Fig. 5, the device 500 includes a selecting module 502, a determining module 504 and a text generation module 506.
The selecting module 502 is configured to select the target knowledge graph of the target entity from the knowledge-graph set, where the knowledge-graph set characterizes attribute values of at least one entity on preset attributes and the target entity is the object to be evaluated; the determining module 504 is configured to determine the entity vector, attribute vector and attribute value vector of the target entity based on the target knowledge graph, where the entity vector, attribute vector and attribute value vector are characterized as a triple vector; and the text generation module 506 is configured to generate text matched with the target entity according to the entity vector, attribute vector and attribute value vector.
Optionally, the above device may further include a graph generation module configured to generate the knowledge-graph set before the target knowledge graph of the target entity is selected from the knowledge-graph set, where the graph generation module includes: a construction module configured to construct the planning layer of the knowledge-graph set, where the planning layer includes at least entity types, attribute types and attribute value types; a first acquisition module configured to obtain record information, where the record information includes attribute values of at least one entity on preset attributes; and a graph generation submodule configured to input the record information into the planning layer and generate the knowledge-graph set.
Optionally, the above device may further include a preprocessing module configured to preprocess the record information before it is input into the planning layer, obtaining processed record information, where the preprocessing includes at least one of the following: entity extraction, attribute extraction, attribute value extraction and entity disambiguation.
Optionally, the determining module includes: an extraction module configured to extract the entity information, attribute information and attribute value information of the target entity in the target knowledge graph; and a first conversion module configured to convert the entity information into a Boolean vector using the preset algorithm and convert the attribute information and attribute value information into high-dimensional numeric vectors using the preset model, obtaining the triple vector.
Optionally, the text generation module includes: an input module configured to input the entity vector, attribute vector and attribute value vector into the text generation model, where the text generation model includes a deep neural network model trained on triple samples and text samples; and a text generation submodule configured to generate the text matched with the target entity based on the text generation model.
Optionally, the above device may further include a model generation module configured to generate the text generation model before the entity vector, attribute vector and attribute value vector are input into the text generation model, where the model generation module includes: a second acquisition module configured to obtain the triple samples and text samples; a second conversion module configured to convert the entity samples in the triple samples into Boolean vectors using the preset algorithm and convert the attribute samples and attribute value samples in the triple samples into high-dimensional numeric vectors using the preset model, obtaining triple vector samples; and a training module configured to train the text generation model based on the triple vector samples and text samples, obtaining the trained text generation model.
Optionally, the training module includes: an encoding module configured to process the triple vector samples and text samples with an encoder combined with an attention mechanism, obtaining a context vector; a decoding module configured to process the context vector with a decoder combined with the attention mechanism, obtaining text information; and a training submodule configured to train the text generation model based on the text information so as to minimize the loss function.
Optionally, the above preset algorithm is a one-hot algorithm, and the preset model is a BERT model or a Word2Vector model.
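As a rough sketch of this vectorization (an editorial illustration, not part of the disclosure): the entity is one-hot encoded, while the attribute and attribute value are mapped to dense numeric vectors. The vocabulary, the dimensions and the stubbed embedding lookup standing in for a real BERT/Word2Vector model are all assumptions.

```python
import numpy as np

ENTITY_VOCAB = ["restaurant_a", "restaurant_b", "hotel_c"]  # assumed vocabulary

def one_hot(entity, vocab=ENTITY_VOCAB):
    """One-hot (Boolean) vector for the entity, as produced by the preset algorithm."""
    vec = np.zeros(len(vocab))
    vec[vocab.index(entity)] = 1.0
    return vec

def embed(token, dim=8):
    """Stand-in for a pretrained embedding lookup (BERT/Word2Vector):
    a deterministic pseudo-random dense vector keyed on the token."""
    rng = np.random.default_rng(sum(ord(c) for c in token))
    return rng.standard_normal(dim)

def triple_vector(entity, attribute, value):
    """Concatenate entity, attribute and attribute-value parts into one triple vector."""
    return np.concatenate([one_hot(entity), embed(attribute), embed(value)])

v = triple_vector("restaurant_a", "cuisine", "sichuan")
print(v.shape)  # (19,) = 3 (one-hot) + 8 (attribute) + 8 (value)
```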
It should be noted that the above selection module 502, determination module 504 and text generation module 506 correspond to steps S102 to S106 in Embodiment 1. The three modules implement the same examples and application scenarios as the corresponding steps, but are not limited to what is disclosed in Embodiment 1.
Embodiment 4
According to an embodiment of the present invention, another text generation device is provided. Fig. 6 is a schematic diagram of the text generation device according to an embodiment of the present application. As shown in Fig. 6, the device 600 includes a receiving module 602 and a display module 604.
The receiving module 602 is configured to receive a selection instruction, where the selection instruction is used to select a target entity to be evaluated. The display module 604 is configured to display text matching the target entity, where the text is generated according to the entity vector, attribute vector and attribute value vector of the target entity determined from the target knowledge graph of the target entity; the target knowledge graph comes from a knowledge graph set; the knowledge graph set is used to characterize attribute values of at least one entity on preset attributes; and the entity vector, attribute vector and attribute value vector are characterized by a triple vector.
Optionally, the above device may further include a graph generation module for generating the knowledge graph set before the text matching the target entity is displayed, where the graph generation module may include: a construction module for constructing a planning layer of the knowledge graph set, where the planning layer includes at least entity types, attribute types and attribute value types; a first acquisition module for acquiring record information, where the record information includes attribute values of at least one entity on preset attributes; and a graph generation submodule for inputting the record information into the planning layer to generate the knowledge graph set.
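A minimal sketch of this planning layer plus record ingestion (an editorial illustration, not part of the disclosure; the restaurant schema and all names are assumptions): the schema declares the allowed entity types, attribute types and attribute value types, and each record is validated against it before being stored in the knowledge graph set.

```python
# Planning layer (schema) of the knowledge graph set: allowed entity types,
# attribute types, and the expected Python type of each attribute value.
SCHEMA = {
    "entity_types": {"restaurant"},
    "attribute_types": {"cuisine", "price_level"},
    "attribute_value_types": {"cuisine": str, "price_level": int},
}

def ingest(record, graph):
    """Validate one record against the planning layer, then store it as an
    (entity, attribute, value) triple in the knowledge graph set."""
    entity, etype, attr, value = record
    if etype not in SCHEMA["entity_types"]:
        raise ValueError(f"unknown entity type: {etype}")
    if attr not in SCHEMA["attribute_types"]:
        raise ValueError(f"unknown attribute: {attr}")
    if not isinstance(value, SCHEMA["attribute_value_types"][attr]):
        raise TypeError(f"bad value type for {attr}")
    graph.setdefault(entity, {})[attr] = value
    return graph

kg = {}
ingest(("restaurant_a", "restaurant", "cuisine", "sichuan"), kg)
ingest(("restaurant_a", "restaurant", "price_level", 2), kg)
print(kg)  # {'restaurant_a': {'cuisine': 'sichuan', 'price_level': 2}}
```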
Optionally, the above device may further include a preprocessing module for preprocessing the record information before it is input into the planning layer, to obtain processed record information, where the preprocessing includes at least one of the following: entity extraction, attribute extraction, attribute value extraction and entity disambiguation.
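Two of these preprocessing steps, attribute-value extraction and entity disambiguation, can be sketched on a toy record format (an editorial illustration, not part of the disclosure; the "key: value" pattern and the alias table are assumptions):

```python
import re

# Toy alias table for entity disambiguation; contents are illustrative.
ALIASES = {"KFC": "Kentucky Fried Chicken"}

def preprocess(text):
    """Pull 'key: value' pairs out of a raw record line (attribute and
    attribute-value extraction), then resolve the entity alias to a
    canonical name (entity disambiguation)."""
    pairs = dict(re.findall(r"(\w+):\s*(\w+)", text))
    entity = pairs.pop("entity", None)
    entity = ALIASES.get(entity, entity)  # map alias to canonical name
    return entity, pairs

entity, attrs = preprocess("entity: KFC cuisine: fried_chicken")
print(entity, attrs)  # Kentucky Fried Chicken {'cuisine': 'fried_chicken'}
```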
Optionally, the display module further includes a determination module for determining the entity vector, the attribute vector and the attribute value vector of the target entity according to the target knowledge graph, where the determination module may include: an extraction module for extracting entity information, attribute information and attribute value information of the target entity in the target knowledge graph; and a first conversion module for converting the entity information into a Boolean vector using a preset algorithm, and converting the attribute information and the attribute value information into high-dimensional numeric vectors using a preset model, to obtain the triple vector.
Optionally, the display module further includes a text generation module for generating the text according to the entity vector, the attribute vector and the attribute value vector, where the text generation module may include: an input module for inputting the entity vector, the attribute vector and the attribute value vector into a text generation model, where the text generation model includes a deep neural network model trained on triple samples and text samples; and a text generation submodule for generating the text matching the target entity based on the text generation model.
Optionally, the above device may further include a model generation module for generating the text generation model before the entity vector, the attribute vector and the attribute value vector are input into it, where the model generation module may include: a second acquisition module for acquiring triple samples and text samples; a second conversion module for converting the entity samples in the triple samples into Boolean vectors using the preset algorithm, and converting the attribute samples and attribute value samples in the triple samples into high-dimensional numeric vectors using the preset model, to obtain triple vector samples; and a training module for training the text generation model based on the triple vector samples and the text samples, to obtain a trained text generation model.
Optionally, the training module may include: an encoding module for processing the triple vector samples and the text samples with an encoder that incorporates an attention mechanism, to obtain a context vector; a decoding module for processing the context vector with a decoder that incorporates an attention mechanism, to obtain text information; and a training submodule for training the text generation model based on the text information by minimizing a loss function.
Optionally, the above preset algorithm is a one-hot algorithm, and the preset model is a BERT model or a Word2Vector model.
It should be noted that the above receiving module 602 and display module 604 correspond to steps S402 to S404 in Embodiment 2. The two modules implement the same examples and application scenarios as the corresponding steps, but are not limited to what is disclosed in Embodiment 2.
Embodiment 5
According to an embodiment of the present invention, a storage medium is provided. The storage medium includes a stored program, and when the program runs, the device on which the storage medium is located is controlled to execute the text generation method of Embodiment 1 or 2.
Embodiment 6
According to an embodiment of the present invention, a processor is provided. The processor is configured to run a program, and when the program runs, the text generation method of Embodiment 1 or 2 is executed.
The serial numbers of the above embodiments of the invention are for description only and do not represent the relative merits of the embodiments.
In the above embodiments of the invention, each embodiment is described with its own emphasis. For parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed technical content may be realized in other ways. The device embodiments described above are merely illustrative. For example, the division of the units may be a division by logical function, and there may be other division manners in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, units or modules, and may be electrical or in other forms.
The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; they may be located in one place or distributed over multiple units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the various embodiments of the present invention may be integrated into one processing unit, or each unit may physically exist alone, or two or more units may be integrated into one unit. The above integrated unit may be realized in the form of hardware or in the form of a software functional unit.
If the integrated unit is realized in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the existing technology, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods of the various embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash disk, a read-only memory (ROM), a random access memory (RAM), a mobile hard disk, a magnetic disk or an optical disk.
The above are only preferred embodiments of the present invention. It should be noted that, for those of ordinary skill in the art, various improvements and modifications may be made without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (11)

1. A text generation method, characterized by comprising:
selecting a target knowledge graph of a target entity from a knowledge graph set, wherein the knowledge graph set is used to characterize an attribute value of at least one entity on a preset attribute, and the target entity is an object to be evaluated;
determining an entity vector, an attribute vector and an attribute value vector of the target entity based on the target knowledge graph, wherein the entity vector, the attribute vector and the attribute value vector are characterized by a triple vector;
generating text matching the target entity according to the entity vector, the attribute vector and the attribute value vector.
2. The method according to claim 1, characterized in that before the target knowledge graph of the target entity is selected from the knowledge graph set, the method further comprises: generating the knowledge graph set, wherein the step of generating the knowledge graph set comprises:
constructing a planning layer of the knowledge graph set, wherein the planning layer comprises at least: entity types, attribute types and attribute value types;
acquiring record information, wherein the record information comprises: an attribute value of at least one entity on a preset attribute;
inputting the record information into the planning layer to generate the knowledge graph set.
3. The method according to claim 2, characterized in that before the record information is input into the planning layer, the method further comprises:
preprocessing the record information to obtain processed record information, wherein the preprocessing comprises at least one of the following: entity extraction, attribute extraction, attribute value extraction and entity disambiguation.
4. The method according to claim 1, characterized in that determining the entity vector, the attribute vector and the attribute value vector of the target entity based on the target knowledge graph comprises:
extracting entity information, attribute information and attribute value information of the target entity in the target knowledge graph;
converting the entity information into a Boolean vector using a preset algorithm, and converting the attribute information and the attribute value information into high-dimensional numeric vectors using a preset model, to obtain the triple vector.
5. The method according to claim 1, characterized in that generating the text matching the target entity according to the entity vector, the attribute vector and the attribute value vector comprises:
inputting the entity vector, the attribute vector and the attribute value vector into a text generation model, wherein the text generation model comprises a deep neural network model, and the deep neural network model is obtained by training on triple samples and text samples;
generating the text matching the target entity based on the text generation model.
6. The method according to claim 5, characterized in that before the entity vector, the attribute vector and the attribute value vector are input into the text generation model, the method further comprises: generating the text generation model, wherein the step of generating the text generation model comprises:
acquiring the triple samples and the text samples;
converting entity samples in the triple samples into Boolean vectors using a preset algorithm, and converting attribute samples and attribute value samples in the triple samples into high-dimensional numeric vectors using a preset model, to obtain triple vector samples;
training the text generation model based on the triple vector samples and the text samples, to obtain a trained text generation model.
7. The method according to claim 5, characterized in that training the text generation model based on the triple vector samples and the text samples to obtain a trained text generation model comprises:
processing the triple vector samples and the text samples with an encoder that incorporates an attention mechanism, to obtain a context vector;
processing the context vector with a decoder that incorporates an attention mechanism, to obtain text information;
training the text generation model based on the text information by minimizing a loss function.
8. A text generation method, characterized by comprising:
receiving a selection instruction, wherein the selection instruction is used to select a target entity to be evaluated;
displaying text matching the target entity, wherein the text is generated according to an entity vector, an attribute vector and an attribute value vector of the target entity determined from a target knowledge graph of the target entity, the target knowledge graph comes from a knowledge graph set, the knowledge graph set is used to characterize an attribute value of at least one entity on a preset attribute, and the entity vector, the attribute vector and the attribute value vector are characterized by a triple vector.
9. A text generation device, characterized by comprising:
a selection module for selecting a target knowledge graph of a target entity from a knowledge graph set, wherein the knowledge graph set is used to characterize an attribute value of at least one entity on a preset attribute, and the target entity is an object to be evaluated;
a determination module for determining an entity vector, an attribute vector and an attribute value vector of the target entity based on the target knowledge graph, wherein the entity vector, the attribute vector and the attribute value vector are characterized by a triple vector;
a text generation module for generating text matching the target entity according to the entity vector, the attribute vector and the attribute value vector.
10. A storage medium, characterized in that the storage medium comprises a stored program, wherein when the program runs, a device on which the storage medium is located is controlled to execute the text generation method of claim 1 or 8.
11. A processor, characterized in that the processor is configured to run a program, wherein when the program runs, the text generation method of claim 1 or 8 is executed.
CN201910775353.1A 2019-08-21 2019-08-21 Document creation method and device Pending CN110489755A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910775353.1A CN110489755A (en) 2019-08-21 2019-08-21 Document creation method and device
PCT/CN2019/126797 WO2021031480A1 (en) 2019-08-21 2019-12-20 Text generation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910775353.1A CN110489755A (en) 2019-08-21 2019-08-21 Document creation method and device

Publications (1)

Publication Number Publication Date
CN110489755A true CN110489755A (en) 2019-11-22

Family

ID=68552697

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910775353.1A Pending CN110489755A (en) 2019-08-21 2019-08-21 Document creation method and device

Country Status (2)

Country Link
CN (1) CN110489755A (en)
WO (1) WO2021031480A1 (en)


Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113158189B (en) * 2021-04-28 2023-12-26 绿盟科技集团股份有限公司 Method, device, equipment and medium for generating malicious software analysis report
CN113239203A (en) * 2021-06-02 2021-08-10 北京金山数字娱乐科技有限公司 Knowledge graph-based screening method and device
CN113609291A (en) * 2021-07-27 2021-11-05 科大讯飞(苏州)科技有限公司 Entity classification method and device, electronic equipment and storage medium
CN113761167B (en) * 2021-09-09 2023-10-20 上海明略人工智能(集团)有限公司 Session information extraction method, system, electronic equipment and storage medium
CN116306925B (en) * 2023-03-14 2024-05-03 中国人民解放军总医院 Method and system for generating end-to-end entity link
CN116150929B (en) * 2023-04-17 2023-07-07 中南大学 Construction method of railway route selection knowledge graph
CN116452072B (en) * 2023-06-19 2023-08-29 华南师范大学 Teaching evaluation method, system, equipment and readable storage medium
CN117332282B (en) * 2023-11-29 2024-03-08 之江实验室 Knowledge graph-based event matching method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170039275A1 (en) * 2015-08-03 2017-02-09 International Business Machines Corporation Automated Article Summarization, Visualization and Analysis Using Cognitive Services
CN108763336A (en) * 2018-05-12 2018-11-06 北京无忧创新科技有限公司 A kind of visa self-help serving system
CN109684394A (en) * 2018-12-13 2019-04-26 北京百度网讯科技有限公司 Document creation method, device, equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11163836B2 (en) * 2018-02-12 2021-11-02 International Business Machines Corporation Extraction of information and smart annotation of relevant information within complex documents
CN108345690B (en) * 2018-03-09 2020-11-13 广州杰赛科技股份有限公司 Intelligent question and answer method and system
CN109189944A (en) * 2018-09-27 2019-01-11 桂林电子科技大学 Personalized recommending scenery spot method and system based on user's positive and negative feedback portrait coding
CN110489755A (en) * 2019-08-21 2019-11-22 广州视源电子科技股份有限公司 Document creation method and device


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
蔡圆媛 (Cai Yuanyuan): 《大数据环境下基于知识整合的语义计算技术与应用》 [Semantic Computing Technology and Applications Based on Knowledge Integration in a Big Data Environment], 31 August 2018, Beijing Institute of Technology Press *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021031480A1 (en) * 2019-08-21 2021-02-25 广州视源电子科技股份有限公司 Text generation method and device
CN111061152A (en) * 2019-12-23 2020-04-24 深圳供电局有限公司 Attack recognition method based on deep neural network and intelligent energy power control device
CN111209389B (en) * 2019-12-31 2023-08-11 天津外国语大学 Movie story generation method
CN111209389A (en) * 2019-12-31 2020-05-29 天津外国语大学 Movie story generation method
CN111897955A (en) * 2020-07-13 2020-11-06 广州视源电子科技股份有限公司 Comment generation method, device and equipment based on coding and decoding and storage medium
CN111897955B (en) * 2020-07-13 2024-04-09 广州视源电子科技股份有限公司 Comment generation method, device, equipment and storage medium based on encoding and decoding
CN111930959A (en) * 2020-07-14 2020-11-13 上海明略人工智能(集团)有限公司 Method and device for generating text by using map knowledge
CN111930959B (en) * 2020-07-14 2024-02-09 上海明略人工智能(集团)有限公司 Method and device for generating text by map knowledge
CN112036146A (en) * 2020-08-25 2020-12-04 广州视源电子科技股份有限公司 Comment generation method and device, terminal device and storage medium
CN112069781B (en) * 2020-08-27 2024-01-02 广州视源电子科技股份有限公司 Comment generation method and device, terminal equipment and storage medium
CN112069781A (en) * 2020-08-27 2020-12-11 广州视源电子科技股份有限公司 Comment generation method and device, terminal device and storage medium
CN113157941A (en) * 2021-04-08 2021-07-23 支付宝(杭州)信息技术有限公司 Service characteristic data processing method, service characteristic data processing device, text generating method, text generating device and electronic equipment
CN113111188B (en) * 2021-04-14 2022-08-09 清华大学 Text generation method and system
CN113111188A (en) * 2021-04-14 2021-07-13 清华大学 Text generation method and system
CN113488165A (en) * 2021-07-26 2021-10-08 平安科技(深圳)有限公司 Text matching method, device and equipment based on knowledge graph and storage medium
CN113488165B (en) * 2021-07-26 2023-08-22 平安科技(深圳)有限公司 Text matching method, device, equipment and storage medium based on knowledge graph
CN113569554A (en) * 2021-09-24 2021-10-29 北京明略软件***有限公司 Entity pair matching method and device in database, electronic equipment and storage medium

Also Published As

Publication number Publication date
WO2021031480A1 (en) 2021-02-25

Similar Documents

Publication Publication Date Title
CN110489755A (en) Document creation method and device
Bang et al. Explaining a black-box by using a deep variational information bottleneck approach
CN110750959B (en) Text information processing method, model training method and related device
CN111026842B (en) Natural language processing method, natural language processing device and intelligent question-answering system
CN109783657A (en) Multistep based on limited text space is from attention cross-media retrieval method and system
CN108984530A (en) A kind of detection method and detection system of network sensitive content
CN108536681A (en) Intelligent answer method, apparatus, equipment and storage medium based on sentiment analysis
CN110516245A (en) Fine granularity sentiment analysis method, apparatus, computer equipment and storage medium
CN107330011A (en) The recognition methods of the name entity of many strategy fusions and device
CN109543034B (en) Text clustering method and device based on knowledge graph and readable storage medium
CN108416065A (en) Image based on level neural network-sentence description generates system and method
CN107423398A (en) Exchange method, device, storage medium and computer equipment
CN108549658A (en) A kind of deep learning video answering method and system based on the upper attention mechanism of syntactic analysis tree
CN106897559A (en) A kind of symptom and sign class entity recognition method and device towards multi-data source
CN108491421A (en) A kind of method, apparatus, equipment and computer storage media generating question and answer
CN109446505A (en) A kind of model essay generation method and system
CN109710744A (en) A kind of data matching method, device, equipment and storage medium
CN109710769A (en) A kind of waterborne troops's comment detection system and method based on capsule network
CN110134954A (en) A kind of name entity recognition method based on Attention mechanism
CN110795565A (en) Semantic recognition-based alias mining method, device, medium and electronic equipment
CN115455171B (en) Text video mutual inspection rope and model training method, device, equipment and medium
CN117055724A (en) Generating type teaching resource system in virtual teaching scene and working method thereof
CN110532393A (en) Text handling method, device and its intelligent electronic device
CN110018823A (en) Processing method and system, the generation method and system of interactive application
CN111931503B (en) Information extraction method and device, equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191122