CN106503035A

CN106503035A - Data processing method and device for a knowledge graph

Info

Publication number: CN106503035A
Application number: CN201610825067.8A
Authority: CN
Inventors: 袁丽; 甘信军
Original assignee: Hisense Group Co Ltd
Current assignee: Hisense Group Co Ltd
Priority date: 2016-09-14
Filing date: 2016-09-14
Publication date: 2017-03-15

Abstract

Embodiments of the present invention provide a data processing method and device for a knowledge graph. The method includes: selecting a currently processed target entity from the other entities; obtaining a first feature vector of the initial entity, and obtaining a second feature vector of the target entity; calculating, according to the first feature vector and the second feature vector, a feature value corresponding to the target entity; and, for the maximum feature value, updating the classification information and relation information of the initial entity and the other entities using its corresponding first feature vector and second feature vector. In the embodiments of the present invention, the entities and relation information in the knowledge graph are converted into vectorized representations, which facilitates prediction of the classification information and relation information of entities and supports more intelligent semantic analysis and processing. Semantic analysis and processing are automated rather than built on rules, which reduces manual maintenance cost and widens the range of application.

Description

Data processing method and device for a knowledge graph

Technical field

The present invention relates to the technical field of data processing, and in particular to a data processing method for a knowledge graph and a data processing device for a knowledge graph.

Background art

A knowledge graph, also called a knowledge domain map, is known in library and information science as knowledge domain visualization or knowledge domain mapping. It is a family of graphs that display the development process and structural relationships of knowledge: it uses visualization technology to describe knowledge resources and their carriers, and to mine, analyze, construct, draw and display knowledge and the interconnections between items of knowledge.

Specifically, a knowledge graph combines the theories and methods of disciplines such as applied mathematics, graphics, information visualization technology and information science with methods such as bibliometric citation analysis and co-occurrence analysis, and uses visual maps to display the core structure, development history, frontier fields and overall knowledge architecture of a discipline. It is a modern theory that serves the purpose of multidisciplinary integration, and it presents complex knowledge domains through data mining, information processing, knowledge measurement and graph drawing.

As research on knowledge graphs has developed, knowledge graphs have proved well suited to assisting natural language processing and semantic analysis. However, as knowledge accumulates, the data volume of a knowledge graph grows and its structure becomes increasingly complex, so accurate semantic analysis requires query logic and rules to be added and built continually. When part of the information is missing at construction time, or the knowledge information is incomplete, completing the knowledge graph with rules is extremely difficult and tedious.

Summary of the invention

In view of the above problems, embodiments of the present invention are proposed to provide a data processing method for a knowledge graph, and a corresponding data processing device for a knowledge graph, that overcome the above problems or at least partially solve them.

To solve the above problems, an embodiment of the present invention discloses a data processing method for a knowledge graph. The knowledge graph includes an initial entity and other entities, the initial entity and the other entities have classification information and relation information, and the method includes:

selecting a currently processed target entity from the other entities;

obtaining a first feature vector of the initial entity, and obtaining a second feature vector of the target entity;

calculating, according to the first feature vector and the second feature vector, a feature value corresponding to the target entity;

determining the maximum feature value among the feature values;

for the maximum feature value, updating the classification information and relation information of the initial entity and the other entities using its corresponding first feature vector and second feature vector.

Preferably, the initial entity and the other entities each have corresponding word vector data, and the step of selecting a currently processed target entity from the other entities includes:

calculating a transition probability value using the word vector data of the initial entity and the word vector data of the other entities;

judging whether the transition probability value is greater than a first preset threshold;

when the transition probability value is greater than the first preset threshold, determining that the other entity corresponding to the transition probability value is a target entity.

Preferably, the step of calculating, according to the first feature vector and the second feature vector, a feature value corresponding to the target entity includes:

calculating, according to the first feature vector and the second feature vector, a conditional probability value corresponding to the target entity;

multiplying the conditional probability values together to obtain a cumulative-product conditional probability value;

taking the logarithm of the cumulative-product conditional probability value to obtain a logarithmic conditional probability value;

summing the logarithmic conditional probability values to obtain the feature value.

Preferably, the step of updating, for the maximum feature value, the classification information and relation information of the initial entity and the other entities using its corresponding first feature vector and second feature vector includes:

for the maximum feature value, obtaining its corresponding first feature vector and second feature vector;

labeling the initial entity and the other entities with classification information according to the first feature vector and the second feature vector;

adding relation information to the initial entity and the other entities according to the first feature vector and the second feature vector.

Preferably, the step of labeling the initial entity and the other entities with classification information according to the first feature vector and the second feature vector includes:

training a first classifier using the first feature vector and second feature vector of the initial entity and the other entities;

calculating the classification information of the initial entity and the other entities using the first classifier;

labeling the initial entity and the other entities with the classification information.

Preferably, the step of adding relation information to the initial entity and the other entities according to the first feature vector and the second feature vector includes:

training a second classifier using the first feature vector and second feature vector of the initial entity and the other entities;

calculating the relation information of the initial entity and the other entities using the second classifier;

adding the relation information to the initial entity and the other entities.

Preferably, the method further includes:

for a feature value that is not the maximum, updating its corresponding first feature vector and second feature vector.

Preferably, the step of determining the maximum feature value among the feature values includes:

recording the number of times the step of calculating, according to the first feature vector and the second feature vector, a feature value corresponding to the target entity has been executed;

judging whether the number of executions is greater than a second preset threshold;

when the number of executions is greater than the second preset threshold, selecting the maximum feature value among the feature values.

An embodiment of the present invention also discloses a data processing device for a knowledge graph. The knowledge graph includes an initial entity and other entities, the initial entity and the other entities have classification information and relation information, and the device includes:

a target entity selection module, configured to select a currently processed target entity from the other entities;

a first and second feature vector acquisition module, configured to obtain a first feature vector of the initial entity, and to obtain a second feature vector of the target entity;

a feature value calculation module, configured to calculate, according to the first feature vector and the second feature vector, a feature value corresponding to the target entity;

a maximum feature value determination module, configured to determine the maximum feature value among the feature values;

a classification information and relation information update module, configured to update, for the maximum feature value, the classification information and relation information of the initial entity and the other entities using its corresponding first feature vector and second feature vector.

Preferably, the target entity selection module includes:

a transition probability value calculation submodule, configured to calculate a transition probability value using the word vector data of the initial entity and the word vector data of the other entities;

a first preset threshold judging submodule, configured to judge whether the transition probability value is greater than a first preset threshold;

a target entity determination submodule, configured to determine, when the transition probability value is greater than the first preset threshold, that the other entity corresponding to the transition probability value is a target entity.

Preferably, the feature value calculation module includes:

a conditional probability value calculation submodule, configured to calculate, according to the first feature vector and the second feature vector, a conditional probability value corresponding to the target entity;

a cumulative-product conditional probability value obtaining submodule, configured to multiply the conditional probability values together to obtain a cumulative-product conditional probability value;

a logarithmic conditional probability value obtaining submodule, configured to take the logarithm of the cumulative-product conditional probability value to obtain a logarithmic conditional probability value;

a feature value obtaining submodule, configured to sum the logarithmic conditional probability values to obtain the feature value.

Preferably, the first and second feature vector update module includes:

a first and second feature vector extraction submodule, configured to extract, for the maximum feature value, the corresponding first feature vector and second feature vector;

a classification information labeling submodule, configured to label the initial entity and the other entities with classification information according to the first feature vector and the second feature vector;

a relation information adding submodule, configured to add relation information to the initial entity and the other entities according to the first feature vector and the second feature vector.

Preferably, the classification information labeling submodule includes:

a first classifier training unit, configured to train a first classifier using the first feature vector and second feature vector of the initial entity and the other entities;

a classification information calculation unit, configured to calculate the classification information of the initial entity and the other entities using the first classifier;

a classification information labeling unit, configured to label the initial entity and the other entities with the classification information.

Preferably, the relation information adding submodule includes:

a second classifier training unit, configured to train a second classifier using the first feature vector and second feature vector of the initial entity and the other entities;

a relation information calculation unit, configured to calculate the relation information of the initial entity and the other entities using the second classifier;

a relation information adding unit, configured to add the relation information to the initial entity and the other entities.

Preferably, the device further includes:

a first and second vector update module, configured to update, for a feature value that is not the maximum, its corresponding first feature vector and second feature vector.

Preferably, the maximum feature value determination module includes:

an execution count recording submodule, configured to record the number of times the step of calculating, according to the first feature vector and the second feature vector, a feature value corresponding to the target entity has been executed;

a second preset threshold judging submodule, configured to judge whether the number of executions is greater than a second preset threshold;

a maximum feature value selection submodule, configured to select, when the number of executions is greater than the second preset threshold, the maximum feature value among the feature values.

Embodiments of the present invention have the following advantages:

In the embodiments of the present invention, a currently processed target entity is selected from the multiple other entities; the maximum feature value is selected using the first feature vector and the second feature vector; and the classification information and relation information of the initial entity and the multiple other entities are updated using the first feature vector and second feature vector corresponding to that maximum. The entities (points) and relation information (edges) in the knowledge graph are converted into vectorized representations, so that the complex network graph structure of the knowledge graph is mapped to low-dimensional feature vectors. This facilitates prediction of the classification information and relation information of entities and supports more intelligent semantic analysis and processing. Semantic analysis and processing are automated rather than built on rules, which reduces manual maintenance cost and widens the range of application.

Further, the embodiments of the present invention add relation information and classification information to the initial entity and the multiple other entities according to the first feature vector and the second feature vector. Because the entities and relation information in the knowledge graph are represented as vectors, the classification information of an entity can be completed automatically, and relation information can be added between two entities automatically, from the vectorized representations of the entities. This greatly reduces the workload and cost of manually maintaining the knowledge graph.

Brief description of the drawings

Fig. 1 is a flow chart of the steps of embodiment one of a data processing method for a knowledge graph according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of a knowledge graph in the prior art;

Fig. 3 is a flow chart of the steps of embodiment two of a data processing method for a knowledge graph according to an embodiment of the present invention;

Fig. 4A is a first schematic diagram of a target entity set of a knowledge graph according to an embodiment of the present invention;

Fig. 4B is a second schematic diagram of a target entity set of a knowledge graph according to an embodiment of the present invention;

Fig. 5 is a structural block diagram of an embodiment of a data processing device for a knowledge graph according to an embodiment of the present invention.

Detailed description of the embodiments

To make the above objects, features and advantages of the embodiments of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.

One of the core ideas of the present invention is to map the entities in a given knowledge graph to low-dimensional vectorized representations, and then to use these vectors to add missing classification information to entities, or missing relation information between entities, in the original knowledge graph. The embodiments of the present invention represent the entities of the knowledge graph as vectors, mapping the complex network graph structure of the knowledge graph to a low-dimensional vectorized representation rather than a rule-based one, which facilitates prediction of the classification information and relation information of entities.

Referring to Fig. 1, a flow chart of the steps of embodiment one of a data processing method for a knowledge graph according to an embodiment of the present invention is shown. The knowledge graph may include an initial entity and other entities, the initial entity and the other entities have classification information and relation information, and the method may specifically include the following steps:

Step 101: selecting a currently processed target entity from the other entities;

A knowledge graph consists of entities (points) and relation information (edges). Each point has corresponding attribute values, which may include classification information, and two points are connected by an edge. Building a knowledge graph is a process of continually adding points or edges to the graph by writing rules, so that the knowledge graph keeps expanding.

For a given knowledge graph, one entity of the knowledge graph is selected as the initial entity and treated as the starting point. A knowledge graph may include multiple entities: besides the initial entity, there are multiple other entities associated with the initial entity. A knowledge graph is intended to describe the various entities that exist in the real world. Each entity is represented by a globally unique ID (identifier), and in the embodiments of the present invention each entity is represented by a feature vector. Entities in the knowledge graph are connected by relation information, and each entity has corresponding classification information. Classification information characterizes the intrinsic properties of an entity, whereas relation information connects two entities and characterizes the association between them. A knowledge graph can also be regarded as one huge graph in which the nodes represent entities and the edges consist of relation information.

Referring to the example of a prior-art knowledge graph shown in Fig. 2, the knowledge graph is centered on the entities "Deng Chao" and "Sun Li", and the other entities are film and television works related to "Deng Chao" and "Sun Li". Suppose Deng Chao is selected as the initial entity, denoted f(n); the entities connected to "Deng Chao" are then the other entities. For example, the classification information of the entity "Deng Chao" is actor, and the relation information between the entity "Deng Chao" and the entity "Sun Li" is husband and wife.
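The structure described above (entities with a unique ID and classification information, joined by edges that carry relation information) can be sketched as a small data structure. This is only an illustrative sketch: the class names `Entity` and `KnowledgeGraph`, the IDs `e1` and `e2`, and the choice to treat edges as undirected are assumptions, not part of the described method. The spouse entity from the Fig. 2 example is rendered here as "Sun Li".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Entity:
    entity_id: str                  # globally unique identifier
    name: str
    category: Optional[str] = None  # classification information; None when missing


@dataclass
class KnowledgeGraph:
    entities: Dict[str, Entity] = field(default_factory=dict)
    # Each edge is (source ID, relation information, target ID).
    edges: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_entity(self, entity: Entity) -> None:
        self.entities[entity.entity_id] = entity

    def add_relation(self, source_id: str, relation: str, target_id: str) -> None:
        self.edges.append((source_id, relation, target_id))

    def neighbours(self, entity_id: str) -> List[str]:
        """Return IDs of entities directly connected to entity_id (edges treated as undirected)."""
        result = []
        for source, _, target in self.edges:
            if source == entity_id:
                result.append(target)
            elif target == entity_id:
                result.append(source)
        return result


graph = KnowledgeGraph()
graph.add_entity(Entity("e1", "Deng Chao", "actor"))
graph.add_entity(Entity("e2", "Sun Li"))  # classification information missing, to be completed
graph.add_relation("e1", "husband and wife", "e2")
print(graph.neighbours("e1"))
```

With "Deng Chao" as the initial entity, `neighbours("e1")` gives the candidate other entities, and the `None` category on `e2` is the kind of gap the method aims to fill.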

As one example of a specific application of the present invention, the currently processed target entity may be selected from the multiple other entities by calculating a transition probability value. The transition probability value can represent the size of the weight of the connection between entities. The transition probability value may also be another value that expresses weight, for example a value taken from a transition probability matrix or an adjacency matrix. These are of course only examples of expressing the weight of the relation information between entities; any value that can represent the size of that weight may serve as the transition probability value, and the embodiments of the present invention place no specific limitation on this.

Step 102: obtaining a first feature vector of the initial entity, and obtaining a second feature vector of the target entity;

In the embodiments of the present invention, every entity in the given knowledge graph is assigned an initial vector. The initial vector may consist of multidimensional numerical values that are assigned at random, so at first it cannot represent the connections and structural features between entities. After the initial vectors have been updated by the method of the embodiments, however, they represent the connections and structural features between different entities well. The initial vectors include the first feature vector of the initial entity and the second feature vector of the target entity. The first feature vector of the initial entity is a vector of more than one dimension whose dimensionality is set manually, and the second feature vector of the target entity is likewise a manually dimensioned vector of more than one dimension.

In applying the embodiments of the present invention, the initial entity in the knowledge graph may be defined as a first feature vector of more than one dimension; the dimensionality may be 100 or 50, for example. For instance, the vectorized representation f(n) of the initial entity "Deng Chao" is the first feature vector, i.e. f(Deng Chao) = [0.543, 0.381, 0.328, ..., 0.182], where the dimensionality is 100 (the first feature vector contains 100 numbers). Supposing the target entity is "Sun Li", the second feature vector of the target entity "Sun Li" is f(Sun Li) = [0.337, 0.169, 0.401, ..., 0.403], again with 100 dimensions.

The dimensionality may of course be set by a person skilled in the art according to the actual situation, and the present invention places no limitation on this.
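The random assignment of initial vectors can be sketched as follows. The function name `init_feature_vectors`, the entity IDs, the uniform range [0, 1) and the fixed seed are assumptions made for illustration; the text only states that the values are random and that the dimensionality (for example 100 or 50) is set manually.

```python
import random


def init_feature_vectors(entity_ids, dim=100, seed=0):
    """Assign every entity a random initial vector of `dim` values in [0, 1)."""
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    return {eid: [rng.random() for _ in range(dim)] for eid in entity_ids}


# "e1" stands for the initial entity and "e2" for a target entity (hypothetical IDs).
vectors = init_feature_vectors(["e1", "e2"], dim=100)
print(len(vectors["e1"]), len(vectors["e2"]))
```

At this stage the vectors carry no information about the graph; they only become meaningful after the update steps described for steps 103 to 105.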

Step 103: calculating, according to the first feature vector and the second feature vector, a feature value corresponding to the target entity;

In a preferred embodiment of a specific application of the present invention, the conditional probability value of the second feature vector of a target entity, given the first feature vector of the initial entity, may be calculated from the first feature vector and the second feature vector. Because an initial entity in a knowledge graph is usually connected to multiple target entities, conditional probability values are obtained for multiple different target entities. The conditional probability value may be expressed using a softmax function. The conditional probability values are multiplied together to obtain their cumulative product, and the logarithm of this product is then taken and summed to obtain the feature value.
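The calculation described above can be written out as follows. This is only one reading under stated assumptions: the score between two entities is assumed to be the dot product of their feature vectors, V denotes the set of all entities, and N(n) denotes the target entities adjacent to the initial entity n. None of these symbols is fixed by the text, which names only the softmax form, the cumulative product, the logarithm and the summation.

```latex
P\bigl(n_i \mid f(n)\bigr)
  = \frac{\exp\bigl(f(n_i)\cdot f(n)\bigr)}{\sum_{v \in V} \exp\bigl(f(v)\cdot f(n)\bigr)},
\qquad
J = \sum_{n \in V} \log \prod_{n_i \in N(n)} P\bigl(n_i \mid f(n)\bigr)
```

Here J plays the role of the feature value: it grows as the feature vectors make the actual neighbours of each entity more probable.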

Step 104: determining the maximum feature value among the feature values;

In applying the embodiments of the present invention, the number of times the step of calculating, according to the first feature vector and the second feature vector, a feature value corresponding to the target entity has been executed is recorded; this can be understood as the number of feature values obtained. Whether the number of executions is greater than a second preset threshold is judged, and when it is, the maximum feature value among the feature values is selected. The second preset threshold is set by a person skilled in the art according to the actual situation, and the embodiments of the present invention place no limitation on this.

Step 105: for the maximum feature value, updating the classification information and relation information of the initial entity and the multiple other entities using its corresponding first feature vector and second feature vector.

In the embodiments of the present invention, after a feature value has been calculated in step 103, different feature values can be obtained by continually adjusting the value of each dimension of the first feature vector and the second feature vector. Among these feature values, the first feature vector and second feature vector corresponding to the maximum feature value are selected as the vectorized representations of the initial entity and the other entities (target entities). The reason is that, once all entities in the knowledge graph are represented as vectors, if those vectorized representations express the relation information and structural features of the entities in the knowledge graph well, then the conditional probability value of the set of adjacent entities around the entity "Deng Chao" occurring will be at its maximum.

Because the feature value is obtained from the conditional probabilities through certain operations, a feature value to be maximized over all entities n in the graph can be defined; the feature value may include an objective function value. When the feature value is not the maximum, the corresponding first feature vector and second feature vector are not the optimal solution.

In that case, the numbers in one or more dimensions of the first feature vector and the second feature vector continue to be adjusted (increased or decreased) to obtain different feature values. If the feature value is the maximum, the corresponding first feature vector values and second feature vector values are chosen as the vectorized representations of the entities; otherwise, the first feature vector and the second feature vector are updated, and the method returns to the step of calculating, according to the first feature vector and the second feature vector, a feature value corresponding to the target entity.

It should be noted that an iteration count may be set in the embodiments of the present invention, and the maximum feature value is selected within that number of iterations, which ensures that hardware performance is not impaired. The iteration count may be determined by a person skilled in the art according to the actual situation, and the present invention places no limitation on this.
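The adjust-and-recompute loop with a bounded iteration count can be sketched as below. The text does not say how the dimensions are adjusted, so this sketch assumes a simple random local search: one dimension of one vector is increased or decreased by a fixed step, and the change is kept only if the feature value improves. The names `search_best_vectors`, `objective`, `step` and `toy_objective` are hypothetical, and the toy objective stands in for the real feature value.

```python
import copy
import random


def search_best_vectors(vectors, objective, max_iterations=200, step=0.05, seed=0):
    """Return the vectors with the largest objective value seen within max_iterations."""
    rng = random.Random(seed)
    best_vectors = copy.deepcopy(vectors)  # leave the caller's vectors untouched
    best_value = objective(best_vectors)
    for _ in range(max_iterations):        # iteration count bounds the amount of work
        candidate = copy.deepcopy(best_vectors)
        entity_id = rng.choice(sorted(candidate))
        dim = rng.randrange(len(candidate[entity_id]))
        candidate[entity_id][dim] += rng.choice((-step, step))  # increase or decrease one dimension
        value = objective(candidate)
        if value > best_value:             # keep only adjustments that raise the feature value
            best_vectors, best_value = candidate, value
    return best_vectors, best_value


def toy_objective(vecs):
    # Stand-in for the feature value: larger when the two vectors are closer together.
    return -sum((x - y) ** 2 for x, y in zip(vecs["a"], vecs["b"]))


start = {"a": [0.0, 0.0], "b": [1.0, 1.0]}
best, value = search_best_vectors(start, toy_objective)
print(value >= toy_objective(start))
```

A gradient-based update would be a natural alternative to the random search used here; the bounded loop and the "keep the maximum" rule are the parts taken from the text.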

Further, the maximum feature value within the iteration count is selected, and the first feature vector and second feature vector corresponding to it are extracted. The classification information and relation information of the initial entity and the multiple other entities in the knowledge graph are updated from these vectors: a first classifier and a second classifier are trained using the first feature vector and second feature vector together with the known classification information and relation information of the initial entity and the multiple other entities, and the classifiers are used to update the unknown classification information and relation information.
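Training a classifier on the feature vectors of entities whose classification information is known, and then using it to fill in missing classification information, can be sketched as follows. The text does not name a classifier type, so a nearest-centroid classifier is assumed here purely as a stand-in; the function names, entity IDs and two-dimensional vectors are likewise hypothetical.

```python
def train_centroid_classifier(vectors, labels):
    """Compute one centroid per label from the feature vectors of labelled entities."""
    sums, counts = {}, {}
    for entity_id, label in labels.items():
        vec = vectors[entity_id]
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids, vec):
    """Return the label whose centroid is closest (squared Euclidean distance) to vec."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, vec))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


vectors = {"e1": [0.9, 0.1], "e2": [0.8, 0.2], "e3": [0.1, 0.9], "e4": [0.85, 0.15]}
known = {"e1": "actor", "e2": "actor", "e3": "film"}  # e4 has no classification information
model = train_centroid_classifier(vectors, known)
predicted = predict(model, vectors["e4"])
print(predicted)
```

A second classifier for relation information could be trained in the same way, with each training example built from a pair of feature vectors and labelled with the known relation between the two entities.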

It should be noted that, in the embodiments of the present invention, the relation information between two entities may be represented by the product of their two feature vectors, or by the mean of the two feature vectors, or by an L1-norm or L2-norm regularized combination of the two feature vectors; the embodiments of the present invention place no limitation on this.
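One way to read the options listed above is as element-wise operators that combine two feature vectors into a single vector describing the pair. The sketch below follows that reading; the function names and the exact form of the L1 and L2 variants (absolute and squared element-wise differences) are assumptions, since the text does not define them.

```python
def product(u, v):
    """Element-wise product of the two feature vectors."""
    return [a * b for a, b in zip(u, v)]


def mean(u, v):
    """Element-wise mean of the two feature vectors."""
    return [(a + b) / 2 for a, b in zip(u, v)]


def l1_difference(u, v):
    """Element-wise absolute difference (assumed L1 variant)."""
    return [abs(a - b) for a, b in zip(u, v)]


def l2_difference(u, v):
    """Element-wise squared difference (assumed L2 variant)."""
    return [(a - b) ** 2 for a, b in zip(u, v)]


u, v = [1.0, 2.0], [3.0, 4.0]
pair_features = {
    "product": product(u, v),
    "mean": mean(u, v),
    "l1": l1_difference(u, v),
    "l2": l2_difference(u, v),
}
print(pair_features)
```

Any of these pair vectors could serve as the input from which relation information between the two entities is predicted.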

In the embodiments of the present invention, a currently processed target entity is selected from the multiple other entities; a feature value corresponding to the target entity is calculated using the first feature vector and the second feature vector; if the feature value is not the maximum, its corresponding first feature vector and second feature vector are adjusted, and the method returns to the step of calculating, according to the first feature vector and the second feature vector, a feature value corresponding to the target entity; the maximum feature value is selected, and the classification information and relation information of the initial entity and the multiple other entities are updated using its corresponding first feature vector and second feature vector. The entities (points) and relation information (edges) in the knowledge graph are converted into vectorized representations, so that the complex network graph structure of the knowledge graph is mapped to low-dimensional feature vectors. This facilitates prediction of the classification information and relation information of entities and supports more intelligent semantic analysis and processing. Semantic analysis and processing are automated rather than built on rules, which reduces manual maintenance cost and widens the range of application.

Referring to Fig. 3, a flow chart of the steps of embodiment two of a data processing method for a knowledge graph according to an embodiment of the present invention is shown. The knowledge graph includes an initial entity and multiple other entities, the initial entity and the multiple other entities have classification information and relation information, and the initial entity and the other entities each have corresponding word vector data. Method embodiment two is essentially an extension of method embodiment one, and may specifically include the following steps:

Step 201: calculating a transition probability value using the word vector data of the initial entity and the word vector data of the other entities;

In the embodiments of the present invention, the word vector data of the initial entity and of the multiple other entities is obtained, and multiple transition probability values are calculated according to a specific formula. The word vector data may be obtained by training a language model; common methods include N-gram models and maximum entropy Markov models, and the embodiments of the present invention place no limitation on this.

Step 202: judging whether the transition probability value is greater than a first preset threshold;

The first preset threshold may be any manually set value. For example, it may be set to 0, in which case step 203 is executed when the transition probability value is greater than 0.

Step 203: when the transition probability value is greater than the first preset threshold, determining that the other entity corresponding to the transition probability value is a target entity;

Specifically, when a transition probability value is greater than the first preset threshold, the other entity corresponding to that transition probability value is determined to be a target entity. In this way, the target entities that have a particular association with the initial entity can be determined.
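Steps 201 to 203 can be sketched as below. The text does not give the formula that turns word vector data into a transition probability value, so this sketch assumes one: the cosine similarity between word vectors, clipped at zero and normalized so that the values sum to 1. The function names and the entity IDs `film_a`, `film_b`, `film_c` are hypothetical. The threshold of 0 follows the example in the text.

```python
import math


def cosine(u, v):
    """Cosine similarity of two word vectors; 0.0 if either vector is all zeros."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def transition_probabilities(initial_vec, other_vecs):
    """Assumed formula: non-negative cosine similarities normalized to sum to 1."""
    sims = {eid: max(cosine(initial_vec, vec), 0.0) for eid, vec in other_vecs.items()}
    total = sum(sims.values())
    return {eid: (s / total if total else 0.0) for eid, s in sims.items()}


def select_targets(initial_vec, other_vecs, threshold=0.0):
    """Keep the other entities whose transition probability value exceeds the threshold."""
    probs = transition_probabilities(initial_vec, other_vecs)
    return [eid for eid, p in probs.items() if p > threshold]


word_vectors = {"film_a": [1.0, 0.0], "film_b": [0.0, 1.0], "film_c": [1.0, 1.0]}
targets = select_targets([1.0, 0.0], word_vectors)
print(targets)
```

In this toy input, `film_b` is orthogonal to the initial entity's word vector, so its transition probability value is 0 and it is not selected as a target entity.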

Step 204, obtains the first eigenvector of the initial solid, and, obtain the target entity second is special Levy vector；

For reality, obtain default definition the first eigenvector of initial solid and the second feature of target entity to Amount, first eigenvector f (Deng Chao)=[0.543,0.381,0.328 ... 0.182], and assume that target entity is " grandson pari ", then Target entity " grandson pari " is expressed as second feature vector f (grandson pari)=[0.337,0.169,0.401 ... 0.403].

Step 205: calculate the characteristic value corresponding to the target entity according to the first feature vector and the second feature vector;

In a preferred embodiment of the present invention, the step of calculating the characteristic value corresponding to the target entity according to the first feature vector and the second feature vector includes the following sub-steps:

Sub-step S2051: calculate the conditional probability value corresponding to the target entity according to the first feature vector and the second feature vector;

Sub-step S2052: multiply the conditional probability values together to obtain a cumulative-product conditional probability value;

Sub-step S2053: take the logarithm of the cumulative-product conditional probability value to obtain a logarithmic conditional probability value;

Sub-step S2054: sum the logarithmic conditional probability values to obtain the characteristic value.

In a specific application, the conditional probability value of the second feature vector of the target entity, given the first feature vector of the initial entity, is calculated according to the first feature vector and the second feature vector. Because an initial entity in the knowledge graph is usually connected to multiple target entities, conditional probability values for multiple different target entities can be obtained.

Further, the conditional probability values are multiplied together to obtain their product; the logarithm of the product is taken and the results are then summed to obtain the characteristic value, where the characteristic value may include the objective function value.
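Sub-steps S2051 to S2054 can be sketched as follows. This is only an illustration under stated assumptions: the conditional probability is taken to be a softmax over dot products of feature vectors, normalized over all entities in the graph, and the names `conditional_probability` and `characteristic_value` as well as the tiny two-dimensional vectors are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def conditional_probability(f, source, target):
    """S2051: softmax probability of `target` given the feature vector of `source`."""
    scores = {v: dot(f[v], f[source]) for v in f}
    m = max(scores.values())  # subtract the maximum for numerical stability
    z = sum(math.exp(s - m) for s in scores.values())
    return math.exp(scores[target] - m) / z


def characteristic_value(f, target_sets):
    """S2052 to S2054: product per entity, logarithm, then sum over entities."""
    total = 0.0
    for n, targets in target_sets.items():
        product = 1.0
        for t in targets:
            product *= conditional_probability(f, n, t)  # S2052
        total += math.log(product)                       # S2053 and S2054
    return total


# Hypothetical 2-dimensional feature vectors for three entities.
f = {"a": [1.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 1.0]}
value = characteristic_value(f, {"a": ["b"]})
```

For long target sets the product can underflow; summing the logarithms of the individual probabilities gives the same value and is the usual way around that.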

Step 206: determine the maximum characteristic value among the characteristic values;

In a preferred embodiment of the present invention, the step of determining the maximum characteristic value among the characteristic values includes the following sub-steps:

Sub-step S2061: record the number of times the step of calculating the characteristic value corresponding to the target entity according to the first feature vector and the second feature vector has been executed;

Sub-step S2062: judge whether the execution count is greater than a second preset threshold;

Sub-step S2063: when the execution count is greater than the second preset threshold, select the maximum characteristic value among the characteristic values.

Specifically, the number of times step 205 has been executed is recorded. When the execution count exceeds the second preset threshold, multiple characteristic values will have been obtained; the maximum characteristic value is selected from them and the next operation is carried out.

Step 207: for the maximum characteristic value, obtain its corresponding first feature vector and second feature vector;

Step 208: mark classification information for the initial entity and the other entities according to the first feature vector and the second feature vector;

In the embodiment of the present invention, the step of marking classification information for the initial entity and the other entities according to the first feature vector and the second feature vector includes:

Sub-step S2081: train a first classifier using the first feature vector and second feature vectors of the initial entity and the other entities;

Sub-step S2082: calculate the classification information of the initial entity and the other entities using the first classifier;

Sub-step S2083: mark the initial entity and the other entities with the classification information.

The first classifier may be an algorithm such as a decision tree, logistic regression, naive Bayes, or a neural network. By training the first classifier on the known classification information of the initial entity and the other entities, the classification information of the initial entity and the other entities can be calculated.
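Sub-steps S2081 to S2083 can be illustrated with logistic regression, one of the classifier types listed above. The sketch below is a plain-Python binary classifier trained by batch gradient ascent; the function names, the two-dimensional feature vectors, and the labels (1 for a person, 0 for a film or television work) are hypothetical choices made only for this example.

```python
import math


def train_logistic_regression(vectors, labels, lr=0.5, epochs=200):
    """S2081: fit weights w and bias b by batch gradient ascent on the log-likelihood."""
    dim = len(vectors[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(vectors, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            for i in range(dim):
                grad_w[i] += (y - p) * x[i]
            grad_b += y - p
        w = [wi + lr * g / len(vectors) for wi, g in zip(w, grad_w)]
        b += lr * grad_b / len(vectors)
    return w, b


def predict(w, b, x):
    """S2082: predicted class (1 or 0) for feature vector x."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


# Entities with known classification information (hypothetical vectors and labels).
known_vectors = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
known_labels = [1, 1, 0, 0]

w, b = train_logistic_regression(known_vectors, known_labels)

# S2083: mark entities whose classification information is missing.
label_person_like = predict(w, b, [0.85, 0.15])
label_work_like = predict(w, b, [0.15, 0.85])
```

The same pattern would apply to the relation classifier if the inputs are vectors describing a pair of entities rather than a single entity.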

Step 209: add relation information for the initial entity and the other entities according to the first feature vector and the second feature vector.

In the embodiment of the present invention, the step of adding relation information for the initial entity and the other entities according to the first feature vector and the second feature vector includes:

Sub-step S2091: train a second classifier using the first feature vector and second feature vectors of the initial entity and the other entities;

Sub-step S2092: calculate the relation information of the initial entity and the other entities using the second classifier;

Sub-step S2093: add the relation information to the initial entity and the other entities.

The second classifier may be an algorithm such as a decision tree, logistic regression, naive Bayes, or a neural network, and the embodiment of the present invention places no limitation on this. By training the second classifier on the known relation information of the initial entity and the other entities, the relation information of the initial entity and the other entities can be calculated. The relation information is then added to the initial entity and the other entities in the knowledge graph.

In a preferred embodiment of the present invention, the method further includes the following steps:

Step S11: record the number of times execution returns to the step of calculating the characteristic value corresponding to the target entity according to the first feature vector and the second feature vector;

Step S12: judge whether the execution count is greater than the second preset threshold;

Step S13: when the execution count is greater than the second preset threshold, return to the step of updating the classification information and relation information of the initial entity and the other entities using the corresponding first feature vector and second feature vector.

The second preset threshold may be a manually set number of iterations. For example, if the number of times the characteristic value corresponding to the target entity is to be calculated is 1,000,000, the computation stops after 1,000,000 iterations, the maximum characteristic value is chosen, and the corresponding first feature vector and second feature vector are extracted as the vectorized representation in the knowledge graph.

In the embodiment of the present invention, transition probability values are calculated using the word vector data of the initial entity and the word vector data of the other entities. When a transition probability value is greater than the first preset threshold, the other entity corresponding to that value is determined to be a target entity. For the maximum characteristic value, the corresponding first feature vector and second feature vector are extracted. According to these vectors, classification information is marked for the initial entity and the other entities and, further, relation information is added for them. The entities and relation information in the knowledge graph are thus represented as vectors; from the vectorized representation of the entities, the classification information of entities can be completed automatically and relation information can be added automatically between two entities, which greatly reduces the workload and cost of manually maintaining the knowledge graph.

To enable those skilled in the art to better understand the embodiments of the present invention, a specific example is described below.

First, constructing the objective function for the vectorized representation of the knowledge graph

To represent the knowledge graph in vectorized form, the embodiment of the present invention proposes constructing an objective function that maximizes the conditional probability value, illustrated below with a concrete graph.

Referring to Fig. 2, a knowledge graph of the prior art is shown. Fig. 2 depicts part of a film and television knowledge graph, in which the relations and structure among Deng Chao, Sun Li and film and television works can be seen. When the entities in the graph are represented as vectors, the representation should still carry the information and features of the knowledge graph. Take the entity "Deng Chao" as an example. When all entities in the graph are vectorized, the entity "Deng Chao" is represented by the vector f(Deng Chao), the entity "Sun Li" by the vector f(Sun Li), and so on. If the vectorized representation of each entity captures the structural features and relation information of that entity in the graph well, then the conditional probability value, computed from these representations, that the target entities around Deng Chao occur should reach its maximum. That is, for the set N(Deng Chao) of neighboring target entities such as "The Mermaid" and Sun Li, the conditional probability value P(N(Deng Chao) | f(Deng Chao)) that this set occurs will be maximal. An objective function value (characteristic value) to be maximized over all entities n in the graph can therefore be defined as $\max_f \sum_{n \in V} \log P(N(n) \mid f(n))$. Moreover, the target entities around the entity "Deng Chao" are independent of one another, for example "The Mermaid" and "Chinese Partners" are independent, so the conditional probability values that the individual target entities occur are mutually independent, and the conditional probability value of the set is the product of the conditional probability values of its members.

The conditional probability value that a given target entity of the entity "Deng Chao" occurs can be expressed as a softmax function of the product of the two entities' feature vectors.

In general, a knowledge graph G = (V, E) is given, where V denotes all entities in the graph and E denotes all relation information in the graph. The approach is similar to the skip-gram model for word vectors, in which the probability that a word occurs is related to the words in its context. When computing the feature representations of the entities in the graph, the objective function to be maximized (the function of the characteristic value) is defined from the conditional probability value that the target entity set of an entity occurs: $\max_f \sum_{n \in V} \log P(N(n) \mid f(n))$

Here f(n) is the vectorized representation of entity n, with dimension d; the parameters of f(n) are adjusted by training the model to maximize the characteristic value (objective function value). The vectorized representation model therefore has |V| × d parameters to estimate. N(n) is the target entity set of entity n. P(N(n) | f(n)) is the conditional probability value that the target entity set of entity n occurs when all entities in the graph are vectorized. If the vectorized representations of all entities express the relations and structural features of each entity in the graph well, the conditional probability values of the target entity sets of all entities in V will reach their maximum, that is, the above characteristic value (objective function value) is maximized.
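The formulas that accompany this derivation are not reproduced in the text. A form consistent with the description (independence of the target entities, a softmax over the product of feature vectors, and the log-sum objective) would be the following, where interpreting the "product" as a dot product and normalizing over all entities in V are assumptions, and $Z_n$ is a symbol introduced here for the normalizer:

```latex
\begin{aligned}
P(N(n) \mid f(n)) &= \prod_{n_i \in N(n)} P(n_i \mid f(n)), \\
P(n_i \mid f(n)) &= \frac{\exp\big(f(n_i) \cdot f(n)\big)}{Z_n},
\qquad Z_n = \sum_{v \in V} \exp\big(f(v) \cdot f(n)\big), \\
\max_f \sum_{n \in V} \log P(N(n) \mid f(n))
&= \max_f \sum_{n \in V} \Big[ \sum_{n_i \in N(n)} f(n_i) \cdot f(n) \;-\; |N(n)| \log Z_n \Big].
\end{aligned}
```

The last line follows by taking the logarithm of the product in the first line and substituting the softmax expression from the second.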

Second, the vectorized representation of the entities of the knowledge graph

Assuming the knowledge graph of Fig. 2 is given, the process of producing the vectorized representation of the knowledge graph is as follows:

1. Initialize the first feature vector and the second feature vectors

The feature vector parameters f(n) of all entities in the graph are randomly initialized. The dimension d of the vectorized representation is set to 100, the size of the target entity set N(n) is k, and the number of training iterations is defined as iterations.

The vectorized representations of all entities in the graph may be randomly initialized as 100-dimensional vectors:

First feature vector f(Deng Chao) = [0.543, 0.381, 0.328, ..., 0.182]

Second feature vector f(Sun Li) = [0.337, 0.169, 0.401, ..., 0.403]

Second feature vector ...
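The random initialization can be sketched as below. Drawing each component uniformly from [0, 1) is an assumption suggested by the sample values above, and the function name `init_feature_vectors` and the `seed` parameter are hypothetical.

```python
import random


def init_feature_vectors(entities, d=100, seed=None):
    """Randomly initialize one d-dimensional feature vector f(n) per entity."""
    rng = random.Random(seed)
    return {entity: [rng.random() for _ in range(d)] for entity in entities}


# |V| x d parameters in total: here 2 entities x 100 dimensions.
vectors = init_feature_vectors(["Deng Chao", "Sun Li"], d=100, seed=42)
```

Passing a fixed seed makes the initialization reproducible, which helps when comparing characteristic values across runs.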

2. Obtain the target entity set of every entity in the knowledge graph

The most common graph search algorithms are breadth-first search (BFS) and depth-first search (DFS). However, collecting target entities with BFS tends to cause repeated sampling and leaves most of the graph untraversed, while collecting target entities with DFS tends to sample entities that are too far from the source entity and therefore lose representativeness. The embodiment of the present invention uses a sampling method that combines BFS and DFS.

Referring to Fig. 4A and Fig. 4B, the size of the target entity set is defined as k, that is, the size of N(n) is k. The transition probability value of a piece of relation information in the graph is the transition probability $P(v_i \mid v_{i-1})$ from the sampled entity $v_{i-1}$ to the next entity $v_i$, defined as the normalized word vector similarity of the two words. The word vector model is obtained by training on the large corpus used to build the knowledge graph. The larger the transition probability value, the higher the correlation between the two entities and the more representative the target entity, that is, the greater the probability of transferring to that target entity. The embodiment of the present invention uses cosine similarity to express the transition probability, where W in the formula normalizes the transition probability value.

For example, to obtain the transition probability of the relation information between Deng Chao and Sun Li in the graph, the word vector of Deng Chao, c(Deng Chao) = [0.500, 0.249, 0.069, ..., 0.325], and the word vector of Sun Li, c(Sun Li) = [0.196, 0.121, 0.207, ..., 0.843], are substituted into the above formula.

For the transition probability between Deng Chao and "Legend of Mi Yue", since there is no relation information between the two entities,

P(Deng Chao | Legend of Mi Yue) = 0

Similarly, the transition probabilities of all relation information in the graph can be calculated.
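A minimal sketch of this calculation follows. It assumes that relations are undirected, that the normalizer W is the sum of the cosine similarities over the entities related to the source entity, and that the similarities are positive; the short three-dimensional word vectors, the edge list, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def transition_probabilities(word_vectors, edges, source):
    """P(v | source) for every other entity v; zero where no relation exists."""
    neighbors = {v for u, v in edges if u == source} | {u for u, v in edges if v == source}
    sims = {v: cosine_similarity(word_vectors[source], word_vectors[v]) for v in neighbors}
    w = sum(sims.values())  # assumed form of the normalizer W
    if w == 0:
        return {v: 0.0 for v in word_vectors if v != source}
    return {v: (sims[v] / w if v in sims else 0.0) for v in word_vectors if v != source}


# Hypothetical word vectors c(.) and relations of a small film and television graph.
word_vectors = {
    "Deng Chao": [0.5, 0.2, 0.1],
    "Sun Li": [0.2, 0.1, 0.8],
    "The Mermaid": [0.4, 0.3, 0.1],
    "Legend of Mi Yue": [0.1, 0.9, 0.2],
}
edges = [("Deng Chao", "Sun Li"), ("Deng Chao", "The Mermaid"), ("Sun Li", "Legend of Mi Yue")]

probs = transition_probabilities(word_vectors, edges, "Deng Chao")
```

Because Deng Chao and "Legend of Mi Yue" share no relation in this toy graph, the transition probability between them is 0, matching the example in the text.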

Sampling procedure for the target entity set N(n):

(1) Starting from the initial entity n, the next entity is obtained by random multinomial sampling according to the transition probability values of the entity's outgoing relations; the larger the transition probability between two entities, the more likely that entity is to be sampled.

(2) Starting from the current entity, a new round of sampling obtains the next entity. Note that if the walk transfers to an entity that has already been sampled into N(n), that entity is not counted again, and the sampling process continues from that entity.

(3) Repeat this process until k entities have been sampled, which yields the target entity set N(n).

Referring to Fig. 4A and Fig. 4B, two different target entity sets in the embodiment of the present invention are shown, where the transition probability values of the relation information are initialized from the cosine similarity of the two words under the word vector model. As shown in Fig. 4A, starting from the entity "Deng Chao", a possible traversal order under random multinomial sampling is Deng Chao → Sun Li → Legend of Zhen Huan → Sun Li → Devil and Angel → Deng Chao → The Mermaid, so N(Deng Chao) = {Sun Li, Legend of Zhen Huan, Devil and Angel, The Mermaid}. As shown in Fig. 4B, in the next iteration, again starting from Deng Chao, a possible traversal order under random multinomial sampling is Deng Chao → Chinese Partners → Deng Chao → The Mermaid → Deng Chao → Sun Li → Legend of Mi Yue, so N(Deng Chao) = {Chinese Partners, The Mermaid, Sun Li, Legend of Mi Yue}. Sampling is repeated for iterations rounds, giving iterations different sampled results N(Deng Chao); in the same way, iterations different sampled results N(n) can be obtained for every other entity in the graph.
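The sampling procedure (1) to (3) can be sketched as a random walk. The transition table below is hypothetical, the start entity itself is never added to N(n) (as in the Fig. 4A and Fig. 4B examples), and `max_steps` is an assumed safety cap that is not part of the embodiment.

```python
import random


def sample_target_set(start, transition, k, rng=None, max_steps=10_000):
    """Walk from `start` and collect k distinct target entities N(start)."""
    rng = rng or random.Random()
    targets, current = [], start
    for _ in range(max_steps):
        if len(targets) == k:
            break
        candidates = {v: p for v, p in transition[current].items() if p > 0}
        if not candidates:
            break  # dead end: no outgoing relation to follow
        # Multinomial draw: entities with larger transition probability are more likely.
        current = rng.choices(list(candidates), weights=list(candidates.values()))[0]
        # Already-sampled entities are not counted again; the walk simply continues.
        if current != start and current not in targets:
            targets.append(current)
    return targets


# Hypothetical transition probabilities for a small graph.
transition = {
    "Deng Chao": {"Sun Li": 0.5, "The Mermaid": 0.5},
    "Sun Li": {"Deng Chao": 0.5, "Legend of Zhen Huan": 0.5},
    "The Mermaid": {"Deng Chao": 1.0},
    "Legend of Zhen Huan": {"Sun Li": 1.0},
}

n_deng_chao = sample_target_set("Deng Chao", transition, k=3, rng=random.Random(0))
```

Calling the function repeatedly with different random states yields different target entity sets for the same start entity, which is what the iterations rounds of sampling rely on.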

3. Adjust the parameters of the vectorized representations f(n) of the entities to maximize the characteristic value (objective function value)

From the target entity set N(n) obtained for each entity by the sampling in step 2, the conditional probability value P(N(n) | f(n)) of the target entity set of the corresponding entity is calculated, and from it the characteristic value (objective function value). The stochastic gradient descent (SGD) algorithm is run for iterations rounds to adjust and optimize the parameters of the vectorized representations of all entities in the graph so that the characteristic value (objective function value) is maximized, making f(n) express the structural features of entity n in the graph and its relations with neighboring entities.

Taking the entity "Deng Chao" as an example, suppose that in one iteration the target entity set obtained by the sampling in step 2 is N(Deng Chao) = {Sun Li, Legend of Zhen Huan, Devil and Angel, The Mermaid}. The conditional probability value of each entity in the target entity set is calculated as a softmax function of the product of its feature vector and f(Deng Chao).

The conditional probability value of the target entity set of the entity "Deng Chao" is then the product of the conditional probability values of these four entities.

Similarly, the conditional probability value P(N(n) | f(n)) of the target entity set of every other entity n in the graph can be calculated.

The objective function (function of the characteristic value) $\sum_{n \in V} \log P(N(n) \mid f(n))$ is then obtained.

Stochastic gradient descent (SGD) is used to continually optimize and adjust the vectorized representations f(n) of all entities n in the graph so that the characteristic value (objective function value) is maximized.
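The optimization can be sketched as stochastic gradient ascent on the objective, which is equivalent to stochastic gradient descent on its negative. The sketch assumes a softmax over dot products normalized over all entities and a single shared vector per entity; the function names, learning rate, and toy two-dimensional vectors are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax_row(f, n):
    """P(v | f(n)) for every entity v, normalized over all entities."""
    scores = {v: dot(f[v], f[n]) for v in f}
    m = max(scores.values())
    exps = {v: math.exp(s - m) for v, s in scores.items()}
    z = sum(exps.values())
    return {v: e / z for v, e in exps.items()}


def objective(f, target_sets):
    """Characteristic value: sum of log P(t | f(n)) over all sampled pairs."""
    return sum(math.log(softmax_row(f, n)[t]) for n, ts in target_sets.items() for t in ts)


def sgd_step(f, n, t, lr):
    """One ascent step on log P(t | f(n)) with respect to every feature vector."""
    p = softmax_row(f, n)
    fn = list(f[n])
    # Gradient for each entity v != n: (1[v == t] - p_v) * f(n).
    grads = {v: [((1.0 if v == t else 0.0) - p[v]) * x for x in fn] for v in f if v != n}
    # Gradient for n: f(t) - sum_{v != n} p_v f(v) - 2 p_n f(n).
    grads[n] = [
        f[t][i] - sum(p[v] * f[v][i] for v in f if v != n) - 2.0 * p[n] * fn[i]
        for i in range(len(fn))
    ]
    for v, g in grads.items():
        f[v] = [x + lr * gi for x, gi in zip(f[v], g)]


def train(f, target_sets, lr=0.05, iterations=100, seed=0):
    rng = random.Random(seed)
    pairs = [(n, t) for n, ts in target_sets.items() for t in ts]
    for _ in range(iterations):
        rng.shuffle(pairs)
        for n, t in pairs:
            sgd_step(f, n, t, lr)
    return f


f = {"Deng Chao": [0.1, 0.2], "Sun Li": [0.2, 0.1], "The Mermaid": [0.3, -0.1]}
target_sets = {"Deng Chao": ["Sun Li", "The Mermaid"], "Sun Li": ["Deng Chao"]}
before = objective(f, target_sets)
train(f, target_sets)
after = objective(f, target_sets)
```

Computing the full normalizer over all entities costs O(|V|) per step; on a large graph an approximation such as negative sampling would typically replace it, which is outside what the text describes.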

After iterations rounds of iterative optimization, the final knowledge graph vectorized representation model is obtained. The first feature vector and second feature vector corresponding to the maximum objective function value are extracted as the vectorized representations of the entities (the initial entity or the other entities), and these vectors are used to complete the classification information of entities or to add relation information.

Further, once the vectorized representation of every entity in the knowledge graph vectorized representation model has been optimized and the feature vector of each entity has been obtained, the relation information between two entities (u, v) can be vectorized as $e(u, v) = f(u) f(v)$ and/or $e(u, v) = (f(u) - f(v))/2$ and/or $e(u, v) = |f(u) - f(v)|$ and/or $e(u, v) = |f(u) - f(v)|^2$.
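The four ways of vectorizing a relation can be sketched as follows. All operations are assumed to act component by component, so that e(u, v) is a vector of the same dimension as f(u); in particular, reading f(u) f(v) as an elementwise product is an assumption. The function name and operator labels are hypothetical.

```python
def edge_features(fu, fv, op):
    """Combine the feature vectors of two entities into a relation vector e(u, v)."""
    if op == "product":             # e(u, v) = f(u) f(v), taken elementwise
        return [a * b for a, b in zip(fu, fv)]
    if op == "half_difference":     # e(u, v) = (f(u) - f(v)) / 2
        return [(a - b) / 2 for a, b in zip(fu, fv)]
    if op == "abs_difference":      # e(u, v) = |f(u) - f(v)|
        return [abs(a - b) for a, b in zip(fu, fv)]
    if op == "squared_difference":  # e(u, v) = |f(u) - f(v)|^2
        return [(a - b) ** 2 for a, b in zip(fu, fv)]
    raise ValueError(f"unknown operator: {op}")


# Hypothetical 2-dimensional feature vectors.
fu, fv = [1.0, 2.0], [3.0, 0.5]
e_product = edge_features(fu, fv, "product")
e_abs = edge_features(fu, fv, "abs_difference")
```

Relation vectors built this way can serve as the inputs on which the second classifier is trained to predict relation information for entity pairs.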

It should be noted that, for brevity, the method embodiments are described as a series of combined actions. Those skilled in the art should understand, however, that the embodiments of the present invention are not limited by the described order of actions, because according to the embodiments of the present invention some steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in this specification are preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present invention.

Referring to Fig. 5, a structural block diagram of an embodiment of a data processing device for a knowledge graph according to the embodiments of the present invention is shown. The knowledge graph includes an initial entity and multiple other entities; the initial entity and the other entities have classification information and relation information. The device may specifically include the following modules:

Target entity selection module 301, configured to select the currently processed target entity from the other entities;

First and second feature vector acquisition module 302, configured to obtain the first feature vector of the initial entity and the second feature vector of the target entity;

Characteristic value calculation module 303, configured to calculate the characteristic value corresponding to the target entity according to the first feature vector and the second feature vector;

Maximum characteristic value determination module 304, configured to determine the maximum characteristic value among the characteristic values;

Classification information and relation information update module 305, configured to update, for the maximum characteristic value, the classification information and relation information of the initial entity and the other entities using its corresponding first feature vector and second feature vector.

Preferably, the target entity selection module includes:

Preferably, the characteristic value calculation module includes:

Preferably, the classification information marking submodule includes:

Preferably, the relation information adding submodule includes:

In a preferred embodiment of the present invention, the device further includes:

Preferably, the maximum characteristic value determination module includes:

Because the device embodiment is substantially similar to the method embodiment, it is described relatively briefly; for related details, refer to the description of the method embodiment.

The embodiments in this specification are described progressively; each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar the embodiments may be referred to one another.

Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a device, or a computer program product. Accordingly, the embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the embodiments of the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, and optical storage) that contain computer-usable program code.

The embodiments of the present invention are described with reference to flow charts and/or block diagrams of the method, terminal device (system), and computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flow charts and/or block diagrams, and combinations of flows and/or blocks in the flow charts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing terminal device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal device produce means for implementing the functions specified in one or more flows of the flow charts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing terminal device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flow charts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing terminal device, so that a series of operational steps is performed on the computer or other programmable terminal device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable terminal device provide steps for implementing the functions specified in one or more flows of the flow charts and/or one or more blocks of the block diagrams.

Although preferred embodiments of the present invention have been described, those skilled in the art can make further changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the embodiments of the present invention.

Finally, it should also be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or terminal device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or terminal device. Without further limitation, an element qualified by the phrase "including a ..." does not exclude the existence of additional identical elements in the process, method, article, or terminal device that includes the element.

The method and the device provided by the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. At the same time, for those of ordinary skill in the art, the specific implementations and the scope of application may change in accordance with the idea of the present invention. In summary, the contents of this specification should not be construed as limiting the present invention.

Claims

1. A data processing method for a knowledge graph, characterized in that the knowledge graph includes an initial entity and other entities, the initial entity and the other entities having classification information and relation information, the method comprising:

Selecting the currently processed target entity from the other entities;

Determining the maximum characteristic value among the characteristic values;

For the maximum characteristic value, updating the classification information and relation information of the initial entity and the other entities using its corresponding first feature vector and second feature vector.

2. The method according to claim 1, characterized in that the initial entity and the other entities each have corresponding word vector data, and the step of selecting the currently processed target entity from the other entities includes:

Calculating transition probability values using the word vector data of the initial entity and the word vector data of the other entities;

3. The method according to claim 1 or 2, characterized in that the step of calculating the characteristic value corresponding to the target entity according to the first feature vector and the second feature vector includes:

4. The method according to claim 1, characterized in that the step of updating, for the maximum characteristic value, the classification information and relation information of the initial entity and the other entities using its corresponding first feature vector and second feature vector includes:

Marking classification information for the initial entity and the other entities according to the first feature vector and the second feature vector;

Adding relation information for the initial entity and the other entities according to the first feature vector and the second feature vector.

5. The method according to claim 4, characterized in that the step of marking classification information for the initial entity and the other entities according to the first feature vector and the second feature vector includes:

Training a first classifier using the first feature vector and second feature vectors of the initial entity and the other entities;

6. The method according to claim 4, characterized in that the step of adding relation information for the initial entity and the other entities according to the first feature vector and the second feature vector includes:

Training a second classifier using the first feature vector and second feature vectors of the initial entity and the other entities;

Adding the relation information to the initial entity and the other entities.

7. The method according to claim 1, characterized in that the method further includes:

8. The method according to claim 1, characterized in that the step of determining the maximum characteristic value among the characteristic values includes:

Recording the number of times the step of calculating the characteristic value corresponding to the target entity according to the first feature vector and the second feature vector has been executed;

9. A data processing device for a knowledge graph, characterized in that the knowledge graph includes an initial entity and other entities, the initial entity and the other entities having classification information and relation information, the device comprising:

A first and second feature vector acquisition module, configured to obtain the first feature vector of the initial entity and the second feature vector of the target entity;

A characteristic value calculation module, configured to calculate the characteristic value corresponding to the target entity according to the first feature vector and the second feature vector;

A classification information and relation information update module, configured to update, for the maximum characteristic value, the classification information and relation information of the initial entity and the other entities using its corresponding first feature vector and second feature vector.

10. The device according to claim 9, characterized in that the target entity selection module includes:

A transition probability value calculation submodule, configured to calculate transition probability values using the word vector data of the initial entity and the word vector data of the other entities;

A target entity determination submodule, configured to determine, when the transition probability value is greater than the first preset threshold, that the other entity corresponding to the transition probability value is a target entity.

11. The device according to claim 9 or 10, characterized in that the characteristic value calculation module includes:

A conditional probability value calculation submodule, configured to calculate the conditional probability value corresponding to the target entity according to the first feature vector and the second feature vector;

A logarithmic conditional probability value acquisition submodule, configured to take the logarithm of the cumulative-product conditional probability value to obtain a logarithmic conditional probability value;

12. The device according to claim 9, characterized in that the first and second feature vector update module comprises:

A first and second feature vector extraction submodule, configured to extract, for the maximum feature value, the corresponding first feature vector and second feature vector;

A classification information labeling submodule, configured to label the initial entity and the other entities with classification information according to the first feature vector and the second feature vector;

A relation information adding submodule, configured to add relation information to the initial entity and the other entities according to the first feature vector and the second feature vector.

13. The device according to claim 12, characterized in that the classification information labeling submodule comprises:

A first classifier training unit, configured to train a first classifier using the first feature vectors and second feature vectors of the initial entity and the other entities;

A classification information calculation unit, configured to calculate the classification information of the initial entity and the other entities using the first classifier.

14. The device according to claim 12, characterized in that the relation information adding submodule comprises:

A second classifier training unit, configured to train a second classifier using the first feature vectors and second feature vectors of the initial entity and the other entities;

A relation information calculation unit, configured to calculate the relation information of the initial entity and the other entities using the second classifier.
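
The train-then-calculate pattern shared by claims 13 and 14 can be illustrated with a small Python sketch. The claims do not name a classifier type, so the nearest-centroid classifier used here is a swapped-in stand-in chosen for brevity. The function names, the labels "person" and "place", and the sample vectors are all hypothetical, and the sketch assumes that some entities already carry labels to train on.

```python
from collections import defaultdict

def train_nearest_centroid(samples):
    """samples: list of (feature_vector, label). Returns {label: centroid}."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + x for s, x in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}

def predict(centroids, vec):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, vec))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

# Hypothetical labeled feature vectors used as training data.
labeled = [
    ([1.0, 0.9], "person"), ([0.9, 1.1], "person"),
    ([-1.0, -0.8], "place"), ([-1.1, -1.0], "place"),
]
centroids = train_nearest_centroid(labeled)
predicted = predict(centroids, [0.8, 1.0])
```

For relation information, the same pattern would apply with relation labels in place of category labels and, for example, a concatenation of the two entities' feature vectors as the input; that pairing scheme is likewise an assumption.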

15. The device according to claim 9, characterized in that the device further comprises:

A first and second vector update module, configured to, for a non-maximum feature value, update the corresponding first feature vector and second feature vector.

16. The device according to claim 9, characterized in that the maximum feature value determination module comprises:

An execution count recording submodule, configured to record the number of times the step of calculating the feature value corresponding to the target entity according to the first feature vector and the second feature vector has been executed;

A maximum feature value selection submodule, configured to select the maximum feature value from among the feature values when the execution count is greater than a second preset threshold.
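
The counting and selection logic of claim 16 can be sketched in Python as follows. The function names, the use of a plain dot product as the feature value, and the sample data are assumptions made only for illustration. The sketch records how many times the feature value calculation has run and, once that count exceeds the second preset threshold, returns the largest feature value together with its two vectors.

```python
def dot(u, v):
    """Placeholder feature value: dot product of the two vectors (an assumption)."""
    return sum(a * b for a, b in zip(u, v))

def select_max_feature_value(pairs, feature_value, second_threshold):
    """pairs: iterable of (first_vec, second_vec).
    Returns (value, first_vec, second_vec) for the maximum feature value once
    the execution count exceeds second_threshold, or None if it never does."""
    execution_count = 0
    results = []
    for first_vec, second_vec in pairs:
        results.append((feature_value(first_vec, second_vec), first_vec, second_vec))
        execution_count += 1  # one more execution of the calculation step
        if execution_count > second_threshold:
            return max(results, key=lambda r: r[0])
    return None

# Hypothetical (first feature vector, second feature vector) pairs.
pairs = [
    ([1.0, 0.0], [1.0, 0.0]),
    ([1.0, 0.0], [2.0, 0.0]),
    ([1.0, 0.0], [0.0, 1.0]),
]
best = select_max_feature_value(pairs, dot, second_threshold=2)
too_few = select_max_feature_value(pairs, dot, second_threshold=5)
```

With a threshold of 2 the third calculation triggers the selection, which yields the pair with feature value 2.0. With a threshold of 5 the count is never exceeded and no maximum is selected.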