CN109241291A - Knowledge graph optimal path query system and method based on deep reinforcement learning - Google Patents

Knowledge graph optimal path query system and method based on deep reinforcement learning

Info

Publication number
CN109241291A
Authority
CN
China
Prior art keywords
layer
entity
network
value
optimal path
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810791353.6A
Other languages
Chinese (zh)
Other versions
CN109241291B (en)
Inventor
黄震华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China Normal University
Original Assignee
South China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China Normal University filed Critical South China Normal University
Priority to CN201810791353.6A priority Critical patent/CN109241291B/en
Publication of CN109241291A publication Critical patent/CN109241291A/en
Application granted granted Critical
Publication of CN109241291B publication Critical patent/CN109241291B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention proposes a knowledge graph optimal path query method based on deep reinforcement learning, comprising two modules, module one and module two. Module one is an offline training module for the knowledge graph optimal path model, and module two is an online application module for the knowledge graph optimal path model. The offline training module is equipped with a deep reinforcement learning component; deep reinforcement learning training is performed on the current entity to obtain the next entity, and the training is then repeated with the next entity as the current entity, yielding an optimal path model. A start entity and a target entity are then input to the optimal path model obtained by module one, and the optimal path is finally obtained. The invention increases the generalization ability of the model and improves computational accuracy; the logical structure of the invention is clear and its computation flexible, and in particular the reinforcement learning and deep learning can be computed in a distributed fashion, improving operational efficiency.

Description

Knowledge graph optimal path query system and method based on deep reinforcement learning
Technical field
The present invention relates to the field of computing, and in particular to a knowledge graph optimal path query system and method based on deep reinforcement learning.
Background art
A knowledge graph (Knowledge Graph) is intended to describe and portray the various entities (Entity) that exist in the real world and the relations (Relation) between entities. It is usually organized and represented as a directed graph: the nodes of the graph denote entities, the edges are formed by relations, and a relation connects two entities and characterizes whether there is an association between them as described by that relation. If there is an edge between two entities, they are related; otherwise there is no association. In practical applications, each entity relation in the knowledge graph (i.e., each edge of the graph) is annotated with a value between 0 and 1 reflecting the degree of correlation between the entities. Depending on the application requirements, this value can represent confidence, closeness, distance, cost, and so on; such a knowledge graph is therefore called a probabilistic knowledge graph.
Optimal path query between the entities of a probabilistic knowledge graph, which retrieves the relationship between two entities, is extremely important in the knowledge graph field. It is one of the core technologies of applications such as knowledge extraction, entity retrieval, knowledge graph network optimization, and relation analysis between knowledge graph entities. For data queries and retrievals of this complexity, an effective data organization and an efficient query processing method are needed to compute the results required by users accurately and effectively. Improving query efficiency and reducing processing cost is therefore highly desirable and also extremely challenging. The topological structure of a probabilistic knowledge graph is a weighted directed graph.
At present, the mainstream graph optimal path query methods include Dijkstra's algorithm, the Floyd algorithm, and the Bellman-Ford algorithm. However, with the arrival of the big data era, the query efficiency of these methods can no longer meet the time range acceptable to users or fit within the memory a machine can accommodate; they are powerless for optimal path queries over large data volumes.
It has further been found that, for a large-scale data network such as a probabilistic knowledge graph, reducing query time often requires a strategy of trading space for time, storing the query results with the highest query frequency. The Landmarks-BFS method sorts entities by users' query frequency over the probabilistic knowledge graph, prunes the optimal paths between common entities, and stores the optimal paths between entities in a set. This method reduces the search space, but it ignores the dispersion of nodes in the network, so its query accuracy is not high. In addition, acceleration techniques have been applied to query data preprocessing, such as parallel query methods based on bidirectional search, goal-directed query methods, and hierarchical query methods. These techniques meet the requirements in query efficiency; however, since pruning discards some intermediate points, query accuracy declines. Improper pruning may cause the query to miss the shortest path, while too little pruning between two points easily degenerates into breadth-first search, with low time efficiency and poor scalability. Accurately querying the shortest path in a probabilistic knowledge graph thus requires striking a balance between time and space: query time should meet users' requirements while query quality is also guaranteed.
Summary of the invention
To overcome at least one of the above drawbacks (deficiencies) of the prior art, the present invention provides an optimal path query method between probabilistic knowledge graph entities that has high accuracy and strong generalization ability, is fast, and is easy to extend.
To solve the above technical problems, the technical scheme of the present invention is as follows:
A knowledge graph optimal path query system based on deep reinforcement learning comprises two modules, module one and module two. Module one is an offline training module for the knowledge graph optimal path model, and module two is an online application module for the knowledge graph optimal path model. The offline training module is equipped with a deep reinforcement learning component; deep reinforcement learning training is performed on the current entity to obtain the next entity, and the training is then repeated with the next entity as the current entity, yielding an optimal path model. The start entity and the target entity are then input to the optimal path model obtained by module one, and the optimal path is finally obtained. Through the cooperation between the two modules, the goals of high accuracy, strong generalization ability, high speed, and easy extension are achieved.
Further, the deep reinforcement learning component consists of an encoder, a network component, and a logistic regression component. The network component includes a conversion component and a training component; the conversion component includes a CNN neural network and an FC neural network, and the training component includes a reinforcement learning Policy (policy) network and a reinforcement learning Value (value) network.
Further, the reinforcement learning Policy network is composed of five fully connected neural network layers. The node counts of the first four layers of the Policy network decrease layer by layer, and the fifth layer has k neurons. Dropout is applied between the first and second layers and between the second and third layers of the Policy network to prevent overfitting, with the tanh activation function. Batch normalization is used between the third and fourth layers to enhance the generalization ability of the model, with the sigmoid activation function. The fourth and fifth layers are fully connected to obtain the probabilities of the k predicted relations, which serve as the action selection for the next entity;
The reinforcement learning Value network is likewise composed of five fully connected layers. From the first layer to the fourth layer the fully connected layers decrease in size layer by layer, and the fifth layer has only one neuron. Dropout is applied between the first and second layers and between the second and third layers of the Value network to prevent overfitting; the activation functions of the first and second layers are both tanh, and the activation function of the third layer is sigmoid. Batch normalization is used between the third and fourth layers to enhance the generalization ability of the model, with the relu activation function. The fourth and fifth layers are fully connected, and the output is the cumulative reward predicted by the Value network from the current state to the target state.
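For illustration only, the following is a minimal PyTorch sketch of these two five-layer networks; the layer widths follow the values given in the embodiment below (256/64/32/16/k for the Policy network and 256/128/64/32/1 for the Value network), while the dropout rate, the exact placement of batch normalization, and k = 10 are assumptions.

```python
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Five fully connected layers with decreasing widths; the last layer
    outputs probabilities over k candidate relations (the action choice)."""
    def __init__(self, in_dim=512, k=10):
        super().__init__()
        self.fc1, self.fc2 = nn.Linear(in_dim, 256), nn.Linear(256, 64)
        self.fc3, self.fc4 = nn.Linear(64, 32), nn.Linear(32, 16)
        self.fc5 = nn.Linear(16, k)
        self.drop = nn.Dropout(p=0.5)    # dropout between layers 1-2 and 2-3
        self.bn = nn.BatchNorm1d(32)     # batch normalization between layers 3 and 4

    def forward(self, x):
        x = self.drop(torch.tanh(self.fc1(x)))
        x = self.drop(torch.tanh(self.fc2(x)))
        x = self.bn(torch.sigmoid(self.fc3(x)))
        x = torch.sigmoid(self.fc4(x))
        return torch.softmax(self.fc5(x), dim=-1)   # probabilities of k relations

class ValueNet(nn.Module):
    """Five fully connected layers decreasing to a single neuron estimating
    the cumulative reward from the current state to the target state."""
    def __init__(self, in_dim=512):
        super().__init__()
        self.fc1, self.fc2 = nn.Linear(in_dim, 256), nn.Linear(256, 128)
        self.fc3, self.fc4 = nn.Linear(128, 64), nn.Linear(64, 32)
        self.fc5 = nn.Linear(32, 1)
        self.drop = nn.Dropout(p=0.5)
        self.bn = nn.BatchNorm1d(64)

    def forward(self, x):
        x = self.drop(torch.tanh(self.fc1(x)))
        x = self.drop(torch.tanh(self.fc2(x)))
        x = self.bn(torch.sigmoid(self.fc3(x)))
        x = torch.relu(self.fc4(x))
        return self.fc5(x)               # scalar value estimate
```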
The present invention also proposes a knowledge graph optimal path query method based on deep reinforcement learning, which specifically includes the following steps:
S1. First sort the entity relations in the probabilistic knowledge graph in descending order of users' access frequency within a unit time, select n relations, and generate the required data sample set;
S2. Input the data sample set into the deep reinforcement learning component for training;
S3. Carry out the training of the three stages in the deep reinforcement learning component, namely stage 1, stage 2, and stage 3, respectively;
Stage 1: convert the entity into an initial word vector using the encoder, then further process the encoded initial word vector through a 1-10 layer CNN convolutional neural network to convert it into the word vector required by the deep reinforcement learning component;
Stage 2: predict the relation the current entity will traverse next based on the reinforcement learning Policy network;
Stage 3: perform value calculation on the selected strategy based on the reinforcement learning Value network;
S4. After the training of step S3, obtain the optimal path model for queries;
S5. Input the start entity and the target entity, convert each into a word vector, then merge the two word vectors and input them to the optimal path model of step S4 until the target entity is found, finally obtaining an optimal query path whose starting point is the start entity and whose end point is the target entity.
Further, in step S1, n relations are chosen with n not less than 1/10 of the total number of entity relations in the probabilistic knowledge graph; γ = n/2 relations are randomly selected from these n relations, and these γ relations in the probabilistic knowledge graph, together with the two entities connected by each relation, constitute the data sample set required for model training.
Further, stage 1 of step S3 converts the input entities e_1 and e_2 into two word vectors G_θ(e_1) and G_θ(e_2) through the encoder and the network component, where θ is the set of network parameters to be optimized. A similarity calculation is carried out on the two word vectors G_θ(e_1) and G_θ(e_2) obtained in stage 1 to find their cosine distance, as shown in the following formula:
D_θ(e_1, e_2) = ||G_θ(e_1) − G_θ(e_2)||_cos,
During training, the data samples received by the two networks can be represented as {(F, e_1, e_2)}, where F is the label of each data sample, and the training loss function is constructed as shown in the following formula:
where n is the total number of training samples.
Further, the loss function L(θ) needs to be minimized, and the loss function L(θ) can be refined as:
where L_s denotes the loss function between identical entities and L_u the loss function between different entities; L_u needs to be made as small as possible and L_s as large as possible.
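The formulas for L(θ), L_s, and L_u appear only as images in the published document. As a rough illustration only, a standard Siamese contrastive loss that achieves the stated end effect (small distances for identical entities, large distances for different entities) might look like the following sketch, where the margin and the exact split into L_s and L_u are assumptions:

```python
import torch
import torch.nn.functional as F

def cosine_distance(g1, g2):
    """D_theta(e1, e2): cosine distance between the two encoder outputs."""
    return 1.0 - F.cosine_similarity(g1, g2, dim=-1)

def siamese_loss(g1, g2, label, margin=1.0):
    """Sketch of L(theta). The label F is 1 for identical entities and 0
    otherwise. Same-entity pairs have their distance pushed small; pairs of
    different entities are pushed apart up to the (assumed) margin."""
    d = cosine_distance(g1, g2)
    l_same = label * d.pow(2)
    l_diff = (1 - label) * torch.clamp(margin - d, min=0).pow(2)
    return (l_same + l_diff).mean()
```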
Further, stage 2 and stage 3 of step S3 are carried out in the training component of the deep reinforcement learning component. The training component includes a policy network and a value network; stage 2 performs policy training and stage 3 performs value training, and the parameter sets of the two networks are optimized, namely the parameter θ_p of the Policy network and the parameter θ_v of the Value network. In both trainings, a four-tuple <state, reward, action, model> is used, where the state is represented by an entity in the probabilistic knowledge graph.
Further, the policy function and the value function are obtained through goal-driven deep reinforcement learning by the policy network and the value network: the policy function is fitted by a neural network as a nonlinear function approximator, giving the policy function f(e_t, g | θ_p); the value function, which fits the reward from the current node to the target node, is likewise obtained by a neural network as a nonlinear function approximator, giving the value function h(e_t, g | θ_v).
Further, the reward obtained from the value function is multiplied by the estimate of the strategy given by the policy function to represent the loss function of the policy network, as shown in the following formula:
L_f = log f(e_t, g | θ_p) × (r_t + γ·h(e_{t+1}, g | θ_v) − h(e_t, g | θ_v)),
where γ ∈ (0, 1) denotes the discount factor. The derivative of L_f with respect to the parameter θ_p is taken, and the parameter θ_p of the Policy network is updated by gradient ascent, giving the following formula:
where ∇ denotes the derivative operation, an entropy term of the policy function f(e_t, g | θ_p) appears in the update, and β ∈ (0, 1) is the learning rate;
If the product of the current strategy and the reward brought by choosing that strategy is positive, the parameter θ_p of the Policy network is updated in the positive direction so that the probability of predicting that state next time increases; if the product is negative, the parameter θ_p of the Policy network is updated in the reverse direction so that the probability of predicting that state next time is as small as possible, until the strategy predicted by the current network no longer fluctuates.
Further, the absolute value of the difference between the obtained value function h(e_t, g | θ_v) and the actual reward of the current entity, r_t + γ·h(e_{t+1}, g | θ_v), is calculated to obtain the loss function of the value network, as shown in the following formula:
L_h = |(r_t + γ·h(e_{t+1}, g | θ_v)) − h(e_t, g | θ_v)|,
where γ ∈ (0, 1) denotes the discount factor. The derivative of L_h with respect to the parameter θ_v is taken, and the parameter θ_v of the Value network is updated by gradient descent, giving the following formula:
where ∇ denotes the derivative operation. If the error between the predicted reward h(e_t, g | θ_v) and the calculated reward r_t + γ·h(e_{t+1}, g | θ_v) is greater than the threshold l given by the user, the parameter θ_v of the Value network is updated so that the prediction error is as small as possible, until the error between the predicted reward h(e_t, g | θ_v) and the calculated reward r_t + γ·h(e_{t+1}, g | θ_v) no longer fluctuates outside the range [−l, l] of the user-given threshold.
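As a rough illustration of these two update rules, the sketch below performs one joint update in PyTorch; the optimizers, the entropy weighting, and the use of automatic differentiation in place of the hand-written gradient formulas (which appear as images in the published document) are assumptions:

```python
import torch

def actor_critic_step(policy_net, value_net, opt_p, opt_v,
                      x_t, x_next, action, r_t, gamma=0.9, beta=0.01):
    """One update of theta_p (ascent on L_f, plus an entropy bonus) and
    theta_v (descent on L_h). x_t / x_next are the fused current/target
    entity vectors; action indexes the chosen relation; r_t is its confidence."""
    probs = policy_net(x_t)                     # f(e_t, g | theta_p)
    v_t = value_net(x_t).squeeze(-1)            # h(e_t, g | theta_v)
    with torch.no_grad():
        v_next = value_net(x_next).squeeze(-1)  # h(e_{t+1}, g | theta_v)
    advantage = r_t + gamma * v_next - v_t      # r_t + gamma*h(e_{t+1}) - h(e_t)

    # L_f = log f * advantage; the entropy term discourages settling on a
    # suboptimal strategy too early. The sign is flipped because the
    # optimizer minimizes while the text updates theta_p by gradient ascent.
    log_prob = torch.log(probs.gather(-1, action.unsqueeze(-1)).squeeze(-1))
    entropy = -(probs * torch.log(probs + 1e-8)).sum(-1)
    loss_f = -(log_prob * advantage.detach() + beta * entropy).mean()
    opt_p.zero_grad(); loss_f.backward(); opt_p.step()

    # L_h = |target - prediction|, minimized by gradient descent.
    loss_h = (r_t + gamma * v_next - v_t).abs().mean()
    opt_v.zero_grad(); loss_h.backward(); opt_v.step()
    return loss_f.item(), loss_h.item()
```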
Compared with the prior art, the beneficial effects of the technical solution of the present invention are:
(1) The invention proposes probabilistic knowledge graphs and applies a randomized 0-1 treatment to entity relations, so that optimal path queries on the knowledge graph better match actual application requirements.
(2) Since the present invention trains by means of reinforcement learning, it on the one hand reduces the problem in existing deep learning methods whereby irrational label design degrades the final computation; on the other hand, by saving the shortest path from the current entity to a given entity in each step of the iterative process, this approach reduces the search space, making the model more adaptable and more accurate.
(3) The present invention is based on deep learning technology and merges the start word vector and the target word vector through two pre-trained convolutional neural networks with identical structure and shared weights, avoiding the need to restart training when the target entity changes, which increases the generalization ability of the model and improves computational accuracy.
(4) The logical structure inside each module of the present invention is clear, the computation is flexible, and there is good loose coupling. The network structure can be set flexibly to meet computational needs, without being limited by specific development tools and programming software; moreover, it can be quickly extended into distributed and parallel development environments. In particular, the reinforcement learning and deep learning can be computed in a distributed fashion, improving operational efficiency.
Description of the drawings
Fig. 1 is the technical framework diagram of a knowledge graph optimal path query method based on deep reinforcement learning.
Fig. 2 is the logical structure diagram of the deep reinforcement learning component.
Specific embodiment
The attached figures are only for illustrative purposes and cannot be understood as limiting the patent;
To those skilled in the art, it is to be understood that certain known structures and their explanations may be omitted in the drawings.
The following further describes the technical solution of the present invention with reference to the accompanying drawings and examples.
Embodiment 1
The invention proposes a knowledge graph optimal path query system based on deep reinforcement learning, as shown in Fig. 1, comprising two modules, module one and module two. Module one is the offline training module for the knowledge graph optimal path model, and module two is the online application module for the knowledge graph optimal path model. The offline training module is equipped with a deep reinforcement learning component that performs training on the current entity: module one converts the data and trains on it to obtain the next entity on the optimal route from the current entity to the target entity, then repeats the training with that next entity, eventually obtaining a trained optimal path model. In module two, the target entity and the start entity are converted and input into the optimal path model generated by module one, reinforcement is applied again, and the optimal query path is finally obtained. Through the cooperation between the two modules, the goals of high accuracy, strong generalization ability, high speed, and easy extension are achieved.
Module one first constructs the data sample set for offline training of the optimal path model, as follows: the entity relations in the probabilistic knowledge graph are first sorted in descending order of users' access frequency over the most recent m unit-time windows, and the top n relations are then selected, with n not less than 1/8 of the total number of entity relations in the probabilistic knowledge graph; γ = n/2 relations are then randomly selected from these n relations, and these γ relations in the probabilistic knowledge graph, together with the two entities connected by each relation, constitute the data sample set required for model training.
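A minimal sketch of this sampling step follows; the relation record fields (.id, .head, .tail) and the shape of the frequency counter are assumptions:

```python
import random
from collections import Counter

def build_sample_set(relations, access_counts):
    """Rank relations by user access frequency over the last m unit-time
    windows, keep the top n (here at least 1/8 of all relations), then draw
    gamma = n/2 of them at random; each sample pairs a relation with the two
    entities it connects."""
    freq = Counter(access_counts)                       # relation id -> count
    ranked = sorted(relations, key=lambda r: freq[r.id], reverse=True)
    n = max(len(relations) // 8, 2)                     # n >= 1/8 of the total
    gamma = n // 2                                      # gamma = n / 2
    chosen = random.sample(ranked[:n], gamma)
    return [(r, r.head, r.tail) for r in chosen]
```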
On this basis, module one inputs each constructed data sample into the deep reinforcement learning component shown in Fig. 2 for training, searches for the maximum-probability next relation associated with the current entity, and, after merging, obtains the reward value of the next entity corresponding to the selected relation in order to update the parameters of the deep reinforcement learning component. Module one iterates this process and continuously updates the component parameters until the current entity is the target entity or the number of iterations exceeds the maximum iteration threshold given by the user, at which point a candidate path from the start entity to the target entity has been obtained. Module one then calculates the total reward of the current candidate path and compares it with the total rewards of the full paths queried before; if the reward of the current path is higher than those of the previous query paths, it is taken as the optimal path of the query, thus yielding the optimal path model. The above process is executed repeatedly until the parameters of the deep reinforcement learning component converge.
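A loose sketch of this outer loop, where step_fn (one policy prediction plus entity transition) and total_reward (sum of relation confidences along a path) stand in for the components described above and are assumptions:

```python
def training_episode(s, g, step_fn, total_reward, best_path, c_max=100):
    """Roll out one candidate path from start entity s toward target g, at
    most c_max steps (the user-given maximum iteration threshold), then keep
    it as the optimal path if its total reward beats the best full path so far."""
    path, current = [s], s
    for _ in range(c_max):
        if current == g:
            break
        current = step_fn(current, g)    # predict a relation, move to next entity
        path.append(current)
    if best_path is None or total_reward(path) > total_reward(best_path):
        best_path = path                 # new optimal path for this query
    return best_path
```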
The deep reinforcement learning component of module one, as shown in Fig. 2, consists of a word2vec (word embedding) encoder, a CNN (Convolutional Neural Network), an FC (fully connected) neural network, a reinforcement learning Policy (policy) network, a reinforcement learning Value (value) network, and a logistic regression component. The training process of the deep reinforcement learning component is broadly divided into three stages: stage 1 uses the word2vec encoder to convert entities into initial word vectors, which are then further processed by a multi-layer CNN and converted into the word vectors required by the deep reinforcement learning component; stage 2 predicts, based on the reinforcement learning Policy network, the relation the current entity will traverse next; stage 3 performs value calculation on the selected strategy based on the reinforcement learning Value network.
In stage 1, the present invention first inputs c entities and converts them into c corresponding word vectors through the word2vec word embedding encoder; the dimensions of these c word vectors are identical. Then, 2 word vectors are randomly selected from the c entity word vectors and input into the multi-layer CNN. The multi-layer CNN has 8 layers in total: the first layer performs convolution on the 2 input entity word vectors respectively; the second layer performs max pooling on the convolution output of the first layer; the third and fourth layers continue to convolve the data obtained by the second-layer pooling; then, after the max pooling layer of the fifth layer, the data passes in turn through the sixth and seventh layers for convolution, and finally the average pooling layer of the eighth layer yields the two final word vectors. In particular, after the second and fifth layers complete max pooling, batch normalization is applied to their outputs. The word vectors obtained by the eighth layer are thus the output of stage 1. The training task of the multi-layer CNN is to calculate the distance between the two word vectors obtained at the eighth layer, making the word vector distance of positive samples as small as possible and that of negative samples as large as possible. In addition, the two multi-layer convolutional neural networks have identical structure and shared network weights.
Stage 2 mainly trains the reinforcement learning Policy (policy) network. The present invention first takes the word vector of the current entity and the word vector of the target entity as input and passes them through a fully connected layer; the resulting output vector serves as the input word vector of the Policy network. The Policy network is composed of five fully connected layers; the node counts of the first four layers decrease layer by layer, and the fifth layer has k neurons. Dropout is applied between the first and second layers and between the second and third layers to prevent overfitting, with the tanh activation function. Batch normalization is used between the third and fourth layers to enhance the generalization ability of the model, with the sigmoid activation function. The fourth and fifth layers are fully connected to obtain the probabilities of the k predicted relations, which serve as the action selection for the next entity. The output of the Policy network is the relation with the maximum probability, taken as the behavior (Action) obtained by the Policy network. The k relations are selected as follows: first select the k_1 relations with the highest confidence, then randomly choose k − k_1 from the remaining relations, and sort them in descending order of confidence, obtaining the k maximum-confidence relations output by the Policy network. The training task of the Policy network is to select the best strategy as far as possible, so that the next entity reached by the selected relation brings the maximum reward.
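A minimal sketch of this candidate-relation selection; k = 10 and k_1 = 7 follow the example values used later in stage 2 of the embodiment, and the relation record's confidence field is an assumption:

```python
import random

def select_candidate_relations(outgoing, k=10, k1=7):
    """Candidate set fed to the Policy network: the k1 highest-confidence
    outgoing relations plus k - k1 random picks from the rest, re-sorted in
    descending order of confidence. (If fewer than k exist, the embodiment
    pads the remaining slots with zeros.)"""
    ranked = sorted(outgoing, key=lambda r: r.confidence, reverse=True)
    extra = random.sample(ranked[k1:], min(k - k1, max(len(ranked) - k1, 0)))
    return sorted(ranked[:k1] + extra, key=lambda r: r.confidence, reverse=True)
```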
Stage 3 mainly trains the reinforcement learning Value (value) network. The input of the Value network is identical to that of the Policy network: the word vector of the current entity and the word vector of the target entity, passed through the fully connected layer to obtain the output vector. The Value network is composed of five fully connected layers; from the first layer to the fourth layer the fully connected layers decrease in size layer by layer, and the fifth layer has only one neuron. Dropout is applied between the first and second layers and between the second and third layers to prevent overfitting; the activation functions of the first and second layers are both tanh, and the activation function of the third layer is sigmoid. Batch normalization is used between the third and fourth layers to enhance the generalization ability of the model, with the relu activation function. The fourth and fifth layers are fully connected, and the output is the cumulative reward predicted by the Value network from the current state to the target state. The training task of the Value network is to make the reward predicted in the current state have as small an error as possible from the sum of the confidence of the relation given by the Policy network and the reward predicted in the next state.
Module two takes a start entity and a target entity in the probabilistic knowledge graph as input; each passes in turn through the word2vec word embedding encoder and the 8-layer CNN to be converted into a one-dimensional word vector. The two one-dimensional word vectors are then merged and used as the input of the reinforcement learning Policy network and Value network. The Policy network and the Value network work in concert and, starting from the start entity, each time provide the next entity on the optimal route from the current entity to the target entity, until the target entity is found. Finally an optimal query path is obtained whose starting point is the start entity and whose end point is the target entity.
The present invention also proposes a knowledge graph optimal path query method based on deep reinforcement learning, which specifically includes the following steps:
S1. First sort the entity relations in the probabilistic knowledge graph in descending order of users' access frequency over the most recent m unit-time windows, then choose the top n relations, with n not less than 1/8 of the total number of entity relations in the probabilistic knowledge graph; randomly select γ = n/2 relations from these n relations, and let these γ relations in the probabilistic knowledge graph, together with the two entities connected by each relation, constitute the data sample set required for model training.
S2. Then use the word2vec word embedding encoder of *** company to convert the input current entity and target entity into two one-dimensional word vectors of length 512 respectively.
S3. Then carry out the training of the three stages in the deep reinforcement learning component, namely stage 1, stage 2, and stage 3, respectively.
Stage 1: construct two CNN convolutional neural networks with identical structure and shared weights, as follows:
The first layer of the CNN contains 512 neurons and uses 2 convolution kernels of size 2 × 1 with the sliding stride fixed at 2; this layer mainly convolves the one-dimensional word vector (of length 512) produced by the preceding word2vec word embedding encoder, obtaining 2 one-dimensional vectors of length 256. Then, the second layer of the CNN applies max pooling to the 2 one-dimensional word vectors output by the first layer, using 2 kernels of size 2 × 1 with a sliding stride of 1, thus obtaining 2 one-dimensional vectors of length 256; on this basis, batch normalization is applied to these 2 one-dimensional vectors. Next, the third layer of the CNN uses 4 convolution kernels of size 4 × 1 to convolve the 2 batch-normalized one-dimensional vectors output by the second layer, with the sliding stride fixed at 4, obtaining 8 one-dimensional vectors of length 64. Then, the fourth layer of the CNN uses 1 convolution kernel of size 4 × 1 with a sliding stride of 1 to convolve the 8 one-dimensional vectors output by the third layer again, likewise obtaining 8 one-dimensional vectors of length 64. Then, the fifth layer of the CNN applies max pooling to the fourth layer's 8 one-dimensional vectors, with kernel size equal to 2 × 1, 4 kernels, and a sliding stride of 2, thus obtaining 32 one-dimensional vectors of length 32; on this basis, batch normalization is applied to these 32 one-dimensional vectors. Then, the sixth layer of the network uses 2 convolution kernels of size 4 × 1 to convolve the 32 batch-normalized one-dimensional vectors output by the fifth layer, with the sliding stride fixed at 2, thus obtaining 64 one-dimensional vectors of length 16. Then, the seventh layer of the network uses 4 convolution kernels of size 4 × 1 with a sliding stride of 4 to convolve the 64 one-dimensional vectors output by the sixth layer, obtaining 256 one-dimensional vectors of length 4. Finally, the eighth layer of the network applies average pooling and finally obtains 256 one-dimensional vectors of 4 dimensions; these 256 one-dimensional vectors are then fully connected to 512 neurons, thereby obtaining a one-dimensional vector of length 512.
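As a loose sketch only, the PyTorch module below mirrors the layer order above, with kernel sizes and strides taken from the embodiment where the arithmetic is unambiguous; channel counts and padding are assumptions, since the patent's pooling layers multiply the number of feature maps (which standard pooling does not), so that expansion is folded into the convolutions here:

```python
import torch
import torch.nn as nn

class EntityEncoder(nn.Module):
    """8-layer 1-D CNN over a length-512 word2vec vector, followed by the
    512-neuron fully connected projection. Layer comments give the target
    shapes stated in the embodiment."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 2, kernel_size=2, stride=2),                  # L1: -> 2 x 256
            nn.MaxPool1d(kernel_size=2, stride=1, padding=1),          # L2: max pooling
            nn.BatchNorm1d(2),                                         # batch norm after L2
            nn.Conv1d(2, 8, kernel_size=4, stride=4),                  # L3: -> 8 x 64
            nn.Conv1d(8, 8, kernel_size=4, stride=1, padding='same'),  # L4: -> 8 x 64
            nn.MaxPool1d(kernel_size=2, stride=2),                     # L5: max pooling -> 32
            nn.BatchNorm1d(8),                                         # batch norm after L5
            nn.Conv1d(8, 64, kernel_size=4, stride=2, padding=1),      # L6: -> 64 x 16
            nn.Conv1d(64, 256, kernel_size=4, stride=4),               # L7: -> 256 x 4
            nn.AdaptiveAvgPool1d(4),                                   # L8: average pooling
            nn.Flatten(),                                              # 256 * 4 = 1024
        )
        self.fc = nn.Linear(1024, 512)        # full connection to 512 neurons

    def forward(self, x):                     # x: (batch, 1, 512)
        return self.fc(self.net(x))
```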
After the two CNN convolutional neural networks with identical structure and shared weights are constructed, the present invention trains them and optimizes their parameters using the entities and relations in the probabilistic knowledge graph, as follows:
The inputs of the two CNN convolutional neural networks are two entities e_1 and e_2 respectively, and the outputs are two one-dimensional vectors of length 512, G_θ(e_1) and G_θ(e_2), where θ is the set of network parameters to be optimized. A similarity calculation is then carried out on the two one-dimensional vectors to find their cosine distance: D_θ(e_1, e_2) = ||G_θ(e_1) − G_θ(e_2)||_cos. If the two entities e_1 and e_2 differ greatly, D_θ(e_1, e_2) is large; if e_1 and e_2 are identical or similar, D_θ(e_1, e_2) is small.
Therefore, during training, the data samples received by the two CNN convolutional neural networks can be represented as (F, e_1, e_2), where F is the label of each data sample: if e_1 and e_2 denote the same entity, then F = 1, otherwise F = 0. The training loss function is thus constructed as:
where n is the total number of training samples.
On this basis, L_s denotes the loss function between identical entities and L_u the loss function between different entities. To achieve the goal of minimizing the loss function L(θ), L_u needs to be made as small as possible and L_s as large as possible. The training loss function L(θ) can thus be refined as:
During training, minimizing the loss function L(θ) finally makes the distance between identical entities as small as possible and the distance between different entities as large as possible, increasing the discrimination of the samples. In addition, during training, 1,000,000 sample entities are chosen, from which 250,000 pairs of identical entities are randomly drawn as positive samples and 250,000 pairs of different entities as negative samples; these are mixed and then input into the network for training.
After the calculation by the two CNN convolutional neural networks, the one-dimensional vectors of length 512 corresponding to the current entity and the target entity are obtained. The two one-dimensional vectors are then given a further full-connection operation: the two length-512 one-dimensional vectors are directly concatenated into a one-dimensional vector of length 1024, which is then fed into a fully connected layer of 512 neurons, finally obtaining a one-dimensional vector of length 512. We use it to represent the merged current entity and target entity;
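A minimal sketch of this merge step, assuming PyTorch tensors of length 512 per entity:

```python
import torch
import torch.nn as nn

merge_fc = nn.Linear(1024, 512)   # the 512-neuron fully connected merge layer

def merge(current_vec, target_vec):
    """Concatenate the two length-512 entity vectors into a length-1024
    vector, then project back to length 512 to represent the fused current
    entity and target entity."""
    fused = torch.cat([current_vec, target_vec], dim=-1)   # length 1024
    return merge_fc(fused)
```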
Stage 2 and stage 3 mainly train the Policy network and the Value network in the deep reinforcement learning component and optimize the parameter sets of the two networks, namely the parameter θ_p of the Policy network and the parameter θ_v of the Value network. The above two stages are iterated continuously to search for the next optimal strategy and dynamically update the parameters θ_p and θ_v, until the globally optimal strategy is obtained. Each round of iteration finds a target entity within a finite number of steps and updates the parameters θ_p and θ_v. In particular, module one sets a maximum iteration count c_max; if the current iteration count exceeds it, iteration stops.
To this end, the present invention first defines, based on the probabilistic knowledge graph, the four-tuple <state, reward, action, model> required in the training of the two networks. The state is represented by an entity in the probabilistic knowledge graph, such as the current entity e_t, the target entity g, and the start entity s. The reward from the current entity e_t to the next entity e_{t+1} is denoted r_t, where r_t equals the confidence of the relation between e_t and e_{t+1}. The action, denoted m, is the agent's action selection and corresponds to the relation between the current entity and the next entity in the probabilistic knowledge graph. Finally, the model denotes the policy function or value function of goal-driven deep reinforcement learning in the Policy network or Value network: for the policy function, the present invention fits a neural network as a nonlinear function approximator, i.e., the policy function is f(e_t, g | θ_p); for the value function, the present invention likewise fits a neural network as a nonlinear function approximator to the reward from the current node to the target node, i.e., the value function is h(e_t, g | θ_v).
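The four-tuple might be carried through training as a small record like the following sketch (the field types are assumptions):

```python
from dataclasses import dataclass

@dataclass
class Transition:
    """The <state, reward, action, model> four-tuple of the embodiment."""
    state: str      # current entity e_t (the state is an entity in the graph)
    reward: float   # r_t, the confidence of the relation e_t -> e_{t+1}
    action: int     # m, the agent's chosen relation to the next entity
    model: str      # which function produced the step: policy f or value h
```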
Stage 2: first randomly initialize the parameter set θ_p of the Policy network. The Policy network then receives the one-dimensional vector corresponding to the current entity and target entity as input. The first layer of the Policy network has 256 neurons, fully connected to the one-dimensional vector (of length 512) corresponding to the current entity and target entity; the second layer has 64 neurons; the third layer has 32 neurons; the fourth layer has 16 neurons; the fifth layer has 10 neurons, representing the values of the 10 output entities and the probabilities of selecting these 10 entities. These 10 entities are jointly composed of the 7 highest-confidence entities among the next-layer entities of the current entity and 3 entities randomly selected from the remaining entities; if the number of next-layer entities is less than 10, the surplus elements are filled with 0. The first, second, and third layers all use the tanh activation function, while the fourth and fifth layers use the sigmoid activation function. Meanwhile, dropout and batch normalization are applied between layers to improve prediction accuracy. Finally, the 10 neurons of the fifth layer output the probabilities of the 10 relations selected by the Policy network, and the maximum-probability relation is then obtained as the behavior selection through the softmax function.
During the training of stage 2, the reward obtained based on the value function and the estimate of the strategy given by the current policy function are multiplied to represent the loss function of the Policy network, as shown in the following formula:
L_f = log f(e_t, g | θ_p) × (r_t + γ·h(e_{t+1}, g | θ_v) − h(e_t, g | θ_v)),
where γ ∈ (0, 1) denotes the discount factor. Then the derivative of L_f with respect to the parameter θ_p is taken, and the parameter θ_p is updated by gradient ascent, which gives:
where ∇ denotes the derivative operation, an entropy term of the policy function f(e_t, g | θ_p) appears in the update, and β ∈ (0, 1) is the learning rate. The purpose of adding the entropy term is to prevent the Policy network from obtaining a suboptimal strategy too early and falling into a local optimum. If the product of the current strategy and the reward brought by choosing that strategy is positive, θ_p is updated in the positive direction so that the probability of predicting that state next time increases; if the product is negative, θ_p is updated in the reverse direction so that the probability of predicting that state next time is as small as possible, until the strategy predicted by the current network no longer fluctuates;
Stage 3: first randomly initialize the parameter set θ_v of the Value network. Then, like the Policy network, the Value network receives the one-dimensional vector corresponding to the current entity and target entity as input. The first layer of the Value network has 256 neurons, fully connected to the one-dimensional vector (of length 512) corresponding to the current entity and target entity; the second layer has 128 neurons; the third layer has 64 neurons; the fourth layer has 32 neurons; the fifth layer has one neuron representing the value of the current state. Dropout is applied between the first and second layers and between the second and third layers to prevent overfitting. The first and second layers both use the tanh activation function, and the third and fourth layers both use the sigmoid activation function. Batch normalization is applied between the third and fourth layers to enhance the generalization ability of the model. The fourth and fifth layers use a fully connected neural network to finally obtain the predicted value.
During the training of stage 3, the absolute value of the difference between the actual reward of the current entity, r_t + γ·h(e_{t+1}, g | θ_v), and the predicted reward h(e_t, g | θ_v) is calculated and used as the loss function of the Value network, as shown in the following formula:
L_h = |(r_t + γ·h(e_{t+1}, g | θ_v)) − h(e_t, g | θ_v)|,
where γ ∈ (0, 1) denotes the discount factor. Then the derivative of L_h with respect to the parameter θ_v is taken, and the parameter θ_v is updated by gradient descent, which gives:
where ∇ denotes the derivative operation. If the error between the predicted reward h(e_t, g | θ_v) and the calculated reward r_t + γ·h(e_{t+1}, g | θ_v) is greater than the threshold l given by the user, θ_v is updated so that the prediction error is as small as possible, until the error between the predicted reward h(e_t, g | θ_v) and the calculated reward r_t + γ·h(e_{t+1}, g | θ_v) no longer fluctuates outside the range [−l, l] of the user-given threshold;
S4. During the iterative process, the parameters of the deep reinforcement learning component are continuously updated until the current entity is the target entity or the number of iterations exceeds the maximum iteration threshold given by the user, at which point a candidate path from the start entity to the target entity has been obtained. Then, the module calculates the total reward of the current candidate path and compares it with the total rewards of the full paths queried before; if the reward of the current path is higher than those of the previous query paths, it is taken as the optimal path model of the query. The above process is executed repeatedly until the parameters of the deep reinforcement learning component converge.
S5. Input two entities in the probabilistic knowledge graph, namely the start entity s and the target entity g, and convert each of them into a one-dimensional vector of length 512 through the trained word2vec word embedding encoder. Then merge the two vectors into a one-dimensional vector of length 1024 and use it as the input of the trained multi-layer CNN convolutional neural networks, obtaining the length-512 one-dimensional vectors corresponding to the start entity and the target entity respectively. On this basis, the two one-dimensional vectors are then passed through the fully connected layer to generate a new vector of length 1024, which serves as the input of the trained reinforcement learning Policy network and Value network. The Policy network and the Value network work in concert and, starting from the start entity, each time provide the next entity on the optimal route from the current entity to the target entity, until the target entity is found. Finally an optimal query path Path(s, g) is obtained whose starting point is the start entity s and whose end point is the target entity g.
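Pulling S5 together, the online query might run as in the following sketch, where encode (word2vec plus the shared-weight CNN), merge, candidate_relations, and the step bound are assumptions standing in for the components described above (the Value network's online role is folded out for brevity):

```python
import torch

def query_optimal_path(s, g, encode, merge, policy_net, candidate_relations,
                       max_steps=100):
    """Module two's online query: starting from start entity s, repeatedly
    let the Policy network pick the maximum-probability relation toward the
    target entity g, following it until g is reached. Returns Path(s, g)."""
    path, current = [s], s
    g_vec = encode(g)                          # length-512 target vector
    while current != g and len(path) <= max_steps:
        x = merge(encode(current), g_vec)      # fused input for the networks
        probs = policy_net(x).squeeze(0)       # probabilities over k relations
        candidates = candidate_relations(current)
        if not candidates:                     # dead end: no outgoing relations
            break
        best = candidates[int(torch.argmax(probs[:len(candidates)]))]
        current = best.tail                    # follow the chosen relation
        path.append(current)
    return path
```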
Finally, it should be noted that the above embodiments are only used to illustrate the technical scheme of the present invention and not to limit it. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that the technical scheme of the present invention may be modified or equivalently replaced without departing from the spirit and scope of the technical scheme of the present invention, all of which should be covered by the scope of the claims of the present invention.

Claims (10)

1. A knowledge graph optimal path query system based on deep reinforcement learning, characterized in that it comprises two modules, module one and module two, wherein module one is an offline training module for the knowledge graph optimal path model and module two is an online application module for the knowledge graph optimal path model; the offline training module is equipped with a deep reinforcement learning component; deep reinforcement learning training is performed on the current entity to obtain the next entity, and the training is repeated with the next entity as the current entity to obtain an optimal path model; the start entity and the target entity are then input to the optimal path model obtained by module one, and the optimal path is finally obtained.
2. The knowledge graph optimal path query system based on deep reinforcement learning according to claim 1, characterized in that the deep reinforcement learning component consists of an encoder, a network component, and a logistic regression component; the network component includes a conversion component and a training component; the conversion component includes a CNN neural network and an FC neural network, and the training component includes a reinforcement learning Policy network and a reinforcement learning Value network.
3. The knowledge graph optimal path query system based on deep reinforcement learning according to claim 2, characterized in that the reinforcement learning Policy network is composed of five fully connected neural network layers; the node counts of the first four layers of the Policy network decrease layer by layer, and the fifth layer has k neurons; dropout is applied between the first and second layers and between the second and third layers of the Policy network to prevent overfitting, with the tanh activation function; batch normalization is used between the third and fourth layers to enhance the generalization ability of the model, with the sigmoid activation function; the fourth and fifth layers are fully connected to obtain the probabilities of the k predicted relations, which serve as the action selection for the next entity;
the reinforcement learning Value network is composed of five fully connected layers; from the first layer to the fourth layer the fully connected layers decrease in size layer by layer, and the fifth layer has only one neuron; dropout is applied between the first and second layers and between the second and third layers of the Value network to prevent overfitting; the activation functions of the first and second layers are both tanh, and the activation function of the third layer is sigmoid; batch normalization is used between the third and fourth layers to enhance the generalization ability of the model, with the relu activation function; the fourth and fifth layers are fully connected, and the output is the cumulative reward predicted by the Value network from the current state to the target state.
4. A knowledge graph optimal path query method based on deep reinforcement learning, characterized by comprising the following steps:
S1. First sort the entity relations in the probabilistic knowledge graph in descending order of users' access frequency within a unit time, select n relations, and generate the required data sample set;
S2. Input the data sample set into the deep reinforcement learning component for training;
S3. Carry out the training of the three stages in the deep reinforcement learning component, namely stage 1, stage 2, and stage 3, respectively;
Stage 1: convert the entity into an initial word vector using the encoder, then further process the encoded initial word vector through a 1-10 layer CNN convolutional neural network to convert it into the word vector required by the deep reinforcement learning component;
Stage 2: predict the relation the current entity will traverse next based on the reinforcement learning Policy network;
Stage 3: perform value calculation on the selected strategy based on the reinforcement learning Value network;
S4. After the training of step S3, obtain the optimal path model for queries;
S5. Input the start entity and the target entity, convert each into a word vector, then merge the two word vectors and input them to the optimal path model of step S4 until the target entity is found, finally obtaining an optimal query path whose starting point is the start entity and whose end point is the target entity.
5. The knowledge graph optimal path query method based on deep reinforcement learning according to claim 4, characterized in that n relations are chosen in step S1 with n not less than 1/10 of the total number of entity relations in the probabilistic knowledge graph; γ = n/2 relations are randomly selected from these n relations, and these γ relations in the probabilistic knowledge graph, together with the two entities connected by each relation, constitute the data sample set required for model training.
6. The knowledge graph optimal path query method based on deep reinforcement learning according to claim 4, characterized in that stage 1 of step S3 converts the input entities e_1 and e_2 into two word vectors G_θ(e_1) and G_θ(e_2) through the encoder and the network component, where θ is the set of network parameters to be optimized; a similarity calculation is carried out on the two word vectors G_θ(e_1) and G_θ(e_2) obtained in stage 1 to find their cosine distance, as shown in the following formula:
D_θ(e_1, e_2) = ||G_θ(e_1) − G_θ(e_2)||_cos,
during training, the data samples received by the two networks can be represented as {(F, e_1, e_2)}, where F is the label of each data sample, and the training loss function is constructed as shown in the following formula:
where n is the total number of training samples;
stage 2 and stage 3 of step S3 are carried out in the training component of the deep reinforcement learning component; stage 2 performs policy training and stage 3 performs value training; the parameter sets of the two networks, namely the parameter θ_p of the Policy network and the parameter θ_v of the Value network, are optimized during training, and a four-tuple <state, reward, action, model> is used, where the state is represented by an entity in the probabilistic knowledge graph.
7. The knowledge graph optimal path query method based on deep reinforcement learning according to claim 6, characterized in that the loss function L(θ) needs to be minimized, and the loss function L(θ) can be refined as:
where L_s denotes the loss function between identical entities and L_u the loss function between different entities; L_u needs to be made as small as possible and L_s as large as possible.
8. The knowledge graph optimal path query method based on deep reinforcement learning according to claim 6, characterized in that the policy function and the value function are obtained through goal-driven deep reinforcement learning by the policy network and the value network: the policy function is fitted by a neural network as a nonlinear function approximator, giving the policy function f(e_t, g | θ_p); the value function, which fits the reward from the current node to the target node, is likewise obtained by a neural network as a nonlinear function approximator, giving the value function h(e_t, g | θ_v).
9. The knowledge graph optimal path query method based on deep reinforcement learning according to claim 8, characterized in that the reward obtained from the value function is multiplied by the estimate of the strategy given by the policy function to represent the loss function of the policy network, as shown in the following formula:
L_f = log f(e_t, g | θ_p) × (r_t + γ·h(e_{t+1}, g | θ_v) − h(e_t, g | θ_v)),
where γ ∈ (0, 1) denotes the discount factor; the derivative of L_f with respect to the parameter θ_p is taken, and the parameter θ_p of the Policy network is updated by gradient ascent, giving the following formula:
where ∇ denotes the derivative operation, an entropy term of the policy function f(e_t, g | θ_p) appears in the update, and β ∈ (0, 1) is the learning rate;
if the product of the current strategy and the reward brought by choosing that strategy is positive, the parameter θ_p of the Policy network is updated in the positive direction so that the probability of predicting that state next time increases; if the product is negative, the parameter θ_p of the Policy network is updated in the reverse direction so that the probability of predicting that state next time is as small as possible, until the strategy predicted by the current network no longer fluctuates.
10. The knowledge graph optimal path query method based on deep reinforcement learning according to claim 8, characterized in that the absolute value of the difference between the obtained value function h(e_t, g | θ_v) and the actual reward of the current entity, r_t + γ·h(e_{t+1}, g | θ_v), is calculated to obtain the loss function of the value network, as shown in the following formula:
L_h = |(r_t + γ·h(e_{t+1}, g | θ_v)) − h(e_t, g | θ_v)|,
where γ ∈ (0, 1) denotes the discount factor; the derivative of L_h with respect to the parameter θ_v is taken, and the parameter θ_v of the Value network is updated by gradient descent, giving the following formula:
where ∇ denotes the derivative operation; if the error between the predicted reward h(e_t, g | θ_v) and the calculated reward r_t + γ·h(e_{t+1}, g | θ_v) is greater than the threshold l given by the user, the parameter θ_v of the Value network is updated so that the prediction error is as small as possible, until the error between the predicted reward h(e_t, g | θ_v) and the calculated reward r_t + γ·h(e_{t+1}, g | θ_v) no longer fluctuates outside the range [−l, l] of the user-given threshold.
CN201810791353.6A 2018-07-18 2018-07-18 Knowledge graph optimal path query system and method based on deep reinforcement learning Active CN109241291B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810791353.6A CN109241291B (en) 2018-07-18 2018-07-18 Knowledge graph optimal path query system and method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810791353.6A CN109241291B (en) 2018-07-18 2018-07-18 Knowledge graph optimal path query system and method based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN109241291A true CN109241291A (en) 2019-01-18
CN109241291B CN109241291B (en) 2022-02-15

Family

ID=65072112

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810791353.6A Active CN109241291B (en) 2018-07-18 2018-07-18 Knowledge graph optimal path query system and method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN109241291B (en)

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109818786A * 2019-01-20 2019-05-28 北京工业大学 An application-aware distributed multi-resource combination optimal path selection method for cloud data centers
CN109829579A (en) * 2019-01-22 2019-05-31 平安科技(深圳)有限公司 Minimal path calculation method, device, computer equipment and storage medium
CN109947098A * 2019-03-06 2019-06-28 天津理工大学 A distance-first optimal route selection method based on a machine learning strategy
CN110288878A (en) * 2019-07-01 2019-09-27 科大讯飞股份有限公司 Adaptive learning method and device
CN110347857A * 2019-06-06 2019-10-18 武汉理工大学 Semantic annotation method for remote sensing images based on reinforcement learning
CN110391843A * 2019-06-19 2019-10-29 北京邮电大学 Transmission quality prediction and route selection method and system for multi-domain optical networks
CN110825890A (en) * 2020-01-13 2020-02-21 成都四方伟业软件股份有限公司 Method and device for extracting knowledge graph entity relationship of pre-training model
CN110825821A (en) * 2019-09-30 2020-02-21 深圳云天励飞技术有限公司 Personnel relationship query method and device, electronic equipment and storage medium
CN110956254A (en) * 2019-11-12 2020-04-03 浙江工业大学 Case reasoning method based on dynamic knowledge representation learning
CN110990548A (en) * 2019-11-29 2020-04-10 支付宝(杭州)信息技术有限公司 Updating method and device of reinforcement learning model
CN111382359A (en) * 2020-03-09 2020-07-07 北京京东振世信息技术有限公司 Service strategy recommendation method and device based on reinforcement learning and electronic equipment
CN111401557A (en) * 2020-06-03 2020-07-10 超参数科技(深圳)有限公司 Agent decision making method, AI model training method, server and medium
CN111563209A (en) * 2019-01-29 2020-08-21 株式会社理光 Intention identification method and device and computer readable storage medium
CN111581343A (en) * 2020-04-24 2020-08-25 北京航空航天大学 Reinforced learning knowledge graph reasoning method and device based on graph convolution neural network
CN111597209A (en) * 2020-04-30 2020-08-28 清华大学 Database materialized view construction system, method and system creation method
CN111611339A (en) * 2019-02-22 2020-09-01 北京搜狗科技发展有限公司 Recommendation method and device for inputting related users
CN112801731A (en) * 2021-01-06 2021-05-14 广东工业大学 Federal reinforcement learning method for order taking auxiliary decision
CN112966591A (en) * 2021-03-03 2021-06-15 河北工业职业技术学院 Knowledge map deep reinforcement learning migration system for mechanical arm grabbing task
CN113255347A (en) * 2020-02-10 2021-08-13 阿里巴巴集团控股有限公司 Method and equipment for realizing data fusion and method for realizing identification of unmanned equipment
CN114248265A (en) * 2020-09-25 2022-03-29 广州中国科学院先进技术研究所 Multi-task intelligent robot learning method and device based on meta-simulation learning
CN114626530A (en) * 2022-03-14 2022-06-14 电子科技大学 Reinforced learning knowledge graph reasoning method based on bilateral path quality assessment
CN115099401A (en) * 2022-05-13 2022-09-23 清华大学 Learning method, device and equipment of continuous learning framework based on world modeling
CN115936091A (en) * 2022-11-24 2023-04-07 北京百度网讯科技有限公司 Deep learning model training method and device, electronic equipment and storage medium
CN117009548A (en) * 2023-08-02 2023-11-07 广东立升科技有限公司 Knowledge graph supervision system based on secret equipment maintenance

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170124497A1 * 2015-10-28 2017-05-04 Fractal Industries, Inc. System for automated capture and analysis of business information for reliable business venture outcome prediction
CN106776729A * 2016-11-18 2017-05-31 同济大学 Method for building a large-scale knowledge graph path query predictor
CN106598856A * 2016-12-14 2017-04-26 广东威创视讯科技股份有限公司 Path detection method and path detection device
CN106934012A * 2017-03-10 2017-07-07 上海数眼科技发展有限公司 Natural language question answering method and system based on knowledge graph
CN107577805A * 2017-09-26 2018-01-12 华南理工大学 Business service system for log big data analysis
CN107944025A * 2017-12-12 2018-04-20 北京百度网讯科技有限公司 Information pushing method and device
CN108073711A * 2017-12-21 2018-05-25 北京大学深圳研究生院 Relation extraction method and system based on knowledge graph

Cited By (39)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109818786B * 2019-01-20 2021-11-26 北京工业大学 Method for optimally selecting distributed multi-resource combined path capable of sensing application of cloud data center
CN109818786A * 2019-01-20 2019-05-28 北京工业大学 Method for optimally selecting distributed multi-resource combined path capable of sensing application of cloud data center
CN109829579A * 2019-01-22 2019-05-31 平安科技(深圳)有限公司 Shortest path calculation method, device, computer equipment and storage medium
CN111563209A * 2019-01-29 2020-08-21 株式会社理光 Intent recognition method and device, and computer-readable storage medium
CN111611339A * 2019-02-22 2020-09-01 北京搜狗科技发展有限公司 Recommendation method and device for input-related users
CN109947098A * 2019-03-06 2019-06-28 天津理工大学 Distance-first optimal route selection method based on a machine learning strategy
CN110347857A * 2019-06-06 2019-10-18 武汉理工大学 Semantic annotation method for remote sensing images based on reinforcement learning
CN110391843A * 2019-06-19 2019-10-29 北京邮电大学 Transmission quality prediction and path selection method and system for multi-domain optical network
CN110391843B * 2019-06-19 2021-01-05 北京邮电大学 Transmission quality prediction and path selection method and system for multi-domain optical network
CN110288878A * 2019-07-01 2019-09-27 科大讯飞股份有限公司 Adaptive learning method and device
CN110288878B * 2019-07-01 2021-10-08 科大讯飞股份有限公司 Adaptive learning method and device
CN110825821A * 2019-09-30 2020-02-21 深圳云天励飞技术有限公司 Personnel relationship query method and device, electronic equipment and storage medium
CN110825821B * 2019-09-30 2022-11-22 深圳云天励飞技术有限公司 Personnel relationship query method and device, electronic equipment and storage medium
CN110956254A * 2019-11-12 2020-04-03 浙江工业大学 Case reasoning method based on dynamic knowledge representation learning
CN110990548A * 2019-11-29 2020-04-10 支付宝(杭州)信息技术有限公司 Method and device for updating a reinforcement learning model
CN110990548B * 2019-11-29 2023-04-25 支付宝(杭州)信息技术有限公司 Method and device for updating a reinforcement learning model
CN110825890A * 2020-01-13 2020-02-21 成都四方伟业软件股份有限公司 Method and device for extracting knowledge graph entity relations using a pre-trained model
CN113255347B * 2020-02-10 2022-11-15 阿里巴巴集团控股有限公司 Method and equipment for realizing data fusion and method for realizing identification of unmanned equipment
CN113255347A * 2020-02-10 2021-08-13 阿里巴巴集团控股有限公司 Method and equipment for realizing data fusion and method for realizing identification of unmanned equipment
CN111382359B * 2020-03-09 2024-01-12 北京京东振世信息技术有限公司 Service policy recommendation method and device based on reinforcement learning, and electronic equipment
CN111382359A * 2020-03-09 2020-07-07 北京京东振世信息技术有限公司 Service policy recommendation method and device based on reinforcement learning, and electronic equipment
CN111581343A * 2020-04-24 2020-08-25 北京航空航天大学 Reinforcement learning knowledge graph reasoning method and device based on graph convolutional neural network
CN111581343B * 2020-04-24 2022-08-30 北京航空航天大学 Reinforcement learning knowledge graph reasoning method and device based on graph convolutional neural network
CN111597209A * 2020-04-30 2020-08-28 清华大学 Database materialized view construction system and method, and system creation method
CN111597209B * 2020-04-30 2023-11-14 清华大学 Database materialized view construction system and method, and system creation method
CN111401557B * 2020-06-03 2020-09-18 超参数科技(深圳)有限公司 Agent decision-making method, AI model training method, server and medium
CN111401557A * 2020-06-03 2020-07-10 超参数科技(深圳)有限公司 Agent decision-making method, AI model training method, server and medium
CN114248265A * 2020-09-25 2022-03-29 广州中国科学院先进技术研究所 Multi-task intelligent robot learning method and device based on meta-imitation learning
CN114248265B * 2020-09-25 2023-07-07 广州中国科学院先进技术研究所 Multi-task intelligent robot learning method and device based on meta-imitation learning
CN112801731A * 2021-01-06 2021-05-14 广东工业大学 Federated reinforcement learning method for order-taking decision support
CN112966591B * 2021-03-03 2023-01-20 河北工业职业技术学院 Knowledge graph deep reinforcement learning transfer system for robotic arm grasping tasks
CN112966591A * 2021-03-03 2021-06-15 河北工业职业技术学院 Knowledge graph deep reinforcement learning transfer system for robotic arm grasping tasks
CN114626530A * 2022-03-14 2022-06-14 电子科技大学 Reinforcement learning knowledge graph reasoning method based on bilateral path quality assessment
CN115099401A * 2022-05-13 2022-09-23 清华大学 Learning method, device and equipment for a continual learning framework based on world modeling
CN115099401B * 2022-05-13 2024-04-26 清华大学 Learning method, device and equipment for a continual learning framework based on world modeling
CN115936091A * 2022-11-24 2023-04-07 北京百度网讯科技有限公司 Deep learning model training method and device, electronic equipment and storage medium
CN115936091B * 2022-11-24 2024-03-08 北京百度网讯科技有限公司 Deep learning model training method and device, electronic equipment and storage medium
CN117009548A * 2023-08-02 2023-11-07 广东立升科技有限公司 Knowledge graph supervision system for confidential equipment maintenance
CN117009548B * 2023-08-02 2023-12-26 广东立升科技有限公司 Knowledge graph supervision system for confidential equipment maintenance

Also Published As

Publication number Publication date
CN109241291B 2022-02-15

Similar Documents

Publication Publication Date Title
CN109241291A Knowledge graph optimal path query system and method based on deep reinforcement learning
Han et al. A survey on metaheuristic optimization for random single-hidden layer feedforward neural network
Leng et al. Design for self-organizing fuzzy neural networks based on genetic algorithms
Nagib et al. Path planning for a mobile robot using genetic algorithms
CN108537366B (en) Reservoir scheduling method based on optimal convolution bidimensionalization
CN113239897B (en) Human body action evaluation method based on space-time characteristic combination regression
Chouikhi et al. Single- and multi-objective particle swarm optimization of reservoir structure in echo state network
Zhang et al. Evolving neural network classifiers and feature subset using artificial fish swarm
CN104732067A Industrial process modeling and forecasting method oriented to flow objects
WO2022147583A2 (en) System and method for optimal placement of interacting objects on continuous (or discretized or mixed) domains
Raiaan et al. A systematic review of hyperparameter optimization techniques in Convolutional Neural Networks
Zuo et al. Domain selection of transfer learning in fuzzy prediction models
Fofanah et al. Experimental Exploration of Evolutionary Algorithms and their Applications in Complex Problems: Genetic Algorithm and Particle Swarm Optimization Algorithm
CN116611504A (en) Neural architecture searching method based on evolution
CN115620046A (en) Multi-target neural architecture searching method based on semi-supervised performance predictor
Parsa et al. Multi-objective hyperparameter optimization for spiking neural network neuroevolution
Kavipriya et al. Adaptive weight deep convolutional neural network (AWDCNN) classifier for predicting student’s performance in job placement process
Park et al. DAG-GCN: Directed Acyclic Causal Graph Discovery from Real World Data using Graph Convolutional Networks
Guzman et al. Adaptive model predictive control by learning classifiers
de Oliveira et al. An evolutionary extreme learning machine based on fuzzy fish swarms
Phatai et al. Cultural algorithm initializes weights of neural network model for annual electricity consumption prediction
Zhang et al. Bandit neural architecture search based on performance evaluation for operation selection
Ikushima et al. Differential evolution neural network optimization with individual dependent mechanism
Srinivasan et al. Electricity price forecasting using evolved neural networks
Chen et al. Deep Recurrent Policy Networks for Planning Under Partial Observability

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant