CN108304424A - Text key word extracting method and text key word extraction element - Google Patents

Text keyword extraction method and text keyword extraction apparatus

Info

Publication number
CN108304424A
Authority
CN
China
Prior art keywords
text
training
trained
network model
extracted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710203566.8A
Other languages
Chinese (zh)
Other versions
CN108304424B (en)
Inventor
包恒耀
苏可
饶孟良
陈益
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201710203566.8A priority Critical patent/CN108304424B/en
Publication of CN108304424A publication Critical patent/CN108304424A/en
Application granted granted Critical
Publication of CN108304424B publication Critical patent/CN108304424B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3346 Query execution using probabilistic model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A text keyword extraction method and apparatus. In one embodiment, the method includes: obtaining a text to be extracted; scanning an associated keyword database to match the keywords in the text to be extracted; determining, from the matched keywords and the text to be extracted, all clause patterns and their corresponding keyword combinations; analyzing, according to a keyword probability network model, the probability that each clause pattern holds together with its corresponding keyword combination; and determining the keyword combination corresponding to the highest analyzed probability as the keyword combination extracted from the text to be extracted. This embodiment responds quickly, simplifies the difficulty of extracting text keywords, and improves the accuracy of the extracted keywords.

Description

Text keyword extraction method and text keyword extraction apparatus
Technical field
The present invention relates to the field of intelligent interaction, and more particularly to a text keyword extraction method and a text keyword extraction apparatus.
Background technology
Taking intelligent interaction devices such as smart speakers or intelligent assistants as an example, they typically interact with the user through dialogue: the user's speech is recognized as text, and the keywords in the text (in some applications also called entity words) are then extracted. In such interaction, however, the text is usually very short, often only a few words, which makes extracting its keywords (such as a singer's name or a song title) extremely difficult. Moreover, for short texts, unlike long texts, large amounts of data cannot be crawled from the Internet, no large body of public labeled data is available, and public corpus data for vertical domains is very scarce, so developers must collect it themselves, which is very unfavorable in a project's cold-start phase. There is therefore an urgent need for a text keyword extraction approach that achieves better results.
At present, when no labeled data is available, text keywords are mainly extracted with the maximum matching algorithm or with template-based matching. The maximum matching algorithm is commonly used in Chinese word segmentation and includes forward maximum matching and backward maximum matching. Taking forward maximum matching as an example, several consecutive characters of the text to be segmented are matched, from left to right, against the vocabulary of an entity library (also called a keyword database); when a match is found, the longest matching word is cut out. For example, for the short text "I want to listen the song of ABC" (where A, B, and C each stand for a specific word) and a singer entity library {"AB", "ABC"}, the maximum matching principle extracts the entity (keyword) "ABC" rather than "AB". The template-based method instead designs some common templates in advance, such as "I want to listen [song] of [singer]". If the user's query string is "I want to listen the SX of ABC", template matching extracts the keywords "ABC" and "SX", which are then checked against the corresponding entity libraries; if they are present, the result is returned. However, although the maximum matching algorithm is fast, its results are poor and it cannot distinguish keywords with the same name: "Kiss Goodbye", for example, may be either a song or an album. As for the template-based method, users phrase their queries in highly varied ways, so achieving good results may require hundreds of thousands of templates per scenario; this is not only slow, but once a user's phrasing is not covered by any template, no keyword can be extracted at all.
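The forward maximum matching described above can be sketched in a few lines. The entity library and example string follow the "ABC"/"AB" illustration in this paragraph; `max_len`, an assumed cap on entity length, is not part of the patent's description.

```python
def forward_maximum_match(text, entity_db, max_len=10):
    """Scan left to right; at each position try the longest window first
    and cut out the longest entity-library entry that matches."""
    matches = []
    i = 0
    while i < len(text):
        matched = None
        # Longest candidate window first, shrinking toward one character.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in entity_db:
                matched = text[i:j]
                break
        if matched:
            matches.append(matched)
            i += len(matched)
        else:
            i += 1  # no entity starts here; advance one character
    return matches

entity_db = {"AB", "ABC"}
print(forward_maximum_match("I want to listen the song of ABC", entity_db))
# ['ABC']  (the longest match wins over 'AB')
```

This reproduces the paragraph's point: the algorithm is fast, but because it only keeps the longest string match it has no way to disambiguate entities of the same name.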
Invention content
In view of this, the present embodiments provide a text keyword extraction method and a text keyword extraction apparatus that can improve the accuracy of text keyword extraction while remaining fast.
A text keyword extraction method, including:
obtaining a text to be extracted;
scanning an associated keyword database to match the keywords in the text to be extracted;
determining, from the matched keywords and the text to be extracted, all clause patterns and their corresponding keyword combinations;
analyzing, according to a keyword probability network model, the probability that each clause pattern holds together with its corresponding keyword combination;
determining the keyword combination corresponding to the highest analyzed probability as the keyword combination extracted from the text to be extracted.
A text keyword extraction apparatus, including:
a text acquisition module, configured to obtain a text to be extracted;
a keyword matching module, configured to scan an associated keyword database and match the keywords in the text to be extracted;
a combination determining module, configured to determine, from the matched keywords and the text to be extracted, all clause patterns and their corresponding keyword combinations;
a probability analysis module, configured to analyze, according to a keyword probability network model, the probability that each clause pattern holds together with its corresponding keyword combination;
an extraction determining module, configured to determine the keyword combination corresponding to the highest probability analyzed by the probability analysis module as the keyword combination extracted from the text to be extracted.
According to the scheme of the embodiments described above, when the keywords in a text to be extracted need to be extracted, the keywords in the text are first matched by scanning an associated keyword database; all clause patterns and their corresponding keyword combinations are then determined on the basis of those keywords; the probability that each clause pattern holds together with its corresponding keyword combination is analyzed according to a keyword probability network model; and the keyword combination corresponding to the highest analyzed probability is determined as the keyword combination extracted from the text to be extracted. Because the scheme enumerates all clause patterns and keyword combinations on the basis of the matched keywords and then scores each combination with the keyword probability network model, it not only responds quickly but also simplifies the difficulty of extracting text keywords and improves their accuracy.
Description of the drawings
Fig. 1 is a schematic diagram of the application environment of the scheme in one embodiment;
Fig. 2 is a schematic diagram of the structure of the terminal in one embodiment;
Fig. 3 is a schematic diagram of the structure of the server in one embodiment;
Fig. 4 is a flow diagram of the text keyword extraction method in one embodiment;
Fig. 5 is a schematic diagram of the principle of generating the keyword probability network model in one embodiment;
Fig. 6 is a schematic diagram of the principle of extracting text keywords in one embodiment;
Fig. 7 is a flow diagram of generating the keyword probability network model in a specific example;
Fig. 8 is a flow diagram of generating the keyword probability network model in another specific example;
Fig. 9 is a flow diagram of generating the keyword probability network model in yet another specific example;
Fig. 10 is a structural schematic diagram of the text keyword extraction apparatus in one embodiment;
Fig. 11 is a structural schematic diagram of the text keyword extraction apparatus in another embodiment;
Fig. 12 is a structural schematic diagram of the model generation module in a specific example.
Specific implementation mode
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are intended only to explain the present invention and do not limit its scope of protection.
Fig. 1 shows a schematic diagram of the working environment in one embodiment of the invention. As shown in Fig. 1, the working environment involves a terminal 101 and may also involve a server 102; terminal 101 and server 102 can communicate over a network. Terminal 101 can interact intelligently with the end user, receiving text content entered by the user or recognizing the user's speech as text content; once the keywords in the text content are extracted, subsequent related services can be carried out, such as querying and playing the corresponding song locally or over the network based on the extracted keywords, querying the corresponding film locally or over the network based on the extracted keywords, or querying the weather based on the extracted keywords. The process of extracting the keywords in the text content can be performed in terminal 101, or terminal 101 can send the text content to server 102, where the extraction is performed. In this example scheme, the keywords in the text content can be extracted in conjunction with a keyword probability network model. That model can be determined by server 102 and stored locally on server 102, which then performs the subsequent extraction of keywords from text content; alternatively, server 102 can send the keyword probability network model to terminal 101, and terminal 101 performs the subsequent extraction. On the other hand, the keyword probability network model can also be determined by a terminal 101 and sent to server 102, which distributes it to other terminals 101 for execution. The embodiments of the present invention relate to the scheme by which terminal 101 or server 102 extracts the keywords in text content.
A structural schematic diagram of terminal 101 in one embodiment is shown in Fig. 2. Terminal 101 includes a processor, a storage medium, a communication interface, a power interface, and a memory connected through a system bus. The storage medium of terminal 101 stores a text keyword extraction apparatus, which implements a text keyword extraction method. The communication interface of terminal 101 is used to connect and communicate with server 102 or other servers in the network; the power interface of terminal 101 is used to connect with an external power supply, which powers terminal 101 through it. Terminal 101 can be any device capable of intelligent input and output, such as a mobile terminal (for example, a mobile phone or a tablet computer) or a smart speaker, or any other smart device having the structure described above.
A structural schematic diagram of server 102 in one embodiment is shown in Fig. 3. It includes a processor, a power supply module, a storage medium, a memory, and a communication interface connected through a system bus. The storage medium of server 102 stores an operating system, a database, and a text keyword extraction apparatus, which implements a text keyword extraction method. The communication interface of the server is used to connect and communicate with terminal 101 and other servers in the network.
Fig. 4 shows a flow diagram of the text keyword extraction method in one embodiment. As shown in Fig. 4, the text keyword extraction method in this embodiment includes:
Step S401: obtain a text to be extracted;
Step S402: scan an associated keyword database and match the keywords in the text to be extracted;
Step S403: determine, from the matched keywords and the text to be extracted, all clause patterns and their corresponding keyword combinations, where any determined clause pattern together with its corresponding keyword combination jointly constitutes the text to be extracted;
Step S404: analyze, according to a keyword probability network model, the probability that each clause pattern holds together with its corresponding keyword combination;
Step S405: determine the keyword combination corresponding to the highest analyzed probability as the keyword combination extracted from the text to be extracted.
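Steps S402 and S403 can be sketched as follows. This is a minimal illustration under assumptions the patent does not spell out: matched keywords are represented as character spans with an assumed slot label, and a candidate is any non-overlapping subset of those spans replaced by slot labels (the empty subset is the clause with no keywords). The scoring model of steps S404 and S405 is not shown.

```python
from itertools import combinations

def candidate_combinations(text, matched):
    """matched: (start, end, keyword, slot) spans found in the keyword
    database. Returns every (clause pattern, keyword combination) pair
    obtained by replacing a non-overlapping subset of spans with slot
    labels; each clause pattern plus its keywords reconstitutes the text."""
    results = []
    for r in range(len(matched) + 1):
        for subset in combinations(matched, r):
            spans = sorted(subset)
            # Overlapping matches such as "AB" and "ABC" cannot co-occur.
            if any(a[1] > b[0] for a, b in zip(spans, spans[1:])):
                continue
            clause, last, keywords = "", 0, []
            for start, end, kw, slot in spans:
                clause += text[last:start] + "[" + slot + "]"
                last = end
                keywords.append(kw)
            results.append((clause + text[last:], tuple(keywords)))
    return results

text = "I want to listen the song of ABC"
matched = [(29, 31, "AB", "singer"), (29, 32, "ABC", "song")]
cands = candidate_combinations(text, matched)
# includes ('I want to listen the song of [song]', ('ABC',))
```

Each candidate pair would then be scored by the keyword probability network model, and the keyword combination of the highest-scoring pair returned.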
According to the scheme of the embodiment described above, when the keywords in a text to be extracted need to be extracted, the keywords in the text are first matched by scanning an associated keyword database; all clause patterns and their corresponding keyword combinations are then determined on the basis of those keywords; the probability that each clause pattern holds together with its corresponding keyword combination is analyzed according to a keyword probability network model; and the keyword combination corresponding to the highest analyzed probability is determined as the keyword combination extracted from the text to be extracted. Because the scheme enumerates all clause patterns and keyword combinations on the basis of the matched keywords and then scores each combination with the keyword probability network model, it not only responds quickly but also simplifies the difficulty of extracting text keywords and improves their accuracy.
The scheme in the embodiment described above can be executed in a terminal or on a server.
When executed in a terminal, the text to be extracted can be text entered by the end user, for example through a user interaction device such as a keyboard or touch screen, or text obtained by recognizing the end user's speech. In this embodiment, the text to be extracted can be obtained by receiving text entered by the user or by translating the user's speech into text; in other embodiments it can also be obtained in other ways.
Also when executed in a terminal, the keyword probability network model can be generated in advance by the terminal, in which case the method can further include, before the text to be extracted is obtained, the step of generating the keyword probability network model. Alternatively, after a server generates the keyword probability network model, the terminal can obtain it from the server, in which case the method can further include, before the text to be extracted is obtained, the step of obtaining the keyword probability network model generated by the server.
When executed on a server, the text to be extracted can be received from a terminal: after obtaining the text to be extracted, the terminal uploads it to the server. The text to be extracted can be text entered by the end user, for example through a user interaction device such as a keyboard or touch screen, or text obtained by recognizing the end user's speech; in other embodiments it can also be text obtained in other ways.
Also when executed on a server, the keyword probability network model can be generated in advance by the server, in which case the method can further include, before the text to be extracted is obtained, the step of generating the keyword probability network model.
In a specific example, when a terminal or server generates the keyword probability network model, the specific manner may include:
obtaining texts to be trained, the texts to be trained including clause rule templates and corpus texts from various domains;
training according to the texts to be trained to obtain the keyword probability network model.
Here, a clause rule template expresses a specific clause rule. Since the set of clause rules may not cover every phrasing, for example some colloquial ones, the texts to be trained can also include corpus texts from various domains, which can be colloquial texts. In a concrete application, the corpus texts of the various domains can be obtained by web crawling.
When training according to the texts to be trained to obtain the keyword probability network model, because the texts to be trained include both kinds of text, clause rule templates and domain corpus texts, the training procedure can also be determined in combination with actual technical needs.
In one specific example, when training according to the texts to be trained, no distinction is made between clause rule templates and domain corpus texts: in each training round, one text is drawn at random. The specific manner may include:
randomly extracting one current text to be trained from the texts to be trained, where the extracted current text may be a clause rule template or a corpus text;
inputting the extracted current text into the current network model to be trained for training, to obtain the trained network model;
when the clause rule templates and the domain corpus texts in the texts to be trained have not all been extracted, updating the current network model to be trained with the trained network model and returning to the step of randomly extracting one current text to be trained, until all the clause rule templates and domain corpus texts in the texts to be trained have been extracted;
determining the finally obtained trained network model as the keyword probability network model.
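The random-draw procedure above can be sketched as follows. The patent does not specify the network architecture or update rule, so `train_step` here is a stand-in for whatever gradient update the model actually uses; a stub that records its input is used for the demonstration.

```python
import random

def train_random_order(templates, corpus_texts, train_step, model):
    """One pass of the first training variant: draw clause rule templates
    and domain corpus texts in a single random order, updating the model
    after each sample until every text has been extracted once."""
    pool = [("template", t) for t in templates] + \
           [("corpus", c) for c in corpus_texts]
    random.shuffle(pool)                    # random extraction order
    for kind, sample in pool:               # adjacent draws may be of the
        model = train_step(model, sample)   # same or different kinds
    return model                            # keyword probability network model

# Demo with a stub train_step that just records what it trained on.
templates = ["I want to listen [song] of [singer]"]
corpus = ["I want to hear a song"]
seen = train_random_order(templates, corpus, lambda m, s: m + [s], [])
```

The loop makes the text's observation concrete: because every draw is uniform over the remaining pool, two adjacent rounds may train on the same type of text or on different types.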
Note that in the above specific example, the current network model to be trained is updated with the trained network model only after it is judged that the clause rule templates and domain corpus texts have not all been extracted. In a particular technical application, it is equally possible to first update the current network model with the trained network model and then judge whether all the clause rule templates and domain corpus texts have been extracted; in that case, once all of them have been extracted, the updated current network model is determined as the keyword probability network model.
From this example of training to obtain the keyword probability network model, it can be understood that, since each round randomly extracts one current text from the texts to be trained, the texts drawn in two adjacent rounds may be of the same type (for example, both clause rule templates or both corpus texts) or of different types (one round draws a clause rule template and the other draws a corpus text).
In another specific example, the number of clause rule templates and the number of corpus texts in the texts to be trained can be set equal, and training can then alternate between the clause rule templates and the corpus texts of the various domains. The specific manner may include:
extracting one clause rule template from the clause rule templates, inputting it into the current network model to be trained for training, and obtaining the trained network model;
updating the current network model to be trained with the trained network model, extracting one corpus text from the domain corpus texts, inputting it into the updated current network model for training, and obtaining the trained network model;
when the clause rule templates and the domain corpus texts in the texts to be trained have not all been extracted, updating the current network model to be trained with the trained network model and returning to the step of extracting one clause rule template, until all the clause rule templates and domain corpus texts in the texts to be trained have been extracted;
determining the finally obtained trained network model as the keyword probability network model.
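The alternating variant can be sketched in the same style as the random one. As before, `train_step` is a stand-in for the unspecified model update, and the equal-count precondition from the text is checked explicitly.

```python
def train_alternating(templates, corpus_texts, train_step, model):
    """Second training variant: with equal numbers of clause rule
    templates and corpus texts, alternate one template step and one
    corpus-text step, updating the model in between, until both lists
    are exhausted."""
    assert len(templates) == len(corpus_texts)  # counts set equal
    for tpl, txt in zip(templates, corpus_texts):
        model = train_step(model, tpl)   # train on a clause rule template
        model = train_step(model, txt)   # then on a domain corpus text
    return model                         # keyword probability network model

# Stub train_step records the order in which texts were consumed.
order = train_alternating(["t1", "t2"], ["c1", "c2"],
                          lambda m, s: m + [s], [])
print(order)  # ['t1', 'c1', 't2', 'c2']
```

Unlike the random variant, the consumption order here is fully determined: template, corpus text, template, corpus text, and so on.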
As with the previous example, the description above updates the current network model to be trained and returns to extract another clause rule template only after judging that not all of the clause rule templates and domain corpus texts have been extracted. In a particular technical application, it is equally possible to first update the current network model with the trained network model and then judge whether all the clause rule templates and domain corpus texts have been extracted; in that case, once all of them have been extracted, the updated current network model is determined as the keyword probability network model.
In both of the above specific examples, when the extracted clause rule template or corpus text is input into the current network model to be trained, it can be input character by character, so as to obtain better generalization ability.
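Character-level input can be illustrated as below. The vocabulary and the `<unk>` handling are assumptions for the sketch; the patent only states that templates and corpus texts are fed to the model character by character.

```python
def to_char_ids(text, vocab):
    """Map a text to a sequence of character ids for character-level
    model input, sending characters outside the vocabulary to a shared
    <unk> id."""
    return [vocab.get(ch, vocab["<unk>"]) for ch in text]

# Tiny illustrative vocabulary (hypothetical).
vocab = {"<unk>": 0, "I": 1, " ": 2, "w": 3}
print(to_char_ids("I want", vocab))  # [1, 2, 3, 0, 0, 0]
```

Feeding characters rather than pre-segmented words avoids committing to a word segmentation up front, which is one plausible reason character-level input generalizes better on short, colloquial queries.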
Based on the embodiments and specific examples described above, this embodiment scheme can, in a particular technical implementation, be divided into two processes: offline model training and online text entity extraction. In offline model training, after the training data (the texts to be trained) is obtained, training on it yields the final keyword probability network model, as shown in Fig. 5. In the online text entity extraction stage, keywords are extracted with the keyword probability network model obtained above, as shown in Fig. 6.
For offline model training, two kinds of training data can be prepared. One kind is the rule templates of each vertical service domain. Taking a music scenario as an example, the rule templates can be: "I want to listen [song] of [singer]", "[song] is sung by whom", "which songs are in [album]", where [singer] denotes a singer, [song] denotes a song, and [album] denotes an album. These rule templates are called clause rule templates in this embodiment. Different vertical service domains, such as music, film, and weather, can have different clause rule templates, so that a different corresponding keyword probability network model is trained for each vertical service domain.
These clause rule templates can be corpora labeled from collected user data, or simple templates written by developers by hand. In the initial stage of a vertical service domain, the clause rule templates are usually templates written by developers.
The other kind of training data can be corpus texts from non-vertical domains, usually colloquial corpus data, which supplement phrasings that the clause rule templates may lack, so as to improve the generalization ability of the trained keyword probability network model; in this embodiment they are called the corpus texts of the various domains. For example, consider "I want to hear a song": if the phrasing "hear" never appears in the clause rule templates, the trained model generalizes poorly, and if "hear" appears in a text to be extracted, the model may fail to identify a keyword or may identify the wrong one. Adding some colloquial texts (the corpus texts of the various domains) therefore improves the generalization ability of the trained model. So as not to affect keyword extraction in each vertical domain, these corpus texts can be selected from non-vertical domains, i.e., they are applicable to training the models of all vertical domains. In a particular application, they can be obtained by web crawling, and the number of corpus texts to crawl can be determined according to actual needs.
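The patent does not state how a clause rule template is turned into concrete training sentences. One plausible preprocessing step, sketched purely under that assumption, is to expand each [slot] with entries from the corresponding entity library, yielding labeled sentences; the entity data below is hypothetical.

```python
import itertools
import re

def expand_template(template, entity_db):
    """Fill every [slot] in a clause rule template with each entry of
    the matching entity library, yielding (sentence, slot values) pairs.
    entity_db maps slot name -> list of entities."""
    slots = re.findall(r"\[(\w+)\]", template)
    for values in itertools.product(*(entity_db[s] for s in slots)):
        sentence = template
        for slot, val in zip(slots, values):
            sentence = sentence.replace("[" + slot + "]", val, 1)
        yield sentence, dict(zip(slots, values))

entity_db = {"singer": ["ABC"], "song": ["SX"]}  # hypothetical libraries
pairs = list(expand_template("I want to listen [song] of [singer]", entity_db))
# [('I want to listen SX of ABC', {'song': 'SX', 'singer': 'ABC'})]
```

Under this reading, each expanded sentence carries its slot labels for free, which is exactly the kind of supervision the model needs without any manual annotation.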
When the texts to be trained are obtained, training can be carried out in combination with actual needs. As described above, the specific training process can differ depending on whether the numbers of clause rule templates and domain corpus texts in the texts to be trained are constrained.
Fig. 7 shows a flow diagram of generating the keyword probability network model in a specific example. This example does not distinguish whether a text to be trained is a clause rule template or a domain corpus text.
As shown in Fig. 7, the texts to be trained, containing clause rule templates and the corpus texts of the various domains, are obtained first. Taking the music domain as an example, the clause rule templates can be "I want to listen [song] of [singer]", "[song] is sung by whom", "which songs are in [album]", and so on, and the domain corpus texts can include texts such as "I want to hear a song". A clause rule template is a text applicable only to the current domain, while the domain corpus texts are applicable not only to the current domain but also to other domains.
Then, as shown in fig. 7, specific training process can be:
From waiting for extracting a clause rule template or language material text in training text at random, i.e., that extracts at this time is current Wait for that training text may be clause rule template, it is also possible to language material text;
By the clause rule template of extraction or language material text input, currently network model to be trained is trained, and is instructed Network model to be trained after white silk;
judge whether every clause rule template and every corpus text of each field in the training set has been extracted;
if not — i.e., at least one clause rule template or corpus text in the training set has not yet been extracted — update the current network model to be trained with the trained network model, return to the step of randomly extracting one clause rule template or corpus text from the training set, and repeat the above process until every clause rule template and corpus text of each field in the training set has been extracted;
if the extraction is finished — i.e., all clause rule templates and corpus texts of each field in the training set have been extracted — determine the trained network model obtained as the keyword probability network model, completing the training process.
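The steps above can be sketched as a loop that draws training texts without replacement until the set is exhausted. This is a sketch only: `train_step` is an assumed callable standing in for one round of network training, and the "updated model carried forward" convention mirrors the update step of Fig. 7.

```python
import random

def train_random_mix(training_set, model, train_step):
    """Randomly extract one template-or-corpus text per round, train on it,
    carry the updated model forward, and stop once every text was extracted."""
    pool = list(training_set)
    random.shuffle(pool)                 # random extraction order
    while pool:                          # the "extraction finished" check
        current_text = pool.pop()        # may be a template or a corpus text
        model = train_step(model, current_text)  # trained model replaces current
    return model                         # the keyword probability network model
```

With a stub such as `train_step = lambda m, t: m + [t]`, the returned model records every training text exactly once, mirroring the "extracted and finished" condition.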
The specific example above updates the current network model with the trained network model only after judging that the clause rule templates and corpus texts have not all been extracted. In a particular technical application, the order can also be reversed: first update the current network model with the trained network model, then judge whether every clause rule template and corpus text of each field in the training set has been extracted. In that case, once every clause rule template and corpus text has been extracted, the updated current network model is determined as the keyword probability network model.
When the extracted clause rule template or corpus text is input into the current network model for training, it can be input character by character (taking each character as the input unit). Inputting by character avoids the very sparse, poor results that word-level input produces when the corpus is small, yields better generalization ability, and improves extraction precision and accuracy for short texts. For different service fields, different corresponding keyword probability network models can be trained.
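A character-level encoding of this kind can be sketched as follows. Treating a slot label such as `[singer]` as a single unit is an assumption made here so that templates and plain texts share one vocabulary; the patent does not specify this detail.

```python
def to_char_units(clause, vocab):
    """Split a clause into character-level units (slot labels stay whole),
    then map each unit to an integer id, growing the vocabulary as needed."""
    units, i = [], 0
    while i < len(clause):
        if clause[i] == "[" and "]" in clause[i:]:
            j = clause.index("]", i) + 1   # keep "[slot]" as one unit
            units.append(clause[i:j])
            i = j
        else:
            units.append(clause[i])        # one character = one unit
            i += 1
    return [vocab.setdefault(u, len(vocab)) for u in units]
```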
The current network model to be trained can be any suitable trainable model chosen according to actual needs. In one specific application example, an LSTM (Long Short-Term Memory) network is used as the model to be trained. As a special kind of recurrent neural network, an LSTM learns long-range dependency information well, so it can approximate the probability that a clause holds. An LSTM network contains many unknown parameters; the training process above estimates their specific values, after which the model performs keyword extraction on texts to be extracted. During training, the LSTM network can be trained with the BPTT (Back Propagation Through Time) algorithm.
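For illustration only, a scalar toy version of one LSTM time step can be written out. The gate equations below are the standard LSTM ones; the flat weight dictionary `W` and the scalar inputs are purely assumptions of this sketch — a real model would use vectors, matrices, and a trained embedding.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h, c, W):
    """One LSTM time step (scalar toy): input, forget, and output gates plus
    a candidate value update the cell state c and the hidden state h."""
    i = sigmoid(W["wi"] * x + W["ui"] * h + W["bi"])    # input gate
    f = sigmoid(W["wf"] * x + W["uf"] * h + W["bf"])    # forget gate
    o = sigmoid(W["wo"] * x + W["uo"] * h + W["bo"])    # output gate
    g = math.tanh(W["wg"] * x + W["ug"] * h + W["bg"])  # candidate value
    c_new = f * c + i * g          # keep part of the old state, add new info
    h_new = o * math.tanh(c_new)   # expose a gated view of the cell state
    return h_new, c_new
```

Running such steps over a clause's unit ids and feeding the final hidden state through a classifier would give the clause probability; the gate parameters are exactly what BPTT estimates during training.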
Fig. 8 shows a schematic flowchart of generating the keyword probability network model in another specific example, in which the number of clause rule templates in the training set equals the number of corpus texts, and the clause rule templates and the corpus texts of each field are used alternately for training.
As shown in Fig. 8, the specific training process can be:
extract one clause rule template from the clause rule templates;
input the extracted clause rule template into the current network model to be trained and train it, obtaining the trained network model;
update the current network model to be trained with the trained network model;
extract one corpus text from the corpus texts of each field;
input the extracted corpus text into the updated current network model to be trained and train it, obtaining the trained network model;
judge whether every clause rule template and corpus text of each field in the training set has been extracted;
if not — i.e., some clause rule template or corpus text in the training set has not yet been extracted — update the current network model to be trained with the trained network model and return to the step of extracting one clause rule template from the clause rule templates, until every clause rule template and corpus text of each field in the training set has been extracted;
if the extraction is finished, determine the trained network model obtained as the keyword probability network model.
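Under the equal-count condition, the alternating process above reduces to a simple paired loop. Again a sketch: `train_step` is an assumed stand-in for one round of network training, not a component named by the patent.

```python
def train_alternating(templates, corpus_texts, model, train_step):
    """Alternate one clause rule template and one corpus text per pair of
    rounds, updating the current model after each, until both lists are done."""
    assert len(templates) == len(corpus_texts)  # equal counts, as assumed
    for template, corpus_text in zip(templates, corpus_texts):
        model = train_step(model, template)     # template round, then update
        model = train_step(model, corpus_text)  # corpus round, then update
    return model  # determined as the keyword probability network model
```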
The specific example above extracts a clause rule template first and then a corpus text of each field; in another example, a corpus text of each field can be extracted and trained on first, followed by a clause rule template.
In addition, as in the earlier example, the specific example above updates the current network model only after judging that the templates and corpus texts have not all been extracted. In a particular technical application, the current network model can first be updated with the trained network model, after which it is judged whether every clause rule template and corpus text of each field in the training set has been extracted, as shown in Fig. 9. In that case, once every clause rule template and corpus text has been extracted, the updated current network model is determined as the keyword probability network model.
The other technical features of generating the keyword probability network model in the examples shown in Figs. 8 and 9 can be the same as in the example shown in Fig. 7.
After the keyword probability network model is obtained by training, it can be applied to extract keywords from texts to be extracted. When the model is trained by a server, the server may send it to a terminal, which then performs text keyword extraction, or the server may itself extract keywords after receiving a text to be extracted from a terminal. When the model is trained by a terminal, the terminal may perform extraction itself based on the model, or send the model to a server for distribution to other terminals, so that the server and each terminal can extract text keywords based on it.
When text keyword extraction is actually performed, a text to be extracted is first obtained. It may be text the terminal user enters through a user-interaction device such as a keyboard or touch screen, text obtained by recognizing the terminal user's speech, or text obtained in other ways.
In the present embodiment, after the text to be extracted is obtained, its current field can first be determined, and text keyword extraction is then performed with the keyword dictionary and keyword probability network model corresponding to that field. When keywords need to be extracted for only one field — on a smart speaker, for example — a default keyword dictionary and keyword probability network model can be bound directly. When keywords may be extracted for multiple fields — when executing on a server, for example — the field is determined first, and extraction proceeds with the keyword dictionary and keyword probability network model of that field. The following examples assume the field has already been determined.
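The per-field binding described above can be pictured as a small registry lookup. `detect_domain`, `resources`, and `run_pipeline` are assumed interfaces introduced only for this sketch, not components named by the patent.

```python
def extract_with_field(text, detect_domain, resources, run_pipeline):
    """Determine the text's field, fetch the keyword dictionary and keyword
    probability network model bound to that field, and run extraction."""
    field = detect_domain(text)                 # e.g. "music"
    keyword_dict, prob_model = resources[field]
    return run_pipeline(text, keyword_dict, prob_model)
```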
After the text to be extracted is obtained, a scan is performed in the keyword dictionary associated with its field to match the keywords in the text, thereby exhaustively enumerating the keywords it contains. Then, according to the text to be extracted and its matched keywords, all text clauses and corresponding keyword combinations are determined, where any determined text clause together with its corresponding keyword combination jointly constitutes the text to be extracted. Those skilled in the art will understand that matching the keywords in the text means matching every word in the text that matches a word in the keyword dictionary, and determining all text clauses and corresponding keyword combinations means enumerating every possible clause of the text and the keywords under that clause. Suppose the text to be extracted is "I want to listen the QLX of ABC", the singer entity library is {"AB", "ABC"}, and the song entity library is {"QLX"}, where A, B, C, Q, L, and X each denote a specific word or character. Then, for the text "I want to listen the QLX of ABC", matching against the singer entity library {"AB", "ABC"} and the song entity library {"QLX"} yields the keywords AB, ABC, and QLX, and the possible text clauses determined include: "I want to listen the QLX of ABC", "I want to listen the QLX of [singer]C", "I want to listen the QLX of [singer]", "I want to listen the [song] of [singer]C", and "I want to listen the [song] of [singer]". The possible text clauses and corresponding keyword combinations thus obtained are shown in Table 1 below.
Table 1

Possible combination                      | [singer] | [song] | Probability
I want to listen the QLX of ABC           |          |        | 0.001
I want to listen the QLX of [singer]C     | AB       |        | 0.002
I want to listen the QLX of [singer]      | ABC      |        | 0.009
I want to listen the [song] of [singer]C  | AB       | QLX    | 0.011
I want to listen the [song] of [singer]   | ABC      | QLX    | 0.051
As Table 1 shows, the clause "I want to listen the QLX of [singer]C" together with its keyword combination "[singer]: AB" constitutes the original text "I want to listen the QLX of ABC"; the clause "I want to listen the QLX of [singer]" together with its keyword combination "[singer]: ABC" constitutes the same original text; and the clause "I want to listen the [song] of [singer]C" together with its keyword combination "[singer]: AB; [song]: QLX" likewise constitutes the original text "I want to listen the QLX of ABC".
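A minimal sketch of the matching-and-enumeration step for this example is shown below. The substring search and the pairwise non-overlap rule are assumptions chosen here so that "AB" inside "ABC" cannot be used twice; with them, the sketch reproduces exactly the clauses of Table 1 plus the unreplaced text itself.

```python
from itertools import product

def match_keywords(text, entity_libs):
    """Scan the text against each entity library (keyword dictionary) and
    record every (slot, keyword, position) substring hit."""
    hits = []
    for slot, words in entity_libs.items():
        for w in words:
            p = text.find(w)
            while p != -1:
                hits.append((slot, w, p))
                p = text.find(w, p + 1)
    return hits

def enumerate_clauses(text, hits):
    """For every subset of pairwise non-overlapping hits, replace each matched
    span with its slot label to form a clause, paired with its keyword map."""
    results = []
    for mask in range(2 ** len(hits)):
        chosen = [h for i, h in enumerate(hits) if mask >> i & 1]
        spans = [(p, p + len(w)) for _, w, p in chosen]
        if any(a < d and c < b
               for (a, b), (c, d) in product(spans, spans) if (a, b) != (c, d)):
            continue  # e.g. "AB" overlapping "ABC": skip this subset
        clause, shift, keywords = text, 0, {}
        for slot, w, p in sorted(chosen, key=lambda h: h[2]):
            label = f"[{slot}]"
            clause = clause[:p + shift] + label + clause[p + shift + len(w):]
            shift += len(label) - len(w)
            keywords[slot] = w
        results.append((clause, keywords))
    return results
```

For "I want to listen the QLX of ABC" with the two entity libraries above, the six results are the five rows of Table 1 plus the original text with no keywords replaced.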
Then each text clause is input into the keyword probability network model to obtain the probability that the clause holds together with its corresponding keyword combination, as shown in the last column of Table 1. As can be seen from Table 1, the largest probability value is 0.051, i.e., the probability that "I want to listen the [song] of [singer]" holds is the largest. Therefore the text clause and keyword combination corresponding to the maximum probability 0.051 are selected, and the keyword combination finally determined for extraction is {[singer]: ABC; [song]: QLX}.
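The final selection step can then be sketched as an argmax over the model's clause probabilities. The hard-coded scores below simply echo Table 1 and stand in for a trained keyword probability network model; they are not computed here.

```python
# Table 1 probabilities, standing in for the trained network model's output.
TABLE1_SCORES = {
    "I want to listen the QLX of ABC": 0.001,
    "I want to listen the QLX of [singer]C": 0.002,
    "I want to listen the QLX of [singer]": 0.009,
    "I want to listen the [song] of [singer]C": 0.011,
    "I want to listen the [song] of [singer]": 0.051,
}

def pick_best_keywords(candidates, clause_probability):
    """Score every (clause, keyword-combination) candidate and return the
    keyword combination of the clause with the highest probability."""
    best_clause, best_keywords = max(candidates,
                                     key=lambda c: clause_probability(c[0]))
    return best_keywords
```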
Based on the same idea as described above, the present embodiment also provides a text keyword extraction apparatus. Fig. 10 shows a structural schematic diagram of the text keyword extraction apparatus in one embodiment.
As shown in Fig. 10, the text keyword extraction apparatus of this embodiment includes:
a text acquisition module 101, configured to obtain a text to be extracted;
a keyword matching module 102, configured to scan in an associated keyword dictionary and match the keywords in the text to be extracted;
a combination determining module 103, configured to determine, according to the text to be extracted and its matched keywords, all text clauses and corresponding keyword combinations, where any determined text clause and its corresponding keyword combination jointly constitute the text to be extracted;
a probability analysis module 104, configured to determine, according to the keyword probability network model, the probability that each text clause holds together with its corresponding keyword combination; and
an extraction determining module 105, configured to determine the keyword combination corresponding to the maximum probability determined by the probability analysis module 104 as the keyword combination extracted from the text to be extracted.
According to the scheme of the embodiments above, when the keywords in a text to be extracted need to be extracted, a scan is performed in the associated keyword dictionary to match the keywords in the text; all text clauses and corresponding keyword combinations are then determined based on those keywords; the probability that each text clause holds with its corresponding keyword combination is determined according to the keyword probability network model; and the keyword combination corresponding to the maximum probability is determined as the keyword combination extracted from the text. Because all text clauses and keyword combinations are determined on the basis of the matched keywords, and their probabilities are then determined by the keyword probability network model, the response is fast, the difficulty of extracting text keywords is reduced, and the accuracy of the extracted keywords is improved.
The scheme in the embodiments above can be executed on a terminal or on a server.
When executed on a terminal, the text to be extracted can be text input by the terminal user, for example text entered through a user-interaction device such as a keyboard or touch screen, text obtained by recognizing the terminal user's speech, or, in other embodiments, text obtained in other ways.
When executed on a server, the text to be extracted can be received from a terminal, which uploads it to the server after obtaining it. The text to be extracted can be text input by the terminal user, for example text entered through a user-interaction device such as a keyboard or touch screen, text obtained by recognizing the terminal user's speech, or text obtained in other ways.
On the other hand, when the apparatus is arranged on a terminal or server, the keyword probability network model can be generated in advance by the terminal or server. Accordingly, in a specific example, as shown in Fig. 11, the text keyword extraction apparatus can further include:
a model generation module 106, configured to generate the keyword probability network model.
In addition, when the apparatus is arranged on a terminal, the server may generate the keyword probability network model, after which the terminal obtains it from the server. Accordingly, as shown in Fig. 11, in another embodiment the text keyword extraction apparatus can further include:
a model acquisition module 107, configured to obtain the keyword probability network model generated by the server.
Fig. 12 shows a structural schematic diagram of the model generation module 106 in a specific example. As shown in Fig. 12, the model generation module 106 includes:
a training text acquisition module 1061, configured to obtain the training set, which includes clause rule templates and corpus texts of each field; and
a training module 1062, configured to train on the training set to obtain the keyword probability network model.
The clause rule templates embody specific clause rules. Since the set clause rules may not cover all clauses — some colloquial clauses, for example — the training set can also include corpus texts of each field, which can be colloquial texts. In a concrete application, the corpus texts of each field can be obtained by web crawling.
As shown in Fig. 12, the training module 1062 can specifically include a training text extraction unit 10621, a training unit 10622, and a model determination unit 10623.
When training on the training set to obtain the keyword probability network model, since the training set includes both clause rule templates and corpus texts of each field, the training procedure can likewise be determined in combination with actual technical needs.
In a specific example, the training makes no distinction between clause rule templates and corpus texts of each field, and one training text is randomly extracted in each training round. In this case:
the training text extraction unit 10621 is configured to randomly extract one current training text — a clause rule template or a corpus text — from the training set, and, after the training unit obtains the trained network model, when the clause rule templates or corpus texts of each field in the training set have not all been extracted, to randomly extract another current training text from the training set, until every clause rule template and corpus text of each field in the training set has been extracted;
the training unit 10622 is configured to input the current training text extracted by the training text extraction unit into the current network model to be trained, train it to obtain the trained network model, and update the current network model to be trained with the trained network model; and
the model determination unit 10623 is configured to determine, when every clause rule template and corpus text of each field in the training set has been extracted, the trained network model obtained by the training unit as the keyword probability network model.
In another specific example, the number of clause rule templates in the training set is set equal to the number of corpus texts, and training then alternates between the clause rule templates and the corpus texts of each field. In this case:
the training text extraction unit 10621 is configured to alternately extract, while the clause rule templates or corpus texts of each field in the training set have not all been extracted, one clause rule template from the clause rule templates or one corpus text from the corpus texts;
the training unit 10622 is configured to input the clause rule template or corpus text extracted by the training text extraction unit into the current network model to be trained, train it to obtain the trained network model, and update the current network model to be trained with the trained network model; and
the model determination unit 10623 is configured to determine, when every clause rule template and corpus text of each field in the training set has been extracted, the trained network model obtained by the training unit as the keyword probability network model.
In both specific examples above, when inputting the extracted clause rule template or corpus text into the current network model for training, the training unit 10622 can input it character by character, so as to obtain better generalization ability.
Those of ordinary skill in the art will appreciate that all or part of the flows in the method embodiments above can be implemented by a computer program instructing the relevant hardware. The program can be stored in a non-volatile computer-readable storage medium — in the embodiments of the present invention, in the storage medium of a computer system — and executed by at least one processor of that computer system to realize the flows of the method embodiments above. The storage medium can be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The technical features of the embodiments above can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination contains no contradiction, it is considered within the scope of this specification.
The embodiments above express only several implementations of the present invention, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be pointed out that those of ordinary skill in the art can make various modifications and improvements without departing from the inventive concept, and these all belong to the protection scope of the present invention. Therefore, the protection scope of this patent should be determined by the appended claims.

Claims (14)

1. A text keyword extraction method, comprising:
obtaining a text to be extracted;
scanning in an associated keyword dictionary to match the keywords in the text to be extracted;
determining, according to the text to be extracted and its matched keywords, all text clauses and corresponding keyword combinations;
determining, according to a keyword probability network model, the probability that each text clause holds together with its corresponding keyword combination; and
determining the keyword combination corresponding to the maximum probability as the keyword combination extracted from the text to be extracted.
2. The text keyword extraction method according to claim 1, further comprising, before obtaining the text to be extracted:
generating the keyword probability network model.
3. The text keyword extraction method according to claim 2, wherein generating the keyword probability network model comprises:
obtaining a training set, the training set comprising clause rule templates and corpus texts of each field; and
training on the training set to obtain the keyword probability network model.
4. The text keyword extraction method according to claim 3, wherein training on the training set to obtain the keyword probability network model comprises:
randomly extracting one current training text from the training set, the current training text being a clause rule template or a corpus text;
inputting the extracted current training text into the current network model to be trained, and training it to obtain the trained network model;
when the clause rule templates or corpus texts of each field in the training set have not all been extracted, updating the current network model to be trained with the trained network model and returning to the step of randomly extracting one current training text from the training set, until every clause rule template and corpus text of each field in the training set has been extracted; and
determining the trained network model obtained as the keyword probability network model.
5. The text keyword extraction method according to claim 3, wherein the number of clause rule templates in the training set equals the number of corpus texts; and
training on the training set to obtain the keyword probability network model comprises:
extracting one clause rule template from the clause rule templates, inputting it into the current network model to be trained, and training it to obtain the trained network model;
after updating the current network model to be trained with the trained network model, extracting one corpus text from the corpus texts, inputting it into the current network model to be trained, and training it to obtain the trained network model;
when the clause rule templates or corpus texts of each field in the training set have not all been extracted, updating the current network model to be trained with the trained network model and returning to the step of extracting one clause rule template from the clause rule templates, until every clause rule template and corpus text of each field in the training set has been extracted; and
determining the trained network model obtained as the keyword probability network model.
6. The text keyword extraction method according to claim 4 or 5, wherein the extracted clause rule template or corpus text is input into the current network model to be trained character by character.
7. The text keyword extraction method according to claim 1, further comprising, before obtaining the text to be extracted:
obtaining the keyword probability network model generated by a server.
8. A text keyword extraction apparatus, comprising:
a text acquisition module, configured to obtain a text to be extracted;
a keyword matching module, configured to scan in an associated keyword dictionary and match the keywords in the text to be extracted;
a combination determining module, configured to determine, according to the text to be extracted and its matched keywords, all text clauses and corresponding keyword combinations;
a probability analysis module, configured to determine, according to a keyword probability network model, the probability that each text clause holds together with its corresponding keyword combination; and
an extraction determining module, configured to determine the keyword combination corresponding to the maximum probability determined by the probability analysis module as the keyword combination extracted from the text to be extracted.
9. The text keyword extraction apparatus according to claim 8, further comprising:
a model generation module, configured to generate the keyword probability network model.
10. The text keyword extraction apparatus according to claim 9, wherein the model generation module comprises:
a training text acquisition module, configured to obtain a training set comprising clause rule templates and corpus texts of each field; and
a training module, configured to train on the training set to obtain the keyword probability network model.
11. The text keyword extraction apparatus according to claim 10, wherein the training module comprises:
a training text extraction unit, configured to randomly extract one current training text — a clause rule template or a corpus text — from the training set, and, after the training unit obtains the trained network model, when the clause rule templates or corpus texts of each field in the training set have not all been extracted, to randomly extract another current training text from the training set, until every clause rule template and corpus text of each field in the training set has been extracted;
a training unit, configured to input the current training text extracted by the training text extraction unit into the current network model to be trained, train it to obtain the trained network model, and update the current network model to be trained with the trained network model; and
a model determination unit, configured to determine, when every clause rule template and corpus text of each field in the training set has been extracted, the trained network model obtained by the training unit as the keyword probability network model.
12. The text keyword extraction apparatus according to claim 10, wherein the number of clause rule templates in the training set equals the number of corpus texts; and
the training module comprises:
a training text extraction unit, configured to alternately extract, while the clause rule templates or corpus texts of each field in the training set have not all been extracted, one clause rule template from the clause rule templates or one corpus text from the corpus texts;
a training unit, configured to input the clause rule template or corpus text extracted by the training text extraction unit into the current network model to be trained, train it to obtain the trained network model, and update the current network model to be trained with the trained network model; and
a model determination unit, configured to determine, when every clause rule template and corpus text of each field in the training set has been extracted, the trained network model obtained by the training unit as the keyword probability network model.
13. The text keyword extraction device according to claim 11 or 12, wherein the training unit inputs the extracted sentence-pattern rule template or corpus text into the current network model to be trained word by word for training.
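The training procedure described in the claims above — alternately drawing sentence-pattern rule templates and corpus texts at random until both pools are exhausted, feeding each extracted text into the network word by word, and taking the final trained network as the keyword probability network model — can be sketched as follows. This is a minimal illustration under stated assumptions: the function and parameter names are not from the patent, and the `train_step` callback stands in for a real neural-network update.

```python
import random

def train_keyword_model(templates, corpus_texts, model, train_step):
    # Pools of texts to be trained: sentence-pattern rule templates
    # and corpus texts (claim 12 assumes equal counts of each).
    pools = [list(templates), list(corpus_texts)]
    turn = 0
    while any(pools):
        # Alternate between the two pools; once one pool is exhausted,
        # keep drawing from the other until both are empty.
        pool = pools[turn % 2] or pools[(turn + 1) % 2]
        text = pool.pop(random.randrange(len(pool)))
        # Claim 13: the extracted text is input word by word.
        for word in text.split():
            model = train_step(model, word)  # trained model replaces the current one
        turn += 1
    # Once every template and corpus text has been extracted, the
    # trained model is taken as the keyword probability network model.
    return model
```

With a dummy `train_step` that merely counts words, training on one four-word template and one four-word corpus text returns a count of 8, confirming that every word of every text is consumed exactly once.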
14. The text keyword extraction device according to claim 8, further comprising:
a model acquisition module, configured to acquire the keyword probability network model generated by a server.
CN201710203566.8A 2017-03-30 2017-03-30 Text keyword extraction method and text keyword extraction device Active CN108304424B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710203566.8A CN108304424B (en) 2017-03-30 2017-03-30 Text keyword extraction method and text keyword extraction device


Publications (2)

Publication Number Publication Date
CN108304424A true CN108304424A (en) 2018-07-20
CN108304424B CN108304424B (en) 2021-09-07

Family

ID=62872103

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710203566.8A Active CN108304424B (en) 2017-03-30 2017-03-30 Text keyword extraction method and text keyword extraction device

Country Status (1)

Country Link
CN (1) CN108304424B (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103186509A * 2011-12-29 2013-07-03 Beijing Baidu Netcom Science and Technology Co., Ltd. Wildcard character class template generalization method and device, and general template generalization method and system
CN104239300A * 2013-06-06 2014-12-24 Fujitsu Ltd. Method and device for mining semantic keywords from text
US20150347383A1 * 2014-05-30 2015-12-03 Apple Inc. Text prediction using combined word n-gram and unigram language models
CN105138515A * 2015-09-02 2015-12-09 Baidu Online Network Technology (Beijing) Co., Ltd. Named entity recognition method and device


Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110377916A * 2018-08-17 2019-10-25 Tencent Technology (Shenzhen) Co., Ltd. Word prediction method, device, computer device and storage medium
CN110377916B (en) * 2018-08-17 2022-12-16 Tencent Technology (Shenzhen) Co., Ltd. Word prediction method, word prediction device, computer equipment and storage medium
CN109271521A * 2018-11-16 2019-01-25 Beijing Jiuhu Times Intelligent Technology Co., Ltd. Text classification method and device
CN111309878A * 2020-01-19 2020-06-19 Alipay (Hangzhou) Information Technology Co., Ltd. Retrieval-based question-answering method, model training method, server and storage medium
CN111309878B (en) * 2020-01-19 2023-08-22 Alipay (Hangzhou) Information Technology Co., Ltd. Search type question-answering method, model training method, server and storage medium
CN111324722A * 2020-05-15 2020-06-23 Alipay (Hangzhou) Information Technology Co., Ltd. Method and system for training word weight model
CN111737979A * 2020-06-18 2020-10-02 Longma Zhixin (Zhuhai Hengqin) Technology Co., Ltd. Keyword correction method, device, correction equipment and storage medium for speech text
CN113010648A * 2021-04-15 2021-06-22 LianRen Health Medical Big Data Technology Co., Ltd. Content search method, content search device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN108304424B (en) 2021-09-07

Similar Documents

Publication Publication Date Title
CN106649825B (en) Voice interaction system and creation method and device thereof
CN108304424A (en) Text key word extracting method and text key word extraction element
CN108491443B (en) Computer-implemented method and computer system for interacting with a user
CN108446286A (en) A kind of generation method, device and the server of the answer of natural language question sentence
CN109325040B (en) FAQ question-answer library generalization method, device and equipment
CN107818164A (en) A kind of intelligent answer method and its system
CN111708869B (en) Processing method and device for man-machine conversation
CN107818781A (en) Intelligent interactive method, equipment and storage medium
CN110032623B (en) Method and device for matching question of user with title of knowledge point
CN108711420A (en) Multilingual hybrid model foundation, data capture method and device, electronic equipment
CN105956053B (en) A kind of searching method and device based on the network information
CN109271493A (en) A kind of language text processing method, device and storage medium
CN102165518A (en) System and method for generating natural language phrases from user utterances in dialog systems
CN109857846B (en) Method and device for matching user question and knowledge point
CN110147544B (en) Instruction generation method and device based on natural language and related equipment
CN107704453A (en) A kind of word semantic analysis, word semantic analysis terminal and storage medium
CN110457689A (en) Semantic processes method and relevant apparatus
Dinarelli et al. Discriminative reranking for spoken language understanding
CN111209363B (en) Corpus data processing method, corpus data processing device, server and storage medium
CN107832439A (en) Method, system and the terminal device of more wheel state trackings
CN109918627A (en) Document creation method, device, electronic equipment and storage medium
CN112527955A (en) Data processing method and device
CN110413992A (en) A kind of semantic analysis recognition methods, system, medium and equipment
CN112559718B (en) Method, device, electronic equipment and storage medium for dialogue processing
CN113779987A (en) Event co-reference disambiguation method and system based on self-attention enhanced semantics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant